Is there a way to print a schema with references resolved? #41
Comments
If I understand correctly, say you have: {
"type": "array",
"items": { "$ref": "foo://bar#" }
} and at JSON Reference {
"type": "string",
"minLength": 2
} then you would like to have: {
"type": "array",
"items": {
"type": "string",
"minLength": 2
}
} and this, recursively? |
Yes, that is exactly right. Is this possible with the current API? (I'm using version 2.0.0) |
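The expansion described in this exchange can be sketched without any library at all. The following is a minimal illustration, not the API of json-schema-validator: it assumes the schema has already been parsed into plain `Map` and `List` values, and that every reference target is available in a hypothetical `registry` map keyed by the exact `$ref` string. `RefInliner` and `inline` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RefInliner {
    /**
     * Returns a copy of node in which every object carrying a "$ref" string
     * is replaced by the (expanded) schema registered under that string.
     * Assumption: registry holds every target; this is not a URI resolver.
     */
    static Object inline(Object node, Map<String, Object> registry) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            Object ref = map.get("$ref");
            if (ref instanceof String) {
                String uri = (String) ref;
                if (!registry.containsKey(uri)) {
                    throw new IllegalArgumentException("unresolvable reference: " + uri);
                }
                // The target is expanded too; a recursive schema never terminates here.
                return inline(registry.get(uri), registry);
            }
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : map.entrySet()) {
                copy.put(String.valueOf(e.getKey()), inline(e.getValue(), registry));
            }
            return copy;
        }
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) node) {
                copy.add(inline(item, registry));
            }
            return copy;
        }
        return node; // strings, numbers, booleans, null
    }
}
```

This version deliberately has no protection against self-referencing schemas, which is the difficulty raised in the next reply.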
Well, by writing a processor, yes it is. However, it is quite delicate. Consider this schema: {
"type": "array",
"minItems": 2,
"maxItems": 2,
"items": { "oneOf": [ { "type": "integer" }, { "$ref": "#" } ] }
} This is, by essence, a recursive schema -- it would lead to an infinite loop on expansion. More generally, such a processor should fail if a resolved JSON Reference is contained within the schema, and the pointer in the schema at which this reference was found is a child of the pointer of the reference. Oh, and there is the case of And that is not all: if we expand a draft v4 schema, we don't want a draft v3 schema to come into the picture either. As to other cases:
Such a processor is writable, but not easy! |
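The failure condition stated above (the location of the `$ref` is a child of the pointer it targets) comes down to a prefix test on JSON Pointers. Here is a small sketch of that test only. It assumes both pointers belong to the same schema document and are given in the same string form, without the leading `#`. `RefLoopCheck` and `expandsForever` are hypothetical names, not part of the library.

```java
final class RefLoopCheck {
    /**
     * True when the place a $ref was found lies at or below the pointer the
     * $ref targets, so that expanding it would reproduce the $ref forever.
     *
     * @param refTarget JSON Pointer the reference resolves to ("" is the root)
     * @param foundAt   JSON Pointer of the schema object holding the $ref
     */
    static boolean expandsForever(String refTarget, String foundAt) {
        if (refTarget.isEmpty()) {
            return true; // "#" targets the root, which is the parent of every location
        }
        // Compare on whole reference tokens: "/a" must not count as a parent of "/ab".
        return foundAt.equals(refTarget) || foundAt.startsWith(refTarget + "/");
    }
}
```

For the schema above, the reference `#` is found at `/items/oneOf/1`, so `expandsForever("", "/items/oneOf/1")` reports a loop.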
Hmmno, this is not quite right, |
Anyway, I think this is an interesting thing to have, so I'll give a go at it -- but not immediately: I have an Avro converter to write! |
There is also the solution that you give a go at it, of course. In this case, if you have questions, do not hesitate to ask ;) |
Thanks! This api is new to me but I'll give it a shot. |
You will need to have a good shot at the API of json-schema-core as well since this is what you will use to build your chains: http://fge.github.com/json-schema-core/stable/index.html In particular, look at the |
fge, Perhaps I'm going about this the wrong way, but I basically need to "walk" the schema the same way that ValidationProcessor "walks" the schema. I see that you use the ArraySchemaSelector and ObjectSchemaSelector as helpers to cache/lookup things. Is there a reason that ObjectSchemaSelector, ObjectSchemaDigester, and ArraySchemaDigester are public, but ArraySchemaSelector has package access? |
Uhm, that is a visibility bug... The initial plan was to have them all package visible! But now that you mention it, maybe they should not... Your need here, and another one I will have in the near future, will require that I walk a JSON Schema as well (note that syntax validation also walks it in some way), so maybe this could be factorized away and reused for both syntax and instance validation, and for your use case. Out of curiosity, how did you plan to use these selectors? And would you mind sending a pull request making ArraySchemaSelector public? For the general case I'll have to think it out some more. |
Basically, I am doing deserialization/serialization of proprietary formats. Internally, we want our data representation and parsing to be very flexible. For example, if I have a json schema: I can write a generic serializer/deserializer from this schema. I will "walk" this schema, then I can know that the first field is called "name" and the 2nd field is called "age", and also validate that the data received is in the appropriate format (i.e. required, string, date, number etc). Of course, any property can be a reference to another schema if the format is very complex. When we need to adjust the schema, then all it will take is extending/modifying the json schema and our parsing logic can remain intact. Doing this with POJO's is too difficult to maintain. The way you are walking the schema and json value during validation is exactly what I need to do. But instead of a json value, I will be walking a proprietary serialization format. |
What do you mean by proprietary format? A "proprietary JSON Schema"? If yes, there may be another solution: write a processor which converts this to a JSON Schema, and if some constraints are not enforceable by existing keywords, you can create your own. This is what I am currently doing with Avro Schema (I am writing an Avro to JSON Schema translator at the moment). |
No, this is not a proprietary json schema, this is a format such as CSV, XML, fixed width, flat file, or some other proprietary format based on raw bytes. |
OK, and you need to "flatten" JSON Schemas for your particular use case? The more I think of it, the more I think what is needed is this:
Comments? |
My original work around was to create a flattened/resolved json schema, and stuff it into a JsonNode that I could walk. After looking at the way ValidationProcessor is written, I think that your approach is much better. Making the walking logic generic would essentially remove the need for me to create a flattened json schema. I would basically take a set of bytes, and break it up into sub segments as I traversed down into the subfields of the JSON schema. Of course, as I walked through the bytes and schema, I would be building up a value to return to the user of my custom processor. Theoretically, if the behavior was pulled out of ValidationProcessor, I could choose to walk the schema and build the flattened version, or walk the schema and parse my payload directly. |
OK, there is something to account for: as you may have seen, the logic in ValidationProcessor is driven by the data -- and, specifically, JSON, and even more specifically, Jackson's But if I understand correctly, you want other types than JSON to be handled? Or do you convert your data to JSON before processing? |
I did notice that it was driven by the data. I was able to put together a Processor that mimics the behavior of the ValidationProcessor, but instead of walking the data, it walks the schema while resolving references. I have two major things to figure out now.
|
The more I think about it, the more difficult I think it will be to break the coupling between walking the schema and walking my data. The knowledge on how to "walk" the schema lives in the process, processArray, and processObject methods. These methods also need to contain customized logic on how to break down a chunk of data into the appropriate sub components. The decision on how to break up the chunks can be anywhere between commas (CSV files), to a field on the payload telling you how many chunks follow, and how long each one is. |
As to your first questions:
{
"properties": { "p9": {} },
"patternProperties": { "^p": {}, "\\d+$": {} }
} If you have a member with name "p9", it will have to be valid against all three schemas. Which means you will end up breaking the data into its individual components anyway, and be driven by the data...
The main difference with, say, With future Jackson, it will be possible to write something like this: final JsonNode newNode = node.thaw().put("foo", "bar").etc().etc().freeze(); I plan to extend Now, to your second comment: yes, I detected that difficulty as well. What would be needed is a generic way to walk the data. Not impossible, mind. After all, this is part of the plan for Jackson as well, with a beefed up (why does github mess numbered list items?) |
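To make the "p9" example concrete, here is a rough sketch of how the set of applicable subschemas for one member name could be computed. It is an illustration under assumptions: `MemberSchemas` and `locationsFor` are invented names, the returned strings are plain labels rather than escaped JSON Pointers, and `java.util.regex` stands in for the ECMA 262 dialect that JSON Schema specifies, which differs in some corner cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class MemberSchemas {
    /**
     * Lists which subschemas an object member must validate against:
     * its entry in "properties", if any, plus every matching pattern
     * in "patternProperties".
     */
    static List<String> locationsFor(String member, Set<String> properties, Set<String> patterns) {
        List<String> hits = new ArrayList<>();
        if (properties.contains(member)) {
            hits.add("properties/" + member);
        }
        for (String regex : patterns) {
            // Patterns are not anchored in JSON Schema, hence find() and not matches().
            if (Pattern.compile(regex).matcher(member).find()) {
                hits.add("patternProperties/" + regex);
            }
        }
        return hits;
    }
}
```

With properties `{"p9"}` and patterns `{"^p", "\\d+$"}`, the member "p9" yields three entries. An empty result would mean that only `additionalProperties` applies to that member.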
Hello again, Is the source code of your processor available somewhere, or does it "touch private matters" already? I'd like to see how you did it, I must say I lack inspiration to get started for generalizing |
Here is a link to the basic stripped down Validator. I called it ParserProcessor.
This works for the ObjectNodes, but I haven't tested ArrayNodes yet. Let me know if you have any suggestions or if I'm making any heinous mistakes here. Thanks! |
OK, a couple of remarks:
{
"additionalProperties": { "$ref": "some://where" }
} the digester will not tell you with the current code. In fact, you not only need to walk Anyway: as I need such a walker for what I am going to do next (JSON Schema to Avro), I'm going to have a go at it too. And I have thought about it more again: maybe what is more suited is the way |
The general idea seems to be there. Based on the code of SyntaxProcessor and SyntaxChecker. Both of these already have all the logic to correctly walk schemas, and only schemas. Now all that is needed is a more generic implementation. Spawned from the discussion in issue #41. Signed-off-by: Francis Galiegue <[email protected]>
OK, look at the commit referenced above: it contains the general idea. If you look at, for instance, And this is the general idea of walking here: all syntax checkers have the logic in place (and tested), that just needs to be made more generic. The There can be many other uses for this. I just have to think a little more about how to make this really generic. |
Thanks! I'll take a look and try the SyntaxProcessor/SchemaWalker way of doing things. |
In fact I'm already on it ;) I have transformed it so as to have a pure walker at first. Hold your breath for a couple of minutes and I should have a first version of it in working order real soon. |
OK, I'll hang on ;) |
Issue #41 continued. While not as generic as it could be (it will probably get more generic than that later), the need here is to process the current schema before going any further. Make SchemaWalker abstract. Require implementations to implement one method, processCurrent(ProcessingReport, SchemaTree), so that whatever is needed to be "done to the schema" is done _before_ walking further. Signed-off-by: Francis Galiegue <[email protected]>
Issue #41, continued Implement all of the test infrastructure to test individual pointer collectors. Signed-off-by: Francis Galiegue <[email protected]>
Mind helping me a little? It won't be that hard, but it needs to be done ;) If you are willing, I'll explain what to do. It is really not hard. |
OK, I have a first version. But it is butt ugly. It works. But it's ugly. And I know how I can make it break quite easily. I need to have a mutable tree that I fill on the go, this version is real crap. But... Well... For simple cases like the one I wrote, it works OK... |
OK, I need to think about it some more. The problem is not with the logic of I'll think about it some more, right now I need sleep ;) But basically it is needed that we pass a mutable object to process all along the chain, and process it when we walk. If you think of a design and have some time ahead of you, I'm open to ideas! |
OK, I have a plan. First, schema walking will be split in two: one walking strategy will not resolve the refs, the other will. Then there will be an interface: public interface SchemaListener
{
void onWalk(final SchemaTree tree);
void onTreeChange(final SchemaTree oldTree, final SchemaTree newTree);
} The first will be called each time the I'll implement this and let you know how it went. |
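As a rough illustration of the listener idea, the sketch below walks a schema held in plain `Map` values and notifies a listener at each subschema before descending, in the spirit of "process the current schema before going any further". It is a simplified stand-in, not the interface above: `SimpleSchemaListener` and `SimpleSchemaWalker` are hypothetical, only `items` and `properties` are followed, no reference is resolved, and there is no counterpart to `onTreeChange`.

```java
import java.util.Map;

/** Hypothetical, simplified listener: one callback per visited subschema. */
interface SimpleSchemaListener {
    void onWalk(String pointer, Map<?, ?> schema);
}

final class SimpleSchemaWalker {
    /** Visits schema, then the subschemas under "items" and "properties". */
    static void walk(String pointer, Map<?, ?> schema, SimpleSchemaListener listener) {
        listener.onWalk(pointer, schema); // notify first, descend afterwards

        Object items = schema.get("items");
        if (items instanceof Map) {
            walk(pointer + "/items", (Map<?, ?>) items, listener);
        }

        Object props = schema.get("properties");
        if (props instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) props).entrySet()) {
                if (e.getValue() instanceof Map) {
                    // Member names are not escaped here; a real pointer would need that.
                    walk(pointer + "/properties/" + e.getKey(), (Map<?, ?>) e.getValue(), listener);
                }
            }
        }
    }
}
```

A listener that collects pointers, for example `(pointer, schema) -> seen.add(pointer)`, is enough to see the order of visits. A flattening listener would build its output tree inside the same callback.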
OK, good news, I have a fairly complete working schema walker, with associated listeners. I could implement schema substitution the way you initially asked for, and this will also help me for Avro, so it is close to being done. You talked about other uses for this, I'd be curious to know them? Note that the interface is not finalized yet, I need to find better names, document etc. |
UGLY AS HELL. But first answer to issue #41. I need a mutable tree, this really won't do. Signed-off-by: Francis Galiegue <[email protected]>
OK, the walk branch is now obsolete, the code has been merged into the |
Thanks! I will take a look. Basically here are my use cases:
|
OK, that makes things more clear, and I have some questions ;) As to point 1, this unmarshalling can be done with a separate processor, which means only -core is needed, right? What is left to do is to build the appropriate inputs for -validator to operate; what is more, you say "schema or data": if it is a schema only, what about the data? If it is the data, what about the schema? See below for more, however. As to point 2: am I correct in assuming this is why you needed ref resolved (for the schema)? As to point 3: As to point 4: here again, And I don't quite understand point 5? |
And here is the "below for more". I have the intent to provide, in -core, a mechanism to fuse the output of two processors into the input for another: public interface ProcessorJoiner<OUT1, OUT2, IN>
{
IN join(final OUT1 out1, final OUT2 out2);
} and the same for split. In your use case, this could be used, for instance, to plug a processor producing a |
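To show how such a joiner might be used, here is a small sketch that repeats the interface from the comment above and adds one implementation. Everything except the interface is an assumption made for illustration: `JoinerDemo`, the choice of a `String` for the first output (for instance a schema identifier) and of a `byte[]` for the second (raw payload), and the use of `Map.Entry` as the joined input type.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

interface ProcessorJoiner<OUT1, OUT2, IN> {
    IN join(final OUT1 out1, final OUT2 out2);
}

final class JoinerDemo {
    /**
     * Hypothetical joiner: pairs the output of a schema-producing processor
     * with the output of a data-producing processor, so that a third
     * processor can consume both as a single input.
     */
    static final ProcessorJoiner<String, byte[], Map.Entry<String, byte[]>> PAIR =
        (schema, data) -> new SimpleImmutableEntry<>(schema, data);
}
```

Since the interface has a single method, a lambda is enough to supply a joiner. A "split" counterpart would go the other way and produce two outputs from one input.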
Note: I have just committed the removal of the walking mechanism from -validator, it is now in -core. Which means I'll continue work there. The discussion can go on in this issue however. |
Comments on your comments: (response to 1, including the reference to below): I think that would work well, since one processor will walk the schema, and one custom processor will have logic to walk the data, and the outputs of both will be sent to "join" for processing. The result can be collected in the return value. In some cases, such as walking an array, the same schema will be passed with each value in the array. (response to 2): The refs will need to be resolved for marshalling and unmarshalling the bytes. I will need refs resolved in both cases, whether my "walking" is driven by the schema or by the data. (response to 3): I think in some cases, it will be extremely valuable to me to directly access the SchemaTree of any SchemaNode. I could then actually store more metadata in the JSON Schema, and access it through the exposed SchemaTree. It will definitely give me more flexibility and allow me to ask questions about the schema without having to traverse the entire thing. (response to 5): For my protocol specifications, I could potentially have the following: {
"title":"person v1",
"type":"object",
"properties": {
"name" : {
"type": "string",
"required":true
}
}
} And also {
"title":"person v2",
"type":"object",
"properties": {
"name" : {
"type": "string",
"required":true
},
"age" : {
"type": "number"
}
}
} My parsing logic will first have to inspect the data to determine if I should parse the payload with v1 or v2. Then I will lookup "person v2" from some dictionary, so that I can parse the payload with the proper schema. This is really just a namespace issue and I think this is already supported. |
(what do you call From your latter point, if I understand you correctly, you need to be able to tell apart what (un)marshalling process(or) will be used according to a defined field in the JSON representation of the data? I understood at first that you wanted to have two different schemas at the same URI. This is not the same thing ;) This problem has already been expressed by @dportabella in another issue, and @joelittlejohn has also weighed in, since his project has the ability to generate a JSON Schema out of a POJO. But at a first step, have you looked at https://github.com/fge/json-schema-core/wiki/Architecture Or, if you can extract it more simply, use a What you would need is a predicate on the data here, so that the correct processing be selected afterwards. Which means processor join is not really relevant to your case, is it? |
I shouldn't need to extend any SchemaTree. I believe I can leverage defined JSON Schema features to get what I need here. I think we are on the same page. Different "versions" of my messages will be at different URI's. Oh, I have not read up on ProcessorMap yet. Let me dig into this a bit and I'll get back to you to see if this works. I probably won't be able to work through an example until Monday though. |
I guess I need to know what you call "defined JSON Schema features" here -- none exist that can tell one "serialized" type from another one, unless you choose to interpret an existing keyword in your own way. But then remember that you can add your own keywords. Can you explain more? |
OK, I have a working, tested implementation of SchemaWalker -- one which resolves references, the other one which doesn't. I just need to make a processor out of it, document everything, and it'll be done. As to the ref expander itself, I think I'll put it in |
-core has been released with I'll illustrate how it works in json-schema-processor-examples, starting with a ref expander which will return a schema with all JSON References resolved. But that is not really needed anymore now ;) |
OK, -core 1.0.2 and -validator 2.0.1 have been released. Note: syntax validation and ref resolving have moved into -core. All the schema walking code is now in -core. As to ref expanding, there is an example here: However, since there is a |
Thanks for the updates! I got held up on some other tasks today, but I will work this in and let you know how it goes soon. |
So far so good on using the ResolvingSchemaWalker! I also noticed that the json schema throws warnings on unsupported keys. Warnings are perfect since it doesn't prevent the schema from being validated, and I can use unsupported keys as ways to mark-up the schema for my marshalling/unmarshalling logic. I will mark this as closed for now and I will create another thread if I have any questions regarding the walkers. Thanks very much for your help! |
I can't see RefExpander anymore in the master tree? Has it been removed? |
I have the same requirement, is it a release feature to print a complete schema? |
Have the same question also, did you get an answer on this? |
I think where they landed, after reading the discussion is that you need to implement your own as a Schema walker. I didn't see them merge anything into the tree to do it. Happy to take a default implementation of it if any of you do it. |
@olgabo , @queenaqian , @simmosn , @huggsboson @fge actually provided ResolvingSchemaWalker, which does everything the OP needs out of the box. It took me a while to understand by looking at the source code, but it is very straightforward. Here's the code snippet for you to use.
resolvedSchemaTree should now have all "$ref" in your schema resolved. Hope that answers your question. |
@neerajsu is the ResolvingSchemaWalker still available? I'm not able to find it in the repository. Any idea if it has been removed with more recent version changes? The latest version I can find mention of the ResolvingSchemaWalker is 1.1.8 but the latest version still available on maven central is 1.5. |
My comment is almost 2 years old. At that time, ResolvingSchemaWalker was experimental. So it has probably been renamed and/or incorporated into the main code. You'll have to dig into the latest source code to figure it out. I'm sure the functionality exists in the latest code. |
Hi,
I'm not having any issues with SchemaNode and validation, but I would like a way to print my SchemaNode with $ref resolved. An example of this is on http://www.jsonschema.net/ where they can pretty print the Json Schema in JSON format.
Is there any way to do this with json-schema-validator's SchemaNode or SchemaTree?
Thanks!