OpenStreetMap DTD, XSD, JSON Schema, etc. for commonly shared data

I’m interested in eventually creating mapping data at a more semantic level (to make automated data integration easier) and in documenting how people can use it with existing standards. However, I noticed that although the reference format is XML (whose tooling is more mature than that of JSON and binary formats), the wiki page OSM XML - OpenStreetMap Wiki admits this:

No official .xsd Schema exists. See below and the OSM XML/XSD, and OSM XML/DTD pages for details of unofficial attempts to define the format in those languages!

Since syntax precedes semantics (and is computationally cheaper to check), it would already make sense to have DTDs, XSDs, etc. ready to use!

Draft of the idea

  1. I see no problem in keeping the Wiki documentation as it is today (it is important for humans!). However, this idea would eventually need some sort of central repository for the DTD, XSD, etc., and for every other common format used to exchange data between tools, even if it is not XML (e.g. if a tool outputs JSON, the common practice would be to provide a JSON Schema). This is important for non-humans (or for tools that help humans, especially developers, find errors).

  2. Then, in a separate repository (but without splitting it into several, e.g. something like a monorepo for test cases), keep examples of data in these formats. Storing huge files is not ideal, but over time it could hold a considerable amount of data (including small binary files), which is why it should not live in the main repository. Also, given the way CI works, a git clone is much more convenient than downloading individual files.

  3. Today it is viable to use GitHub Actions for free to install the dependencies (which are likely to be somewhat heavy; and while the specs are unlikely to change over time, the validation tools might break) and to automate online checks of the validators themselves. This would make it viable to accept small changes over time without requiring people to install the tooling on their own computers.
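
To make the kind of check I have in mind more concrete, here is a minimal sketch in Python using only the standard library. It is not a DTD or XSD and not an official schema: the rule tables REQUIRED_ATTRS and ALLOWED_CHILDREN and the function check_osm_xml are names I made up for illustration, and they cover only a small subset of OSM XML (real files can contain other elements and attributes). A real setup would more likely run a proper DTD/XSD validator over the example files.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written rules for a small subset of OSM XML.
# This is an assumption for illustration, not an official schema.
REQUIRED_ATTRS = {
    "node": {"id", "lat", "lon"},
    "way": {"id"},
    "relation": {"id"},
    "nd": {"ref"},
    "member": {"type", "ref", "role"},
    "tag": {"k", "v"},
}

# Child elements this sketch accepts inside each parent element.
ALLOWED_CHILDREN = {
    "osm": {"bounds", "node", "way", "relation"},
    "node": {"tag"},
    "way": {"nd", "tag"},
    "relation": {"member", "tag"},
}


def check_osm_xml(text):
    """Return a list of readable problems; an empty list means none were found."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    if root.tag != "osm":
        return [f"root element is <{root.tag}>, expected <osm>"]

    problems = []
    stack = [root]
    while stack:
        elem = stack.pop()
        # Report required attributes that are absent on this element.
        missing = REQUIRED_ATTRS.get(elem.tag, set()) - set(elem.attrib)
        if missing:
            problems.append(f"<{elem.tag}> is missing attributes: {sorted(missing)}")
        # Report children that the rule table does not list for this parent.
        allowed = ALLOWED_CHILDREN.get(elem.tag, set())
        for child in elem:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> is not expected inside <{elem.tag}>")
            else:
                stack.append(child)
    return problems


# Made-up sample data: the last node deliberately lacks its "lon" attribute.
SAMPLE = """<osm version="0.6" generator="example">
  <node id="1" lat="51.5" lon="-0.1"><tag k="amenity" v="cafe"/></node>
  <way id="2"><nd ref="1"/></way>
  <node id="3" lat="51.6"/>
</osm>"""

for problem in check_osm_xml(SAMPLE):
    print(problem)
```

Running it on the sample reports the node that has no lon attribute. A script like this could run in CI over a folder of example files, but the rules would have to be maintained by hand, which is exactly the work a shared DTD/XSD/JSON Schema would avoid.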

What I could do to help this idea

Well, I’m new to OpenStreetMap, so I am learning things as I go! Setting up the minimum from the idea above, just by looking at what already exists in the wild, is viable. I have no problem with making everything public domain (so others can copy and paste things into their codebases without needing to cite authors). However, if this becomes minimally usable, the ideal scenario would be to recreate the repositories somewhere with review by others, preferably one or more developers of APIs or tools that exchange data in some way. I’m taking care to make this viable for others to keep running, although at first it could be easier to do a quick rush.

I’m not sure how active on OpenStreetMap I will be in the coming years. Since this is something fewer people get engaged in (but which is important for developers), my goal with this post is not just to get feedback on which schemas are worth adding, but also to find some approach to donate the result for others to review. In this respect, one good question: how could someone donate schemas? They are something between code and data.

Other comments

Anyone should feel free to make other comments or suggestions! It might take some time for me to compile more practical examples; at this moment I’m mostly reading what is on the Wiki, so I’m likely to miss something relevant.

I have also noticed that (while this is less common on OpenStreetMap than in other communities) people sometimes rely on projects that end up unmaintained. Schemas themselves don’t stop working but, even though I’m new here, I think making it easier for others to validate changes makes this more future-proof.

Edit 1: One initial tag, “data-validation”, was removed. This might lead to confusion.
