Schema-normalized additional API endpoint

Recommended reading: Overturemaps.org - big-businesses OSMF alternative

This may be a bit out of place (I’m a new contributor, so this really should be a matter for others here), but I would like to put up for open criticism whether it makes sense to have some sort of read-only version of the existing API data (the one that contains the de facto data for users) that could be passed through automated modifications and stay under the OSMF umbrella (not outside the project).

The actual modifications listed below are mostly here as examples, and how this would be done is also a matter for future discussion; the real request for feedback is whether having such an additional version makes sense.

Some examples of such automated conversions from the reference OSM data

Disclaimer: again, remember that I’m a new user. What can and cannot be done would mostly be decided by others. Ideally, such a version could be rebuilt from scratch from time to time. It would see less use than the existing APIs, and might not even require rebuilding the map, just the API data.

  1. At minimum, this would mean opinionated changes to tags that are clearly aliases in the tagging schemas, and removal of tags from very old times that have no impact today.

  2. Values that are clearly impossible could be cleaned from the output (for example, if the maximum highway speed in a country is 120 km/h, a 300 km/h on a highway=residential likely has an extra 0). No questions asked. Just make it happen and live with that for each rebuild.

  3. For field types that accept a list of items, if more than one separator is in use (like ; vs ; ), a regex-based cleanup could enforce a single separator. Obviously there are other patterns to clean; the point here is mostly data cleaning. Without it, it is impossible to document for end users what to expect, and we really want pristine documentation.

  4. (Not sure about applicability; needs feedback) Changesets that are more likely to be suspect (like mass deletions) might intentionally not be applied if they were submitted too recently. We act as if they don’t exist.

  5. (Not sure about applicability; needs feedback) Same as the previous item, but even after the waiting time that would allow the changeset to be merged has passed, if someone comments on the changeset, we delay the merge even further. Maybe we suggest explicit keywords in the first or second comment to hint whether the changeset is good or not.

  6. … Suggestions? Please let’s keep it simple: something we know could be done in just a few months. In this topic let’s focus on output only and avoid proposing things that change how data is inserted into OSM (or at least not ones that would be hard to implement).
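
To make items 1 to 5 more concrete, here is a minimal sketch in Python of what such rules could look like when run on each rebuild. Everything named in it is an assumption for illustration only: the alias table, the set of obsolete keys, the per-country speed limit (using a placeholder country code "XX"), the list-valued keys, the choice of ; as the enforced separator, and the hold thresholds are hypothetical, not an existing OSM API or an agreed rule set.

```python
import re
from datetime import datetime, timedelta, timezone

# All tables below are hypothetical examples, not agreed OSM rules.
KEY_ALIASES = {"contact:phone": "phone"}   # item 1: alias key -> preferred key
OBSOLETE_KEYS = {"created_by"}             # item 1: old tags with no impact today
MAX_SPEED_KMH = {"XX": 120}                # item 2: legal maximum per country code
LIST_KEYS = {"cuisine", "sport"}           # item 3: keys whose value is a list

MASS_DELETE_THRESHOLD = 100                # item 4: deletions that make a changeset suspect
BASE_HOLD = timedelta(days=2)              # item 4: minimum age before a suspect changeset applies
EXTRA_HOLD_PER_COMMENT = timedelta(days=1) # item 5: extra delay for each changeset comment


def normalize_tags(tags, country):
    """Return a cleaned copy of an element's tags; the input is not modified."""
    out = {}
    for key, value in tags.items():
        if key in OBSOLETE_KEYS:
            continue                       # drop tags that no longer matter
        key = KEY_ALIASES.get(key, key)    # rewrite alias keys to the preferred key
        if key in LIST_KEYS:
            # Accept ";" or "," with any surrounding spaces, emit a single ";".
            parts = [p.strip() for p in re.split(r"[;,]", value)]
            value = ";".join(p for p in parts if p)
        out[key] = value

    # Drop a plain numeric maxspeed (km/h) above the country's legal maximum.
    limit = MAX_SPEED_KMH.get(country)
    speed = out.get("maxspeed", "")
    if limit is not None and speed.isdigit() and int(speed) > limit:
        del out["maxspeed"]
    return out


def is_held(created_at, deleted_count, comment_count, now):
    """True if a suspect changeset is still too recent to appear in the output."""
    if deleted_count < MASS_DELETE_THRESHOLD:
        return False                       # not suspect under this simple rule
    hold = BASE_HOLD + EXTRA_HOLD_PER_COMMENT * comment_count
    return now - created_at < hold


if __name__ == "__main__":
    raw = {
        "created_by": "JOSM",
        "contact:phone": "+00 000 0000",
        "cuisine": "pizza , burger;  sushi",
        "highway": "residential",
        "maxspeed": "300",
    }
    print(normalize_tags(raw, "XX"))
    now = datetime(2023, 1, 10, tzinfo=timezone.utc)
    print(is_held(now - timedelta(days=1), 500, 0, now))
```

The original data is never touched: `normalize_tags` returns a new dictionary and `is_held` only decides whether a changeset is visible in the derived output, so a rule with too many false positives can be changed and the whole output rebuilt.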

But why?

I think the changes that the Overture Maps Foundation (“OMF”) claims it will make to “clean/validate/remove vandalism” from what they call “several sources” could be done by OSMF itself, which is free to make top-down, opinionated decisions. And not just this: we should initially focus on getting a few features well cared for. For example, they will focus on a first iteration (in this case highways, and it seems also some amenities, so I would suppose gas stations), so it makes sense for us to do the same and stay focused instead of trying to improve everything. People might be upset with some false positives (though this doesn’t change the original data, just a version derived from it), and even with the very fact of having a normalized version, but I think this approach could allow a fast response by mid-2023 (which is the release date of the OMF project).

By next year, the organizations behind OMF (even if they add extra features such as AI-generated roads where none exist) will likely be able to provide even OSMF data in some sort of normalized form, so it really makes sense to, at minimum, not let that kind of argument stand in their press releases to the point where OSMF cannot defend itself. Both organizations produce open data (but OSMF is more traditional, so even those who like the Linux Foundation would tend to side with OSMF), so having this kind of version of OSMF’s own data, likely with the validation/cleaning rules open for them to submit, would discourage them from refusing to cooperate on the automated rules we could use to transform OSM data from its existing format. Add to this that even organizations that have worked directly with OSMF for a long time might start to move to OMF if some kinds of data cleaning cannot be done directly under the OSMF umbrella. This is unlikely to happen very fast (at least not for organizations that are not already OMF members), but over the years it could happen.

So I think it is still technologically viable for OSMF both to keep its community side (no changes to the data it has are even required, just automated generation from it) and to offer the things that OMF would do.

Potential timeline for a decision

I don’t think a decision about this is urgent. But unless there is heavy rejection, even before anything is implemented or approved, people thinking about how to make it viable could already prepare scripts and/or design the infrastructure.

But if there is some sort of ideal date, it should be before OMF starts to deploy services (as of this date, their FAQ says: “Overture will release its first datasets in the first half of 2023”). That would make it easier for OSMF to deal with press releases comparing both organizations and (what will obviously happen) benchmarks comparing vandalism and harmful errors on both, since this automated approach could allow fast fixes. As much as OMF will try in press releases to get developers who use Google services to move to them, I think we, either as a community or as a suggestion to OSMF, should push aggressively to make developers and news media aware of the existence of OpenStreetMap.

About conflicts of interest
I, Emerson Rocha, declare no conflicts of interest regarding this proposal. Neither my day job nor any of my contracts are related in any way to OpenStreetMap, OSM-related services, or the companies related to OSMF or OMF. I do think, at the time of this writing and from a very utilitarian point of view, that even if done for free (no sponsorship or the like), it would be better for both OSMF and the OpenStreetMap community to have this type of version of OSM data under the OSMF umbrella; but if agreements with OMF and/or individuals working on OMF do happen, they should simply be made in the open, and that’s it. I’m proposing this (again, as a new collaborator to OSM, so feel free to criticize very harshly) because some of the issues they raise, especially those related to making data conflation easier, seem to be true (so I don’t think it is only about avoiding showing “OpenStreetMap collaborators” or even about having someone from these companies on the OSMF board, but about how to fill gaps where there is no data without waiting too long), so this approach might solve 40-70% of the problem that leads OMF to simply not use OSMF data. But even without any agreement with OMF (both explicitly not cooperating with each other), the idea of a similar data endpoint for easy hotfixing seems worth keeping online.

This discussion topic accompanies the post at https://community.openstreetmap.org/t/schema-normalized-additional-api-endpoint/6874