Proposed bot edit: fixing a bunch of typos in tags

I propose fixing some obvious typos in tag values. Cases were found automatically and then reviewed manually by sampling across values (in the process, some were retagged entirely by hand).

A typical example (for a tunnel value):

[image: state before the mechanical edit]

[image: state after the edit]

The full list of values is at https://wiki.openstreetmap.org/wiki/Mechanical_Edits/Mateusz_Konieczny_-_bot_account/fix_many_obvious_typos#What

(If I find more cases like this, can I simply add them to this bot edit? Or should I start a new thread here and on the talk mailing list?)

Why is it useful? It helps newcomers avoid confusion and prevents such values from becoming established, without the drudgery a manual cleanup would require. It also makes it easier to add missing surface= values and to use OpenStreetMap data, including support in editors that explain or translate the meaning of surface values.

Why an automatic edit? I have a massive queue (thousands to tens of thousands of cases) of automatically detectable issues that are not reported by mainstream validators and need fixes, where the fix requires review or a complete manual cleanup.

There is no point in manual drudgery here, where the values are obviously fixable.

These values do NOT require manual review. If these cases turn out to be a useful signal of invalid editing, then I will also review the nearby areas where the bot made edits.

I have already skipped edits to primary tags, except for a few blatant cases where the mistake is easy to miss (flowerbed was not rendered until recently). Typos in primary tags that make objects outright missing from typical map rendering are often accompanied by other serious problems, probably because they indicate mapping by newcomers who are likely to be confused by other complexities as well. The same goes for access tags, which I will keep fixing manually. Typos in, for example, shop values are nevertheless safe to fix automatically, probably because their effects are less noticeable. More obvious typos in rare, typically unrendered amenity tags are also often safe to fix.

Yes, the bot edit WILL cause objects to be edited. Nevertheless, map data quality will improve as a result.

These values were found automatically based on taginfo and on iD presets (the latter also accessed via taginfo).

Taginfo value statistics list the values present in the OSM database, while iD presets list which values are known for a given key.
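As a rough illustration of how the two sources combine, the sketch below flags values that appear in the database statistics but are not known to the presets. It assumes the taginfo counts and the preset values have already been loaded into plain Python structures (real data could come, for instance, from taginfo's `/api/4/key/values` endpoint); the function name and the sample numbers are made up for illustration.

```python
from typing import Dict, List, Set


def unknown_values(taginfo_counts: Dict[str, int], preset_values: Set[str]) -> List[str]:
    """Return values used in the database but not listed in the presets,
    most frequently used first (ties broken alphabetically)."""
    unknown = [v for v in taginfo_counts if v not in preset_values]
    # Sort by descending usage count, then alphabetically for stable output.
    return sorted(unknown, key=lambda v: (-taginfo_counts[v], v))


# Hypothetical sample data for the cuisine key (counts are illustrative only).
counts = {"pizza": 50000, "thai": 9000, "Thai": 40, "bubble tea": 12}
presets = {"pizza", "thai", "bubble_tea"}
print(unknown_values(counts, presets))
```

Not every value missing from the presets is a typo, which is why matching heuristics and manual review are still needed on top of such a list.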

Multiple heuristics were applied to find various typos. For example, “cuisine=bubble tea” was found to match “cuisine=bubble_tea” from iD presets after the space was replaced with an underscore.

“cuisine=Thai” matched “cuisine=thai” after lowercasing the value.

“cuising=regional1” matched “cuisine=regional” after dropping the trailing character.
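These value heuristics can be sketched as follows. This is a minimal illustration, not the script actually used for this edit: the function name, the order in which heuristics are tried and the limit of two stripped trailing characters are all assumptions, and only values (not keys) are handled.

```python
from typing import Optional, Set

MAX_STRIPPED_CHARS = 2  # hypothetical limit on how many trailing characters may be dropped


def match_known_value(value: str, known: Set[str]) -> Optional[str]:
    """Try simple normalizations of `value` and return the first known
    preset value it matches, or None if no heuristic applies."""
    if value in known:
        return None  # already valid, nothing to fix
    candidates = [
        value.replace(" ", "_"),  # "bubble tea" -> "bubble_tea"
        value.lower(),            # "Thai" -> "thai"
    ]
    # Drop 1..MAX_STRIPPED_CHARS trailing characters: "regional1" -> "regional".
    for n in range(1, MAX_STRIPPED_CHARS + 1):
        if len(value) > n:
            candidates.append(value[:-n])
    for candidate in candidates:
        if candidate in known:
            return candidate
    return None


known_cuisine = {"bubble_tea", "thai", "regional"}
for raw in ["bubble tea", "Thai", "regional1", "thai", "sushi"]:
    print(raw, "->", match_known_value(raw, known_cuisine))
```

Stripping trailing characters is the riskiest of these heuristics, since it can turn one valid word into another, so every proposed replacement still has to be checked by hand.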

All values were then reviewed manually to drop any dubious replacement (for example, healthcare=nursery to healthcare=nurse was skipped).
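The manual review step amounts to keeping a list of rejected replacements and dropping them from the automatically generated proposals. A sketch, assuming proposals are stored as a mapping from (key, old value) to new value; this data structure and the `REJECTED` set are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical record of replacements rejected on manual review, as
# (key, old value, new value). nursery -> nurse would change the meaning.
REJECTED: Set[Tuple[str, str, str]] = {
    ("healthcare", "nursery", "nurse"),
}


def reviewed(proposals: Dict[Tuple[str, str], str]) -> Dict[Tuple[str, str], str]:
    """Drop every proposed replacement that was rejected on manual review."""
    return {
        (key, old): new
        for (key, old), new in proposals.items()
        if (key, old, new) not in REJECTED
    }


proposals = {("cuisine", "Thai"): "thai", ("healthcare", "nursery"): "nurse"}
print(reviewed(proposals))
```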

Samples were also checked in OSM, and many values were simply edited by hand. Note that not every replacement was sampled: many have only a few occurrences, so sampling and verifying them ends up as simply editing all of them manually.
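The sampling trade-off can be expressed as a simple rule: values with only a handful of occurrences are cheaper to fix by hand than to sample, while frequent ones get a random sample for inspection. In the sketch below, the threshold and sample size are hypothetical, and object IDs are plain integers standing in for real OSM elements.

```python
import random
from typing import Dict, List, Tuple

MANUAL_THRESHOLD = 10  # hypothetical: at or below this, just edit everything by hand
SAMPLE_SIZE = 5        # hypothetical number of objects inspected per frequent value


def plan_review(
    occurrences: Dict[str, List[int]], seed: int = 0
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Split values into ones to fix manually and ones to sample.

    Returns (manual_values, samples), where samples maps each frequent
    value to a random subset of its object IDs to inspect in OSM."""
    rng = random.Random(seed)  # seeded so the plan is reproducible
    manual: List[str] = []
    samples: Dict[str, List[int]] = {}
    for value, ids in sorted(occurrences.items()):
        if len(ids) <= MANUAL_THRESHOLD:
            manual.append(value)
        else:
            # Safe: len(ids) > MANUAL_THRESHOLD >= SAMPLE_SIZE.
            samples[value] = rng.sample(ids, SAMPLE_SIZE)
    return manual, samples


occurrences = {"Thai": list(range(1, 41)), "bubble tea": [101, 102, 103]}
manual, samples = plan_review(occurrences)
print(manual)                # values to edit by hand
print(len(samples["Thai"]))  # number of sampled objects for a frequent value
```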

If you see any values where the edit would be dubious, unsafe, or in any way problematic, let me know.

(BTW, while looking for typos I also found one in the iD presets, see https://github.com/openstreetmap/id-tagging-schema/pull/1063 )

Some conversions were found manually, in addition to those from iD presets; currently these are only cycleway:both and shop=gun.

I have also already contacted the community in some cases (such as sport values with ; and some cases of extra trailing characters), via changeset comments and notes.

The response confirmed that these changes are a good idea, and that simply editing would be better than asking more people.

This edit will be rerun in the future, as many such typos are expected to reappear.
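Rerunning is safe as long as applying the replacement list is idempotent, that is, provided no corrected value is itself listed as a wrong value for the same key; a second run then only touches newly introduced typos. A minimal sketch, assuming a replacement table keyed by (key, wrong value) and tags held in a plain dict; both the table contents and the function names are illustrative, not the bot's actual data format.

```python
from typing import Dict, Tuple

# Hypothetical replacement table: (key, wrong value) -> corrected value.
REPLACEMENTS: Dict[Tuple[str, str], str] = {
    ("cuisine", "Thai"): "thai",
    ("cuisine", "bubble tea"): "bubble_tea",
}


def fix_tags(tags: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `tags` with every known typo replaced."""
    return {key: REPLACEMENTS.get((key, value), value) for key, value in tags.items()}


def is_idempotent(table: Dict[Tuple[str, str], str]) -> bool:
    """True if no corrected value is itself listed as a wrong value for the
    same key, so rerunning never changes an already fixed tag."""
    return all((key, new) not in table for (key, _old), new in table.items())


tags = {"amenity": "restaurant", "cuisine": "Thai"}
once = fix_tags(tags)
twice = fix_tags(once)
print(once == twice, is_idempotent(REPLACEMENTS))
```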

This discussion topic accompanies the post at https://community.openstreetmap.org/t/proposed-bot-edit-fixing-bunch-of-typos-in-tags/110039