Cleanup and normalization of GNIS imports to only use gnis:feature_id for the id tag

Finally continuing from the general discussion here: Tag synonym cleanup/consolidation - #4 by stevea. I would love to hear any feedback.

I’ve been working with other mappers building tools to better utilize GNIS data in OSM. We’ve also just finished a big cleanup of features related to Secretarial Order 3404 aided by this tooling and cleaned up issues related to bad mergers etc.

I have a lot more experience with the database and this set of features than I did in January, and I would love to proceed with a broad but simple cleanup: normalize the tagging to one tag.

The problems
We currently have 6 tags in wide use that all contain the identifier from the GNIS database. This is an annoyance for consumers of that data and also makes it easy for mappers to inadvertently merge things that should not be merged. Here are some of the easier ones to detect, as they have mismatched ids in the various tag alternates.

We also have a huge number of id tags that contain leading zeros, e.g. 0012345. This makes comparisons in common tools (such as Overpass) more difficult than necessary.

The proposed solution
Move all OSM tagging to the most common GNIS tag of gnis:feature_id. The tags gnis:id, tiger:PLACENS, NHD:GNIS_ID, nhd:gnis_id, ref:gnis will all be deprecated and the wiki updated to reflect this. (~370k items)

All identifiers will have leading zeros stripped from their values.
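To make the zero-stripping rule concrete, here is a minimal Python sketch. The function name `strip_leading_zeros` is hypothetical, and the handling of non-numeric values (left alone so a human can look at them) is my assumption rather than part of the proposal.

```python
def strip_leading_zeros(gnis_id: str) -> str:
    """Strip leading zeros from a purely numeric id.

    Values that are not plain ASCII digits are returned with only the
    surrounding whitespace trimmed (assumption: those need manual review).
    """
    value = gnis_id.strip()
    if not (value.isascii() and value.isdigit()):
        return value
    # An all-zero value collapses to "0" instead of an empty string.
    return value.lstrip("0") or "0"
```

With this, `strip_leading_zeros("0012345")` gives `"12345"`, so the same feature compares equal no matter which import wrote the value.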

For each item with only one of the above tags, the value will be moved to gnis:feature_id.
For each item with more than one tag but all the tags agree on the id, move the value to gnis:feature_id.
For each item with more than one tag but differing values of the id, manual review and cleanup. (~250 items)
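The three cases above can be sketched as a small decision function. This is only an illustration of the rules, not the tooling I plan to use (the edits themselves would be done in JOSM). The names `plan_gnis_cleanup` and `GNIS_TAGS` are hypothetical, and treating values as "agreeing" when they match after stripping leading zeros is an assumption on my part.

```python
from typing import Dict, Tuple

# The six tags from the proposal; gnis:feature_id is the one that is kept.
GNIS_TAGS = (
    "gnis:feature_id", "gnis:id", "tiger:PLACENS",
    "NHD:GNIS_ID", "nhd:gnis_id", "ref:gnis",
)


def plan_gnis_cleanup(tags: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (action, resulting_tags) for one OSM object's tags.

    action is "skip", "normalize", or "manual_review".
    """
    present = {k: v for k, v in tags.items() if k in GNIS_TAGS}
    if not present:
        return "skip", dict(tags)

    # Assumption: ids agree if they match once leading zeros are stripped.
    ids = {v.strip().lstrip("0") or "0" for v in present.values()}
    if len(ids) != 1:
        # Differing ids: leave the object untouched for a human to review.
        return "manual_review", dict(tags)

    new_tags = {k: v for k, v in tags.items() if k not in GNIS_TAGS}
    new_tags["gnis:feature_id"] = ids.pop()
    if new_tags == tags:
        return "skip", new_tags  # already in the target form
    return "normalize", new_tags
```

For example, an object tagged `gnis:id=0012345` plus `ref:gnis=12345` would come back as `"normalize"` with a single `gnis:feature_id=12345`, while one with `gnis:feature_id=12345` and `ref:gnis=999` would be flagged `"manual_review"`.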

Edits will be tagged as #gnisIdUpdate so they can be reviewed/tracked/reverted as necessary.

I would do all of these edits in JOSM and probably work state by state, reviewing as I go. I have a lot of experience with tag updates in JOSM and feel very comfortable in that environment.

I will also work with the JOSM maintainers to get their relation template updated from ref:gnis to gnis:feature_id.

I would love to start this work in the beginning of August.

Some questions I have

  1. For items with nhd:gnis_id and NHD:GNIS_ID, is it helpful for folks if I ensure they have source=nhd?
  2. Would folks prefer if I do this in a more programmatic way? I’m happy to write some code for it, but it seems unnecessary right now.
  3. Should we take this opportunity to normalize to ref:gnis instead? That would require touching more objects (~1100k more).
  4. What am I missing?
  5. Any arguments against doing this update?

(edit: just updating with some numbers and another question)
