Name lists in OSM Americana; or: how I learned to love semicolons

OpenStreetMap Americana has joined other prominent OSM data consumers in supporting the semicolon delimiter in names.

I tell you, a cat must have three different names

Normally, keys like name are only for the feature’s primary name in the local language, while other names are relegated to different keys such as alt_name, official_name, and short_name. But sometimes a feature has multiple names of equal standing due to an inconsistency, border, or dispute in the real world. Likewise, names in other languages normally go in language-specific subkeys like name:es for Spanish. But in some places, multiple languages are commonly spoken, so there isn’t a single “default” language.

Historically, mappers in some officially multilingual regions have standardized on various ad-hoc delimiters between these names, such as a slash or hyphen, with the expectation that a renderer would lazily show the entire name tag verbatim. To put a positive spin on it, let’s call this map of mostly the Baltic Sea a celebration of linguistic diversity:

Belgium and the Baltic Sea (and a little bit of Europe too)

By default, though, Americana does some nicer things with names, such as tailoring the map to your preferred language. It also appends the name of a city in the local language, giving the map a more cosmopolitan flair and maybe prepping you for your next trip overseas. The main name key is supposed to be suitable for this purpose, but appending the name tag verbatim isn’t an option: if one of the names in it happens to match the name in your preferred language, you’d get a verbose, repetitive label, crowding out other labels for no good reason.

Unfortunately, there’s not much else to do when the name tag forms a list using common punctuation or mere spaces, because these delimiters are so ambiguous: a hyphen, slash, or space can just as easily be part of a single name.
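To make the ambiguity concrete, here is a minimal Python sketch. The function naive_split and the sample strings are hypothetical illustrations, not code or data taken from Americana or from OSM; the point is only that a splitter keyed on hyphens and slashes cannot tell a list of names from a single name that happens to contain one.

```python
import re

# Hypothetical splitter that treats "-" and "/" (with optional spaces) as list delimiters.
AD_HOC_DELIMITER = re.compile(r"\s*[-/]\s*")


def naive_split(name):
    """Split a name on ad-hoc delimiters, whether or not they really separate names."""
    return AD_HOC_DELIMITER.split(name)


# Works when the punctuation really does separate two names...
print(naive_split("Bruxelles - Brussel"))  # ['Bruxelles', 'Brussel']

# ...but tears apart a single name that contains the same character.
print(naive_split("Winston-Salem"))  # ['Winston', 'Salem']
```

Any heuristic built on common punctuation has to guess which case it is looking at, which is why a renderer cannot safely restructure such labels.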

Semicolons to the rescue

Some 15 years ago, the global community standardized on the semicolon as a uniform, machine-readable character to separate multiple values in a single tag. It was intended to apply to all tags, and data consumers have done lots of things with it in non-name tags. But mappers have adopted it more slowly for name than for other keys, probably due to the effect it has on renderers that label name verbatim.

To show the benefit of a more structured approach, Americana now parses semicolons out of every name tag. When the name:* tag for your preferred language contains multiple names separated by semicolons – say, name:es=Aquí;Allí;Allá – each semicolon turns into something more presentable, such as a line break. The same happens with name if you set your preferred language to an unsupported language like mul (as in “multilingual”), or if the feature is only tagged with multiple values in name but not tagged with a more specific name:*. Your preferred language’s name for the feature only appears once, even if one of the local languages calls it by the same name.
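The behavior described above can be sketched roughly as follows in Python. This is not Americana’s actual implementation: the function names split_names and label_lines, the exact-match comparison, and the sample tags are all assumptions made for illustration.

```python
def split_names(value):
    """Split a semicolon-delimited tag value into trimmed, non-empty names."""
    return [part.strip() for part in value.split(";") if part.strip()]


def label_lines(tags, preferred_lang):
    """Return label lines: preferred-language names first, then local names.

    A name that appears in both lists is shown only once (exact string match,
    which is an assumption of this sketch).
    """
    preferred = split_names(tags.get(f"name:{preferred_lang}", ""))
    local = split_names(tags.get("name", ""))
    lines = []
    for name in preferred + local:
        if name not in lines:
            lines.append(name)
    return lines


# Hypothetical feature whose local name list overlaps with the Spanish one.
tags = {"name": "Here;Aquí", "name:es": "Aquí;Allí;Allá"}

print("\n".join(label_lines(tags, "es")))  # Aquí, Allí, Allá, Here on separate lines
print(label_lines(tags, "mul"))  # ['Here', 'Aquí'] since there is no name:mul tag
```

Joining the resulting list with line breaks gives the stacked label; a different separator could be swapped in without touching the parsing.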

See for yourself

San Francisco has multiple names in Chinese, each of them quite common both locally and abroad. Its name:zh tag contains three names separated by semicolons, which show up if you set your preferred language to Chinese:

San Francisco in Chinese

In Kaser and New Square, New York, most residents speak Yiddish, so name contains both English and Yiddish. Depending on your preferred language, you’ll see the names in English, Yiddish, and your preferred language laid out appropriately, without repeating a name:

Kaser and New Square in the local languages

Kaser and New Square in English

Kaser and New Square in Hebrew

This works for anything that Americana labels – not only places but also parks, airports, roads, and more. This creek in Denmark has two names in name, both elegantly shown together on the map:

Storå or Højris Å

This community garden in San José, California, has no Spanish name, so speakers of the city’s second most common language see the names in English and Vietnamese instead:

Viet Heritage Garden in Spanish

Tagging for the common good, not the renderer

Americana can only lay out the labels intelligently because the semicolon delimiter is predictable and unambiguous. Unfortunately, ad-hoc delimiters are still much more common than semicolons in names globally and even in the U.S., where there has never been much discussion about delimiters. Hopefully as more data consumers add support for semicolons, mappers will follow suit.

Thanks to everyone who’s already using the standard semicolon to separate multiple values in the name tag in those tricky situations when other name keys just won’t do. This gives renderers and data consumers of all stripes the flexibility to do something a little more useful with the name tag, and now you no longer have to worry about the map looking sloppy.
