Mapillary sourcing doesn't seem to have taken off. Is my analysis below reasonable?

I wanted to find out how much Mapillary imagery has helped improve OpenStreetMap. To find out, I downloaded OpenStreetMap extracts for 3 countries (Slovakia, Norway, Lithuania) from the Geofabrik website. I created GeoJSONs from the extracts, then extracted the features whose source tag contains mapillary. I then counted these features and plotted them on a map using QGIS. Based on this, it seems to me that Mapillary images have not been widely used to improve OpenStreetMap.

Below I I) detail the steps I just described and II) show my results. I also III) give an overview of the other Mapillary-related tools I looked at and finally IV) ask your opinion on the validity of my findings and on possible ways to improve them.


I: Extract features with source=mapillary

Starting from this gis.SE question, and going through the GDAL website, I figured out that I need an edited version of osmconf.ini. I put ,source at the end of lines 38, 58, 90, 108 and 126 and removed it from every other place. My new osm_source.ini:

#
# Configuration file for OSM import
#

# put here the name of keys, or key=value, for ways that are assumed to be polygons if they are closed
# see http://wiki.openstreetmap.org/wiki/Map_Features
closed_ways_are_polygons=aeroway,amenity,boundary,building,craft,geological,historic,landuse,leisure,military,natural,office,place,shop,sport,tourism,highway=platform,public_transport=platform

# Uncomment to avoid laundering of keys ( ':' turned into '_' )
#attribute_name_laundering=no

# Some tags, set on ways and when building multipolygons, multilinestrings or other_relations,
# are normally filtered out early, independent of the 'ignore' configuration below.
# Uncomment to disable early filtering. The 'ignore' lines below remain active.
#report_all_tags=yes

# uncomment to report all nodes, including the ones without any (significant) tag
#report_all_nodes=yes

# uncomment to report all ways, including the ones without any (significant) tag
#report_all_ways=yes

# uncomment to specify that the format for the all_tags/other_tags field should be JSON
# instead of the default HSTORE formatting.
# Valid values for tags_format are "hstore" and "json"
#tags_format=json

[points]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no

# keys to report as OGR fields
attributes=name,barrier,highway,ref,address,is_in,place,man_made,source
# keys that, alone, are not significant enough to report a node as a OGR point
unsignificant=created_by,converted_by,time,ele,attribution
# keys that should NOT be reported in the "other_tags" field
ignore=created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes

[lines]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no

# keys to report as OGR fields
attributes=name,highway,waterway,aerialway,barrier,man_made,railway,source

# type of attribute 'foo' can be changed with something like
#foo_type=Integer/Real/String/DateTime

# keys that should NOT be reported in the "other_tags" field
ignore=created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes

#computed_attributes must appear before the keywords _type and _sql
computed_attributes=z_order
z_order_type=Integer
# Formula based on https://github.com/openstreetmap/osm2pgsql/blob/master/style.lua#L13
# [foo] is substituted by value of tag foo. When substitution is not wished, the [ character can be escaped with \[ in literals
# Note for GDAL developers: if we change the below formula, make sure to edit ogrosmlayer.cpp since it has a hardcoded optimization for this very precise formula
z_order_sql="SELECT (CASE [highway] WHEN 'minor' THEN 3 WHEN 'road' THEN 3 WHEN 'unclassified' THEN 3 WHEN 'residential' THEN 3 WHEN 'tertiary_link' THEN 4 WHEN 'tertiary' THEN 4 WHEN 'secondary_link' THEN 6 WHEN 'secondary' THEN 6 WHEN 'primary_link' THEN 7 WHEN 'primary' THEN 7 WHEN 'trunk_link' THEN 8 WHEN 'trunk' THEN 8 WHEN 'motorway_link' THEN 9 WHEN 'motorway' THEN 9 ELSE 0 END) + (CASE WHEN [bridge] IN ('yes', 'true', '1') THEN 10 ELSE 0 END) + (CASE WHEN [tunnel] IN ('yes', 'true', '1') THEN -10 ELSE 0 END) + (CASE WHEN [railway] IS NOT NULL THEN 5 ELSE 0 END) + (CASE WHEN [layer] IS NOT NULL THEN 10 * CAST([layer] AS INTEGER) ELSE 0 END)"

[multipolygons]
# common attributes
# note: for multipolygons, osm_id=yes instantiates a osm_id field for the id of relations
# and a osm_way_id field for the id of closed ways. Both fields are exclusively set.
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no

# keys to report as OGR fields
attributes=name,type,aeroway,amenity,admin_level,barrier,boundary,building,craft,geological,historic,land_area,landuse,leisure,man_made,military,natural,office,place,shop,sport,tourism,source
# keys that should NOT be reported in the "other_tags" field
ignore=area,created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes

[multilinestrings]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no

# keys to report as OGR fields
attributes=name,type,source
# keys that should NOT be reported in the "other_tags" field
ignore=area,created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes

[other_relations]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no

# keys to report as OGR fields
attributes=name,type,source
# keys that should NOT be reported in the "other_tags" field
ignore=area,created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes
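
The manual edit described above (append source to every attributes= line, drop it from every ignore= line) could also be scripted. The following is only a minimal sketch: the function name patch_osmconf and the toy sample text are hypothetical, and it assumes that source only ever appears in attributes= and ignore= lines.

```python
def patch_osmconf(ini_text, key="source"):
    """Append `key` to every attributes= line and drop it from every ignore= line."""
    patched = []
    for line in ini_text.splitlines():
        name, sep, value = line.partition("=")
        # Commented-out settings start with '#', so they never match here.
        if sep and name in ("attributes", "ignore"):
            keys = [k for k in value.split(",") if k and k != key]
            if name == "attributes":
                keys.append(key)
            line = name + "=" + ",".join(keys)
        patched.append(line)
    return "\n".join(patched)


# Toy input (hypothetical, much shorter than the real osmconf.ini)
sample = "[points]\nattributes=name,barrier\nignore=created_by,source,time\n#all_tags=yes"
patched = patch_osmconf(sample)
print(patched)
```

To use it on the real file, one would read osmconf.ini, pass its text through the function, and write the result to osm_source.ini.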

To download OSM data and create a GeoJSON with sourced features, I wrote osm_source.sh:

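#!/usr/bin/env bash
# Download a Geofabrik extract and export every feature that has a source tag.
set -o errexit
set -o nounset

# Geofabrik region name under europe/, e.g. slovakia
geofabrik_name=$1

# --fail stops on HTTP errors instead of saving an error page as the extract
curl --fail --location "https://download.geofabrik.de/europe/${geofabrik_name}-latest.osm.pbf" -o latest.osm.pbf

# Skip the header lines of ogrinfo, then take the layer name from each "N: name (Type)" line
for each in $(ogrinfo latest.osm.pbf | tail -n +3 | awk '{print $2}'); do

    rm -f "osm_source_${each}.geojson"

    ogr2ogr -f GeoJSON \
            -dialect sqlite \
            -sql "SELECT geometry, source FROM ${each} WHERE source IS NOT NULL" \
            "osm_source_${each}.geojson" latest.osm.pbf \
            -nln main \
            --config OSM_CONFIG_FILE osm_source.ini

done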

To run it for Slovakia, I do: ./osm_source.sh slovakia (after chmod +x osm_source.sh, of course). This produces 5 files:

  • osm_source_points.geojson
  • osm_source_lines.geojson
  • osm_source_multilinestrings.geojson
  • osm_source_multipolygons.geojson
  • osm_source_other_relations.geojson
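
Since the export already goes through an SQL WHERE clause, the Mapillary filter could in principle be expressed in SQL as well. The sketch below only illustrates the matching logic with Python's built-in sqlite3 module on a made-up table; the table contents are hypothetical, and whether the same clause behaves identically inside ogr2ogr's sqlite dialect on the OSM driver is an assumption I have not verified.

```python
import sqlite3

# In-memory toy table standing in for one OSM layer (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (osm_id TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?)",
    [
        ("1", "Mapillary"),
        ("2", "Bing;mapillary 2021"),
        ("3", "survey"),
        ("4", None),  # no source tag at all
    ],
)

# lower() makes the case-insensitive intent explicit; rows with a NULL
# source drop out because the comparison evaluates to NULL
rows = conn.execute(
    "SELECT osm_id FROM points "
    "WHERE lower(source) LIKE '%mapillary%' ORDER BY osm_id"
).fetchall()
matching_ids = [row[0] for row in rows]
print(matching_ids)
```

If this carries over to ogr2ogr, the GeoJSON files would contain only the Mapillary-sourced features and would be much smaller.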

I would like to visualize features of these GeoJSONs where mapillary was given as a source. To do that, I use osm_source.py:

import sys

import geopandas as gpd
import pandas as pd

country = sys.argv[1]

# One GeoJSON per OSM layer, as written by osm_source.sh
layers = ["points", "lines", "multilinestrings", "multipolygons", "other_relations"]
frames = [gpd.read_file(f"osm_source_{layer}.geojson") for layer in layers]

df = gpd.GeoDataFrame(pd.concat(frames, ignore_index=True))

# Keep only features whose source tag mentions Mapillary (case-insensitive)
df = df[df["source"].str.lower().str.contains("mapillary", na=False)]
print(f"Mapillary is cited {len(df)} times as a source in {country}")

# Reduce every geometry to its centroid so all layers can be plotted as points
df = df.assign(geometry=df.geometry.centroid)

df.to_file(f"mapillary_centroids_{country}.geojson", driver="GeoJSON")

I run this script via python3.11 osm_source.py slovakia. I repeat the above procedure for Norway and Lithuania.
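
For a quick count without geopandas, the same case-insensitive check can be done with the standard library alone, which also gives a breakdown per layer. This is only a sketch: the helper names is_mapillary_sourced, count_by_layer and load_features are hypothetical, and the toy features below are made up for illustration.

```python
import json


def is_mapillary_sourced(feature):
    """True if the feature's source property mentions Mapillary (any case)."""
    source = (feature.get("properties") or {}).get("source") or ""
    return "mapillary" in source.lower()


def count_by_layer(layers):
    """layers maps a layer name to a list of GeoJSON feature dicts."""
    return {
        name: sum(is_mapillary_sourced(f) for f in features)
        for name, features in layers.items()
    }


def load_features(path):
    """Read the feature list of a GeoJSON FeatureCollection from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["features"]


# Toy data (hypothetical) standing in for two of the exported layers
toy_layers = {
    "points": [
        {"type": "Feature", "properties": {"source": "Mapillary"}, "geometry": None},
        {"type": "Feature", "properties": {"source": "survey"}, "geometry": None},
    ],
    "lines": [
        {"type": "Feature", "properties": {"source": "Bing; mapillary"}, "geometry": None},
        {"type": "Feature", "properties": {"source": None}, "geometry": None},
    ],
}
counts = count_by_layer(toy_layers)
print(counts)
```

With the real files, the dictionary would be built by calling load_features on each osm_source_<layer>.geojson file.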


II: Results

Slovakia

python3.11 osm_source.py slovakia gives the textual output:

Mapillary is cited 1191 times as a source in slovakia

and the file mapillary_centroids_slovakia.geojson. I open this file in QGIS (over OpenStreetMap basemap):

Most contributions are along highways. There are some other contributions here and there, but the map is clearly highway-heavy.

Lithuania

Mapillary is cited 39 times as a source in lithuania

This low number is even more surprising given how good the Mapillary coverage is in Lithuania, even if we consider only panorama images:

The coverage is much better than in most other countries.

Norway

Mapillary is cited 12 times as a source in norway

All points are in the proximity of Oslo.


III: Other explored connections between Mapillary and OSM

There is a Mapillary blog post titled How to Use Mapillary Data in OpenStreetMap. None of its images load, and it seems abandoned. It led me to Pic4Review, which, after logging in, repeatedly fails to load:

Oops ! Something went wrong when fetching missions (Failed to fetch)

There is also mapillary.com/osm. After clicking Mapillary in RapiD, I get to:

which seems like a standard iD editor. I haven’t spent a lot of time here, but I haven’t been able to figure out how I could use Mapillary imagery through this site.

IV: Conclusions

Based on these findings, contributing to Mapillary does not seem to be a good way to improve OSM. (I admit I probably could have analyzed more countries; the trends uncovered in these 3 are quite worrying nevertheless.)

  • Is there anything fundamental my code misses?

  • Is there a much easier way to perform this source analysis than the one presented above?

  • If my view is wrong, and Mapillary is more useful for OSM than I realize, what tool can I use to make efficient use of Mapillary imagery?



This discussion topic accompanies the publication at https://community.openstreetmap.org/t/mapillary-sourcing-doesnt-seem-to-have-taken-off-is-my-analysis-below-reasonable/98856