I wanted to find out how much Mapillary imagery helps improve OpenStreetMap. To find out, I downloaded OpenStreetMap extracts for 3 countries (Slovakia, Norway, Lithuania) from the Geofabrik website. I created GeoJSONs from the extracts, then extracted the features whose source tag mentions Mapillary. I then counted these features and plotted them on a map using QGIS. Based on this, it seems to me that Mapillary images have not been widely used to improve OpenStreetMap.
Below I I) detail the steps I just described and II) show my results. I also III) give an overview of the other Mapillary-related tools I looked at and finally IV) ask your opinion on the validity of my findings and possible ways to improve them.
I: Extract features with source=mapillary
Starting from this gis.SE question, and then going through the GDAL website, I figure I need an edited version of osmconf.ini. I put ,source at the end of lines 38, 58, 90, 108 and 126 and remove it from every other place. My new osm_source.ini:
#
# Configuration file for OSM import
#
# put here the name of keys, or key=value, for ways that are assumed to be polygons if they are closed
# see http://wiki.openstreetmap.org/wiki/Map_Features
closed_ways_are_polygons=aeroway,amenity,boundary,building,craft,geological,historic,landuse,leisure,military,natural,office,place,shop,sport,tourism,highway=platform,public_transport=platform
# Uncomment to avoid laundering of keys ( ':' turned into '_' )
#attribute_name_laundering=no
# Some tags, set on ways and when building multipolygons, multilinestrings or other_relations,
# are normally filtered out early, independent of the 'ignore' configuration below.
# Uncomment to disable early filtering. The 'ignore' lines below remain active.
#report_all_tags=yes
# uncomment to report all nodes, including the ones without any (significant) tag
#report_all_nodes=yes
# uncomment to report all ways, including the ones without any (significant) tag
#report_all_ways=yes
# uncomment to specify that the format for the all_tags/other_tags field should be JSON
# instead of the default HSTORE formatting.
# Valid values for tags_format are "hstore" and "json"
#tags_format=json
[points]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no
# keys to report as OGR fields
attributes=name,barrier,highway,ref,address,is_in,place,man_made,source
# keys that, alone, are not significant enough to report a node as a OGR point
unsignificant=created_by,converted_by,time,ele,attribution
# keys that should NOT be reported in the "other_tags" field
ignore=created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes
[lines]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no
# keys to report as OGR fields
attributes=name,highway,waterway,aerialway,barrier,man_made,railway,source
# type of attribute 'foo' can be changed with something like
#foo_type=Integer/Real/String/DateTime
# keys that should NOT be reported in the "other_tags" field
ignore=created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes
#computed_attributes must appear before the keywords _type and _sql
computed_attributes=z_order
z_order_type=Integer
# Formula based on https://github.com/openstreetmap/osm2pgsql/blob/master/style.lua#L13
# [foo] is substituted by value of tag foo. When substitution is not wished, the [ character can be escaped with \[ in literals
# Note for GDAL developers: if we change the below formula, make sure to edit ogrosmlayer.cpp since it has a hardcoded optimization for this very precise formula
z_order_sql="SELECT (CASE [highway] WHEN 'minor' THEN 3 WHEN 'road' THEN 3 WHEN 'unclassified' THEN 3 WHEN 'residential' THEN 3 WHEN 'tertiary_link' THEN 4 WHEN 'tertiary' THEN 4 WHEN 'secondary_link' THEN 6 WHEN 'secondary' THEN 6 WHEN 'primary_link' THEN 7 WHEN 'primary' THEN 7 WHEN 'trunk_link' THEN 8 WHEN 'trunk' THEN 8 WHEN 'motorway_link' THEN 9 WHEN 'motorway' THEN 9 ELSE 0 END) + (CASE WHEN [bridge] IN ('yes', 'true', '1') THEN 10 ELSE 0 END) + (CASE WHEN [tunnel] IN ('yes', 'true', '1') THEN -10 ELSE 0 END) + (CASE WHEN [railway] IS NOT NULL THEN 5 ELSE 0 END) + (CASE WHEN [layer] IS NOT NULL THEN 10 * CAST([layer] AS INTEGER) ELSE 0 END)"
[multipolygons]
# common attributes
# note: for multipolygons, osm_id=yes instantiates a osm_id field for the id of relations
# and a osm_way_id field for the id of closed ways. Both fields are exclusively set.
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no
# keys to report as OGR fields
attributes=name,type,aeroway,amenity,admin_level,barrier,boundary,building,craft,geological,historic,land_area,landuse,leisure,man_made,military,natural,office,place,shop,sport,tourism,source
# keys that should NOT be reported in the "other_tags" field
ignore=area,created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes
[multilinestrings]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no
# keys to report as OGR fields
attributes=name,type,source
# keys that should NOT be reported in the "other_tags" field
ignore=area,created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes
[other_relations]
# common attributes
osm_id=yes
osm_version=no
osm_timestamp=no
osm_uid=no
osm_user=no
osm_changeset=no
# keys to report as OGR fields
attributes=name,type,source
# keys that should NOT be reported in the "other_tags" field
ignore=area,created_by,converted_by,time,ele,note,todo,openGeoDB:,fixme,FIXME
# uncomment to avoid creation of "other_tags" field
#other_tags=no
# uncomment to create "all_tags" field. "all_tags" and "other_tags" are exclusive
#all_tags=yes
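The manual edit described above (appending source to each attributes= line and dropping it from the other key lists) could also be scripted. What follows is only a minimal sketch and not the procedure used for the results below: the function name add_source_attribute is hypothetical, and it assumes that keys are only listed on attributes=, ignore= and unsignificant= lines, as in the file above.

```python
def add_source_attribute(conf_text: str, key: str = "source") -> str:
    """Return conf_text with `key` reported as an OGR field in every layer.

    Assumption: keys are only listed on attributes=, ignore= and
    unsignificant= lines; commented-out lines are left untouched.
    """
    edited = []
    for line in conf_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name in ("attributes", "ignore", "unsignificant"):
            # Drop the key wherever it is currently listed ...
            keys = [k for k in value.split(",") if k and k != key]
            # ... and put it back at the end of the attributes list only.
            if name == "attributes":
                keys.append(key)
            line = f"{name}={','.join(keys)}"
        edited.append(line)
    return "\n".join(edited) + "\n"


if __name__ == "__main__":
    # Tiny made-up fragment, not the real osmconf.ini
    sample = "[points]\nattributes=name,highway\nignore=created_by,source,time\n"
    print(add_source_attribute(sample), end="")
```

Applied to the text of the stock osmconf.ini, the returned string could be written out as osm_source.ini; comparing it with the hand-edited file above would be a sensible check before relying on it.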
To download OSM data and create GeoJSONs of the features that have a source tag, I wrote osm_source.sh:
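#!/usr/bin/env bash
# Usage: ./osm_source.sh <Geofabrik country name, e.g. slovakia>
set -o errexit
set -o nounset
geofabrik_name=$1
# Download the latest extract for the country from Geofabrik
curl "https://download.geofabrik.de/europe/${geofabrik_name}-latest.osm.pbf" -o latest.osm.pbf
# Loop over the OSM layers reported by ogrinfo (points, lines, ...)
for each in $(ogrinfo latest.osm.pbf | tail -n +3 | awk '{print $2}'); do
  # Remove any previous output for this layer before writing a new one
  rm -f "osm_source_${each}.geojson"
  # Keep only the features that have a source tag, using the edited config
  ogr2ogr -f GeoJSON \
    -dialect sqlite \
    -sql "SELECT geometry, source FROM ${each} WHERE source IS NOT NULL" \
    "osm_source_${each}.geojson" latest.osm.pbf \
    -nln main \
    --config OSM_CONFIG_FILE osm_source.ini
done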
To run it for Slovakia, I do: ./osm_source.sh slovakia (after chmod +x osm_source.sh, of course). This produces 5 files:
osm_source_points.geojson
osm_source_lines.geojson
osm_source_multilinestrings.geojson
osm_source_multipolygons.geojson
osm_source_other_relations.geojson
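Note that osm_source.sh writes the same five file names for every country, so each run overwrites the output of the previous one. The sketch below rebuilds the per-layer ogr2ogr call in Python using the same flags as the script. The function name ogr2ogr_command and the idea of putting the country into the output file name are assumptions added for illustration, not part of the original workflow.

```python
import shlex
from typing import List

# The five layers listed above
LAYERS = ["points", "lines", "multilinestrings", "multipolygons", "other_relations"]


def ogr2ogr_command(layer: str, country: str, pbf: str = "latest.osm.pbf") -> List[str]:
    """Build the ogr2ogr call from osm_source.sh for one layer.

    Variation (assumption): the output name includes the country,
    so runs for different countries do not overwrite each other.
    """
    output = f"osm_source_{country}_{layer}.geojson"
    sql = f"SELECT geometry, source FROM {layer} WHERE source IS NOT NULL"
    return [
        "ogr2ogr", "-f", "GeoJSON",
        "-dialect", "sqlite",
        "-sql", sql,
        output, pbf,
        "-nln", "main",
        "--config", "OSM_CONFIG_FILE", "osm_source.ini",
    ]


if __name__ == "__main__":
    # Only print the commands; running them would need GDAL and the .pbf file
    for layer in LAYERS:
        print(shlex.join(ogr2ogr_command(layer, "slovakia")))
```

Each list could be handed to subprocess.run(..., check=True); with this naming, the file names read by a later analysis step would need the country added as well.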
I would like to visualize the features of these GeoJSONs where Mapillary was given as a source. To do that, I use osm_source.py:
import sys

import geopandas as gpd
import pandas as pd

country = sys.argv[1]

# Read the five per-layer GeoJSONs produced by osm_source.sh
layers = ["points", "lines", "multilinestrings", "multipolygons", "other_relations"]
frames = [gpd.read_file(f"osm_source_{layer}.geojson") for layer in layers]
df = gpd.GeoDataFrame(pd.concat(frames, ignore_index=True))

# Keep only the features whose source tag mentions Mapillary (case-insensitive)
df = df[df.source.str.contains("mapillary", case=False, na=False)]

# Reduce every geometry to its centroid so all features can be plotted as points
df = df.assign(geometry=df.geometry.centroid)

print(f"Mapillary is cited {len(df)} times as a source in {country}")
df.to_file(f"mapillary_centroids_{country}.geojson", driver="GeoJSON")
I run this script via python3.11 osm_source.py slovakia. I repeat the above procedure for Norway and Lithuania.
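The filter in osm_source.py is a case-insensitive substring match on the source value. Below is a dependency-free sketch of the same check, which can also show which individual source values matched. The function names and the sample features are invented for illustration, and splitting on ';' assumes the usual OSM convention for multi-valued tags.

```python
from collections import Counter


def mapillary_features(feature_collection: dict) -> list:
    """Return the features whose source property mentions Mapillary."""
    matches = []
    for feature in feature_collection.get("features", []):
        # A missing or null source is treated as an empty string
        source = (feature.get("properties") or {}).get("source") or ""
        if "mapillary" in source.lower():
            matches.append(feature)
    return matches


def source_value_counts(features: list) -> Counter:
    """Count the individual ';'-separated source values, lower-cased."""
    counts = Counter()
    for feature in features:
        for value in feature["properties"]["source"].split(";"):
            counts[value.strip().lower()] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample, not taken from any of the extracts
    sample = {
        "type": "FeatureCollection",
        "features": [
            {"type": "Feature", "properties": {"source": "Bing; Mapillary"}, "geometry": None},
            {"type": "Feature", "properties": {"source": "survey"}, "geometry": None},
        ],
    }
    hits = mapillary_features(sample)
    print(len(hits), source_value_counts(hits).most_common())
```

For a real file, the dictionary would come from json.load on one of the osm_source_*.geojson files. Because this is a substring match, values that merely contain the word (a longer description, for example) count as well, which is the same behaviour as the pandas filter.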
II: Results
Slovakia
python3.11 osm_source.py slovakia gives the textual output:
Mapillary is cited 1191 times as a source in slovakia
and the file mapillary_centroids_slovakia.geojson. I opened this file in QGIS over an OpenStreetMap basemap.
Most contributions are along highways. There are some other contributions here and there, but the map is clearly highway-heavy.
Lithuania
Mapillary is cited 39 times as a source in lithuania
This low number is even more surprising given how good the Mapillary coverage is in Lithuania, even if we consider only panorama images: it is much better than in most other countries.
Norway
Mapillary is cited 12 times as a source in norway
All points are in the proximity of Oslo.
III: Other explored connections between Mapillary and OSM
There is a Mapillary blog post titled "How to Use Mapillary Data in OpenStreetMap". None of its images load, and it seems abandoned. It leads me to Pic4Review, which, after logging in, repeatedly fails to load:
Oops ! Something went wrong when fetching missions (Failed to fetch)
There is also mapillary.com/osm. After clicking Mapillary in RapiD, I get to what seems like a standard iD editor. I haven’t spent a lot of time here, but I haven’t been able to figure out how I could use Mapillary imagery through this site.
IV: Conclusions
Based on these findings, contributing to Mapillary does not seem to be a good way to improve OSM. (I admit I probably could have analyzed more countries, but the trends uncovered in these 3 are quite worrying nevertheless.)
- Is there anything fundamental my code misses?
- Is there a much easier way to perform this source analysis than the one presented above?
- If my view is wrong, and Mapillary is more useful for OSM than I realize, what tool can I use to efficiently utilize Mapillary imagery?
This discussion topic accompanies the post at https://community.openstreetmap.org/t/mapillary-sourcing-doesnt-seem-to-have-taken-off-is-my-analysis-below-reasonable/98856