Which steps remain when osm2pgsql fails after the data input, and can I perform these directly against the database?

Which steps remain when osm2pgsql fails after the data input, and can I perform these directly against the database?

My setup:

  • osm2pgsql version 1.6 running in an AWS EC2 instance (Instance type t2.small, 2 GB RAM, 1vCPU, 512 GB storage, running Ubuntu 22.04)
  • Aurora PostgreSQL 14.5 running in an AWS RDS cluster (instance class db.r6g.xlarge, 32 GB RAM, 4 vCPU, storage autoscaling)

I attempted to import the OSM dump of Europe of 2023-05-02T20:16:04Z. This ran for about 8 days almost until the end, with several “All done” messages shown, before terminating with an error message. The generated tables contain about 4 billion rows in all, so it looks like the import of the data was successful. Also, (some, maybe all?) indexes were created.

What I would like to know is whether I have a usable data import or whether some essential steps are still needed. Reading through the osm2pgsql source code did not get me further.

This is the command line I used (connection details redacted):

osm2pgsql --create --slim --hstore \
--style ${HOME}/etc/openstreetmap-carto.style --prefix europe_osm \
--database (...) --username (...) --host (...) --port (...) \
europe-latest.osm.pbf

This is the osm2pgsql output:

2023-05-03 20:27:01  osm2pgsql version 1.6.0
2023-05-03 20:27:01  Database version: 14.5
2023-05-03 20:27:01  PostGIS version: 3.2
2023-05-03 20:27:02  Setting up table 'europe_osm_point'
2023-05-03 20:27:02  Setting up table 'europe_osm_line'
2023-05-03 20:27:02  Setting up table 'europe_osm_polygon'
2023-05-03 20:27:02  Setting up table 'europe_osm_roads'
2023-05-10 18:19:44  Reading input files done in 597162s (165h 52m 42s).                  
2023-05-10 18:19:44    Processed 3164864349 nodes in 18501s (5h 8m 21s) - 171k/s
2023-05-10 18:19:44    Processed 383845957 ways in 431097s (119h 44m 57s) - 890/s
2023-05-10 18:19:44    Processed 6717149 relations in 147564s (40h 59m 24s) - 46/s
2023-05-10 18:19:46  Clustering table 'europe_osm_point' by geometry...
2023-05-10 18:41:33  Creating geometry index on table 'europe_osm_point'...
2023-05-10 19:19:32  Creating osm_id index on table 'europe_osm_point'...
2023-05-10 19:20:56  Analyzing table 'europe_osm_point'...
2023-05-10 19:21:15  Clustering table 'europe_osm_line' by geometry...
2023-05-10 19:51:11  Creating geometry index on table 'europe_osm_line'...
2023-05-10 20:38:50  Creating osm_id index on table 'europe_osm_line'...
2023-05-10 20:40:39  Analyzing table 'europe_osm_line'...
2023-05-10 20:41:09  Clustering table 'europe_osm_polygon' by geometry...
2023-05-10 20:52:56  Clustering table 'europe_osm_roads' by geometry...
2023-05-10 20:56:43  Creating geometry index on table 'europe_osm_roads'...
2023-05-10 20:59:12  Creating osm_id index on table 'europe_osm_roads'...
2023-05-10 20:59:16  Analyzing table 'europe_osm_roads'...
2023-05-10 20:59:16  Done postprocessing on table 'europe_osm_nodes' in 0s
2023-05-10 20:59:16  Building index on table 'europe_osm_ways'
2023-05-11 17:51:29  Done postprocessing on table 'europe_osm_ways' in 75132s (20h 52m 12s)
2023-05-11 17:51:29  Building index on table 'europe_osm_rels'
2023-05-11 17:57:53  Done postprocessing on table 'europe_osm_rels' in 383s (6m 23s)
2023-05-11 17:57:53  All postprocessing on table 'europe_osm_point' done in 3689s (1h 1m 29s).
2023-05-11 17:57:53  All postprocessing on table 'europe_osm_line' done in 4793s (1h 19m 53s).
2023-05-11 17:57:53  ERROR: Database error: ERROR:  could not write to file "base/pgsql_tmp/pgsql_tmp28765.19": No space left on device

I suspect that perhaps some temporary tables still remain, or some indexes were not created.

Currently I have these tables with the specified prefix europe_osm:

europe_osm_line
europe_osm_nodes
europe_osm_point
europe_osm_polygon
europe_osm_rels
europe_osm_roads
europe_osm_ways

and these are the existing indexes on these tables:

europe_osm_line: europe_osm_line_osm_id_idx, europe_osm_line_way_idx
europe_osm_nodes: europe_osm_nodes_pkey
europe_osm_point: europe_osm_point_osm_id_idx, europe_osm_point_way_idx
europe_osm_rels: europe_osm_rels_parts_idx, europe_osm_rels_pkey
europe_osm_roads: europe_osm_roads_osm_id_idx, europe_osm_roads_way_idx
europe_osm_ways: europe_osm_ways_nodes_idx

That is, all tables but europe_osm_polygon have indexes.

So the question: can I avoid a complete re-import of the data by simply performing some final steps manually? What steps remained after the point where the error occurred?

4 posts - 2 participants

Read full topic


Ce sujet de discussion accompagne la publication sur https://community.openstreetmap.org/t/which-steps-remain-when-osm2pgsql-fails-after-the-data-input-and-can-i-perform-these-directly-against-the-database/98991