-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster "perform SQL updates" #1844
Conversation
…done in parallel with each other.
…n new indexes SQL.
data/apply-planet_osm_line.sql
Outdated
mz_road_level IS NOT NULL OR | ||
mz_transit_level IS NOT NULL OR | ||
mz_water_min_zoom IS NOT NULL; | ||
SHARDING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WHERE SHARDING;
is new to me. Is it a Postgres-ism, or related to the script changes in data/perform-sql-updates.sh
? Might be worth an inline comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nothing in PostgreSQL, sadly. It comes from a hack where I'm replacing SHARDING
with the range of osm_id
to shard over. I would have used a proper Jinja2 template, but that was going to introduce a lot of incidental complexity.
I'm not totally sure whether an inline comment would work there, or if it would have weird side-effects... I'll have to test and make sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments in e684cd8, looks like they're correctly preserved through all the string mangling we're doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a pain to name it something like $SHARDING
and replace on that? Just thinking it might make it clearer, although the comment is good anyway :)
# guide the distribution of jobs, so hopefully they end up mostly evenly sized. | ||
for tbl in polygon line point; do | ||
sql_script="apply-planet_osm_${tbl}.sql" | ||
for pct in 25 50 75; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the machine we run this on always 4 cores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, it would be related to the number of cores on the PostgreSQL server, not the machine we're running the script on. However, there's a couple of things that make matching the number of statements to the range of queries hard:
- Doing this in shell script is possible, but makes me uncomfortable. I find that getting shell code correct is a lot harder than it looks. Going to fixed concurrency seemed like a good compromise. We can go to variable concurrency, but I'd want to switch all of this into Python, or something else that has less magic than shell.
- I'm not aware of any call in PostgreSQL to tell us how many cores the server has, which is the important number to match. We could pass that in as an argument, but again; complexity.
Overall, it seemed to me like a good compromise to get something working quickly. If we want it to be more readable and more robust, then I can take a look at doing that next week?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with hardcoding it now... just as long as we're getting good use out of the PostgresSQL Server we generally run this on. If that's 4 cores, so be it! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the comment, please also note this is configured for a 4 core PostgreSQL machine and would need to be adopted to other cores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Fixed in dbebede.
I've tested this out, and it seems to work - the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment nit, otherwise LGTM
…ism in the osm_id loops.
data/apply-planet_osm_line.sql
Outdated
mz_road_level IS NOT NULL OR | ||
mz_transit_level IS NOT NULL OR | ||
mz_water_min_zoom IS NOT NULL; | ||
SHARDING; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a pain to name it something like $SHARDING
and replace on that? Just thinking it might make it clearer, although the comment is good anyway :)
…ng' over osm_id for perform-sql-updates.sh.
|
There's a few things going on here:
UPDATE
per table, rather than several. This should reduce the number of passes over the table, especially since the lines and polygons tables touched every row when setting the label placement point.osm_id
for the OSM tables. This helps a lot when theUPDATE
is CPU-bound, rather than disk-bound, as it seems to be when we're calculating a lot ofST_PointOnSurface()
and min zoompl/pgsql
functions.ANALYZE
fromapply-*.sql
, so it can run after index creation.