Overpass API > Blog >

More Installation Instructions

Published: 2023-06-02

Table of contents

ZeLonewolf's Guide
Kai Johnson's Guide

As promised in the last blog entry, I would like to comment on the existing extra installation instructions written by other people.

These are extremely useful: there are things that I may take for granted that are not true for everyone. The more people write about their installation experience, the more starting points of knowledge become available.

For example, I currently have a 3.5 MBit landline here, so downloads of 10 MB to 1 GB are painful, but not impossible. Downloading the entire OSM database would be a major undertaking and clearly out of reach. Other people I know have landlines of up to 50 MBit, but even then downloading the entire OSM database would be insane.

I use bash and ssh literally every minute of my work, and consider icon clicking unfit for anything beyond trivial tasks. There might be people who try to install software without any console, and the Docker package is my best guess at how to help those people out, although I neither use nor recommend that approach.

ZeLonewolf's Guide

See here

The guide refers to version 0.7.56, while the current version as of now is 0.7.60. This is not a problem in itself, as you should be able to replace version numbers where they appear. However, some things do change with newer versions. Being explicit about versions helps the reader figure out whether an unexpected behaviour is due to a change in the software.

Server Prerequisites

(reference)

You can still track the main database even if your server needs 65 seconds to apply a minute diff. Applying multiple minute diffs at once is much faster than applying the same number of minute diffs consecutively. Thus, if your server is too slow, you will most likely see it stabilize at some lag. That lag might vary over the course of a day because the sizes of the diffs vary considerably. So please wait for 24 hours and then go through the logs to figure out whether the server is close enough to real time for your needs.

If the disk is the bottleneck then you might want to try lz4 or gz compression. If the CPU is the bottleneck then you might want to try an uncompressed database. On the public instances the disk is the bottleneck, although in terms of total size and efficient use of the RAM cache rather than raw I/O speed; computation power is not abundant there either. They therefore run on lz4, which is a middle ground between gz and an uncompressed database.

The given hardware specs are realistic.

Minute diffs got a lot faster with version 0.7.60. If your server struggles to keep up with updates then you really should use at least that version.

Web Server Configuration

(reference)

I suggest using
SetOutputFilter DEFLATE
in the Apache configuration file if you serve data over a network. That configuration directive lets Apache compress all output and thus saves half of the bandwidth or more.

Compile and Install Overpass

(reference)

I suggest using
TARBALL=osm-3s_latest.tar.gz
wget -O "$TARBALL" http://dev.overpass-api.de/releases/osm-3s_latest.tar.gz
gunzip <"$TARBALL" | tar x
DISTDIR=$(gunzip <"$TARBALL" | tar t | head -n 1)
cd "$DISTDIR"
where the value of TARBALL is the filename you would like to use locally for the source code bundle, for example osm-3s_latest.tar.gz. This way you do not need to bother with version numbers, and the version stays consistent even after an update. Please note that $DISTDIR will still contain the version number, so you can keep all versions side by side.
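After changing into $DISTDIR, the build follows the usual autotools pattern. This is only a sketch; $EXEC_DIR is my placeholder for an install prefix of your choice, not something the guide prescribes:

```shell
# Hypothetical build sketch; $EXEC_DIR is an assumed install prefix
EXEC_DIR="$HOME/osm-3s"
./configure CXXFLAGS="-O2" --prefix="$EXEC_DIR"
make install
```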

Download the Planet

(reference)

The clone files are already lz4 compressed nowadays, so you no longer need to convert them if you also want to use lz4.

Configure Launch Scripts

(reference)

If you run your own private server then you most likely want to turn off rate limits entirely. Use --rate-limit=0 on both the OSM base and the area dispatcher.
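For illustration, the two dispatcher invocations might then look like the following sketch; the paths are assumptions, only the flags come from the actual binary:

```shell
# Base dispatcher with the rate limit disabled (paths are assumptions)
/opt/osm-3s/bin/dispatcher --osm-base --db-dir=/opt/osm-3s/db --rate-limit=0 &

# Area dispatcher, likewise without a rate limit
/opt/osm-3s/bin/dispatcher --areas --db-dir=/opt/osm-3s/db --rate-limit=0 &
```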

This file is geared towards a private server and may serve as further inspiration.

The sleep 3 might mean that you use disproportionate computation power to keep your areas up to date. I rather suggest sleep 3600 or even bigger values.

Since version 0.7.57 only relations are subject to the area creation loop, making that loop a lot faster. Ways are now automatically available as areas without any further processing. Processing all relation-based areas even from scratch takes only about two hours, hence it is no longer necessary to enlarge the runtime for the area generation script.

Server Automation

(reference)

If you do not want to keep minute diff files then you can directly use fetch_osc_and_apply.sh.

Performance Verification

(reference)

Area Database Initial Load

Since version 0.7.57 this takes some hours rather than 2-3 days.

Firewall Configuration

(reference)

A simpler way to achieve the same effect is to use the Listen directive in Apache.

Recovering a Corrupted Database

(reference)

I strongly advocate updating the database rather than redownloading it. Of course, this only works if the database is still healthy, i.e. the dispatcher starts with it and queries do not produce error messages about missing files or inconsistent indexes.

Catching up a day takes, depending on your hardware, one to four hours and requires downloading about 100 MB worth of diffs, if not less.

Downloading the data can easily exceed 24 hours and needs 230 GB to 540 GB of data, depending on the variant.

Kai Johnson's Guide

See here.

This is an example of an approach that would not have sprung to my mind, because I do not have even nearly the landline bandwidth to download a copy of the global OSM database, and on mobile even less.

I'm also a bit reluctant to run old desktop hardware because of the cost of electricity, but again this differs wildly between locations, their sources of electricity, and its price. Running a desktop workstation 24/7, even if mostly idle at 200 watts, consumes about 145 kWh per month, which is about 40 EUR per month at local prices here. An EnergyStar 5.0 certified system will consume at most 400 kWh per year, which gets one closer to 10 EUR per month.

Setting up the Software

(reference)

The guide moved to the Overpass manual.

Configure Overpass User & Required Dependencies

I've never heard of liblzr-dev; this is most likely a typo for liblz4-dev. Nonetheless, the point about lz4 is absolutely right. I've added the package to the list of packages to install in the manual.

Download the Planet

I'm sorry if download_clone does not resume downloads. I thought that it does. It is basically wget under the hood, which in principle is able to resume downloads.

If you use your instance only locally then calling osm3s_query might be preferable. It poses the same queries but without the overhead necessary to communicate over a network.
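As a sketch, such a local query via osm3s_query could look like this; the paths are assumptions, and --db-dir lets the tool read the database directly:

```shell
# Hypothetical local query: count drinking water taps in a small
# bounding box, without any web server in between (paths assumed)
echo 'node[amenity=drinking_water](50.7,7.0,50.8,7.2);out count;' \
  | /opt/osm-3s/bin/osm3s_query --db-dir=/opt/osm-3s/db
```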

Configure Launch Scripts

Apparently it is not clear which moving parts exist.

Every database has to resolve conflicts between concurrent requests. There is no optimal solution; rather, it depends on the expected access patterns. One quite simple scheme is that only one process can write, arbitrarily many can read, and readers see, over their whole runtime, the same state of the database as when they started. This is the scheme that Overpass API applies: writes always come from the minute diffs of the OpenStreetMap main database.

Another highly desirable property is that the whole setup next to never crashes. Thus it makes sense to separate the heavy-lifting, high-risk update operation and the even riskier reading operation from the resource control. Resource control needs little CPU and little I/O, but must be extremely reliable.

So this is why there is the dispatcher, which does the resource control. If the dispatcher runs then you can pose requests through cgi-bin/interpreter and bin/osm3s_query. The former is restricted to what makes sense through a web server; the latter is more versatile but not safe to be called from the open internet. The dispatcher will tell a writing process which blocks it can overwrite and which it cannot, because there are still open reading processes that might read that information, even if slightly outdated.

To apply an update from the main database one needs to do two things: download the minute diff files from the remote server, and apply them to the local database.

The first job is done by fetch_osc.sh; the second by apply_osc_to_db.sh. The rationale for splitting this up is that in case something goes wrong, you may want to figure out the root cause: either something is wrong with the remote diffs and what you see in the diff directory are not proper diffs, or something is wrong with the update process itself while the diff files are in order.

If you do not plan to do sophisticated disaster investigation then the joint script for both, fetch_osc_and_apply.sh, is probably the right choice for you. It deletes each diff file immediately after applying it.

All three scripts search for dispatcher and update_from_dir in the same directory where they themselves are situated. The working directory from which you start a script should not matter. Unfortunately, I do not know what has gone wrong in this regard in Kai's setting.

All three scripts just need to start after the dispatcher. No magic delay is necessary: if you plan to start everything from one script then a sleep 1 after sending the dispatcher into the background suffices. You can use run.sh from the Docker container if you want an example of a unified script for everything.
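A minimal unified startup script could thus look like the following sketch. All paths are assumptions, error handling is omitted, and you should check each helper script's header for its exact arguments:

```shell
#!/bin/bash
# Sketch of a unified startup; paths below are assumptions
BIN=/opt/osm-3s/bin
DB=/opt/osm-3s/db

"$BIN/dispatcher" --osm-base --db-dir="$DB" &
sleep 1   # the dispatcher just needs a moment to create its socket

"$BIN/dispatcher" --areas --db-dir="$DB" &
sleep 1

# Arguments sketched from typical invocations; verify against the
# script headers of your installed version
"$BIN/fetch_osc_and_apply.sh" https://planet.openstreetmap.org/replication/minute &
"$BIN/rules_loop.sh" "$DB" &
```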

For the shutdown: in case of a proper shutdown, no locks remain. So if you need to delete files after your shutdown process then something is broken there. I'm aware of quite a number of installations where people struggled to get the shutdown right, hence it must be too complicated today. Version 0.7.61, planned for release in June 2023, will allow you to just kill the dispatcher process, and everything shuts down properly afterwards. For the classical shutdown please have a look at the Docker container script file.
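For completeness, the classical controlled shutdown asks each dispatcher to terminate, which releases all locks; the path is again an assumption:

```shell
# Ask both dispatchers to shut down cleanly; no lock files remain
/opt/osm-3s/bin/dispatcher --osm-base --terminate
/opt/osm-3s/bin/dispatcher --areas --terminate
```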

The areas come on top of this. There is a second writing process, osm3s_query --rules, which creates the areas from the OSM data subject to the rules supplied to it on standard input. Thus there is a second dispatcher, whose only job is to coordinate rewriting areas and reading areas. The second dispatcher is independent of the first dispatcher and the minute updates, but must run before the area updates start.

To make the use of osm3s_query --rules convenient, there are the scripts rules_loop.sh and rules_loop_delta.sh. Running osm3s_query --rules via cron instead is also a viable idea. If you choose --progress as a parameter then you get lots of progress information, about one line every 15 seconds. If you want osm3s_query --rules to be mostly quiet then choose --quiet instead.
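If you go the cron route, an entry along these lines would rebuild the areas once per day. The paths are assumptions; areas.osm3s is the rules file that ships in the source tree:

```shell
# crontab entry (sketch): rebuild areas daily at 03:00, quietly
0 3 * * * /opt/osm-3s/bin/osm3s_query --quiet --rules </opt/osm-3s/rules/areas.osm3s
```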

As with the shutdown of the main service, you can look at run.sh to see how things are traditionally handled. Or use version 0.7.61 or newer, where you can just kill the dispatcher.

Please give me feedback on whether parts or all of this information are important or non-obvious enough to be put into the main installation manual.

Server Automation

An uncontrolled shutdown alone should not damage the database. Database updates are designed to be atomic: the new data goes into extra blocks, and there is a secondary set of index files with the suffix shadow. Then there is a short moment during which these index files are moved into place and the old ones discarded. Even if the dispatcher is interrupted during that stage, it will detect its state after the next restart and sort things out. If you observe otherwise, please tell me. I'm absolutely keen to fix bugs that threaten the database's safety under operational circumstances unknown here.

So, if your system crashed: just start the dispatcher again (and the area dispatcher, if you use areas) and let it sort out its state.

Only if this does not fix things might you have a database problem. After starting the dispatcher you can also restart the update scripts.

One behaviour that is known to be dangerous is to run multiple dispatchers concurrently against the same database. Because of the lock files, this is basically only possible with two different versions. Do not do that. For that reason, the version marker will be removed from the file name in a future version; this prevents the concurrent start of multiple dispatchers for the same part of the database.

Performance Verification

I suggest using a call to /api/timestamp as the prime criterion. If you get a response at all then you know that the dispatcher is up and healthy. The content of the response will inform you whether the update mechanism works.
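As a sketch of such a check, assuming GNU date and curl are available (the function name and OVERPASS_URL are my inventions, not part of Overpass):

```shell
# How many seconds does the given UTC timestamp lie in the past?
lag_seconds() {
  echo $(( $(date -u +%s) - $(date -u -d "$1" +%s) ))
}

# Against a running instance one would use something like:
#   ts=$(curl -sf "$OVERPASS_URL/timestamp") || echo "dispatcher down?"
#   echo "database is $(lag_seconds "$ts") seconds behind real time"
lag_seconds "2023-06-01T00:00:00Z"
```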

If the output of that API call is suspect then please inspect $DB_DIR/replicate_id. This is the id of the last successfully applied diff. You can compare it to the entry in state.txt on the Planet mirror to see whether it is plausible. If you have decided to keep a local copy of the diffs then there is also a state.txt in the base directory of the diffs.
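The replicate id maps to a three-level directory path on the mirror: zero-pad it to nine digits and split it into groups of three. A small helper, hypothetically named seq_to_path, illustrates the scheme:

```shell
# Map a replicate id to its path in the replication directory tree,
# e.g. 5734145 -> 005/734/145 (the corresponding state file then
# lives at .../005/734/145.state.txt)
seq_to_path() {
  local padded
  printf -v padded '%09d' "$1"
  echo "${padded:0:3}/${padded:3:3}/${padded:6:3}"
}

seq_to_path 5734145   # -> 005/734/145
```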

You are of course also encouraged to look into the log files that Kai has mentioned. It is just the second place where I look, because it takes more time to make sense of a log than to check a one-line API response or text file.

I do not use Slack, so you will not get answers from me there. You might want to: