Some Performance Hints

Published: 2017-05-01

Among the example pages, I have reviewed the section on Understanding Geometry and Topology.

While most of the examples are already perfectly fine, I would like to take a closer look at the first example:

[bbox:{{bbox}}];
rel[type=multipolygon];
foreach -> .a (
  way(r.a:outer);
  way._(if:count(ways) > 1);
  rel.a(bw);
  out;
);

The query works, but it is slow.

First of all, I was rather surprised by the number of results in the sample region. Such a high count is relatively unusual, but of course the query should work everywhere. In a randomly chosen other region, there are only 12 or so results.

To make finding multipolygons in Rome faster, I would like to discuss where the runtime is spent, how you can find that out, and what you can do to speed up the query.

Because the server currently does not tell you how long a query takes, you can only check wall-clock time. I think I will expose more execution details once I have figured out how to do so in a brief but meaningful way. Until then, wall-clock time is good enough to detect large delays. You cannot reliably tell a runtime of 2 seconds from one of 3 seconds, but you can tell a runtime of 2 minutes from one of 3 minutes.

A good approach is to execute the query step-by-step and to observe where it starts to take significant amounts of time.

The first statement that is actually executed is on the second line; the first line only sets the global bounding box. To cross-check whether that statement does something useful, we can add a minimal out statement after it. Minimal out statements are out ids and out count. Neither needs any disk activity, so both are always very fast.

[bbox:{{bbox}}];
rel[type=multipolygon];
out count;
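
For comparison, the same cross-check with the other minimal out statement, out ids, might look like the sketch below. It lists the ids of the matching relations instead of only their number. As in the queries above, {{bbox}} is assumed to be a placeholder that the query frontend replaces with the current map view.

```
[bbox:{{bbox}}];
rel[type=multipolygon];
out ids;
```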

As this first request takes only a matter of seconds, we can add one more statement. Identifying the next statement is a little tricky. I take the foreach statement, because all the other statements depend on what the foreach statement delivers as the loop element.

[bbox:{{bbox}}];
rel[type=multipolygon];
foreach -> .a (
);

This is still relatively fast.

Now the loop can be filled:

[bbox:{{bbox}}];
rel[type=multipolygon];
foreach -> .a (
  way(r.a:outer);
);

It is time for a first optimization attempt. The recurse statement inside the foreach loop does need disk activity. There is a good chance that the query gets faster if we replace the multiple executions within the loop with a single execution before the loop. For this purpose, we store the result in an intermediate set w and reuse that set in the foreach loop to avoid the disk activity.

[bbox:{{bbox}}];
rel[type=multipolygon];
way(r:outer)->.w;
foreach -> .a (
  way.w(r.a:outer);
);

This turns out to be much slower. At the moment, I'm pretty sure that I have hit an unintended bug, but I have not yet investigated that further. The bottom line is that this optimization does not pay off, at least in this case.

We go back to the previous step and add the next line of the original request:

[bbox:{{bbox}}];
rel[type=multipolygon];
foreach -> .a (
  way(r.a:outer);
  way._(if:count(ways) > 1);
);

The wall-clock time was 17 seconds in my test run.

[bbox:{{bbox}}];
rel[type=multipolygon];
foreach -> .a (
  way(r.a:outer);
  way._(if:count(ways) > 1);
  rel.a(bw);
);

The wall-clock time was about 90 seconds in my test run.

...

[bbox:{{bbox}}];
rel[type=multipolygon];
foreach -> .a (
  way(r.a:outer);
  rel.a(if:count(ways) > 1);
);

(45 seconds)

[bbox:{{bbox}}];
rel[type=multipolygon];
foreach -> .a (
  way(r.a:outer);
  way._(if:count(ways) > 1);
  ( rel.a(bw);
    .result; )->.result;
);
.result out;