Loop and Group

Published: 2018-05-15

Any Type

Often, one does not know the OSM type of a query result beforehand. In particular, almost all real-world objects can be modeled as a node, a way, or a closed way, depending on their spatial extent.

But have a look at this platform on the map. It is split by a railway track running through it, so it should be modeled as a relation to result in a single object:

Image: A platform that is split by a railway track running through it

Until now, such a query had to be phrased as a union statement, with the conditions repeated for each type:

( node({{bbox}})[public_transport=platform];
  way({{bbox}})[public_transport=platform];
  relation({{bbox}})[public_transport=platform]; );
out geom;

The new nwr syntax simplifies this query:

nwr({{bbox}})[public_transport=platform];
out geom;

For the record: while this example works fine in some tools, in particular Overpass Turbo, it does not work in all tools so far. In particular, it does not and cannot work in JOSM. The reason is that out geom generates a slightly non-standard OSM result syntax that enriches the result objects directly with their geometry. That way, the auxiliary ways and nodes that only help to represent the geometry can be omitted.

JOSM expects strictly standard OSM data, with version numbers for all objects. It cannot do otherwise, because OSM objects in JOSM should be editable, and OSM objects without a proper version number cannot be uploaded back to the OSM database.

To get a minimal result that is standard conforming, one needs to recurse down and use out meta. For most practical purposes, these last two lines do the job:

nwr({{bbox}})[public_transport=platform];
(._;>;);
out meta;

Ask again

Sometimes it is helpful to have a fallback level. For example, do you know how far away the nearest post box is from a given location? I do not. Thus, we might get far too many results if we make the search radius too big, or no results at all if we make it too small.

To overcome this situation there is now an if statement in the query language. The condition is an evaluator:

node[amenity=post_box](around:300,51.178061,4.4214484);
if (count(nodes) == 0)
{
  node[amenity=post_box](around:1000,51.178061,4.4214484);
}
out;

The first line is a standard query for post boxes, based on the relevant tag and a radius around a coordinate. The second line contains the keyword if and the evaluator count(nodes) == 0. The fourth line is then the same thing as the first line with a bigger radius. It is enclosed in curly braces. These braces denote the conditional block of the if statement.

To lessen the mental burden of remembering the query language, the syntax for block statements has been changed to curly braces, like in JavaScript and Java. The only exception is union: because it has no keyword, it is distinguished by keeping the parentheses-semicolon syntax. The old syntax can still be used for all block statements and will not be removed.

If the first query has a non-empty result then the second query inside the statement block is never executed. The result of the first query is then printed in the last line. If the first query has an empty result then the second query is executed, and its result replaces the empty result of the first query. Then, finally, the result of the second query is printed.
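
The same pattern can be nested to widen the search step by step. The following is only a sketch, not part of the original examples: the third radius of 3000 metres is an arbitrary assumption, and each inner block runs only if every narrower search came back empty.

node[amenity=post_box](around:300,51.178061,4.4214484);
if (count(nodes) == 0)
{
  node[amenity=post_box](around:1000,51.178061,4.4214484);
  if (count(nodes) == 0)
  {
    // assumed third fallback radius, pick whatever suits your area
    node[amenity=post_box](around:3000,51.178061,4.4214484);
  }
}
out;

Because each query replaces the default set, the final out prints the result of the last query that was actually executed.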

The evaluator is not limited to counting. Any evaluator that does not need an element can be used. For example, you can check whether a certain tag appears in the result set or not:

node[name]({{bbox}});
if (set(t["name:de"]) != "" && set(t["name:fr"]) != "")
{
  make warning text="Multiple language specific name tags "
      + "found. Please use language preferences.";
  out;
}
else
{
  out;
}

The first line searches for named nodes in the given bounding box. The second line is the if statement with the interesting condition: set(t["name:de"]) compiles a list of all values for the tag "name:de" in the result. If there is any such tag then the result is nonempty. Likewise we check for "name:fr" values. If we have both then it is highly likely that we want to choose a language instead of just working with name.

In this case, a warning message is compiled (lines 4 and 5) and printed (line 6).

If French and German names do not both exist then the result is printed in line 10.

Another example: one can spot non-numeric values if one needs numbers:

node[natural=peak][ele]({{bbox}});
if (min(is_number(t["ele"])) == 1)
{
  node._(if:number(t["ele"]) == max(number(t["ele"])));
  out;
}
else
{
  node._(if:!is_number(t["ele"]));
  make warning text="Non-numeric values for ele found: "
      + set("{" + t["ele"] + "}");
  out;
}

Line 1 is a simple combined query for the bounding box and the two tags natural (must be present and equal to peak) and ele (must be present with any value).

Line 2 is the interesting check: the expression min(...) evaluates its inner expression for each element and returns the lowest value. is_number(t["ele"]) is 1 or 0, depending on whether t["ele"] is a number or not. Because 0 is lower than 1, the whole expression is 1 if and only if elements exist and, for all elements, the value of ele can be understood as a number.

If so, then lines 4 and 5 are executed. Line 4 is a check similar to this blog post section: it keeps only those nodes of the previous result whose ele value equals the maximum. Line 5 outputs these nodes with the maximum value.

If not, then lines 9 to 12 are executed. Line 9 drops all elements with numeric values from the result: we query for all objects of type node that are in the previous result (._) and for which is_number(t["ele"]) is false. In lines 10 and 11, a warning message is compiled from a fixed text and a semicolon-separated list of all expressions "{" + t["ele"] + "}", i.e. all ele values of the objects remaining in ._, each enclosed in curly braces. Line 12 prints it.

Groups

Sometimes we would rather sort the found objects into groups. A very typical case is the one asked for in GitHub issue 395:

In a given bounding box, which values of building:condition exist and how many buildings are mapped for each value?

The existing foreach loop does not solve the problem because it treats each element individually. The often used set("{"+...+"}") trick does not help either: there is no way to convey extra information besides the evaluated expressions.

This is why there is now a loop called for. Syntactically similar to if, it takes a single evaluator as its argument. The example mentioned before can be written as:

way[building](23.6948,90.3907,23.7248,90.4235);
for (t["building:condition"])
{
   make stat "building:condition"=set(t["building:condition"]),
       count=count(ways);
   out;
}

The first line is a standard query for objects of type way with the tag building inside a specific bounding box. Line 2 is the new structure: for is the keyword. The evaluator is the rest of the line and is executed for each element. Four different values appear: average, good, poor, and the empty string. Thus, the block of statements is executed four times. Each time, the block sees the subset of the input set consisting of all objects that evaluate to the respective value.

Thus, we can count the objects with count(ways) in line 5 per value. In line 4, the expression set(t["building:condition"]) delivers the value of the tag for the subset.

Because it is sometimes expensive or difficult to re-compute the loop evaluator, there is a shortcut <Setname>.val to get the value computed by the for loop:

way[building](23.6948,90.3907,23.7248,90.4235);
for (t["building:condition"])
{
   make stat "building:condition"=_.val,
       count=count(ways);
   out;
}

Technically, this is a property that the for statement attaches to the set variable. If the variable is overwritten, for example by a statement result stored there, then the value is lost.

If we want to do the same thing to list highways by their classification, then we run into the problem that the count of ways is misleading: short sections in complex junctions become unduly important. A much more precise indicator is the total length per highway classification.

For this reason, the element-dependent function length() is available:

way[highway]({{bbox}});
for (t["highway"])
{
   make stat highway=_.val,
       count=count(ways),length=sum(length());
   out;
}

To get the total length, we use in line 5 the aggregator sum(...) over the element specific function length().
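
The grouping pattern is not restricted to ways. As a rough sketch, one could combine it with nwr from the first section and count the platforms per OSM type. The grouping tag shelter and the attribute names node_count, way_count and relation_count are illustrative assumptions, not taken from the examples above.

nwr({{bbox}})[public_transport=platform];
for (t["shelter"])
{
   // shelter is only an illustrative choice of grouping tag
   make stat shelter=_.val,
       node_count=count(nodes),
       way_count=count(ways),
       relation_count=count(relations);
   out;
}

Platforms without a shelter tag end up in the group for the empty string, just like the buildings without building:condition in the earlier example.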

There is also a special evaluator to sort elements by the keys they are tagged with. It will be covered in a later blog post.

Finally, we have the slight problem that relatively few tools out there can take advantage of free-form XML. For that reason, one can tell Overpass API to output CSV instead. Here is the last example, restated so that you can make a spreadsheet from it:

[out:csv(count,length,highway)];
way[highway]({{bbox}});
for (t["highway"])
{
   make stat highway=_.val,
       count=count(ways),length=sum(length());
   out;
}

The important thing here is the first line: [out:csv(...)] is the declaration that controls the output format. Declarations are always separated from the statements by a single semicolon. Within the parentheses, you simply list the tag keys (or some special values) that select the columns to display. We have chosen these keys ourselves with the make statement.

Loops

Overpass API does not get a general conditional loop, aka while. The rationale is that conditional loops are very prone to infinite looping. While it is annoying when an infinite loop eats up your CPU at home, it is a major problem for a shared public resource. Even a surprisingly tiny fraction of clumsy users could clog up a public service with infinite loops.

The most common use case for a conditional loop in queries is to discover a complete network after it has been discovered partially. Our working example is to find a complete street in a situation where another street of the same name exists in the same municipality. Searching for the name within the municipality delivers all of them at once. But these are clearly multiple distinct objects.

area[name="Berlin"];
way(area)[name="Hauptstraße"];
out center;

On the other hand, choosing a single one of these objects does not help either, because OSM splits a street of the same name if its other properties differ between sections. Here is a tiny subset of a probably much longer street:

way(547372893);
out geom;

We could get the immediately adjacent segments by recursing down and up, but we are still not even close to being complete:

way(547372893);
node(w);
way(bn)[name="Hauptstraße"];
out geom;

Line 1 is the query for a single object. Line 2 is a recurse down to get all the nodes of the previous result. Line 3 is a query for all the ways that have a node from the previous result and have for the tag name the value Hauptstraße. Line 4 outputs the result.

This is where complete is useful. The loop with the block of statements is now executed until the result does not change anymore between two loop runs:

way(547372893);
complete
{
  node(w);
  way(bn)[name="Hauptstraße"];
}
out geom;

Lines 2 and 3 of the previous query have been moved into the complete block. This means that the interpreter keeps executing the recurse-down-recurse-up pair until the result of the recurse-up is the same as after the previous loop run.

Now assume you had forgotten the [name="Hauptstraße"]. Then you would inadvertently get not just the accordingly named ways but all connected ways. To keep you from pulling in the street network of at least all of Eurasia, the complete loop always has an upper limit on the loop count. The current default is 4096, but this may be changed later to keep control of the load on the public instances.

You can set that limit to a value of your choice with a parameter in parentheses after the keyword. In the following example, this lets us get only a part of this cluster of streets named Hauptstraße. If you are curious, try values from 1 to 12 in place of the value 3 in line 2. You will see the connected streets grow progressively.

way(547372893);
complete(3)
{
  node(w);
  way(bn)[name="Hauptstraße"];
}
out geom;

There are many more ways to use complete. For example, you can use an around filter in the block to bridge small gaps if the street is not connected:

way(547372893);
complete
{
  way(around:30)[name="Hauptstraße"];
}
out geom;

Or a completely different thing: people have asked how to get an entire stream network, and so on.
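
For the stream network case, a minimal sketch could reuse the recurse-down-recurse-up pair and follow any way tagged waterway instead of a street name. The seed name "Example Creek" is a hypothetical placeholder, and filtering on the bare key waterway is an assumption about what should count as part of the network.

// hypothetical seed: replace the name with a stream inside your bounding box
way[waterway=stream][name="Example Creek"]({{bbox}});
complete
{
  node(w);
  way(bn)[waterway];
}
out geom;

Because node(w) and way(bn) ignore the direction of a way, this collects connected waterways both upstream and downstream of the seed. On a large river system the result can grow big, so a limit such as complete(20) may be a sensible precaution.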

Next week we will continue with timestamp related block statements.