
Working with Numbers

Published: 2017-03-13

Table of contents

A Powerful Example
Maximum Speed
What has gone wrong?
Maximum Peak
Your Move

First of all, I would like to acknowledge nebulon42 for giving feedback. I have added the example to last week's blog.

A Powerful Example

We would like to make a map of the long distance power transmission lines, like the following one:

Long distance power transmission lines in the Netherlands

With a regular expression query, we can collect all lines with 100 kV or more:

area[name="Nederland"];
way(area)[power=line][voltage~"^[1-9].....$"];
out geom;

But this is hard to read for everyone who does not work with regular expressions all day. In addition, it would not match hypothetical transmission lines of 1000 kV or more. We would rather use the familiar greater-or-equal operator:

area[name="Nederland"];
way(area)[power=line](if:t[voltage]>=100000);
out geom;

This query basically works. However, we now get more results than before. What is going wrong here? The purpose of this blog post is to enable you to figure that out yourself.
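As an aside, the regular expression from the first query can be probed outside Overpass. The following is a minimal Python sketch, assuming that the dot behaves the same way in both regular expression dialects for plain ASCII values. The sample values and the helper name matches_voltage_pattern are made up for illustration.

```python
import re

# The pattern from the first query: one digit from 1 to 9,
# followed by exactly five arbitrary characters.
PATTERN = re.compile(r"^[1-9].....$")


def matches_voltage_pattern(value: str) -> bool:
    """Return True if the value matches the pattern of the first query."""
    return PATTERN.search(value) is not None


# Hypothetical sample values, not taken from real data.
for value in ["110000", "380000", "50000", "1000000", "1x0000"]:
    print(value, matches_voltage_pattern(value))
```

The six-character values match, including the non-numeric 1x0000, while the five-character 50000 and the seven-character 1000000 do not. The pattern checks the length of the string, not the magnitude of a number.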

Maximum Speed

We start by looking at the maxspeed tag on motorways around Cologne:

way({{bbox}})[highway=motorway]
  [maxspeed](if:t[maxspeed]>120);
out geom;

Let us walk through the syntax:

The new part is (if:t[maxspeed]>120). Again, (if:...) is the shell that selects exactly those elements for which the condition is true. The expression t[maxspeed] evaluates, for each element, to the value of its maxspeed tag. It would evaluate to the empty string for elements without a maxspeed tag, but we have already narrowed the selection down to those with a maxspeed tag. The operator > compares the expressions on both sides. Finally, 120 always evaluates to the number 120. All in all, we ask for all ways whose maxspeed tag value is greater than 120.

More roads than expected seem to allow high speeds. But a simple cross-check should make you suspicious:

way({{bbox}})[highway=motorway]
  [maxspeed](if:t[maxspeed]>200);
out geom;

Apparently, something is going wrong here. This is why there are some extra tools to track down such problems:

way({{bbox}})[highway=motorway]
  [maxspeed](if:!is_number(t[maxspeed]));
out geom;

Compare this to the previous query! The new part is !is_number(...). The expression is_number(...) evaluates to whether the value it gets is a number. The exclamation mark is logical negation. All in all, we ask here for all ways whose maxspeed tag value is not a number.
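The effect of this filter can be modelled in a few lines of Python. This is only a sketch: it assumes that a value counts as a number when the whole string is a plain decimal number, which may differ in detail from what Overpass API accepts. The sample ways and the helper names are hypothetical.

```python
import re

# Assumption: a number is an optional minus sign, digits,
# and an optional fractional part, covering the whole string.
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?")


def is_number(value: str) -> bool:
    """Model of is_number(...): True if the whole value parses as a number."""
    return NUMBER_RE.fullmatch(value) is not None


def non_numeric_maxspeed(elements):
    """Model of [maxspeed](if:!is_number(t[maxspeed]))."""
    return [
        e for e in elements
        if "maxspeed" in e["tags"] and not is_number(e["tags"]["maxspeed"])
    ]


# Hypothetical sample ways, not taken from real data.
ways = [
    {"id": 1, "tags": {"highway": "motorway", "maxspeed": "100"}},
    {"id": 2, "tags": {"highway": "motorway", "maxspeed": "none"}},
    {"id": 3, "tags": {"highway": "motorway", "maxspeed": "130"}},
    {"id": 4, "tags": {"highway": "motorway", "maxspeed": "signals"}},
]

print([w["id"] for w in non_numeric_maxspeed(ways)])
```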

What has gone wrong?

We get quite a large result, but not necessarily a clue as to which values exist. To solve this we can employ another new tool:

way({{bbox}})[highway=motorway]
  [maxspeed](if:!is_number(t[maxspeed]));
make taginfo_of_cologne values=set(t[maxspeed]);
out;

The new thing here is the third line. It consists of three parts:

The expression set(t[maxspeed]) does the real job: set(...) is a so-called aggregator. The job of an aggregator is to evaluate its argument once for each element in the previous result and then to produce a single string that somehow combines all the results. The particular behaviour of set is to append all found values in alphabetical order, separated by semi-colons.

Altogether, we go over all the found ways, look at their maxspeed tag and tack all found values together with semi-colons. The result goes in a specified tag of a dedicated element. This should result in:

  <taginfo_of_cologne id="1">
    <tag k="values" v="none;signals"/>
  </taginfo_of_cologne>
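The behaviour of set described above can be modelled in Python as follows. This is a sketch of the described semantics, not the actual implementation; the function name overpass_set and the input list are made up for illustration.

```python
def overpass_set(values):
    """Join the distinct values in lexicographic order, separated by semi-colons."""
    return ";".join(sorted(set(values)))


# One maxspeed value per found way (hypothetical sample).
found = ["signals", "none", "none", "signals", "none"]

print(overpass_set(found))  # prints: none;signals
```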

Why have these values interfered with the >200 query? At the moment, the greater-than comparison tries to be smart: if it has a number on both sides, then it compares both sides as numbers. If one or both sides are strings that cannot be interpreted as a number, then both sides are compared lexicographically as strings. Both none and signals sort after 200, thus they pass the filter.
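The comparison rule just described can be sketched in Python. The sketch assumes that "can be interpreted as a number" means the whole string is a plain decimal number, and that lexicographic order is the usual character-by-character order, which holds for ASCII values. The names looks_numeric and smart_greater are hypothetical.

```python
import re

# Assumption: a number is an optional minus sign, digits,
# and an optional fractional part, covering the whole string.
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?")


def looks_numeric(value: str) -> bool:
    return NUMBER_RE.fullmatch(value) is not None


def smart_greater(left: str, right: str) -> bool:
    """Compare as numbers if both sides are numbers, otherwise as strings."""
    if looks_numeric(left) and looks_numeric(right):
        return float(left) > float(right)
    return left > right


# Hypothetical maxspeed values compared against 200.
for value in ["100", "130", "none", "signals"]:
    print(value, smart_greater(value, "200"))
```

The numeric values 100 and 130 fail the test, while none and signals pass it because the letters n and s sort after the digit 2.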

Whether this behaviour makes sense depends on whether there are use cases for string comparison. If you know of any such use cases then please send them. Otherwise we may have the latitude to change that behaviour such that the naive approach works. The hard part is to make the behaviour both intuitive and logically consistent, not the implementation. And whether a thing is intuitive depends on the use cases.

Maximum Peak

Another application of numbers in tags is the elevation tag ele. We would like to find the highest peak in a given region. A region with quite high peaks in Germany is Baden-Württemberg.

With the approach from the previous section we can only find all peaks that are quite high:

area[name="Baden-Württemberg"];
node(area)[natural=peak](if:t[ele]>1000);
out geom;

In the result we have the Sickersberg, although it is only 978 meters high. What has gone wrong?

The tag value 978 m is not a number because it contains the explicit unit m. As a string, 978 m is lexicographically after 1000. Thus, the engine has included this node in the result. We should take this into account:

area[name="Baden-Württemberg"];
node(area)[natural=peak](if:number(t[ele])>1000);
out geom;
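Why the explicit conversion helps can be illustrated with a small Python model. It assumes that a value which is not a plain number converts to NaN, and that any comparison with NaN is false. How the real number(...) treats a value with a unit suffix is not stated in this post, so that part is an assumption. The helper to_number and the sample values are hypothetical.

```python
import math
import re

# Assumption: a number is an optional minus sign, digits,
# and an optional fractional part, covering the whole string.
NUMBER_RE = re.compile(r"-?\d+(\.\d+)?")


def to_number(value: str) -> float:
    """Model of number(...): the numeric value, or NaN for anything else."""
    if NUMBER_RE.fullmatch(value):
        return float(value)
    return math.nan


# Hypothetical ele values; "978 m" carries an explicit unit.
for ele in ["1493", "978 m", "1015"]:
    print(ele, to_number(ele) > 1000)
```

Under these assumptions the value 978 m no longer passes the filter, because both sides of the comparison are now numbers and the string fallback is never used.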

After fixing the issue of non-numeric values, we still do not want all peaks that are quite high, but only the highest one:

area[name="Baden-Württemberg"];
node(area)[natural=peak](if:number(t[ele])>1000);
node._(if:t[ele]==max(t[ele]));
out;

We achieve this with a standard trick from mathematics. Let us walk through the query:
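The idea behind the additional line, keeping only the elements whose value equals the maximum over all elements, can be sketched in Python. The peak names and elevations below are placeholders, not real data, and the function name highest is hypothetical.

```python
def highest(elements):
    """Keep exactly those elements whose elevation equals the maximum elevation."""
    if not elements:
        return []
    top = max(e["ele"] for e in elements)
    return [e for e in elements if e["ele"] == top]


# Placeholder peaks that already passed the ">1000" filter.
peaks = [
    {"name": "Peak A", "ele": 1100.0},
    {"name": "Peak B", "ele": 1450.0},
    {"name": "Peak C", "ele": 1230.0},
]

print([p["name"] for p in highest(peaks)])  # prints: ['Peak B']
```

If several elements share the maximum value, all of them are kept.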

Finally, which values cause the problem?

area[name="Baden-Württemberg"];
node(area)[natural=peak](if:is_tag(ele) && !is_number(t[ele]));
out geom;

Reduced to the essential:

area[name="Baden-Württemberg"];
node(area)[natural=peak](if:is_tag(ele) && !is_number(t[ele]));
make debug values=set(t[ele]);
out;

Your Move

Walking through the list of the last query we see that there are two kinds of values we need to care for:

We proceed with another example to collect more problem classes.

Let us revisit the query from the beginning:

area[name="Nederland"];
way(area)[power=line](if:t[voltage]>=100000);
out geom;

We now know that the problem is the non-numeric values. And we can fix that:

area[name="Nederland"];
way(area)[power=line](if:number(t[voltage])>=100000);
out geom;

But we are now interested in the non-numeric values:

area[name="Nederland"];
way(area)[power=line](if:is_tag(voltage) && !is_number(t[voltage]));
make debug values=set(t[voltage]);
out geom;

The list looks odd: 380000 appears multiple times, and 0 is somewhere in between, which does not fit any imaginable order. The reason is ways with a tag value like 380000;110000.

We can overcome that problem:

area[name="Nederland"];
way(area)[power=line](if:is_tag(voltage) && !is_number(t[voltage]));
make debug values=set("{" + t[voltage] + "}");
out geom;

This puts curly brackets around each value and makes values with semi-colons distinguishable.
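The difference the brackets make can be reproduced with a small Python model of set. The function name overpass_set, the helper braced and the voltage values are made up for illustration.

```python
def overpass_set(values):
    """Join the distinct values in lexicographic order, separated by semi-colons."""
    return ";".join(sorted(set(values)))


def braced(values):
    """Wrap each value in curly brackets, like "{" + t[voltage] + "}"."""
    return ["{" + v + "}" for v in values]


# Hypothetical voltage tag values, some of them with several voltages.
voltages = ["380000;110000", "0", "380000;150000", "380000;110000"]

print(overpass_set(voltages))
# prints: 0;380000;110000;380000;150000
print(overpass_set(braced(voltages)))
# prints: {0};{380000;110000};{380000;150000}
```

In the first output it is impossible to tell where one tag value ends and the next begins. In the second output each tag value is clearly delimited.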

Values with semi-colons are the third class of values we need to keep in mind. And do not forget special values like none and signals.

I have two questions on which I would like to collect opinions. The first is: how shall we treat these four kinds of special values in Overpass API? Treating everything that has digits as a number does not solve the problem, because it would paint over e.g. semi-colon values instead of pointing at the problem.

The second question is: what should we make of the less-than and greater-than operators? I will open next week's blog post with a short history of comparison functions in programming languages. None of the concepts is without downsides. Which one is best depends on the use cases. In particular: if you want to retain comparison for strings then please tell me your use cases.