Textual Results

Published: 2017-04-03, updated 2019-10-31

Table of contents

Group and Count
Quite Always Nodes
Comparison Operators

Personal Remarks

I am sorry that there was no blog post last week. I simply overestimated my resources and underestimated the time necessary to write a useful text.

However, the good news is that I managed to set up an RSS feed. I promise to do my best to return to the weekly rhythm for the next weeks. There will be a development break when it is time to push forward the next version, but we are not there yet.

Group and Count

When the German Railways started to provide open data, one of the first things they offered was the status of elevators. That raised the question of whether and where elevators were missing in the OpenStreetMap data. To get concise information, we should count elevators in this case.

At the same time, this is a good starting point because one could expect that elevators are always nodes. We will come back later to how to cross-check that assumption, and first figure out how to count groups of nodes.

There is already a command to count results in Overpass API. As an example we will count elevators in a bounding box:

node({{bbox}})[highway=elevator];
out count;

In fact, out count is a shorthand for

node({{bbox}})[highway=elevator];
make count
  nodes = count(nodes),
  ways = count(ways),
  relations = count(relations),
  total = count(nodes) + count(ways) + count(relations);
out;

Now we attempt to get one derived element per station:

node({{bbox}})[railway=station];
foreach(
  node(around:100)[highway=elevator];
  make count
    nodes = count(nodes),
    ways = count(ways),
    relations = count(relations),
    total = count(nodes) + count(ways) + count(relations);
);
out;

Please try it: it does not work. I have added this query to explain a frequent fallacy. The foreach statement does not collect the intermediate objects in the loop, hence this query delivers only the result of the last iteration. The long-term fix will be to alter the semantics of the foreach statement. For the moment, I would like to suggest two workarounds.

The first one is to move the out statement into the foreach loop:

node({{bbox}})[railway=station];
foreach(
  node(around:100)[highway=elevator];
  make count
    nodes = count(nodes),
    ways = count(ways),
    relations = count(relations),
    total = count(nodes) + count(ways) + count(relations);
  out;
);

It is simple and does the job, but it is not necessarily fast. Also, this only works if we do not want to do more with the result than printing it.

The second solution is to collect the desired intermediate result in a separate bucket. I have chosen the name result here:

node({{bbox}})[railway=station];
foreach(
  node(around:100)[highway=elevator];
  make count
    nodes = count(nodes),
    ways = count(ways),
    relations = count(relations),
    total = count(nodes) + count(ways) + count(relations);
  (._; .result;)->.result;
);
.result out;

Please note that the out statement is set to read from result. I myself often tend to forget to set that. Other than that, we use the union statement to collect the per-loop content in the default set into the result set.
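The difference between the failing query and the second workaround can be mimicked outside Overpass. The following Python sketch is only an analogy for the behaviour described above, not Overpass code: the station names and elevator counts are made up, default_set stands in for the default set _, and result stands in for the named set result.

```python
# Hypothetical per-station elevator counts, in loop order (illustrative data only).
elevators_per_station = [("North", 2), ("Central", 5), ("South", 0)]

def run_without_collecting(stations):
    """Models the failing query: each iteration overwrites the default set."""
    default_set = []
    for _name, total in stations:
        default_set = [{"total": total}]  # like "make count ..." inside foreach
    return default_set  # only the last iteration survives

def run_with_result_set(stations):
    """Models the workaround: each iteration is merged into a separate set."""
    result = []
    for _name, total in stations:
        default_set = [{"total": total}]
        result = result + default_set  # like (._; .result;)->.result;
    return result

last_only = run_without_collecting(elevators_per_station)
collected = run_with_result_set(elevators_per_station)
print(last_only)
print(collected)
```

The first function returns a single count, the second one count per station, which is the effect the named set is meant to achieve.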

The query is still less useful than it could be. We do have a list of anonymous counts, but we want to know the number of elevators per station. For that, the station node must still be available when the make statement runs.

node({{bbox}})[railway=station];
foreach(
  node(around:100)[highway=elevator]->.n;
  make count
    name = set(t["name"]),
    nodes = n.count(nodes),
    ways = n.count(ways),
    relations = n.count(relations),
    total = n.count(nodes) + n.count(ways) + n.count(relations);
  (._; .result;)->.result;
);
.result out;

For that purpose we move the collected elevator nodes into a named set, so that the default set still holds the station. I have chosen n as a simple name. And again, do not forget to point the count evaluators at the right set.

We have succeeded. To celebrate that, we format the result as a pretty table.

[out:csv("name","num_nodes","num_ways","num_relations")];
node({{bbox}})[railway=station];
foreach(
  node(around:100)[highway=elevator]->.n;
  make count
    name = set(t["name"]),
    num_nodes = n.count(nodes),
    num_ways = n.count(ways),
    num_relations = n.count(relations);
  (._; .result;)->.result;
);
.result out;

Quite Always Nodes

The good news is that you have help to sort out tagging questions. Taginfo gives an immediate overview. There are indeed about 3000 ways that are tagged highway=elevator.

According to the wiki, closed ways should be never tagged as highway=elevator. The trick is not to take information from the wiki for granted, but rather to see it as a hint at which tagging might make sense and which might be accidental.

Unfortunately, neither the wiki nor taginfo can tell us which of these ways are closed ways. We want to cross-check which non-closed ways exist. I.e., we would like to have something like:

way[highway=elevator];
way._(if:???); // we want a filter for closed ways
out geom;

As a spoiler: again there is no proper solution yet, only a workaround for the time being.

We could again use the foreach statement and the various counting properties:

way[highway=elevator];
foreach(
  node(w)->.n;
  way._(if:n.count(nodes) < count_members());
  (._; .result;)->.result;
);
.result out geom;

Let us walk through the idea of the query. We build around one important observation: a way is closed if it has fewer child nodes than member entries. This is not strictly logically true, because self-intersecting ways could be non-closed and still have more member entries than child nodes. But self-intersecting ways are both deprecated and very rare, so we can neglect them for this workaround.

We get the number of child nodes with an approach similar to the one we used for the elevators. We query the desired elements into a named set, here again n, and they do not interfere with the rest of the query except for being counted. The remainder is the helper construction to gather the result.

Unfortunately, the whole thing is unnecessarily slow. To speed it up, we look on the combinations tab of the taginfo page for likely candidates for area-only tags. You can cross-check with the wiki that, e.g., the key building shall, amongst ways, only be applied to closed ways. We can filter out all of them. Their sheer number would make them rather candidates for quarterly projects. Or similar projects.
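The observation and its caveat can be checked on plain lists of node ids. This is a minimal Python sketch, not Overpass code; the function names and the sample ways are hypothetical, and the stricter first-equals-last test is added only for comparison.

```python
def looks_closed(node_refs):
    """Heuristic from the text: fewer distinct child nodes than member entries."""
    return len(set(node_refs)) < len(node_refs)

def is_closed(node_refs):
    """Stricter definition, for comparison: first and last member are the same node."""
    return len(node_refs) > 1 and node_refs[0] == node_refs[-1]

ring = [1, 2, 3, 4, 1]        # closed way: node 1 appears twice
line = [1, 2, 3, 4]           # open way: every node appears once
revisit = [1, 2, 3, 4, 2, 5]  # open, self-intersecting way that passes node 2 twice

for name, refs in [("ring", ring), ("line", line), ("revisit", revisit)]:
    print(name, looks_closed(refs), is_closed(refs))
```

The last sample is the false positive mentioned above: the heuristic reports it as closed although its end nodes differ.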

This leads us to almost a solution:

way[highway=elevator][!building][!area];
foreach(
  node(w)->.n;
  way._(if:n.count(nodes) < count_members());
  (._; .result;)->.result;
);
.result out geom;

We do not want building or area tags. But unfortunately, the Overpass scheduler sometimes makes a bad guess for building tags, because the key appears really often. We can overcome this by deferring the negated tags:

way[highway=elevator];
way._[!building][!area];
foreach(
  node(w)->.n;
  way._(if:n.count(nodes) < count_members());
  (._; .result;)->.result;
);
.result out geom;

Comparison Operators

I had promised a discussion about comparison operators for last week. It is a chance to work out how the comparison operators should finally behave.

First of all, we essentially need to care about only one operator. We want a < b if and only if b > a. Similarly, we want a <= b if and only if b >= a. Finally, we want a <= b to be true if and only if a > b is false. Hence, if we agree on rules for <= then we have rules for all comparison operators.

A couple of other rules make the comparison useful:

It shall be reflexive, i.e. a <= a for every possible object a.
It shall be anti-symmetrical, i.e. if both a <= b and b <= a are true then a must be the same as b.
It shall be transitive, i.e. if a <= b and b <= c then also a <= c.
And it shall be total, i.e. at least one of a <= b or b <= a shall be true.
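The four rules can be tested by brute force on a small sample of values. The following Python sketch is only an illustration; the helper name order_properties and the two sample relations are assumptions made for this example.

```python
from itertools import product

def order_properties(values, le):
    """Check the four rules for the relation le on a finite sample of values."""
    vals = list(values)
    return {
        "reflexive": all(le(a, a) for a in vals),
        "antisymmetric": all(
            a == b for a, b in product(vals, repeat=2) if le(a, b) and le(b, a)
        ),
        "transitive": all(
            le(a, c)
            for a, b, c in product(vals, repeat=3)
            if le(a, b) and le(b, c)
        ),
        "total": all(le(a, b) or le(b, a) for a, b in product(vals, repeat=2)),
    }

# Plain numeric comparison satisfies all four rules on this sample.
numbers = order_properties([2, 7, 10], lambda a, b: a <= b)

# "a divides b" is reflexive, anti-symmetrical and transitive here, but not total:
# neither 2 divides 3 nor 3 divides 2.
divides = order_properties([2, 3, 4, 12], lambda a, b: b % a == 0)

print(numbers)
print(divides)
```

Passing such a check on a sample does not prove the rules in general, but a single failing triple is enough to show that a rule is violated.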

For example, if comparison were not transitive and anti-symmetrical then you could not use it to sort things. Sorting is already degraded if a comparison operator is not also total.

All of these properties are straightforward for simple numbers. They are also fulfilled for strings if you sort strictly lexicographically. Most problems arise from type conversion, i.e. from autodetecting whether the involved strings should rather be treated as numbers because they could be understood as such. Some extra problems come from special values.

A simple example is the three values 2, 10 and 1m. Because 2 and 10 are numbers they compare as 2 <= 10. Because 1m is not a number, the values 2 and 1m are compared as strings, hence 1m <= 2. For the same reason, the values 1m and 10 are compared as strings, hence 10 <= 1m. This violates transitivity: 10 <= 1m and 1m <= 2, but not 10 <= 2.

As a consequence, you can have more hits for the condition t["addr:housenumber"] <= 2 than for the condition t["addr:housenumber"] <= 10, because 1m does not match t["addr:housenumber"] <= 10.
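The three comparisons can be replayed with a small model of "numeric if both sides parse as numbers, otherwise string comparison". This Python sketch is a simplified assumption about the rule described above, not the actual Overpass implementation; as_number and mixed_le are hypothetical names.

```python
def as_number(text):
    """Return the value as a float, or None if the text is not a plain number."""
    try:
        return float(text)
    except ValueError:
        return None

def mixed_le(a, b):
    """Compare numerically if both sides are numbers, otherwise as strings."""
    na, nb = as_number(a), as_number(b)
    if na is not None and nb is not None:
        return na <= nb
    return a <= b

print(mixed_le("2", "10"))   # both numeric
print(mixed_le("10", "1m"))  # string comparison
print(mixed_le("1m", "2"))   # string comparison
print(mixed_le("10", "2"))   # both numeric: the step that breaks transitivity
```

The second and third comparisons hold while the fourth does not, which is exactly the broken chain from 10 over 1m to 2.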
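The effect on filtering can be seen on a handful of house numbers. The Python sketch below uses made-up sample values and the same simplified comparison model (numeric if both sides are numbers, otherwise string comparison); it is an assumption for illustration, not the Overpass implementation.

```python
def as_number(text):
    """Return the value as a float, or None if the text is not a plain number."""
    try:
        return float(text)
    except ValueError:
        return None

def mixed_le(a, b):
    """Compare numerically if both sides are numbers, otherwise as strings."""
    na, nb = as_number(a), as_number(b)
    if na is not None and nb is not None:
        return na <= nb
    return a <= b

# Hypothetical house numbers, as they might appear in tag values.
housenumbers = ["1a", "1b", "1m", "2", "7"]

hits_le_2 = [h for h in housenumbers if mixed_le(h, "2")]
hits_le_10 = [h for h in housenumbers if mixed_le(h, "10")]

print(hits_le_2)
print(hits_le_10)
```

With this sample, the tighter bound 2 matches the values with a letter suffix, while the looser bound 10 drops them and only gains 7.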

There are various approaches to get around this:

Programming languages like C++ and Java use a strong typing system such that you simply never could compare 1m as a number. We do not have type information in tags, hence such a solution would be relatively unintuitive.
JavaScript and some other languages do implicit type conversion. This is essentially equivalent to what Overpass does right now. The downside is that even the simplest expressions are prone to produce unexpected results.
Bash has different operator names for comparisons of different types, i.e. <= is always a string-based comparison. It is at least much easier to spot errors this way. On the downside, I'm not sure how many people can then understand why 2 <= 10 is false.

I'm still considering whether a Bash-like solution could be compelling or whether there should be some way of strong typing. Please feel free to give feedback on this or other questions. I'm always trying to shape QL such that simple tasks can be done with straightforward queries.
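The Bash-like option amounts to two separately named comparisons, so that the caller always states which one is meant. A minimal Python sketch of that design follows; str_le and num_le are hypothetical names, not existing or planned QL operators.

```python
def str_le(a, b):
    """Always compare as strings, character by character."""
    return str(a) <= str(b)

def num_le(a, b):
    """Always compare as numbers; raises ValueError for values like '1m'."""
    return float(a) <= float(b)

print(str_le("2", "10"))  # string comparison: "2" sorts after "10"
print(num_le("2", "10"))  # numeric comparison

try:
    num_le("1m", "2")
    numeric_error = False
except ValueError:
    numeric_error = True  # the non-numeric value is rejected instead of silently reinterpreted
print(numeric_error)
```

Each of the two functions is consistent on its own, at the price that the string variant gives the surprising answer for 2 and 10 mentioned above.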