Finding Dark POIs

Published: 2017-03-06

Table of contents

Design considerations
Hands On
The Real Work
Caveats
The Shortcut
Feedback

The mapper who mapped this node may wonder why it has never appeared on the map:

node 4379801321: 51.5133578, -0.1015195
  name = Go Native

Well, although it has a name tag, and thus at least some interesting information, we need more. To get an object rendered, a necessary condition is that the object has a location and a type. The type is missing here.
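The condition can be sketched in a few lines of Python. This is only an illustration and not how any renderer actually decides: the key list TYPE_KEYS and the function name is_dark_poi are assumptions made up for this example, and the second sample object is hypothetical.

```python
# Hypothetical list of keys that give an object a type. Real renderers
# use their own, much longer rule sets.
TYPE_KEYS = {"amenity", "shop", "highway", "building", "tourism"}

def is_dark_poi(tags):
    """Return True if the object carries information but no type."""
    has_information = len(tags) > 0
    has_type = any(key in TYPE_KEYS for key in tags)
    return has_information and not has_type

# The node from above: a name, but no type.
print(is_dark_poi({"name": "Go Native"}))
# A made-up object that does have a type.
print(is_dark_poi({"name": "Example", "amenity": "bench"}))
```

An object without any tags is deliberately not counted here, because it carries no information that could be lost.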

I would like to call these objects dark POIs, by analogy with dark matter.

Design considerations

Do these elements pose a problem? We will see later on that they are so sparse that they do not matter much. However, somebody has put in effort to add information here, and we cannot make sense of it. Thus, we could improve the map if we could re-observe the object or get in touch with the mapper.

The deeper reason why this can happen is that OpenStreetMap allows free-form tagging. This is for a reason: it turned out that there are more features of unexpected types out there than you and I might imagine.

For example, if you would like to create a map of subway lines, you may center your attention on tunnels. But a subway with full-fledged service quality, running every few minutes, may have been constructed as a monorail. Even a suspended monorail.

The Wuppertaler Schwebebahn, a subway that is neither sub-surface nor standing on rails.

There are also subways on rubber tires and buses calling in subway tunnels. Free-form tagging enables us to model them all as they are in reality.

In fact it is even better: if you design a tool, OpenStreetMap and Overpass API will help you to adapt your data model to reality. You can find all the objects that violate your assumptions and thus update your assumptions quickly. But this is the subject of a later blog post.

Hands On

Hence, our aim should be to recover the information that is missing. This requires two steps:

  1. Find non-conforming data
  2. Contact the mapper or, if that is not possible, re-survey the object.

Please refrain from just deleting such data. That would destroy the clue that there is data missing.

Now let us get hands-on with the first step. The second is out of scope for this blog post.

Suspicious objects are objects that are not of a well-known type. For ways, a very simplistic approach could be the assumption that all ways are streets or buildings:

way({{bbox}})
  [!building]
  [!highway];
out geom;

We test this on purpose on a small bounding box.

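What is this query composed of? In fact it is a little computer program that consists of two commands: the first, way({{bbox}})[!building][!highway];, selects all ways in the bounding box that have neither a building tag nor a highway tag, and the second, out geom;, prints the selected ways together with their geometry.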

The result has around 400 ways. That is quite a lot for exceptions. Clicking on some of the features reveals further commonplace tags:

way({{bbox}})
  [!building]
  [!"building:part"]
  [!highway]
  [!railway]
  [!waterway];
out geom;

Note that building:part is in quotation marks. Strictly speaking, all keys should be in quotation marks so that special characters in them are not misinterpreted. But letters are not special characters, hence Overpass API also tolerates strings of pure letters without quotation marks.
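If you generate such queries from a list of keys, the safe route is to quote every key. The following Python sketch does that. The function names quote and build_way_query are made up for this example, and escaping backslashes and double quotes with a backslash is my assumption about what suffices for ordinary keys.

```python
def quote(text):
    """Wrap a key in double quotes, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

def build_way_query(excluded_keys):
    """Build a query for ways in the bounding box lacking all given keys."""
    filters = "".join("\n  [!" + quote(key) + "]" for key in excluded_keys)
    return "way({{bbox}})" + filters + ";\nout geom;"

print(build_way_query(["building", "building:part", "highway"]))
```

The generated text has the same shape as the queries in this post, only with every key in quotation marks, so a key like building:part needs no special treatment.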

The Real Work

It turns out that a lot of ways have no tags at all. Are all these ways mapping errors? No, they might be just members of relations. Collecting them needs some extra work:

rel({{bbox}})->.foo;
( way({{bbox}})
  [!building]
  [!"building:part"]
  [!highway]
  [!railway]
  [!waterway];
 - way({{bbox}})(r.foo); );
out geom;

There are two differences from the previous query. The first line is new: the statement rel({{bbox}}) selects all relations in the bounding box. The suffix ->.foo lets the query engine store the result in the set foo; foo does the job of a variable here.

The second difference is that way({{bbox}})... is in parentheses and an extra line follows. The extra line selects all ways in the bounding box (the way({{bbox}}) part) that are in addition members of one or more of the relations stored in foo. Finally, we have the ( ... - ... ) parentheses and the minus. This is the difference statement: its result consists of all the objects from the result of its first statement that are not contained in the result of its second statement. In our case: all ways with none of the listed tags (the first statement) that are not members of a relation from our bounding box (the second statement).
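If you know sets from a programming language, the difference statement behaves like set subtraction. Here is a small Python analogy; the way ids are made up for illustration and do not refer to real OpenStreetMap objects.

```python
# Made-up way ids, for illustration only.
ways_without_known_tags = {101, 102, 103, 104}  # result of the first statement
ways_in_relations = {102, 104, 205}             # result of the second statement

# Everything from the first set that is not contained in the second set.
suspicious_ways = ways_without_known_tags - ways_in_relations

print(sorted(suspicious_ways))  # [101, 103]
```

Note that way 205 does not show up in the result: the second statement can only remove objects, never add them.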

Try it yourself!

To facilitate finding further commonplace tags, we can omit everything but the tags from the result. For this, we switch from out geom to out tags.

rel({{bbox}})->.r;
( way({{bbox}})
  [!building]
  [!"building:part"]
  [!highway]
  [!railway]
  [!waterway];
 - way({{bbox}})(r.r); );
out tags;

When the browser prompts you, choose to have the data shown.

From this point we repeat the process until we have excluded all the elements that we can classify based on their most commonplace tag.
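Counting the keys by hand gets tedious quickly. A rough Python sketch for this step could look as follows. It assumes that you requested the result as JSON (by putting [out:json]; in front of the query) and saved the response text. The function name count_tag_keys and the sample data are made up for this example.

```python
import json
from collections import Counter

def count_tag_keys(response_text):
    """Count how many elements carry each tag key."""
    data = json.loads(response_text)
    counts = Counter()
    for element in data.get("elements", []):
        # Elements without any tags have no "tags" member at all.
        counts.update(element.get("tags", {}).keys())
    return counts

# Tiny hand-written sample in the shape of an Overpass JSON response.
sample = json.dumps({"elements": [
    {"type": "way", "id": 1, "tags": {"landuse": "grass"}},
    {"type": "way", "id": 2, "tags": {"landuse": "forest", "name": "Example"}},
    {"type": "way", "id": 3},
]})

for key, count in count_tag_keys(sample).most_common():
    print(key, count)
```

The most frequent key is then the next candidate for the exclusion list, provided that it really classifies the object.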

rel({{bbox}})->.r;
( way({{bbox}})
  [!amenity]
  [!barrier]
  [!building]
  [!"building:part"]
  [!"building:wall"]
  [!highway]
  [!landuse]
  [!leisure]
  [!man_made]
  [!natural]
  [!power]
  [!railway]
  [!shop]
  [!surface]
  [!tourism]
  [!"roof:edge"]
  [!"roof:ridge"]
  [!waterway];
 - way({{bbox}})(r.r); );
out geom;

Nodes and relations will have their own sets of tags.

way({{bbox}})->.w;
( node({{bbox}})
  [!"addr:street"]
  [!amenity]
  [!barrier]
  [!highway]
  [!man_made]
  [!natural]
  [!shop]
  [!tourism];
  - node({{bbox}})(w.w); );
rel({{bbox}})->.r;
( ._; - node(r.r); );
out;

This is one possibility for nodes. We need to subtract other data from our potential result set twice. First we remove the nodes referenced by ways, because these are by far the majority of nodes. Then we remove the nodes referenced by relations.

rel({{bbox}})
  [!amenity]
  [!boundary]
  [!building]
  [!landcover]
  [!leisure]
  [!network]
  [public_transport!=platform]
  [public_transport!=stop_area]
  [!restriction]
  [type!~"^(bridge|multipolygon|route|site|tunnel)$"]
  [!waterway];
out geom({{bbox}});

And this is one possibility for relations. Currently all classes of relations should have tags. Thus we do not need to exclude any relations merely because they are members of other relations.

Please note the line with type: it is equivalent to the list [type!=bridge][type!=multipolygon]..., but it is faster: it checks all values at once using a so-called regular expression. Regular expressions come from POSIX, the underlying operating system standard, and are beyond the scope of this blog post.
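To get a feeling for what the pattern accepts, you can try it outside of Overpass API. The sketch below uses Python's re module, which is not a POSIX implementation, but I would expect both to agree on a plain alternation like this one. The function name passes_filter is made up, and it only covers relations that actually have a type tag.

```python
import re

# Same pattern as in the query above.
WELL_KNOWN = re.compile(r"^(bridge|multipolygon|route|site|tunnel)$")

def passes_filter(type_value):
    """Mimic [type!~"..."] for a relation that has a type tag."""
    return WELL_KNOWN.search(type_value) is None

for value in ["route", "multipolygon", "route_master", "collection"]:
    print(value, passes_filter(value))
```

The anchors ^ and $ matter: without them, a value like route_master would also count as well known, because it contains route.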

In the last line, we use ({{bbox}}) after geom. This restricts the received geometry to the requested bounding box. Results often contain long-distance relations. If you received them completely, you would have to handle orders of magnitude more data in a bounding box that is orders of magnitude larger.

Caveats

After all, we are dealing with a set of elements so small that we can treat every single one with diligence. Do not forget that there might be as many mappers to contact as there are elements.

This is the reason why the list of tags will vary with the bounding box. For example, building:wall is not that commonplace; for any tag in question you should check the numbers on Taginfo. For the moment I accept that it is intended for 3D mapping and makes sense there. If you are into 3D mapping, you may want to walk through those objects instead of accepting them as being of a well-known type.

Please note also that we might easily accept objects whose tagging does not make sense. For example, the tagging type=multipolygon lets us accept relation 5802680. But that relation does not have any tag useful for classification. It has just a name and the property of being a multipolygon.

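To sum up the caveats: the suspicious objects are few enough to treat each one individually, but each may require contacting a different mapper; the list of well-known tags depends on the bounding box and on your purpose; and excluding objects by a commonplace tag may hide objects whose tagging still does not classify them.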

The Shortcut

Version 0.7.54 offers a faster approach to query for objects with insufficient tagging. You can use the number of tags as a filter criterion. Get all nodes which have a name tag but no other tag:

node({{bbox}})
  [name]
  (if:count_tags()==1);
out;

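The new feature is the line (if:count_tags()==1). It consists of three parts: the wrapper (if:...) is a filter that keeps only those objects for which the enclosed expression evaluates to true; count_tags() returns the number of tags of the object under consideration; and ==1 compares that number with one.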

Of course this also works with ways. It is more interesting to search for ways with no tags at all. To get a relevant result set, we must exclude ways that are members of relations:

rel({{bbox}})->.foo;
( way({{bbox}})
    (if:count_tags()==0);
 - way({{bbox}})(r.foo); );
out geom;

This allows you to find suspect objects without explicitly listing well-known tags. As fixing these objects takes time anyway, it is a good balance that makes efficient use of your time for data gardening. It is straightforward to extend this to other tags that do not classify objects (like area=yes). How the long way helps to figure out whether your data model fits the actual OpenStreetMap data will be the subject of a subsequent blog post.
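If you already have the tags of an object at hand, for example from a downloaded result, the same idea can be mimicked locally. The following Python sketch is not part of Overpass API: the set NON_CLASSIFYING_KEYS and both function names are assumptions chosen for this illustration, and the last sample object is made up.

```python
# Keys assumed, for this illustration only, not to classify an object.
NON_CLASSIFYING_KEYS = {"name", "area"}

def name_only(tags):
    """Local counterpart of [name](if:count_tags()==1)."""
    return "name" in tags and len(tags) == 1

def lacks_classifying_tag(tags):
    """True if every tag of the object is in the non-classifying set."""
    return all(key in NON_CLASSIFYING_KEYS for key in tags)

print(name_only({"name": "Go Native"}))
print(lacks_classifying_tag({"area": "yes"}))
print(lacks_classifying_tag({"area": "yes", "highway": "pedestrian"}))
```

The second function also returns true for an object with no tags at all, which matches the spirit of the count_tags()==0 query above.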

Feedback

As nebulon42 has suggested, you can search for a lot of old-style multipolygons this way:

relation({{bbox}})
  [type=multipolygon]
  (if:count_tags()==1);
(._;>;);
out;

Thanks a lot.