Some State of the Public Transport

Published: 2017-04-24, updated 2019-10-31

Table of contents

Maybe for Buses
Not Calling for Passengers
Somewhere in Europe

Remarks

I have just finished reorganizing the examples page in the wiki. The reorganization may or may not help in navigating the page. My motivation is to understand which goals the Overpass API users are after. I will revise the individual items later on.

As it turns out, one large subject is understanding and counting tagging combinations. This is a good opportunity to answer some recurring questions about how public transport tagging is actually used in the field.

Maybe for Buses

The most loudly announced issue in the Public Transport v2 scheme is the transition from highway=bus_stop to public_transport=platform and public_transport=stop_position. The tag public_transport=platform is for bus stops that are modeled by their sign beside the road. The tag public_transport=stop_position is for bus stops that are modeled by an estimate of where the vehicle stops. We will first take care of the platforms.

Along with public_transport=platform these stops should have been tagged bus=yes to distinguish these platforms from platforms for other means of transport. So we already have three tags to take care of: highway=bus_stop, public_transport=platform, and bus=yes.

A clean solution would be to check each of the eight possible combinations. However, this example should also teach you to notice when we deceive ourselves. I do so right now and make the following unchecked assumptions:

Names serve as an indicator of how good the general mapping level is. Hence, we are only interested in the fraction of named stops amongst all stops. Stops are the nodes tagged with highway=bus_stop, with public_transport=platform, or with both.
The tag bus=no or bus=yes only makes sense on elements that are tagged with public_transport=platform anyway.

This means we have to master the following building blocks:

Collect all nodes that have at least one of the two tags public_transport=platform or highway=bus_stop!
Figure out the fraction of named objects amongst them!
Tell the fraction of nodes amongst all public_transport=platform nodes that have a bus=yes or bus=no tag!

From these requirements I suggest:

area[name="Antwerpen"]->.a;
( node(area.a)[highway=bus_stop];
  node(area.a)[public_transport=platform]; );
node._[name]->.with_name;
node._[public_transport=platform]->.pt;
node.pt[bus=no]->.not_bus;
node.pt[bus=yes]->.explicit_bus;
make count
  all=count(nodes),
  with_name=with_name.count(nodes),
  pt=pt.count(nodes),
  not_bus=not_bus.count(nodes),
  explicit_bus=explicit_bus.count(nodes);
out;

Let us map the lines back to the requirements:

Lines 2 to 3 contain the usual union statement with two usual query statements inside.
Line 4 copies from all found elements only those that have a name tag. This is pretty fast because the item filter reuses the previous result.
In a similar manner, lines 5 to 7 copy elements in a cascade to get the number of nodes tagged as public_transport=platform and, amongst those, the number tagged as bus=no or bus=yes.

For the friends of advanced presentation, a variant that really writes fractions in the usual percent notation:

area[name="Antwerpen"]->.a;
( node(area.a)[highway=bus_stop];
  node(area.a)[public_transport=platform]; );
node._[name]->.with_name;
node._[public_transport=platform]->.pt;
node.pt[bus=no]->.not_bus;
node.pt[bus=yes]->.explicit_bus;
make count
  all=count(nodes),
  with_name=with_name.count(nodes)/count(nodes)*100 + " %",
  pt=pt.count(nodes)/count(nodes)*100 + " %",
  not_bus=not_bus.count(nodes)/pt.count(nodes)*100 + " %",
  explicit_bus=explicit_bus.count(nodes)/pt.count(nodes)*100 + " %";
out;

It looks like at least bus=yes has been carefully applied everywhere - and that the Antwerp tram and light rail are modeled in a somewhat different way.
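
The clean solution mentioned at the beginning of this section, counting every tag combination, could look like the following sketch. It is an assumption of mine and not part of the original analysis. The set names any, h, no_h and so on, as well as the keys of the count element, are made up for this illustration. The eighth combination, nodes with none of the three tags, is left out because it would cover every other node in the area:

area[name="Antwerpen"]->.a;
// every node that carries at least one of the three tags
( node(area.a)[highway=bus_stop];
  node(area.a)[public_transport=platform];
  node(area.a)[bus=yes]; )->.any;
// split by highway=bus_stop, then by public_transport=platform
node.any[highway=bus_stop]->.h;
node.any[highway!=bus_stop]->.no_h;
node.h[public_transport=platform]->.h_p;
node.h[public_transport!=platform]->.h_only;
node.no_h[public_transport=platform]->.p_only;
node.no_h[public_transport!=platform]->.neither;
// finally pick the bus=yes nodes out of each group
node.h_p[bus=yes]->.h_p_bus;
node.h_only[bus=yes]->.h_bus;
node.p_only[bus=yes]->.p_bus;
make count
  hw_pt_bus=h_p_bus.count(nodes),
  hw_pt=h_p.count(nodes) - h_p_bus.count(nodes),
  hw_bus=h_bus.count(nodes),
  hw=h_only.count(nodes) - h_bus.count(nodes),
  pt_bus=p_bus.count(nodes),
  pt=p_only.count(nodes) - p_bus.count(nodes),
  bus=neither.count(nodes);
out;

If the second assumption from above holds for bus=yes, then hw_bus and bus should both be close to zero. The value bus=no is not covered by this sketch.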

Not Calling for Passengers

Now to the nodes with public_transport=stop_position - it is time for another unchecked assumption: there are no stops where passengers can neither board nor alight (really? And houses always have entrances?). Hence we search for stop positions that have no nearby platforms.

This can be achieved in the following steps: First, being close to something is always reciprocal. Thus, it is always worth considering whether we search for one side or the other first. I opt for searching for platforms here. Second, platforms can be nodes, ways, or relations. Thus we need the usual construction of a union statement and multiple query statements.

From all the stop positions we remove those that are close to a platform, i.e. at most 10 meters away from a platform:

area[name="Milano"]->.a;
( node(area.a)[public_transport=platform];
  way(area.a)[public_transport=platform];
  rel(area.a)[public_transport=platform]; )->.platforms;
( node(area.a)[public_transport=stop_position];
  - node._(around.platforms:10)->.matched; )->.orphans;
make count
  all=count(nodes),
  orphans=orphans.count(nodes);
out;

This number may justify the hypothesis that there are still a lot of unmatched stop positions.
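
To look at the unmatched stop positions themselves instead of only counting them, a variant along the following lines might help. It is a sketch under the assumption that printing the set is sufficient for inspection. The set name stop_positions is introduced here for illustration and does not appear in the query above:

area[name="Milano"]->.a;
( node(area.a)[public_transport=platform];
  way(area.a)[public_transport=platform];
  rel(area.a)[public_transport=platform]; )->.platforms;
// keep all stop positions in a named set so that both halves of the difference read from it
node(area.a)[public_transport=stop_position]->.stop_positions;
( node.stop_positions;
  - node.stop_positions(around.platforms:10); )->.orphans;
// print the orphaned stop positions with their tags and coordinates
.orphans out;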

Somewhere in Europe

It is time to cross-check whether our test cities are representative. For the sake of comfort the two queries can be merged into a single query. To get a concise overview it is best to take a sample of cities across Europe and to make a table out of the results:

[out:csv("name", "all", "orphans", "with_name", "pt", "not_bus", "explicit_bus")];
( area[name="Hamburg"]["admin_level"=4];
  area[name~"^(München|Köln|Milano|Napoli|Birmingham|Manchester|Barcelona|Antwerpen)$"]["admin_level"=6];
  area[name~"^(Lille|Lyon|Marseille)$"]["admin_level"=7];
  area[name="Rotterdam"]["admin_level"=8];
)->.areas;

foreach.areas->.a(
  ( node(area.a)[public_transport=platform];
    way(area.a)[public_transport=platform];
    rel(area.a)[public_transport=platform]; )->.platforms;
  ( node(area.a)[public_transport=stop_position];
    - node._(around.platforms:10); )->.orphans;
  ( node.platforms;
    node(area.a)[highway=bus_stop]; )->.stops;
  node.stops[name]->.with_name;
  node.stops[public_transport=platform]->.pt;
  node.pt[bus=no]->.not_bus;
  node.pt[bus=yes]->.explicit_bus;
  make count
    name=a.set(t["name"]),
    all=stops.count(nodes),
    orphans=orphans.count(nodes),
    with_name=with_name.count(nodes),
    pt=pt.count(nodes),
    not_bus=not_bus.count(nodes),
    explicit_bus=explicit_bus.count(nodes);
  out;
);

On 24 Apr 2017 I got the following results:

name	all	orphans	with_name	pt	not_bus	explicit_bus
Manchester	1	0	0	0	0	0
Birmingham	3991	18	3933	142	0	9
Barcelona	3397	766	2685	574	0	360
Marseille	2330	71	2297	712	1	7
Napoli	1710	344	1235	70	0	65
Lyon	3748	339	3712	3570	0	3522
Milano	4529	535	4007	841	0	687
München	2215	71	2206	128	0	122
Lille	2873	149	2842	2711	0	450
Antwerpen	7554	111	7519	7521	0	7300
Rotterdam	737	481	733	137	0	94
Köln	1659	307	1648	528	0	135
Hamburg	3766	410	3717	1150	1	1110

Hence, we can state that

Names are indeed almost universally present. Only some cities in southern Europe have relatively fewer names, and even they are at around 70% or more.
The public_transport key based tagging is not so universally accepted. Some cities have hardly any stops at all tagged with public_transport. Other cities use public_transport but have only a few of those stops marked as bus=yes.
Platform nodes with bus=no exist next to nowhere.
The fraction of orphan stop positions without platforms also varies widely between cities.
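
The table above lists the number of orphans but not the total number of stop positions, so the fraction of orphans cannot be read from it directly. A possible extension of the query, as a sketch only: the column names stop_positions and orphans_pct and the set name stop_positions are introduced here for illustration. What the division yields for an area without any stop positions has not been checked:

[out:csv("name", "stop_positions", "orphans", "orphans_pct")];
( area[name="Hamburg"]["admin_level"=4];
  area[name~"^(München|Köln|Milano|Napoli|Birmingham|Manchester|Barcelona|Antwerpen)$"]["admin_level"=6];
  area[name~"^(Lille|Lyon|Marseille)$"]["admin_level"=7];
  area[name="Rotterdam"]["admin_level"=8];
)->.areas;

foreach.areas->.a(
  ( node(area.a)[public_transport=platform];
    way(area.a)[public_transport=platform];
    rel(area.a)[public_transport=platform]; )->.platforms;
  // all stop positions of the current area, used as the denominator below
  node(area.a)[public_transport=stop_position]->.stop_positions;
  ( node.stop_positions;
    - node.stop_positions(around.platforms:10); )->.orphans;
  make count
    name=a.set(t["name"]),
    stop_positions=stop_positions.count(nodes),
    orphans=orphans.count(nodes),
    orphans_pct=orphans.count(nodes)/stop_positions.count(nodes)*100;
  out;
);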