Overpass API > Blog >
Published: 2020-11-18
For a software developer, it is always good to learn how the users actually use the software. From that point of view, the analysis by StreetComplete has been very instructive.
The entry in the WeeklyOSM even sounds like there were an opportunity to get the OVerpass API overall faster with little effort.
It has shown that there are some misunderstandings still around, even with very experienced users. Thus I would like to address them in the following.
The whole point of server side preprocessing is that you want to get less data, not more.
The given scenario is to filter all OSM data within a hundred meter or so for literally a hundred different criteria. Effectively, many features are potentially downloaded over and over again, thus in the end it is as much data transmitted as would have been anyway for a complete download, just with much more computation server side.
In fact, this approach is inferior already much earlier: - There is substantial connection overhead: round trip times, handshaking, headers in both HTTP and the JSON response, connection management by the server and so on. - In this case much worse, the server is reading the same data again and again from the disk. The total processing time makes it unlikely that the whole needed data resides in the cache. - Data might get inconsistent to each other, because the requests might be routed towards different servers or an update of the data has been applied. - The broken-down requests deprive the server of optimization opportunities, because the server cannot know which requests will follow.
So what to do else?
If your filter criteria line up more or less well with the Overpass API features then build one large request or at least few large requests. With a statement pair like `make marker_42; out;` you can have as many markers to group the response as you want.
If you anyway have your own filters on data within a few hundred meters then it makes sense to download the full data and filter client-side. Whether you are better off with the Overpass API or the main database API depends largely on whether the ultimate purpose is editing - the main API is reserved for editing purposes only, to avoid or mitigate overload problems.
## Flat Worlds, Ball Shaped WorldsThere is [https://github.com/drolbr/Overpass-API/pulls](amongst the open pull requests) no obvious candidate for a huge performance improvement. After reading all of the comments from [https://github.com/westnordost/StreetComplete/issues/1514](the issue in StreetComplete) that is about the performance analysis, it likely refers to [https://github.com/drolbr/Overpass-API/pull/167](this) quite old pull request. This is not a general improvement, but a heuristic for pre-filtering for only the `around` filter.
The problem with that optimization is that removes valid results, i.e. the computation is invalid.
Let's have a look at 60°N. It is convenient because cosine is 0.5 for 60°, i.e. the latitude at 60° has a total length that is only half of the equator. In the bounding box model, one should expect that the eastmost point from 60°N 0°E that is no further away than 11.111111 km is 60°N 0.2°E. In fact, rounded for OSM precision, it is 60.00015°N 0.2000003°E. In other words: a bounding box needs to extend up to 0.2000003°E instead of just 0.2°E. While this is already significant with OSM precision, the effect gets quickly stronger both with approaching the poles and with growing distances.
There are probably solutions for this: The best would be to like a blunt plus one percent for up to 60 degrees latitude from equator and turning off the feature beyond 60 degrees. A more sophisticated approach would rely on the internal block structure of the quadtiles. The whole thing must take segments into account as well.
So there is simply too much work left for an, all in all, rather corner case.
## OptimizationsPlease not that many suggested optimizations are indeed rather tradeoffs. I.e. if I favour a certain special case then many other cases will be executed slower. I avoid those changes whenver possible.
Are there other general optimizations in the pipeline? Any answer is pretty close to vaporware, thus I'm reluctant to tell plans.
Having ids on each and every node makes for a giant data bloat. Depending on the exact data model after the transition, the data would shrink by 50% to 80% without any data loss. Joto and I [...](have presented) this observation on the SotM in Milano.
This is expected to bring a speedup of factor two to five for all operations that involve geomtry for ways. The effort comes from the bookeeping in all edge cases of changesets, no matter how obscure. I'm working on this in the branch [...](geometric ways), together with hardening against updating trouble from occasional bugs in the replication of the main database.
Our data model around relations has a separate flaw that makes working with the hsitory basically impossible. I have not made any final decision on the data model afterwards. The aim is to fulfill requests on attic data for relations in a speed similar to the same request on current data.
Both together are already a program for up to mid-2021 at best.
I'm back with providing updates. After more than a year I'm proud to present a new release. This release provides some smaller new features.
This is not exactly in line with the strategy of: release early, release often The rationale is that all the substantial planned changes will change the underlying database model. This ia a lot of work and bandwith load for all people running their our instances. Thus I hope I can bundle those changes in one large later version.
A first step are the versions 0.7.56.100x (as opposed to the 0.7.56.x line presented here). I do not recommend them yet for private installations because there are no clone files for it. They feature more flexible block sizes. That way they can accomodate the area description for Antartica which had been too large due to a projection artifact. At the same time they are faster because on average less data needs to be decompressed per request. The code currently runs on the z.overpass-api.de instance.
Back to this version, there are three new features: You can measure and work with the angles of ways now. You can recurse on only selected nodes of a way. And some extra shortcuts make it easier to get some but not all types of objects.
It is now possible to measure angles in ways. In the example we query for all ways that have a very acute angle:
way({{bbox}})
(if:lrs_in(1,per_vertex(angle() < -170 || angle() > 170)));
out geom;
Why does this look so involved? First of all, we search for all ways in a given bouding box, contained in the Overpass Turbo link. The angle related part happens in the filter (if:...). This filter is evaluated once for every object, i.e. for each way here, but angles are different per vertex of the way.
For this reason, the angle related condition is stated within per_vertex(...). That evaluator evalutes its expression once for every vertex of the way and returns the results as a semicolon separated list. For illustrative purposes, an example for an (almost) retangular building and for an almost straight street:
way(id:266149365,194926851);
convert info desc=is_tag("highway")?"highway":"building",
angles=per_vertex(angle());
out geom;
You can see the built list of angles in tag angle of each of the two created objects. It is almost zero for all angles for the street, and it mixes almost zero with almost 90 degrees for the building.
A remark to values: I decided to use turning angle as the mental point of view. I.e. a straight road is understood as having an angle close to zero at the respective vertex, a sharply turning road has a higher angle, and a U-turn is having an angle close to 180 degrees. Consequently, acute angles in buildings are also close to 180 or -180 degrees respectively.
The approach is to simplify these numbers to boolean zero or one already in the expression of per_vertex. We put a boolean expression like angle() < -2 in the argument of per_vertex (refined to catch both negative and positive large angles). This way we get zero for almost zero angles and one for all other angles:
way(id:266149365,194926851);
convert info desc=is_tag("highway")?"highway":"building",
angles=per_vertex(angle() < -2 || angle() > 2);
out geom;
We can now use the here documented evaluator lrs_in(1, ...) to check for each way whether there is at least one vertex in a way that fulfills the spike condition. This way, we keep only those ways that have a very acute angle.
For your convenience, a complete example to query for non-rectangular buildings:
way({{bbox}})[building]
(if:lrs_in(1,per_vertex(
angle() < -92 || (angle() > -88 && angle() < -2)
|| (angle() > 2 && angle() < 88) || angle() > 92)));
out geom;
In this case, it could be interesting to figure out where the non-rectangular vertices are. We do this for a single way, because where houses are built wall-to-wall, it is probable that one looks on the wrong building otherwise:
way(60538085); out geom;
Now we select all nodes for which the way has a non-rectangular and non-zero angle in that node:
way(60538085);
node(w)(if:lrs_in(id(),set(per_vertex(
(angle()<-92 || (angle() >-88 && angle() < -2)
|| (angle() > 2 && angle() < 88) || angle() > 92)
? ref() : 0))));
out geom;
There are more evaluators that can be applied per vertex of a way: With the evalutators for pos and ref of an element, one can get out of a recursion only elements on specific positions. We can ask for only the first and the last member of the way just mentioned before:
way(60538085);
node(w)(if:lrs_in(id(),set(
per_member(lrs_in(pos(),"1;"+(count_members()-1))?ref():0))));
out geom;
We use the filter to accept only the ids of those objects that are a member of the way on the desired positions. They are taken from a list constructed for this purpose with per_member.
Why do we not use per_vertex again? The evaluator per_vertex is designed for angles: It skips the first position always and the last position for open ways, because there is no angle defined there. By contrast, per_member is always executed once for every member.
Inside per_member, a list evaulator is used to select the pos from the expected positions 1 or count_members()-1. If it matches, then the id is returned, and that adds the id to the list of accepted ids. Otherwise zero is returned, and this is never a valid id.
For your convenience, there is a shortcut integrated in the recurse filter, as shown in this example. Negative values are counted from the tail of the way, i.e. -2 is the last but one node of the way:
way(60538085); node(w:1,-2); out geom;
The shortcut is not available for relations yet, because I have not figured out so far a compelling syntax for that.
The introduction of the nwr shortcut has improved the readability of queries. Thus the language goes further down that road: there are now shortcuts nw, nr, and wr available. Please feel free to experiment in examples like this: Exchange one shortcut for another and watch how the result changes!
nwr({{bbox}})[amenity=place_of_worship];
out center;
To make the language more consistent, the same set of shortcuts can be used as argument for the count evaluator:
[out:csv(cnt,amenity)];
nwr({{bbox}})[amenity];
for (t["amenity"])
{
make stat cnt=count(nwr),amenity=_.val;
out;
}