Work on the examples has made some progress, and so has the conversion of enhancement suggestions into new features.
There is quite a large discussion about a new inconvenience in another important OSM tool, HDYC. As the problem of personal data, or potentially personal data, also affects Overpass API, I would like to dedicate this week's blog post to the issue.
As we will not get any further by just piling up opinions and personal tastes, I would like to take a step back to see the bigger picture: What is OpenStreetMap?
Our standard answers are: It is a comprehensive and free database of general-purpose geodata. And, even more, it is the community of people concerned with growing and maintaining it.
So why should we have an open database? Besides the convenience, there is a more fundamental aspect: trust.
Or, to use a historical bon mot: Nobody gets fired for buying IBM. The phrase was coined by the then-incumbent IBM. It means that everybody else, in particular your boss, will have infinitely more trust in the incumbent than in any competitor. Following this thought further explains why software markets tend towards monopolies or oligopolies.
This is where Open Source kicks in: It remains available even if your supplier goes bankrupt, changes its mind, or whatever else happens. You can check that it does what it claims to do, and only that. And more. Hence you can trust it.
There are many other areas where design decisions have been governed by the need to deserve the users' trust. A whole discipline of art and science is about storing documents such that you can later prove they have not been altered.
This is an important aspect of OpenStreetMap as well. We keep the history and the user data so that we can prove that all the data has been gathered by individuals and not by large-scale copyright infringement.
Note that this is not only a proof that something has been entered in the database. It is even more a proof that something that never appeared in the database has never been entered. By the way, we lose that proof if we need to tamper with history. It is a shame for our society if copyright law requires us to do so.
Orwell explained to us where this slippery slope leads. The ability of Minitrue, the Ministry of Truth, to rewrite history is crucial for that dictatorship to keep control of the people. So do not forget that the ability to trust our records is one crucial thing that protects us. It is again about trust.
The other key factor of that dictatorship is the exploitation of human fear. That dictatorship is able to fabricate any accusation it wants because it knows enough bits about everyone. That brings us back to the issue of privacy. There is no point in listing and denouncing scenarios. It is more about the unknown unknowns than about the known unknowns.
It is very similar to discrimination. With discrimination, you do not know whether you are facing bad luck or intentionally bad treatment. Or terrorism: No terror group would ever be capable of killing or injuring a significant portion of all citizens. But nonetheless, any of us may fear falling victim to a terror attack because we do not know when and where it is dangerous.
So yes, privacy is again all about trust. Like OSM.
This gives us a different view: A good privacy design should be straightforward once we know how to maximize the total trust in OpenStreetMap.
As we are always short of mappers, their trust is key to the project's success. Mappers who have a long history of editing build trust in the sustainability of the project. Mappers with diverse backgrounds more easily build trust with their fellows. Hence, both must be kept in mind.
I would like to present three approaches that have been discussed in the German community and check them against the trust criterion.
One idea presented was to somehow make timestamps less precise after a long time. The hope was to retain the ability to fight vandalism while enabling long-term privacy for mappers.
From a trust point of view, it is a non-solution: The trust in the integrity of our data is lost, because you can no longer properly reconstruct the database's history. And no trust in privacy is earned, because any misuse of the data could reasonably have taken place already, before the timestamps were blurred.
The details of a technical implementation are even worse. Hence we had better abandon that idea.
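To make the integrity argument concrete, here is a minimal sketch of what blurring does to the order of edits. It assumes, purely for illustration, that timestamps would be reduced to month precision; the function `coarsen_to_month` and the two sample edits are hypothetical and not part of any proposal or of the OSM software.

```python
from datetime import datetime, timezone


def coarsen_to_month(ts: datetime) -> datetime:
    """Drop everything below month precision (an assumed blurring rule)."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


# Two made-up edits of different objects, three days apart.
edit_a = {"changeset": 1001,
          "timestamp": datetime(2018, 5, 10, 14, 3, 22, tzinfo=timezone.utc)}
edit_b = {"changeset": 1002,
          "timestamp": datetime(2018, 5, 13, 9, 41, 5, tzinfo=timezone.utc)}

# With full precision the order of the two edits is unambiguous.
order_known_before = edit_a["timestamp"] < edit_b["timestamp"]

# After blurring, both edits carry the same timestamp, so the state of the
# database on, say, 11 May can no longer be reconstructed from timestamps.
blurred_a = coarsen_to_month(edit_a["timestamp"])
blurred_b = coarsen_to_month(edit_b["timestamp"])
order_lost_after = blurred_a == blurred_b

print(order_known_before, order_lost_after)
```

Any coarser or finer granularity has the same effect on edits that fall into the same bucket; only the size of the bucket changes.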
Another idea, mentioned by Imagico, has been to allow for sub-accounts. This means that either automatically or by the user's choice you could get a new user id. In database terms, any single user could have many different roles. The OSMF would still have the association between the editing roles. But you could have a full planet file with role data without being able to trace edits back to users unless they explicitly want you to.
In principle, this would be possible right now if you create enough email addresses to have many accounts. And it is even encouraged by the community to separate distinct duties like a revert account from a business account or a personal account. We would essentially change that practice from tolerated to encouraged.
This does not harm trust: The integrity of the database is preserved. It may slightly degrade if many users tried to get a new role per changeset. It may even improve if users sort their edits by intent or context.
To put that in context, just have a look at these two changesets: They would be typical candidates to go to a separate sub-account. Either they are valuable on their own, regardless of which OSM-aware visitor has made them. Or they should be dismissed based on their content, not the user's reputation. On the other hand, being bound to my account as they are at the moment, they leak the information that I have recently been to Liège.
The approach earns more trust from the mappers because it gives them an easy-to-control tool to minimize their privacy footprint. And it does not interfere with reputational trust because mappers can keep an unlimited collection of their edits together if they want to.
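The sub-account idea can be sketched as a small data model. This is only an illustration under stated assumptions: the class `RoleRegistry`, the function `public_changeset` and all ids are hypothetical, and nothing here reflects how the OSM database is actually organized. The point is that the association between roles and the user stays with the operator, while published data carries only role ids.

```python
class RoleRegistry:
    """Private part, kept by the operator: which user owns which role."""

    def __init__(self):
        self._owner_of_role = {}
        self._next_role_id = 1

    def new_role(self, user_id: int) -> int:
        """Hand out a fresh role id for the given user."""
        role_id = self._next_role_id
        self._next_role_id += 1
        self._owner_of_role[role_id] = user_id
        return role_id

    def owner(self, role_id: int) -> int:
        """Resolve a role to its user; only the operator can do this."""
        return self._owner_of_role[role_id]


def public_changeset(changeset_id: int, role_id: int) -> dict:
    """Public part: what a planet file with role data would expose."""
    return {"changeset": changeset_id, "role": role_id}


registry = RoleRegistry()
everyday_role = registry.new_role(user_id=4711)  # made-up user id
travel_role = registry.new_role(user_id=4711)    # same user, second role

published = [
    public_changeset(1001, everyday_role),
    public_changeset(1002, travel_role),
]

# The published rows show two unrelated role ids ...
print(published)
# ... while the operator can still link both roles to one user.
print(registry.owner(everyday_role) == registry.owner(travel_role))
```

A mapper who wants to build reputation simply keeps using one role; a mapper who wants to separate contexts asks for another one.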
The most discussed approach was to make privacy-related data accessible only after an OSM login. The idea is to set up a contractual obligation to use this data only to maintain the OSM data and to prove its integrity. The login then serves as a clear reminder that you have privacy-related data in your hands.
To clarify: Changeset ids, timestamps, version numbers and so on are quite clearly not privacy-related. Only user ids and user names would be affected by this restriction.
This again increases the mappers' trust: They are at least protected by legislation. There is still plenty of room for unknown privacy violations, but the legal protection comes at no cost. It does not interfere with reputational trust because the data is fully available within the project. Also, it does not interfere with the database integrity for the same reason.
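As a rough sketch of the restriction described above: the user id and user name are dropped for anonymous requests, and everything else is served unchanged. The function `redact`, the `logged_in` flag and the sample node are hypothetical; the keys `uid` and `user` are assumed to be named as in the usual OSM element metadata.

```python
# Fields treated as privacy-related in this sketch: user id and user name.
PRIVACY_RELATED = ("uid", "user")


def redact(element: dict, logged_in: bool) -> dict:
    """Return a copy of the element, without user fields for anonymous access."""
    if logged_in:
        return dict(element)
    return {key: value for key, value in element.items()
            if key not in PRIVACY_RELATED}


# A made-up node with the usual metadata.
node = {
    "type": "node",
    "id": 1,
    "version": 3,
    "changeset": 1001,
    "timestamp": "2018-05-10T14:03:22Z",
    "uid": 4711,
    "user": "example_mapper",
}

anonymous_view = redact(node, logged_in=False)
member_view = redact(node, logged_in=True)

print(sorted(anonymous_view))  # changeset, id, timestamp, type, version
print(member_view == node)     # the full record stays available after login
```

Changeset id, timestamp and version survive in both views, so the history can still be followed without a login; only the link to a person requires one.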
As sub-accounts and contractual protection both produce a gain in trust, we should pursue both ideas further. Even better, they can be combined.