Last year Transport for London (TfL) launched a trial to gather “de-personalised WiFi connection data collected at 54 London Underground stations within Zones 1-4 to help improve the services it offers customers.”
But TechCrunch reports TfL has now turned down an FOI request asking for it to release the “full dataset of anonymized data for the London Underground Wifi Tracking Trial” — arguing that it can’t release the data as there is a risk of individuals being re-identified (and disclosing personal data would be a breach of UK data protection law).
“Although the MAC address data has been pseudonymised, personal data as defined under the [UK] Data Protection Act 1998 is data which relate to a living individual who can be identified from the data, or from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller,” TfL writes in the FOI response in which it refuses to release the dataset.
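To see why pseudonymised data can still count as personal data, consider a minimal sketch. It assumes pseudonymisation by a keyed hash (HMAC-SHA256) of the MAC address; the source does not say which method TfL actually used, and the key, the function name `pseudonymise_mac`, and the sample observations are all hypothetical. Under that assumption, the same device always maps to the same pseudonym, so its movements remain linkable into a single trace.

```python
import hashlib
import hmac

# Hypothetical secret key; the real pseudonymisation scheme is not described in the source.
SECRET_KEY = b"example-key-not-real"

def pseudonymise_mac(mac: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for a MAC address using HMAC-SHA256 (assumed scheme)."""
    normalised = mac.strip().lower().replace("-", ":")
    digest = hmac.new(key, normalised.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability

# Hypothetical observations: (MAC address, station, hour of day)
observations = [
    ("AA:BB:CC:DD:EE:01", "Station A", 8),
    ("AA:BB:CC:DD:EE:02", "Station A", 8),
    ("AA:BB:CC:DD:EE:01", "Station B", 9),
]

pseudonymised = [
    (pseudonymise_mac(mac), station, hour) for mac, station, hour in observations
]

# Group records by pseudonym: each device's journey is still reconstructable.
traces = {}
for pseudonym, station, hour in pseudonymised:
    traces.setdefault(pseudonym, []).append((station, hour))

for pseudonym, trace in traces.items():
    print(pseudonym, trace)
```

The raw MAC address never appears in the output, yet each pseudonym still carries a full movement trace, which is the kind of record that outside information could tie back to a person.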
A simply anonymized dataset does not contain names, home addresses, phone numbers or other obvious identifiers. Yet if an individual’s patterns are unique enough, outside information can be used to link the data back to that individual. See below:
(A) Trace of an anonymized mobile phone user during a day. The dots represent the times and locations where the user made or received a call. Every time the user has such an interaction, the closest antenna that routes the call is recorded.
(B) The same user’s trace as recorded in a mobility database. The Voronoi lattice, represented by the grey lines, is an approximation of the antennas’ reception areas, the most precise location information available to us. The user’s interaction times are here recorded with a precision of one hour.
(C) The same individual’s trace when we lower the resolution of our dataset through spatial and temporal aggregation. Antennas are aggregated in clusters of size two, and their associated regions are merged. The user’s interaction is recorded with a precision of two hours. Such spatial and temporal aggregation renders the 8:32 am and 9:15 am interactions indistinguishable.
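The aggregation described in (B) and (C) can be illustrated with a rough sketch. The antenna IDs, the cluster assignment, the date, and the helper `coarsen` are all hypothetical; a real dataset would merge antennas based on their geographic neighbourhood rather than a hand-written mapping.

```python
from datetime import datetime

def coarsen(record, hours_per_bin, antenna_to_region):
    """Reduce a (antenna_id, timestamp) record to (region, date, start hour of time bin)."""
    antenna_id, timestamp = record
    time_bin = (timestamp.hour // hours_per_bin) * hours_per_bin
    return (antenna_to_region[antenna_id], timestamp.date(), time_bin)

# Hypothetical interactions of one user: (antenna ID, time of call)
records = [
    (1, datetime(2013, 1, 1, 8, 32)),
    (2, datetime(2013, 1, 1, 9, 15)),
    (3, datetime(2013, 1, 1, 12, 40)),
]

# Resolution as in (B): each antenna is its own region, one-hour precision.
single_antennas = {1: "a1", 2: "a2", 3: "a3", 4: "a4"}
fine = [coarsen(r, 1, single_antennas) for r in records]

# Resolution as in (C): antennas merged in clusters of two, two-hour precision.
clusters_of_two = {1: "c1", 2: "c1", 3: "c2", 4: "c2"}
coarse = [coarsen(r, 2, clusters_of_two) for r in records]

print(len(set(fine)))    # distinct points at fine resolution
print(len(set(coarse)))  # distinct points after aggregation
```

In this toy data the 8:32 am and 9:15 am interactions fall into the same merged region and the same two-hour bin, so they collapse into a single point, while the 12:40 interaction stays distinct. Coarsening reduces the number of distinguishing points in a trace, though it does not by itself guarantee that the remaining points are shared with other users.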