Unique in the shopping mall: On the reidentifiability of credit card metadata


From the abstract of

Unique in the shopping mall: On the reidentifiability of credit card metadata

Large-scale data sets of human behavior have the potential to fundamentally transform the way we fight diseases, design cities, or perform research. Metadata, however, contain sensitive information. Understanding the privacy of these data sets is key to their broad use and, ultimately, their impact. We study 3 months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely reidentify 90% of individuals. We show that knowing the price of a transaction increases the risk of reidentification by 22%, on average. Finally, we show that even data sets that provide coarse information at any or all of the dimensions provide little anonymity and that women are more reidentifiable than men in credit card metadata.

I heard the NPR segment on this — and they made it sound so much more ominous and worrisome than does the abstract, which shows, by the nature of the data itself, how structured and specific it is.  Not saying this isn’t dismaying – but as a would-be data science type, I can think of so many more interesting datasets to struggle with in which anonymized data would be useful, and presumably less easily reverse-engineered.