Unique in the shopping mall: On the reidentifiability of credit card metadata
Large-scale data sets of human behavior have the potential to fundamentally transform the way we fight diseases, design cities, or perform research. Metadata, however, contain sensitive information. Understanding the privacy of these data sets is key to their broad use and, ultimately, their impact. We study 3 months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely reidentify 90% of individuals. We show that knowing the price of a transaction increases the risk of reidentification by 22%, on average. Finally, we show that even data sets that provide coarse information at any or all of the dimensions provide little anonymity and that women are more reidentifiable than men in credit card metadata.
I heard the NPR segment on this — and they made it sound so much more ominous and worrisome than does the abstract, which shows, by the nature of the data itself, how structured and specific it is. Not saying this isn’t dismaying – but as a would-be data science type, I can think of so many more interesting datasets to struggle with in which anonymized data would be useful, and presumably less easily reverse-engineered.
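For the would-be data science types, here is a rough sketch of what "uniquely reidentify" seems to mean in the abstract: take a few known points from one person's trace and count how many people in the data set match all of them. Everything below is my own assumption (the function names, the toy traces, the shop-and-day pairs), and it takes the first p points in sorted order instead of sampling them at random, so read it as an illustration of the idea and not as the paper's procedure.

```python
def is_unique(user, known_points, traces):
    """True if `user` is the only one whose trace contains every known point."""
    matches = [u for u, trace in traces.items() if known_points <= trace]
    return matches == [user]


def unicity(traces, p):
    """Share of users pinned down by p of their own points.

    Assumption for reproducibility: use the first p points in sorted
    order rather than drawing them at random.
    """
    eligible = [u for u, t in traces.items() if len(t) >= p]
    hits = 0
    for user in eligible:
        known = set(sorted(traces[user])[:p])
        if is_unique(user, known, traces):
            hits += 1
    return hits / len(eligible) if eligible else 0.0


# Toy traces (made up): each point is a hypothetical (shop, day) pair.
traces = {
    "ann": {("bakery", 1), ("cafe", 2), ("gym", 3)},
    "bob": {("bakery", 1), ("cafe", 2), ("bar", 4)},
    "cat": {("bakery", 1), ("zoo", 5), ("gym", 3)},
}

for p in (1, 2, 3):
    print(p, unicity(traces, p))
```

Even in this three-person toy, one shared bakery visit identifies nobody, while three points identify everybody. That is the shape of the result the abstract reports at scale.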
So, I had been regaling you with the story of one person, me:
Darren Scott Kowitt, a new yorker by birth but a washingtonian by circumstance, got to new york first by way of new haven (isn’t that coy!) and in new york I studied marketing at Columbia Business School before coming to DC for love.
Now Salesforce.com is a robust platform. It likely could accommodate stories like that without too much difficulty in the hands of the appropriate database & platform administrator. but all that broadband personhood i suggestively sketched above, well, it has to be shoe-horned into an unmovable and unforgiving fact about Salesforce.com.
it’s staring you in the face with the name: it’s got sales in its DNA. and sales means nothing without [ACCOUNTS]. Where there are [ACCOUNTS] there may be people. so in terms of the Salesforce.com instance for Columbia Business School Alumni of MetroDC, this is how it plays out concretely:
I, Darren Kowitt, who graduated from Columbia in 1997, am represented in the database when first loaded/created as:
[ACCOUNT]=MBA’97: Kowitt, Darren
[ACCOUNT].[CONTACT] = Mr. Darren S. Kowitt, residing at…born on…
with as much detail as External Relations in New York cared to provide me
note how the [ACCOUNT] record is functioning/quacking like an Alum Household, to which a subsidiary partner or spouse might be easily attached and, provided the coding schema is rigorously applied, attached in such a way as to be acknowledged and included where appropriate, but not gratuitously or carelessly. this is not an accident. It’s important to remember that coding is a choice of how to represent reality (in all its potential complexity) in the database.
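To make the household idea concrete, here is a minimal sketch in plain Python of an account that holds one or more contacts. It is not Salesforce's actual schema or API: the class names, the `role` values, the `invitees` helper and the made-up partner record are all assumptions of mine, there only to show how a coding choice decides who gets included and when.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contact:
    """A person hanging off an account (hypothetical fields, not a real schema)."""
    salutation: str
    first_name: str
    last_name: str
    middle_initial: str = ""
    role: str = "Alum"  # assumed coding values: "Alum", "Spouse", "Partner"

    def display_name(self) -> str:
        middle = f" {self.middle_initial}." if self.middle_initial else ""
        return f"{self.salutation} {self.first_name}{middle} {self.last_name}"


@dataclass
class Account:
    """The household-like container that contacts are attached to."""
    name: str
    contacts: List[Contact] = field(default_factory=list)

    def invitees(self, include_household: bool = False) -> List[str]:
        """Alums always; spouses and partners only when explicitly asked for."""
        return [
            c.display_name()
            for c in self.contacts
            if c.role == "Alum" or include_household
        ]


def alum_account_name(degree: str, grad_year: int, last: str, first: str) -> str:
    """Build the DEGREE'YY: LName, FName label used for alum accounts."""
    return f"{degree}'{grad_year % 100:02d}: {last}, {first}"


household = Account(alum_account_name("MBA", 1997, "Kowitt", "Darren"))
household.contacts.append(Contact("Mr.", "Darren", "Kowitt", middle_initial="S"))
# A made-up partner record, purely to show the household behaviour.
household.contacts.append(Contact("Ms.", "Pat", "Example", role="Partner"))

print(household.name)
print(household.invitees())
print(household.invitees(include_household=True))
```

The default of `include_household=False` is the design choice that matters here: the partner is on file and can be acknowledged, but is only pulled into a mailing when someone deliberately asks for it.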
running an ivy league alumni club in a major metro is stimulating and, at its best, fun. but it takes a lot of partnering. the people i deal with (wheedling, cajoling, begging, borrowing; oh, and let’s not forget: drinking with, conversing expansively with, but also receiving upon their arrival at Union Station), these personages, some august, others less so: well, there’s no easy way to categorize them all. so here’s an attempt at conveying to you the breadth of their diversity:
Alumni (they’re easy: that MBA’YY: LName, FName sees to that)
Spouses do show up, but not as often as you might think
Professors visit from new york
Admissions stages its dog & pony show each fall
Applicants paw at us, rending our garments in their over-eager enthusiasm to simply put their nose up against the iron fence. But I digress.
the Columbia Business School Alumni Board of course has much traffic with the board of the Columbia University Club of Washington, DC (and in truth, my alums are truly, fully Columbian in both senses)
but we don’t always keep in the family: sometimes we even collaborate on programming with those people from Cambridge, Philadelphia, Chicago, and Palo Alto (the dirty secret of competitive MBA populations is: in the long run, excellence converges)
i’d be remiss in not mentioning congress members, their staff – and the economics/policy beat journalists we run with
and then there is a wonderful organization called CompassDC whose purpose is to harness all that pan-ivy-league intellectual firepower towards helping regional non-profits change for the better and indeed even thrive. every fall they recruit pro bono volunteers for a 7-month collaborative consulting project.
Some of the relationships are essentially/practically permanent: for better or worse, I shall remain myself until I die. thus MBA’97: Kowitt, Darren is quite strongly tied to Mr. Darren S. Kowitt. not all relationships are quite so durable, however. and this is not merely truth spoken from a broken heart. an Alum Club Board has terms of office (or at least, in theory, good governance principles suggest that it should; even where the limit goes unspoken, a board seat still amounts to quite the opposite of permanence), and those Compass projects are by their nature fixed in term.
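One way to picture the difference between durable and fixed-term ties is to give every relationship a start date and an optional end date. The sketch below is plain Python with hypothetical names throughout (`Relationship`, `active_kinds`, "Alum A"), and every date in it is invented for illustration; it is not how any particular CRM stores this.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Relationship:
    """A tie between a contact and something else, possibly time-limited."""
    contact: str
    related_to: str
    kind: str
    start: date
    end: Optional[date] = None  # None means no planned end

    def is_active(self, on: date) -> bool:
        return self.start <= on and (self.end is None or on <= self.end)


def active_kinds(ties: List[Relationship], on: date) -> List[str]:
    """Kinds of relationship still in force on a given day."""
    return [r.kind for r in ties if r.is_active(on)]


# All names and dates below are made up for illustration.
ties = [
    Relationship("Alum A", "Alum household A", "alum of record", date(1997, 1, 1)),
    Relationship("Alum A", "Club board", "board term",
                 date(2014, 7, 1), date(2016, 6, 30)),
    Relationship("Alum A", "Consulting project", "pro bono volunteer",
                 date(2015, 10, 1), date(2016, 4, 30)),  # a 7-month engagement
]

print(active_kinds(ties, date(2016, 1, 15)))
print(active_kinds(ties, date(2016, 5, 15)))
print(active_kinds(ties, date(2017, 1, 1)))
```

Leaving `end` empty for the permanent ties means nobody has to invent a fake far-future date, and the fixed-term ones simply drop out of any query run after they lapse.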
So the data I want to capture in my CRM system is, to be polite, heterogeneous. And thus every time I set about creating a new Alum in the database, am I faced with this choice
–somewhat paralyzing to the uninitiated, I’m afraid. and i’ll leave you to mull over that for now.