Res Ipsa Loquitur
From the abstract of
Large-scale data sets of human behavior have the potential to fundamentally transform the way we fight diseases, design cities, or perform research. Metadata, however, contain sensitive information. Understanding the privacy of these data sets is key to their broad use and, ultimately, their impact. We study 3 months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely reidentify 90% of individuals. We show that knowing the price of a transaction increases the risk of reidentification by 22%, on average. Finally, we show that even data sets that provide coarse information at any or all of the dimensions provide little anonymity and that women are more reidentifiable than men in credit card metadata.
I heard the NPR segment on this, and they made it sound so much more ominous and worrisome than the abstract does. The abstract shows, by the nature of the data itself, how structured and specific that data is. I’m not saying this isn’t dismaying, but as a would-be data science type I can think of so many more interesting datasets to struggle with, in which anonymized data would be useful and presumably less easily reverse-engineered.
it was with the promise (or was it a threat?) of coming back to discuss matters honestly. To jar your memory, here is the screenshot of the counts of record types:
So, going row by row:
polysemic, as it were. and the logo alone is so soothing to look at
so anyway, the wonderful people (i just hate that word “folks”) over at +mulesoft have released this piece of goodness into the world.
the interface is lovely to look at.
nicest of all, the field select tool enables the user to select fields from first-level relationships among the Salesforce.com tables. so you can get, for example, the RecordTypeName instead of making do with the 15 characters of the RecordTypeID. thus the dataloader (or extractor: in any case, the person doing it, not the tool they’re using) is assured of an export that is human readable for a spot check before committing.
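for anyone curious what that field selection amounts to underneath: a first-level relationship field is written in SOQL with dot notation, such as `RecordType.Name`. here is a minimal sketch in python that only builds the query text. the helper name `build_contact_query` and the particular field list are my own assumptions for illustration, not anything the tool exposes, and i have not checked what query it actually generates.

```python
def build_contact_query(fields, relationship_fields):
    """Build a SOQL SELECT over Contact.

    `fields` are columns on Contact itself; `relationship_fields` are
    first-level parent fields written with dot notation.
    Hypothetical helper for illustration only.
    """
    columns = list(fields) + list(relationship_fields)
    return "SELECT " + ", ".join(columns) + " FROM Contact"


# Ask for the readable record type name rather than only the raw ID.
query = build_contact_query(
    ["Id", "FirstName", "LastName", "Email"],
    ["RecordType.Name"],
)
print(query)
```

the point of the extra column is the spot check: a row that says which record type it belongs to by name is much easier to eyeball than one that only carries the ID.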
i’m getting ready to populate my new mailchimp account list so as to test the Salesforce.com-mailchimp connector. and Dataloader lets me set up a connection to DropBox and even schedule tasks. so ApexDataloader, so long: no love lost.
hello, mailchimp. so long vertical response.
i’ll be doing two exports. the first is of the Contact Email and Name fields: hard to populate MailChimp without those! the second will be the Vertical Response logs, for developing an engagement segmentation histogram so that I can load separate lists into MailChimp: one of those who historically have had a tendency to respond and click through, and one of those whom i spent money with vertical response to reach, but who never opened. the economics, with mailchimp, is better as well: free for lists under 2000.
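a rough sketch of the segmentation i have in mind, in python. everything about the log layout here is an assumption: i am pretending each row of the export has an `email`, an `opened` flag and a `clicked` flag, which may not match the real Vertical Response columns, and the three segment names are just labels i made up.

```python
from collections import defaultdict


def segment_by_engagement(log_rows):
    """Group email addresses into engagement segments.

    Each row is a dict with hypothetical keys 'email', 'opened', 'clicked',
    one row per message sent to that address.
    """
    opened = defaultdict(int)
    clicked = defaultdict(int)
    for row in log_rows:
        email = row["email"].strip().lower()  # treat case variants as one contact
        opened[email] += int(bool(row["opened"]))
        clicked[email] += int(bool(row["clicked"]))

    segments = {"clicked_through": [], "opened_only": [], "never_opened": []}
    for email in sorted(opened):
        if clicked[email] > 0:
            segments["clicked_through"].append(email)
        elif opened[email] > 0:
            segments["opened_only"].append(email)
        else:
            segments["never_opened"].append(email)
    return segments


# Made-up sample rows, not real log data.
sample_log = [
    {"email": "a@example.org", "opened": True, "clicked": True},
    {"email": "A@example.org", "opened": False, "clicked": False},
    {"email": "b@example.org", "opened": True, "clicked": False},
    {"email": "c@example.org", "opened": False, "clicked": False},
]
segments = segment_by_engagement(sample_log)

# The "histogram" is just the size of each segment.
histogram = {name: len(emails) for name, emails in segments.items()}
print(histogram)
```

each segment is a plain list of addresses, so each could be loaded as its own list. the middle bucket (opened but never clicked) is my addition; it could be folded into either of the other two.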
More documentation coming, and soon