Anonymous data isn't real!

Data Services Business Unit Director
Valtech

November 21, 2014

Data cannot be both useful and anonymous at the same time!

Picture this: you are in charge of the personal health data of every citizen of a nation, and you need to ensure that this data is both secure and useful for compiling the statistics that inform policy decisions. Your challenge is to anonymise the data so that those decisions can be made on unbiased information without breaching your citizens’ privacy. Difficult, you may think, but not impossible, right?

Wrong. Anonymous and useful data doesn’t exist. Any data set that claims to have relevance cannot be truly secure, because reverse engineering it in conjunction with a secondary data source can reveal everything an individual would wish to know about that data set: the people behind it and their accompanying personal information.

Take the following simple dataset as an example:

Age     Position               Location   Ethnicity       Employer
41-45   Manager                London     Asian           Valtech
21-25   Executive              Bristol    White British   Valtech
26-30   Technical Consultant   Swansea    Asian           Contractor
31-35   S1 Consultant          London     Black           Valtech

So:

This data isn’t anonymous. While it doesn’t specifically name the individuals to whom it refers, it would take only one stage of reverse engineering to establish who these people are.
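One way to see why the table is not anonymous is to count how many rows share each combination of values. The following is a minimal sketch in Python, not a formal privacy test; the function names `combination_counts` and `singled_out` are invented for this illustration.

```python
from collections import Counter

COLUMNS = ("age", "position", "location", "ethnicity", "employer")

# The four rows of the example table.
ROWS = [
    ("41-45", "Manager", "London", "Asian", "Valtech"),
    ("21-25", "Executive", "Bristol", "White British", "Valtech"),
    ("26-30", "Technical Consultant", "Swansea", "Asian", "Contractor"),
    ("31-35", "S1 Consultant", "London", "Black", "Valtech"),
]


def combination_counts(rows, column_names):
    """Count how many rows share each combination of values in the chosen columns."""
    positions = [COLUMNS.index(name) for name in column_names]
    return Counter(tuple(row[p] for p in positions) for row in rows)


def singled_out(rows, column_names):
    """Return the rows whose combination of values is shared with no other row."""
    positions = [COLUMNS.index(name) for name in column_names]
    counts = combination_counts(rows, column_names)
    return [row for row in rows if counts[tuple(row[p] for p in positions)] == 1]


if __name__ == "__main__":
    for names in (("location",), ("location", "ethnicity"), COLUMNS):
        unique = singled_out(ROWS, names)
        print(names, "->", len(unique), "of", len(ROWS), "rows are unique")
```

On these four rows, location alone already singles out two people (Bristol and Swansea), and location combined with ethnicity singles out all four. Anyone who knows those two facts about a person can pick out that person's row.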

The most common practice, then, is to make the data generic by replacing the specific values with numbers, so that without a key you cannot establish which value is which. It was suggested at a large, recent data conference that this is a simple way to make your data anonymous; unfortunately, it also makes the data useless.
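As a rough illustration of this practice, the sketch below replaces each distinct value in a column with a number and keeps the mapping as a separate key. The helper names `encode_column` and `decode_column` are assumptions made for this example, and a real scheme would assign the codes randomly rather than in order of first appearance.

```python
def encode_column(values):
    """Replace each distinct value with an integer code; return the codes and the key."""
    key = {}
    codes = []
    for value in values:
        if value not in key:
            # Assumption for illustration: codes are assigned in order of first
            # appearance. A real scheme would assign them at random.
            key[value] = len(key) + 1
        codes.append(key[value])
    return codes, key


def decode_column(codes, key):
    """Reverse the encoding. Only possible for someone who holds the key."""
    reverse = {code: value for value, code in key.items()}
    return [reverse[code] for code in codes]


if __name__ == "__main__":
    locations = ["London", "Bristol", "Swansea", "London"]
    codes, key = encode_column(locations)
    print(codes)                      # what an analyst without the key sees
    print(decode_column(codes, key))  # what the key holder can recover
```

Without the key, an analyst sees only `[1, 2, 3, 1]`: two rows share a location, but there is no way to say which location, so no regional statistic can be drawn from it.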

For this data to be at all useful, at least two of these columns need to show the actual data rather than a numeric cover. Once two columns are readable, extrapolating the remaining fields becomes a far simpler process that requires only a cursory look at a secondary data source such as “Acorn” to fill in the blanks.
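The extrapolation step can be sketched as a simple join between the released table and a second source. Everything in the `secondary` list below is invented for illustration (the names, the records, and the layout); it is not Acorn's actual format. The functions `link` and `recover_key` are likewise hypothetical names.

```python
# Released table: position and location left readable, other columns coded.
released = [
    {"position": "Manager", "location": "London", "age": 1, "ethnicity": 1, "employer": 1},
    {"position": "Executive", "location": "Bristol", "age": 2, "ethnicity": 2, "employer": 1},
    {"position": "Technical Consultant", "location": "Swansea", "age": 3, "ethnicity": 1, "employer": 2},
    {"position": "S1 Consultant", "location": "London", "age": 4, "ethnicity": 3, "employer": 1},
]

# Hypothetical secondary source. These records are made up for this example.
secondary = [
    {"name": "Person A", "position": "Manager", "location": "London", "age": "41-45"},
    {"name": "Person B", "position": "Technical Consultant", "location": "Swansea", "age": "26-30"},
]


def link(released_rows, secondary_rows, on):
    """Pair each released row with the secondary record that matches it uniquely."""
    index = {}
    for record in secondary_rows:
        index.setdefault(tuple(record[c] for c in on), []).append(record)
    matches = []
    for row in released_rows:
        candidates = index.get(tuple(row[c] for c in on), [])
        if len(candidates) == 1:  # an unambiguous match re-identifies the row
            matches.append((row, candidates[0]))
    return matches


def recover_key(matches, column):
    """Rebuild part of the numeric key for one coded column from the linked pairs."""
    return {row[column]: person[column] for row, person in matches if column in person}


if __name__ == "__main__":
    matches = link(released, secondary, on=("position", "location"))
    for row, person in matches:
        print(person["name"], "is the", row["position"], "in", row["location"])
    print(recover_key(matches, "age"))
```

In this sketch, two of the four rows are tied to a name, and each linked record also exposes the meaning of a numeric code (age code 1 is "41-45", code 3 is "26-30"). Every recovered code then applies to any other row carrying the same number.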

So much data is now publicly available that truly anonymised and useful data doesn’t exist.
