New research from the University of Cambridge shows that information disclosed on Facebook profiles can be used to accurately predict a range of undisclosed sensitive personal data, including sexual orientation, ethnicity, religious and political views, age, and gender.
The research, based on a study of the Facebook “Likes” of about 58,000 volunteers, found that highly sensitive information could be deduced from those “Likes” with a high degree of accuracy: the study correctly predicted male sexuality in 88% of cases, ethnicity in 95% of cases, and political views in 85% of cases. The report’s authors acknowledged using relatively simple methods to make their predictions, but emphasised that there was great scope for improvement in both the models and the data sets used.
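To illustrate the kind of “relatively simple method” the authors describe, the sketch below trains a basic logistic regression classifier on a toy user-by-“Like” matrix to predict a hypothetical binary attribute. The data, page choices, and attribute are invented for illustration; this is a minimal sketch of the general technique, not the study’s actual pipeline or data.

```python
import math

# Hypothetical user-by-Like matrix: 1 if the user "Liked" a page, 0 otherwise.
# (The real study drew on the Likes of about 58,000 volunteers.)
likes = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
]
# Hypothetical binary attribute (e.g. a political leaning), one label per user.
labels = [1, 1, 0, 0, 1, 0]

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(x, y, epochs=2000, lr=0.5):
    """Plain stochastic-gradient-descent logistic regression."""
    n_feat = len(x[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the score
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """Classify a user's Like vector as attribute present (1) or absent (0)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0

w, b = train_logistic(likes, labels)
preds = [predict(w, b, xi) for xi in likes]
```

On this toy data the classifier recovers the labels exactly; the study’s point is that even such a simple linear model, given enough real Likes, can reach high accuracy on genuinely sensitive attributes.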
This study is significant on a number of grounds. First, it reminds us how easily individuals can inadvertently disclose sensitive information about themselves. Second, it highlights the risk that organisations will draw conclusions and/or make decisions about individuals based on such predictions, some of which could be adverse to the individual concerned. Third, the report gives credence to the ongoing debate about whether data can ever be truly anonymous, especially as the techniques used in the study are likely to be adopted by most organisations seeking to analyse and monetise big data sets, whether for behavioural profiling to support online behavioural advertising or for strategies related to product development.
A key question is whether the study will affect the deliberations on the proposed EU General Data Protection Regulation. Most data protection regimes, including the EU’s, currently impose stricter rules where ‘sensitive data’, such as sexuality or religious beliefs, are involved. The draft Regulation contains clauses on both the profiling of individuals and anonymous data. The current draft would permit individuals to object generally to any profiling of themselves via big data sets, or to profiling carried out for direct marketing purposes, and retains the Directive’s rule on the re-identification of anonymous data. It remains to be seen how the relevant authorities will treat big data sets which do not explicitly include, but can be used to accurately predict, information covered by such specific regimes. It is possible that the ‘big data’ trend will result in more businesses becoming subject to greater regulatory scrutiny and more stringent privacy requirements.