Privacy in the context of AI predominantly focuses on our personal information: who has access to it, how it's used, and, to borrow a phrase from American lawyer Louis Brandeis, our 'right to be let alone'. For a longer read, take a look at Wired's "The Next Big Privacy Hurdle? Teaching AI to Forget".
For a look at how major tech and media platforms approach privacy consent and data handling, The New York Times has an excellent (and worrying) interactive article below:
Personally Identifiable Information, or PII, ranges from the obvious (e.g. name, contact details, address, phone number) to information we may not typically think of as being able to identify us (e.g. browsing history, device location, purchase history). How data collected from an individual may be used depends on the jurisdiction in which you (and/or the company collecting the data) reside. ZDNet has a short read on PII with a US lens below:
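As a toy illustration of handling PII in code, the sketch below masks the obvious identifying fields in a record. The field names and the redaction approach are made up for illustration, not drawn from any of the linked articles:

```python
# A minimal sketch: redact fields commonly treated as direct PII.
# The field names here are hypothetical examples, not a definitive list.
PII_FIELDS = {"name", "email", "phone", "address"}

def mask_pii(record: dict) -> dict:
    """Return a copy of the record with obvious PII fields redacted."""
    return {
        key: "[REDACTED]" if key in PII_FIELDS else value
        for key, value in record.items()
    }

customer = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    # Less obvious PII: purchase history can still identify someone
    # in aggregate, which is why masking alone is rarely enough.
    "purchase_history": ["book", "laptop"],
}
print(mask_pii(customer))
# {'name': '[REDACTED]', 'email': '[REDACTED]', 'purchase_history': ['book', 'laptop']}
```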
Given the sensitive nature of PII and regulations under which it must be protected, people working with data may need to ensure information about an individual is obfuscated in their datasets. A really simple definition of Pseudonymization and Anonymization (and how these data obfuscation methods relate to GDPR) is provided by enterprise data security experts Protegrity:
"Pseudonymization is a method to substitute identifiable data with a reversible, consistent value. Anonymization is the destruction of the identifiable data."
Differential privacy provides a way for aggregate data to be shared without compromising the privacy of the individuals on which that data is based. More formally, differential privacy is a "strong, mathematical definition of privacy in the context of statistical and machine learning analysis", originally introduced by Cynthia Dwork. For further understanding, it may help to watch the following video by the US National Institute of Standards and Technology (NIST):
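As a rough illustration of the idea (not a production mechanism; vetted libraries such as OpenDP exist for real use), the sketch below applies the classic Laplace mechanism to a count query, where adding or removing one person changes the true answer by at most 1:

```python
import numpy as np

def noisy_count(values: list, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy.

    A count query has sensitivity 1 (one person's presence changes
    the result by at most 1), so Laplace noise with scale
    sensitivity / epsilon suffices. Smaller epsilon means more
    noise and stronger privacy.
    """
    sensitivity = 1.0
    true_count = float(np.sum(values))
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# 42 of 100 (hypothetical) survey respondents answered "yes"
responses = [True] * 42 + [False] * 58
print(noisy_count(responses, epsilon=0.5))  # e.g. 44.7: close, but deniable
```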
Protected attributes or classes are typically characteristics of a person that are protected from discrimination under various acts and legislation in each country or jurisdiction. Examples are race, religion, sex, sexual orientation, age, and disability. The IBM Trusted AI research group has a series of tools for identifying and then mitigating bias and discrimination in machine learning models. You can try out a demo of their AI Fairness 360 toolkit via the link below:
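Toolkits like AI Fairness 360 implement dozens of fairness metrics; the sketch below computes just one of them, disparate impact, by hand on made-up loan-approval data to show the kind of check these tools automate:

```python
def disparate_impact(outcomes, groups, unprivileged, privileged) -> float:
    """Ratio of favourable-outcome rates: unprivileged / privileged.

    A ratio near 1.0 suggests parity; a common rule of thumb flags
    values below 0.8 as potential adverse impact.
    """
    def favourable_rate(group):
        selected = [o for o, g in zip(outcomes, groups) if g == group]
        return sum(selected) / len(selected)

    return favourable_rate(unprivileged) / favourable_rate(privileged)

# Hypothetical loan decisions: 1 = approved, 0 = denied
outcomes = [1, 0, 0, 1, 1, 1, 1, 0]
groups   = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(disparate_impact(outcomes, groups, unprivileged="a", privileged="b"))
# ~0.67: group "a" is approved at two-thirds the rate of group "b"
```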
Passive listening, a feature typical of smart assistants, is an area that, while enabling innovative services, provides ample opportunity for misuse and unethical practices. Smart assistants like Amazon's Alexa, Google Home, and similar devices are all collecting swathes of data as they wait for you to utter the right voice command. Global law firm Dentons shares an overview of the Italian data protection authority's recommendations pertaining to privacy and the use of smart assistants:
Opt in/Opt out refers to the method by which an individual agrees to data being collected about them (including their actions and behaviours, particularly in an online setting). Whether opt in (ask before collecting) or opt out (collect unless asked to stop) is legally required depends on the jurisdiction of the user and/or the service they're accessing. GDPR, for instance, is very much in the opt-in camp. Brian Barrett has this to say in an article on Wired:
"It’s a simple problem to explain. An “opt out” paradigm means that data collection happens automatically, and you have to actively seek out ways to stop it. Under “opt in,” you must affirmatively grant a company the right to access that data before it can do so. You’re in control from the start."
Facial recognition, and its application in police and state surveillance, has become a hot topic in recent times. John Oliver's segment below provides a good introduction to the technology and some of the dangers inherent in its use:
De-identification is a term that can be used synonymously with anonymization. The goal is to remove any attributes that could identify an individual. Johns Hopkins provides a set of steps to de-identify data here, and the Finnish Social Science Data Archive provides a further definition in their Data Management Guidelines below:
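Here is a minimal sketch of two common de-identification moves: dropping direct identifiers and generalising quasi-identifiers. The field names and generalisation rules are illustrative, not taken from the Johns Hopkins steps:

```python
def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers."""
    out = dict(record)
    out.pop("name", None)  # direct identifiers are removed outright
    out.pop("ssn", None)
    if "age" in out:       # generalise exact age into a 10-year band
        band = (out["age"] // 10) * 10
        out["age"] = f"{band}-{band + 9}"
    if "zip" in out:       # truncate the postal code
        out["zip"] = out["zip"][:3] + "**"
    return out

print(deidentify({"name": "Jane Doe", "ssn": "078-05-1120", "age": 34, "zip": "21218"}))
# {'age': '30-39', 'zip': '212**'}
```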
Privacy by Design is a set of design principles introduced by Dr. Ann Cavoukian, the former Information and Privacy Commissioner of Ontario, Canada. FutureLearn covers the 7 design principles of Privacy by Design in a module of their Understanding the GDPR course below:
"Data ethics is about responsible and sustainable use of data. It is about doing the right thing for people and society. Data processes should be designed as sustainable solutions benefitting first and foremost humans."
The Deloitte Insights team has a quick read on data ethics 'in the age of big data' and 4 of the biggest issues driving discussion in this space: