Domain 1.0 Fundamental Concepts for Data-Driven Technology Ethics
Objective 1.1 Identify and describe common terminology or concepts important to data-driven technology ethics
AI & Data Science Concepts
Let's get started with Narrow vs. General AI. Here's what Forbes contributor Kathleen Walch has to say (see her article linked below for further reading):
The general AI ecosystem classifies these AI efforts into two major buckets: weak (narrow) AI that is focused on one particular problem or task domain, and strong (general) AI that focuses on building intelligence that can handle any task or problem in any domain.
Superintelligence takes us one step further and is loosely defined as any AI that exceeds the intelligence of humans on the majority of tasks. Nick Bostrom's book 'Superintelligence' is a fascinating read on this topic:
Ambient intelligence (AmI) is essentially the convergence of the Internet of Things (IoT) and AI to create environments we can interact with, and that adapt automatically to our needs. Envisaged by Philips over 20 years ago, we're now seeing a plethora of assistive technologies in our work and personal lives.
Black box decision making concerns AI systems that provide inadequate explanations for their outputs. Systems that are not explainable pose real risks to businesses: biased or erroneous decisions can lead to litigation, consumers may lose trust in the system, and well-documented PR nightmares can follow, to name a few.
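To make the contrast concrete, here is a minimal sketch of an explainable model. The feature names, weights, and applicant data are all hypothetical; the point is that a simple linear scoring model can break every decision down into per-feature contributions, which a black box cannot.

```python
# Hypothetical loan-approval scoring model with hand-picked weights.
# Positive weights push towards approval, negative towards rejection.
weights = {"income": 0.5, "debt": -0.8, "years_employed": 0.3}

def score(applicant):
    """Overall decision score: a weighted sum of the features."""
    return sum(weights[f] * applicant[f] for f in weights)

def explain(applicant):
    """Per-feature contribution to the score - the 'explanation'."""
    return {f: weights[f] * applicant[f] for f in weights}

applicant = {"income": 6.0, "debt": 2.0, "years_employed": 4.0}
print(score(applicant))    # the decision
print(explain(applicant))  # which features drove it, and by how much
```

Because each contribution is visible, a wrongful decision can be traced to the feature (or weight) responsible, which is exactly what black-box systems make difficult.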
Model training typically involves applying a variety of machine learning algorithms to a set of training data, and then testing the performance of the model on a set of test data held out from the original dataset. Data scientists experiment to find the most performant model for the type of problem they're trying to solve. To get some hands-on experience training some machine learning models, try out the Microsoft learning path below:
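The train/test split described above can be sketched in a few lines of plain Python. The dataset here is synthetic (y = 2x + 1 plus noise), standing in for real-world data, and the "model" is a closed-form least-squares line fit; the key idea is that performance is measured only on the held-out test set.

```python
import random

# Synthetic stand-in for a real dataset: y = 2x + 1 with noise.
random.seed(0)
data = [(x, 2 * x + 1 + random.uniform(-0.5, 0.5)) for x in range(100)]

# Hold out 20% of the data as a test set before training.
random.shuffle(data)
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

def fit_linear(points):
    """Ordinary least squares for a single feature (closed form)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return slope, my - slope * mx

def mse(model, points):
    """Mean squared error of the fitted line on a set of points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)

model = fit_linear(train)          # train only on the training set
print("test MSE:", mse(model, test))  # evaluate only on held-out data
```

In practice a data scientist would try several algorithms and compare their held-out scores, keeping whichever generalises best.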
One way of understanding a typical data science pipeline is through the acronym OSEMN (Obtain, Scrub, Explore, Model, iNterpret) - covered in detail by Dr. Cher Han Lau below:
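The OSEMN stages above can be illustrated as a chain of plain functions, each handing its output to the next. The sample data and the mean-prediction "model" are purely hypothetical placeholders for a real pipeline.

```python
def obtain():
    # Obtain: pull raw records from an API or file; inlined here.
    return [" 3 ", "5", None, "bad", "7 "]

def scrub(raw):
    # Scrub: coerce to numbers, dropping unparseable records.
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            continue
    return cleaned

def explore(values):
    # Explore: summary statistics to understand the data.
    return {"n": len(values), "mean": sum(values) / len(values)}

def build_model(values):
    # Model: a trivial baseline that always predicts the mean.
    mean = sum(values) / len(values)
    return lambda _x: mean

def interpret(stats, predict):
    # iNterpret: turn results into a human-readable conclusion.
    return f"{stats['n']} usable rows; baseline prediction {predict(None):.1f}"

clean = scrub(obtain())
stats = explore(clean)
predict = build_model(clean)
print(interpret(stats, predict))
```

A real pipeline would of course do far more at each stage, but the shape - obtain feeding scrub feeding explore, model, and interpret - stays the same, which is what makes the acronym a useful mental model.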
Another acronym, SEMMA (Sample, Explore, Modify, Model, and Assess), was coined by the enterprise analytics software company SAS. Some data scientists like to add additional steps such as Documentation (for repeatability) and Deployment - ensuring that the model runs successfully in production.
A ground truth dataset is an attempt to reflect real-world data as accurately as possible. It could be collected from an accurate source (e.g. sensors deployed in a real-world location) as a reality check for models, hand-labelled by humans, or labelled according to decisions made by a project's decision makers. Cassie Kozyrkov warns of some of the pitfalls due to subjectivity in labelling processes in the article below:
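One quick way to surface that subjectivity is to have two annotators label the same items and measure how often they agree. The labels below are hypothetical; simple percent agreement (a cruder cousin of measures like Cohen's kappa) already hints at how shaky a hand-labelled "ground truth" can be.

```python
# Hypothetical labels from two humans annotating the same six images.
annotator_a = ["cat", "dog", "cat", "cat", "dog", "cat"]
annotator_b = ["cat", "dog", "dog", "cat", "dog", "dog"]

# Percent agreement: fraction of items where both annotators concur.
agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
print(f"inter-annotator agreement: {agreement:.0%}")
```

If two careful humans only agree two-thirds of the time, the "ground truth" a model is trained against is itself partly a matter of opinion - exactly the pitfall Kozyrkov highlights.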
Read on in Part 2 - Legal-Related Concepts: