Glossary

This glossary is a list of definitions of terms and concepts relevant to the recops project, a critical investigation for the analysis and critique of facial recognition technologies. Some of these terms may be familiar to you, others you may not have come across before. In fact, many of these terms are entire fields of study on their own. The intention isn’t to give a comprehensive definition, but rather to pique your interest to explore further.

Below are brief definitions of terms and concepts in alphabetical order. (NOTE: letters with no relevant terms are omitted.) Contents:

A	B	C	D	E	F	G	H	I	J	L	M	N	P	R	S	T	U	V

A

algorithm: in computer science, a traditional algorithm is a precise set of rules or instructions that tell a computer how to solve a problem. With a traditional algorithm, to complete the task the computer must follow the steps in the order they’re laid out by the programmer. By contrast, a machine learning (ML) algorithm creates the rules or ideal steps itself, by experimenting with the task and learning as it goes.
artificial intelligence (AI): any intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals. In computer science, AI research is defined as the study of “intelligent agents”: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially, the term artificial intelligence is applied when a machine mimics cognitive functions that humans associate with other human minds, such as learning and problem solving.
accuracy: accuracy expresses how often a machine learning (ML) model is correct, as the percentage of correct predictions. It is calculated by dividing the number of correct predictions by the total number of predictions.
active learning: active learning is a training approach in which the algorithm interactively queries an information source (such as a human annotator) to label new data points with the intended outputs. The active learning algorithm selectively chooses the data points it learns from, which is particularly valuable when labeled examples are scarce or expensive to obtain.
annotation: the process of labeling the input data in preparation for AI training. In computer vision, the input images and video must be annotated according to the task you want the AI model to perform. For example, if you want the model to perform image segmentation, the annotations must include the location and shape of each object in the image.
annotation format: the particular way of encoding an annotation. There are many ways to describe a bounding box’s size and position (JSON, XML, TXT, etc.) and to delineate which annotation goes with which image.
annotation group: describes what types of objects you are identifying. For example, “Chess Pieces” or “Vehicles”. Classes (e.g. “rook”, “pawn”) are members of an annotation group.
automation bias: when a human decision maker favors recommendations made by an automated decision-making system over information made without automation, even when the automated decision-making system makes errors.
attribute: synonym for feature. In the context of fairness, attributes often refer to characteristics pertaining to individuals.
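
The accuracy entry above reduces to a single division. The following is a minimal sketch, assuming predictions and labels are given as two equal-length lists; the function name and the sample values are hypothetical and chosen only for illustration.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical example: four predictions, three of which are correct.
predicted = ["face", "face", "no_face", "face"]
actual = ["face", "no_face", "no_face", "face"]
score = accuracy(predicted, actual)
print(f"accuracy = {score:.2f}")
```

Accuracy alone can be misleading when one class is much more common than the others, which is one reason the glossary also lists precision.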

B

biometrics: a measurable physical characteristic or personal behavioral trait used to recognize the identity, or verify the claimed identity, of an applicant. Facial images, fingerprints, and iris scan samples are all examples of biometrics.
bias: bias is stereotyping, prejudice, or preference towards particular items (data points) over others. In machine learning (ML), bias is considered a systematic error that occurs in the training set or machine learning (ML) model when the algorithm outcome is distorted in favor of or against a certain idea. Bias impacts data collection and interpretation, system design, and users’ engagement with a system.
benchmark dataset: a benchmark dataset is a test dataset, but the term typically refers to a dataset shared with other researchers or organizations for the purpose of comparing algorithmic performance.
bounding box: a bounding box is a rectangular area around an object of interest in a digital image. It is typically described by the (x, y) coordinates of its corners, or by one corner together with a width and height.
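
As a rough illustration of the bounding box entry, the sketch below stores a box as two corner coordinates and converts it to the corner-plus-size form. The class name, field names, and pixel values are assumptions made for this example; actual annotation formats differ in which convention they use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned rectangle given by its top-left and bottom-right corners."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def width(self):
        return self.x_max - self.x_min

    @property
    def height(self):
        return self.y_max - self.y_min

    @property
    def area(self):
        return self.width * self.height

    def to_xywh(self):
        """Convert to the (x, y, width, height) convention."""
        return (self.x_min, self.y_min, self.width, self.height)


# Hypothetical face region in pixel coordinates.
box = BoundingBox(x_min=40, y_min=30, x_max=140, y_max=180)
print(box.to_xywh(), box.area)
```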

C

convolutional neural networks (CNN, or ConvNet): a class of neural networks that is among the most widely used for computer vision (CV) and image analysis tasks, due to their ability to extract features and detect patterns via hidden convolutional layers within the network.
classification: classification is a supervised learning technique that aims to categorize the target variable. For instance, detecting whether an email is spam or not is a classification task. This is called binary classification, since the target variable has only two possible values: spam or not spam. If the target variable can take more than two values (i.e., classes), the task is known as multi-class classification.
classification model: a type of model that distinguishes among two or more discrete classes. For example, a natural language processing classification model could determine whether an input sentence was in French, Spanish, or Italian.
clustering: clustering groups data points into clusters based on their similarity. Unlike classification, the data points in clustering do not have labels, hence it’s an unsupervised learning technique.
computer vision (CV): a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs, and to take actions or make recommendations based on that information.
class: a class is a label that gives information about the instance. To illustrate, when annotating images of apples, we would annotate the apples and assign each of them to the class “Apple”.
confirmation bias: the tendency to search for, interpret, favor, and recall information in a way that confirms one’s preexisting beliefs or hypotheses. Machine learning (ML) developers may inadvertently collect or label data in ways that influence an outcome supporting their existing beliefs. Confirmation bias is a form of implicit bias.
central processing unit (CPU): the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output operations specified by the instructions.

D

deep learning: deep learning is a subset of machine learning (ML), inspired by how the human brain works. Deep learning relies on algorithms by which a machine/computer can teach itself how to do something based on ‘looking at’ and learning about a huge dataset of examples.
deep convolutional neural network (DCNN): refers to using convolutional network layers to learn visual features. A convolution is an image (data matrix) operation that convolves (combines) nearby visual information using a transformation function. For example, an edge filter in a graphics editing program applies a convolution matrix. Another example would be an unsharp mask. A helpful interactive illustration is https://setosa.io/ev/image-kernels/. A DCNN uses multiple layers of convolutions to understand visual features and concepts within images.
data: information of any kind. It could be images, text, sound, or tabular.
data set or dataset: dataset refers to a collection of images with associated metadata used for training, validating, and testing a computer vision (CV) algorithm. Typically the dataset comprises a compressed ZIP file with folders of JPEGs and JSON or XML metadata text files. The dataset of images is often divided into 3 subsets called training, validation, and test.
data exploration: a critical step in artificial intelligence (AI) and machine learning (ML). With data exploration, analysts attempt to find patterns and details in large pools of data. Data exploration uses a mix of manual and automated techniques and processes. Its function is not to sort all the data, but rather to look for the broad patterns that are evident within the data.
deep model: a type of neural network containing multiple hidden layers.
data analysis: obtaining an understanding of data by considering samples, measurement, and visualization. Data analysis can be particularly useful when a dataset is first received, before one builds the first model. It is also crucial in understanding experiments and debugging problems with the system.
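
The convolution described in the DCNN entry above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions: the function name, the tiny 5x5 image, and the 3x3 edge kernel are all hypothetical, no padding is applied, and the kernel is not flipped (the convention generally used in neural networks).

```python
def convolve2d(image, kernel):
    """Slide a square kernel over a 2D image (valid region only, no padding)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    output = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Combine the pixel at the window centre with its neighbours.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical image: dark on the left, bright on the right.
image = [[0, 0, 10, 10, 10] for _ in range(5)]

# A common edge-detection kernel: responds where a pixel differs from its neighbours.
edge_kernel = [
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1],
]

edges = convolve2d(image, edge_kernel)
for row in edges:
    print(row)
```

The output is non-zero only near the boundary between the dark and bright regions and zero where the image is uniform, which is the behaviour of an edge filter.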

E

ethnicity estimation: ethnicity estimation refers to classifying individuals based on perceived ethnic and/or racial groups. Ethnicity and racial classification systems are rife with subjectivity, often having ethnicity and racial labels applied by crowdsourced annotation workers who are never able to know the ground truth and only provide a subjective classification.
epoch: in the context of training deep learning (DL) models, one pass over the full training data set.
embeddings: a categorical feature represented as a continuous-valued feature. Typically, an embedding is a translation of a high-dimensional vector into a low-dimensional space.

F

face recognition: face recognition refers to a system of algorithms that compares the similarity of two face images and provides a similarity score based on the distance between two face vectors. A face vector is a high-dimensional representation of face descriptors used to describe the features of a face that make it separable from other faces. The face vector is unique to the network, not the face.
face detection: face detection is a type of object detection that detects a single object class (a face). It is important to understand that detection and recognition are entirely separate algorithms. A face recognition system is a software application that uses face detection to locate a face followed by face-alignment to normalize the face position, then runs the cropped and aligned face chip through a “face recognition” network to compute its feature embedding.
facial recognition system: a system that uses facial recognition software.
face landmarks: face landmarks refer to predefined facial positions, for example the left corner of the left eye. Face landmarking algorithms compute biometric information, though it is primarily used not to identify an individual but to perform face alignment prior to face recognition.
face embeddings: an array of floating-point numbers that represent the values of facial features. Face embeddings can be thought of as face adjectives. Typically face embeddings are between 128 and 4096 “adjectives” long. A longer embedding does not necessarily correlate with higher performance.
facial recognition software: software used to compare the visible physical structure of an individual’s face with a stored facial template.
face alignment: face alignment is a computer vision (CV) technology for identifying the geometric structure of human faces in digital images. Given the location and size of a face, it automatically determines the shape of the face components such as eyes and nose. A face alignment program typically operates by iteratively adjusting a deformable model, which encodes prior knowledge of face shape or appearance, to take into account the low-level image evidence and find the face that is present in the image.
feature: an input variable used in making predictions.
faceprinting: a fundamental step in the process of face recognition, faceprinting is the automated analysis and translation of visible characteristics of a face into a unique mathematical representation of that face. Both collection and storage of this information raise privacy and safety concerns.
face matching: any comparison of two or more faceprints. This includes face identification, face verification, face clustering, and face tracking.
face identification: compares (i) a single faceprint of an unknown person to (ii) a set of faceprints of known people. The goal is to identify the unknown person. Face identification may yield multiple results, sometimes with a “confidence” indicator showing how likely the system considers it that the returned image matches the unknown image.
face verification: compares (i) a single faceprint of a person seeking verification of their authorization to (ii) one or more faceprints of authorized individuals. The verified person might or might not be identified as a specific person; a system may verify that two faceprints belong to the same person without knowing who that person is. Face verification may be used to unlock a phone or to authorize a purchase.
facial template: a digital representation of distinct characteristics of a subject’s face, representing information extracted from a photograph using a facial recognition algorithm.
facial image: a photograph or video frame or other image that shows the visible physical structure of an individual’s face.
face clustering: compares all the faceprints in a collection of images to one another, in order to group the images containing a particular person or group of people. The clustered people might or might not then be identified as known individuals. For example, each of the people in a library of digital photos (whether a personal album or a police array of everyone at a protest) could have their various pictures automatically clustered into a discrete set.
face tracking: uses faceprints to follow the movements of a particular person through a physical space covered by one or more surveillance cameras, such as the interior of a store or the exterior sidewalks in a city’s downtown. The tracked person might or might not be identified. The tracking might be real-time or based on historical footage.
face analysis, also known as face inference: any processing of a faceprint, without comparison to another individual’s faceprint, to learn something about the person from whom the faceprint was extracted. Face analysis by itself will not identify or verify a person. Some face analysis purports to draw inferences about a person’s demographics (such as race or gender), emotional or mental state (such as anger), behavioral characteristics, and even criminality.
facial recognition data: data derived from the application of facial recognition software, including facial template and associated metadata.
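
The comparison of face vectors described under face recognition and face identification above can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is one common choice of score and is not prescribed by this glossary, the four-value vectors stand in for real embeddings of 128 or more values, and all names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(probe, gallery):
    """Rank known faceprints by similarity to the probe, best match first."""
    scores = [(name, cosine_similarity(probe, vec)) for name, vec in gallery.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical faceprints of known people (toy 4-value vectors).
gallery = {
    "person_a": [0.9, 0.1, 0.3, 0.2],
    "person_b": [0.1, 0.8, 0.2, 0.7],
}

# Hypothetical faceprint of an unknown person.
probe = [0.85, 0.15, 0.35, 0.25]

ranked = identify(probe, gallery)
best_name, best_score = ranked[0]
print(best_name, round(best_score, 3))
```

Note that a ranking always returns a best candidate, even when the unknown person is not in the gallery at all; a threshold on the score is what turns a ranking into a match or no-match decision.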

G

graphics processing unit (GPU): a specialized hardware device used in computers, smartphones, and embedded systems originally built for real-time computer graphics rendering. However, the ability of GPUs to efficiently process many inputs in parallel has made them useful for a wide range of applications—including training AI models.

H

hash: the result of a mathematical function known as a “hash function” that converts arbitrary data into a unique (or nearly unique) numerical output. In facial authentication, for example, a complex hash function encodes the identifying characteristics of a user’s face and returns a numerical result. When a user attempts to access the system, their face is rehashed and compared with existing hashes to verify their identity.
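
To illustrate the general idea of a hash function only, the sketch below uses SHA-256 from Python’s standard `hashlib` module on arbitrary bytes. This is an assumption made for illustration, not the scheme any facial authentication system is known to use: a general-purpose hash like this changes completely when the input changes slightly, so it does not by itself tolerate the variation between two captures of the same face.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hash of the input as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical inputs: the same bytes always produce the same hash,
# while a one-character change produces an unrelated hash.
first = digest(b"example input")
again = digest(b"example input")
changed = digest(b"example input!")

print(first == again)    # same input, same hash
print(first == changed)  # different input, different hash
```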

I

image quality control: the use of AI and machine learning (ML) to perform automatic quality control on visual data, such as images and videos. For example, image quality control tools can detect image defects such as blurriness, nudity, deepfakes, and banned content, and correct the issue or delete the image from the dataset.
image recognition: a subfield of AI and computer vision (CV) that seeks to recognize the contents of an image by describing them at a high level. For example, a trained image recognition model might be able to distinguish between images of dogs and images of cats. Image recognition is contrasted with image segmentation, which seeks to divide an image into multiple parts (e.g. the background and different objects).
image segmentation: a subfield of computer vision (CV) that seeks to divide an image into contiguous parts by associating each pixel with a certain category, such as the background or a foreground object.
implicit bias: automatically making an association or assumption based on one’s mental models and memories. Implicit bias can affect the following: How data is collected and classified. How machine learning (ML) systems are designed and developed.

J

json response: a response to an API request that uses the popular and lightweight JSON (JavaScript Object Notation) data format. A JSON response typically consists of a top-level object that contains one or more key-value pairs (e.g. { "name": "John Smith", "age": 30 }), although the top level may also be an array.
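
A JSON response like the example in the entry above can be read with Python’s standard `json` module. This is a minimal sketch: the response body is hard-coded here as a string, whereas in practice it would come from an API request.

```python
import json

# Hypothetical response body, as it would arrive from an API as text.
response_body = '{"name": "John Smith", "age": 30}'

# json.loads turns a JSON object into a Python dictionary.
data = json.loads(response_body)

print(data["name"])
print(data["age"])
```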

L

labeling: the process of assigning a label that provides the correct context for each input in the training dataset, or the “answer” that you would like the AI model to return during training. In computer vision (CV), there are two types of labeling: annotation and tagging. Labeling can be performed in-house or through outsourcing or crowdsourcing services.
liveness detection: a security feature for facial authentication systems to verify that a given image or video represents a live, authentic person, and not an attempt to fraudulently bypass the system (e.g. by wearing a mask of a person’s likeness, or by displaying a sleeping person’s face).

M

machine learning (ML): a subset of AI in which a program or system builds (trains) a predictive model from input data. The system uses the learned model to make useful predictions from new (never-before-seen) data drawn from the same distribution as the one used to train the model. Machine learning (ML) also refers to the field of study concerned with these programs or systems.
metadata: data that describes and provides information about other data. For visual data such as images and videos, metadata consists of three categories: technical (e.g. the camera type and settings), descriptive (e.g. the author, date of creation, title, contents, and keywords), and administrative (e.g. contact information and copyright).
model: the representation of what a machine learning (ML) system has learned from the training data.

N

neural network: a neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. In this sense, neural networks refer to systems of neurons, either organic or artificial in nature.
neural network model: the trained neural network file. This can be a single file or multiple files that define the parameters of the final neural network. Often these files are several hundred megabytes in size, while smaller, optimized versions can be only a few megabytes.
noise: mislabeled data points, misrecorded or omitted feature values are all examples of noise. Essentially, anything that interferes with a clean and consistent dataset is considered noise.

P

pre-trained model: an AI model that has already been trained on a set of input training data. Given an input, a pre-trained model can rapidly return its prediction on that input, without needing to train the model again. Pre-trained models can also be used for transfer learning, i.e. applying knowledge to a different but similar problem (for example, from recognizing car manufacturers to truck manufacturers).
polygon: a usually non-rectangular shape that outlines the object of interest, allowing more precision than a regular bounding box.
prediction: a model’s output when provided with an input example.
precision: the number of correct positive results divided by the number of all positive results returned by a classifier.
prediction bias: a value indicating how far apart the average of predictions is from the average of labels in the dataset. Not to be confused with the bias term in machine learning models or with bias in ethics and fairness.
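
The precision and prediction bias entries above can both be computed directly from their definitions. The sketch below assumes binary labels (1 for positive, 0 for negative) and a model that outputs a score between 0 and 1; the function names, the 0.5 cut-off, and the sample values are hypothetical.

```python
def precision(predictions, labels, positive=1):
    """Correct positive results divided by all positive results returned."""
    predicted_positive = [y for p, y in zip(predictions, labels) if p == positive]
    if not predicted_positive:
        raise ValueError("the classifier returned no positive results")
    true_positive = sum(1 for y in predicted_positive if y == positive)
    return true_positive / len(predicted_positive)


def prediction_bias(scores, labels):
    """Average of the predictions minus the average of the labels."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)


# Hypothetical labels and model scores for five examples.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.9, 0.4, 0.25]

# Turn scores into positive/negative predictions with an assumed 0.5 cut-off.
predictions = [1 if s >= 0.5 else 0 for s in scores]

p = precision(predictions, labels)
bias = prediction_bias(scores, labels)
print(f"precision = {p:.3f}, prediction bias = {bias:.3f}")
```

Here three examples are predicted positive and two of those are truly positive, so precision is two thirds; the positive prediction bias indicates that the model’s average score is slightly higher than the share of positive labels.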

R

recurrent neural network (RNN): a special type of neural network that uses the output of the previous step as the input to the current step. RNNs are best suited for sequential and time-based data such as text and speech.
reinforcement learning: reinforcement learning is a type of algorithm that continuously uses feedback from previous iterations, learns by trial and error, and is guided by the action-reward principle. In games, reinforcement learning algorithms are often used to analyze historical data and discover sequences that eventually lead either to victory or defeat.
regression: regression is a supervised learning approach with continuous target variables. In regression tasks, we evaluate the performance of machine learning (ML) algorithms based on how close the predicted values are to the actual values.

S

supervised learning: supervised learning is an approach to creating AI, where a computer algorithm is trained on input data that has been labeled for a particular output. In other words, there is a “supervisor”, e.g., data annotator, who labels the training data points for future deployment.
scraping images: “scraping” typically refers to obtaining images through technical methods not explicitly provided by a website. Using custom software to parse a website’s HTML and download the images, or using a web-browser emulator to render a page and record visual elements from the virtual render, can be considered scraping. Downloading images through an interface or API (Application Programming Interface) provided by the website is not typically considered scraping. For example, obtaining images from search engine results would typically be considered scraping, while obtaining images through the Flickr API would be considered downloading. However, when scripts or custom software are used to rotate API keys, IP addresses, and user-agents to avoid rate-limiting, then it could be considered “scraping”.
structured data: data that adheres to a known, predefined schema, making it easier to query and analyze.
similarity thresholding: the process of converting a numerical similarity score measured between two face templates into a match or no-match determination. This typically involves a single static similarity threshold, such that any similarity score lower than the threshold is determined to be a no-match, and any similarity score greater than the threshold is determined to be a match.
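
The similarity thresholding entry above amounts to a single comparison. In this minimal sketch the function name, the scores, and the two threshold values are hypothetical, and a score exactly equal to the threshold is treated as a match, which is an assumption the definition leaves open.

```python
def is_match(similarity, threshold):
    """Convert a similarity score into a match / no-match determination."""
    # Assumption: a score exactly at the threshold counts as a match.
    return similarity >= threshold


# Hypothetical similarity scores from four face comparisons.
scores = [0.31, 0.58, 0.74, 0.92]

lenient = [is_match(s, threshold=0.6) for s in scores]
strict = [is_match(s, threshold=0.8) for s in scores]

print(lenient)
print(strict)
```

Raising the threshold turns some matches into no-matches: fewer false matches at the cost of more missed matches, which is why the choice of threshold is a consequential setting.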

T

training dataset: the portion of the dataset used to train an algorithm, during which a neural network learns weights and features that are later encoded into a model.
test dataset: a portion of a dataset used to evaluate the algorithm’s performance. Often the test dataset is an approximately 20% split of the full dataset, but it can also be a completely separate standalone dataset. In the latter case it would be referred to as a benchmark dataset.
tagging: the process of labeling the input data with a single tag in preparation for AI training. Tagging is similar to annotation, but uses only a single label for each piece of input data. For example, if you want to perform image recognition for different dog breeds, your tags may be “golden retriever”, “bulldog”, etc.
transfer learning: a machine learning (ML) technique that reuses a model trained for one problem on a different but related problem, shortening the training process. For example, transfer learning could apply a model trained to recognize car makes and models to identify trucks instead.
threshold: a user setting for facial recognition systems for authentication, verification or identification. The acceptance or rejection of a facial template match is dependent on the match score falling above or below the threshold. The threshold is adjustable within the facial recognition system.

U

unsupervised learning: contrary to supervised learning, unsupervised learning does not involve human-suggested labels. It discovers the underlying structure or patterns among the data points by means of finding similarities or differences in information and clustering it.
unstructured data: data that does not adhere to a predefined schema, making it more flexible but harder to analyze. Examples of unstructured data include text, images, and videos.

V

validation dataset: a portion of a dataset used to validate the training process. After each epoch the validation data is used to help determine if the training progress is moving in the right direction. Often the validation dataset is an approximately 20% split of the full dataset, with no overlap with the training or test data.
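
The training, validation, and test entries above describe splitting one dataset into three non-overlapping parts. The sketch below shows one way this could be done, assuming a 60/20/20 split; the function name, the file names, and the fixed random seed are hypothetical.

```python
import random


def split_dataset(items, validation_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the items and split them into training, validation and test sets."""
    shuffled = list(items)
    # A fixed seed makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * validation_fraction)

    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    training = shuffled[n_test + n_val:]
    return training, validation, test


# Hypothetical dataset of 100 image file names.
image_names = [f"img_{i:03d}.jpg" for i in range(100)]
training, validation, test = split_dataset(image_names)

print(len(training), len(validation), len(test))
```

For face datasets, a split by image can still place different photos of the same person in both training and test sets; splitting by identity instead is a design choice worth considering.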