Keynote 2: Deborah Hellman
Deborah Hellman of UVA Law starts the day off with a keynote on justice and fairness. She opens with a quote from Sidney Morgenbesser about what is unfair and what is unjust, asking if fairness is about treating everyone the same. She follows with a quote from Anatole France — “In its majestic equality, the law forbids rich and poor alike to sleep under bridges, beg in the streets and steal loaves of bread.” In practice, policies that formally treat everyone the same affect people in different ways.
Hypothesis 1: Treat like cases alike.
This hypothesis relies on choosing a proxy by which to classify people and decide how to treat them differently. That is, if treating everyone the same is unfair because of the situations they’re in lead to different outcomes, classify them into different cases based on their situations, and treat each case separately. This hypothesis seems to fall apart based on how the classifications are made and the intentions of those classifications in search of certain outcomes. This leads to the next hypothesis…
Hypothesis 2: It’s the thought that counts.
These traits are usually adopted for bad reasons. The classifications are made to impose differing treatments with moral decisions that are misguided or unjust. For example, an employer may avoid hiring women between the ages of 25 and 40 to avoid having to pay women who may have children to take care of. The goal is not to avoid employing women, but to increase productivity. The intent behind the classification is itself misguided or flawed.
Hypothesis 3: “Anti-Classification”
The use of classifications, in particular classifications based on certain traits e.g. race, gender, can lead to unintended effects and denigration.
Hypothesis 4: Bad Effects
Certain classifications themselves can compound injustice — for example, charging higher life insurance rates to battered women.
Hypothesis 5: Expressing Denigration
For example, saying “All teengaers must sit in the back of the bus” vs. “All blacks must sit in the back of the bus” express different ideas. Regardless of the intention, there is denigration inherent in the classification. She cites Justice Harlan’s dissent in Plessy v. Ferguson.
Discussion: Cynthia Dwork
Session 3: Fairness in Computer Vision and NLP
Joy Buolamwini gives a talk on her now infamous paper on the poor performance of facial analysis technologies on non-white, non-male faces. She uses a more diverse dataset to benchmark various APIs. After reporting the poor performance to various companies, some actually improved their models to account for the underrepresented classes.
The Perpetual Line-Up
Taneea Agrawaal presents her analysis of gender stereotyping in Bollywood movies. The analysis was done with a database of Bollywood movies going back intio the 1940s, along with movie trailers from the last decade and a few released movie scripts. Syntax analysis is done to extract verbs related to males and females to study the actions associated with each. She argues that the stories told and representations expressed in movies affect society’s perception of itself and subsequent actions. For example, Eat Pray Love caused an increase in solo female travel, and Brave and Hunger Games caused a sharp increase in female participation in archery.
Natasha Duarte presents a talk focused on how NLP is being used to detect and flag content online for surveillance and law enforcement (for example, to detect and remove terrorist content from the internet). She argues that NLP tools are limited because they must be trained on domain-specific datasets to be effective in particular domains, and governments generally use pre-packaged solutions which are not designed for these domains. Manual human effort and language and context-specific work is necessary for any successful NLP system.
Session 4: Fair Classification
Bob Williamson presents his research which frames adding fairness to binary classification as imposing a constraint. There must be a cost to this constraint, and Williamson presents a mathematical approach to measuring that cost.
Nicole Immorlica shows that “training a separate classifier for each group (1) outperforms the optimal single classifier in both accuracy and fairness metrics, (2) and can be done in a black-box manner, thus leveraging existing code bases.” With the caveat that it “requires monotonic loss and access to sensitive attributes at classification time.”
A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions
Alexandra Chouldechova presents a case study in which a model was used to distill information about CPS cases to create risk scores to aid call center workers in case routing. She discusses some of the pitfalls of the model, and how improvements were made to address them along the way. She ends by emphasizing that this model is just one small black box which acts as one signal among many in a larger system of processes and decision-making.
Reuben Binns takes a mix of philosophy and computer science to nudge the debate around ML fairness from “textbook”/legal definitions of fairness to one that goes back to more philosophical roots. It follows a trend at the conference of focusing on the context in which models are used, the moral goals and decisions of the models, and a re-analysis of concepts of fairness that the rest of the field may consider standard.
Session 5: FAT Recommenders, Etc.
Carlos Scheidegger discusses a mathematical method, Polya Urns, that he’s used to discover feedback loops in PredPol. Such systems are based on a definition of fairness which states that areas with more crime should receive a higher allocation of police resources. He discusses the flaws of such methods and suggests some strategies to avoid these feedback loops.
All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness (code)
Michael Ekstrand asks: Who receives what benefits in our recommender systems?