This weekend, I am at the Conference on Fairness, Accountability, and Transparency (FAT*) at NYU. This conference has been around for a few years in various forms — previously as a workshop at larger ML conferences — but has really grown into its own force, attracting researchers and practitioners from computer science, the social sciences, and law/policy fields. I will do my best to document the most interesting bits and pieces from each session below.
Keynote 1: Latanya Sweeney
Sweeney has an amazing tech+policy background in this field — the work she did on de-anonymization of “anonymized” data led to the creation of HIPAA. She has also done interesting work on Discrimination in Online Ad Delivery (article). She argues that technology in a sense dictates the laws we live by. Her work has centered on specific case studies that point out the algorithmic flaws of technologies that seem normal and benign in our daily lives. Technical approaches include an “Exclusivity Index”, which takes a probabilistic approach to defining behavior that is anomalous in particular sub-groups. Two noted examples of unintended consequences of algorithms are discriminatory pricing algorithms in Airbnb and the leaking of location data through Facebook Messenger.
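The talk doesn’t spell out the exact formulation of the Exclusivity Index, but the core idea — scoring how disproportionately a behavior concentrates in a sub-group relative to the overall population — might be sketched like this (hypothetical function name and toy event data, not Sweeney’s actual definition):

```python
from collections import Counter

def exclusivity_index(subgroup_events, population_events):
    """Ratio of an event's frequency within the subgroup to its frequency
    in the overall population (illustrative only, not Sweeney's formula)."""
    sub, pop = Counter(subgroup_events), Counter(population_events)
    n_sub, n_pop = len(subgroup_events), len(population_events)
    return {e: (sub[e] / n_sub) / (pop[e] / n_pop) for e in sub}

scores = exclusivity_index(
    subgroup_events=["A", "B", "A", "C"],
    population_events=["B", "C", "B", "A", "C", "B", "C", "C"],
)
# Events scoring well above 1 are disproportionately tied to the subgroup.
```

A score near 1 means the behavior is no more common in the sub-group than anywhere else; a score well above 1 flags behavior that is effectively exclusive to the sub-group — the kind of signal her case studies surface.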
In the subsequent discussion with Jason Schultz, the focus is on laws and regulation. She states that there are 2000+ US privacy laws, but because they are so fragmented, they are rendered completely ineffective in comparison to blanket EU privacy laws. The case is made that EU laws have teeth, and in practice may raise the data privacy bar for users all over the world. She also stresses the need for work across groups, including technologists, advocacy groups, and policy makers. She presents a bleak view of the current landscape, but also presents reasons to be optimistic.
Session 1: Online Discrimination and Privacy
Till Speicher presents a paper on the feasibility of various methods of using Facebook for discriminatory advertising. There are three methods presented:
- Attribute-based targeting, which lets advertisers select certain traits of an audience they wish to target. These attributes can be official ones tracked by Facebook (~1100), or “free-form” attributes such as a user’s Likes.
- PII-based targeting, which relies on public data such as voter records. Speicher takes NC voter records and is able to filter out certain groups by race, then re-upload the filtered voter data to create an audience.
- “Look-alike” targeting, which takes an audience created from either of the above methods and scales it automatically — discrimination scaling as a service!
These methods make it clear how Facebook’s ad platform could be used to target and manipulate large groups of people. Speicher suggests that the best methods to mitigate such efforts may be based on the outcome of targeting (i.e. focusing on who is targeted, rather than how).
Amit Datta and Jael Makagon present this study on how ad platforms can be used for discriminatory advertising (e.g. targeting a specific gender with a job advertisement). See past work here: Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination. Jael has a law background, and walks the audience through different anti-discrimination laws and which parties may be held responsible in different scenarios. He describes a mess of laws that don’t quite apply to any party in the discrimination scenarios. Amit describes cases where advertisers can play active rather than passive roles in discriminatory advertising, and Jael describes the legal implications that can result from that.
They ultimately call out a “mismatch between responsibility and capability” in the advertising world, and they propose policy and technology-based changes that may be effective in preventing such discrimination.
- Contextual integrity
Ekstrand argues that the tools we use to assess fairness of decision-making systems can be used to analyze privacy in systems. He raises three questions:
- Are technical or non-technical privacy protection schemes fair?
- When and how do privacy protection technologies or policies improve or impede the fairness of the systems they affect?
- When and how do technologies or policies aimed at improving fairness enhance or reduce the privacy protections of the people involved?
They mention an example where Muslim taxi drivers are outed in anonymized NYC TLC data, and where James Comey’s personal Twitter account was discovered using public data. They discuss the cost of guarantees of privacy for certain schemes and definitions of privacy, and how that affects “fairness” for different definitions of fairness.
- Fairness Through Awareness
- Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data
Session 2: Interpretability and Explainability
Andrew Selbst starts his talk asking why explainability is important, saying “what is inexplicable is unaccountable”. In his eyes, explainability brings a chain of decision-making that leads to accountability. He then explains some aspects of GDPR and asks if it contains an implicit “right to explanation” in some of its provisions. He cites current legal arguments that discuss whether or not such a right exists:
- Meaningful information and the right to explanation
- The Right Not to Be Subject to Automated Decisions Based on Profiling
- Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation
Notably, Selbst says that deep learning isn’t actually at risk of being banned, in particular because the relevant requirement applies only to completely automated systems, implying that deep learning systems are fine to use as long as they are just one factor in a larger explainable system with a human in the loop.
Richard Phillips gives a talk on using LIME for active learning. By applying LIME to assess which features drive the model’s certainty in its classifications during active learning, their method can be used across populations to show whether models are biased for or against certain subgroups.
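The paper applies LIME to arbitrary black-box models; as a minimal sketch of the underlying idea (toy data and a hypothetical linear model, for which a LIME-style local attribution reduces to weight × feature value), an active learner can pair each uncertainty-based query with an explanation of which features drive that uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unlabeled pool with two features, and a hypothetical linear model
# (sketch only: the paper handles arbitrary black-box models via LIME).
pool = rng.normal(size=(100, 2))
w, b = np.array([2.0, -1.0]), 0.0

def predict_proba(X):
    # Logistic model probabilities for the positive class.
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Active-learning step: query the point the model is least certain about...
probs = predict_proba(pool)
query_idx = int(np.argmin(np.abs(probs - 0.5)))

# ...and attach a local attribution of *why* it is uncertain there.
attributions = w * pool[query_idx]
```

Aggregating these per-query attributions over demographic subgroups is the kind of analysis that can reveal whether a model’s uncertainty (and hence its labeling effort) concentrates on particular populations.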
Chelsea Barabas argues that the debate around pre-trial risk assessment tools is shaped by old assumptions about the role risk assessment plays in these trials. Old risk-based systems considered factors drawn from the social theories of criminal behavior of their time, theories that have since changed. They also focused on traits of the individual, neglecting broader social factors in these cases. She criticizes regression-based risk assessment in particular, due to the pitfalls of drawing conclusions from correlation rather than causation. She advocates for seeing risk not as a static thing to be predicted, but as a dynamic factor to be mitigated. She also discusses how we can use a causal framework of statistics and experiment design to ask better questions about risk assessment.
- Can we avoid reductionism in risk reduction?
- An Investigation of the Causal Association between Changes in Social Relationships and Changes in Substance Use and Criminal Offending During the Transition from Adolescence to Adulthood
Arvind Narayanan gives a “survey of various definitions of fairness and the arguments behind them” which can act as “‘trolley problems’ for fairness in ML”.
Algorithmic decision making and the cost of fairness
Rather than maximizing accuracy, the goal should be “how to make algorithmic systems support human values”.
- Group fairness — do outcomes systematically differ between demographic groups (or other population groups)?
- Fair prediction with disparate impact: A study of bias in recidivism prediction instruments
- “What do different stakeholders want of the binary classifier?”
- Decision-maker: “Of those I’ve labeled high-risk, how many will recidivate?” — Predictive value AKA Precision — equalized under Predictive parity
- Defendant: “What’s the probability I’ll be incorrectly classified high-risk?” — False positive rate — equalized under Error rate balance
- Society [hiring vs. criminal justice]: “Is the selected set demographically balanced?” — Selection probability — equalized under Demographic parity
- Different metrics matter to different stakeholders — no “right” metric.
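Each stakeholder’s metric falls directly out of a confusion matrix. A small sketch (toy labels, hypothetical helper, not from the talk) showing that equalizing one metric across groups need not equalize the others:

```python
def stakeholder_metrics(y_true, y_pred):
    # Confusion-matrix counts for a binary classifier.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return {
        "ppv": tp / (tp + fp),                      # decision-maker (precision)
        "fpr": fp / (fp + tn),                      # defendant
        "selection_rate": (tp + fp) / len(y_true),  # society
    }

# Two toy demographic groups (illustrative data only).
group_a = stakeholder_metrics([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
group_b = stakeholder_metrics([1, 0, 0, 0, 0], [1, 1, 0, 0, 0])
# Equal PPV and selection rate across groups, yet unequal FPR:
# satisfying one stakeholder's metric does not satisfy the others.
```

Here the classifier achieves predictive parity and demographic parity across the two groups while still imposing a higher false positive rate on one of them — a concrete instance of “no right metric”.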
- Individual fairness — “equal thresholds” — generally impossible to pick a single threshold for all groups that equalizes both FPR and FNR
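A small numeric sketch of this impossibility, using hypothetical score distributions for the two groups: sweeping every threshold shows that only the degenerate thresholds (classify everyone high-risk or no one) equalize both error rates across groups.

```python
# Hypothetical risk scores for two groups (negatives / positives).
neg_a, pos_a = [0.1, 0.2, 0.3, 0.4], [0.6, 0.7, 0.8, 0.9]
neg_b, pos_b = [0.3, 0.4, 0.5, 0.6], [0.5, 0.6, 0.7, 0.8]

def fpr_fnr(neg, pos, t):
    # Error rates when everyone scoring >= t is classified high-risk.
    fpr = sum(s >= t for s in neg) / len(neg)
    fnr = sum(s < t for s in pos) / len(pos)
    return fpr, fnr

equalizing = []
for i in range(1, 100):
    t = i / 100
    a, b = fpr_fnr(neg_a, pos_a, t), fpr_fnr(neg_b, pos_b, t)
    degenerate = a in ((0.0, 1.0), (1.0, 0.0))  # predicts one class for all
    if a == b and not degenerate:
        equalizing.append(t)

# equalizing stays empty: only all-positive/all-negative thresholds
# make both FPR and FNR match across the two groups.
```

Because the groups’ score distributions differ, any useful single threshold trades one group’s false positives against the other’s false negatives — which is why the talk frames per-metric equalization as a choice, not a free lunch.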
- Utility: Algorithmic decision making and the cost of fairness
- Tradeoffs:
  - Between various measures of group fairness.
  - Between group fairness and individual fairness.
  - Between fairness and utility.
Tension between disparate treatment and disparate impact — finding creative case-by-case workarounds doesn’t “scale” for algorithmic decision making.
- In training vs. classification: Does mitigating ML’s disparate impact require disparate treatment?
- Ineffectiveness of “blindness” — Equality of Opportunity in Supervised Learning
- Bias is “just” a side effect of maximizing accuracy
- ML is great at picking up on proxies in data.
- Unacknowledged affirmative action:
- Measurement bias, historical prejudice
- What is the problem to which fair machine learning is the solution?
- Demographic parity assumes no intrinsic differences between groups.
- Individual fairness: “Similar individuals should be treated similarly” — Fairness Through Awareness
- Process fairness: The Case for Process Fairness in Learning: Feature Selection for Fair Decision Making
- Diversity: Diversity in Big Data: A Review
- Stereotype mirroring and exaggeration: Unequal Representation and Gender Stereotypes in Image Search Results for Occupations
- To what extent should ML models reflect societal stereotypes? Default view in tech world is that stereotype mirroring is “unbiased” and “correct”.
- Dataset bias: Unbiased Look at Dataset Bias
- Representations — should they be debiased?
- Auditing Black-box Models for Indirect Influence
- Certifying and removing disparate impact
- BlackBoxAuditing library