Anthropic 2023 Paper Examines AI Model Adjustments for Bias and Discrimination
A 2023 research paper by Anthropic employees explored how AI models handle discrimination related to race and gender. The paper discussed potential overcorrections in model responses to address historical injustices, based on experiments with human input. It highlighted challenges in training AI on human-generated text while mitigating biases.
Source: Fox News. We have limited corroborating sources on this story right now; this page will update automatically as more coverage emerges.
Fox News reported on a 2023 paper by researchers at Anthropic, an AI company, that examined discrimination in AI models. The paper, co-authored by Amanda Askell and four others, analyzed how large language models respond to prompts involving race and gender.
In one experiment, a 175-billion-parameter model discriminated against Black students by 3% relative to White students in a baseline condition. With additional human input and training, the model shifted to favoring Black students by 7%. The paper noted that such overcorrections might be applied in contexts aiming to address historical injustices against marginalized groups, if compliant with local laws.
The authors stated that language models trained on human text often reflect harmful stereotypes and discrimination present in that data.
The paper observed that models can be steered to avoid bias through natural language instructions requesting unbiased responses. Anthropic has positioned its AI model Claude as focused on ethical behavior, with its constitution aiming for the model to exhibit skill, judgment, nuance, and sensitivity in decision-making.
Amanda Askell, a philosopher at Anthropic, works on finetuning and AI alignment to make models more honest and to develop positive character traits. She previously held a similar role at OpenAI.
Anthropic recently withheld its latest model, Mythos, due to its effectiveness in identifying cyber vulnerabilities. Earlier this year, the company clashed with the Department of War over restrictions on using its technology for lethal operations. A federal judge blocked a ban on Anthropic's technology for Department of War use.
Anthropic markets Claude as an ethical AI option amid ongoing debates about AI ethics and applications. The U.S. military reportedly used Claude in an operation to capture Venezuelan leader Nicolás Maduro, where the model handled requests autonomously and generated documentation.
Story Timeline
6 events
- 2026-04-22 — Current date; article publication context. (1 source: Fox News)
- 2026-02-26 — Pages from the Anthropic website and the company's logos displayed on a computer screen in New York. (1 source: Fox News)
- 2026 (earlier this year) — Anthropic clashed with the Department of War over restrictions on lethal operations. (1 source: Fox News)
- 2026 (recent weeks) — Anthropic withheld its latest model, Mythos, due to cyber vulnerability concerns. (1 source: Fox News)
- 2023 — Publication of paper by Amanda Askell and co-authors on AI discrimination. (1 source: Fox News)
- 2021 — Amanda Askell joined Anthropic (two years before the 2023 paper). (1 source: Fox News)
Potential Impact
1. Potential shifts in AI ethics debates, influencing how companies train models for bias correction.
2. Increased scrutiny on Anthropic's military applications, affecting future government contracts.
3. Broader industry adoption of human-input overcorrection techniques for addressing AI stereotypes.
4. Regulatory responses to AI in national security, possibly leading to new guidelines.
Related Stories
cert.europa.eu — Anthropic Probes Possible Unauthorized Access to Mythos AI Model in Vendor Environment
Anthropic is probing reports of unauthorized access to its Mythos AI model, shared with select companies for vulnerability detection. The incident occurred on the model's release day, involving a Discord group. Separately, Microsoft reduced Game Pass prices but delayed day-one ac…
marketscreener.com — Meta Installs Tracking Software on U.S. Employees' Computers for AI Training
Meta is deploying software to monitor U.S. employees' computer interactions, including mouse movements and keystrokes, to gather data for AI model training. The initiative aims to improve AI agents' ability to perform white-collar tasks. Safeguards are in place to protect sensiti…
CNBC — Trump Discusses Potential Anthropic AI Deal for Defense Department Amid Iran Ceasefire Deadline
President Donald Trump stated a deal may allow Anthropic's AI models for Department of Defense use following recent White House talks. Separately, he noted restocking by Iran and the U.S. during a ceasefire set to expire tomorrow. Iran prepares new battlefield measures if fightin…