Anthropic 2023 Paper Examines AI Model Adjustments for Bias and Discrimination
A 2023 research paper by Anthropic employees explored how AI models handle discrimination related to race and gender. The paper discussed potential overcorrections in model responses to address historical injustices, based on experiments with human input. It highlighted challenges in training AI on human-generated text while mitigating biases.
Source: Fox News. We have limited corroborating sources on this story right now; this page will update automatically as more coverage emerges.
Fox News reported on a 2023 paper by researchers at Anthropic, an AI company, that examined discrimination in AI models. The paper, co-authored by Amanda Askell and four others, analyzed how large language models respond to prompts involving race and gender.
In one experiment, a 175-billion-parameter model discriminated against Black students by 3% relative to White students in a baseline condition. With additional human input and training, the model shifted to favoring Black students by 7%. The paper noted that such overcorrections might be applied in contexts aiming to address historical injustices against marginalized groups, if compliant with local laws.
The authors stated that language models trained on human text often reflect harmful stereotypes and discrimination present in that data.
The paper observed that models can be steered to avoid bias through natural language instructions requesting unbiased responses. Anthropic has positioned its AI model Claude as focused on ethical behavior, with its constitution aiming for the model to exhibit skill, judgment, nuance, and sensitivity in decision-making.
Amanda Askell, a philosopher at Anthropic, works on finetuning and AI alignment to make models more honest and to develop positive character traits. She previously held a similar role at OpenAI.
Anthropic recently withheld its latest model, Mythos, due to its effectiveness in identifying cyber vulnerabilities. Earlier this year, the company clashed with the Department of War over restrictions on using its technology for lethal operations. A federal judge blocked a ban on Anthropic's technology for Department of War use.
Anthropic markets Claude as an ethical AI option amid ongoing debates about AI ethics and applications. The U.S. military reportedly used Claude in an operation to capture Venezuelan leader Nicolás Maduro, where the model handled requests autonomously and generated documentation.
Story Timeline
6 events
- 2026-04-22 — Current date; article publication context. (1 source: Fox News)
- 2026-02-26 — Pages from the Anthropic website and the company's logos displayed on a computer screen in New York. (1 source: Fox News)
- 2026 (earlier this year) — Anthropic clashed with the Department of War over restrictions on lethal operations. (1 source: Fox News)
- 2026 (recent weeks) — Anthropic withheld its latest model, Mythos, due to cyber vulnerability concerns. (1 source: Fox News)
- 2023 — Publication of paper by Amanda Askell and co-authors on AI discrimination. (1 source: Fox News)
- 2021 — Amanda Askell joined Anthropic (two years before the 2023 paper). (1 source: Fox News)
Potential Impact
1. Potential shifts in AI ethics debates, influencing how companies train models for bias correction.
2. Increased scrutiny on Anthropic's military applications, affecting future government contracts.
3. Broader industry adoption of human-input overcorrection techniques for addressing AI stereotypes.
4. Regulatory responses to AI in national security, possibly leading to new guidelines.
Related Stories
cert.europa.eu — Anthropic Probes Possible Unauthorized Access to Mythos AI Model in Vendor Environment
Anthropic is probing reports of unauthorized access to its Mythos AI model, shared with select companies for vulnerability detection. The incident occurred on the model's release day, involving a Discord group. Separately, Microsoft reduced Game Pass prices but delayed day-one ac…
marketscreener.com — Meta Installs Tracking Software on U.S. Employees' Computers for AI Training
Meta is deploying software to monitor U.S. employees' computer interactions, including mouse movements and keystrokes, to gather data for AI model training. The initiative aims to improve AI agents' ability to perform white-collar tasks. Safeguards are in place to protect sensiti…
CNBC — Trump Discusses Potential Anthropic AI Deal for Defense Department Amid Iran Ceasefire Deadline
President Donald Trump stated a deal may allow Anthropic's AI models for Department of Defense use following recent White House talks. Separately, he noted restocking by Iran and the U.S. during a ceasefire set to expire tomorrow. Iran prepares new battlefield measures if fightin…