
Anthropic 2023 Paper Examines AI Model Adjustments for Bias and Discrimination

A 2023 research paper by Anthropic employees explored how AI models handle discrimination related to race and gender. The paper discussed potential overcorrections in model responses to address historical injustices, based on experiments with human input. It highlighted challenges in training AI on human-generated text while mitigating biases.

Fox News
1 source · Apr 22, 2:54 PM (2 hrs ago) · 1m read

Fox News reported on a 2023 paper by researchers at Anthropic, an AI company, that examined discrimination in AI models. The paper, co-authored by Amanda Askell and four others, analyzed how large language models respond to prompts involving race and gender.

In one experiment, a 175-billion-parameter model showed 3% discrimination against Black students compared with White students in a baseline condition. With additional human input and training, the model shifted to favor Black students by 7%. The paper noted that such overcorrections might be applied in contexts aiming to address historical injustices against marginalized groups, where compliant with local laws.

The authors stated that language models trained on human text often reflect harmful stereotypes and discrimination present in that data.

The paper observed that models can be steered to avoid bias through natural language instructions requesting unbiased responses. Anthropic has positioned its AI model Claude as focused on ethical behavior, with its constitution aiming for the model to exhibit skill, judgment, nuance, and sensitivity in decision-making.

Amanda Askell, a philosopher at Anthropic, works on fine-tuning and AI alignment to make models more honest and to develop positive character traits. She previously held a similar role at OpenAI.

Anthropic recently withheld its latest model, Mythos, because of its effectiveness at identifying cyber vulnerabilities. Earlier this year, the company clashed with the Department of War over restrictions on using its technology for lethal operations; a federal judge blocked a ban on Department of War use of Anthropic's technology.

Anthropic markets Claude as an ethical AI option amid ongoing debates about AI ethics and applications. The U.S. military reportedly used Claude in an operation to capture Venezuelan leader Nicolás Maduro, where it handled requests autonomously and generated documentation.

Key Facts

2023 AI Paper on Discrimination
Anthropic researchers argued intentional overcorrection in AI could address historical injustices, with experiments showing a shift from 3% discrimination against Black students to a 7% preference for them.
Anthropic's Ethical Marketing
Claude AI positioned as ethical, with constitution aiming for virtue and nuance in decision-making.
Military Use of Claude
U.S. military deployed Claude autonomously in operation capturing Nicolás Maduro, handling thousands of requests and generating documentation.
Government Clashes
Anthropic clashed with the Department of War; a federal judge blocked a Trump administration ban on its use.
Model Withholding
Anthropic withheld Mythos model due to effectiveness in finding cyber vulnerabilities.

Story Timeline

5 events
  1. 2026-04-22

    Article publication date.

    1 source: Fox News
  2. 2026 (earlier this year)

    Anthropic clashed with the Department of War over restrictions on lethal operations.

    1 source: Fox News
  3. 2026 (recent weeks)

    Anthropic withheld its latest model, Mythos, due to cyber vulnerability concerns.

    1 source: Fox News
  4. 2023

    Publication of the paper by Amanda Askell and co-authors on AI discrimination.

    1 source: Fox News
  5. 2021

    Amanda Askell joined Anthropic (two years before the 2023 paper).

    1 source: Fox News

Potential Impact

  1. Potential shifts in AI ethics debates, influencing how companies train models for bias correction.

  2. Increased scrutiny of Anthropic's military applications, affecting future government contracts.

  3. Broader industry adoption of human-input overcorrection techniques for addressing AI stereotypes.

  4. Regulatory responses to AI in national security, possibly leading to new guidelines.

Transparency Panel

Sources cross-referenced: 1
Framing risk: 0/100 (low)
Confidence score: 65%
Synthesized by: Substrate AI
Word count: 302 words
Published: Apr 22, 2026, 2:54 PM
Bias signals removed: 3 across 3 outlets
Signal breakdown: Loaded 2 · Framing 1

Related Stories

Anthropic Probes Possible Unauthorized Access to Mythos AI Model in Vendor Environment
ai · 27 min ago

Anthropic is probing reports of unauthorized access to its Mythos AI model, shared with select companies for vulnerability detection. The incident occurred on the model's release day, involving a Discord group. Separately, Microsoft reduced Game Pass prices but delayed day-one ac…

8 sources, including CBS News, BBC News, and cert.europa.eu
Meta Installs Tracking Software on U.S. Employees' Computers for AI Training
ai · 27 min ago

Meta is deploying software to monitor U.S. employees' computer interactions, including mouse movements and keystrokes, to gather data for AI model training. The initiative aims to improve AI agents' ability to perform white-collar tasks. Safeguards are in place to protect sensiti…

4 sources, including marketscreener.com, thenextweb.com, iphoneincanada.ca, and indiatoday.intoday.in
Trump Discusses Potential Anthropic AI Deal for Defense Department Amid Iran Ceasefire Deadline
ai · 1 day ago

President Donald Trump stated a deal may allow Anthropic's AI models for Department of Defense use following recent White House talks. Separately, he noted restocking by Iran and the U.S. during a ceasefire set to expire tomorrow. Iran prepares new battlefield measures if fightin…

7 sources, including CNBC