Meta restricts employees from using Claude Code and OpenAI Codex to prevent competitor model distillation · Digg

AI Judge changed title after evaluation, original title: "Meta bans developers from using Claude Code and OpenAI Codex to prevent distillation into its own MetaCode"

Story Overview

Meta is restricting employee access to Anthropic’s Claude Code and OpenAI’s Codex while building its internal coding model, MetaCode, to stop outputs from those tools from slipping into its training data or evaluations.

Original post

Chubby♨️@kimmonismus · #1360 in Tech

Meta is now facing the exact problem every AI company will soon face.

It wants to replace expensive external coding tools like Claude Code and Codex with its own internal system, MetaCode. But to build a better coding model, Meta has to make sure it is not accidentally training or evaluating on outputs from rival models.

That is the distillation trap: The more companies rely on frontier models to build internal AI infrastructure, the harder it becomes to prove where the intelligence actually came from.

6:48 AM · Jun 29, 2026 · 141K Views

Developer Impact

Engineers Navigate Tighter Rules

Applied AI teams at Meta must now avoid heavy reliance on the external tools they have been using to speed up work, creating friction during the transition to an in-house alternative.

Open Question

Distillation Risk Remains Unmeasured

Public details do not yet show how strictly the guidelines are enforced or whether they will actually keep rival model outputs from influencing MetaCode.

Sentiment

Many users criticized Meta's anti-distillation limits on engineers' use of Claude and Codex as hypocritical, given its open-source branding and what they see as its own unfair practices.

Positive: 18.9% · Negative: 81.1%

22 comments with sentiment.

Related links

Internal Docs Show Meta Putting Limits on Claude and Codex, Fearing Distillation (via The Information)

Posts from X

Beff (e/acc)@beffjezos

The big labs need to figure out a distillation licensing contract for their big partners otherwise they will see their token spend decrease.

Distillation approved for parts of model revenue seems like a good deal.

Why hamstring the American AI labs while China plows ahead?

Rohan Paul@rohanpaul_ai

The Information: Meta has reportedly limited engineer use of Claude Code and Codex because rival model outputs could contaminate Meta’s own AI training data and create contractual trouble with Anthropic and OpenAI.

Distillation risk starts when a new Meta model learns from another model’s outputs (from OpenAI or Anthropic), so even accidental reuse of Claude or Codex answers could look like Meta extracted capability from competitors rather than building it alone.

OpenAI’s terms bar using output to develop competing models, and Anthropic says its terms do not allow Claude outputs to train models competitive with Anthropic’s own systems.

Both OpenAI’s and Anthropic's terms bar using output to develop competing models.

IMO, the safest strategy could be ingredient tracking: use rival tools for ordinary productivity only when outputs are barred from model-training pipelines, evaluation sets, benchmark generation, post-training data, reward-model data, and internal datasets that later feed model development.

Of course, a strong lawsuit usually needs much uglier facts, such as mass scraping, fake accounts, rate-limit evasion, automated extraction, direct use of outputs as training labels, or internal records showing the buyer knew it was cloning a rival system.

In this situation, some of the typical safeguards are clean-room rules, approved enterprise accounts, no consumer accounts for sensitive work, training-data provenance logs, dataset quarantine, prompt and output retention, automated scanners for “AI-generated by vendor X” material, and access controls separating coding-agent work from model-training data.
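As a rough illustration of the "ingredient tracking" and dataset-quarantine ideas in the post above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Record` type, the origin labels, the sink names, and the `partition_for_sink` helper are hypothetical and do not describe Meta's actual tooling. It also assumes every record's origin is logged truthfully at creation time, which is the hard part in practice.

```python
from dataclasses import dataclass

# Hypothetical sinks that feed model development (illustrative names only).
RESTRICTED_SINKS = {"training", "evaluation", "benchmark", "post_training", "reward_model"}

# Hypothetical origin labels for output produced by third-party models.
EXTERNAL_MODEL_ORIGINS = {"claude_code", "codex"}


@dataclass(frozen=True)
class Record:
    text: str
    origin: str  # e.g. "human", "claude_code", "codex"; assumed to be logged at creation


def partition_for_sink(records, sink):
    """Split records into (allowed, quarantined) for the given sink.

    Records from external models are quarantined only when the sink feeds
    model development; ordinary productivity sinks accept everything.
    """
    if sink not in RESTRICTED_SINKS:
        return list(records), []
    allowed, quarantined = [], []
    for record in records:
        if record.origin in EXTERNAL_MODEL_ORIGINS:
            quarantined.append(record)
        else:
            allowed.append(record)
    return allowed, quarantined


if __name__ == "__main__":
    records = [
        Record("def add(a, b): return a + b", "human"),
        Record("def sub(a, b): return a - b", "claude_code"),
    ]
    allowed, quarantined = partition_for_sink(records, "training")
    print(len(allowed), "allowed;", len(quarantined), "quarantined")
```

A filter like this only enforces the rule at the pipeline boundary; it cannot detect external-model output that was mislabeled or pasted in without a provenance tag.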

Alexander Doria@Dorialexander

Still don't know why so many people take it as a settled matter while "distillation" is a much more fringe legal issue than training on copyright outputs. If fair use holds here, it holds everywhere.

Chubby♨️@kimmonismus

"Meta, which has been among the biggest customers of Claude Code, set up the applied AI engineering team earlier this year and has tasked it with improving Meta’s own coding assistant, MetaCode. A big part of that entails building high-quality datasets and programming challenges that engineers use to train and test its models for coding work. While Meta permits the team to use outside AI tools for some purposes, it requires engineers to design those challenges themselves and to rely on their own technical expertise rather than AI-generated concepts."

xlr8harder@xlr8harder

We should be more clear about this.

It might be time again for some stickers.

Chubby♨️@kimmonismus

Source: https://www.theinformation.com/articles/internal-docs-show-meta-putting-limits-claude-codex-fearing-distillation?rc=bfliih

Simon Willison@simonw

@Dorialexander This doesn't look to me like a fair use legal issue - it reads like a terms of service issue, where Meta have agreed to contractual terms with those third parties that say they can't use those models to train their own

Alexander Doria@Dorialexander

@simonw I mean Meta intentionally trained on Libgen (and hosted it). Everything is a risk calculation.

Simon Willison@simonw

@Dorialexander I think it's enforceable - my guess is that if there was a contract dispute between Anthropic and Meta, Anthropic could use a discovery process to access training logs and figure out what went into the data mix

Alexander Doria@Dorialexander

@simonw Yeah, different legal regime, but in both cases not really enforceable: very hard to prove data was used for training. I think we'll see the same risk estimate.

Rohan Paul@rohanpaul_ai

https://www.theinformation.com/articles/internal-docs-show-meta-putting-limits-claude-codex-fearing-distillation

Simon Willison@simonw

@Dorialexander Different issue though - that's a risk from copyright law, the thing about not using Claude/GPT-5.5 to train competing models is a risk from contract law, they have signed agreements with those companies

Gregor@bygregorr

@rohanpaul_ai already have 3 claude-written functions i keep feeding back to claude as 'my style'

Salio@Mr_Salio

@kimmonismus AI Regulation before GTA 6

Simon Willison@simonw

@Dorialexander Plus I assume that even if they couldn't prove it in a court of law Anthropic and OpenAI could still cut off Meta's future access to their models, which Meta would like to avoid

Nav Toor@heynavtoor

@kimmonismus AI companies now have to prove what they copied

Patrick Kuhnke@ku_ds17868

@kimmonismus wait metacode is a thing? just internally right?

prinz@deredleritt3r

This is correct.

There is no privity of contract between a frontier lab and the authors whose works are used to train an LLM. So, one must resolve the question of whether it is permissible to train an LLM based on copyright laws (including the doctrine of fair use).

This is *completely* different from a scenario where a contract exists between two parties that expressly prohibits distillation. If Meta distills, it breaches the contract, end of story.

Gen@ccgencc

Tbh that is the entry door to a lot of potential « distillation » claims. Let's say u use claude api to build a text to sql agent. You generate outputs and get user feedback on whether output is good/bad. Then u use those traces to post train a model. Ant can claim it's distillation
