The Information: Meta has reportedly limited engineer use of Claude Code and Codex because rival model outputs could contaminate Meta’s own AI training data and create contractual trouble with Anthropic and OpenAI.
Distillation risk starts when a new model of Meta learns from another model’s outputs (from OpenAI or Anthropic), so even accidental reuse of Claude or Codex answers could look like Meta extracted capability from competitors rather than built it alone.
OpenAI’s terms bar using output to develop competing models, and Anthropic says its terms do not allow Claude outputs to train models competitive with Anthropic’s own systems.
Both OpenAI’s and Anthropic's terms bar using output to develop competing models.
IMO, the safest strategy could be ingredient tracking: use rival tools for ordinary productivity only when outputs are barred from model-training pipelines, evaluation sets, benchmark generation, post-training data, reward-model data, and internal datasets that later feed model development.
Of course a strong lawsuit usually needs much more ugly facts like: mass scraping, fake accounts, rate-limit evasion, automated extraction, direct use of outputs as training labels, or internal records showing the buyer knew it was cloning a rival system.
In this situation, som of the typical safeguards are clean-room rules, approved enterprise accounts, no consumer accounts for sensitive work, training-data provenance logs, dataset quarantine, prompt and output retention, automated scanners for “AI-generated by vendor X” material, and access controls separating coding-agent work from model-training data.