Meta's Lucas Beyer asks whether any serious open-source LLM pretraining codebase exists that is not derived from NVIDIA's Megatron-LM

Andrew Carr proposed PyTorch's torchtitan as a potential alternative

Original post

Lucas Beyer (bl16)@giffmana

Each time i look at some open pretrain codebase, and i look under the hood, it ends up being a fork of megatron-lm.

Question: Is there any serious open source pretraining codebase that is not a megatron-lm derivative?

12:18 PM · Jun 29, 2026 · 30.1K Views

Sentiment

Users praise torchtitan and FSDP2 as strong alternatives to Megatron-LM for open-source LLM pretraining, citing precise control and a positive overall experience.

Positive: 100.0% · Negative: 0.0% (2 comments with sentiment)

Posts from X

Luca Soldaini 🇰🇷 ICML 2026@soldni

@giffmana Olmo Core is from scratch, @mechanicaldirk, @epwalsh, @tyleraromero & @AkshitaB93 main contributors.

Scaled up to dense 70B models + hybrid arch, MoE not in yet afaik.

https://github.com/allenai/OLMo-core

Stella Biderman@BlancheMinerva

@giffmana In terms of use by people other than the org that created the library, I’m under the impression that the list is (high to low):
- Megatron(-lm/-DeepSpeed/NeMO)
- GPT-NeoX (Megatron-based but has moderate divergence by now)
- TorchTitan
- Lingua

Lucas Beyer (bl16)@giffmana

@andrew_n_carr Yeah i liked torchtitan last time i looked at it! Afaik it wasn't used in any bigger open model training yet though, right?

Andrew Carr 🤸@andrew_n_carr

@giffmana torchtitan doesn't build on megatron, I believe, but I'm not sure if it fits your definition of `serious` here

elie@eliebakouch

@giffmana torchtitan, olmo-core are great!

also worth noting that i think the nvidia team doesn't use megatron-lm to train models anymore, they use megatron bridge (which is based on megatron-core, a submodule of megatron-lm)

Lucas Beyer (bl16)@giffmana

@BlancheMinerva nice list, yeah the answers seem to be converging to these four so far, plus marin/levanter for jax.

Lucas Beyer (bl16)@giffmana

@soldni @mechanicaldirk @epwalsh @tyleraromero @AkshitaB93 Ah nice, thank you, i will have a look at it!

arcjax@arcjax7

@giffmana https://github.com/google-research/big_vision

jellybean ❄️@jdchawla29

@giffmana @stochasticchasm open the goose

Lucas Beyer (bl16)@giffmana

@jeandut14000 Good suggestion, i should read it! Has it been used to train an open weights model? (Serious question, idk much about it)

Stella Biderman@BlancheMinerva

@giffmana @soldni has anyone outside of AI2 trained a 7B+ model on OLMo-core?

TimDarcet@TimDarcet

@giffmana I have something for you but it is (sadly) not open-source :/

Juan@jeandut14000

@giffmana Lingua ?

Stella Biderman@BlancheMinerva

@giffmana @jeandut14000 The models from paper were trained on it because some of our collaborators wanted to. It ended up being like 10% slower than if we had used GPT-NeoX though.

https://arxiv.org/abs/2506.05209

Juan@jeandut14000

@giffmana I know at least @ZeyuanAllenZhu is using it to showcase his infamous Canon layers but we are in the realm of very small model sizes intended for research purposes (<=8B). I use it quite heavily myself it is a good base to fork.

Luca Soldaini 🇰🇷 ICML 2026@soldni

@BlancheMinerva @giffmana yes, lemme DM you

Sean Cantrell@ThePremiseOfIt

@giffmana Built my own for our purposes. Needed a more extensible harness for the weird architectures we've made.

NA@hackpert

@giffmana @andrew_n_carr Arcee's Trinity Large and Solar 102B do!

Marco Ciccone@mciccone_AI

@giffmana torchtitan is great! I am not up to date regarding MoEs performance compared to Megatron, but my experience is positive in general. There is also an AMD-optimized fork. https://github.com/pytorch/torchtitan

Lucas Beyer (bl16)@giffmana

@TimDarcet Yeah was mostly looking at public things, but you got me curious anyways, ping me the link on corp :)
