Training LLMs to reason by making them play zero-sum games with each other @ Tigra | Moera


Tigra
27-07-2025 18:59

Training LLMs to reason by making them play zero-sum games with each other

https://open.substack.com/pub/machinelearningatscale/p/doing-rl-without-the-costly-training

Doing RL without the costly training data!

