Tigra
27-07-2025 18:59 (updated 27-07-2025 18:59)
Training LLMs to reason by making them play zero-sum games with each other
https://open.substack.com/pub/machinelearningatscale/p/doing-rl-without-the-costly-training
Doing RL without the costly training data!