Tigra
Training LLMs to reason by making them play zero-sum games with each other
Doing RL without the costly training data!
Introduction
OPEN.SUBSTACK.COM