Latest Posts
-
Synthetic pretraining for very small reasoning models.
Does synthetic data improve reasoning in very small (<1B-parameter) reasoning models? Yes: a same-size generator unlocks stronger few-shot gains and better token efficiency on GSM8K and MATH500.
-
1st Place in the ARC-AGI-3 Preview Competition (External)
We present our winning solution for the ARC-AGI-3 Agent Preview Competition.
-
AlphaWrite: Inference-Time Compute Scaling for Writing
We introduce AlphaWrite, an inference-time scaling method for creative writing that uses evolutionary generation and Elo-based ranking to improve story quality.
-
Self-Rewarding, Self-Improving (External)
We demonstrate that large language models can autonomously improve by judging their own solutions without reference answers, creating a complete self-learning loop that enhances performance beyond existing benchmarks.
-
LLMs for Engineering: Teaching Models to Design High-Powered Rockets (External)
We demonstrate that while current SOTA language models struggle with iterative self-improvement in rocket engineering challenges, augmenting them with reinforcement learning unlocks superhuman design capabilities that could revolutionize physical engineering domains.
-
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition (External)
An in-depth analysis of a novel framework enabling language models to autonomously improve their problem-solving capabilities through recursive decomposition.
-
Don't Throw the Baby out with the Bathwater: How and why Deep Learning for ARC (External)
Paper detailing our approach to the ARC-prize competition.
-
Text to RL: Extracting High-Quality RL Questions from Text
Turning textbooks into RL questions.
Research Chatter
We periodically share research updates, experiments, and publication drafts from Tufa Labs on our internal research blog.