Research

Latest Posts

Duck Harness: Winning Solution for ARC-AGI-3 Milestone 1

July 1, 2026 Harold Bessis, Jeroen Cottaar, Isaiah Pressman, Andries Smit, Michal Tešnar, Stefano Viel

We open-source the duck harness, our agent for the ARC-AGI-3 challenge on Kaggle, and share a technical overview and results comparing it to other published approaches.

A Predictive Law for On-Policy Self-Distillation From World Feedback

June 8, 2026 Tommy He, Jerome Sieber*, Matteo Saponati*

We discover a simple linear law that predicts on-policy self-distillation outcomes from world feedback before running full training, holding across model scales and context types. Published at ICML 2026 RLxF Workshop.

Synthetic pretraining for very small reasoning models

April 24, 2026 Matteo Saponati

Does synthetic data improve reasoning in very small (<1B-parameter) models? Yes: a same-size generator unlocks stronger few-shot gains and better token efficiency on GSM8K and MATH500.

1st Place in the ARC-AGI-3 Preview Competition

August 21, 2025 Dries Smit

We present our winning solution for the ARC-AGI-3 Agent Preview Competition.

AlphaWrite: Inference-Time Compute Scaling for Writing

June 6, 2025 Toby Simonds

We introduce AlphaWrite, an inference-time scaling method for creative writing that uses evolutionary generation and Elo-based ranking to improve story quality.

Self-Rewarding, Self-Improving

May 12, 2025 Toby Simonds

We demonstrate that large language models can autonomously improve by judging their own solutions without reference answers, creating a complete self-learning loop that enhances performance beyond existing benchmarks.

LLMs for Engineering: Teaching Models to Design High-Powered Rockets

April 24, 2025 Toby Simonds

We demonstrate that while current SOTA language models struggle with iterative self-improvement in rocket engineering challenges, augmenting them with reinforcement learning unlocks superhuman design capabilities that could revolutionize physical engineering domains.

LADDER: Self-Improving LLMs Through Recursive Problem Decomposition

March 24, 2025 Toby Simonds

An in-depth analysis of a novel framework enabling language models to autonomously improve their problem-solving capabilities through recursive decomposition.

Don't Throw the Baby out with the Bathwater: How and why Deep Learning for ARC

March 24, 2025 Jack Cole and Mohamed Osman

Paper detailing our approach to the ARC Prize competition.

Text to RL: Extracting High-Quality RL Questions from Text

March 5, 2025 Toby Simonds

Turning textbooks into RL questions.

Research Chatter

We periodically share research updates, experiments, and publication drafts from Tufa Labs on our internal research blog.