WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces • Paper • 2606.09426 • Published 23 days ago • 104
Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents • Paper • 2603.12634 • Published Mar 13 • 16