EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery Paper • 2606.13662 • Published 4 days ago • 26
EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments Paper • 2606.13681 • Published 4 days ago • 127
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning Paper • 2606.13673 • Published 4 days ago • 86
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields Paper • 2606.11042 • Published 6 days ago • 20
SWE-Explore: Benchmarking How Coding Agents Explore Repositories Paper • 2606.07297 • Published 10 days ago • 110
When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents Paper • 2606.05806 • Published 11 days ago • 22
MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery Paper • 2606.06473 • Published 11 days ago • 19
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 11 days ago • 40
TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration Paper • 2606.04743 • Published 12 days ago • 44
BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution Paper • 2606.01286 • Published 15 days ago • 5
AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks? Paper • 2606.05080 • Published 12 days ago • 30
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts Paper • 2606.02404 • Published 14 days ago • 56
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources Paper • 2605.29250 • Published 18 days ago • 77
AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation Paper • 2605.28655 • Published 19 days ago • 12
ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations Paper • 2605.27908 • Published 19 days ago • 6
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents Paper • 2605.28775 • Published 19 days ago • 38
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence Paper • 2605.26340 • Published 21 days ago • 36