LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents Paper • 2605.29559 • Published 26 days ago • 17
WebWorld: A Large-Scale World Model for Web Agent Training Paper • 2602.14721 • Published Feb 16 • 19
DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder Paper • 2602.00592 • Published Jan 31 • 2
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7, 2025 • 190
view article Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 79
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design Paper • 2506.04734 • Published Jun 5, 2025 • 21
Krikri: Advancing Open Large Language Models for Greek Paper • 2505.13772 • Published May 19, 2025 • 6
Krikri 8B Collection Advanced Open LLM for Greek based on Llama-3.1 8B • 4 items • Updated Feb 12 • 4
ILSP Greek Evaluation Suite Collection A collection of test sets for evaluating base and chat LLMs (incl. VLMs) on Greek generation and understanding capabilities • 23 items • Updated 7 days ago • 7
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs Paper • 2503.11751 • Published Mar 14, 2025 • 17
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM +2 ariG23498, merve, pcuenq, reach-vb • Mar 12, 2025 • 497
Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs Paper • 2502.14561 • Published Feb 20, 2025 • 2