Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents
Paper • 2606.19704
Enterprise AI and ML, Foundation Models, Responsible AI
Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines