arxiv:2509.05542

DreamPRM-1.5: Unlocking the Potential of Each Instance for Multimodal Process Reward Model Training

Published on Sep 5, 2025

Abstract

DreamPRM-1.5 is an instance-reweighted framework that uses bi-level optimization to improve multimodal process reward model training under distribution shifts and noisy data. It reaches 84.6% accuracy on the MMMU benchmark.

Training multimodal process reward models (PRMs) is challenged by distribution shifts and noisy data. We introduce DreamPRM-1.5, an instance-reweighted framework that adaptively adjusts the importance of each training example via bi-level optimization. We design two complementary strategies: Instance Table, effective for smaller datasets, and Instance Net, scalable to larger ones. Integrated into test-time scaling, DreamPRM-1.5 achieves 84.6% accuracy on the MMMU benchmark, surpassing GPT-5.
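The abstract does not give the update rule. To illustrate what "adaptively adjusts the importance of each training example via bi-level optimization" can look like in an Instance Table setting, here is a minimal sketch that keeps one learnable logit per training example. It updates those logits with a one-step lookahead, a common approximation of the bi-level problem. Everything here is an assumption for illustration and is not the DreamPRM-1.5 implementation. A toy linear-regression model stands in for the PRM, and a small clean meta set stands in for the upper-level data. The function name `reweight_with_instance_table`, the simulated label noise, and all hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def reweight_with_instance_table(X, y, X_meta, y_meta, steps=300, lr=0.1, meta_lr=10.0):
    """Hypothetical toy bi-level reweighting with one logit per training example.

    Lower level: linear model theta minimizes the weighted squared error on the
    training set. Upper level: per-instance logits (the "instance table") are
    updated so that a one-step lookahead of theta lowers the squared error on a
    clean meta set. This is an illustrative approximation, not the paper's method.
    """
    n, d = X.shape
    m = X_meta.shape[0]
    theta = np.zeros(d)
    logits = np.zeros(n)  # instance table: one trainable logit per example

    for _ in range(steps):
        w = sigmoid(logits)
        residual = X @ theta - y                  # shape (n,)
        per_example_grad = residual[:, None] * X  # g_i = d(0.5 * r_i^2)/d(theta), shape (n, d)

        # One-step lookahead: theta' = theta - lr * mean_i(w_i * g_i)
        theta_look = theta - lr * (w[:, None] * per_example_grad).mean(axis=0)

        # Gradient of the meta loss 0.5 * mean_j (x_j . theta' - y_j)^2 w.r.t. theta'
        meta_grad = X_meta.T @ (X_meta @ theta_look - y_meta) / m

        # Chain rule: dL_meta/dw_i = -(lr / n) * (meta_grad . g_i),
        # and dw_i/dlogit_i = w_i * (1 - w_i)
        dL_dw = -(lr / n) * (per_example_grad @ meta_grad)
        logits -= meta_lr * dL_dw * w * (1.0 - w)

        # Lower-level step on theta with the updated instance weights
        w = sigmoid(logits)
        theta -= lr * (w[:, None] * per_example_grad).mean(axis=0)

    return theta, sigmoid(logits)


# Usage on synthetic data: 20% of training targets are sign-flipped (simulated noise).
rng = np.random.default_rng(0)
d, n, m = 5, 200, 100
theta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta_true
noisy = np.zeros(n, dtype=bool)
noisy[: n // 5] = True
y[noisy] = -y[noisy]
X_meta = rng.normal(size=(m, d))
y_meta = X_meta @ theta_true

theta, weights = reweight_with_instance_table(X, y, X_meta, y_meta)
print("mean weight, clean:", weights[~noisy].mean())
print("mean weight, noisy:", weights[noisy].mean())
```

The corrupted examples should end up with a lower average weight than the clean ones, because their gradients point away from the direction that reduces the meta loss. An Instance Net variant would presumably replace the `logits` table with a small network that maps per-instance features to a weight. The number of upper-level parameters would then stop growing with the dataset size, which is consistent with the abstract's framing of Instance Table for smaller datasets and Instance Net for larger ones.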
