arxiv:2401.11864

Improving Small Language Models' Mathematical Reasoning via Mix Thoughts Distillation

Published on Jan 22, 2024

Abstract

Equation-of-Thought Distillation (EoTD) and Mix Thoughts Distillation (MTD) enhance the mathematical reasoning capabilities of Small Language Models (SLMs) without sacrificing performance.

This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion-parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encodes the reasoning process as equation-based representations and uses them to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Mix Thoughts Distillation (MTD) framework to further enhance the reasoning performance of SLMs: MTD builds a reasoning dataset containing multiple thought processes and fine-tunes SLMs on it. Our experimental findings demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while MTD enables these models to achieve state-of-the-art reasoning performance.
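The abstract does not spell out the data pipeline, so the following is only a minimal sketch of one plausible reading, under stated assumptions: a teacher LLM writes a system of linear equations for each question, an external solver computes the answer, and only samples whose solution matches the gold answer are kept for fine-tuning. Here a small exact Gauss-Jordan solver written for illustration stands in for whatever solver the authors use. The record fields (`equations`, `answer_var`, `gold`), the correctness filter `keep_if_correct`, and the `[EoT]`-style format tags in `mix_thoughts` are all hypothetical, not the paper's schema.

```python
import ast
from fractions import Fraction


def _eval(node, env):
    """Exactly evaluate an arithmetic AST (+, -, *, /, unary -) with names from env."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, env)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return Fraction(str(node.value))  # str() keeps 0.5 as exactly 1/2
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, env)
    if isinstance(node, ast.BinOp):
        a, b = _eval(node.left, env), _eval(node.right, env)
        if isinstance(node.op, ast.Add):
            return a + b
        if isinstance(node.op, ast.Sub):
            return a - b
        if isinstance(node.op, ast.Mult):
            return a * b
        if isinstance(node.op, ast.Div):
            return a / b
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def _linear_row(equation, names):
    """Turn 'lhs = rhs' into (coeffs, b) with sum(coeffs[i] * names[i]) == b.

    Assumes the equation is linear, so each coefficient can be read off by
    evaluating lhs - rhs at the zero vector and at each unit vector.
    """
    lhs, rhs = equation.split("=")
    tree = ast.parse(f"({lhs}) - ({rhs})", mode="eval")
    zero = {name: Fraction(0) for name in names}
    const = _eval(tree, zero)
    coeffs = [_eval(tree, {**zero, name: Fraction(1)}) - const for name in names]
    return coeffs, -const


def solve_eot(equations):
    """Solve a square linear system exactly with Gauss-Jordan elimination."""
    names = sorted({node.id for eq in equations for side in eq.split("=")
                    for node in ast.walk(ast.parse(side.strip(), mode="eval"))
                    if isinstance(node, ast.Name)})
    rows = [coeffs + [b] for coeffs, b in (_linear_row(eq, names) for eq in equations)]
    n = len(names)
    if len(rows) != n:
        raise ValueError("expected one equation per unknown")
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]  # scale pivot row so pivot == 1
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return {name: rows[i][n] for i, name in enumerate(names)}


def keep_if_correct(record):
    """Assumed filter: keep a teacher sample only if its equations solve to the gold answer."""
    try:
        solution = solve_eot(record["equations"])
    except (ValueError, ZeroDivisionError, SyntaxError):
        return False
    return solution.get(record["answer_var"]) == Fraction(str(record["gold"]))


def mix_thoughts(question, thoughts):
    """Hypothetical MTD-style mixing: one (prompt, target) pair per thought format."""
    return [{"prompt": f"[{fmt}] {question}", "target": text}
            for fmt, text in thoughts.items()]


if __name__ == "__main__":
    record = {
        "question": "Tom has 3 times as many apples as Ann. Together they have 24. "
                    "How many apples does Tom have?",
        "equations": ["t = 3 * a", "t + a = 24"],
        "answer_var": "t",
        "gold": 18,
    }
    print(solve_eot(record["equations"]))  # {'a': Fraction(6, 1), 't': Fraction(18, 1)}
    print(keep_if_correct(record))         # True
```

In this reading, the solver rather than the student model performs the arithmetic, so the student mainly has to learn to write correct equations; exact `Fraction` arithmetic avoids floating-point mismatches when comparing against the gold answer.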


