SottoASR Transcript Cleanup — LFM2.5-350M (Full Precision, soup_30)

sottoasr.app · MLX 5-bit (recommended) · MLX 4-bit (smaller) · Training Dataset

Overview

Full-precision bf16 fine-tune of LiquidAI/LFM2.5-350M-Base for on-device speech-to-text transcript cleanup. This is the training artifact — for on-device deployment on Apple Silicon, use the 5-bit MLX variant.

What's new (model soup release)

This model is a weight-space average of two strong checkpoints from the same fine-tuning lineage:

  • 0.3 × v55 (latest; 2-epoch refinement at lr 2e-6): strongest on number accuracy and filler stripping
  • 0.7 × v51 (the prior production model): strongest on the adversarial benchmark under sampling

Linear interpolation in weight space, θ = α·θ_v55 + (1-α)·θ_v51 with α = 0.3, is sometimes called "model souping". It works here because v55 was chained from v51 (same architecture, related minima). The soup recovers v51's strength on the sampled benchmark (86.0% vs 82.6% for v55) and keeps v55's number accuracy (96.5%), at the cost of a slightly lower filler-free rate on long inputs (71.8% vs 72.6% for v55). The full recipe sweep is in the research journal (2026-05-06 loop).
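
The interpolation can be sketched in a few lines. This is a minimal illustration, assuming both checkpoints expose identically keyed parameter dictionaries (for example PyTorch state dicts); the function name `soup_weights` and the toy values are hypothetical and not taken from the training code.

```python
def soup_weights(theta_a, theta_b, alpha):
    """Return alpha * theta_a + (1 - alpha) * theta_b for every parameter.

    Works with any values that support scalar multiplication and addition,
    e.g. Python floats or torch tensors of matching shape.
    """
    if theta_a.keys() != theta_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: alpha * theta_a[name] + (1.0 - alpha) * theta_b[name]
        for name in theta_a
    }


# Toy stand-ins for the v55 and v51 parameters (hypothetical values).
theta_v55 = {"layer.weight": 1.0, "layer.bias": 0.0}
theta_v51 = {"layer.weight": 3.0, "layer.bias": 1.0}
souped = soup_weights(theta_v55, theta_v51, alpha=0.3)
```

With real bf16 checkpoints one would likely upcast each tensor to float32 before averaging and cast back afterwards; that detail is omitted here.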

Headline numbers (production-mode eval: max_new_tokens=900, repetition_penalty=1.05)

| Capability | v36 | v45 | v51 | v55 | soup (this) |
| --- | --- | --- | --- | --- | --- |
| Number accuracy (171-sample stratified val) | 12.9% | 95.9% | 95.3% | 96.5% | 96.5% |
| 66-case adversarial benchmark (greedy) | n/a | 76% | 84.8% | 84.8% | 86.4% |
| 66-case adversarial benchmark (temp 0.7 × 4) | n/a | 77% | 84.5% | 82.6% | 86.0% |
| Loops on 264 sampling-mode probes | n/a | 0 | 1 | 2 | 0 |
| Filler-free on 241 long inputs | 67.2% | 68.0% | 72.2% | 72.6% | 71.8% |
| Sub-deletion >15% on 241 long inputs | 13.3% | 13.7% | 4.6% | 5.0% | 5.0% |

Composite score (0.35×num + 0.30×bench_greedy + 0.15×bench_sample + 0.10×filler_long + 0.05×(1-sub15) + 0.05×(1-loops/N)): 89.51 at full production settings.
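
As a rough check of the formula, the sketch below plugs in the rounded soup-column values from the table. It assumes every metric is expressed as a percentage on a 0-100 scale, that the (1 - x) terms are taken on that same scale, and that N is the 264 sampling-mode probes; the function name and argument names are illustrative. With these rounded inputs the result is about 89.5, and the small gap to the reported 89.51 may come from unrounded metrics.

```python
def composite_score(num, bench_greedy, bench_sample, filler_long,
                    sub15, loops, n_probes):
    """Weighted composite on a 0-100 scale; percentage inputs are in 0-100."""
    return (
        0.35 * num
        + 0.30 * bench_greedy
        + 0.15 * bench_sample
        + 0.10 * filler_long
        + 0.05 * (100.0 - sub15)                    # share of inputs without >15% deletion
        + 0.05 * 100.0 * (1.0 - loops / n_probes)   # share of probes without loops
    )


# Soup column of the table above (rounded values; N = 264 is an assumption).
soup_score = composite_score(96.5, 86.4, 86.0, 71.8, 5.0, loops=0, n_probes=264)
print(f"{soup_score:.1f}")
```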

Training pipeline

LiquidAI/LFM2.5-350M-Base
  → SFT v23 → GRPO v23 (paragraph emission)
  → GRPO v36: full FT with substantive-deletion-aware reward
  → SFT v39: + 12.7K augmented number examples (ITN)
  → GRPO v40–v45: chained refinement, fixed reward + amplified filler penalty
  → GRPO v50 + v51: anti-loop n-gram penalty
  → GRPO v55: 2-epoch refinement at lr 2e-6 (best chained checkpoint)
  → soup: 0.3·θ_v55 + 0.7·θ_v51 (weight-space average — this model)
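
The exact form of the anti-loop n-gram penalty used in v50/v51 is not spelled out here. As a purely hypothetical illustration of the kind of signal such a penalty could build on, the sketch below flags text in which any word 5-gram repeats several times; the function name, the threshold of three repeats, and the whitespace tokenization are all assumptions.

```python
from collections import Counter


def has_ngram_loop(text, n=5, min_repeats=3):
    """Return True if any word n-gram occurs at least `min_repeats` times."""
    words = text.lower().split()
    ngrams = Counter(
        tuple(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return any(count >= min_repeats for count in ngrams.values())


# A voicemail-style loop versus an ordinary sentence (made-up examples).
looping = has_ngram_loop("please call me back at this number " * 4)
normal = has_ngram_loop("the quick brown fox jumps over the lazy dog")
```

A reward function could subtract a fixed amount whenever such a check fires, but that is a design guess rather than a description of the actual reward.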

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "juanquivilla/sotto-cleanup-lfm25-350m",
    dtype=torch.bfloat16, trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("juanquivilla/sotto-cleanup-lfm25-350m")

text = "talk about server three sixty"
prompt = f"### Input:\n{text}\n\n### Output:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=max(900, int(len(text.split()) * 1.5)),  # ≥1.5× input word count
        do_sample=False,
        repetition_penalty=1.05,                                # LFM2.5 official default
    )
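# Decode only the newly generated tokens (everything after the prompt).
output = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
# Cut at the first "###" in case the model starts another prompt section.
if "###" in output:
    output = output[:output.index("###")]
print(output.strip())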

Inference recommendations

The headline numbers above use these settings. They match the defaults on the LFM2.5 model card and are what sottoasr.app runs in production:

  • repetition_penalty=1.05: LFM2.5's official default. Critical for long inputs, where it prevents the rare voicemail-style 5-gram loops that can occur with repetition_penalty=1.0.
  • max_new_tokens = max(900, 1.5 × input_word_count): long inputs (>200 words) need headroom, and output truncated mid-sentence looks like content deletion.
  • do_sample=False (greedy) for deterministic output. If sampling is needed, use temperature=0.1, top_k=50.
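
These recommendations can be bundled into a small helper that builds the keyword arguments for `model.generate()`. This is a minimal sketch: the helper name `generation_kwargs` and its `sample` flag are hypothetical, while the values are the ones listed above.

```python
def generation_kwargs(text, sample=False):
    """Build keyword arguments for model.generate() from the settings above."""
    n_words = len(text.split())
    kwargs = {
        # At least 900 new tokens, or 1.5x the input word count if larger.
        "max_new_tokens": max(900, int(n_words * 1.5)),
        "repetition_penalty": 1.05,
        "do_sample": sample,
    }
    if sample:
        # Low-temperature sampling, only when determinism is not required.
        kwargs.update(temperature=0.1, top_k=50)
    return kwargs


short = generation_kwargs("talk about server three sixty")
long_sampled = generation_kwargs("word " * 1000, sample=True)
```

The returned dictionary can be unpacked into the call, for example `model.generate(**inputs, **generation_kwargs(text))`.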

All Variants

| Variant | Size | Use Case |
| --- | --- | --- |
| Full precision (this) | 676 MB | Training, GPU inference |
| MLX 5-bit | ~237 MB | Recommended for Apple Silicon |
| MLX 4-bit | ~195 MB | Smallest |

License

MIT
