Every parent, teacher, or babysitter knows the moment. The lights go dim. Blankets come out. Your child asks for a song. Then another. Then suddenly you’re improvising lyrics about dinosaurs, fire trucks, and princesses while trying to convince a little one that it’s actually bedtime.
That’s exactly the problem my partner’s sister faces as a kindergarten teacher. Every day she runs nap time for fifteen 4-year-olds, and ever since they learned about music and instruments in class, it starts the same way: "sing a song for me." She'd love to give each child their own song, built from whatever they love that week, but she doesn't have the time, the musical training, or a tool that could do it. So @volivers and I built one.
Introducing Lolaby — our submission to the Hugging Face Build Small Hackathon 2026, hosted by Gradio and backed by OpenBMB, OpenAI, NVIDIA, Modal, Cohere, JetBrains, and Black Forest Labs.
A child draws something they love (on screen or on paper) and a name is entered. A tiny AI then looks at the drawing, writes a personalised lullaby, and sings it back.
Everything runs locally. No cloud LLMs. No per-song API cost. No child's drawing or name ever leaves the device.
The full pipeline:
🖼️ MiniCPM-V 4.6 (1.3B) reads the drawing.
✍️ A fine-tuned Llama 3.2 3B writes the lyrics, trained on 1,500 lullabies with strict anti-boilerplate gates.
🎵 Kokoro 82M sings the result over custom DSP instruments.
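For readers who want to picture how the three stages chain together, here is a minimal sketch. It is not Lolaby's actual code: `make_lullaby`, the `Lullaby` record, and the three stand-in stage functions are hypothetical names invented for illustration, and the real model calls (MiniCPM-V, the fine-tuned Llama, Kokoro) are replaced by trivial placeholders so the example runs anywhere.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Lullaby:
    child_name: str
    description: str  # what the vision stage saw in the drawing
    lyrics: str       # what the language stage wrote
    audio: bytes      # what the voice stage produced


def make_lullaby(
    drawing: bytes,
    child_name: str,
    describe: Callable[[bytes], str],
    write_lyrics: Callable[[str, str], str],
    sing: Callable[[str], bytes],
) -> Lullaby:
    """Run the three stages in order: drawing -> description -> lyrics -> audio."""
    description = describe(drawing)
    lyrics = write_lyrics(child_name, description)
    audio = sing(lyrics)
    return Lullaby(child_name, description, lyrics, audio)


# Hypothetical stand-ins for the local models (assumptions, not real APIs).
def fake_describe(drawing: bytes) -> str:
    return "a green dinosaur"


def fake_write_lyrics(name: str, description: str) -> str:
    return f"Sleep now, {name}, with {description} by your side"


def fake_sing(lyrics: str) -> bytes:
    return lyrics.encode("utf-8")


song = make_lullaby(b"fake-image-bytes", "Mia",
                    fake_describe, fake_write_lyrics, fake_sing)
print(song.lyrics)
```

Passing the stages in as plain callables keeps each model swappable, which suits a setup where everything has to run on the local device.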
Drop a like, upvote or comment. Feedback is welcome! 🙏

Darwin V9: GPQA Diamond 90.9%, #1 on the leaderboard, with pure greedy decoding
Darwin-398B-JGOS reaches 90.9% (180/198) on GPQA Diamond, the PhD-level scientific reasoning benchmark, ranking #1 on the Hugging Face GPQA Diamond leaderboard. No self-consistency, no test-time compute scaling: this was achieved with a single greedy decode (temperature 0, single sample, max 16,384 tokens). The full eval config is published in the model card, so anyone can reproduce it. Raw reasoning, no score inflation.
The result comes from Darwin V9, a patented evolutionary model-development platform. Its core idea: it never trains a model from scratch.
Why Darwin V9 beats training from scratch
Cost & speed: no trillion-token pretraining run, no months of compute. A purpose-built, high-performance model is produced in a fraction of the time.
Reuse of proven intelligence: instead of re-learning every capability from a blank slate, it selects and combines only the strengths of already-trained, already-validated models, so results are stable and predictable.
Surgical transplantation: it identifies which neural region of which model holds which capability, at the FFN (feed-forward network) layer level, and grafts in only the segments that contribute to the target skill.
How it works:
A large model (Qwen 3.5 397B) serves as the mother model (the substrate).
Several father models specialized in reasoning, coding, and language are analyzed layer-by-layer across their FFN regions.
The segments that contribute to the target performance are extracted and transplanted into the mother model to produce a new child model.
The result is a ~400B MoE that activates only ~17B parameters per token at inference: large-model capacity with efficient inference.
If training from scratch means rebuilding everything from a blank page, Darwin V9 means precisely recombining intelligence that has already been proven. GPQA Diamond #1 is the proof.
Model: FINAL-Bench/Darwin-398B-JGOS
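To make the mother/father grafting idea concrete, here is a toy sketch. It is not the Darwin V9 method, which the post does not spell out. It assumes models are plain dictionaries of named weight lists, that FFN segments can be recognised by an `.ffn` suffix, and that a hypothetical `score` function rates how much a segment helps the target skill. `graft_ffn` and `toy_score` are invented names.

```python
from typing import Callable, Dict, List, Tuple

Weights = List[float]
Model = Dict[str, Weights]  # parameter name -> flat weights (toy representation)


def graft_ffn(
    mother: Model,
    fathers: Dict[str, Model],
    score: Callable[[str, Weights], float],
) -> Tuple[Model, Dict[str, str]]:
    """Copy the mother, then replace each FFN segment with the best-scoring donor.

    Returns the child model and a provenance map (segment -> source model).
    """
    child = {name: list(w) for name, w in mother.items()}
    provenance = {name: "mother" for name in mother}
    for name, mother_w in mother.items():
        if not name.endswith(".ffn"):
            continue  # assumption: only FFN segments are graft candidates
        best = score(name, mother_w)
        for father_name, father in fathers.items():
            candidate = father.get(name)
            if candidate is None or len(candidate) != len(mother_w):
                continue  # skip donors with no matching, same-shaped segment
            s = score(name, candidate)
            if s > best:  # keep a donor only if it beats the current best
                best = s
                child[name] = list(candidate)
                provenance[name] = father_name
    return child, provenance


# Toy data: the score is just the sum of weights (a placeholder metric).
def toy_score(name: str, weights: Weights) -> float:
    return sum(weights)


mother = {"layer0.attn": [1.0, 1.0], "layer0.ffn": [0.1, 0.2], "layer1.ffn": [0.9, 0.8]}
fathers = {
    "reasoning": {"layer0.ffn": [0.5, 0.6], "layer1.ffn": [0.1, 0.1]},
    "coding": {"layer0.ffn": [0.3, 0.3], "layer0.attn": [9.0, 9.0]},
}
child, provenance = graft_ffn(mother, fathers, toy_score)
print(provenance)
```

In this toy run only `layer0.ffn` is taken from the `reasoning` donor, and every other segment stays with the mother. A real system would need a far more careful scoring signal than a weight sum, which stands in here only so the example is runnable.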