hf-vision (Hugging Face for Computer Vision)

posted an update 8 days ago

Post

120

Things rarely go as we expect!

In 2017, Google released the Transformer architecture. While it was clear the model was promising, absolutely no one (including its authors) anticipated the pervasive global revolution it would create!

The authors actually viewed the Transformer as just a stepping stone for a much more ambitious project: The MultiModel.

Their ultimate goal, back in 2017, was to build a single deep learning architecture capable of jointly learning massive, diverse tasks across entirely different domains. A One Model To Learn Them All.

In fact, the MultiModel paper was published in the exact same month as Attention Is All You Need!

But history had other plans. The building block eclipsed the grand design!

So, have you heard about the MultiModel before? 😀

johko

posted an update 17 days ago

Post

125

One prompt, three answers - which model is from where?

johko/llm-blind-date

I built a little demo where you give three models (Apertus, Llama, Qwen3) the same prompt and in the end you have to guess which is which just based on their answers.

Give it a try! ;)
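
For readers curious how such a blind comparison could be wired up, here is a minimal sketch. It is not the demo's actual code: the function names `blind_lineup` and `score` and the placeholder answers are hypothetical, and the real Space presumably calls the three models instead of using fixed strings.

```python
import random


def blind_lineup(answers, seed=None):
    """Shuffle model answers and hide the model names behind letters.

    `answers` maps a model name to its answer for one shared prompt.
    Returns (lineup, key): lineup maps a letter to an answer, key maps
    the same letter to the hidden model name.
    """
    rng = random.Random(seed)
    models = list(answers)
    rng.shuffle(models)
    labels = [chr(ord("A") + i) for i in range(len(models))]
    lineup = {label: answers[model] for label, model in zip(labels, models)}
    key = dict(zip(labels, models))
    return lineup, key


def score(guesses, key):
    """Count how many letters were matched to the right model."""
    return sum(guesses.get(label) == model for label, model in key.items())


# Placeholder answers (hypothetical); a real demo would query each model.
answers = {
    "Apertus": "answer text 1",
    "Llama": "answer text 2",
    "Qwen3": "answer text 3",
}
lineup, key = blind_lineup(answers, seed=0)
```

The player only sees `lineup`; `key` stays hidden until the guesses are scored.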

blanchon

posted an update 24 days ago

Post

2629

I'm releasing OpenCS2, an 11TB dataset of around 5000 hours of Counter-Strike gameplay recordings.
- HD resolution - 1280×720 · 32 fps
- For each frame: keyboard and mouse inputs + world state (player position, velocity, weapon ...)
- HD stereo audio
- All 10 players' perspectives

https://huggingface.co/collections/blanchon/opencs2
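
To get a feel for the scale, the headline numbers can be turned into rough per-hour figures. This is back-of-the-envelope arithmetic only, under two assumptions that the post does not state: that "5000 hours" is the total recorded video duration, and that 11TB means 11 × 10^12 bytes covering everything (video, audio, inputs, world state).

```python
# Figures quoted in the post
TOTAL_BYTES = 11 * 10**12   # 11 TB, assuming decimal terabytes
TOTAL_HOURS = 5000          # "around 5000 hours", treated as exact here
FPS = 32

total_seconds = TOTAL_HOURS * 3600

# Number of video frames if all 5000 hours are recorded at 32 fps
total_frames = total_seconds * FPS

# Average storage per hour of recording, in gigabytes (10^9 bytes)
gb_per_hour = TOTAL_BYTES / TOTAL_HOURS / 10**9

# Average bitrate across all stored data, in megabits per second
avg_mbit_per_s = TOTAL_BYTES * 8 / total_seconds / 10**6

print(f"{total_frames:,} frames")
print(f"{gb_per_hour:.1f} GB per hour")
print(f"{avg_mbit_per_s:.2f} Mbit/s average")
```

Under those assumptions this works out to 576 million frames, about 2.2 GB per recorded hour, and an average of roughly 4.9 Mbit/s.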

mattmdjaga

authored a paper 3 months ago

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

Paper • 2603.15714 • Published Mar 16

mmhamdy

posted an update 5 months ago

Post

3183

The new DeepSeek Engram paper is super fun! It also integrates mHC, and I suspect they're probably releasing all these papers to make the V4 report of reasonable length😄

Here's a nice short summary from Gemini

adamm-hf

posted an update 7 months ago

Post

1419

The #1 trending AI/ML dataset today 🏆

Massive scale, diversity and end-to-end potential from NVIDIA!
nvidia/PhysicalAI-Autonomous-Vehicles

adamm-hf

posted an update 7 months ago

Post

838

The new King 👑 has arrived!

Moonshot AI now has the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking

adamm-hf

posted an update 7 months ago

Post

2889

💸🤑You don’t need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗 :
HuggingFaceTB/smol-training-playbook

merve

posted an update 8 months ago

Post

12788

deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision tokens/performance ratio
> covers 100 languages

adamm-hf

posted an update 8 months ago

Post

2369

Cool stuff these past weeks on Hugging Face! 🤗 🚀
• 📈Trackio, local-first W&B alternative
https://github.com/gradio-app/trackio/issues
• 🌍EmbeddingGemma, 300M-param, multilingual embeddings, on-device
https://huggingface.co/blog/embeddinggemma
• 💻Open LLMs in VS Code (Inference Providers)
https://x.com/reach_vb/status/1966185427582497171
• 🤖Smol2Operator GUI agents
https://huggingface.co/blog/smol2operator
• 🖼️Gradio visible watermarking
https://huggingface.co/blog/watermarking-with-gradio

merve

posted an update 9 months ago

Post

7039

large AI labs open-sourced a ton of models last week 🔥
here are a few picks, find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (Apache 2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released a 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan LongCat released a thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking

merve

posted an update 9 months ago

Post

3553

IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter, it can also do document question answering and understand multiple languages 🤯
> best part: released with Apache 2.0 license 👏 use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗

merve

posted an update 9 months ago

Post

1299

a ton of image/video generation models and LLMs from big labs 🔥

> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use 💬
> Tencent released tencent/SRPO, a high-res image generation model, and tencent/POINTS-Reader, cutting-edge OCR 📝
> ByteDance released bytedance-research/HuMo, video generation from any input ⏯️

find more models, datasets, demos here merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac

merve

posted an update 9 months ago

Post

1088

fan-favorite vision LM Florence-2 is now officially supported in transformers 🤗

find all the models in the florence-community org 🫡

merve

posted an update 9 months ago

Post

1879

past week was great for open LLMs 🔥 merve/sep-1-releases-68bede0e729c12597eefd050

> Google released google/embeddinggemma-300m, new embedding model with 300M params
> new update to Kimi-K2 just landed moonshotai/Kimi-K2-Instruct-0905 😍
> OpenBMB released a new version of MiniCPM with 8B params openbmb/MiniCPM4.1-8B

also soooo many Qwen-Image & Kontext LoRAs dropped!

merve

posted an update 9 months ago

Post

3774

upgrade your transformers 🔥
it comes with insanely capable models like merve/sam2-66ac9deac6fca3bc5482fe30, microsoft/kosmos-2.5, and more 🫡
I built a notebook you can run with free Colab T4 to walk through the API for new models 🙋🏻‍♀️ merve/smol-vision

fine-tuning will follow up soon!

merve

posted an update 9 months ago

Post

6332

large AI labs have dropped so many open models last week 🔥 don't miss out on them

→ Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
→ OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
→ MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B

find more here: https://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486

merve

posted an update 9 months ago

Post

6120

first vision language model built off openai/gpt-oss-20b just dropped! 🔥

InternVL3.5 comes with 32 models 🤯 pre-trained, fine-tuned, aligned in various sizes OpenGVLab/internvl35-68ac87bd52ebe953485927fb
comes with gpt-oss or Qwen3 for LLM part ⤵️

mattmdjaga

authored 2 papers 10 months ago

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition

Paper • 2507.20526 • Published Jul 28, 2025 • 1

Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems

Paper • 2504.07831 • Published Apr 10, 2025

Team members 53

hf-vision's activity