TemporalMesh Transformer: 29.4 PPL at 48% compute — beats Mamba, new open-source architecture
#4 opened 6 days ago by vigneshwar234
Fix chat_template crash when assistant message omits the `content` key
#3 opened about 1 month ago by qgallouedec
DeepSeek 3.2
#2 opened 6 months ago by imagenaryjack
Update config.json
#1 opened 10 months ago by mmangkad