- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 61
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 15
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 15
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
Collections including paper arxiv:2005.14165
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- Evaluating Large Language Models Trained on Code
  Paper • 2107.03374 • Published • 10
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- GPT-4 Technical Report
  Paper • 2303.08774 • Published • 7

- Attention Is All You Need
  Paper • 1706.03762 • Published • 124
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 24
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 252

- Neural Machine Translation by Jointly Learning to Align and Translate
  Paper • 1409.0473 • Published • 7
- Attention Is All You Need
  Paper • 1706.03762 • Published • 124
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 29
- Hierarchical Reasoning Model
  Paper • 2506.21734 • Published • 54

- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
  Paper • 2006.03654 • Published • 3
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 29
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 10
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20

- VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
  Paper • 2602.10693 • Published • 221
- Reinforced Attention Learning
  Paper • 2602.04884 • Published • 30
- Learning to Reason in 13 Parameters
  Paper • 2602.04118 • Published • 6
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
  Paper • 2405.17604 • Published • 3

- deepseek-ai/DeepSeek-R1
  Text Generation • 685B • Updated • 5.75M • 13.4k
- Qwen/Qwen2.5-Coder-32B-Instruct
  Text Generation • 33B • Updated • 1.48M • 2.03k
- google/gemma-2-27b-it
  Text Generation • 27B • Updated • 131k • 568
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 15

- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 29
- Attention Is All You Need
  Paper • 1706.03762 • Published • 124
- Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation
  Paper • 2510.23581 • Published • 42

- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- Large Language Models Are Human-Level Prompt Engineers
  Paper • 2211.01910 • Published • 1
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 44
- Large Language Models are Zero-Shot Reasoners
  Paper • 2205.11916 • Published • 3

- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 265
- A Survey on Latent Reasoning
  Paper • 2507.06203 • Published • 95
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Paper • 1910.10683 • Published • 19