[2024 Year-End Review] The AI Papers Most Worth Reading in 2024

ccvgpt, 2025-01-24 10:54:35, Basic Tutorials

Now that 2024 has just ended, which of its papers are worth reading more than once?

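Sebastian Raschka, a well-known machine learning and AI researcher, has compiled a reading list on LLMs (LLM Research Papers: The 2024 List) that details, month by month, the important papers that were published.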

Original link: https://sebastianraschka.com/blog/2024/llm-research-papers-the-2024-list.html

January 2024

1 Jan, Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models, https://arxiv.org/abs/2401.00788

2 Jan, A Comprehensive Study of Knowledge Editing for Large Language Models, https://arxiv.org/abs/2401.01286

2 Jan, LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, https://arxiv.org/abs/2401.01325

2 Jan, Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models, https://arxiv.org/abs/2401.01335

2 Jan, LLaMA Beyond English: An Empirical Study on Language Capability Transfer, https://arxiv.org/abs/2401.01055

3 Jan, A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity, https://arxiv.org/abs/2401.01967

4 Jan, LLaMA Pro: Progressive LLaMA with Block Expansion, https://arxiv.org/abs/2401.02415

4 Jan, LLM Augmented LLMs: Expanding Capabilities through Composition, https://arxiv.org/abs/2401.02412

4 Jan, Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM, https://arxiv.org/abs/2401.02994

5 Jan, DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, https://arxiv.org/abs/2401.02954

5 Jan, Denoising Vision Transformers, https://arxiv.org/abs/2401.02957

7 Jan, Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon, https://arxiv.org/abs/2401.03462

8 Jan, Mixtral of Experts, https://arxiv.org/abs/2401.04088

8 Jan, MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts, https://arxiv.org/abs/2401.04081

8 Jan, A Minimaximalist Approach to Reinforcement Learning from Human Feedback, https://arxiv.org/abs/2401.04056

9 Jan, RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation, https://arxiv.org/abs/2401.04679

10 Jan, Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, https://arxiv.org/abs/2401.05566

11 Jan, Transformers are Multi-State RNNs, https://arxiv.org/abs/2401.06104

11 Jan, A Closer Look at AUROC and AUPRC under Class Imbalance, https://arxiv.org/abs/2401.06091

12 Jan, An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models, https://arxiv.org/abs/2401.06692

16 Jan, Tuning Language Models by Proxy, https://arxiv.org/abs/2401.08565

16 Jan, Scalable Pre-training of Large Autoregressive Image Models, https://arxiv.org/abs/2401.08541

16 Jan, Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering, https://arxiv.org/abs/2401.08500

16 Jan, RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture, https://arxiv.org/abs/2401.08406

17 Jan, ReFT: Reasoning with Reinforced Fine-Tuning, https://arxiv.org/abs/2401.08967

18 Jan, DiffusionGPT: LLM-Driven Text-to-Image Generation System, https://arxiv.org/abs/2401.10061

18 Jan, Self-Rewarding Language Models, https://arxiv.org/abs/2401.10020

18 Jan, VMamba: Visual State Space Model, https://arxiv.org/abs/2401.10166

19 Jan, Knowledge Fusion of Large Language Models, https://arxiv.org/abs/2401.10491

22 Jan, SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities, https://arxiv.org/abs/2401.12168

22 Jan, WARM: On the Benefits of Weight Averaged Reward Models, https://arxiv.org/abs/2401.12187

22 Jan, Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text, https://arxiv.org/abs/2401.12070

24 Jan, MambaByte: Token-free Selective State Space Model, https://arxiv.org/abs/2401.13660

24 Jan, SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection, https://arxiv.org/abs/2401.13160

25 Jan, Rethinking Patch Dependence for Masked Autoencoders, https://arxiv.org/abs/2401.14391

25 Jan, Pix2gestalt: Amodal Segmentation by Synthesizing Wholes, https://arxiv.org/abs/2401.14398

25 Jan, Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities, https://arxiv.org/abs/2401.14405

26 Jan, EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty, https://arxiv.org/abs/2401.15077

29 Jan, MoE-LLaVA: Mixture of Experts for Large Vision-Language Models, https://arxiv.org/abs/2401.15947

29 Jan, Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling, https://arxiv.org/abs/2401.16380

31 Jan, KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization, https://arxiv.org/abs/2401.18079

February 2024

1 Feb, Efficient Exploration for LLMs, https://arxiv.org/abs/2402.00396

1 Feb, OLMo: Accelerating the Science of Language Models, https://arxiv.org/abs/2402.00838

1 Feb, Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?, https://arxiv.org/abs/2402.00841

1 Feb, Repeat After Me: Transformers are Better than State Space Models at Copying, https://arxiv.org/abs/2402.01032

2 Feb, LiPO: Listwise Preference Optimization through Learning-to-Rank, https://arxiv.org/abs/2402.01878

2 Feb, FindingEmo: An Image Dataset for Emotion Recognition in the Wild, https://arxiv.org/abs/2402.01355

3 Feb, More Agents Is All You Need, https://arxiv.org/abs/2402.05120

5 Feb, DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models, https://arxiv.org/abs/2402.03300

6 Feb, MobileVLM V2: Faster and Stronger Baseline for Vision Language Model, https://arxiv.org/abs/2402.03766

6 Feb, A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention, https://arxiv.org/abs/2402.03902

6 Feb, Scaling Laws for Downstream Task Performance of Large Language Models, https://arxiv.org/abs/2402.04177

6 Feb, MOMENT: A Family of Open Time-series Foundation Models, https://arxiv.org/abs/2402.03885

6 Feb, Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models, https://arxiv.org/abs/2402.03749

6 Feb, Self-Discover: Large Language Models Self-Compose Reasoning Structures, https://arxiv.org/abs/2402.03620

7 Feb, Grandmaster-Level Chess Without Search, https://arxiv.org/abs/2402.04494

7 Feb, Direct Language Model Alignment from Online AI Feedback, https://arxiv.org/abs/2402.04792

8 Feb, Buffer Overflow in Mixture of Experts, https://arxiv.org/abs/2402.05526

9 Feb, The Boundary of Neural Network Trainability is Fractal, https://arxiv.org/abs/2402.06184

11 Feb, ODIN: Disentangled Reward Mitigates Hacking in RLHF, https://arxiv.org/abs/2402.07319

12 Feb, Policy Improvement using Language Feedback Models, https://arxiv.org/abs/2402.07876

12 Feb, Scaling Laws for Fine-Grained Mixture of Experts, https://arxiv.org/abs/2402.07871

12 Feb, Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model, https://arxiv.org/abs/2402.07827

12 Feb, Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping, https://arxiv.org/abs/2402.07610

12 Feb, Suppressing Pink Elephants with Direct Principle Feedback, https://arxiv.org/abs/2402.07896

13 Feb, World Model on Million-Length Video And Language With RingAttention, https://arxiv.org/abs/2402.08268

13 Feb, Mixtures of Experts Unlock Parameter Scaling for Deep RL, https://arxiv.org/abs/2402.08609

14 Feb, DoRA: Weight-Decomposed Low-Rank Adaptation, https://arxiv.org/abs/2402.09353

14 Feb, Transformers Can Achieve Length Generalization But Not Robustly, https://arxiv.org/abs/2402.09371

15 Feb, BASE TTS: Lessons From Building a Billion-Parameter Text-to-Speech Model on 100K Hours of Data, https://arxiv.org/abs/2402.08093

15 Feb, Recovering the Pre-Fine-Tuning Weights of Generative Models, https://arxiv.org/abs/2402.10208

15 Feb, Generative Representational Instruction Tuning, https://arxiv.org/abs/2402.09906

16 Feb, FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models, https://arxiv.org/abs/2402.10986

17 Feb, OneBit: Towards Extremely Low-bit Large Language Models, https://arxiv.org/abs/2402.11295

18 Feb, LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration, https://arxiv.org/abs/2402.11550

19 Feb, Reformatted Alignment, https://arxiv.org/abs/2402.12219

19 Feb, AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling, https://arxiv.org/abs/2402.12226

19 Feb, Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs, https://arxiv.org/abs/2402.12030

19 Feb, LoRA+: Efficient Low Rank Adaptation of Large Models, https://arxiv.org/abs/2402.12354

20 Feb, Neural Network Diffusion, https://arxiv.org/abs/2402.13144

21 Feb, YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information, https://arxiv.org/abs/2402.13616

21 Feb, LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, https://arxiv.org/abs/2402.13753

21 Feb, Large Language Models for Data Annotation: A Survey, https://arxiv.org/abs/2402.13446

22 Feb, TinyLLaVA: A Framework of Small-scale Large Multimodal Models, https://arxiv.org/abs/2402.14289

22 Feb, Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs, https://arxiv.org/abs/2402.14740

23 Feb, Genie: Generative Interactive Environments, https://arxiv.org/abs/2402.15391

26 Feb, CARTE: Pretraining and Transfer for Tabular Learning, https://arxiv.org/abs/2402.16785

27 Feb, The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, https://arxiv.org/abs/2402.17764

27 Feb, Sora Generates Videos with Stunning Geometrical Consistency, https://arxiv.org/abs/2402.17403

27 Feb, When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method, https://arxiv.org/abs/2402.17193

29 Feb, Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models, https://arxiv.org/abs/2402.19427

March 2024

1 Mar, Learning and Leveraging World Models in Visual Representation Learning, https://arxiv.org/abs/2403.00504

3 Mar, Improving LLM Code Generation with Grammar Augmentation, https://arxiv.org/abs/2403.01632

3 Mar, The Hidden Attention of Mamba Models, https://arxiv.org/abs/2403.01590

4 Mar, Training-Free Pretrained Model Merging, https://arxiv.org/abs/2403.01753

4 Mar, Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures, https://arxiv.org/abs/2403.02308

5 Mar, The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning, https://arxiv.org/abs/2403.03218

5 Mar, Evolution Transformer: In-Context Evolutionary Optimization, https://arxiv.org/abs/2403.02985

5 Mar, Enhancing Vision-Language Pre-training with Rich Supervisions, https://arxiv.org/abs/2403.03346

5 Mar, Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, https://arxiv.org/abs/2403.03206

5 Mar, Design2Code: How Far Are We From Automating Front-End Engineering?, https://arxiv.org/abs/2403.03163

6 Mar, ShortGPT: Layers in Large Language Models are More Redundant Than You Expect, https://arxiv.org/abs/2403.03853

6 Mar, Backtracing: Retrieving the Cause of the Query, https://arxiv.org/abs/2403.03956

6 Mar, Learning to Decode Collaboratively with Multiple Language Models, https://arxiv.org/abs/2403.03870

6 Mar, SaulLM-7B: A pioneering Large Language Model for Law, https://arxiv.org/abs/2403.03883

6 Mar, Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning, https://arxiv.org/abs/2403.03864

6 Mar, 3D Diffusion Policy, https://arxiv.org/abs/2403.03954

6 Mar, MedMamba: Vision Mamba for Medical Image Classification, https://arxiv.org/abs/2403.03849

6 Mar, GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection, https://arxiv.org/abs/2403.03507

6 Mar, Stop Regressing: Training Value Functions via Classification for Scalable Deep RL, https://arxiv.org/abs/2403.03950

7 Mar, How Far Are We from Intelligent Visual Deductive Reasoning?, https://arxiv.org/abs/2403.04732

7 Mar, Common 7B Language Models Already Possess Strong Math Capabilities, https://arxiv.org/abs/2403.04706

8 Mar, Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context, https://arxiv.org/abs/2403.05530

8 Mar, Is Cosine-Similarity of Embeddings Really About Similarity?, https://arxiv.org/abs/2403.05440

8 Mar, LLM4Decompile: Decompiling Binary Code with Large Language Models, https://arxiv.org/abs/2403.05286

9 Mar, Algorithmic Progress in Language Models, https://arxiv.org/abs/2403.05812

11 Mar, Stealing Part of a Production Language Model, https://arxiv.org/abs/2403.06634

12 Mar, Chronos: Learning the Language of Time Series, https://arxiv.org/abs/2403.07815

13 Mar, Simple and Scalable Strategies to Continually Pre-train Large Language Models, https://arxiv.org/abs/2403.08763

13 Mar, Language Models Scale Reliably With Over-Training and on Downstream Tasks, https://arxiv.org/abs/2403.08540

14 Mar, BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences, https://arxiv.org/abs/2403.09347

14 Mar, LocalMamba: Visual State Space Model with Windowed Selective Scan, https://arxiv.org/abs/2403.09338

14 Mar, GiT: Towards Generalist Vision Transformer through Universal Language Interface, https://arxiv.org/abs/2403.09394

14 Mar, MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training, https://arxiv.org/abs/2403.09611

15 Mar, RAFT: Adapting Language Model to Domain Specific RAG, https://arxiv.org/abs/2403.10131

18 Mar, TnT-LLM: Text Mining at Scale with Large Language Models, https://arxiv.org/abs/2403.12173

18 Mar, Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression, https://arxiv.org/abs/2403.15447

19 Mar, PERL: Parameter Efficient Reinforcement Learning from Human Feedback, https://arxiv.org/abs/2403.10704

20 Mar, RewardBench: Evaluating Reward Models for Language Modeling, https://arxiv.org/abs/2403.13787

20 Mar, LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models, https://arxiv.org/abs/2403.13372

21 Mar, RakutenAI-7B: Extending Large Language Models for Japanese, https://arxiv.org/abs/2403.15484

22 Mar, SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time Series, https://arxiv.org/abs/2403.15360

22 Mar, Can Large Language Models Explore In-Context?, https://arxiv.org/abs/2403.15371

22 Mar, LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement, https://arxiv.org/abs/2403.15042

25 Mar, LLM Agent Operating System, https://arxiv.org/abs/2403.16971

26 Mar, The Unreasonable Ineffectiveness of the Deeper Layers, https://arxiv.org/abs/2403.17887

27 Mar, BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text, https://arxiv.org/abs/2403.18421

27 Mar, ViTAR: Vision Transformer with Any Resolution, https://arxiv.org/abs/2403.18361

27 Mar, Long-form Factuality in Large Language Models, https://arxiv.org/abs/2403.18802

27 Mar, Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models, https://arxiv.org/abs/2403.18814

26 Mar, LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning, https://arxiv.org/abs/2403.17919

26 Mar, Mechanistic Design and Scaling of Hybrid Architectures, https://arxiv.org/abs/2403.17844

28 Mar, MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions, https://arxiv.org/abs/2403.19651

28 Mar, Model Stock: All We Need Is Just a Few Fine-Tuned Models, https://arxiv.org/abs/2403.19522

April 2024

1 Apr, Do Language Models Plan Ahead for Future Tokens?, https://arxiv.org/abs/2404.00859

1 Apr, Bigger is not Always Better: Scaling Properties of Latent Diffusion Models, https://arxiv.org/abs/2404.01367

1 Apr, The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis, https://arxiv.org/abs/2404.01204

1 Apr, Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models, https://arxiv.org/abs/2404.04478

2 Apr, Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models, https://arxiv.org/abs/2404.02258

2 Apr, Long-context LLMs Struggle with Long In-context Learning, https://arxiv.org/abs/2404.02060

2 Apr, Emergent Abilities in Reduced-Scale Generative Language Models, https://arxiv.org/abs/2404.02204

2 Apr, Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks, https://arxiv.org/abs/2404.02151

3 Apr, On the Scalability of Diffusion-based Text-to-Image Generation, https://arxiv.org/abs/2404.02883

3 Apr, BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models, https://arxiv.org/abs/2404.02827

3 Apr, Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models, https://arxiv.org/abs/2404.02747

4 Apr, Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences, https://arxiv.org/abs/2404.03715

4 Apr, Training LLMs over Neurally Compressed Text, https://arxiv.org/abs/2404.03626

4 Apr, CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues, https://arxiv.org/abs/2404.03820

5 Apr, ReFT: Representation Finetuning for Language Models, https://arxiv.org/abs/2404.03592

5 Apr, Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data, https://arxiv.org/abs/2404.03862

5 Apr, Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation, https://arxiv.org/abs/2404.04256

8 Apr, AutoCodeRover: Autonomous Program Improvement, https://arxiv.org/abs/2404.05427

8 Apr, Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence, https://arxiv.org/abs/2404.05892

8 Apr, CodecLM: Aligning Language Models with Tailored Synthetic Data, https://arxiv.org/abs/2404.05875

9 Apr, MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies, https://arxiv.org/abs/2404.06395

9 Apr, Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models, https://arxiv.org/abs/2404.06209

9 Apr, LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders, https://arxiv.org/abs/2404.05961

10 Apr, Adapting LLaMA Decoder to Vision Transformer, https://arxiv.org/abs/2404.06773

10 Apr, Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, https://arxiv.org/abs/2404.07143

11 Apr, LLoCO: Learning Long Contexts Offline, https://arxiv.org/abs/2404.07979

11 Apr, JetMoE: Reaching Llama2 Performance with 0.1M Dollars, https://arxiv.org/abs/2404.07413

11 Apr, Best Practices and Lessons Learned on Synthetic Data for Language Models, https://arxiv.org/abs/2404.07503

11 Apr, Rho-1: Not All Tokens Are What You Need, https://arxiv.org/abs/2404.07965

12 Apr, Pre-training Small Base LMs with Fewer Tokens, https://arxiv.org/abs/2404.08634

12 Apr, Dataset Reset Policy Optimization for RLHF, https://arxiv.org/abs/2404.08495

13 Apr, LLM In-Context Recall is Prompt Dependent, https://arxiv.org/abs/2404.08865

15 Apr, State Space Model for New-Generation Network Alternative to Transformers: A Survey, https://arxiv.org/abs/2404.09516

15 Apr, Chinchilla Scaling: A Replication Attempt, https://arxiv.org/abs/2404.10102

15 Apr, Learn Your Reference Model for Real Good Alignment, https://arxiv.org/abs/2404.09656

16 Apr, Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study, https://arxiv.org/abs/2404.10719

16 Apr, Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies, https://arxiv.org/abs/2404.08197

16 Apr, How Faithful Are RAG Models? Quantifying the Tug-of-War Between RAG and LLMs’ Internal Prior, https://arxiv.org/abs/2404.10198

17 Apr, A Survey on Retrieval-Augmented Text Generation for Large Language Models, https://arxiv.org/abs/2404.10981

18 Apr, When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes, https://arxiv.org/abs/2404.12365

18 Apr, Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing, https://arxiv.org/abs/2404.12253

18 Apr, OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data, https://arxiv.org/abs/2404.12195

19 Apr, The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions, https://arxiv.org/abs/2404.13208

22 Apr, How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study, https://arxiv.org/abs/2404.14047

22 Apr, Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone, https://arxiv.org/abs/2404.14219

22 Apr, OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework, https://arxiv.org/abs/2404.14619

22 Apr, A Survey on Self-Evolution of Large Language Models, https://arxiv.org/abs/2404.14387

23 Apr, Multi-Head Mixture-of-Experts, https://arxiv.org/abs/2404.15045

23 Apr, NExT: Teaching Large Language Models to Reason about Code Execution, https://arxiv.org/abs/2404.14662

23 Apr, Graph Machine Learning in the Era of Large Language Models (LLMs), https://arxiv.org/abs/2404.14928

24 Apr, Retrieval Head Mechanistically Explains Long-Context Factuality, https://arxiv.org/abs/2404.15574

25 Apr, Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding, https://arxiv.org/abs/2404.16710

25 Apr, Make Your LLM Fully Utilize the Context, https://arxiv.org/abs/2404.16811

28 Apr, LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report, https://arxiv.org/abs/2405.00732

30 Apr, Better & Faster Large Language Models via Multi-token Prediction, https://arxiv.org/abs/2404.19737

30 Apr, RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing, https://arxiv.org/abs/2404.19543

30 Apr, A Primer on the Inner Workings of Transformer-based Language Models, https://arxiv.org/abs/2405.00208

30 Apr, When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively, https://arxiv.org/abs/2404.19705

30 Apr, KAN: Kolmogorov–Arnold Networks, https://arxiv.org/abs/2404.19756

May 2024

1 May, Is Bigger Edit Batch Size Always Better? An Empirical Study on Model Editing with Llama-3, https://arxiv.org/abs/2405.00664

1 May, Self-Play Preference Optimization for Language Model Alignment, https://arxiv.org/abs/2405.00675

1 May, A Careful Examination of Large Language Model Performance on Grade School Arithmetic, https://arxiv.org/abs/2405.00332

2 May, Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models, https://arxiv.org/abs/2405.01535

3 May, What Matters When Building Vision-Language Models?, https://arxiv.org/abs/2405.02246

5 May, Is Flash Attention Stable?, https://arxiv.org/abs/2405.02803

7 May, vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention, https://arxiv.org/abs/2405.04437

7 May, xLSTM: Extended Long Short-Term Memory, https://arxiv.org/abs/2405.04517

8 May, You Only Cache Once: Decoder-Decoder Architectures for Language Models, https://arxiv.org/abs/2405.05254

8 May, DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, https://arxiv.org/abs/2405.04434

8 May, Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models, https://arxiv.org/abs/2405.05417

9 May, Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?, https://arxiv.org/abs/2405.05904

10 May, Value Augmented Sampling for Language Model Alignment and Personalization, https://arxiv.org/abs/2405.06639

12 May, PHUDGE: Phi-3 as Scalable Judge, https://arxiv.org/abs/2405.08029

13 May, RLHF Workflow: From Reward Modeling to Online RLHF, https://arxiv.org/abs/2405.07863

15 May, LoRA Learns Less and Forgets Less, https://arxiv.org/abs/2405.09673

15 May, Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model, https://arxiv.org/abs/2405.09215

16 May, Chameleon: Mixed-Modal Early-Fusion Foundation Models, https://arxiv.org/abs/2405.09818

17 May, Towards Modular LLMs by Building and Reusing a Library of LoRAs, https://arxiv.org/abs/2405.11157

19 May, SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization, https://arxiv.org/abs/2405.11582

20 May, MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning, https://arxiv.org/abs/2405.12130

22 May, Attention as an RNN, https://arxiv.org/abs/2405.13956

22 May, Dense Connector for MLLMs, https://arxiv.org/abs/2405.13800

23 May, AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability, https://arxiv.org/abs/2405.14129

23 May, SimPO: Simple Preference Optimization with a Reference-Free Reward, https://arxiv.org/abs/2405.14734

23 May, Instruction Tuning With Loss Over Instructions, https://arxiv.org/abs/2405.14394

24 May, The Road Less Scheduled, https://arxiv.org/abs/2405.15682

26 May, Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training, https://arxiv.org/abs/2405.15319

26 May, gzip Predicts Data-dependent Scaling Laws, https://arxiv.org/abs/2405.16684

27 May, Trans-LoRA: Towards Data-free Transferable Parameter Efficient Finetuning, https://arxiv.org/abs/2405.17258

28 May, VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections, https://arxiv.org/abs/2405.17991

28 May, LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models, https://arxiv.org/abs/2405.18377

29 May, Contextual Position Encoding: Learning to Count What’s Important, https://arxiv.org/abs/2405.18719

June 2024

2 Jun, Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback, https://arxiv.org/abs/2406.00888

3 Jun, Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models, https://arxiv.org/abs/2406.06563

3 Jun, OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models, https://arxiv.org/abs/2406.01775

3 Jun, The Geometry of Categorical and Hierarchical Concepts in Large Language Models, https://arxiv.org/abs/2406.01506

3 Jun, Towards Scalable Automated Alignment of LLMs: A Survey, https://arxiv.org/abs/2406.01252

4 Jun, Scalable MatMul-free Language Modeling, https://arxiv.org/abs/2406.02528

4 Jun, Block Transformer: Global-to-Local Language Modeling for Fast Inference, https://arxiv.org/abs/2406.02657

6 Jun, Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models, https://arxiv.org/abs/2406.04271

6 Jun, The Prompt Report: A Systematic Survey of Prompting Techniques, https://arxiv.org/abs/2406.06608

6 Jun, Transformers Need Glasses! Information Over-Squashing in Language Tasks, https://arxiv.org/abs/2406.04267

6 Jun, Are We Done with MMLU?, https://arxiv.org/abs/2406.04127

6 Jun, Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step, https://arxiv.org/abs/2406.04314

7 Jun, Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach, https://arxiv.org/abs/2406.04594

7 Jun, CRAG – Comprehensive RAG Benchmark, https://arxiv.org/abs/2406.04744

7 Jun, WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild, https://arxiv.org/abs/2406.04770

7 Jun, Mixture-of-Agents Enhances Large Language Model Capabilities, https://arxiv.org/abs/2406.04692

7 Jun, BERTs are Generative In-Context Learners, https://arxiv.org/abs/2406.04823

7 Jun, 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination, https://arxiv.org/abs/2406.05132

8 Jun, Creativity Has Left the Chat: The Price of Debiasing Language Models, https://arxiv.org/abs/2406.05587

10 Jun, Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation, https://arxiv.org/abs/2406.06525

10 Jun, Margin-aware Preference Optimization for Aligning Diffusion Models Without Reference, https://arxiv.org/abs/2406.06424

10 Jun, Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning, https://arxiv.org/abs/2406.06469

10 Jun, Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters, https://arxiv.org/abs/2406.05955

10 Jun, Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching, https://arxiv.org/abs/2406.06326

11 Jun, An Image is Worth 32 Tokens for Reconstruction and Generation, https://arxiv.org/abs/2406.07550

11 Jun, TextGrad: Automatic “Differentiation” via Text, https://arxiv.org/abs/2406.07496

11 Jun, Simple and Effective Masked Diffusion Language Models, https://arxiv.org/abs/2406.07524

11 Jun, Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent “Middle” Enhancement, https://arxiv.org/abs/2406.07138

11 Jun, Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling, https://arxiv.org/abs/2406.07522

12 Jun, Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing, https://arxiv.org/abs/2406.08464

12 Jun, What If We Recaption Billions of Web Images with LLaMA-3?, https://arxiv.org/abs/2406.08478

12 Jun, Large Language Model Unlearning via Embedding-Corrupted Prompts, https://arxiv.org/abs/2406.07933

12 Jun, Large Language Models Must Be Taught to Know What They Don’t Know, https://arxiv.org/abs/2406.08391

12 Jun, An Empirical Study of Mamba-based Language Models, https://arxiv.org/abs/2406.07887

12 Jun, Discovering Preference Optimization Algorithms with and for Large Language Models, https://arxiv.org/abs/2406.08414

13 Jun, Transformers Meet Neural Algorithmic Reasoners, https://arxiv.org/abs/2406.09308

13 Jun, MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding, https://arxiv.org/abs/2406.09297

13 Jun, An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels, https://arxiv.org/abs/2406.09415

13 Jun, FouRA: Fourier Low Rank Adaptation, https://arxiv.org/abs/2406.08798

14 Jun, Bootstrapping Language Models with DPO Implicit Rewards, https://arxiv.org/abs/2406.09760

14 Jun, Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs, https://arxiv.org/abs/2406.10209

14 Jun, Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs, https://arxiv.org/abs/2406.10216

16 Jun, THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation, https://arxiv.org/abs/2406.10996

17 Jun, Task Me Anything, https://arxiv.org/abs/2406.11775

17 Jun, How Do Large Language Models Acquire Factual Knowledge During Pretraining?, https://arxiv.org/abs/2406.11813

17 Jun, mDPO: Conditional Preference Optimization for Multimodal Large Language Models, https://arxiv.org/abs/2406.11839

17 Jun, Nemotron-4 340B Technical Report, https://arxiv.org/abs/2406.11704

17 Jun, DataComp-LM: In Search of the Next Generation of Training Sets for Language Models, https://arxiv.org/abs/2406.11794

17 Jun, Tokenization Falling Short: The Curse of Tokenization, https://arxiv.org/abs/2406.11687

17 Jun, DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence, https://arxiv.org/abs/2406.11931

17 Jun, Unveiling Encoder-Free Vision-Language Models, https://arxiv.org/abs/2406.11832

17 Jun, Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level, https://arxiv.org/abs/2406.11817

17 Jun, HARE: HumAn pRiors, a key to small language model Efficiency, https://arxiv.org/abs/2406.11410

17 Jun, Measuring memorization in RLHF for code completion, https://arxiv.org/abs/2406.11715

17 Jun, Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts, https://arxiv.org/abs/2406.12034

18 Jun, From RAGs to Rich Parameters: Probing How Language Models Utilize External Knowledge Over Parametric Information for Factual Queries, https://arxiv.org/abs/2406.12824

18 Jun, Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges, https://arxiv.org/abs/2406.12624

19 Jun, Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?, https://arxiv.org/abs/2406.13121

20 Jun, Instruction Pre-Training: Language Models are Supervised Multitask Learners, https://arxiv.org/abs/2406.14491

20 Jun, Can LLMs Learn by Teaching? A Preliminary Study, https://arxiv.org/abs/2406.14629

21 Jun, A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems, https://arxiv.org/abs/2406.14972

21 Jun, LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs, https://arxiv.org/abs/2406.15319

21 Jun, MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression, https://arxiv.org/abs/2406.14909

21 Jun, Efficient Continual Pre-training by Mitigating the Stability Gap, https://arxiv.org/abs/2406.14833

24 Jun, Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers, https://arxiv.org/abs/2406.16747

24 Jun, WARP: On the Benefits of Weight Averaged Rewarded Policies, https://arxiv.org/abs/2406.16768

24 Jun, Adam-mini: Use Fewer Learning Rates To Gain More, https://arxiv.org/abs/2406.16793

25 Jun, The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale, https://arxiv.org/abs/2406.17557

25 Jun, LongIns: A Challenging Long-context Instruction-based Exam for LLMs, https://arxiv.org/abs/2406.17588

25 Jun, Following Length Constraints in Instructions, https://arxiv.org/abs/2406.17744

26 Jun, A Closer Look into Mixture-of-Experts in Large Language Models, https://arxiv.org/abs/2406.18219

26 Jun, RouteLLM: Learning to Route LLMs with Preference Data, https://arxiv.org/abs/2406.18665

26 Jun, Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs, https://arxiv.org/abs/2406.18629

27 Jun, Dataset Size Recovery from LoRA Weights, https://arxiv.org/abs/2406.19395

27 Jun, From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data, https://arxiv.org/abs/2406.19292

27 Jun, Changing Answer Order Can Decrease MMLU Accuracy, https://arxiv.org/abs/2406.19470

28 Jun, Direct Preference Knowledge Distillation for Large Language Models, https://arxiv.org/abs/2406.19774

28 Jun, LLM Critics Help Catch LLM Bugs, https://arxiv.org/abs/2407.00215

28 Jun, Scaling Synthetic Data Creation with 1,000,000,000 Personas, https://arxiv.org/abs/2406.20094

July 2024

1 Jul, LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives, https://arxiv.org/abs/2407.01490

1 Jul, Searching for Best Practices in Retrieval-Augmented Generation, https://arxiv.org/abs/2407.01219

1 Jul, Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models, https://arxiv.org/abs/2407.01906

1 Jul, Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion, https://arxiv.org/abs/2407.01392

1 Jul, Eliminating Position Bias of Language Models: A Mechanistic Approach, https://arxiv.org/abs/2407.01100

2 Jul, MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention, https://arxiv.org/abs/2407.02490

2 Jul, TokenPacker: Efficient Visual Projector for Multimodal LLM, https://arxiv.org/abs/2407.02392

2 Jul, Reasoning in Large Language Models: A Geometric Perspective, https://arxiv.org/abs/2407.02678

2 Jul, RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs, https://arxiv.org/abs/2407.02485

3 Jul, AgentInstruct: Toward Generative Teaching with Agentic Flows, https://arxiv.org/abs/2407.03502

3 Jul, HEMM: Holistic Evaluation of Multimodal Foundation Models, https://arxiv.org/abs/2407.03418

4 Jul, Mixture of A Million Experts, https://arxiv.org/abs/2407.04153

5 Jul, Learning to (Learn at Test Time): RNNs with Expressive Hidden States, https://arxiv.org/abs/2407.04620

9 Jul, Vision Language Models Are Blind, https://arxiv.org/abs/2407.06581

9 Jul, Self-Recognition in Language Models, https://arxiv.org/abs/2407.06946

10 Jul, Inference Performance Optimization for Large Language Models on CPUs, https://arxiv.org/abs/2407.07304

11 Jul, Gradient Boosting Reinforcement Learning, https://arxiv.org/abs/2407.08250

11 Jul, FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision, https://arxiv.org/abs/2407.08608

12 Jul, SpreadsheetLLM: Encoding Spreadsheets for Large Language Models, https://arxiv.org/abs/2407.09025

12 Jul, New Desiderata for Direct Preference Optimization, https://arxiv.org/abs/2407.09072

12 Jul, Context Embeddings for Efficient Answer Generation in RAG, https://arxiv.org/abs/2407.09252

15 Jul, Qwen2 Technical Report, https://arxiv.org/abs/2407.10671

15 Jul, The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism, https://arxiv.org/abs/2407.10457

15 Jul, From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients, https://arxiv.org/abs/2407.11239

16 Jul, GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression, https://arxiv.org/abs/2407.12077

16 Jul, Scaling Diffusion Transformers to 16 Billion Parameters, https://arxiv.org/abs/2407.11633

16 Jul, NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?, https://arxiv.org/abs/2407.11963

17 Jul, Patch-Level Training for Large Language Models, https://arxiv.org/abs/2407.12665

17 Jul, LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models, https://arxiv.org/abs/2407.12772

17 Jul, A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks, https://arxiv.org/abs/2407.12994

17 Jul, Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models, https://arxiv.org/abs/2407.12327

18 Jul, Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation, https://arxiv.org/abs/2407.13481

18 Jul, Weak-to-Strong Reasoning, https://arxiv.org/abs/2407.13647

18 Jul, Understanding Reference Policies in Direct Preference Optimization, https://arxiv.org/abs/2407.13709

18 Jul, Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies, https://arxiv.org/abs/2407.13623

19 Jul, BOND: Aligning LLMs with Best-of-N Distillation, https://arxiv.org/abs/2407.14622

19 Jul, Compact Language Models via Pruning and Knowledge Distillation, https://arxiv.org/abs/2407.14679

19 Jul, LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference, https://arxiv.org/abs/2407.14057

22 Jul, Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training, https://arxiv.org/abs/2407.15892

22 Jul, DDK: Distilling Domain Knowledge for Efficient Large Language Models, https://arxiv.org/abs/2407.16154

23 Jul, Generation Constraint Scaling Can Mitigate Hallucination, https://arxiv.org/abs/2407.16908

23 Jul, Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach, https://arxiv.org/abs/2407.16833

23 Jul, Course-Correction: Safety Alignment Using Synthetic Preferences, https://arxiv.org/abs/2407.16637

26 Jul, Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?, https://arxiv.org/abs/2407.16607

28 Jul, Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge, https://arxiv.org/abs/2407.19594

29 Jul, Improving Retrieval Augmented Language Model with Self-Reasoning, https://arxiv.org/abs/2407.19813

29 Jul, Apple Intelligence Foundation Language Models, https://arxiv.org/abs/2407.21075

30 Jul, ThinK: Thinner Key Cache by Query-Driven Pruning, https://arxiv.org/abs/2407.21018

31 Jul, The Llama 3 Herd of Models, https://arxiv.org/abs/2407.21783

31 Jul, Gemma 2: Improving Open Language Models at a Practical Size, https://arxiv.org/abs/2408.00118

August 2024

1 Aug, SAM 2: Segment Anything in Images and Videos, https://arxiv.org/abs/2408.00714

2 Aug, POA: Pre-training Once for Models of All Sizes, https://arxiv.org/abs/2408.01031

2 Aug, RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework, https://arxiv.org/abs/2408.01262

2 Aug, A Survey of Mamba, https://arxiv.org/abs/2408.01129

3 Aug, MiniCPM-V: A GPT-4V Level MLLM on Your Phone, https://arxiv.org/abs/2408.01800

5 Aug, RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation, https://arxiv.org/abs/2408.02545

5 Aug, Self-Taught Evaluators, https://arxiv.org/abs/2408.02666

5 Aug, BioMamba: A Pre-trained Biomedical Language Representation Model Leveraging Mamba, https://arxiv.org/abs/2408.02600

7 Aug, EXAONE 3.0 7.8B Instruction Tuned Language Model, https://arxiv.org/abs/2408.03541

7 Aug, 1.5-Pints Technical Report: Pretraining in Days, Not Months – Your Language Model Thrives on Quality Data, https://arxiv.org/abs/2408.03506

8 Aug, Conversational Prompt Engineering, https://arxiv.org/abs/2408.04560

8 Aug, Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP, https://arxiv.org/abs/2408.04303

12 Aug, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, https://arxiv.org/abs/2408.06292

15 Aug, Hermes 3 Technical Report, https://arxiv.org/abs/2408.11857

19 Aug, Customizing Language Models with Instance-wise LoRA for Sequential Recommendation, https://arxiv.org/abs/2408.10159

20 Aug, Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information, https://arxiv.org/abs/2408.10615

20 Aug, To Code, or Not To Code? Exploring Impact of Code in Pre-training, https://arxiv.org/abs/2408.10914

21 Aug, LLM Pruning and Distillation in Practice: The Minitron Approach, https://arxiv.org/abs/2408.11796

22 Aug, Jamba-1.5: Hybrid Transformer-Mamba Models at Scale, https://arxiv.org/abs/2408.12570

22 Aug, Controllable Text Generation for Large Language Models: A Survey, https://arxiv.org/abs/2408.12599

23 Aug, Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time, https://arxiv.org/abs/2408.13233

26 Aug, A Practitioner’s Guide to Continual Multimodal Pretraining, https://arxiv.org/abs/2408.14471

26 Aug, Building and better understanding vision-language models: insights and future directions, https://arxiv.org/abs/2408.12637

26 Aug, CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation, https://arxiv.org/abs/2408.14572

27 Aug, The Mamba in the Llama: Distilling and Accelerating Hybrid Models, https://arxiv.org/abs/2408.15237

28 Aug, ReMamba: Equip Mamba with Effective Long-Sequence Modeling, https://arxiv.org/abs/2408.15496

29 Aug, Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling, https://arxiv.org/abs/2408.16737

31 Aug, LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models, https://arxiv.org/abs/2409.00509

September 2024

3 Sep, OLMoE: Open Mixture-of-Experts Language Models, https://arxiv.org/abs/2409.02060

3 Sep, In Defense of RAG in the Era of Long-Context Language Models, https://arxiv.org/abs/2409.01666

5 Sep, Attention Heads of Large Language Models: A Survey, https://arxiv.org/abs/2409.03752

5 Sep, LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA, https://arxiv.org/abs/2409.02897

5 Sep, How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data, https://arxiv.org/abs/2409.03810

6 Sep, Theory, Analysis, and Best Practices for Sigmoid Self-Attention, https://arxiv.org/abs/2409.04431

10 Sep, LLaMA-Omni: Seamless Speech Interaction with Large Language Models, https://arxiv.org/abs/2409.06666

10 Sep, What is the Role of Small Models in the LLM Era: A Survey, https://arxiv.org/abs/2409.06857

11 Sep, Policy Filtration in RLHF to Fine-Tune LLM for Code Generation, https://arxiv.org/abs/2409.06957

16 Sep, RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval, https://arxiv.org/abs/2409.10516

18 Sep, Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement, https://arxiv.org/abs/2409.12122

18 Sep, Qwen2.5-Coder Technical Report, https://arxiv.org/abs/2409.12186

21 Sep, Instruction Following without Instruction Tuning, https://arxiv.org/abs/2409.14254

30 Sep, Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis, https://arxiv.org/abs/2409.20059

30 Sep, The Perfect Blend: Redefining RLHF with Mixture of Judges, https://arxiv.org/abs/2409.20370 (New paper by Meta on how they did RLHF for Llama 3)

October 2024

1 Oct, Addition is All You Need for Energy-efficient Language Models, https://arxiv.org/abs/2410.00907

2 Oct, Quantifying Generalization Complexity for Large Language Models, https://arxiv.org/abs/2410.01769

2 Oct, When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1, https://arxiv.org/abs/2410.01792

2 Oct, Were RNNs All We Needed?, https://arxiv.org/abs/2410.01201

3 Oct, Selective Attention Improves Transformer, https://arxiv.org/abs/2410.02703

3 Oct, LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations, https://arxiv.org/abs/2410.02707

3 Oct, LLaVA-Critic: Learning to Evaluate Multimodal Models, https://arxiv.org/abs/2410.02712

7 Oct, Differential Transformer, https://arxiv.org/abs/2410.05258

7 Oct, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, https://arxiv.org/abs/2410.05229

8 Oct, ARIA: An Open Multimodal Native Mixture-of-Experts Model, https://arxiv.org/abs/2410.05993

8 Oct, O1 Replication Journey: A Strategic Progress Report – Part 1, https://arxiv.org/abs/2410.18982

8 Oct, Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG, https://arxiv.org/abs/2410.05983

9 Oct, From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning, https://arxiv.org/abs/2410.06456

10 Oct, KV Prediction for Improved Time to First Token, https://arxiv.org/abs/2410.08391

11 Oct, Baichuan-Omni Technical Report, https://arxiv.org/abs/2410.08565

13 Oct, MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models, https://arxiv.org/abs/2410.10139

13 Oct, LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models, https://arxiv.org/abs/2410.09732

15 Oct, AFlow: Automating Agentic Workflow Generation, https://arxiv.org/abs/2410.10762

15 Oct, Toward General Instruction-Following Alignment for Retrieval-Augmented Generation, https://arxiv.org/abs/2410.09584

21 Oct, Pre-training Distillation for Large Language Models: A Design Space Exploration, https://arxiv.org/abs/2410.16215

23 Oct, MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models, https://arxiv.org/abs/2410.17637

23 Oct, Scalable Ranked Preference Optimization for Text-to-Image Generation, https://arxiv.org/abs/2410.18013

23 Oct, Scaling Diffusion Language Models via Adaptation from Autoregressive Models, https://arxiv.org/abs/2410.17891

24 Oct, Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback, https://arxiv.org/abs/2410.19133

25 Oct, Counting Ability of Large Language Models and Impact of Tokenization, https://arxiv.org/abs/2410.19730

25 Oct, A Survey of Small Language Models, https://arxiv.org/abs/2410.20011

26 Oct, Accelerating Direct Preference Optimization with Prefix Sharing, https://arxiv.org/abs/2410.20305

27 Oct, Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse, https://arxiv.org/abs/2410.21333

28 Oct, LongReward: Improving Long-context Large Language Models with AI Feedback, https://arxiv.org/abs/2410.21252

28 Oct, ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference, https://arxiv.org/abs/2410.21465

29 Oct, Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications, https://arxiv.org/abs/2410.21943

30 Oct, CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation, https://arxiv.org/abs/2410.23090

31 Oct, What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective, https://arxiv.org/abs/2410.23743

31 Oct, GPT or BERT: why not both?, https://arxiv.org/abs/2410.24159

31 Oct, Language Models can Self-Lengthen to Generate Long Texts, https://arxiv.org/abs/2410.23933

November 2024

1 Nov, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations, https://arxiv.org/abs/2411.00640

1 Nov, Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation, https://arxiv.org/abs/2411.00412

1 Nov, Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models, https://arxiv.org/abs/2411.00492

3 Nov, Sample-Efficient Alignment for LLMs, https://arxiv.org/abs/2411.01493

4 Nov, A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness, https://arxiv.org/abs/2411.03350

4 Nov, “Give Me BF16 or Give Me Death”? Accuracy-Performance Trade-Offs in LLM Quantization, https://arxiv.org/abs/2411.02355

4 Nov, Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study, https://arxiv.org/abs/2411.02462

5 Nov, HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems, https://arxiv.org/abs/2411.02959

6 Nov, Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination, https://arxiv.org/abs/2411.03823

6 Nov, Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding, https://arxiv.org/abs/2411.04282

6 Nov, Number Cookbook: Number Understanding of Language Models and How to Improve It, https://arxiv.org/abs/2411.03766

7 Nov, Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models, https://arxiv.org/abs/2411.04996

7 Nov, BitNet a4.8: 4-bit Activations for 1-bit LLMs, https://arxiv.org/abs/2411.04965

7 Nov, Scaling Laws for Precision, https://arxiv.org/abs/2411.04330

8 Nov, Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation, https://arxiv.org/abs/2411.05966

8 Nov, Balancing Pipeline Parallelism with Vocabulary Parallelism, https://arxiv.org/abs/2411.05288

11 Nov, Toward Optimal Search and Retrieval for RAG, https://arxiv.org/abs/2411.07396

12 Nov, Large Language Models Can Self-Improve in Long-context Reasoning, https://arxiv.org/abs/2411.08147

12 Nov, Stronger Models are NOT Stronger Teachers for Instruction Tuning, https://arxiv.org/abs/2411.07133

12 Nov, Direct Preference Optimization Using Sparse Feature-Level Constraints, https://arxiv.org/abs/2411.07618

13 Nov, Cut Your Losses in Large-Vocabulary Language Models, https://arxiv.org/abs/2411.09009

15 Nov, Does Prompt Formatting Have Any Impact on LLM Performance?, https://arxiv.org/abs/2411.10541

17 Nov, SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization, https://arxiv.org/abs/2411.11909

17 Nov, SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration, https://arxiv.org/abs/2411.10958

18 Nov, Bi-Mamba: Towards Accurate 1-Bit State Space Models, https://arxiv.org/abs/2411.11843

19 Nov, RedPajama: an Open Dataset for Training Large Language Models, https://arxiv.org/abs/2411.12372

20 Nov, Hymba: A Hybrid-head Architecture for Small Language Models, https://arxiv.org/abs/2411.13676

20 Nov, Loss-to-Loss Prediction: Scaling Laws for All Datasets, https://arxiv.org/abs/2411.12925

21 Nov, When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training, https://arxiv.org/abs/2411.13476

21 Nov, Multimodal Autoregressive Pre-training of Large Vision Encoders, https://arxiv.org/abs/2411.14402

21 Nov, Natural Language Reinforcement Learning, https://arxiv.org/abs/2411.14251

22 Nov, Large Multi-modal Models Can Interpret Features in Large Multi-modal Models, https://arxiv.org/abs/2411.14982

22 Nov, TÜLU 3: Pushing Frontiers in Open Language Model Post-Training, https://arxiv.org/abs/2411.15124

23 Nov, MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs, https://arxiv.org/abs/2411.15296

24 Nov, LLMs Do Not Think Step-by-step In Implicit Reasoning, https://arxiv.org/abs/2411.15862

25 Nov, O1 Replication Journey – Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?, https://arxiv.org/abs/2411.16489

26 Nov, Star Attention: Efficient LLM Inference over Long Sequences, https://arxiv.org/abs/2411.17116

27 Nov, Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens, https://arxiv.org/abs/2411.17691

27 Nov, Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration, https://arxiv.org/abs/2411.17686

29 Nov, Reverse Thinking Makes LLMs Stronger Reasoners, https://arxiv.org/abs/2411.19865

29 Nov, Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM’s Reasoning Capability, https://arxiv.org/abs/2411.19943

December 2024

2 Dec, Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis, https://arxiv.org/abs/2412.01819

2 Dec, X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models, https://arxiv.org/abs/2412.01824

2 Dec, Free Process Rewards without Process Labels, https://arxiv.org/abs/2412.01981

3 Dec, Scaling Image Tokenizers with Grouped Spherical Quantization, https://arxiv.org/abs/2412.02632

3 Dec, RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models, https://arxiv.org/abs/2412.02830

4 Dec, Perception Tokens Enhance Visual Reasoning in Multimodal Language Models, https://arxiv.org/abs/2412.03548

4 Dec, Evaluating Language Models as Synthetic Data Generators, https://arxiv.org/abs/2412.03679

4 Dec, Best-of-N Jailbreaking, https://arxiv.org/abs/2412.03556

4 Dec, PaliGemma 2: A Family of Versatile VLMs for Transfer, https://arxiv.org/abs/2412.03555

5 Dec, VisionZip: Longer is Better but Not Necessary in Vision Language Models, https://arxiv.org/abs/2412.04467

5 Dec, Evaluating and Aligning CodeLLMs on Human Preference, https://arxiv.org/abs/2412.05210

6 Dec, MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale, https://arxiv.org/abs/2412.05237

6 Dec, Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling, https://arxiv.org/abs/2412.05271

7 Dec, LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods, https://arxiv.org/abs/2412.05579

8 Dec, Does RLHF Scale? Exploring the Impacts From Data, Model, and Method, https://arxiv.org/abs/2412.06000

9 Dec, Unraveling the Complexity of Memory in RL Agents: An Approach for Classification and Evaluation, https://arxiv.org/abs/2412.06531

9 Dec, Training Large Language Models to Reason in a Continuous Latent Space, https://arxiv.org/abs/2412.06769

9 Dec, AutoReason: Automatic Few-Shot Reasoning Decomposition, https://arxiv.org/abs/2412.06975

11 Dec, Large Concept Models: Language Modeling in a Sentence Representation Space, https://arxiv.org/abs/2412.08821

12 Dec, Phi-4 Technical Report, https://arxiv.org/abs/2412.08905

13 Dec, Byte Latent Transformer: Patches Scale Better Than Tokens, https://arxiv.org/abs/2412.09871

13 Dec, SCBench: A KV Cache-Centric Analysis of Long-Context Methods, https://arxiv.org/abs/2412.10319

13 Dec, Cultural Evolution of Cooperation among LLM Agents, https://arxiv.org/abs/2412.10270

13 Dec, DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding, https://arxiv.org/abs/2412.10302

16 Dec, No More Adam: Learning Rate Scaling at Initialization is All You Need, https://arxiv.org/abs/2412.11768

16 Dec, Precise Length Control in Large Language Models, https://arxiv.org/abs/2412.11937

16 Dec, The Open Source Advantage in Large Language Models (LLMs), https://arxiv.org/abs/2412.12004

16 Dec, A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges, https://arxiv.org/abs/2412.11936

17 Dec, Are Your LLMs Capable of Stable Reasoning?, https://arxiv.org/abs/2412.13147

18 Dec, LLM Post-Training Recipes, Improving Reasoning in LLMs, https://arxiv.org/abs/2412.14135

18 Dec, Hansel: Output Length Controlling Framework for Large Language Models, https://arxiv.org/abs/2412.14033

18 Dec, Mind Your Theory: Theory of Mind Goes Deeper Than Reasoning, https://arxiv.org/abs/2412.1363

18 Dec, Alignment Faking in Large Language Models, https://arxiv.org/abs/2412.14093

18 Dec, SCOPE: Optimizing Key-Value Cache Compression in Long-Context Generation, https://arxiv.org/abs/2412.13649

19 Dec, LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-Context Multitasks, https://arxiv.org/abs/2412.15204

20 Dec, Offline Reinforcement Learning for LLM Multi-Step Reasoning, https://arxiv.org/abs/2412.16145

24 Dec, Mulberry: Empowering MLLM with O1-like Reasoning and Reflection via Collective Monte Carlo Tree Search, https://arxiv.org/abs/2412.18319

Reference:

https://sebastianraschka.com/blog/2024/llm-research-papers-the-2024-list.html
