- LLM Research Papers - Sebastian Raschka's paper list for 2024.
- Ilya 30u30 - Papers recommended by Ilya Sutskever.
- List of 27 papers - If you read this list, you will have covered the 90% of AI that matters most :D
- The 2025 AI Engineer Reading List - Papers and blog posts to read to become an AI Engineer.
- Attention Is All You Need: Query, Key, and Value are all you need* (*Also position embeddings, multiple heads, feed-forward layers, skip-connections, etc.; see the attention sketch after this list)
- GPT: Improving Language Understanding by Generative Pre-Training: Decoder is all you need* (*Also, pre-training + finetuning)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: Encoder is all you need*. Left-to-right language modeling is NOT all you need. (*Also, pre-training + finetuning)
- GPT3: Language Models are Few-Shot Learners: Unsupervised pre-training + a few* examples is all you need. (*From 5 examples in Conversational QA to 50 examples in Winogrande, PhysicalQA, and TriviaQA)
- Scaling Laws for Neural Language Models: Larger models trained on less data* are what you need. (*10x more compute should be spent on a 5.5x larger model and 1.8x more tokens; see the compute-allocation sketch after this list)
- Chinchilla: Training Compute-Optimal Large Language Models: Smaller models trained on more data* are what you need. (*10x more compute should be spent on a 3.2x larger model and 3.2x more tokens)
- LLaMA: Open and Efficient Foundation Language Models: Smaller models trained longer, on public data, are all you need
- InstructGPT: Training language models to follow instructions with human feedback: 40 labelers are all you need* (*Plus supervised fine-tuning, reward modeling, and PPO)
- LoRA: Low-Rank Adaptation of Large Language Models: One rank is all you need (see the low-rank adapter sketch after this list)
- RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: Semi-parametric models* are all you need (*Dense vector retrieval as non-parametric component; pre-trained LLM as parametric component; see the retrieval sketch after this list)
- Whisper: Robust Speech Recognition via Large-Scale Weak Supervision: 680k hours of audio and multitask training formulated as a token sequence are all you need.
- Breaking Sticks and Ambiguities with Adaptive Skip-gram: Multiple embeddings per word are all you need* (*Also, a stick-breaking prior that learns how many senses each word gets)
- Distributed Representations of Words and Phrases: Skip-gram with negative sampling is all you need* (*Also, hierarchical softmax and subsampling of frequent words; see the negative-sampling sketch after this list)
- Learning the Dimensionality of Word Embeddings: Automatic dimension selection is all you need* (*Also, Bayesian skip-gram model)
- Emergence of Language with Multi-agent Games: Self-play and reward signals are all you need* (*Also emergent communication protocols)
- Skip-Thought Vectors: Sentence encoders trained on book sequences are all you need* (*Also, encoder-decoder architecture trained on continuous text)
- Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks: A generalist multi-agent system is all you need (Also, AutoGenBench for rigorous agentic evaluation)
- Agents from Google: Generative AI agents are all you need (Tools, orchestration layers, and cognitive architectures)
- Agent-Oriented Planning In Multi-Agent Systems: Agent-oriented planning is all you need (Also, fast decomposition, reward models, and feedback loops)
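
A minimal NumPy sketch of the scaled dot-product attention behind the Attention Is All You Need entry: a single head with no learned projections, position embeddings, or multi-head split. The function name, toy shapes, and optional boolean mask are illustrative, not from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of every query with every key
    if mask is not None:                       # e.g. a causal mask for decoder-only models
        scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                         # attention-weighted mixture of value vectors

# Toy self-attention: 4 tokens, 8-dimensional keys/values, Q = K = V = x.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)   # (4, 8)
```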
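The Scaling Laws and Chinchilla entries quote different answers to the same question: given 10x more training compute, how much should go to model size versus training tokens? This snippet only sanity-checks that the quoted multipliers are consistent with a 10x budget, assuming the usual C ≈ 6·N·D approximation for training FLOPs (N parameters, D tokens), so scaling N by a and D by b scales compute by a·b.

```python
# Multipliers quoted in the two entries above.
kaplan     = (5.5, 1.8)   # Scaling Laws: much bigger model, a little more data
chinchilla = (3.2, 3.2)   # Chinchilla: grow model and data at the same rate (~sqrt(10) each)

for name, (model_x, tokens_x) in [("Kaplan", kaplan), ("Chinchilla", chinchilla)]:
    print(f"{name}: {model_x}x params * {tokens_x}x tokens = {model_x * tokens_x:.1f}x compute")
# Both multiply out to roughly 10x compute; the papers differ only in how it is allocated.
```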
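A low-rank adapter sketch for the LoRA entry: keep the pretrained weight W frozen and add a trainable update B·A of rank r, so each adapted layer gains only r·(d_in + d_out) extra parameters. The class name and toy sizes are illustrative; the zero init of B and the alpha/r scaling follow the paper, and training code is omitted.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (rank r)."""
    def __init__(self, W, r=1, alpha=1.0, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                          # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, d_in))     # trainable, small random init
        self.B = np.zeros((d_out, r))                       # trainable, zero init: starts as a no-op
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

# Toy usage: adapt a 16x16 layer with a rank-1 update (32 extra parameters instead of 256).
W = np.random.default_rng(1).normal(size=(16, 16))
layer = LoRALinear(W, r=1)
print(layer(np.ones((2, 16))).shape)   # (2, 16)
```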
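A toy retrieval sketch of the semi-parametric split in the RAG entry: a non-parametric retriever (cosine similarity over made-up document vectors) selects context, and a parametric generator conditions on it. The corpus, embeddings, and the placeholder `generate` function are invented for illustration; a real pipeline would use a learned dense encoder and an actual pretrained LLM.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=2):
    """Non-parametric component: dense nearest-neighbour retrieval by cosine similarity."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:k]]

def generate(prompt):
    """Parametric component: stand-in for a pretrained LLM call (hypothetical)."""
    return f"<answer conditioned on: {prompt!r}>"

# Toy corpus with random stand-in embeddings.
docs = ["doc about retrieval", "doc about parametric memory", "doc about tokenization"]
doc_vecs = np.random.default_rng(2).normal(size=(3, 8))
query_vec = doc_vecs[0] + 0.1                    # pretend the query is close to doc 0

context = retrieve(query_vec, doc_vecs, docs)
print(generate("Q: toy question | Context: " + " | ".join(context)))
```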
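The negative-sampling objective from the Distributed Representations of Words and Phrases entry, written out for a single (center, context) pair: push the dot product with the observed context word up and the dot products with k sampled negatives down, instead of normalizing over the whole vocabulary. Vector sizes and the number of negatives are arbitrary toy values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center_vec, context_vec, negative_vecs):
    """Negative log-likelihood of skip-gram with negative sampling for one word pair."""
    pos = np.log(sigmoid(center_vec @ context_vec))          # observed context: pull together
    neg = np.sum(np.log(sigmoid(-negative_vecs @ center_vec)))  # k sampled negatives: push apart
    return -(pos + neg)

# Toy usage: 8-dimensional embeddings, 5 negative samples.
rng = np.random.default_rng(3)
center, context = rng.normal(size=8), rng.normal(size=8)
negatives = rng.normal(size=(5, 8))
print(sgns_loss(center, context, negatives))
```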