- LLM Research Papers - Sebastian Raschka's paper list for 2024.
- Ilya 30u30 - Papers recommended by Ilya Sutskever.
- List of 27 papers - If you read this list, you will have covered the 90% of AI that matters most :D
- The 2025 AI Engineer Reading List - Papers and blog posts to read to become an AI Engineer.
- Attention Is All You Need: Query, Key, and Value are all you need* (*Also position embeddings, multiple heads, feed-forward layers, skip-connections, etc.; see the attention sketch after this list)
- GPT: Improving Language Understanding by Generative Pre-Training: Decoder is all you need* (*Also, pre-training + finetuning)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding: Encoder is all you need*. Left-to-right language modeling is NOT all you need. (*Also, pre-training + finetuning)
- GPT3: Language Models are Few-Shot Learners: Unsupervised pre-training + a few* examples is all you need. (*From 5 examples in Conversational QA to 50 examples in Winogrande, PhysicalQA, and TriviaQA)
- Scaling Laws for Neural Language Models: Larger models trained on less data* are what you need. (*10x more compute should be spent on a 5.5x larger model and 1.8x more tokens; see the compute-allocation sketch after this list)
- Chinchilla: Training Compute-Optimal Large Language Models: Smaller models trained on more data* are what you need. (*10x more compute should be spent on a 3.2x larger model and 3.2x more tokens)
- LLaMA: Open and Efficient Foundation Language Models: Smaller models trained longer, on public data, are all you need
- InstructGPT: Training language models to follow instructions with human feedback: 40 labelers are all you need* (*Plus supervised fine-tuning, reward modeling, and PPO)
- LoRA: Low-Rank Adaptation of Large Language Models: One rank is all you need (see the low-rank adapter sketch after this list)
- RAG: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: Semi-parametric models* are all you need (*Dense vector retrieval as non-parametric component; pre-trained LLM as parametric component; see the retrieval sketch after this list)
- Whisper: Robust Speech Recognition via Large-Scale Weak Supervision: 680k hours of audio and multitask training formulated as a token sequence are all you need.
- Breaking Sticks and Ambiguities with Adaptive Skip-gram: Multiple embeddings per word are all you need* (*Also, a stick-breaking prior that learns how many senses each word gets)
- Distributed Representations of Words and Phrases: Skip-gram with negative sampling is all you need* (*Also, hierarchical softmax and subsampling of frequent words; see the negative-sampling sketch after this list)
- Learning the Dimensionality of Word Embeddings: Automatic dimension selection is all you need* (*Also, Bayesian skip-gram model)
- Emergence of Language with Multi-agent Games: Self-play and reward signals are all you need* (*Also emergent communication protocols)
- Skip-Thought Vectors: Sentence encoders trained on book sequences are all you need* (*Also, encoder-decoder architecture trained on continuous text)
- Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks: A generalist multi-agent system is all you need (Also, AutoGenBench for rigorous agentic evaluation)
- Agents from Google: Generative AI agents are all you need (Tools, orchestration layers, and cognitive architectures)
- Agent-Oriented Planning In Multi-Agent Systems: Agent-oriented planning is all you need (Also, fast decomposition, reward models, and feedback loops)
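
A minimal NumPy sketch of the scaled dot-product attention behind the Attention Is All You Need entry: a single head with no learned projections, position embeddings, or multi-head split. The function name, toy shapes, and optional boolean mask are illustrative, not from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # similarity of every query with every key
    if mask is not None:                       # e.g. a causal mask for decoder-only models
        scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                         # attention-weighted mixture of value vectors

# Toy self-attention: 4 tokens, 8-dimensional keys/values, Q = K = V = x.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)   # (4, 8)
```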
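The Scaling Laws and Chinchilla entries quote different answers to the same question: given 10x more training compute, how much should go to model size versus training tokens? This snippet only sanity-checks that the quoted multipliers are consistent with a 10x budget, assuming the usual C ≈ 6·N·D approximation for training FLOPs (N parameters, D tokens), so scaling N by a and D by b scales compute by a·b.

```python
# Multipliers quoted in the two entries above.
kaplan     = (5.5, 1.8)   # Scaling Laws: much bigger model, a little more data
chinchilla = (3.2, 3.2)   # Chinchilla: grow model and data at the same rate (~sqrt(10) each)

for name, (model_x, tokens_x) in [("Kaplan", kaplan), ("Chinchilla", chinchilla)]:
    print(f"{name}: {model_x}x params * {tokens_x}x tokens = {model_x * tokens_x:.1f}x compute")
# Both multiply out to roughly 10x compute; the papers differ only in how it is allocated.
```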
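A low-rank adapter sketch for the LoRA entry: keep the pretrained weight W frozen and add a trainable update B·A of rank r, so each adapted layer gains only r·(d_in + d_out) extra parameters. The class name and toy sizes are illustrative; the zero init of B and the alpha/r scaling follow the paper, and training code is omitted.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update B @ A (rank r)."""
    def __init__(self, W, r=1, alpha=1.0, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                          # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, d_in))     # trainable, small random init
        self.B = np.zeros((d_out, r))                       # trainable, zero init: starts as a no-op
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

# Toy usage: adapt a 16x16 layer with a rank-1 update (32 extra parameters instead of 256).
W = np.random.default_rng(1).normal(size=(16, 16))
layer = LoRALinear(W, r=1)
print(layer(np.ones((2, 16))).shape)   # (2, 16)
```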
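A toy retrieval sketch of the semi-parametric split in the RAG entry: a non-parametric retriever (cosine similarity over made-up document vectors) selects context, and a parametric generator conditions on it. The corpus, embeddings, and the placeholder `generate` function are invented for illustration; a real pipeline would use a learned dense encoder and an actual pretrained LLM.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=2):
    """Non-parametric component: dense nearest-neighbour retrieval by cosine similarity."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:k]]

def generate(prompt):
    """Parametric component: stand-in for a pretrained LLM call (hypothetical)."""
    return f"<answer conditioned on: {prompt!r}>"

# Toy corpus with random stand-in embeddings.
docs = ["doc about retrieval", "doc about parametric memory", "doc about tokenization"]
doc_vecs = np.random.default_rng(2).normal(size=(3, 8))
query_vec = doc_vecs[0] + 0.1                    # pretend the query is close to doc 0

context = retrieve(query_vec, doc_vecs, docs)
print(generate("Q: toy question | Context: " + " | ".join(context)))
```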
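The negative-sampling objective from the Distributed Representations of Words and Phrases entry, written out for a single (center, context) pair: push the dot product with the observed context word up and the dot products with k sampled negatives down, instead of normalizing over the whole vocabulary. Vector sizes and the number of negatives are arbitrary toy values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center_vec, context_vec, negative_vecs):
    """Negative log-likelihood of skip-gram with negative sampling for one word pair."""
    pos = np.log(sigmoid(center_vec @ context_vec))          # observed context: pull together
    neg = np.sum(np.log(sigmoid(-negative_vecs @ center_vec)))  # k sampled negatives: push apart
    return -(pos + neg)

# Toy usage: 8-dimensional embeddings, 5 negative samples.
rng = np.random.default_rng(3)
center, context = rng.normal(size=8), rng.normal(size=8)
negatives = rng.normal(size=(5, 8))
print(sgns_loss(center, context, negatives))
```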