This repo covers LLM, Agent, and MCP Tool concepts both theoretically and practically:
- LLM Architectures, RAG, Fine Tuning, Agents, Tools, MCP, Agent Frameworks, Reference Documents.
- Agent Sample Codes with AWS Strands Agents.
- Agent Sample Codes with Google Agent Development Kit (ADK).
- Sample-00: First Agent with AWS Strands Agents
- Sample-01: Agent Strands with Streamlit UI
- Sample-02: Agent Strands Local MCP FileOps, Streamlit
- Sample-03: Agent Strands, Remote MCP Serper Google Search Tool with Streamlit
- Sample-00: Agent with Google ADK and ADK Web
- Sample-01: Agent Container with Google ADK, FastAPI, Streamlit GUI
- Sample-02: Agent Local MCP Tool (FileServer) with Google ADK, FastAPI, Streamlit GUI
- Sample-03: Agent Remote MCP Tool (Web Search: Serper) with Google ADK, FastAPI, Streamlit GUI
- Sample-04: Agent Memory and Builtin Google Search Tool with Streamlit GUI
- Sample-05: Agent LiteLLM - AWS Bedrock (Llama3.1-405B), Ollama with Streamlit GUI
- Sample-06: Multi-Agent Sequential, Streamlit GUI
- Sample-07: Multi-Agent Parallel, Streamlit GUI
- Sample-08: Multi-Agent Loop, Streamlit GUI
- Sample-09: Multi-Agent Hierarchy, Streamlit GUI
- Project1: AI Content Detector with AWS Bedrock, Llama 3.1 405B
- Project2: LLM with Model Context Protocol (MCP) using PraisonAI, Ollama, Llama 3.2:1B and Llama 3.1:8B
- Motivation
- LLM Architecture & LLM Models
- Prompt Engineering
- RAG: Retrieval-Augmented Generation
- Fine Tuning
- LLM Application Frameworks & Libraries
- Agent Frameworks
- Agents
- Google ADK - Agent Samples
- Sample-00: Agent with Google ADK and ADK Web
- Sample-01: Agent Container with Google ADK, FastAPI, Streamlit GUI
- Sample-02: Agent Local MCP Tool (FileServer) with Google ADK, FastAPI, Streamlit GUI
- Sample-03: Agent Remote MCP Tool (Web Search: Serper) with Google ADK, FastAPI, Streamlit GUI
- Sample-04: Agent Memory and Builtin Google Search Tool with Streamlit GUI
- Sample-05: Agent LiteLLM - AWS Bedrock (Llama3.1-405B), Ollama with Streamlit GUI
- Sample-06: Multi-Agent Sequential, Streamlit GUI
- Sample-07: Multi-Agent Parallel, Streamlit GUI
- Sample-08: Multi-Agent Loop, Streamlit GUI
- Sample-09: Multi-Agent Hierarchy, Streamlit GUI
- LLM Projects
- Other Useful Resources Related to LLMs, Agents, and MCPs
- References
Why should we use, learn, and develop LLM models and apps?
- 🔍 Automate Complex Tasks: Summarization, translation, coding, research, and content creation at scale.
- 🧠 Enable Intelligent Apps: Build chatbots, copilots, search engines, and data explorers with natural language interfaces.
- 🚀 Boost Productivity: Save time for users and businesses by handling repetitive or knowledge-based tasks.
- 💰 High Market Demand: Growing need for LLM apps in SaaS, enterprise tools, customer support, and education.
- 🧩 Versatile Integration: Combine with tools, APIs, databases, and agents for powerful workflows.
- 🛠️ Customize for Edge Cases: Fine-tune or adapt for domain-specific needs (finance, legal, health, etc.).
- 📈 Career Growth: Opens roles in AI engineering, MLOps, product innovation, and technical leadership.
- 🌍 Global Impact: Democratizes access to information and automation, especially for non-technical users.
- An LLM (Large Language Model) is a type of artificial intelligence model trained on massive amounts of text data to understand and generate human-like language.
- "Large": Refers to the billions of parameters and the vast datasets used during training.
- "Language Model": Predicts and generates text, based on the context of the input.
- LLMs are built on the Transformer architecture, which uses self-attention to understand context.
- Most LLMs (e.g., GPT, Llama) are decoder-only transformers.
LLM Core components:
- Embedding layer: Converts words/tokens into vectors.
- Positional encoding: Adds word order information.
- Self-attention mechanism: Captures relationships between tokens.
- Feed-forward layers: Process token-level information.
- Layer normalization & residuals: Stabilize and enhance training.
- Output head: Predicts the next token (or a classification result).
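To make these components concrete, here is a minimal, illustrative PyTorch sketch of a decoder-only model (toy dimensions and a single block; real LLMs stack many such blocks and train on huge corpora):

```python
import torch
import torch.nn as nn

class MiniDecoderBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)   # self-attention
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Causal mask: each token may only attend to earlier tokens
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + attn_out)      # residual + layer norm
        x = self.norm2(x + self.ff(x))    # feed-forward + residual + layer norm
        return x

class MiniLLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # embedding layer
        self.pos = nn.Embedding(max_len, d_model)        # (learned) positional encoding
        self.block = MiniDecoderBlock(d_model)
        self.head = nn.Linear(d_model, vocab_size)       # output head: next-token logits

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.embed(token_ids) + self.pos(positions)
        return self.head(self.block(x))

logits = MiniLLM()(torch.randint(0, 1000, (1, 16)))      # (batch=1, seq=16, vocab=1000)
```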
- Prompt engineering is the practice of designing and refining input prompts to get the best possible responses from LLMs (Large Language Models).
- Crafting clear, specific instructions.
- Using structured formats (e.g., lists, JSON, examples).
- Applying techniques like:
- Zero-shot prompting (just ask)
- Few-shot prompting (give examples)
- Chain-of-thought prompting (ask the model to explain step by step)
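A few illustrative prompt strings for these techniques (plain Python strings; which model or client you send them with is up to you):

```python
# Zero-shot: just ask, no examples
zero_shot = "Classify the sentiment of this review as positive or negative:\n'Battery life is terrible.'"

# Few-shot: show a couple of input/output examples before the real input
few_shot = """Classify the sentiment of each review.
Review: "I love this phone." Sentiment: positive
Review: "The screen cracked on day one." Sentiment: negative
Review: "Battery life is terrible." Sentiment:"""

# Chain-of-thought: ask the model to reason step by step before answering
chain_of_thought = (
    "A store sold 23 apples in the morning and twice as many in the afternoon. "
    "How many apples were sold in total? Think step by step, then give the final answer."
)
```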
- Why Is It Important?
- Improves output quality: A well-crafted prompt guides the LLM to produce relevant and accurate results.
- Boosts model accuracy: Helps LLMs follow complex instructions. Precise instructions reduce false or made-up information.
- Enables task-specific performance: You can get the model to summarize, translate, generate code, or follow workflows with just a well-written prompt.
- Cost-efficient: Improves results without retraining or fine-tuning large models.
- Critical for GenAI apps: Critical for building chatbots, agents, search assistants, and copilots.
- Supports RAG and Tool Use: Helps orchestrate LLMs with tools, APIs, or knowledge bases in advanced systems.
RAG (Retrieval-Augmented Generation) is a technique that combines external knowledge retrieval with LLM-based text generation. How It Works:
- Retrieve: Search a knowledge base (e.g., documents, PDFs, databases) using a query. Typically uses vector search with embeddings (e.g., FAISS, Elasticsearch).
- Augment: Inject the retrieved text into the prompt.
- Generate: The LLM uses this context to produce a more accurate, grounded response.
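A minimal sketch of this retrieve-augment-generate loop, assuming the sentence-transformers and faiss packages (both listed under common tools below); the final generation call is left as a placeholder for whatever LLM client you use:

```python
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "FAISS is a library for efficient similarity search over dense vectors.",
    "RAG combines retrieval from a knowledge base with LLM text generation.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(docs, normalize_embeddings=True)

index = faiss.IndexFlatIP(doc_vectors.shape[1])   # inner product == cosine on normalized vectors
index.add(doc_vectors)

# 1) Retrieve: find the most relevant documents for the query
query = "Where is the Eiffel Tower?"
query_vec = embedder.encode([query], normalize_embeddings=True)
_, ids = index.search(query_vec, 2)
context = "\n".join(docs[i] for i in ids[0])

# 2) Augment: inject the retrieved text into the prompt
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

# 3) Generate: placeholder -- call your LLM of choice with `prompt`
print(prompt)
```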
Why RAG Is Important:
- Reduces hallucinations by grounding LLM responses in real data.
- Keeps answers up to date (LLMs don’t need retraining for every update).
- Improves trust and transparency (can show sources).
- Scalable for enterprise: Used in chatbots, search, document QA, agents.
RAG Popular/Common Tools:
- Embedding models: https://python.langchain.com/docs/integrations/text_embedding/
- OpenAI (text-embedding-ada-002)
- HuggingFace (all-MiniLM-L6-v2, sentence-transformers/all-mpnet-base-v2)
- AWS Titan embeddings
- Vector stores: https://python.langchain.com/docs/integrations/vectorstores/
- FAISS
- Pinecone
- Weaviate
- Chroma
- Milvus
- ElasticsearchStore
- Redis
- Frameworks: see LLM Application Frameworks & Libraries below (e.g., LangChain).
Fine-tuning is the process of adapting a pre-trained LLM model to a specific task or domain by further training it on a smaller, specialized dataset. The goal is to make the model better at handling particular requirements without starting from scratch.
How Fine-Tuning Works:
- Start with a Pre-trained Model: A large model (e.g., Llama 3.3, Mistral) that has been trained on vast amounts of general text data to understand language.
- Use Domain-Specific Data: Provide a smaller dataset that is specific to the task or domain you care about (e.g., legal text, medical data, customer service).
- Train the Model: The model’s parameters are updated with this new, specialized data to make it perform better in the target domain, but it retains the general knowledge learned from the larger training dataset.
- Apply the Fine-Tuned Model: The model is now able to generate or classify text based on the specialized domain, often achieving much higher performance than a general model.
Why Fine-Tuning Is Important:
- Improves Accuracy: The model becomes more precise in your specific use case, like medical text analysis or customer support chatbots.
- Saves Time and Resources: You don’t have to train a new model from scratch (which requires vast computing resources and data).
- Customizes Behavior: Fine-tuning allows you to adjust a model to behave in a way that aligns with your application’s goals (e.g., tone, context, style).
Fine-Tuning Methods:
- Full Fine-Tuning: Fine-tuning all parameters of a pre-trained model for a specific task. It offers maximum customization but is resource-intensive and requires large datasets.
- Feature-based Fine-Tuning: Only the final layers of a pre-trained model are fine-tuned, with the rest of the model frozen. It’s faster and less resource-heavy than full fine-tuning but may not provide the same level of task-specific optimization.
- Progressive Fine-Tuning: Fine-tuning occurs in stages, starting with simpler tasks and progressing to more complex ones. This method reduces catastrophic forgetting and improves the model’s ability to learn complex tasks gradually but is more time-consuming.
- Adapter Fine-Tuning: Small trainable layers (adapters) are added to a pre-trained model, and only these adapters are fine-tuned. It’s highly efficient for multi-task learning but less flexible than full fine-tuning and may not be optimal for all tasks.
- LoRA (Low-Rank Adaptation): Low-rank matrices are added to specific layers of a pre-trained model, and only these matrices are fine-tuned. It’s efficient in terms of memory and computation, making it ideal for large models but may not be as effective for very complex tasks.
- QLoRA (Quantized LoRA): Combines LoRA with quantization techniques to further reduce the size of the model and its memory requirements. The low-rank matrices are quantized, making them more memory-efficient.
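A hedged sketch of LoRA fine-tuning with the Hugging Face peft library; the base model id, target modules, and hyperparameters below are illustrative assumptions, not recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"   # assumed model id; swap in any causal LM you have access to
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()        # only the adapter weights are trainable
# ...then train as usual (e.g., with transformers.Trainer) on your domain-specific dataset.
```

QLoRA follows the same pattern but loads the base model in quantized (e.g., 4-bit) form before attaching the adapters, further reducing memory requirements.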
LLM (Large Language Model) Application Frameworks and Libraries are tools designed to simplify the development, orchestration, and deployment of applications powered by large language models like GPT, Claude, Gemini, or LLaMA. These tools provide abstractions for managing prompts, memory, agents, tools, workflows, and integrations with external data or systems.
- LangChain implements a standard interface for large language models and related technologies, such as embedding models and vector stores, and integrates with hundreds of providers.
- https://python.langchain.com/docs/introduction/
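A small sketch of that standard interface in LCEL style, assuming the langchain-openai integration package and an OpenAI API key in the environment (any chat model integration could be swapped in):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Summarize this text in one sentence:\n\n{text}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)   # assumed model name

chain = prompt | llm   # compose prompt -> model with the LangChain Expression Language
result = chain.invoke({"text": "LangChain provides a standard interface for LLMs, embeddings, and vector stores."})
print(result.content)
```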
Agent frameworks are specialized software tools or libraries designed to build, manage, and orchestrate LLM-based agents. These frameworks help you create intelligent agents that can autonomously reason, plan, and act using external tools or in coordination with other agents.
What Do Agent Frameworks Provide?
- Agent Abstractions: Define agent identity, goals, memory, behavior, and permissions.
- Tool Interfaces: Register external tools (APIs, functions, databases, etc.) agents can use.
- Execution Logic: Handle planning, decision-making, tool-calling, retries, and feedback loops.
- Multi-Agent Orchestration: Manage communication and task delegation among multiple agents.
- Memory & Context: Enable persistent memory, history tracking, and contextual reasoning.
- Observability: Offer tracing, logging, and step-by-step reasoning outputs for debugging.
- Agent Development Kit (ADK) is a flexible and modular framework for developing and deploying AI agents. While optimized for Gemini and the Google ecosystem, ADK is model-agnostic, deployment-agnostic, and is built for compatibility with other frameworks.
- ADK was designed to make agent development feel more like software development, to make it easier for developers to create, deploy, and orchestrate agentic architectures that range from simple tasks to complex workflows.
- https://google.github.io/adk-docs/
CrewAI is a Python framework built entirely from scratch, completely independent of LangChain or other agent frameworks. CrewAI empowers developers with both high-level simplicity and precise low-level control, ideal for creating autonomous AI agents tailored to any scenario:
- CrewAI Crews: Optimize for autonomy and collaborative intelligence, enabling you to create AI teams where each agent has specific roles, tools, and goals.
- CrewAI Flows: Enable granular, event-driven control with single LLM calls for precise task orchestration, and support Crews natively.
- https://docs.crewai.com/introduction
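A short illustrative CrewAI sketch (the role, task, and topic below are made up; it assumes the crewai package and an LLM API key configured in the environment):

```python
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Researcher",
    goal="Find three key facts about a given topic",
    backstory="A meticulous analyst who cites sources.",
)

research_task = Task(
    description="Research the topic: {topic} and list three key facts.",
    expected_output="A bullet list of three facts.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[research_task])
result = crew.kickoff(inputs={"topic": "Retrieval-Augmented Generation"})
print(result)
```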
- PraisonAI is a production-ready Multi-AI Agents framework with self-reflection, designed to create AI Agents to automate and solve problems ranging from simple tasks to complex challenges.
- https://docs.praison.ai/
- PydanticAI is a Python agent framework designed to make it less painful to build production-grade applications with Generative AI. https://ai.pydantic.dev/
An LLM agent is an autonomous or semi-autonomous system that uses a large language model (like GPT, Claude, or Gemini) to reason, make decisions, and take actions using external tools or APIs to accomplish a goal.
An LLM agent typically includes:
- LLM Core: The brain that interprets tasks and generates reasoning.
- Memory (optional): Stores history of actions, inputs, and outputs for context-aware behavior.
- Tools: External APIs, databases, web search, code execution, or custom functions the agent can call.
- Environment: The runtime or framework (e.g., LangChain, CrewAI, Google ADK) in which the agent operates.
For example, a minimal Google ADK agent that registers a function tool:

```python
from google.adk.agents import LlmAgent

# Define a tool function
def get_capital_city(country: str) -> str:
    """Retrieves the capital city for a given country."""
    # Replace with actual logic (e.g., API call, database lookup)
    capitals = {"france": "Paris", "japan": "Tokyo", "canada": "Ottawa"}
    return capitals.get(country.lower(), f"Sorry, I don't know the capital of {country}.")

# Add the tool to the agent
capital_agent = LlmAgent(
    model="gemini-2.0-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country.",
    instruction="""You are an agent that provides the capital city of a country... (previous instruction text)""",
    tools=[get_capital_city],
)
```
A Tool represents a specific capability provided to an AI agent, enabling it to perform actions and interact with the world beyond its core text generation and reasoning abilities.
- Function Tool: Transforming a function into a tool is a straightforward way to integrate custom logic into your agents.
- Agent-as-Tool: This powerful feature allows you to leverage the capabilities of other agents within your system by calling them as tools. The Agent-as-a-Tool enables you to invoke another agent to perform a specific task, effectively delegating responsibility.
- Google ADK Built-in-Tools: These built-in tools provide ready-to-use functionality such as Google Search or code executors that provide agents with common capabilities.
- LangChain Tools: Existing LangChain tools can also be used inside Google ADK agents.
- CrewAI Tools: Existing CrewAI tools can also be used inside Google ADK agents.
- MCP Tools: Connect MCP server apps to MCP clients (Claude App, VS Code, etc.) or agents (e.g., Google ADK).
- MCP Servers: https://github.com/modelcontextprotocol/servers
- The Model Context Protocol (MCP) is an open standard designed to standardize how Large Language Models (LLMs) like Gemini and Claude communicate with external applications, data sources, and tools.
- MCP follows a client-server architecture, defining how data (resources), interactive templates (prompts), and actionable functions (tools) are exposed by an MCP server and consumed by an MCP client (which could be an LLM host application or an AI agent).
- https://www.anthropic.com/engineering/building-effective-agents
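A minimal sketch of an MCP client listing and calling a server's tools over stdio, using the official mcp Python SDK; the filesystem server command is one example from the MCP servers repo, and the tool name/arguments depend on the server you connect to:

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch a local MCP server as a subprocess (example: the filesystem server, rooted at /tmp)
server_params = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
)

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Available MCP tools:", [t.name for t in tools.tools])
            # Call a tool exposed by the server (name and arguments are server-specific)
            result = await session.call_tool("list_directory", {"path": "/tmp"})
            print(result.content)

asyncio.run(main())
```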
The Agent2Agent (A2A) protocol facilitates communication between independent AI agents. Here are the core concepts:
- Agent Card: A public metadata file (usually at /.well-known/agent.json) describing an agent's capabilities, skills, endpoint URL, and authentication requirements. Clients use this for discovery.
- A2A Server: An agent exposing an HTTP endpoint that implements the A2A protocol methods (defined in the JSON specification). It receives requests and manages task execution.
- A2A Client: An application or another agent that consumes A2A services. It sends requests (like tasks/send) to an A2A Server's URL.
- Task: The central unit of work. A client initiates a task by sending a message (tasks/send or tasks/sendSubscribe). Tasks have unique IDs and progress through states (submitted, working, input-required, completed, failed, canceled).
- Message: Represents communication turns between the client (role: "user") and the agent (role: "agent"). Messages contain Parts.
- Part: The fundamental content unit within a Message or Artifact. Can be TextPart, FilePart (with inline bytes or a URI), or DataPart (for structured JSON, e.g., forms).
- Artifact: Represents outputs generated by the agent during a task (e.g., generated files, final structured data). Artifacts also contain Parts.
- Streaming: For long-running tasks, servers supporting the streaming capability can use tasks/sendSubscribe. The client receives Server-Sent Events (SSE) containing TaskStatusUpdateEvent or TaskArtifactUpdateEvent messages, providing real-time progress.
- Push Notifications: Servers supporting pushNotifications can proactively send task updates to a client-provided webhook URL, configured via tasks/pushNotification/set.
- Link: https://github.com/google/A2A
A2A Typical Flow:
- Discovery: Client fetches the Agent Card from the server's well-known URL.
- Initiation: Client sends a tasks/send or tasks/sendSubscribe request containing the initial user message and a unique Task ID.
- Processing:
- (Streaming): Server sends SSE events (status updates, artifacts) as the task progresses.
- (Non-Streaming): Server processes the task synchronously and returns the final Task object in the response.
- Interaction (Optional): If the task enters input-required, the client sends subsequent messages using the same Task ID via tasks/send or tasks/sendSubscribe.
- Completion: The task eventually reaches a terminal state (completed, failed, canceled).
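A hedged sketch of this flow over plain HTTP using the requests library; the server URL is an assumption, and the JSON-RPC payload follows the tasks/send shape described above (field names may differ between A2A versions):

```python
import uuid
import requests

base_url = "http://localhost:10000"   # assumed A2A server address

# 1) Discovery: fetch the public Agent Card
card = requests.get(f"{base_url}/.well-known/agent.json").json()
print(card["name"], card.get("capabilities"))

# 2) Initiation: send a task with a unique Task ID (JSON-RPC method tasks/send)
task_id = str(uuid.uuid4())
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",
    "params": {
        "id": task_id,
        "message": {
            "role": "user",
            "parts": [{"type": "text", "text": "What is the exchange rate today?"}],
        },
    },
}
response = requests.post(base_url, json=payload).json()

# 3/4) Completion: inspect the returned Task object and its state
task = response.get("result", {})
print(task.get("status", {}).get("state"))   # e.g. "completed" or "input-required"
```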
It shows the first agent implementation with the Google ADK Web UI and a Gemini API key.
It shows a customized front-end app built with Streamlit and a dockerized back-end app using FastAPI that communicates with both the Gemini LLM and the front-end.
It shows how to run a local MCP tool with Gemini.
It shows how to run a remote MCP tool with Gemini.
It shows how to run the built-in Google Search tool and agent memory.
It shows how to connect to Llama 3.1 405B on AWS Bedrock and Ollama Llama 3.2:1b running on a local PC using LiteLLM.
It shows how to implement a multi-agent sequential workflow.
It shows how to implement a multi-agent parallel workflow.
It shows how to implement a multi-agent loop workflow.
- Link: https://github.com/omerbsezer/Fast-LLM-Agent-MCP/tree/main/agents/google_adk/08-multiagent-loop
It shows how to implement a multi-agent hierarchical workflow.
- Link: https://github.com/omerbsezer/Fast-LLM-Agent-MCP/tree/main/agents/google_adk/09-multiagent-hierarchy
It shows how to implement an AI Content Detector with AWS Bedrock, Llama 3.1 405B, and the power of prompt templating.
Project-02: LLM with Model Context Protocol (MCP) using PraisonAI, Ollama, Llama 3.2:1B and Llama 3.1:8B
It shows how to implement MCP remote tools using PraisonAI, Ollama, Llama 3.2:1B and Llama 3.1:8B.