Introduction to Semantic Kernel Development

About this repo

This repo includes C# interactive notebooks at notebooks/ that demonstrate how to run scenarios with and without Semantic Kernel. The idea is to showcase the differences between writing LLM/chatbot apps with and without Semantic Kernel. The samples include:

  • notebooks/cars.ipynb: A sales description generator for used cars implemented without Semantic Kernel.
  • notebooks/sk-cars.ipynb: A sales description generator for used cars implemented with Semantic Kernel.
  • notebooks/chat.ipynb: A GPT chatbot implementation without Semantic Kernel.
  • notebooks/sk-chat.ipynb: A GPT chatbot implementation with Semantic Kernel.
  • notebooks/sk-pipes.ipynb: Showcases how to use Semantic Kernel pipes.
    • Note: This code comes from the original dotnet/SK solution; I have just adapted it to run in C# interactive.
  • notebooks/sk-planner.ipynb: Showcases how to use the Semantic Kernel sequential planner.
    • Note: This code comes from the original dotnet/SK solution; I have just adapted it to run in C# interactive.
  • notebooks/sk-memory.ipynb: Demonstrates how to use the Semantic Kernel memories functionality.
    • Note: This code comes from the original dotnet/SK solution; I have just adapted it to run in C# interactive.
  • notebooks/sk-memory-rag.ipynb: Demonstrates a RAG pattern implementation using Semantic Kernel volatile memory. Useful for understanding the concepts of this pattern.
  • notebooks/sk-memory-raasg.ipynb: Demonstrates a RAASG pattern implementation using Semantic Kernel. Useful for understanding the concepts of this pattern.

Note: I am using C# interactive notebooks because I find them helpful for documenting, explaining, and debugging the different concepts. In production applications, I have deployed C# projects with Web APIs and React frontends.

Running the demos

Requirements:

  • .NET 6 or 7 SDK
  • VS Code
  • VS Code Polyglot extension

Running:

  • In Azure create OpenAI endpoints for:
    • GPT 3.5 or 4
    • ADA Embeddings
    • Copy the deployment name, endpoint, and API key information
  • Clone this repo
  • Navigate to the notebooks folder within this repo
  • Make sure VS Code is installed, as well as the VS Code Polyglot extension and either the .NET 6 or 7 SDK
  • Create a file called notebooks/.env and add the following values, using the deployment information copied earlier:
GPT_OPENAI_KEY=<KEY>
GPT_OPENAI_DEPLOYMENT_NAME=<MODEL_NAME>
GPT_OPENAI_ENDPOINT=https://<NAME>.openai.azure.com/
GPT_OPENAI_FULL_ENDPOINT=https://<NAME>.openai.azure.com/openai/deployments/<MODEL>/chat/completions?api-version=2023-03-15-preview

DAVINCI_OPENAI_KEY=<KEY>
DAVINCI_OPENAI_DEPLOYMENT_NAME=<MODEL_NAME>
DAVINCI_OPENAI_ENDPOINT=https://<NAME>.openai.azure.com/
DAVINCI_OPENAI_FULL_ENDPOINT=https://<NAME>.openai.azure.com/openai/deployments/<MODEL_NAME>/completions?api-version=2022-12-01
  • Type: 'code .'
  • Open any one of the notebooks
  • Click the play button at each cell and review the headings and comments

What are some important concepts in LLM/Chatbot development?

  • Understanding that generative AI models are foundation models
    • A foundation model can be used to solve many different problems
  • Understanding the difference between completion (LLM) and chat models
    • A completion model (e.g., Davinci) is for single answer/response interactions
    • A chat model (e.g., GPT) can keep a conversation, but it can also be used for answer/response
  • Understanding tokens, token limits, and working around these limits
  • Understanding the prompt and completion models
  • Understanding and applying prompt engineering
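Working around token limits usually means splitting long inputs before sending them to the model. A minimal sketch of that idea, assuming the rough rule of thumb of ~4 characters per token for English text (the function name and heuristic are illustrative; use a real tokenizer for exact counts):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Naive chunker: splits text on spaces into pieces that fit an approximate
// token budget. Assumption: ~4 characters per token, a rough rule of thumb
// for English; a real tokenizer is needed for exact counts.
static List<string> ChunkByTokenBudget(string text, int maxTokens)
{
    int maxChars = maxTokens * 4;
    var chunks = new List<string>();
    var current = new StringBuilder();
    foreach (var word in text.Split(' '))
    {
        if (current.Length + word.Length + 1 > maxChars && current.Length > 0)
        {
            chunks.Add(current.ToString().Trim());
            current.Clear();
        }
        current.Append(word).Append(' ');
    }
    if (current.Length > 0) chunks.Add(current.ToString().Trim());
    return chunks;
}

var chunks = ChunkByTokenBudget("some long document text to be summarized", 2048);
Console.WriteLine($"Split into {chunks.Count} chunk(s)");
// → Split into 1 chunk(s)
```

Each chunk can then be summarized separately and the partial summaries combined, which is one common way to fit long documents under the model's context window.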

This is how simple a GPT call can look:

Summarize the following text:

Text: """
{text input here}
"""

You don't even have to write a program; you can just use curl:

curl "https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/completions?api-version=2023-05-15" \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR_API_KEY" \
  -d "{
  \"prompt\": \"What is the speed of light?\",
  \"max_tokens\": 5
}"

So where is the complexity then? It is everything else.

  • For example:
    • Applying a foundation model, which can solve many different problems, to your specific problem is complex in itself
    • Making resilient applications that can handle disconnections and throttling
    • Developing complex orchestrations
    • Preparing the prompts and applying prompt engineering (Grounding)
    • Saving and recalling text from embeddings, for example in the RAG pattern
    • Processing the completions

Semantic Kernel can help with everything else.

Resiliency considerations and recommendations

  • Azure OpenAI GPT models have two kinds of throttling:
    • By tokens per minute (TPM)
    • By number of requests per minute (RPM)
  • When throttling occurs, the API returns a Retry-After value (in seconds) in the response headers
  • Set max_tokens to the number of tokens you expect in the completion (not how many you will send)
  • Consider using (currently, as of 11/2023) GPT 3.5 Turbo over GPT 4. GPT 3.5 is faster and offers a higher TPM quota
  • Consider load balancing between two accounts
  • Handle retries in the client application, waiting for the delay indicated by the Retry-After header
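The retry recommendation above can be sketched as a small helper that waits out the Retry-After delay when a call is throttled. This is an illustrative sketch, not SK's implementation; the fake endpoint below simulates two 429 responses so the example is self-contained:

```csharp
using System;
using System.Threading.Tasks;

// Generic retry helper: retries when the call reports HTTP 429 (throttled),
// waiting for the server-suggested Retry-After delay between attempts.
static async Task<int> SendWithRetryAsync(
    Func<Task<(int Status, TimeSpan? RetryAfter)>> send, int maxRetries = 3)
{
    for (int attempt = 0; ; attempt++)
    {
        var (status, retryAfter) = await send();
        if (status != 429 || attempt >= maxRetries)
            return status;
        // Honor the server-suggested delay; fall back to 1 second if absent.
        await Task.Delay(retryAfter ?? TimeSpan.FromSeconds(1));
    }
}

// Simulated endpoint: throttles the first two calls, then succeeds.
int calls = 0;
Task<(int, TimeSpan?)> FakeSend()
{
    calls++;
    return Task.FromResult(calls <= 2
        ? (429, (TimeSpan?)TimeSpan.FromMilliseconds(10))
        : (200, (TimeSpan?)null));
}

var status = await SendWithRetryAsync(FakeSend);
Console.WriteLine($"Final status: {status} after {calls} calls");
// → Final status: 200 after 3 calls
```

In a real application the send delegate would wrap HttpClient.SendAsync and read the Retry-After header from the response.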

What is Semantic Kernel?

  • Semantic Kernel is an open-source SDK that provides a set of connectors and plugins that let you orchestrate AI models and skills with natural-language semantic functions, traditional native code functions, and embeddings-based memory.
  • It supports prompt templating, function chaining, vectorized memory, and intelligent planning capabilities.
  • It enables you to create AI apps that combine the best of both worlds: natural language understanding and conventional programming logic.

Where does SK land in the development of LLM/Chatbot applications?

graph LR;
  A(Basic<br/>Development) --> B(Development<br/>with SDKs<br/>_____________<br/> <br/>SK<br/>lands here)
  B-->C(Application Patterns<br/>and Architectures)
  C-->D(Applications &<br/>Solutions)
  D-->E(WAF<br/>Deployment)  
  classDef someclass fill:blue,color:white
  class B someclass


Advantages of using SK vs making REST calls or using other SDKs

  • The kernel can be configured via plugins that are implemented through interfaces.
  • A resilient HttpClient that can handle timeouts and throttling.
    • Note: HttpClient is the main object used to perform HTTP operations in .NET; similar objects exist in many languages.
  • Semantic functions.
    • Generally, an SK function is defined as a templated prompt.
      • ie: var skfJokeDefinition = "Write a joke about {{$input}}.";
      • ie: var skSpanishTranslator = "Translate the following text from English to Spanish:\n\nText: \"\"\"\n{{$input}}\n\"\"\"";
    • SK functions can be piped together:
      • ie:
    var result = await kernel.RunAsync("a chicken crossing the road",
        skfJokeGenerator,
        skfSpanishTranslator,
        text["uppercase"]);
  • Output: ¿POR QUÉ CRUZÓ EL POLLO LA CARRETERA? PORQUE QUERÍA VER SI LA HIERBA ERA MÁS VERDE AL OTRO LADO. PERO RESULTA QUE ERA ARTIFICIAL Y SE QUEDÓ ATASCADO EN EL CÉSPED. ¡QUÉ POLLO TAN TONTO!
  • SK functions can be in-line, coded, or loaded from files.
  • The SK planner can dynamically create complex orchestrations.
  • Memories can be used to save contents with embeddings and retrieve content using the same embeddings, for example for use in the RAG pattern.
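At its core, the {{$input}} templating shown above is variable substitution into the prompt before it is sent to the model. A minimal sketch of the concept (this illustrates the idea only; the real SK template engine is far more capable, and RenderPrompt is a made-up helper):

```csharp
using System;
using System.Collections.Generic;

// Renders an SK-style templated prompt by substituting {{$name}} variables.
// Illustrative only: not the actual Semantic Kernel template engine.
static string RenderPrompt(string template, IDictionary<string, string> variables)
{
    var result = template;
    foreach (var (name, value) in variables)
        result = result.Replace("{{$" + name + "}}", value);
    return result;
}

var skfJokeDefinition = "Write a joke about {{$input}}.";
var prompt = RenderPrompt(skfJokeDefinition,
    new Dictionary<string, string> { ["input"] = "a chicken crossing the road" });
Console.WriteLine(prompt);
// → Write a joke about a chicken crossing the road.
```

Piping functions, as in the RunAsync example, then amounts to feeding each completion back in as the next function's input variable.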

When and when not to choose Semantic Kernel

Note: Semantic Kernel is still in preview. It is not yet recommended for production.

When

  • Your language is supported, particularly C# and Python (more language support is coming).
  • The app can take advantage of the functionality provided by Semantic Kernel such as pipes, orchestration, memories, etc.
  • If you need complex SK function orchestrations.
  • If you need to work with embeddings.

When not to

  • SK may be too complex for simple applications.
  • If your language is not supported.

Semantic Kernel vs LangChain

| Semantic Kernel        | LangChain                     |
| ---------------------- | ----------------------------- |
| Connectors             | Language Models (LLM/Chat)    |
| SK functions           | Prompt Templates              |
| Variables in functions | Variables in prompt templates |
| Pipes                  | Chains                        |
| Planner                | Chains                        |
| Memories               | Memory                        |
| C#, Python, Java       | Python                        |

Memories are usually helpers that generate, store, retrieve, and compare embeddings from vector databases.
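Retrieving a memory comes down to comparing the query's embedding against the stored embeddings, typically by cosine similarity. A self-contained sketch of that comparison (the vectors here are toy 3-dimensional values; real ADA embeddings have 1,536 dimensions):

```csharp
using System;

// Cosine similarity: 1.0 means identical direction, near 0 means unrelated.
static double CosineSimilarity(double[] a, double[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

// Toy "embeddings"; real ones come from an embedding model such as ADA.
var query = new[] { 1.0, 0.2, 0.0 };
var memoryA = new[] { 0.9, 0.3, 0.1 };  // similar direction to the query
var memoryB = new[] { 0.0, 0.1, 1.0 };  // mostly orthogonal to the query

Console.WriteLine($"A: {CosineSimilarity(query, memoryA):F3}, B: {CosineSimilarity(query, memoryB):F3}");
```

A memory store ranks stored items by this score and returns the closest matches, which is exactly what the RAG pattern relies on to ground the prompt.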