
Add a how-to RAG guide #117

Open · wants to merge 4 commits into base: main
1 change: 0 additions & 1 deletion docs/genai/01_getting_started/01_intro.mdx
@@ -7,7 +7,6 @@ If you're looking to harness Generative AI for administrative or classroom use,
Welcome to Pythia, the generative AI platform for research workflows. As part of the Pythia platform, the following capabilities are offered:
- [Access to externally hosted LLMs](../02_external_llms/01_llm_access.mdx)
- [HPC resources for fine tuning LLMs](../03_llm_fine_tuning/01_intro.md)
- [Milvus vector database](../04_vector_databases/01_intro.md)

:::tip[Personal use]
If you want to access NYU-provided LLMs for personal use, proceed to https://gemini.google.com/app with your NYU credentials.
2 changes: 1 addition & 1 deletion docs/genai/02_external_llms/02_catalogue.md
@@ -10,7 +10,7 @@ We currently facilitate access to the following externally hosted LLMs:
- text-embedding-3-small

## VertexAI
- Gemini-2.5-flash-preview-04-17
- gemini-2.5-flash-preview-05-20
- Gemini-2.0 models (flash, flash-lite)
- Gemini-1.5 models (flash, pro) (deprecated)

@@ -1,4 +1,4 @@
# Temperature
# Effect of Temperature

Generating text (or images) from LLMs is inherently probabilistic. However, as an end user you have many parameters at your disposal for tweaking the behavior of LLMs. Of these, temperature is the most commonly used: broadly, it controls the randomness of the generated text. A lower temperature produces more deterministic output, while a higher temperature produces more random, "creative" output. For a more comprehensive explanation of this topic, refer to the following (a short code sketch follows the reference list):
- [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate)
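
For a quick, hands-on illustration, here is a minimal sketch of setting temperature through the Portkey client used in the how-to guides; the model name and prompt are placeholder assumptions:

```python
from portkey_ai import Portkey

portkey = Portkey(
    base_url="https://ai-gateway.apps.cloud.rt.nyu.edu/v1/",
    api_key="",      # Replace with your Portkey API key
    virtual_key="",  # Replace with your virtual key
)

# A temperature near 0 yields near-deterministic output; higher values
# (most providers allow up to 2.0) yield more varied, "creative" output.
completion = portkey.chat.completions.create(
    model="gemini-2.0-flash",  # placeholder; see the model catalogue
    messages=[{"role": "user", "content": "Suggest a title for a guide on GenAI for research."}],
    temperature=0.2,
)
print(completion.choices[0].message.content)
```

Rerunning the same call with `temperature=1.5` should produce noticeably more varied titles across runs.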
48 changes: 48 additions & 0 deletions docs/genai/04_how_to_guides/02_embeddings.mdx
@@ -0,0 +1,48 @@
# Embeddings

While decoder-only LLMs gained massive popularity through their use in chatbots, encoder-only LLMs can be used for a wider variety of tasks. Decoder-only LLMs "generate" tokens ("text") one at a time, probabilistically. Encoder-only LLMs, on the other hand, take text as input, tokenize it, and produce "embeddings" as output. Here, we walk through generating embeddings from a text document.

```mermaid
flowchart LR;
A["natural language text string <br> *GenAI can be used for research*"]
B["encoder-only LLM"]
C["vector embedding <br> [0.052587852, 0.094195396, 0.24439038, 0.104940414, ...]"]
A-- "Input" -->B;
B-- "Output" -->C;
```

## How to generate embeddings from plain text

The snippet below uses the `text-embedding-3-small` model to create a 32-dimensional floating-point vector embedding for the input string:

```python
from portkey_ai import Portkey

portkey = Portkey(
base_url="https://ai-gateway.apps.cloud.rt.nyu.edu/v1/",
api_key="", # Replace with your Portkey API key
virtual_key="", # Replace with your virtual key
)

response = portkey.embeddings.create(
model="text-embedding-3-small",
input="GenAI can be used for research.",
encoding_format="float",
dimensions=32,
)

print(response.data[0].embedding)
```

This prints the following 32-dimensional embedding:
```
[0.052587852, 0.094195396, 0.24439038, 0.104940414, -0.028921358, -0.31591928, -0.1846261, 0.221018, 0.033215445, -0.1382735, -0.14776362, -0.15058714, 0.057725072, -0.23435123, 0.07956805, -0.32156628, -0.08454841, 0.04066637, -0.022215525, 0.19090058, -0.11160703, 0.22258662, -0.06843088, -0.22854735, 0.1033718, -0.38085997, 0.2933312, -0.023215517, 0.20768477, -0.039333045, 0.17192031, -0.14180289]
```

## Applications of embeddings

Embeddings encode the semantic meaning of text. As a result, they find applications in, among others:
- retrieval-augmented generation
- search
- classification

All of these applications reduce to comparing embeddings for similarity, as the sketch below shows.
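
A minimal sketch of that comparison, assuming embeddings produced as above; `numpy`, the truncated example vectors, and the helper name `cosine_similarity` are illustrative assumptions:

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: closer to 1.0 for semantically similar texts."""
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings, truncated to three dimensions for readability.
research = [0.052, 0.094, 0.244]   # "GenAI can be used for research."
weather = [-0.316, 0.185, -0.221]  # "It is raining today."
query = [0.060, 0.354, 0.044]      # "Can GenAI be used for research?"

print(cosine_similarity(query, research))  # expected to be the higher score
print(cosine_similarity(query, weather))   # expected to be lower
```

Search and retrieval-augmented generation both reduce to this operation: embed a query, then rank stored embeddings by similarity to it.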
27 changes: 27 additions & 0 deletions docs/genai/04_how_to_guides/03_retrieval_augmented_generation.mdx
@@ -0,0 +1,27 @@
# Retrieval-augmented generation

Large Language Models only know about the data they were trained on, so they lack the context needed to answer questions based on:
- private datasets
- newer knowledge past the cutoff date (i.e., the date at which data collection was frozen)

One of the most popular techniques for getting around this limitation is retrieval-augmented generation (RAG), illustrated below.


```mermaid
flowchart TB;
A["natural language prompt <br> *Can GenAI be used for research?*"]
B["vector embedding <br> [-0.013879947, 0.0601184, 0.35442936, 0.04381764, ...]"]
C["vector database <br> embedding1 <br> embedding2 <br> embedding3 <br> ... "]
D["text with embeddings similar to the prompt"]
E["original prompt with added context"]
F["response from LLM using context"]
subgraph Retrieval
A-- "Embedding" -->B;
B-- "Look for similar embeddings" -->C;
C-- "Generate context" -->D;
end
D-- "With expanded prompt" -->E;
subgraph Augmented Generation
E-- "LLM" -->F;
end
```
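
To make this concrete, here is a minimal end-to-end sketch of the retrieval and augmented-generation steps, assuming the same Portkey gateway and models used in the embeddings guide; the document snippets and the use of an in-memory list in place of a real vector database (e.g., Milvus) are illustrative assumptions:

```python
import numpy as np
from portkey_ai import Portkey

portkey = Portkey(
    base_url="https://ai-gateway.apps.cloud.rt.nyu.edu/v1/",
    api_key="",      # Replace with your Portkey API key
    virtual_key="",  # Replace with your virtual key
)

# Toy in-memory "vector database": a few pre-embedded document snippets.
documents = [
    "Pythia offers access to externally hosted LLMs for research workflows.",
    "HPC resources are available for fine-tuning LLMs.",
]

def embed(text: str) -> np.ndarray:
    """Embed a string with the same model used in the embeddings guide."""
    response = portkey.embeddings.create(
        model="text-embedding-3-small",
        input=text,
        encoding_format="float",
        dimensions=32,
    )
    return np.array(response.data[0].embedding)

doc_embeddings = [embed(doc) for doc in documents]

# Retrieval: pick the document whose embedding is closest to the prompt's.
prompt = "Can GenAI be used for research?"
prompt_embedding = embed(prompt)
similarities = [
    float(np.dot(prompt_embedding, d) / (np.linalg.norm(prompt_embedding) * np.linalg.norm(d)))
    for d in doc_embeddings
]
context = documents[int(np.argmax(similarities))]

# Augmented generation: answer the original prompt using the retrieved context.
completion = portkey.chat.completions.create(
    model="gemini-2.5-flash-preview-05-20",  # from the model catalogue
    messages=[
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {prompt}"},
    ],
)
print(completion.choices[0].message.content)
```

A production setup would swap the in-memory list for a vector database and retrieve the top-k nearest documents rather than a single best match.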
3 changes: 0 additions & 3 deletions docs/genai/04_vector_databases/01_intro.md

This file was deleted.

3 changes: 0 additions & 3 deletions docs/genai/04_vector_databases/_category_.json

This file was deleted.