
Add a how-to RAG guide #117

Open · wants to merge 4 commits into base: main
1 change: 0 additions & 1 deletion docs/genai/01_getting_started/01_intro.mdx
@@ -7,7 +7,6 @@ If you're looking to harness Generative AI for administrative or classroom use,
Welcome to Pythia, the generative AI platform for research workflows. As part of the Pythia platform, the following capabilities are offered:
- [Access to externally hosted LLMs](../02_external_llms/01_llm_access.mdx)
- [HPC resources for fine tuning LLMs](../03_llm_fine_tuning/01_intro.md)
- [Milvus vector database](../04_vector_databases/01_intro.md)

:::tip[Personal use]
If you want to access NYU-provided LLMs for personal use, proceed to https://gemini.google.com/app with your NYU credentials.
2 changes: 1 addition & 1 deletion docs/genai/02_external_llms/02_catalogue.md
@@ -10,7 +10,7 @@ We currently facilitate access to the following externally hosted LLMs:
- text-embedding-3-small

## VertexAI
- Gemini-2.5-flash-preview-04-17
- gemini-2.5-flash-preview-05-20
- Gemini-2.0 models (flash, flash-lite)
- Gemini-1.5 models (flash, pro) (deprecated)

@@ -1,4 +1,4 @@
# Temperature
# Effect of Temperature

Generating text (or images) from LLMs is inherently probabilistic. However, as an end user you have many parameters at your disposal for tweaking the behavior of LLMs. Of these, temperature is the most commonly used: broadly, it controls the randomness of the generated text. A lower temperature produces more deterministic output, while a higher temperature produces more random, "creative" output. For a more comprehensive explanation of this topic, refer to the following (a short code sketch follows the reference list):
- [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate)
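
For a quick, hands-on illustration, here is a minimal sketch of setting temperature through the Portkey client used in the how-to guides; the model name and prompt are placeholder assumptions:

```python
from portkey_ai import Portkey

portkey = Portkey(
    base_url="https://ai-gateway.apps.cloud.rt.nyu.edu/v1/",
    api_key="",      # Replace with your Portkey API key
    virtual_key="",  # Replace with your virtual key
)

# A temperature near 0 yields near-deterministic output; higher values
# (most providers allow up to 2.0) yield more varied, "creative" output.
completion = portkey.chat.completions.create(
    model="gemini-2.0-flash",  # placeholder; see the model catalogue
    messages=[{"role": "user", "content": "Suggest a title for a guide on GenAI for research."}],
    temperature=0.2,
)
print(completion.choices[0].message.content)
```

Rerunning the same call with `temperature=1.5` should produce noticeably more varied titles across runs.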
48 changes: 48 additions & 0 deletions docs/genai/04_how_to_guides/02_embeddings.mdx
@@ -0,0 +1,48 @@
# Embeddings

While decoder-only LLMs gained massive popularity through their use in chatbots, encoder-only LLMs can be used for a wider variety of tasks. Decoder-only LLMs "generate" tokens ("text") one at a time, probabilistically. Encoder-only LLMs, on the other hand, take text as input, tokenize it, and produce "embeddings" as output. Here, we walk through generating embeddings from a text document.

```mermaid
flowchart LR;
A["natural language text string <br> *GenAI can be used for research*"]
B["encoder-only LLM"]
C["vector embedding <br> [0.052587852, 0.094195396, 0.24439038, 0.104940414, ...]"]
A-- "Input" -->B;
B-- "Output" -->C;
```

## How to generate embeddings from plain text

The snippet below uses the `text-embedding-3-small` model to create a 32-dimensional floating-point vector embedding for the input string:

```python
from portkey_ai import Portkey

portkey = Portkey(
base_url="https://ai-gateway.apps.cloud.rt.nyu.edu/v1/",
api_key="", # Replace with your Portkey API key
virtual_key="", # Replace with your virtual key
)

response = portkey.embeddings.create(
model="text-embedding-3-small",
input="GenAI can be used for research.",
encoding_format="float",
dimensions=32,
)

print(response.data[0].embedding)
```

This prints the following 32-dimensional embedding:
```
[0.052587852, 0.094195396, 0.24439038, 0.104940414, -0.028921358, -0.31591928, -0.1846261, 0.221018, 0.033215445, -0.1382735, -0.14776362, -0.15058714, 0.057725072, -0.23435123, 0.07956805, -0.32156628, -0.08454841, 0.04066637, -0.022215525, 0.19090058, -0.11160703, 0.22258662, -0.06843088, -0.22854735, 0.1033718, -0.38085997, 0.2933312, -0.023215517, 0.20768477, -0.039333045, 0.17192031, -0.14180289]
```

## Applications of embeddings

Embeddings encode the semantic meaning of text. As a result, they find applications in, among others:
- retrieval-augmented generation
- search
- classification

All of these applications reduce to comparing embeddings for similarity, as the sketch below shows.
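
A minimal sketch of that comparison, assuming embeddings produced as above; `numpy`, the truncated example vectors, and the helper name `cosine_similarity` are illustrative assumptions:

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: closer to 1.0 for semantically similar texts."""
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings, truncated to three dimensions for readability.
research = [0.052, 0.094, 0.244]   # "GenAI can be used for research."
weather = [-0.316, 0.185, -0.221]  # "It is raining today."
query = [0.060, 0.354, 0.044]      # "Can GenAI be used for research?"

print(cosine_similarity(query, research))  # expected to be the higher score
print(cosine_similarity(query, weather))   # expected to be lower
```

Search and retrieval-augmented generation both reduce to this operation: embed a query, then rank stored embeddings by similarity to it.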
27 changes: 27 additions & 0 deletions docs/genai/04_how_to_guides/03_retrieval_augmented_generation.mdx
@@ -0,0 +1,27 @@
# Retrieval-augmented generation

Large Language Models only know about the data they were trained on, so they lack the context needed to answer questions based on:
- private datasets
- newer knowledge past the cutoff date (i.e., the date at which data collection was frozen)

One of the most popular techniques for getting around this limitation is retrieval-augmented generation (RAG), illustrated below.


```mermaid
flowchart TB;
A["natural language prompt <br> *Can GenAI be used for research?*"]
B["vector embedding <br> [-0.013879947, 0.0601184, 0.35442936, 0.04381764, ...]"]
C["vector database <br> embedding1 <br> embedding2 <br> embedding3 <br> ... "]
D["text with embeddings similar to the prompt"]
E["original prompt with added context"]
F["response from LLM using context"]
subgraph Retrieval
A-- "Embedding" -->B;
B-- "Look for similar embeddings" -->C;
C-- "Generate context" -->D;
end
D-- "With expanded prompt" -->E;
subgraph Augmented Generation
E-- "LLM" -->F;
end
```
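
To make this concrete, here is a minimal end-to-end sketch of the retrieval and augmented-generation steps, assuming the same Portkey gateway and models used in the embeddings guide; the document snippets and the use of an in-memory list in place of a real vector database (e.g., Milvus) are illustrative assumptions:

```python
import numpy as np
from portkey_ai import Portkey

portkey = Portkey(
    base_url="https://ai-gateway.apps.cloud.rt.nyu.edu/v1/",
    api_key="",      # Replace with your Portkey API key
    virtual_key="",  # Replace with your virtual key
)

# Toy in-memory "vector database": a few pre-embedded document snippets.
documents = [
    "Pythia offers access to externally hosted LLMs for research workflows.",
    "HPC resources are available for fine-tuning LLMs.",
]

def embed(text: str) -> np.ndarray:
    """Embed a string with the same model used in the embeddings guide."""
    response = portkey.embeddings.create(
        model="text-embedding-3-small",
        input=text,
        encoding_format="float",
        dimensions=32,
    )
    return np.array(response.data[0].embedding)

doc_embeddings = [embed(doc) for doc in documents]

# Retrieval: pick the document whose embedding is closest to the prompt's.
prompt = "Can GenAI be used for research?"
prompt_embedding = embed(prompt)
similarities = [
    float(np.dot(prompt_embedding, d) / (np.linalg.norm(prompt_embedding) * np.linalg.norm(d)))
    for d in doc_embeddings
]
context = documents[int(np.argmax(similarities))]

# Augmented generation: answer the original prompt using the retrieved context.
completion = portkey.chat.completions.create(
    model="gemini-2.5-flash-preview-05-20",  # from the model catalogue
    messages=[
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {prompt}"},
    ],
)
print(completion.choices[0].message.content)
```

A production setup would swap the in-memory list for a vector database and retrieve the top-k nearest documents rather than a single best match.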
3 changes: 0 additions & 3 deletions docs/genai/04_vector_databases/01_intro.md

This file was deleted.

3 changes: 0 additions & 3 deletions docs/genai/04_vector_databases/_category_.json

This file was deleted.