forked from meta-llama/llama-stack
chore: Resync with upstream #1
Merged
…1887)

# What does this PR do?

Move around bits. This makes the copies from llama-models _much_ easier to maintain and ensures we don't entangle meta-reference specific tidbits into llama-models code even by accident.

Also, kills the meta-reference-quantized-gpu distro and rolls quantization deps into meta-reference-gpu.

## Test Plan

```
LLAMA_MODELS_DEBUG=1 \
  with-proxy llama stack run meta-reference-gpu \
  --env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --env INFERENCE_CHECKPOINT_DIR=<DIR> \
  --env MODEL_PARALLEL_SIZE=4 \
  --env QUANTIZATION_TYPE=fp8_mixed
```

Start a server with and without quantization. Point integration tests to it using:

```
pytest -s -v tests/integration/inference/test_text_inference.py \
  --stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
# What does this PR do?

## Test Plan

export MODEL=accounts/fireworks/models/llama4-scout-instruct-basic;
LLAMA_STACK_CONFIG=verification pytest -s -v tests/integration/inference --vision-model $MODEL --text-model $MODEL
…ation (#1870)

# What does this PR do?

This PR updates the [playground RAG example](llama_stack/distribution/ui/page/playground/rag.py) so that the agent is able to use its builtin conversation history.

Here we are using streamlit's `cache_resource` functionality to prevent the agent from re-initializing after every interaction, as well as storing its session_id in the `session_state`. This allows the agent in the RAG example to behave more closely to how it works when using the python-client directly.

Closes #1869

## Test Plan

Without these changes, if you ask the agent "What is 2 + 2?" followed by "What did I just ask?", it will provide an obviously incorrect answer. With these changes, you can ask the same series of questions and it will provide the correct answer.

Signed-off-by: Michael Clifford <[email protected]>
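For context, a minimal sketch of that caching pattern. This is not the PR's actual code: the base URL, model name, and the `Agent`/`create_session` parameters are illustrative assumptions about the python-client API.

```python
import streamlit as st
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent  # client-side agent helper (import path assumed)


@st.cache_resource
def get_agent() -> Agent:
    # cache_resource keeps this object alive across Streamlit reruns, so the agent
    # is built once instead of being re-initialized after every user interaction.
    client = LlamaStackClient(base_url="http://localhost:8321")
    return Agent(client, model="meta-llama/Llama-3.2-3B-Instruct", instructions="You are a helpful RAG assistant.")


agent = get_agent()

if "agent_session_id" not in st.session_state:
    # Create the session once and remember its id; reusing it lets the server-side
    # conversation history accumulate across turns.
    st.session_state["agent_session_id"] = agent.create_session(session_name="rag-playground")

session_id = st.session_state["agent_session_id"]
```

Because both the agent object and the session_id survive Streamlit reruns, a follow-up question such as "What did I just ask?" is answered against the accumulated turn history rather than a fresh session.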
# What does this PR do?

## Test Plan
Update "chat" badge on README to make it more visible for visitors; changing the look from  ... to ... 
Move the test_context.py under the main tests directory, and fix the code.

The problem was that the function captures the initial values of the context variables and then restores those same initial values before each iteration. This means that any modifications made to the context variables during iteration are lost when the next iteration starts.

Error was:

```
====================================================== FAILURES =======================================================
______________________________________ test_preserve_contexts_across_event_loops ______________________________________

    @pytest.mark.asyncio
    async def test_preserve_contexts_across_event_loops():
        """
        Test that context variables are preserved across event loop boundaries with nested generators.
        This simulates the real-world scenario where:
        1. A new event loop is created for each streaming request
        2. The async generator runs inside that loop
        3. There are multiple levels of nested generators
        4. Context needs to be preserved across these boundaries
        """
        # Create context variables
        request_id = ContextVar("request_id", default=None)
        user_id = ContextVar("user_id", default=None)

        # Set initial values

        # Results container to verify values across thread boundaries
        results = []

        # Inner-most generator (level 2)
        async def inner_generator():
            # Should have the context from the outer scope
            yield (1, request_id.get(), user_id.get())

            # Modify one context variable
            user_id.set("user-modified")

            # Should reflect the modification
            yield (2, request_id.get(), user_id.get())

        # Middle generator (level 1)
        async def middle_generator():
            inner_gen = inner_generator()

            # Forward the first yield from inner
            item = await inner_gen.__anext__()
            yield item

            # Forward the second yield from inner
            item = await inner_gen.__anext__()
            yield item

            request_id.set("req-modified")

            # Add our own yield with both modified variables
            yield (3, request_id.get(), user_id.get())

        # Function to run in a separate thread with a new event loop
        def run_in_new_loop():
            # Create a new event loop for this thread
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            try:
                # Outer generator (runs in the new loop)
                async def outer_generator():
                    request_id.set("req-12345")
                    user_id.set("user-6789")

                    # Wrap the middle generator
                    wrapped_gen = preserve_contexts_async_generator(middle_generator(), [request_id, user_id])

                    # Process all items from the middle generator
                    async for item in wrapped_gen:
                        # Store results for verification
                        results.append(item)

                # Run the outer generator in the new loop
                loop.run_until_complete(outer_generator())
            finally:
                loop.close()

        # Run the generator chain in a separate thread with a new event loop
        with ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(run_in_new_loop)
            future.result()  # Wait for completion

        # Verify the results
        assert len(results) == 3

        # First yield should have original values
        assert results[0] == (1, "req-12345", "user-6789")

        # Second yield should have modified user_id
        assert results[1] == (2, "req-12345", "user-modified")

        # Third yield should have both modified values
>       assert results[2] == (3, "req-modified", "user-modified")
E       AssertionError: assert (3, 'req-modified', 'user-6789') == (3, 'req-modified', 'user-modified')
E
E         At index 2 diff: 'user-6789' != 'user-modified'
E
E         Full diff:
E           (
E               3,
E               'req-modified',
E         -     'user-modified',
E         +     'user-6789',
E           )

tests/unit/distribution/test_context.py:155: AssertionError
-------------------------------------------------- Captured log call --------------------------------------------------
ERROR    asyncio:base_events.py:1758 Task was destroyed but it is pending!
         task: <Task pending name='Task-7' coro=<<async_generator_athrow without __name__>()>>
================================================== warnings summary ===================================================
.venv/lib/python3.10/site-packages/pydantic/fields.py:1042
  /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages/pydantic/fields.py:1042: PydanticDeprecatedSince20: Using extra keyword arguments on `Field` is deprecated and will be removed. Use `json_schema_extra` instead. (Extra keys: 'contentEncoding'). Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.10/migration/
    warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================== short test summary info ===============================================
FAILED tests/unit/distribution/test_context.py::test_preserve_contexts_across_event_loops - AssertionError: assert (3, 'req-modified', 'user-6789') == (3, 'req-modified', 'user-modified')
  At index 2 diff: 'user-6789' != 'user-modified'
  Full diff:
    (
      3,
      'req-modified',
  -   'user-modified',
  +   'user-6789',
    )
```

Signed-off-by: Sébastien Han <[email protected]>
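To make the failure mode concrete, here is a minimal hypothetical sketch of the corrected wrapping pattern. It is not the actual `preserve_contexts_async_generator` from llama-stack, only an illustration of re-capturing context values after every yield instead of restoring the initial snapshot:

```python
from contextvars import ContextVar
from typing import AsyncGenerator, List, TypeVar

T = TypeVar("T")


async def preserve_contexts_sketch(
    gen: AsyncGenerator[T, None], context_vars: List[ContextVar]
) -> AsyncGenerator[T, None]:
    # Initial snapshot of the context variables. The bug was restoring this same
    # snapshot before *every* iteration, discarding updates made inside the
    # wrapped generator.
    values = {var: var.get() for var in context_vars}
    while True:
        # Re-apply the most recent values so they survive task/event-loop boundaries.
        for var, value in values.items():
            var.set(value)
        try:
            item = await gen.__anext__()
        except StopAsyncIteration:
            return
        # Refresh the snapshot with whatever the generator just set
        # (e.g. user_id.set("user-modified")), so the next iteration sees it.
        values = {var: var.get() for var in context_vars}
        yield item
```

With that refresh step, the third yield in the test above observes both `req-modified` and `user-modified`.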
# What does this PR do?

## Test Plan

pytest verifications/openai/test_chat_completion.py --provider together
Add content on using an AMD GPU as the vLLM server. Split the original part into two sub-chapters:

1. AMD vLLM server
2. NVIDIA vLLM server (original)

# What does this PR do?

## Test Plan

Signed-off-by: Alex He <[email protected]>
# What does this PR do?

The usage of the tempdir was removed in 094eb6a.

Signed-off-by: Sébastien Han <[email protected]>
# What does this PR do?

Fix hash for the v3.0.1 tag for a GitHub action.

Signed-off-by: Ihar Hrachyshka <[email protected]>
# What does this PR do?

Providers that live outside of the llama-stack codebase are now supported. A new property `external_providers_dir` has been added to the main config and can be configured as follows:

```
external_providers_dir: /etc/llama-stack/providers.d/
```

Where the expected structure is:

```
providers.d/
  inference/
    custom_ollama.yaml
    vllm.yaml
  vector_io/
    qdrant.yaml
```

Where `custom_ollama.yaml` is:

```
adapter:
  adapter_type: custom_ollama
  pip_packages: ["ollama", "aiohttp"]
  config_class: llama_stack_ollama_provider.config.OllamaImplConfig
  module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
```

Obviously the package must be installed on the system; here is the `llama_stack_ollama_provider` example:

```
$ uv pip show llama-stack-ollama-provider
Using Python 3.10.16 environment at: /Users/leseb/Documents/AI/llama-stack/.venv
Name: llama-stack-ollama-provider
Version: 0.1.0
Location: /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages
Editable project location: /private/var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.ZBHU5Ezxg4/ollama/llama-stack-ollama-provider
Requires:
Required-by:
```

Closes: #658

Signed-off-by: Sébastien Han <[email protected]>
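To illustrate the directory layout only, here is a hypothetical sketch of how such a `providers.d/` tree can be scanned. This is not llama-stack's loader; the function name is made up, and it uses nothing beyond the standard library and pyyaml.

```python
# Hypothetical illustration: one sub-directory per API (inference/, vector_io/, ...),
# one YAML spec file per external provider (custom_ollama.yaml, vllm.yaml, ...).
from pathlib import Path

import yaml  # pyyaml


def scan_external_providers(external_providers_dir: str) -> dict[str, dict[str, dict]]:
    specs: dict[str, dict[str, dict]] = {}
    root = Path(external_providers_dir)
    for api_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for spec_file in sorted(api_dir.glob("*.yaml")):
            with spec_file.open() as f:
                spec = yaml.safe_load(f)
            # spec["adapter"]["module"] names the installed package the stack would import.
            specs.setdefault(api_dir.name, {})[spec_file.stem] = spec
    return specs


if __name__ == "__main__":
    print(scan_external_providers("/etc/llama-stack/providers.d/"))
```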
# What does this PR do?

closes #1853

## Test Plan

```
uv run llama stack build --image-type conda --image-name ollama --config llama_stack/templates/ollama/build.yaml

ollama pull llama3.2:3b

LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/integration/inference/test_text_inference.py -v --text-model=llama3.2:3b
```
# What does this PR do?

Download the getting-started-with-ollama model instead of downloading and running it. Directly running it was necessary before #1854.

## Test Plan

Run the code on the page.
# What does this PR do?

Fixes issue #1537 that causes "500 Internal Server Error" when unregistering a toolgroup. (Closes #1537)

## Test Plan

```console
$ pytest -s -v tests/integration/tool_runtime/test_registration.py --stack-config=ollama --env INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
INFO     2025-03-14 21:15:03,999 tests.integration.conftest:41 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS
/opt/homebrew/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
===================================================== test session starts =====================================================
platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /opt/homebrew/opt/[email protected]/bin/python3.10
cachedir: .pytest_cache
rootdir: /Users/paolo/Projects/aiplatform/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.25.3, anyio-4.8.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 1 item

tests/integration/tool_runtime/test_registration.py::test_register_and_unregister_toolgroup[None-None-None-None-None]
INFO     2025-03-14 21:15:04,478 llama_stack.providers.remote.inference.ollama.ollama:75 inference: checking connectivity to Ollama at `http://localhost:11434`...
INFO     2025-03-14 21:15:05,350 llama_stack.providers.remote.inference.ollama.ollama:294 inference: Pulling embedding model `all-minilm:latest` if necessary...
INFO:     Started server process [78391]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:57424 - "GET /sse HTTP/1.1" 200 OK
INFO:     127.0.0.1:57434 - "GET /sse HTTP/1.1" 200 OK
INFO     2025-03-14 21:15:16,129 mcp.client.sse:51 uncategorized: Connecting to SSE endpoint: http://localhost:8000/sse
INFO:     127.0.0.1:57445 - "GET /sse HTTP/1.1" 200 OK
INFO     2025-03-14 21:15:16,146 mcp.client.sse:71 uncategorized: Received endpoint URL: http://localhost:8000/messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b
INFO     2025-03-14 21:15:16,147 mcp.client.sse:140 uncategorized: Starting post writer with endpoint URL: http://localhost:8000/messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b
INFO:     127.0.0.1:57447 - "POST /messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b HTTP/1.1" 202 Accepted
INFO:     127.0.0.1:57447 - "POST /messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b HTTP/1.1" 202 Accepted
INFO:     127.0.0.1:57447 - "POST /messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b HTTP/1.1" 202 Accepted
INFO     2025-03-14 21:15:16,155 mcp.server.lowlevel.server:535 uncategorized: Processing request of type ListToolsRequest
PASSED

=============================================== 1 passed, 4 warnings in 12.17s ================================================
```

Signed-off-by: Paolo Dettori <[email protected]>
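For reference, a rough sketch of the register/unregister flow that this fix makes work end to end. The keyword arguments and the `mcp_endpoint` shape are assumptions about the client API at the time, not code from this PR.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a toolgroup backed by an MCP server (endpoint shape assumed for illustration).
client.toolgroups.register(
    toolgroup_id="mcp::my_tools",
    provider_id="model-context-protocol",
    mcp_endpoint={"uri": "http://localhost:8000/sse"},
)

# The fix is what lets this unregister call return cleanly instead of a 500 Internal Server Error.
client.toolgroups.unregister(toolgroup_id="mcp::my_tools")
```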
**What does this PR do?**

This PR fixes a build issue with the Containerfile caused by the missing requirement `llama-stack`. It updates the Containerfile to include the necessary requirements and upgrades the Python version to ensure successful builds.

**Test Plan**

The updated Containerfile has been tested, and the build now completes successfully with the required dependencies included.
# What does this PR do?

This PR adds an additional page to the playground called "Tools". This page connects to a llama-stack server and lists all the available LLM models, builtin tools, and MCP tools in the sidebar. Users can select whatever combination of model and tools they want from the sidebar for their agent. Once the selections are made, users can chat with their agent similarly to the RAG page and test out agent tool use.

closes #1902

## Test Plan

Ran the following commands with a llama-stack server and the updated playground worked as expected.

```
export LLAMA_STACK_ENDPOINT="http://localhost:8321"
streamlit run llama_stack/distribution/ui/app.py
```

Signed-off-by: Michael Clifford <[email protected]>
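A rough sketch of the sidebar idea, not the PR's code: list models and tool groups from a running llama-stack server and let the user pick a combination for the agent. The `identifier` attribute and the `toolgroups.list()` call are assumptions about the client API.

```python
import os

import streamlit as st
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=os.environ.get("LLAMA_STACK_ENDPOINT", "http://localhost:8321"))

model_ids = [m.identifier for m in client.models.list()]          # available models
toolgroup_ids = [t.identifier for t in client.toolgroups.list()]  # builtin + MCP tool groups

with st.sidebar:
    selected_model = st.selectbox("Model", model_ids)
    selected_tools = st.multiselect("Tool groups", toolgroup_ids)

st.write(f"Agent will use {selected_model} with tools: {selected_tools}")
```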
# What does this PR do?

These are missing, and changelog doc automation is not working yet due to missing permissions for GitHub Actions: https://dev.to/suzuki0430/how-to-enable-the-allow-github-actions-to-create-and-approve-pull-requests-option-when-its-grayed-out-3e1i

Signed-off-by: Yuan Tang <[email protected]>
# What does this PR do? This PR adds the "TAVILY_SEARCH_API_KEY" option to the playground to enable the use of the websearch tool. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` export TAVILY_SEARCH_API_KEY=*** streamlit run llama_stack/distribution/ui/app.py ``` Without this change the builtin websearch tool will fail due to missing API key. [//]: # (## Documentation) Related to #1902 Signed-off-by: Michael Clifford <[email protected]>