Skip to content

chore: Resync with upstream #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Apr 9, 2025
Merged

Conversation

leseb
Copy link
Collaborator

@leseb leseb commented Apr 9, 2025

No description provided.

ashwinb and others added 13 commits April 7, 2025 15:03
…1887)

# What does this PR do?

Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference specific
tidbits into llama-models code even by accident.

Also, kills the meta-reference-quantized-gpu distro and rolls
quantization deps into meta-reference-gpu.

## Test Plan

```
LLAMA_MODELS_DEBUG=1 \
  with-proxy llama stack run meta-reference-gpu \
  --env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
   --env INFERENCE_CHECKPOINT_DIR=<DIR> \
   --env MODEL_PARALLEL_SIZE=4 \
   --env QUANTIZATION_TYPE=fp8_mixed
```

Start a server with and without quantization. Point integration tests to
it using:

```
pytest -s -v  tests/integration/inference/test_text_inference.py \
   --stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
# What does this PR do?


## Test Plan
export MODEL=accounts/fireworks/models/llama4-scout-instruct-basic;
LLAMA_STACK_CONFIG=verification pytest -s -v tests/integration/inference
--vision-model $MODEL --text-model $MODEL
…ation (#1870)

# What does this PR do?

This PR updates the [playground RAG
example](llama_stack/distribution/ui/page/playground/rag.py) so that the
agent is able to use its builtin conversation history. Here we are using
streamlit's `cache_resource` functionality to prevent the agent from
re-initializing after every interaction as well as storing its
session_id in the `session_state`. This allows the agent in the RAG
example to behave more closely to how it works using the python-client
directly.

[//]: # (If resolving an issue, uncomment and update the line below)
Closes #1869 

## Test Plan

Without these changes, if you ask it "What is 2 + 2"? followed by the
question "What did I just ask?" It will provide an obviously incorrect
answer.

With these changes, you can ask the same series of questions and it will
provide the correct answer.

[//]: # (## Documentation)

Signed-off-by: Michael Clifford <[email protected]>
# What does this PR do?


## Test Plan
Move the test_context.py under the main tests directory, and fix the
code.

The problem was that the function captures the initial values of the
context variables and then restores those same initial values before
each iteration. This means that any modifications made to the context
variables during iteration are lost when the next iteration starts.

Error was:

```
====================================================== FAILURES =======================================================
______________________________________ test_preserve_contexts_across_event_loops ______________________________________

    @pytest.mark.asyncio
    async def test_preserve_contexts_across_event_loops():
        """
        Test that context variables are preserved across event loop boundaries with nested generators.
        This simulates the real-world scenario where:
        1. A new event loop is created for each streaming request
        2. The async generator runs inside that loop
        3. There are multiple levels of nested generators
        4. Context needs to be preserved across these boundaries
        """
        # Create context variables
        request_id = ContextVar("request_id", default=None)
        user_id = ContextVar("user_id", default=None)

        # Set initial values

        # Results container to verify values across thread boundaries
        results = []

        # Inner-most generator (level 2)
        async def inner_generator():
            # Should have the context from the outer scope
            yield (1, request_id.get(), user_id.get())

            # Modify one context variable
            user_id.set("user-modified")

            # Should reflect the modification
            yield (2, request_id.get(), user_id.get())

        # Middle generator (level 1)
        async def middle_generator():
            inner_gen = inner_generator()

            # Forward the first yield from inner
            item = await inner_gen.__anext__()
            yield item

            # Forward the second yield from inner
            item = await inner_gen.__anext__()
            yield item

            request_id.set("req-modified")

            # Add our own yield with both modified variables
            yield (3, request_id.get(), user_id.get())

        # Function to run in a separate thread with a new event loop
        def run_in_new_loop():
            # Create a new event loop for this thread
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)

            try:
                # Outer generator (runs in the new loop)
                async def outer_generator():
                    request_id.set("req-12345")
                    user_id.set("user-6789")
                    # Wrap the middle generator
                    wrapped_gen = preserve_contexts_async_generator(middle_generator(), [request_id, user_id])

                    # Process all items from the middle generator
                    async for item in wrapped_gen:
                        # Store results for verification
                        results.append(item)

                # Run the outer generator in the new loop
                loop.run_until_complete(outer_generator())
            finally:
                loop.close()

        # Run the generator chain in a separate thread with a new event loop
        with ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(run_in_new_loop)
            future.result()  # Wait for completion

        # Verify the results
        assert len(results) == 3

        # First yield should have original values
        assert results[0] == (1, "req-12345", "user-6789")

        # Second yield should have modified user_id
        assert results[1] == (2, "req-12345", "user-modified")

        # Third yield should have both modified values
>       assert results[2] == (3, "req-modified", "user-modified")
E       AssertionError: assert (3, 'req-modified', 'user-6789') == (3, 'req-modified', 'user-modified')
E
E         At index 2 diff: 'user-6789' != 'user-modified'
E
E         Full diff:
E           (
E               3,
E               'req-modified',
E         -     'user-modified',
E         +     'user-6789',
E           )

tests/unit/distribution/test_context.py:155: AssertionError
-------------------------------------------------- Captured log call --------------------------------------------------
ERROR    asyncio:base_events.py:1758 Task was destroyed but it is pending!
task: <Task pending name='Task-7' coro=<<async_generator_athrow without __name__>()>>
================================================== warnings summary ===================================================
.venv/lib/python3.10/site-packages/pydantic/fields.py:1042
  /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages/pydantic/fields.py:1042: PydanticDeprecatedSince20: Using extra keyword arguments on `Field` is deprecated and will be removed. Use `json_schema_extra` instead. (Extra keys: 'contentEncoding'). Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.10/migration/
    warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================== short test summary info ===============================================
FAILED tests/unit/distribution/test_context.py::test_preserve_contexts_across_event_loops - AssertionError: assert (3, 'req-modified', 'user-6789') == (3, 'req-modified', 'user-modified')

  At index 2 diff: 'user-6789' != 'user-modified'

  Full diff:
    (
        3,
        'req-modified',
  -     'user-modified',
  +     'user-6789',
    )
```

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <[email protected]>
# What does this PR do?


## Test Plan
pytest verifications/openai/test_chat_completion.py --provider together
Add the content to use AMD GPU as the vLLM server. Split the original
part to two sub chapters,
1. AMD vLLM server
2. NVIDIA vLLM server (orignal)

# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: Alex He <[email protected]>
# What does this PR do?

The usage of the tempdir was removed in
094eb6a.

Signed-off-by: Sébastien Han <[email protected]>
# What does this PR do?

Fix hash for v3.0.1 tag for a github action.

Signed-off-by: Ihar Hrachyshka <[email protected]>
# What does this PR do?

Providers that live outside of the llama-stack codebase are now
supported.
A new property `external_providers_dir` has been added to the main
config and can be configured as follow:

```
external_providers_dir: /etc/llama-stack/providers.d/
```

Where the expected structure is:

```
providers.d/
  inference/
    custom_ollama.yaml
    vllm.yaml
  vector_io/
    qdrant.yaml
```

Where `custom_ollama.yaml` is:

```
adapter:
  adapter_type: custom_ollama
  pip_packages: ["ollama", "aiohttp"]
  config_class: llama_stack_ollama_provider.config.OllamaImplConfig
  module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
```

Obviously the package must be installed on the system, here is the
`llama_stack_ollama_provider` example:

```
$ uv pip show llama-stack-ollama-provider
Using Python 3.10.16 environment at: /Users/leseb/Documents/AI/llama-stack/.venv
Name: llama-stack-ollama-provider
Version: 0.1.0
Location: /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages
Editable project location: /private/var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.ZBHU5Ezxg4/ollama/llama-stack-ollama-provider
Requires:
Required-by:
```

Closes: #658

Signed-off-by: Sébastien Han <[email protected]>
@leseb leseb changed the title Resync with upstream chore: Resync with upstream Apr 9, 2025
mattf and others added 7 commits April 9, 2025 10:34
# What does this PR do?

closes #1853 

## Test Plan
```
uv run llama stack build --image-type conda --image-name ollama --config llama_stack/templates/ollama/build.yaml

ollama pull llama3.2:3b

LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/integration/inference/test_text_inference.py -v --text-model=llama3.2:3b
```
# What does this PR do?

download the getting started w/ ollama model instead of downloading and
running it.

directly running it was necessary before
#1854

## Test Plan

run the code on the page
# What does this PR do?
Fixes issue #1537 that causes "500 Internal Server Error" when
unregistering a toolgroup

# (Closes #1537 )

## Test Plan

```console
$ pytest -s -v tests/integration/tool_runtime/test_registration.py --stack-config=ollama --env INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
INFO     2025-03-14 21:15:03,999 tests.integration.conftest:41 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS          
/opt/homebrew/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
===================================================== test session starts =====================================================
platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /opt/homebrew/opt/[email protected]/bin/python3.10
cachedir: .pytest_cache
rootdir: /Users/paolo/Projects/aiplatform/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.25.3, anyio-4.8.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 1 item                                                                                                              

tests/integration/tool_runtime/test_registration.py::test_register_and_unregister_toolgroup[None-None-None-None-None] INFO     2025-03-14 21:15:04,478 llama_stack.providers.remote.inference.ollama.ollama:75 inference: checking            
         connectivity to Ollama at `http://localhost:11434`...                                                          
INFO     2025-03-14 21:15:05,350 llama_stack.providers.remote.inference.ollama.ollama:294 inference: Pulling embedding  
         model `all-minilm:latest` if necessary...                                                                      
INFO:     Started server process [78391]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:57424 - "GET /sse HTTP/1.1" 200 OK
INFO:     127.0.0.1:57434 - "GET /sse HTTP/1.1" 200 OK
INFO     2025-03-14 21:15:16,129 mcp.client.sse:51 uncategorized: Connecting to SSE endpoint: http://localhost:8000/sse 
INFO:     127.0.0.1:57445 - "GET /sse HTTP/1.1" 200 OK
INFO     2025-03-14 21:15:16,146 mcp.client.sse:71 uncategorized: Received endpoint URL:                                
         http://localhost:8000/messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b                                    
INFO     2025-03-14 21:15:16,147 mcp.client.sse:140 uncategorized: Starting post writer with endpoint URL:              
         http://localhost:8000/messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b                                    
INFO:     127.0.0.1:57447 - "POST /messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b HTTP/1.1" 202 Accepted
INFO:     127.0.0.1:57447 - "POST /messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b HTTP/1.1" 202 Accepted
INFO:     127.0.0.1:57447 - "POST /messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b HTTP/1.1" 202 Accepted
INFO     2025-03-14 21:15:16,155 mcp.server.lowlevel.server:535 uncategorized: Processing request of type               
         ListToolsRequest                                                                                               
PASSED

=============================================== 1 passed, 4 warnings in 12.17s ================================================
```

---------

Signed-off-by: Paolo Dettori <[email protected]>
**What does this PR do?**

This PR fixes a build issue with the Containerfile caused by missing
requirement `llama-stack`. It updates the Containerfile to include the
necessary requirements and upgrades the Python version to ensure
successful builds.

**Test Plan**
The updated Containerfile has been tested, and the build now completes
successfully with the required dependencies included.
# What does this PR do?

This PR adds an additional page to the playground called "Tools". This
page connects to a llama-stack server and lists all the available LLM
models, builtin tools and MCP tools in the sidebar. Users can select
whatever combination of model and tools they want from the sidebar for
their agent. Once the selections are made, users can chat with their
agent similarly to the RAG page and test out agent tool use.

closes #1902 

## Test Plan

Ran the following commands with a llama-stack server and the updated
playground worked as expected.
```
export LLAMA_STACK_ENDPOINT="http://localhost:8321"     
streamlit run  llama_stack/distribution/ui/app.py
```

[//]: # (## Documentation)

Signed-off-by: Michael Clifford <[email protected]>
# What does this PR do?

These are missing and changelog doc automation is not working yet due to
missing permissions for GitHub Actions:
https://dev.to/suzuki0430/how-to-enable-the-allow-github-actions-to-create-and-approve-pull-requests-option-when-its-grayed-out-3e1i

---------

Signed-off-by: Yuan Tang <[email protected]>
# What does this PR do?
This PR adds the "TAVILY_SEARCH_API_KEY" option to the playground to
enable the use of the websearch tool.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
export TAVILY_SEARCH_API_KEY=***
streamlit run  llama_stack/distribution/ui/app.py      
```
Without this change the builtin websearch tool will fail due to missing
API key.


[//]: # (## Documentation)
Related to #1902

Signed-off-by: Michael Clifford <[email protected]>
@VaishnaviHire VaishnaviHire merged commit 8353414 into opendatahub-io:main Apr 9, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.