Allow multiple OpenAI clients per Pipeline #563
base: main
Conversation
This change allows a user to construct a PipelineContext with multiple OpenAI clients, such as:

```python
PipelineContext(
    clients={
        "default": OpenAI(base_url="https://foo.local"),
        "server_a": OpenAI(base_url="https://server_a.local"),
        "server_b": OpenAI(base_url="https://server_b.local"),
    }
)
```

And then, within the pipeline YAML, choose which client to apply to which LLMBlock via a new `client` key, such as:

```yaml
version: "1.0"
blocks:
  - name: server_a_client
    type: LLMBlock
    config:
      client: server_a
      ...
  - name: server_b_client
    type: LLMBlock
    config:
      client: server_b
      ...
```

See `docs/examples/multiple_llm_clients` for more details and a full example.

Resolves instructlab#521

Signed-off-by: Ben Browning <[email protected]>
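Putting the two halves together, a hedged end-to-end sketch of how the context and the YAML might be wired up (the `Pipeline.from_file` call and the import path are assumptions about the surrounding sdg API, not verbatim from this PR):

```python
from openai import OpenAI
from instructlab.sdg.pipeline import Pipeline, PipelineContext

ctx = PipelineContext(
    clients={
        "default": OpenAI(base_url="https://foo.local"),
        "server_a": OpenAI(base_url="https://server_a.local"),
        "server_b": OpenAI(base_url="https://server_b.local"),
    }
)

# Each LLMBlock with a `client` key in its config resolves that name
# against ctx.clients; blocks without one fall back to "default".
pipeline = Pipeline.from_file(ctx, "pipeline.yaml")
```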
Force-pushed from 4b0c0fb to 8ffa923
That e2e-small failure is due to instructlab/instructlab#3155 and unrelated to this change. We won't be able to get our CI green again until instructlab/instructlab#3167 merges.
```diff
@@ -70,6 +73,9 @@ class PipelineContext:  # pylint: disable=too-many-instance-attributes
     max_num_tokens: Optional[int] = llmblock.DEFAULT_MAX_NUM_TOKENS
     batch_size: int = DEFAULT_BATCH_SIZE
     batch_num_workers: Optional[int] = None
+    clients: Optional[Dict[str, OpenAI]] = None
```
Great changes and excellent thoughts for supporting backwards compatibility. One small nit:

Here, it seems that there's no mandate in `PipelineContext` that a `default` key be present in the dictionary passed to the `clients` property. But subsequently, when a Block is initialized with the Pipeline created from this PipelineContext, it could potentially throw an error if the block config does not explicitly specify a client key (it will expect `default` to exist), if my understanding is correct.

So maybe we should have a check, either in `PipelineContext` or in `LLMBlock`, to explicitly require a `default` key/value for robustness?
```python
# Only set _clients if passed in a value, so we don't
# override it with the default of None from the @dataclass
self._clients = value
```
Something like this could be added to make sure the `default` key is explicitly present, so the blocks know how to behave on the fallback path(?). Just a suggestion; please feel free to disregard or consider alternate implementations.
```python
def __post_init__(self):
    if self.clients is not None and "default" not in self.clients:
        raise ValueError("PipelineContext requires a 'default' client in the clients dictionary")
```
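With a check like this, a context built as `PipelineContext(clients={"server_a": OpenAI(...)})` would fail fast at construction time rather than surfacing later as a missing-`default` lookup inside a block.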
```python
assert block.client == client_default

# restore state for future tests
self.ctx.clients = old_clients
```
How come we are relying on the inner logic of a particular test to preserve the integrity of other tests in this suite? Maybe I'm misunderstanding something, but it seems like this violates the encapsulation property that we would want this test to have.
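A minimal sketch of the isolation this comment is asking for, using unittest's setUp/tearDown hooks instead of in-test cleanup (the `make_test_context` helper is hypothetical, standing in for however this suite builds its shared context):

```python
import unittest

class TestPipelineContextClients(unittest.TestCase):
    def setUp(self):
        # Snapshot the shared state before every test...
        self.ctx = make_test_context()  # hypothetical helper
        self._saved_clients = self.ctx.clients

    def tearDown(self):
        # ...and restore it afterwards, no matter how the test exits,
        # instead of relying on cleanup code inside one test body.
        self.ctx.clients = self._saved_clients
```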
```diff
@@ -78,6 +84,33 @@ def batching_enabled(self) -> bool:
         """
         return self.batch_size > 0 and self.batch_num_workers != 1

+    @property  # type: ignore
+    def client(self):
+        return self.clients.get(self.DEFAULT_CLIENT_KEY, None)
```
A few questions:

- Should we also handle the case where `self.client = None` but `self.clients` contains a single client not set to the default client? It seems like in this case, we could perhaps make the assumption that this is the default client. Otherwise, if there are multiple clients then it would become ambiguous.
- Why do we specify `None` as the explicit default here? Won't this be the default of the `.get` method?
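For the first question, a rough sketch of the single-client fallback being suggested (illustrative only; it reuses the names from the diff above and would slot into the same class):

```python
@property
def client(self):
    if not self.clients:
        return None
    # With no explicit "default" entry, a lone client is unambiguous,
    # so treat it as the default; multiple clients stay ambiguous.
    if self.DEFAULT_CLIENT_KEY not in self.clients and len(self.clients) == 1:
        return next(iter(self.clients.values()))
    return self.clients.get(self.DEFAULT_CLIENT_KEY)
```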
```diff
@@ -63,6 +63,15 @@ def template_from_struct_and_config(struct, config):
     return PromptRegistry.template_from_string(struct.format(**filtered_config))


+def _resolve_client(client_name, context, block):
```
Should we include type-hints here?
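For instance, something along these lines; the parameter and return types are guesses from the names in the diff, not confirmed by this PR:

```python
from typing import Optional
from openai import OpenAI

def _resolve_client(
    client_name: Optional[str],
    context: "PipelineContext",
    block: "Block",
) -> Optional[OpenAI]:
    ...
```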
A few questions, but a good change overall. Thank you for adding this @bbrowning