Skip to content

Conversation

@rcano-baseten
Copy link
Contributor

@rcano-baseten rcano-baseten commented Nov 26, 2025

🚀 What

Supports training checkpoints in truss transfer as a source

  • Creates a new Model Source
  • Adds conditional logic to the AWSProvider to encode the base secret path and use a session token if present

💻 How

🔬 Testing

@michaelfeil
Copy link
Contributor

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 11 to +15
S3,
#[serde(rename = "azure")]
Azure,
#[serde(rename = "baseten_training")]
BasetenTraining,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Parse baseten_training repos from Python bindings

A new ResolutionType::BasetenTraining variant was introduced, but the Python TryFrom<PyModelRepo> matcher in bindings.rs still only accepts http, gcs, s3, or azure. Passing kind="baseten_training" to create_basetenpointer_from_models will therefore still raise Unknown kind and prevent creating manifests for training checkpoints from Python callers. Please extend the binding match to include the new variant.

Useful? React with 👍 / 👎.

Comment on lines 115 to +118
&destination,
pointer.size,
&pointer.runtime_secret_name,
false,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Training checkpoints never read training secret path on download

The S3 download path now takes a use_training_secrets flag, but download_file_with_cache always passes false (lines 114–118), so download_s3_to_path will read credentials from /secrets even for manifests generated from training buckets whose AWS keys live under /aws-secrets. Downloading a Baseten training checkpoint will therefore fail with "Failed to read AWS credentials from not existing file" unless the secrets are duplicated in /secrets. The download branch needs to propagate the training flag when the pointer originates from a training source.

Useful? React with 👍 / 👎.

@michaelfeil
Copy link
Contributor

@rcano-baseten just on mobile, but a bit confused. Isn’t all you want a Union of secret \ access key?

@michaelfeil
Copy link
Contributor

@rcano-baseten I think its complicated, and does not even stop there. Wiring tough training_secret_bool: true/false seems to complicated, and creates weird side effects, like you need to have s3 to use training, and access key is allowed.

I opened a separate PR that just generally allows secrets to be named e.g. aws_token aka /secrets/aws_token or /aws-secrets/aws_token. This is actually currently possible as well / today.

Additionally, I noticed s3 bucket always requires a region, otherwise it will default to us-east. I opened a small pr here:

#2050

# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 4

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not be updated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants