Add media constructors: baml.audio, baml.image, baml.pdf, baml.video #3170
…yground renderers Co-authored-by: Cursor <[email protected]>
- Add builtin modules audio.baml, image.baml, pdf.baml, video.baml with from_url, from_base64, from_file (thin wrappers around baml.Media.from_*).
- Extend builtins macros (parse, collect, codegen_native, util) and path_resolve, TIR builtins/normalize, MIR lower for new types.
- Register new builtins in baml_builtins lib and bex_vm native.
- Include new baml_std test snapshots and update existing snapshots.
Co-authored-by: Cursor <[email protected]>
📝 Walkthrough
This PR introduces media type constructors for audio, image, PDF, and video as BAML builtin wrappers around the existing Media.from_* functions. It extends the builtin type system with DSL-aware type patterns (StringLiteral, Union), updates the builtin macro infrastructure to support these new patterns, adds path resolution for nested BAML namespaces, and implements native VM constructors for media creation from URLs, base64, and files.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Merging this PR will degrade performance by 13.12%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | WallTime | bench_parse_only_simple | 70.1 µs | 80.7 µs | -13.12% |
| ❌ | WallTime | bench_lexer_only_simple | 44.8 µs | 50.2 µs | -10.81% |
Comparing hellovai/media-constructors (274f0b5) with canary (7c71c3e)
Footnotes
- 91 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.
Binary size checks passed: ✅ 7 passed
    baml.Media.from_url("pdf", url, mime_type)
}

function from_base64(base64: string, mime_type: string) -> pdf {
No need for mime type for pdf
// Infer receiver type (could be single var or nested field access)
let receiver_ty = if receiver_segments.len() == 1 {
    // Simple receiver: `baz.method()`
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
baml_language/crates/baml_compiler_tir/src/lib.rs (1)
2104-2213: ⚠️ Potential issue | 🟠 Major
Short-path builtin lookup runs before local receiver resolution.
Trying baml.{full_path} before receiver-chain handling can misresolve local calls like image.from_url(...) as global builtins.
💡 Proposed fix
 Expr::Path(segments) if segments.len() >= 2 => {
     let full_path = segments
         .iter()
         .map(smol_str::SmolStr::as_str)
         .collect::<Vec<_>>()
         .join(".");
+    let first_segment_is_local = ctx.lookup(&segments[0]).is_some();
+
-    let prefixed_path = if !full_path.starts_with("baml.") {
+    let prefixed_path = if !first_segment_is_local && !full_path.starts_with("baml.") {
         Some(format!("baml.{full_path}"))
     } else {
         None
     };
-    if let Some(def) = builtins::lookup_builtin_by_path(&full_path).or_else(|| {
+    if !first_segment_is_local && let Some(def) = builtins::lookup_builtin_by_path(&full_path).or_else(|| {
         prefixed_path
             .as_deref()
             .and_then(builtins::lookup_builtin_by_path)
     }) {
         // Builtin function via Path
         ...
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (18)
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__audio.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__image.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__pdf.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____01_lexer__video.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__audio.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__image.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__pdf.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____02_parser__video.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____10_formatter__audio.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____10_formatter__image.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____10_formatter__pdf.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____10_formatter__video.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/src/snapshots/baml_tests__codegen__tests__bytecode_display_expanded.snap is excluded by !**/*.snap
- baml_language/crates/baml_tests/src/snapshots/baml_tests__codegen__tests__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap
📒 Files selected for processing (17)
- baml_language/crates/baml_builtins/baml/audio.baml
- baml_language/crates/baml_builtins/baml/image.baml
- baml_language/crates/baml_builtins/baml/pdf.baml
- baml_language/crates/baml_builtins/baml/video.baml
- baml_language/crates/baml_builtins/src/lib.rs
- baml_language/crates/baml_builtins_macros/src/codegen_native.rs
- baml_language/crates/baml_builtins_macros/src/collect.rs
- baml_language/crates/baml_builtins_macros/src/parse.rs
- baml_language/crates/baml_builtins_macros/src/util.rs
- baml_language/crates/baml_compiler_hir/src/path_resolve.rs
- baml_language/crates/baml_compiler_mir/src/lower.rs
- baml_language/crates/baml_compiler_tir/src/builtins.rs
- baml_language/crates/baml_compiler_tir/src/lib.rs
- baml_language/crates/baml_compiler_tir/src/normalize.rs
- baml_language/crates/baml_tests/build.rs
- baml_language/crates/bex_vm/src/native.rs
- baml_language/crates/bridge_ctypes/types/baml/cffi/v1/baml_inbound.proto
💤 Files with no reviewable changes (1)
- baml_language/crates/baml_compiler_mir/src/lower.rs
#[derive(Clone)]
pub(crate) enum DslType {
    /// A standard Rust type parsed by `syn`.
    Syn(Type),
    /// A string literal type (e.g., `"image"`).
    StringLiteral(String),
    /// A union of types (e.g., `"image" | "video" | "audio"`).
    Union(Vec<DslType>),
}
🛠️ Refactor suggestion | 🟠 Major
DslType is missing a Debug derive.
The enum is used as a proc-macro intermediate type. Without Debug, any eprintln!("{:?}", dsl_ty) or syn Error::new_spanned(…) reflection that touches DslType will fail to compile, making debugging proc-macro panics harder.
✏️ Proposed fix
-#[derive(Clone)]
+#[derive(Clone, Debug)]
 pub(crate) enum DslType {
pub(crate) fn dsl_type_to_pattern(
    dsl_ty: &DslType,
    generic_params: &[String],
    builtin_types: &HashMap<String, String>,
    builtin_enums: &HashMap<String, String>,
) -> TokenStream2 {
    match dsl_ty {
        DslType::Syn(ty) => type_to_pattern(ty, generic_params, builtin_types, builtin_enums),
        DslType::StringLiteral(s) => {
            quote!(TypePattern::StringLiteral(#s))
        }
        DslType::Union(variants) => {
            let patterns: Vec<TokenStream2> = variants
                .iter()
                .map(|v| dsl_type_to_pattern(v, generic_params, builtin_types, builtin_enums))
                .collect();
            quote!(TypePattern::Union(vec![#(#patterns),*]))
        }
    }
}

/// Get the simple type name from a `DslType` (for native fn generation).
pub(crate) fn dsl_type_to_simple_name(dsl_ty: &DslType) -> String {
    match dsl_ty {
        DslType::Syn(ty) => type_to_simple_name(ty),
        DslType::StringLiteral(_) => "String".to_string(),
        DslType::Union(variants) => {
            if variants
                .iter()
                .all(|v| matches!(v, DslType::StringLiteral(_)))
            {
                "String".to_string()
            } else {
                dsl_type_to_simple_name(variants.first().expect("empty union"))
            }
        }
    }
}

/// Check if a `DslType` is a generic type parameter.
pub(crate) fn dsl_is_generic_type(dsl_ty: &DslType, generic_params: &[String]) -> bool {
    match dsl_ty {
        DslType::Syn(ty) => is_generic_type(ty, generic_params),
        DslType::StringLiteral(_) | DslType::Union(_) => false,
    }
}

/// Check if a `DslType` is `Result<T>` and return the inner type if so.
///
/// String literal and union types are never `Result`.
/// Returns owned values since the inner type may need wrapping.
pub(crate) fn dsl_unwrap_result_type(dsl_ty: &DslType) -> (DslType, bool) {
    match dsl_ty {
        DslType::Syn(ty) => {
            let (inner, is_result) = unwrap_result_type(ty);
            if is_result {
                (DslType::Syn(inner.clone()), true)
            } else {
                (dsl_ty.clone(), false)
            }
        }
        DslType::StringLiteral(_) | DslType::Union(_) => (dsl_ty.clone(), false),
    }
No unit tests for the four new DSL helpers.
dsl_type_to_pattern, dsl_type_to_simple_name, dsl_is_generic_type, and dsl_unwrap_result_type are non-trivial dispatch functions with branching logic for StringLiteral, Union, and Syn variants. None have unit tests.
As per coding guidelines, prefer Rust unit tests. Please add #[cfg(test)] coverage — at minimum for the StringLiteral and Union branches that are not exercised by the existing type_* tests.
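A minimal sketch of what such coverage could look like. It is not the crate's actual test module: the `DslType` below is a stand-in whose `Syn` variant holds a plain `String` instead of `syn::Type`, so the example compiles without the proc-macro dependencies, and the two helpers only mirror the branching shown in the diff.

```rust
// Stand-in for the proc-macro's DslType. ASSUMPTION: the real `Syn` variant
// wraps `syn::Type`; a String is used here to keep the sketch dependency-free.
#[derive(Clone, Debug, PartialEq)]
enum DslType {
    Syn(String),
    StringLiteral(String),
    Union(Vec<DslType>),
}

// Mirrors the branching of `dsl_type_to_simple_name` from the diff.
fn dsl_type_to_simple_name(ty: &DslType) -> String {
    match ty {
        DslType::Syn(name) => name.clone(),
        DslType::StringLiteral(_) => "String".to_string(),
        DslType::Union(variants) => {
            if variants.iter().all(|v| matches!(v, DslType::StringLiteral(_))) {
                "String".to_string()
            } else {
                dsl_type_to_simple_name(variants.first().expect("empty union"))
            }
        }
    }
}

// Mirrors `dsl_is_generic_type`: literals and unions are never generic.
fn dsl_is_generic_type(ty: &DslType, generic_params: &[String]) -> bool {
    match ty {
        DslType::Syn(name) => generic_params.iter().any(|p| p == name),
        DslType::StringLiteral(_) | DslType::Union(_) => false,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn lit(s: &str) -> DslType {
        DslType::StringLiteral(s.to_string())
    }

    #[test]
    fn string_literal_maps_to_string() {
        assert_eq!(dsl_type_to_simple_name(&lit("image")), "String");
    }

    #[test]
    fn all_literal_union_maps_to_string() {
        let u = DslType::Union(vec![lit("image"), lit("video"), lit("audio")]);
        assert_eq!(dsl_type_to_simple_name(&u), "String");
    }

    #[test]
    fn mixed_union_uses_first_variant() {
        // Documents the order-dependent behaviour rather than endorsing it.
        let syn_first = DslType::Union(vec![DslType::Syn("i64".to_string()), lit("audio")]);
        let lit_first = DslType::Union(vec![lit("audio"), DslType::Syn("i64".to_string())]);
        assert_eq!(dsl_type_to_simple_name(&syn_first), "i64");
        assert_eq!(dsl_type_to_simple_name(&lit_first), "String");
    }

    #[test]
    fn literals_and_unions_are_not_generic() {
        let params = vec!["T".to_string()];
        assert!(!dsl_is_generic_type(&lit("T"), &params));
        assert!(!dsl_is_generic_type(&DslType::Union(vec![lit("T")]), &params));
        assert!(dsl_is_generic_type(&DslType::Syn("T".to_string()), &params));
    }
}
```

In the real crate the `Syn` cases would presumably be built with `syn::parse_quote!`, as the existing `type_*` tests are said to do.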
        DslType::Union(variants) => {
            if variants
                .iter()
                .all(|v| matches!(v, DslType::StringLiteral(_)))
            {
                "String".to_string()
            } else {
                dsl_type_to_simple_name(variants.first().expect("empty union"))
            }
        }
    }
dsl_type_to_simple_name for mixed Union is order-dependent and fragile.
For a Union where not all variants are StringLiteral, the function silently returns the simple name of the first variant. This means "audio" | String and String | "audio" produce different names. Since this name drives native codegen (argument extraction/conversion), a reordering of union variants in a DSL declaration could silently change generated code.
Additionally, an empty DslType::Union(vec![]) never reaches variants.first().expect("empty union"): Iterator::all returns true for an empty iterator, so an empty union silently maps to "String" instead of being rejected.
Consider either: (a) asserting the invariant at construction time in parse.rs, or (b) making the mixed-Union case an explicit compile-time error here.
🛡️ Suggested hardening
 DslType::Union(variants) => {
     if variants
         .iter()
         .all(|v| matches!(v, DslType::StringLiteral(_)))
     {
         "String".to_string()
     } else {
-        dsl_type_to_simple_name(variants.first().expect("empty union"))
+        // Mixed unions (non-all-StringLiteral) should not reach here.
+        // Union(vec![]) is invalid; Syn variants in a union already have
+        // a fixed canonical name from their first non-literal element.
+        match variants.iter().find(|v| !matches!(v, DslType::StringLiteral(_))) {
+            Some(v) => dsl_type_to_simple_name(v),
+            None => panic!("dsl_type_to_simple_name: empty union reached"),
+        }
     }
 }
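Option (a), checking the invariant when the union is built, might look roughly like the sketch below. Everything here is hypothetical: `checked_union` is not a function in parse.rs, the `DslType` is a dependency-free stand-in, and the rule that only all-string-literal unions are accepted is an assumption about what native codegen needs, not something the PR states.

```rust
// Stand-in for the proc-macro's DslType (the real `Syn` wraps `syn::Type`).
#[derive(Clone, Debug, PartialEq)]
enum DslType {
    Syn(String),
    StringLiteral(String),
    Union(Vec<DslType>),
}

/// Hypothetical checked constructor: rejects unions whose simple name would
/// otherwise be ambiguous. ASSUMPTION: only unions made entirely of string
/// literals are supported.
fn checked_union(variants: Vec<DslType>) -> Result<DslType, String> {
    if variants.is_empty() {
        return Err("union must have at least one variant".to_string());
    }
    if !variants.iter().all(|v| matches!(v, DslType::StringLiteral(_))) {
        return Err("union may only contain string literals".to_string());
    }
    Ok(DslType::Union(variants))
}
```

With a constructor like this, downstream helpers can treat every `Union` as mapping to `"String"` without consulting variant order. In a proc-macro, the `Err` case would more likely be surfaced as a spanned compile error than as a `String`.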
    // Fallback: try with "baml" prefix for short-name builtin access.
    // e.g., image.from_url -> baml.image.from_url
    if first.as_str() != "baml" {
        let mut prefixed = vec![Name::new("baml")];
        prefixed.extend(segments.iter().cloned());
        return resolve_path(db, project, &prefixed);
    }
Avoid unconditional short-path baml prefix fallback in HIR resolver.
This resolver is scope-agnostic. Prefixing unresolved paths (e.g., image.from_url) can incorrectly steal local receiver chains by resolving them as baml.image.from_url.
💡 Proposed fix
-    // Fallback: try with "baml" prefix for short-name builtin access.
-    // e.g., image.from_url -> baml.image.from_url
-    if first.as_str() != "baml" {
-        let mut prefixed = vec![Name::new("baml")];
-        prefixed.extend(segments.iter().cloned());
-        return resolve_path(db, project, &prefixed);
-    }
-
 None
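The scoping concern can be illustrated with a toy resolver. This is only a sketch under stated assumptions: `resolve`, `Resolution`, and the two string sets are hypothetical stand-ins for the HIR/TIR lookups, and the ordering shown (locals first, then exact builtin, then `baml.`-prefixed builtin) is one possible policy, not the one the PR implements.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
enum Resolution {
    Local(String),
    Builtin(String),
    Unresolved,
}

/// Hypothetical resolver: a local binding named by the first segment always
/// wins, so the `baml.` prefix fallback cannot steal a local receiver chain.
fn resolve(segments: &[&str], locals: &HashSet<&str>, builtins: &HashSet<&str>) -> Resolution {
    let Some(first) = segments.first() else {
        return Resolution::Unresolved;
    };
    let full = segments.join(".");

    // 1. Local receivers take precedence over any builtin lookup.
    if locals.contains(first) {
        return Resolution::Local(full);
    }
    // 2. Exact builtin path, e.g. `baml.image.from_url`.
    if builtins.contains(full.as_str()) {
        return Resolution::Builtin(full);
    }
    // 3. Short-name fallback, e.g. `image.from_url` -> `baml.image.from_url`.
    if *first != "baml" {
        let prefixed = format!("baml.{full}");
        if builtins.contains(prefixed.as_str()) {
            return Resolution::Builtin(prefixed);
        }
    }
    Resolution::Unresolved
}
```

With `baml.image.from_url` registered as a builtin, `resolve(&["image", "from_url"], ...)` yields the builtin only while no local named `image` is in scope; once one is, the same path resolves to the local.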
        (TypePattern::Union(patterns), ty) => patterns
            .iter()
            .any(|p| match_pattern_inner(p, ty, bindings)),
Isolate union-branch bindings to avoid cross-branch contamination.
patterns.iter().any(...) reuses the same mutable bindings map for every arm. A failed arm can leave partial bindings that incorrectly constrain later arms.
💡 Proposed fix
-        (TypePattern::Union(patterns), ty) => patterns
-            .iter()
-            .any(|p| match_pattern_inner(p, ty, bindings)),
+        (TypePattern::Union(patterns), ty) => patterns.iter().any(|p| {
+            let mut branch_bindings = bindings.clone();
+            if match_pattern_inner(p, ty, &mut branch_bindings) {
+                *bindings = branch_bindings;
+                true
+            } else {
+                false
+            }
+        }),
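To see why a shared bindings map matters, here is a small self-contained model of the problem. It is a sketch, not the TIR matcher: `Ty`, `Pattern`, `Bindings`, and the `isolate` flag are all invented for illustration, and the flag exists only so both behaviours can be compared side by side.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Int,
    Str,
    Pair(Box<Ty>, Box<Ty>),
}

enum Pattern {
    Int,
    Var(&'static str),
    Pair(Box<Pattern>, Box<Pattern>),
    Union(Vec<Pattern>),
}

type Bindings = HashMap<&'static str, Ty>;

/// Toy matcher. With `isolate == false` every union arm writes into the
/// caller's map (the behaviour the review flags); with `isolate == true`
/// each arm works on a clone that is committed only on success.
fn match_pattern(p: &Pattern, ty: &Ty, b: &mut Bindings, isolate: bool) -> bool {
    match (p, ty) {
        (Pattern::Int, Ty::Int) => true,
        (Pattern::Var(name), ty) => match b.get(name) {
            Some(bound) => bound == ty,
            None => {
                b.insert(*name, ty.clone());
                true
            }
        },
        (Pattern::Pair(pa, pb), Ty::Pair(ta, tb)) => {
            match_pattern(pa, ta, b, isolate) && match_pattern(pb, tb, b, isolate)
        }
        (Pattern::Union(arms), ty) => arms.iter().any(|arm| {
            if !isolate {
                return match_pattern(arm, ty, b, isolate);
            }
            let mut scratch = b.clone();
            if match_pattern(arm, ty, &mut scratch, isolate) {
                *b = scratch;
                true
            } else {
                false
            }
        }),
        _ => false,
    }
}
```

Matching the union `Pair(T, Int) | Pair(Int, T)` against `Pair(Int, Str)` shows the difference: the first arm binds `T = Int` and then fails on its second component. Without isolation that stale binding makes the second arm fail as well; with isolation the second arm succeeds and binds `T = Str`.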
        TypePattern::StringLiteral(val) => Ty::Literal(LiteralValue::String(val.to_string())),
        TypePattern::Union(patterns) => substitute_with_fallback(
            patterns.first().expect("empty union in TypePattern"),
            bindings,
        ),
Union substitution currently drops all but the first member.
Using patterns.first() narrows union patterns incorrectly and can produce wrong inferred parameter/return types.
💡 Proposed fix
-        TypePattern::Union(patterns) => substitute_with_fallback(
-            patterns.first().expect("empty union in TypePattern"),
-            bindings,
-        ),
+        TypePattern::Union(patterns) => {
+            if patterns.is_empty() {
+                Ty::Unknown
+            } else {
+                Ty::Union(
+                    patterns
+                        .iter()
+                        .map(|p| substitute_with_fallback(p, bindings))
+                        .collect(),
+                )
+            }
+        }

            // Media(Generic) is a subtype of any specific media kind.
            // At runtime, the kind is just metadata; the wrapper functions
            // (e.g., image.from_url) ensure the correct kind is set.
            (StructuralTy::Media(_), StructuralTy::Media(_)) => true,
The subtyping rule comment and code disagree — and a unit test is missing.
The comment says "Media(Generic) is a subtype of any specific media kind" (one-directional), but the arm (StructuralTy::Media(_), StructuralTy::Media(_)) => true is fully symmetric: it also makes Media(Image) <: Media(Audio). A function declared to accept audio would therefore silently accept an image value at the type-check level.
If cross-kind assignability is intentional (all media kinds are interchangeable), the comment should say so explicitly. If only Media(Generic) <: Media(Specific) was intended, the arm should be tightened:
🛡️ Option A: clarify intent (all-media-compatible, current behaviour)
-// Media(Generic) is a subtype of any specific media kind.
-// At runtime, the kind is just metadata; the wrapper functions
-// (e.g., image.from_url) ensure the correct kind is set.
-(StructuralTy::Media(_), StructuralTy::Media(_)) => true,
+// All Media kinds are mutually assignable. The kind tag is metadata
+// set by constructors (e.g., baml.image.from_url), not a structural
+// restriction; the runtime does not enforce kind at call boundaries.
+(StructuralTy::Media(_), StructuralTy::Media(_)) => true,
🛡️ Option B: tighten to Generic-only (if cross-kind should be rejected)
-(StructuralTy::Media(_), StructuralTy::Media(_)) => true,
+// Media(Generic) <: Media(AnyKind): a generic media value may be used
+// where a specific media kind is expected.
+(StructuralTy::Media(baml_base::MediaKind::Generic), StructuralTy::Media(_)) => true,
Additionally, the test module has no coverage for this rule. Please add a test:
#[test]
fn test_media_subtyping() {
    let aliases = HashMap::new();
    // cross-kind (verify current intent)
    assert!(is_subtype_of(
        &Ty::Media(baml_base::MediaKind::Image),
        &Ty::Media(baml_base::MediaKind::Audio),
        &aliases,
    ));
}
As per coding guidelines, prefer Rust unit tests over integration tests.
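The practical difference between the two options can be modelled in a few lines. This is a sketch with a stand-in `MediaKind` enum and two hypothetical predicates; it assumes that under Option B identical kinds are still accepted (in the real code that would presumably come from a separate reflexivity rule rather than from this arm).

```rust
// Stand-in for baml_base::MediaKind; variant names follow the diff.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MediaKind {
    Image,
    Audio,
    Video,
    Pdf,
    Generic,
}

/// Option A (current behaviour): every media kind is assignable to every other.
fn is_media_subtype_permissive(_sub: MediaKind, _sup: MediaKind) -> bool {
    true
}

/// Option B: only a Generic value may stand in for a specific kind.
/// ASSUMPTION: equal kinds remain assignable via reflexivity.
fn is_media_subtype_strict(sub: MediaKind, sup: MediaKind) -> bool {
    sub == sup || sub == MediaKind::Generic
}
```

Under Option A, passing an `Image` where `Audio` is expected type-checks; under Option B it is rejected, while `Generic` is still accepted wherever `Pdf` (or any other kind) is expected.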
fn media_kind_from_str(s: &str) -> MediaKind {
    match s {
        "image" => MediaKind::Image,
        "video" => MediaKind::Video,
        "audio" => MediaKind::Audio,
        "pdf" => MediaKind::Pdf,
        _ => MediaKind::Generic,
    }
Fail fast on invalid media kind instead of silently coercing to Generic.
The _ => MediaKind::Generic branch masks invalid kinds and can silently produce incorrect media classification.
💡 Proposed fix
 fn media_kind_from_str(s: &str) -> MediaKind {
     match s {
         "image" => MediaKind::Image,
         "video" => MediaKind::Video,
         "audio" => MediaKind::Audio,
         "pdf" => MediaKind::Pdf,
-        _ => MediaKind::Generic,
+        "media" => MediaKind::Generic,
+        other => unreachable!("invalid media kind literal: {other}"),
     }
 }
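If a panic inside the VM is undesirable, a fallible conversion is another way to fail fast. The sketch below is an alternative to the `unreachable!` suggestion, not part of the PR: `MediaKind` is a stand-in enum, `UnknownMediaKind` is a hypothetical error type, and the `"media"` literal for `Generic` is taken from the reviewer's proposed fix.

```rust
use std::str::FromStr;

// Stand-in for the VM's MediaKind; variant names follow the diff.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MediaKind {
    Image,
    Video,
    Audio,
    Pdf,
    Generic,
}

/// Hypothetical error carrying the rejected literal.
#[derive(Debug, PartialEq, Eq)]
struct UnknownMediaKind(String);

impl FromStr for MediaKind {
    type Err = UnknownMediaKind;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "image" => Ok(MediaKind::Image),
            "video" => Ok(MediaKind::Video),
            "audio" => Ok(MediaKind::Audio),
            "pdf" => Ok(MediaKind::Pdf),
            "media" => Ok(MediaKind::Generic),
            // Unknown literals are reported instead of coerced to Generic.
            other => Err(UnknownMediaKind(other.to_string())),
        }
    }
}
```

A caller would write `"pdf".parse::<MediaKind>()` and decide locally whether an `Err` should become a VM error or a panic.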
Summary
Adds BAML builtin media constructor modules baml.audio, baml.image, baml.pdf, and baml.video as thin wrappers around baml.Media.from_* with a fixed kind.
Changes
- audio.baml, image.baml, pdf.baml, video.baml in baml_builtins/baml/, each with from_url, from_base64, and from_file (delegating to baml.Media.from_*).
- baml_builtins lib and bex_vm native.
Clippy
- baml_compiler_tir/src/builtins.rs: use .map(substitute_unknown) instead of .map(|p| substitute_unknown(p)).
- bex_vm/src/native.rs: use mime_type.map(ToString::to_string) instead of mime_type.map(|s| s.to_string()).
Made with Cursor
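Both Clippy items are instances of the redundant-closure rewrite: a closure that only forwards its argument can be replaced by the function path itself. A small standalone illustration follows; `owned_mime`, `double`, and `double_all` are made-up names, not code from the PR.

```rust
/// `Option<&str>` to `Option<String>` using a method path instead of
/// `|s| s.to_string()`.
fn owned_mime(mime_type: Option<&str>) -> Option<String> {
    mime_type.map(ToString::to_string)
}

fn double(x: &i32) -> i32 {
    x * 2
}

/// Passing the function directly instead of `|x| double(x)`.
fn double_all(xs: &[i32]) -> Vec<i32> {
    xs.iter().map(double).collect()
}
```

The behaviour is unchanged in both cases; the rewrite only removes a layer of indirection that Clippy reports.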