feat: add media type for unknown file type #53

aftersnow · 2025-05-19T12:44:03Z

In some cases, the model package system may not know a file's type, or the file is not a config, weight, doc, code, or dataset file, yet it is still a valid model file that the downstream model serving/deployment system requires.

For instance:

In https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/tree/main, there is a .nemo file.
In https://huggingface.co/google/gemma-3n-E4B-it-litert-preview/tree/main, there is a .task file.
In https://huggingface.co/mradermacher/SpaceThinker-Qwen2.5VL-3B-i1-GGUF/tree/main, there is a .dat file.

So the application/vnd.cnai.model.unknown.v1.tar media type and its variants are needed to represent this kind of file.

The application/vnd.cnai.model.unknown.v1.tar+gzip and application/vnd.cnai.model.unknown.v1.tar+zstd media types represent the gzip- and zstd-compressed payloads of the application/vnd.cnai.model.unknown.v1.tar media type. If the file is large, implementations are RECOMMENDED to use the application/vnd.cnai.model.unknown.v1.raw media type.
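As a rough illustration of how a packager might choose among the four proposed media types, here is a minimal Python sketch. The function name `unknown_media_type` and the `LARGE_FILE_BYTES` cutoff are assumptions made for this example; the proposal only says "large" and does not define a size.

```python
from typing import Optional

BASE = "application/vnd.cnai.model.unknown.v1"

# Hypothetical cutoff: the proposal says "large" without defining a size.
LARGE_FILE_BYTES = 1 << 30


def unknown_media_type(size_bytes: int, compression: Optional[str] = None) -> str:
    """Pick one of the four proposed media types for an uncategorized file."""
    if size_bytes >= LARGE_FILE_BYTES:
        # Large files are recommended to be stored raw, without a tar wrapper.
        return f"{BASE}.raw"
    if compression is None:
        return f"{BASE}.tar"
    if compression in ("gzip", "zstd"):
        return f"{BASE}.tar+{compression}"
    raise ValueError(f"unsupported compression: {compression!r}")
```

For example, `unknown_media_type(4096, "zstd")` selects the zstd-compressed tar variant, while any size at or above the assumed cutoff selects the raw variant.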

Signed-off-by: Zhao Chen <[email protected]>

caozhuozi · 2025-05-20T01:08:36Z

which is required by the downstream model serving/deployment platform.

Does it mean the model serving/deployment platforms will explicitly require an unknown media type when attempting to serve a specific model?

aftersnow · 2025-05-20T02:51:37Z

which is required by the downstream model serving/deployment platform.

Does it mean the model serving/deployment platforms will explicitly require an unknown media type when attempting to serve a specific model?

No, only the model package system needs the unknown media type, because the file is not config, weight, code, doc, or dataset. The package system should just treat it as an opaque binary. In the end, the file carrying the unknown media type is passed to the model serving/deployment platform, which knows the actual file type and consumes it. Examples include a .so or .lib file, or some novel model file type.
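To make "treat it as an opaque binary" concrete, here is a small Python sketch that describes a payload by digest and size only, in the shape of an OCI content descriptor (`mediaType`, `digest`, `size`). The helper name `opaque_descriptor` is hypothetical, and choosing the raw variant as the default is an assumption for the example.

```python
import hashlib

# One of the media types proposed in this PR; using it as the default is an assumption.
RAW_UNKNOWN = "application/vnd.cnai.model.unknown.v1.raw"


def opaque_descriptor(payload: bytes, media_type: str = RAW_UNKNOWN) -> dict:
    """Describe a payload by digest and size only; its contents are never parsed."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(payload).hexdigest(),
        "size": len(payload),
    }
```

The packager never inspects the bytes; interpreting them is left to the serving or deployment platform that pulls the layer.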

caozhuozi · 2025-05-20T04:12:17Z

@aftersnow Thanks for the clarification. I'm not sure if "unknown" might be overused if we provide this option.
Perhaps we can further restrict its usage by tying it to weight?
How about using "model.weight.unknown"?

aftersnow · 2025-05-20T06:17:43Z

@aftersnow Thanks for the clarification. I'm not sure if "unknown" might be overused if we provide this option. Perhaps we can further restrict its usage by tying it to weight? How about using "model.weight.unknown"?

Yes, it might be overused. But unknown types do exist, and we encounter them frequently in our production environment. Without this option they may be assigned a wrong type, which is worse than an unknown type.

The problem with model.weight.unknown is that some unknown types are not model weights.

gorkem · 2025-05-20T10:59:52Z

This media type is both unnecessary and imprecise. The unknown media type fails to convey its intended use. A packaging system must embed clear, unambiguous metadata so that downstream services can automatically and reliably recognize exactly what they’re handling. Without this metadata, interoperability collapses and automation pipelines will inevitably break.

aftersnow · 2025-05-21T06:30:03Z

This media type is both unnecessary and imprecise. The unknown media type fails to convey its intended use. A packaging system must embed clear, unambiguous metadata so that downstream services can automatically and reliably recognize exactly what they’re handling. Without this metadata, interoperability collapses and automation pipelines will inevitably break.

Yes, I agree with you. But what if the system needs a type for, say, a .so file, a .lib file, or a binary file? These are not code, config, weight, dataset, or doc types. @gorkem Maybe we can change unknown to other or misc?

amisevsk · 2025-05-21T17:57:47Z

In my opinion, object and lib files fall under the umbrella of code. I can't think of a case where we would want to distinguish between .so files and actual source code/scripts.

aftersnow · 2025-05-22T12:17:13Z

Thank you @amisevsk. The .so example may not be enough, so here are more examples from Hugging Face:

In https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/tree/main, there is a .nemo file.
In https://huggingface.co/google/gemma-3n-E4B-it-litert-preview/tree/main, there is a .task file.
In https://huggingface.co/mradermacher/SpaceThinker-Qwen2.5VL-3B-i1-GGUF/tree/main, there is a .dat file.

Our package system cannot correctly categorize these files. They may not belong to any of the existing categories: model weight, code, doc, config, or dataset. Therefore, we need a media type to accommodate this kind of file. Maybe "unknown" is misleading; perhaps "misc" or "other" would be better?

amisevsk · 2025-05-22T17:08:41Z

My approach in developing KitOps thus far has been to do a 'best effort' categorization, and leave it to the user to clarify any issues. With our current implementation, .nemo, .task, and .dat files would get included as 'code'-type layers (though in this case they appear to all be model-related).
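The 'best effort' approach described above can be sketched roughly as follows. This is not KitOps code: the extension table, the `categorize` function, and the `overrides` parameter are all assumptions for illustration. Only the fallback to 'code' and the idea of user corrections come from the comment.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional

# Illustrative extension table; not taken from KitOps or the spec.
KNOWN_EXTENSIONS = {
    ".safetensors": "weight",
    ".gguf": "weight",
    ".json": "config",
    ".md": "doc",
    ".py": "code",
    ".parquet": "dataset",
}


def categorize(path: str, overrides: Optional[Mapping[str, str]] = None) -> str:
    """Best-effort layer type for a file; the user's overrides always win."""
    if overrides and path in overrides:
        return overrides[path]
    ext = PurePosixPath(path).suffix.lower()
    # Unrecognized extensions fall back to 'code', as described in the comment.
    return KNOWN_EXTENSIONS.get(ext, "code")
```

With this sketch, a `.nemo` file lands in a 'code' layer by default, and a user who knows better can pass `{"model.nemo": "weight"}` to correct it.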

To me, an 'unknown'/'misc' category is an undesirable element of the spec, as it's a dead end. Ideally, as the package system improves, it should be able to categorize all incoming files relatively accurately, accepting user input to correct any errors. With 'unknown' layers, tooling using the spec has to basically pretend they don't exist.

In other words, if we hit a file that can't be categorized, ultimately it will be on the end-user to provide additional context (i.e. say "this .nemo file is a model-related file and we would like it to be treated as such"). Sticking it in an 'unknown' layer type feels like a proxy for just returning an error or requiring additional input at packaging time.
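The alternative mentioned here, failing at packaging time until the user supplies a type, might look like the sketch below. The exception class, the `categorize_strict` function, and the extension table are hypothetical names introduced for this example and are not part of the spec or any existing tool.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional

# Illustrative extension table; not taken from the spec.
KNOWN_EXTENSIONS = {
    ".safetensors": "weight",
    ".json": "config",
    ".md": "doc",
    ".py": "code",
    ".parquet": "dataset",
}


class UncategorizedFileError(Exception):
    """Raised when a file has no known type and the user gave none."""


def categorize_strict(path: str, user_types: Optional[Mapping[str, str]] = None) -> str:
    """Return a layer type, or fail so the user must provide one explicitly."""
    if user_types and path in user_types:
        return user_types[path]
    ext = PurePosixPath(path).suffix.lower()
    if ext in KNOWN_EXTENSIONS:
        return KNOWN_EXTENSIONS[ext]
    raise UncategorizedFileError(
        f"cannot categorize {path!r}; specify its type explicitly"
    )
```

Compared with an 'unknown' layer, this design surfaces the gap to the person who has the missing context instead of deferring it to downstream tooling.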

aftersnow · 2025-05-23T04:28:39Z

To me, an 'unknown'/'misc' category is an undesirable element of the spec, as it's a dead end. Ideally, as the package system improves, it should be able to categorize all incoming files relatively accurately, accepting user input to correct any errors. With 'unknown' layers, tooling using the spec has to basically pretend they don't exist.

It seems the word unknown is the problem. What if we change unknown to opaque (meaning the package system should only pass the file through to the user transparently), or something like that? We can try to find a better name to solve this problem, as we discussed in the meeting yesterday. @gorkem @amisevsk

Ideally, as the package system improves, it should be able to categorize all incoming files relatively accurately, accepting user input to correct any errors

First, new file types appear constantly in the rapidly developing AI field, so the package system may need to be upgraded very frequently, which is a big burden for us. Second, there may be no correct type today: for '.nemo' or '.task', none of the current types (code, config, dataset, weight, doc) is correct.

feat: add media type for unkonwn file type

5d50d45

Signed-off-by: Zhao Chen <[email protected]>

aftersnow force-pushed the add-unknown-media-type branch from 1350752 to 5d50d45 on May 19, 2025 12:45

aftersnow requested a review from gorkem May 20, 2025 03:24
