feat: add media type for unknown file type #53
base: main
Signed-off-by: Zhao Chen <[email protected]>
1350752 to 5d50d45
Does it mean the model serving/deployment platforms will explicitly require an
No, only the model package system needs the
@aftersnow Thanks for the clarification. I'm not sure whether "unknown" would be overused if we provide this option.
Yes, it might be overused, but unknown types do exist, and we encounter them frequently in our production environment. Without this option, such files may be assigned the wrong type, which is worse than an unknown type. The problem of
This media type is both unnecessary and imprecise. The unknown media type fails to convey its intended use. A packaging system must embed clear, unambiguous metadata so that downstream services can automatically and reliably recognize exactly what they're handling. Without this metadata, interoperability collapses and automation pipelines will inevitably break.
Yes, I agree with you. But what if the system needs a type for, for instance, a
In my opinion, object and lib files fall under the umbrella of code. I can't think of a case where we would want to distinguish between .so files and actual source code/scripts.
Thank you @amisevsk. The In https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/tree/main, there is a .nemo file. Our package system cannot correctly categorize such files. They may not belong to any of the categories model weight, code, doc, config, or dataset. Therefore, we need a media type to accommodate this kind of file. Maybe "unknown" is misleading; perhaps "misc" or "other" would be better?
My approach in developing KitOps thus far has been to do a 'best effort' categorization, and leave it to the user to clarify any issues. With our current implementation, To me, an 'unknown'/'misc' category is an undesirable element of the spec, as it's a dead end. Ideally, as the package system improves, it should be able to categorize all incoming files relatively accurately, accepting user input to correct any errors. With 'unknown' layers, tooling using the spec has to basically pretend they don't exist. In other words, if we hit a file that can't be categorized, ultimately it will be on the end-user to provide additional context (i.e. say "this .nemo file is a model-related file and we would like it to be treated as such"). Sticking it in an 'unknown' layer type feels like a proxy for just returning an error or requiring additional input at packaging time.
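A minimal Go sketch of the "best effort plus user override, otherwise error" flow described in the comment above. The `Category` type, the extension table, the file names, and the `Categorize` function are all hypothetical illustrations, not KitOps' or any packager's actual code. Mapping the `.nemo` override to `Weight` is likewise only an assumption for the example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Category is a hypothetical file category used only in this sketch.
type Category string

const (
	Weight  Category = "weight"
	Config  Category = "config"
	Doc     Category = "doc"
	Code    Category = "code"
	Dataset Category = "dataset"
)

// defaultRules is a hypothetical extension-to-category table.
var defaultRules = map[string]Category{
	".safetensors": Weight,
	".gguf":        Weight,
	".json":        Config,
	".md":          Doc,
	".py":          Code,
	".so":          Code, // object/lib files treated as code, as suggested above
	".parquet":     Dataset,
}

// Categorize checks user overrides first, then the best-effort rules.
// If nothing matches it returns an error instead of an "unknown" category,
// so the user must supply more context at packaging time.
func Categorize(path string, overrides map[string]Category) (Category, error) {
	if c, ok := overrides[path]; ok {
		return c, nil
	}
	ext := strings.ToLower(filepath.Ext(path))
	if c, ok := defaultRules[ext]; ok {
		return c, nil
	}
	return "", fmt.Errorf("cannot categorize %q: please specify its category", path)
}

func main() {
	// Hypothetical user input: "treat this .nemo file as model weights".
	overrides := map[string]Category{"parakeet-tdt-0.6b-v2.nemo": Weight}
	for _, p := range []string{"model.safetensors", "parakeet-tdt-0.6b-v2.nemo", "gemma.task"} {
		c, err := Categorize(p, overrides)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(p, "->", c)
	}
}
```

In this sketch the uncategorized `.task` file surfaces as an error rather than silently becoming an opaque layer, which is the trade-off the comment argues for.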
It seems the word
First, new file types appear constantly in the rapidly developing field of AI, so the package system would need to be upgraded very frequently, which is a big burden for us. Second, there may be no correct type yet, for instance for '.nemo' or '.task' files. The current
In some cases, the model package system may not know the type of a file, or the file is not a config, weight, doc, code, or dataset file, yet it is still a valid model file required by the downstream model serving/deployment system.
For instance:
In https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/tree/main, there is a .nemo file.
In https://huggingface.co/google/gemma-3n-E4B-it-litert-preview/tree/main, there is a .task file.
In https://huggingface.co/mradermacher/SpaceThinker-Qwen2.5VL-3B-i1-GGUF/tree/main, there is a .dat file.
So the `application/vnd.cnai.model.unknown.v1.tar` media type and its variants are needed to represent such files. The `application/vnd.cnai.model.unknown.v1.tar+gzip` and `application/vnd.cnai.model.unknown.v1.tar+zstd` media types represent the gzip- and zstd-compressed payloads of the `application/vnd.cnai.model.unknown.v1.tar` media type. If the file is large, implementations are RECOMMENDED to use the `application/vnd.cnai.model.unknown.v1.raw` media type.
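As an illustration of how a packager might pick among the proposed media types, here is a minimal Go sketch. The 1 GiB threshold, the `Compression` type, and the `UnknownMediaType` function are hypothetical: the proposal only says "if the file is large" and does not define a size limit or an API.

```go
package main

import "fmt"

// largeFileThreshold is an assumption; the proposal does not fix a number.
const largeFileThreshold int64 = 1 << 30 // 1 GiB

// Compression selects one of the tar payload encodings listed in the proposal.
type Compression int

const (
	None Compression = iota
	Gzip
	Zstd
)

const unknownBase = "application/vnd.cnai.model.unknown.v1"

// UnknownMediaType returns a media type from the proposed "unknown" family:
// the raw variant for large files, otherwise a tar variant chosen by compression.
func UnknownMediaType(size int64, c Compression) string {
	if size >= largeFileThreshold {
		return unknownBase + ".raw"
	}
	switch c {
	case Gzip:
		return unknownBase + ".tar+gzip"
	case Zstd:
		return unknownBase + ".tar+zstd"
	default:
		return unknownBase + ".tar"
	}
}

func main() {
	fmt.Println(UnknownMediaType(4<<10, None)) // application/vnd.cnai.model.unknown.v1.tar
	fmt.Println(UnknownMediaType(4<<10, Zstd)) // application/vnd.cnai.model.unknown.v1.tar+zstd
	fmt.Println(UnknownMediaType(2<<30, Gzip)) // application/vnd.cnai.model.unknown.v1.raw
}
```

For simplicity the sketch treats the RECOMMENDED raw rule as mandatory; a real packager could still choose a tar variant for a large file.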