
Example for running custom GGUF models #517

@drtinkerer

Description


Hi all,
I built a custom GGUF model file (around 5 GB) into an Ollama Docker container image.
With plain Docker I can run this container without any issues and it works as expected.

Following the docs, I tried running the same image on KubeAI as a Model CRD using:

apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: model-test
spec:
  features: ["TextGeneration"]
  owner: custom
  image: drtinkerer/ollama-test:deepseek-test
  url: "ollama://deepseek-local"
  engine: OLlama
  resourceProfile: cpu:1

The pod that comes up has the startup probe below configured:

    Startup:    exec [bash -c /bin/ollama pull deepseek-local && /bin/ollama cp deepseek-local model-test && /bin/ollama run model-test hi] delay=1s timeout=10800s period=3s #success=1 #failure=10

Now, the url I have put there is not available on the Ollama registry, yet the startup probe tries to pull it.
So the startup probe fails because it attempts to pull a non-existent model, whereas I want to use the model that is already baked into the Ollama container image itself.

Error: pull model manifest: file does not exist
  Normal  Started  54s (x5 over 2m55s)  kubelet  Started container server
  Normal  Created  54s (x5 over 2m55s)  kubelet  Created container: server
  Normal  Pulled   54s (x5 over 2m55s)  kubelet  Container image "drtinkerer/ollama-test:deepseek-test" already present on machine
  Normal  Killing  25s (x5 over 2m25s)  kubelet  Container server failed startup probe, will be restarted

In my particular use case, with the GGUF model inside the Ollama container, the startup probe should only run /bin/ollama run model-test. Alternatively, I would like to configure the startup probe as an httpGet instead of an exec script for my own pod.
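For illustration, if KubeAI exposed the probe on the Model spec or the generated pod, a standard Kubernetes httpGet startup probe against Ollama's HTTP API might look like the sketch below. This is purely hypothetical: KubeAI does not currently offer such a field, and the path and port assume Ollama's default `/api/tags` endpoint on port 11434.

```yaml
# Hypothetical sketch: a user-configurable startup probe, if the
# Model CRD (or the generated pod spec) allowed overriding it.
startupProbe:
  httpGet:
    path: /api/tags     # Ollama endpoint that lists locally available models
    port: 11434         # Ollama's default listening port
  initialDelaySeconds: 1
  periodSeconds: 3
  failureThreshold: 10
```

An httpGet probe like this would only check that the Ollama server is up and serving, without triggering a registry pull.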

The same behaviour occurs when I upload my GGUF to a PVC and try to load it from there:
it is always the startup probe that fails.
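For reference, the PVC variant I tried looked roughly like the sketch below. The `pvc://` URL scheme is my reading of the KubeAI model-source docs, and the claim name is a placeholder, so treat this as an assumption rather than a verified configuration:

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: model-test
spec:
  features: ["TextGeneration"]
  owner: custom
  url: "pvc://my-models-pvc"   # placeholder PVC claim name holding the GGUF
  engine: OLlama
  resourceProfile: cpu:1
```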

I believe this is the function responsible for configuring the startup probe, and it is what needs to be made configurable:

https://github.com/substratusai/kubeai/blob/d6e393ca76f11da76b3a6db74b737b94d1a4f057/internal/modelcontroller/engine_ollama.go#L171

  • Is my method of baking the GGUF model into the Docker image correct?
  • Are there better ways to achieve the same?
  • Can the startup probe be made customisable?

Any help is appreciated. Thanks :)
