answer_clip_ollama: new operator to do VQA on clips

This is similar to BVQA and the answer_transcription_ollama operator.

This assumes that ollama supports video input and has compatible VLMs, e.g. qwen3-vl.

One hurdle is finding an efficient way to pass a video to ollama within the FrameSense container architecture.
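
If native video input turns out not to be available, one possible fallback is to sample a few frames from the clip and send them as base64 images, which ollama's /api/generate endpoint accepts in its `images` field. The sketch below is only an illustration under that assumption: `build_clip_vqa_payload` and `ask_ollama` are hypothetical helper names, not existing FrameSense code, the URL is the usual local default, and whether a given model handles several images per request would need to be checked.

```python
import base64
import json
import urllib.request

# Assumption: ollama is reachable on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_clip_vqa_payload(frames, question, model="qwen3-vl"):
    """Build a request body from already-extracted frames (hypothetical helper).

    frames: list of encoded image bytes (e.g. JPEG), sampled from the clip.
    """
    return {
        "model": model,
        "prompt": question,
        # ollama expects each image as a base64-encoded string.
        "images": [base64.b64encode(f).decode("ascii") for f in frames],
        "stream": False,  # ask for a single JSON response
    }


def ask_ollama(payload, url=OLLAMA_URL):
    """POST the payload to ollama and return the generated answer text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Sending a handful of sampled frames keeps the request small and avoids sharing the whole video file between containers, at the cost of losing motion and audio information.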