Refactor dataprep multimedia2text #1065
base: main
Hello @MSCetin37, we need to refactor the multimedia2text component and the related DocSum example. First, thank you for your earlier contribution of multimedia2text. After rechecking the code, we found that audio2text mostly duplicates the existing ASR component, and that video2audio can be replaced with a simple ffmpeg command that converts video to audio. DocSum is the only example that requires the multimedia2text functionality, and after rechecking it we found that its logic can be refactored with minimal extra hardware resources, as follows:
This will have the following advantages:
I will open a PR for this on the GenAIExamples side later. @MSCetin37, if you have any suggestions, please don't hesitate to tell me!
Here is the related refactor PR in the examples repository: opea-project/GenAIExamples#1286.
@Spycsh
> * Remove the duplicate part of audio2text
> * Move the video2audio to the preparation stage in the example DocSum
My overall thought is that implementing a service (similar to multimedia2text) that can convert data from any domain to a target domain would simplify implementations in OPEA and expand their scope.
Hi @MSCetin37,
Yes as you may notice, we leverage the
Sure, however I think video2audio is super lightweight and can be easily replaced with
I agree with you if what you mean is adding a multimedia2text component that serves as a wrapper (or controller) for accessing existing functionality such as whisper (ASR), speecht5 (TTS), and video2audio (local ffmpeg or moviepy conversion), as well as other multimedia-related features. I think it would also be good for the future MultiModal Q&A. The name does not have to be exactly "multimedia2text"; it could be something else (multimediaprocessing?). Currently there is some refactoring work that I have to do this week, so do you agree that we keep it simple for now? The hard requirement is that
Sounds good to me.
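As a rough sketch of the wrapper (or controller) idea discussed in this thread, a thin dispatcher could route each input type to an existing service instead of reimplementing it. Everything here is hypothetical: `make_multimedia_to_text`, the `kind` labels, and the injected `asr` and `video_to_audio` callables are placeholders for illustration, not existing OPEA APIs.

```python
from typing import Callable


def make_multimedia_to_text(
    asr: Callable[[str], str],
    video_to_audio: Callable[[str], str],
) -> Callable[[str, str], str]:
    """Return a dispatcher that converts text, audio, or video input to text.

    `asr` stands in for a call to an existing ASR service (audio -> text),
    and `video_to_audio` for a local conversion step (video -> audio).
    Both are assumptions injected by the caller.
    """

    def to_text(data: str, kind: str) -> str:
        if kind == "text":
            return data                       # nothing to convert
        if kind == "audio":
            return asr(data)                  # reuse the existing ASR component
        if kind == "video":
            return asr(video_to_audio(data))  # convert first, then transcribe
        raise ValueError(f"unsupported input kind: {kind!r}")

    return to_text
```

Because the services are passed in as callables, the dispatcher itself holds no model or hardware resources, which matches the goal of keeping the refactor lightweight.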
Description
Refactor dataprep multimedia2text
Issues
na
Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
na
Tests
na