Conversation
…nxruntime-genai into nebanfic/stft-genai
| namespace Microsoft.ML.OnnxRuntimeGenAI | ||
| { | ||
| public static class SignalProcessor | ||
| { |
There was a problem hiding this comment.
Can these APIs be added to the MultiModalProcessor instead of introducing a new class? All multi-modal models (e.g. Whisper, Phi-4 mm, etc) use that processor for pre-processing.
|
|
||
| [DllImport(NativeLib.DllName, CallingConvention = CallingConvention.Winapi)] | ||
| public static extern int /* extError_t */ OgaSplitSignalSegments( | ||
| IntPtr /* const OgaTensor* */ input, |
There was a problem hiding this comment.
Can we follow the same tab spacings as the other native method APIs listed in this file for the new APIs?
| } | ||
|
|
||
| template <typename T> | ||
| OrtxTensor* MakeOrtxTensor(Generators::Tensor* src) { |
There was a problem hiding this comment.
Since the Generators namespace is included at the top, I think you can remove the Generators:: part before each Tensor.
| const OgaTensor* hop_ms_tensor, | ||
| const OgaTensor* energy_threshold_db_tensor, | ||
| OgaTensor* output0) { | ||
| OGA_TRY |
There was a problem hiding this comment.
Can we add unit tests for the new APIs in ORT GenAI's C API tests?
| */ | ||
| OGA_EXPORT OgaResult* OGA_API_CALL OgaProcessorProcessImagesAndAudiosAndPrompts(const OgaMultiModalProcessor*, const OgaStringArray* prompts, const OgaImages* images, const OgaAudios* audios, OgaNamedTensors** input_tensors); | ||
|
|
||
| OGA_EXPORT OgaResult* OGA_API_CALL OgaSplitSignalSegments( |
There was a problem hiding this comment.
Can you add C# API tests that call these C APIs?
| const OgaTensor* energy_threshold_db_tensor, | ||
| OgaTensor* output0); | ||
|
|
||
| OGA_EXPORT OgaResult* OGA_API_CALL OgaMergeSignalSegments( |
There was a problem hiding this comment.
Let's add comments above these new APIs to explain their usage and their parameters.
|
|
||
| OGA_EXPORT OgaResult* OGA_API_CALL OgaMergeSignalSegments( | ||
| const OgaTensor* segments_tensor, | ||
| const OgaTensor* merge_gap_ms_tensor, |
There was a problem hiding this comment.
Given that the type of the object is an OgaTensor*, I think we can remove the _tensor suffix from the parameter names.
| reinterpret_cast<const Generators::Tensor*>(input), | ||
| reinterpret_cast<const Generators::Tensor*>(sr_tensor), | ||
| reinterpret_cast<const Generators::Tensor*>(frame_ms_tensor), | ||
| reinterpret_cast<const Generators::Tensor*>(hop_ms_tensor), |
There was a problem hiding this comment.
To set the return value to output0 in a way that does not cause memory corruptions, you may need to use ReturnShared<OgaTensor> and change output0 to OgaTensor**.
onnxruntime-genai/src/ort_genai_c.cpp
Lines 632 to 638 in cac5438
Let's also rename output0 to out since there's just one output and to be consistent with existing API conventions.
There was a problem hiding this comment.
It doesn't cause memory corruption anymore, code I have here now is working correctly, so I wouldn't mess up again with changing it.
| OgaTensor* output0) { | ||
| OGA_TRY | ||
| return reinterpret_cast<OgaResult*>( | ||
| Generators::MergeSignalSegments( |
There was a problem hiding this comment.
We can move SplitSignalSegments and MergeSignalSegments inside a processor object. The outputs from those methods can then be set to the output tensors from the processor APIs.
auto tensor = processor->SplitSignalSegments(...);
*out = ReturnShared<OgaTensor>(tensor);auto tensor = processor->MergeSignalSegments(...);
*out = ReturnShared<OgaTensor>(tensor);Then these new APIs could be called similar to how methods such as OgaTokenizerEncodeBatch look in the above review comment. This will also help add future support for these new APIs in other language bindings.
There was a problem hiding this comment.
Why would I have to construct a new object which has no connection with my current functionality? I think what we need to do here is to clarify, do we even need a Processor class, and if we do, what exactly should be inside it, because I see many methods not being a part of the class (e.g. ProcessTensor is not a part of the class but is within the same file).
I can do this, but we really need to clarify this.
| { | ||
| long[] inputShape = new long[] { 1, inputSignal.Length }; | ||
|
|
||
| input = CreateFloatTensorFromArray(inputSignal, inputShape); |
There was a problem hiding this comment.
Can you construct a Tensor object directly for the input data? Each Tensor object has its own IntPtr handle that you can provide for the NativeMethods.CAPI call.
onnxruntime-genai/src/csharp/Tensor.cs
Lines 28 to 45 in 9d3631d
The Tensor class can help with memory management during disposal.
Here is an example with Tensor from a unit test.
onnxruntime-genai/test/csharp/TestOnnxRuntimeGenAIAPI.cs
Lines 637 to 669 in 9d3631d
There was a problem hiding this comment.
Yes, this approach gave me memory corruption as well, there might be an issue with Tensor class, but I can give it another try.
| const OrtxTensor* frame_tensor = Generators::MakeOrtxTensorConst<int64_t>(frame_ms_tensor); | ||
| const OrtxTensor* hop_tensor = Generators::MakeOrtxTensorConst<int64_t>(hop_ms_tensor); | ||
| const OrtxTensor* thr_tensor = Generators::MakeOrtxTensorConst<float>(energy_threshold_db_tensor); | ||
| OrtxTensor* out_tensor = Generators::MakeOrtxTensor<int64_t>(output0); |
There was a problem hiding this comment.
I would recommend looking at how the tokenizer and processor APIs are currently implemented (ex: Process method in the Whisper processor). That will help with making the implementations of the two methods similar to existing implementations.
-
You can use
CheckResultin place of theerrcheck. -
Are tensors really needed for most of the parameters passed to
OrtxSplitSignalSegments? The sampling rate (sr_tensor), frame size (frame_ms), hop length (hop_ms), and energy threshold (energy_threshold_db) are singular values. Can we modifyOrtxSplitSignalSegmentsto use primitive types so that primitive types can be used in ORT GenAI for the "language binding API --> ORT GenAI C API --> ORT GenAI C++ API" flow? That will also greatly clean up the need for all of the workarounds you have in the C# bindings and in this file. -
If
OrtxSplitSignalSegmentsis updated, you can have the output stored asOrtxTensorResult** resultinside that extensions API. Then you can create the output tensor inside this method instead of passing it in as a parameter toSplitSignalSegments. Here is an example.
onnxruntime-genai/src/models/whisper_processor.cpp
Lines 27 to 31 in 9d3631d
There was a problem hiding this comment.
-
Its more consistant having tensors, but I can do your approach as well, I have no preference, I was just trying to be consistant + I haven't seen built-in types used for Oga API.
-
It is possible, yes, but since I am initializing tensors on C# side, again for consistancy and simplicity (no need for **), I decided to create all on the caller side. But the extension should, in theory, work for both cases.
|
Thanks for reviewing the PR @kunal-vaishnavi , I am currently still testing the changes from here with Audio Streaming, and will for now only address the comments that are fast to resolve, while others will follow later. |
No description provided.