Initial stream synchronization does not work #12
Maybe the current logic is not sound: delay adds a delay to the played audio. However, the TimedStream removes samples for a defined time period, so it is removing/decreasing the delay.
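To make the arithmetic explicit, a minimal sketch (the names are illustrative, not library API):

```cpp
#include <cstdint>

// A configured delay adds latency, while skipping samples for a given
// period removes latency again, so the two effects partly cancel out.
int32_t netDelayMs(int32_t configured_delay_ms, int32_t skipped_ms) {
  return configured_delay_ms - skipped_ms;  // can even become negative
}
```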
Which processor are you currently using? |
SnapProcessorRTOS |
Maybe that's why I don't understand the sync. And I struggle a bit to understand things. You've created quite something with the arduino-audio-tools. 😊 I kind of just jumped into it, without reading up on the Snapcast protocol or grasping the complexity of the arduino-audio-tools. I know a bit about the I2S peripheral, but that's about it. I've used earlephilhower's ESP8266Audio before.

As far as I understand, we have all the timestamps from the packets available to us, and the SNTP time etc. As far as I understand Snapcast, we need to sync our internal clock, buffer all the chunks (1000 ms per default) and, once the timestamp is valid, write them to the I2S (...and keep the buffer filled, minus the buffer delay). I can store about 36*1024 bytes(?) in the I2S DMA buffer; if my math is correct, that should be about 192 ms. The rest needs to be buffered either after it's been decompressed by the decoder, or still in compressed format. I still haven't looked at Snapcast and whether it sends us the chunks automatically, or if we can control what is being read.

I can't come up with a good architecture without knowing all the details. We obviously need to do blocking writes to the I2S buffer, keep it filled and maybe report back the delay introduced by that, and further up in the pipeline resample and decode the data, maybe in large chunks so the CPU has some time to do other things. On the other side of the decoder we either need to poll more data once we need it, or keep it buffered if we don't have control over what the Snapserver sends us. The sync has to factor in (just) the timestamps, and it'd best be somewhere close to the output, before(?) the resampling takes place.

And I'm not sure if we can write a better resampling class. Can we use the APLL to control the sample rate of I2S? If we can make that slightly slower or faster, we don't need to do resampling in software. |
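For reference, a quick sketch of the arithmetic behind that 192 ms figure; 48 kHz 16-bit stereo and the 36*1024-byte DMA capacity are assumptions taken from the comment above:

```cpp
#include <cstdio>

int main() {
  const int sample_rate = 48000;      // Hz, assumed stream format
  const int bytes_per_frame = 2 * 2;  // 16-bit stereo
  const int dma_bytes = 36 * 1024;    // assumed I2S DMA capacity
  // 36864 bytes / 192000 bytes-per-second ≈ 192 ms
  printf("DMA buffer holds ~%.0f ms of audio\n",
         1000.0f * dma_bytes / (sample_rate * bytes_per_frame));
  return 0;
}
```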
What you describe is pretty much the current implementation of the SnapProcessorRTOS. A couple of additional points:
The SnapProcessor implementation currently just processes the received data by resampling it and sending it to I2S, without any buffering involved. If the resampling rate is not absolutely correct, this carries the risk of buffer underruns or overruns, and if the processing chain with the decoding is too slow, it will not work at all. As far as startup synchronization is concerned, my gut feeling is that it is quite impossible to come up with an automatic solution, taking the memory constraints into consideration. |
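A back-of-envelope sketch of how fast such a mismatch drains (or fills) a buffer; the 0.1 % error and the 1000 ms buffer are assumptions for illustration:

```cpp
#include <cstdio>

int main() {
  const float rate_error = 0.001f;  // assumed 0.1 % resampling mismatch
  const float buffer_ms = 1000.0f;  // snapserver's default buffer
  // The mismatch consumes rate_error * 1000 ms of buffered audio per
  // second of playback, so the buffer is gone after:
  printf("under/overrun after ~%.0f s\n", buffer_ms / (rate_error * 1000.0f));
  return 0;
}
```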
Ah, alright. I've had a look at the timestamps but wasn't sure if they're relative (...just on the device) or absolute, and especially why we need the SNTP time. But I was looking at the resampler at that point.
Agree. The ESP32 has a crazy amount of RAM compared to other microcontrollers. And we can decide when to decompress the data. If your architecture does blocking writes to the next step in the pipeline, keeping the data close to the source shouldn't be a problem.
The Snapserver has an option to set the buffer size. I think the prior attempts at bringing Snapcast to the ESP suggested lowering that so the microcontroller doesn't need to buffer a whole 1000 ms. I'm not sure. Since the SnapProcessorRTOS has a fixed buffer size, we could try to fit everything or output a warning and skip some of the buffering. But I agree that adds to the complexity of the delay calculations. I mean, my ESP32-WROVER has 4 MB of PSRAM available and we can do dynamic memory allocation. |
You can define the buffer size for SnapProcessorRTOS in its constructor, so it's not fixed... |
Hehe, yeah, I meant fixed at compile time, not dynamically allocated at runtime / when connecting. Edit: Theoretically we could do a |
I committed some corrections for RTOS, where the startup synchronization is using a simple delay. |
I was just testing: the needed value for the delay parameter seems to be around -78 ms. I did not implement any dynamic buffer, because I could not come up with a good formula. Apart from this, it's just a call to the resize() method of my buffer implementation. I am currently doing this in the constructor... |
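A minimal sketch of what that could look like; the Buffer type here is a hypothetical stand-in for the real buffer implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the actual buffer class.
template <typename T>
struct Buffer {
  void resize(size_t size) { data.resize(size); }
  std::vector<T> data;
};

class SnapProcessorRTOS {
 public:
  // The capacity is chosen once, when the processor is constructed.
  explicit SnapProcessorRTOS(size_t buffer_size) { buffer.resize(buffer_size); }

 protected:
  Buffer<uint8_t> buffer;
};
```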
Alright. I'm out for the weekend. I'll try the new code and the updated value later.

We could fit the AudioStream or the AudioOutput classes with a method to optionally report back the maximum delay introduced by the contained buffers, add all of that up through the chain, and make it so the value doesn't need to be updated by the user when they change the size of a buffer. I really have to read some more of this code to come up with something useful.

And regarding the dynamic buffer: I think it's perfectly alright for now to delegate that to the user of this library. They probably know if they have PSRAM available and what codec and parameters they set in the Snapserver. (And what else they're doing on the µC.) So unless there is a clever and reliable way, it's probably best to let the user deal with this. |
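A hedged sketch of that delay-reporting idea; none of these names exist in the library, they only illustrate summing each stage's buffer latency down the chain:

```cpp
#include <cstdint>

// Hypothetical mix-in: every stage reports its own latency plus the
// latency of everything downstream of it.
class DelayReporting {
 public:
  virtual ~DelayReporting() = default;
  virtual uint32_t reportedDelayMs() const { return 0; }
};

class BufferedStage : public DelayReporting {
 public:
  BufferedStage(const DelayReporting *next, uint32_t own_buffer_ms)
      : next_(next), own_ms_(own_buffer_ms) {}

  uint32_t reportedDelayMs() const override {
    const uint32_t downstream = next_ ? next_->reportedDelayMs() : 0;
    return downstream + own_ms_;  // accumulate along the pipeline
  }

 private:
  const DelayReporting *next_;
  uint32_t own_ms_;
};
```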
Thanks for the hint with the server settings. By lowering the chunk_ms and the buffer on the server, PCM is now also working pretty well... |
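For anyone else trying this: those settings live in snapserver.conf; the values below are only illustrative, not the ones used here:

```ini
[stream]
# smaller chunks and a lower end-to-end buffer mean the client
# has to hold less audio in RAM
chunk_ms = 20
buffer = 400
```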
I got lazy and therefore like to work with audio boards like the LyraT or the AI Thinker AudioKit |
Hmm, with the following settings it sounds about right:

```cpp
SnapProcessorRTOS rtos(1024*30);
/* ... */
cfg.sample_rate = 48000;
cfg.buffer_count = 30;
cfg.buffer_size = 1024;
```

However, it still doesn't matter what I set in the SnapTimeSync ... at least on startup... The Dynamic variant starts to resample at some point later and shifts things around. The values are in milliseconds, right? So I should hear a difference of 500? Can you confirm that changing the values here should have an audible effect?

```cpp
SnapTimeSyncFixed synch(-78, 1.0f);
//SnapTimeSyncDynamic synch(-500, 10);
/* ... */
client.begin(synch);
```

Edit: I've set CONFIG_SNAPCLIENT_SNTP_ENABLE=true and pointed it to an NTP server on the same machine, if that matters... And the SnapTimeSyncDynamic seems to also sync to the same point in the stream; it now always resamples to slightly ahead of my phone, despite the varying numbers I put into the constructor. |
I confirm that it is in milliseconds. If you use 0, the system uses the delay reported by the server and you hear the difference that is lost by buffering and decoding. By entering -500, the system starts to play 500 ms earlier. |
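Expressed as a formula (the names are made up for illustration, this is not library code):

```cpp
#include <cstdint>

// user_delay_ms = 0    -> rely only on the server-reported delay
// user_delay_ms = -500 -> start playback 500 ms earlier
int64_t playbackStartMs(int64_t chunk_timestamp_ms, int64_t server_delay_ms,
                        int64_t user_delay_ms) {
  return chunk_timestamp_ms + server_delay_ms + user_delay_ms;
}
```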
I was going through the standard Processor logic and found that I can easily add a buffer on the receiving side as well: this gives the new class SnapProcessorBuffered. But I am totally confused and would need your opinion: in the beginning the buffer gets filled and it plays using the buffer, but during the first couple of minutes the available data in the buffer slightly decreases until it is empty and only contains the last received data. When I decrease the resampling (and make the audio play slower), the playback speed is really reduced, but the behavior stays the same. It is as if there were some synchronization logic on the server... |
It's the same here; it takes 45 seconds and the buffer is empty. I spent quite some time looking for it, but I didn't find anything in the client. I'm not sure if the whole approach is feasible. Reading up on the Snapcast protocol, it seems to me we're not supposed to be clever and do many calculations.

What I'd like to try is something like a

```cpp
class Chunk {
 public:
  uint8_t *data;
  size_t size;
  int32_t sec;
  int32_t usec;
};
```

keep it buffered with the individual timestamps, and feed that through all the write() and writeAudio() methods, through the decoder and everything, until it arrives at the resampler. The resampler then has the timestamps along with the data to write out. This would require a good amount of changes, though. |
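The draft further below calls chunk.timestamp(), so the Chunk above would need a small helper along these lines; a sketch under that assumption:

```cpp
#include <sys/time.h>
#include <cstdint>

// Packs the Chunk's sec/usec fields into a timeval, as the
// SyncedStream draft below expects from chunk.timestamp().
timeval chunkTimestamp(int32_t sec, int32_t usec) {
  timeval tv;
  tv.tv_sec = sec;
  tv.tv_usec = usec;
  return tv;
}
```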
Actually, at first sight, this might be pretty easy to implement if we just wrap the data in a container. This would also make the buffering simpler and remove the need to store the number of bytes in a separate buffer. |
I'd move all the logic to calculate the speed and the initial wait there. Currently we do all the calculations and decision-making right after the packets arrive. If there's buffering in between, all of that factors in and needs to be handled. When the wireChunk arrives, I'd just subtract the time difference to arrive at the local timestamp and pass everything down. The synchronization then happens close to where the data is handed over to the I2S peripheral, which makes it more accurate anyway. The SnapProcessor and SnapOutput don't need to worry about sync, and all the delay and synchronizeOnStart code and its complexity can be removed there. All they need to do is buffer the incoming wireChunks and pass them on, as much as the pipeline accepts.

I made a rough draft, I hope this makes it clearer:

```cpp
class SyncedStream : public ResampleStream {
 public:
  size_t write(Chunk chunk) override {
    size_t written = 0;
    if (is_running) {  // running: keep the buffer filled
      written = ResampleStream::write(chunk.data, chunk.size);
      time_in_buffer = chunk.timestamp();
    } else {  // waiting for start
      if (snap_time.time() + processing_delay >= chunk.timestamp()) {
        written = ResampleStream::write(chunk.data, chunk.size);
        time_in_buffer = chunk.timestamp();
        is_running = true;
      }
    }
    // pseudocode: both times would need converting to a common unit
    factor = (snap_time.time() + processing_delay) / time_in_buffer;
    return written;
  }
  void updatePlaybackFactor() { setStepSize(factor); }

 protected:
  SnapTime &snap_time = SnapTime::instance();
  timeval time_in_buffer;
  bool is_running = false;
  float factor = 1.0f;
};

/* --- */

class SnapProcessorBuffered {
 public:
  bool wireChunk(SnapMessageWireChunk &wire_chunk_message) {
    Chunk chunk((uint8_t *)wire_chunk_message.payload,
                (size_t)wire_chunk_message.size,
                wire_chunk_message.timestamp + getLatency());
    return buffer.bufferChunk(chunk);
  }

  virtual void processExt() {
    // ToDo: only hand over chunks the output can fully accept
    while (p_snap_output->available() >= buffer.nextChunkSize()) {
      Chunk outputChunk = buffer.getChunk();
      size_t written = p_snap_output->write(outputChunk);
    }
  }

 protected:
  RingBuffer<uint8_t> buffer{0};
};
```

I'm not sure, at this point it's a rough idea, but I think it would make some of the code simpler. I think with this approach that would be about the complexity needed to keep everything in sync and also do the initial startup sync. We'd be sure the I2S buffer is filled at all times and there's just the input buffer to worry about.

There are still some issues. I'd need to figure out how to pass the Chunk objects through the decoder. And I'm not sure if I can take the architecture as is, or if this requires me to do additional stuff like non-blocking writes, or handling parts of chunks. As you said, there is no way to figure out the fill level of the I2S DMA buffer. All I can do to be clever is do a write to it with a timeout of '0' and see if everything got written, and additionally use the blocking writes so the writing speed propagates back through the chain.

And there is the size of the input buffer. We need some extra handling if the input buffer can't fit arriving chunks and it also can't write. That shouldn't happen in the first place, but if it does, I'd rather mess with the delay than drop chunks. But this would require additional handling. |
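A sketch of that "write with a timeout of '0'" probe, using the legacy ESP-IDF I2S driver; the port number and the caller-supplied buffer are assumptions:

```cpp
#include "driver/i2s.h"

// Returns how many bytes the DMA buffer accepted right now; a result
// smaller than len means the DMA buffer is currently full.
size_t tryWrite(const uint8_t *data, size_t len) {
  size_t written = 0;
  // ticks_to_wait = 0: return immediately instead of blocking
  i2s_write(I2S_NUM_0, data, len, &written, 0);
  return written;
}
```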
I like your idea of the SyncedStream: it is elegant and would actually work independently of the chosen processor implementation. So I am planning to just use a BinaryContainer to pass metadata along the output chain as part of the stream data. |
How is the progress on this topic? I have a Snapcast client running on an ESP32 (with ...). If I understand the sync approach correctly, your code adjusts the complete playback speed of a received chunk, but the original Snapcast docs suggest doubling single samples or dropping single samples. So couldn't a receiving buffer (let's say 6 samples) have the last sample cloned 2 times? |
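A minimal sketch of that single-sample approach; mono int16_t samples and the sign convention of the drift are assumptions:

```cpp
#include <cstdint>
#include <vector>

// Instead of resampling a whole chunk, stretch playback by cloning the
// last sample of a small window, or shorten it by dropping that sample.
std::vector<int16_t> adjustWindow(const std::vector<int16_t> &in,
                                  int drift_samples) {
  std::vector<int16_t> out(in);
  if (drift_samples > 0 && !in.empty()) {
    out.push_back(in.back());  // playing too fast: repeat a sample
  } else if (drift_samples < 0 && !out.empty()) {
    out.pop_back();            // playing too slow: drop a sample
  }
  return out;
}
```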
No matter what I set in the SnapTimeSyncDynamic synch(CONFIG_PROCESSING_TIME_MS, 10) or SnapTimeSyncFixed, it doesn't have any effect. However, if I reintroduce the delay() in line 251 below, that has an effect. I verified that TimedStream::isPlaying() now spends some time returning false, but that's not it.

arduino-snapclient/src/api/SnapOutput.h, lines 248 to 253 in 8da9a85