
Initial stream synchronization does not work #12

Open · h3ndrik opened this issue Mar 1, 2024 · 20 comments

@h3ndrik (Contributor) commented Mar 1, 2024

No matter what I set in SnapTimeSyncDynamic synch(CONFIG_PROCESSING_TIME_MS, 10) or SnapTimeSyncFixed, it doesn't have any effect. However, if I reintroduce the delay() in line 251 below, that does have an effect. I verified that TimedStream::isPlaying() now spends some time returning false, but that's not it.

// wait for the audio to become valid
ESP_LOGI(TAG, "starting after %d ms", delay_ms);
setPlaybackFactor(p_snap_time_sync->getFactor());
// replaced delay(delay_ms); with timed_stream
timed_stream.setStartMs(delay_ms);
timed_stream.begin();

@pschatzmann (Owner) commented Mar 2, 2024

Maybe the current logic is not sound: delay() adds a delay to the played audio, whereas the TimedStream removes samples for a defined time period, so it removes/decreases the delay.

  • When using the SnapProcessorRTOS, the easiest approach would be to use only the delay() w/o the TimedStream.

  • When using the SnapProcessor, this is not acceptable, however: the most memory-efficient way, I think, would be to add some buffer for the encoded data and then fine-tune with the TimedStream (roughly as sketched below). From a user point of view, the easiest way would be to add a buffer on the decoded data, but this wastes valuable RAM.
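
A minimal sketch of that second idea; encoded_buffer, bytesToMs(), target_delay_ms and started are illustrative names, not library API, and the TimedStream semantics are as described above (it removes audio for the configured period):

void onEncodedData(uint8_t *data, size_t len) {
  encoded_buffer.writeArray(data, len);  // coarse delay: buffer whole encoded chunks
  int buffered_ms = bytesToMs(encoded_buffer.available());
  if (!started && buffered_ms >= target_delay_ms) {
    // let the TimedStream skip the small excess that
    // whole-chunk buffering overshoots by
    timed_stream.setStartMs(buffered_ms - target_delay_ms);
    timed_stream.begin();
    started = true;
  }
}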

Which processor are you currently using?

@h3ndrik (Contributor, Author) commented Mar 2, 2024

SnapProcessorRTOS

@h3ndrik (Contributor, Author) commented Mar 2, 2024

Maybe that's why I don't understand the sync. And I struggle a bit with understanding things. You've created quite something with the arduino-audio-tools. 😊 I kind of just jumped into it without reading up on the Snapcast protocol or grasping the complexity of the arduino-audio-tools. I know a bit about the I2S peripheral, but that's about it. I've used earlephilhower's ESP8266Audio before.

As far as I understand, we have all the timestamps from the packets available to us, plus the SNTP time etc. As far as I understand Snapcast, we need to sync our internal clock, buffer all the chunks (1000 ms per default) and, once the timestamp is valid, write them to the I2S. (...keeping the buffer filled, minus the buffer delay.) I can store about 36*1024 bytes(?) in the I2S DMA buffer; if my math is correct, that should be about 192 ms. The rest needs to be buffered either after it's been decompressed by the decoder, or still in compressed format. I still haven't looked at whether Snapcast sends us the chunks automatically, or if we can control what is being read.

I can't come up with a good architecture without knowing all the details. We obviously need to do blocking writes to the I2S buffer, keep it filled and maybe report back the delay introduced by that, and further up in the pipeline resample and decode the data, maybe in large chunks so the CPU has some time to do other things. At the other side of the decoder we either need to poll more data once we need it, or keep it buffered if we don't have control over what the Snapserver sends us. The sync has to factor in (just) the timestamps, and it'd best sit somewhere close to the output, before(?) the resampling takes place.

And I'm not sure if we can write a better resampling class. Can we use the APLL to control the sample rate of the I2S? If we can make that slightly slower or faster, we don't need to do resampling in software.
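
Back-of-the-envelope check of that DMA figure, assuming 48 kHz, 16-bit stereo (4 bytes per frame):

constexpr size_t dma_bytes   = 36 * 1024;      // 36864 bytes
constexpr size_t bytes_per_s = 48000 * 2 * 2;  // 192000 bytes per second
constexpr size_t dma_ms = dma_bytes * 1000 / bytes_per_s;  // = 192 ms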

@pschatzmann (Owner):

What you describe is pretty much the current implementation of the SnapProcessorRTOS.

A couple of additional points:

  • Snapcast is pushing the data and, as far as I can see, we cannot control this.
  • I am synchronizing the time from the server with the microprocessor time reported by millis() and translate any difference into the resampling rate (roughly as sketched below).
  • The goal of this implementation is to have a memory-efficient solution that also runs on other processors, so buffering of decoded data should be avoided.
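
A minimal sketch of the second point (illustrative names, not the actual library code): compare how far the server clock and the local millis() clock have advanced over the same interval, and turn the ratio into the resampling factor.

float resampleFactor(uint32_t server_elapsed_ms, uint32_t local_elapsed_ms) {
  // elapsed times measured over the same interval, e.g. since the last sync
  if (local_elapsed_ms == 0) return 1.0f;
  // > 1.0 when millis() runs slow relative to the server clock
  return (float)server_elapsed_ms / (float)local_elapsed_ms;
}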

The SnapProcessor implementation currently just processes the received data by resampling it and sending it to I2S, w/o any buffering involved. If the resampling rate is not absolutely correct, this carries the risk of buffer underruns or overruns, and if the processing chain with the decoding is too slow, it will not work at all.

As far as startup synchronization is concerned, my gut feeling is that it is quite impossible to come up with an automatic solution, taking the memory constraints into consideration.

@h3ndrik (Contributor, Author) commented Mar 2, 2024

Ah, alright. I've had a look at the timestamps but wasn't sure if they're relative (...just on the device) or absolute, and especially why we need the SNTP time. But I was looking at the resampler at that point.

memory efficient solution that runs also on other processors, so buffering of decoded data should be avoided

Agree. The ESP32 has crazy big RAM compared to other microprocessors. And we can decide when to decompress the data. If your architecture does blocking writes to the next step in the pipeline, keeping the data close to the source shouldn't be a problem.

it is quite impossible to come up with an automatic solution, taking the memory constraints into consideration

The Snapserver has an option to set the buffer size. I think the prior attempts at bringing Snapcast to the ESP suggested lowering that so the microcontroller doesn't need to buffer a whole 1000 ms; I'm not sure. Since the SnapProcessorRTOS has a fixed buffer size, we could try to fit everything, or output a warning and skip some of the buffering. But I agree that adds to the complexity of the delay calculations. I mean, my ESP32-WROVER has 4 MB of PSRAM available and we can do dynamic memory allocation.

@pschatzmann (Owner) commented Mar 2, 2024

You can define the buffer size for the SnapProcessorRTOS in its constructor, so it's not fixed...
But you are right, we should issue an error message if the buffer is too small!

@h3ndrik (Contributor, Author) commented Mar 2, 2024

Hehe, yeah, I meant fixed at compile time, not dynamically allocated at runtime / on connecting.

Edit: Theoretically we could do an ESP.getFreeHeap() or ESP.getFreePsram() and then malloc(). But we'd have to calculate what size it needs to be, given the parameters from the server. Or just grab whatever is available, or a predetermined size like now.
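
A minimal sketch of that, assuming the Arduino-ESP32 core (psramFound(), ESP.getFreePsram(), ps_malloc()); the heap headroom value is illustrative:

#include <Arduino.h>

uint8_t *allocateChunkBuffer(size_t wanted) {
  if (psramFound() && ESP.getFreePsram() > wanted)
    return (uint8_t *)ps_malloc(wanted);       // prefer external PSRAM
  if (ESP.getFreeHeap() > wanted + 32 * 1024)  // keep headroom for everything else
    return (uint8_t *)malloc(wanted);
  return nullptr;  // caller falls back to a predetermined smaller size
}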

@pschatzmann (Owner):

I committed some corrections for RTOS, where the startup synchronization now uses a simple delay.
I did not have the time to test this yet, however...

@pschatzmann (Owner):

I was just testing: the needed value for the delay parameter seems to be around -78 ms.

I did not implement any dynamic buffer, because I could not come up with a good formula. Apart from that, it's just a call to the resize() method of my buffer implementation. I am currently doing this in the constructor...

@h3ndrik (Contributor, Author) commented Mar 2, 2024

Alright, I'm out for the weekend. I'll try the new code and the updated value later. We could fit the AudioStream or AudioOutput classes with a method to optionally report back the maximum delay introduced by the contained buffers, add all of that up through the chain, and make it so the value doesn't need to be updated by the user when they change the size of a buffer. I really have to read some more of this code to come up with something useful.

And regarding the dynamic buffer: I think it's perfectly alright for now to delegate that to the user of this library. They probably know whether they have PSRAM available and which codec and parameters they set in the Snapserver. (And what else they're doing on the µC.) So unless there is a clever and reliable way, it's probably best to let the user deal with this.

@pschatzmann (Owner):

Thanks for the hint with the server settings. By lowering the chunk_ms and the buffer on the server, PCM is now also working pretty well...
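
For reference, those are settings in snapserver.conf (the values here are just an example):

[stream]
chunk_ms = 20
buffer = 400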

@pschatzmann (Owner):

I got lazy and therefore like to work with audio boards like the LyraT or the AI Thinker AudioKit.

@h3ndrik (Contributor, Author) commented Mar 2, 2024

Hmm, with the following settings it sounds about right:

SnapProcessorRTOS rtos(1024*30);

/* ... */
  cfg.sample_rate = 48000;
  cfg.buffer_count = 30;
  cfg.buffer_size = 1024;

However, it still doesn't matter what I set in the SnapTimeSync ... at least on startup... The Dynamic variant starts to resample at some point later and shifts things around. The values are in milliseconds, right? So I should hear a difference of 500? Can you confirm that changing the value of int processingLag in the constructor changes the delay on startup for you? I'm sure I pulled the latest changes this time.

SnapTimeSyncFixed synch(-78, 1.0f);
//SnapTimeSyncDynamic synch(-500, 10);

/* ... */
  client.begin(synch);

Edit: I've set CONFIG_SNAPCLIENT_SNTP_ENABLE=true and pointed it at an NTP server on the same machine, if that matters... And the SnapTimeSyncDynamic seems to also sync to the same point in the stream; it now always resamples to slightly ahead of my phone, despite the varying numbers I put into the constructor.

@pschatzmann (Owner):

I confirm that it is in milliseconds. If you use 0, the system uses the delay reported by the server and you hear the lag introduced by buffering and decoding. By entering -500, the system starts to play 500 ms earlier.
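
For example, with the constructor as used earlier in this thread:

SnapTimeSyncFixed synch(0, 1.0f);       // plays at the delay reported by the server
// SnapTimeSyncFixed synch(-500, 1.0f); // would start playing 500 ms earlier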

@pschatzmann (Owner):

I was going through the standard Processor logic and found that I can easily add a buffer on the receiving side as well: this gives the new class SnapProcessorBuffered.

But I am totally confused and would need your opinion: in the beginning the buffer gets filled and playback runs from the buffer, but during the first couple of minutes the amount of available data in the buffer slowly decreases until it is empty and only contains the last received data.

When I decrease the resampling rate (and make the audio play slower), the playback speed is really reduced, but the behavior stays the same. It is as if there is some synchronization logic on the server...

@h3ndrik (Contributor, Author) commented Mar 4, 2024

It's the same here: it takes 45 seconds and the buffer is empty. I spent quite some time looking for the cause, but I didn't find anything in the client. I'm not sure if the whole approach is feasible. Reading up on the Snapcast protocol, it seems to me we're not supposed to be clever and do many calculations.

What I'd like to try is something like a

class Chunk {
  public:
    uint8_t *data;
    size_t size;
    int32_t sec;
    int32_t usec;
    // illustrative helper (in ms) assumed by the SyncedStream draft below
    uint32_t timestamp() { return sec * 1000 + usec / 1000; }
};

keep it buffered with the individual timestamps, and feed it through all the write() and writeAudio() methods, through the decoder and everything, until it arrives at the resampler. The resampler then has the timestamps along with the data to write out. This would require a good amount of changes, though.

@pschatzmann (Owner):

Actually, at first sight, this might be pretty easy to implement if we just wrap the data in a container. This would also make the buffering simpler and remove the need to store the number of bytes in a separate buffer.
I just would not know what to do with the additional data at the resampler...

@h3ndrik (Contributor, Author) commented Mar 5, 2024

I'd move all the logic for calculating the speed and the initial wait there. Currently we do all the calculations and decision-making right after the packets arrive; if there's buffering in between, all of that factors in and needs to be handled.

When the wireChunk arrives, I'd just subtract the time difference to arrive at the local timestamp and pass everything down. The synchronization then happens close to where the data is handed over to the I2S peripheral, which makes it more accurate anyway. The SnapProcessor and SnapOutput don't need to worry about sync, and all the delay and synchronizeOnStart code and its complexity can be removed there. All they need to do is buffer the incoming wireChunks and pass them on, as much as the pipeline accepts.

I made a rough draft, I hope this makes it clearer:

class SyncedStream : public ResampleStream {
 public:
  size_t write(Chunk chunk) {
    size_t written = 0;
    if (is_running) {  // keep the buffer filled
      written = ResampleStream::write(chunk.data, chunk.size);
      time_in_buffer = chunk.timestamp();
    } else {  // waiting for start
      if (snap_time.time() + processing_delay >= chunk.timestamp()) {
        written = ResampleStream::write(chunk.data, chunk.size);
        time_in_buffer = chunk.timestamp();
        is_running = true;
      }
    }

    if (time_in_buffer > 0)  // guard against the very first call
      factor = float(snap_time.time() + processing_delay) / time_in_buffer;
    return written;
  }

  void updatePlaybackFactor() { setStepSize(factor); }

 protected:
  SnapTime &snap_time = SnapTime::instance();
  uint32_t time_in_buffer = 0;    // timestamp (ms) of the last buffered chunk
  uint32_t processing_delay = 0;  // configurable lag in ms
  bool is_running = false;
  float factor = 1.0f;
};

/* --- */

class SnapProcessorBuffered {
 public:
  bool wireChunk(SnapMessageWireChunk &wire_chunk_message) {
    Chunk chunk((uint8_t *)wire_chunk_message.payload,
                (size_t)wire_chunk_message.size,
                wire_chunk_message.timestamp + getLatency());
    return buffer.bufferChunk(chunk);
  }

  virtual void processExt() {
    while (p_snap_output->available() >= buffer.nextChunkSize()) {  // ToDo
      Chunk outputChunk = buffer.getChunk();
      size_t written = p_snap_output->write(outputChunk);
      (void)written;  // ToDo: handle partial writes
    }
  }

 protected:
  ChunkBuffer buffer;  // hypothetical buffer that stores whole chunks with their timestamps
};

I'm not sure; at this point it's a rough idea. I think it would make some of the code simpler, and I think this approach is about the complexity needed to keep everything in sync and also do the initial startup sync. We'd be sure the I2S buffer is filled at all times, and there's just the input buffer to worry about.

There are still some issues. I'd need to figure out how to pass the Chunk objects through the decoder. And I'm not sure if I can take the architecture as is, or if this requires additional work like non-blocking writes or handling parts of chunks. As you said, there is no way to read the fill level of the I2S DMA buffer. All I can do to be clever is write to it with a timeout of '0' and see if everything got written (see the sketch below), and additionally use blocking writes so the writing speed propagates back through the chain. And there is the size of the input buffer: we need some extra handling if the input buffer can't fit arriving chunks and also can't write. That shouldn't happen in the first place, but if it does, I'd rather mess with the delay than drop chunks. But this would require additional handling.
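
The probe idea, using the legacy ESP-IDF I2S driver (driver/i2s.h), where a timeout of zero ticks returns immediately and reports how many bytes actually fit:

#include "driver/i2s.h"

size_t tryWrite(const uint8_t *data, size_t len) {
  size_t written = 0;
  i2s_write(I2S_NUM_0, data, len, &written, 0);  // don't block
  return written;  // written < len means the DMA buffer is (nearly) full
}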

@pschatzmann (Owner):

I like your idea of the SyncedStream: it is elegant and would actually work independently of the chosen processor implementation.

So I am planning to just use a BinaryContainer to pass metadata along the output chain as part of the stream data.

@constant-flow commented Mar 24, 2024

How is the progress on this topic? Thanks to your help I now have a Snapcast client running on an ESP32 (with SnapTimeSyncDynamic) and on multiple Raspberry Pis. Besides the initial sync, I think it's not catching up / realigning properly to the current stream from the master. So even if I select a manual latency for the ESP, it will not be consistent: if I change the latency, the ESP client falls behind more and more.

If I understand the synchronization approach correctly, your code adjusts the playback speed of a complete received chunk, but the original Snapcast docu suggests doubling or dropping single samples. So couldn't a receiving buffer (let's say 6 samples) have the last sample cloned up to 2 times [ABCDEFFF] (max buffer size 6+2), and then we write an adjusted size from the buffer to the actual stream (see the sketch after the list):

  • much behind, size 4: [ABCD] --> cut off from the end to catch up
  • a bit behind, size 5: [ABCDE]
  • on the spot, size 6: [ABCDEF]
  • a bit too fast, size 7: [ABCDEFF]
  • much too fast, size 8: [ABCDEFFF] --> double the last bit of a chunk to wait for the other clients/server
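
A minimal sketch of that idea (illustrative names, mono int16_t samples, not the library's API):

#include <cstdint>
#include <cstring>

size_t adjustChunk(const int16_t *in, size_t n, int16_t *out, int adjust) {
  // adjust < 0: drop samples from the end to catch up
  // adjust > 0: repeat the last sample to wait for the other clients
  size_t keep = adjust < 0 ? n + adjust : n;
  memcpy(out, in, keep * sizeof(int16_t));
  for (int i = 0; adjust > 0 && i < adjust; i++) out[keep + i] = in[n - 1];
  return adjust > 0 ? keep + adjust : keep;  // samples written to the stream
}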
