[Flac-dev] Synchronizing a streaming client to the server Was: Idea to possibly improve flac?

Fri Jan 7 16:36:30 PST 2011

This thread has raised several good topics.  It's surprising that the  
FLAC-Dev list has been silent for years, and now suddenly there are  
several good ideas to discuss.

On Jan 7, 2011, at 15:04, David Richards wrote:
> I am interested in streaming lossless audio, and FLAC is probably the
> best option for that. Currently the OggFLAC way of doing it mostly works
> with a few hacks in libflac and my version of edcast. It might be that
> the Ogg packaging layer is ill-suited for this purpose, and that an
> alternative model should be developed. I've seen that it's possible to
> stream native FLAC with netcat, but that's not really the solution I'm
> looking for.

I have not done much work with streaming. I have written a lot of
serious code that uses the FLAC library. I remember that there used
to be separate objects in the FLAC library for streams, and they were
distinct from the file objects because you can seek backwards in a
file, but you cannot seek backwards in a stream. For some reason, it
seems that these objects have been removed in the latest versions of
the FLAC library.

Can anyone explain the issues with streaming pure FLAC?  What does  
OggFLAC add to make streaming possible, or even easier than pure  
FLAC?  I thought that OggFLAC was just a way to put FLAC blocks into  
the Ogg file format.  Apple's CAF specification would also allow FLAC  
blocks to be placed inside their file container, although this still  
would not force iTunes to play FLAC unless a decoder were installed  
in the system.

What is it about netcat that you don't like?  Can you describe what  
you're looking for, and why the specific details are important?  I  
was always under the impression that the FLAC format was already  
designed for streaming, but I must admit that I've never studied the  
issue.

> On Fri, Jan 7, 2011 at 5:58 PM, Tor-Einar Jarnbjo <tor-einar at jarnbjo.name> wrote:
>> On 07.01.2011 23:38, David Richards wrote:
>>> I'm also interested in another concept of lossless streaming with
>>> FLAC. Let's call it broadcast FLAC. A problem with streaming for long
>>> periods of time is that the sending and receiving computers' clocks
>>> go out of sync, for example even if I stream myself on localhost, with
>>
>> This is not a FLAC-specific problem, but it has to be handled in all
>> situations where the streaming server is in control of the transmitted
>> data rate. It's caused by a playback device whose actual sample rate is
>> slightly higher than the requested sample rate, or by a streaming source
>> whose system clock is running slowly. Since these parameters (at least
>> an exact playback sample rate) are hard to achieve, this is a rather
>> common problem. To put it briefly: if the data has a sample rate of
>> 44100 Hz and your sound card consumes more than 44100 samples per
>> "sender-time" second, your buffer will eventually be exhausted. If it's
>> the other way around, your buffer may overflow if the client does not
>> handle these cases properly.
>
> I am well aware it's not FLAC-specific, but a standard way of
> handling such a matter could be part of the packaging for streaming
> FLAC.
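To put rough numbers on the drift described above, here is a minimal Python sketch. The 100 ppm clock mismatch and the 2-second client buffer are illustrative assumptions, not measurements; the point is only that a buffer holding B seconds of audio lasts about B / (ppm * 1e-6) seconds before it runs dry or overflows.

```python
def effective_rate(nominal_hz: float, drift_ppm: float) -> float:
    """Samples per sender-second actually consumed by a card running drift_ppm fast."""
    return nominal_hz * (1 + drift_ppm * 1e-6)


def seconds_until_buffer_fault(buffer_s: float, drift_ppm: float) -> float:
    """Seconds until a buffer holding buffer_s seconds of audio runs dry
    (card fast) or, for an equally large headroom, overflows (card slow).

    The buffer level changes by drift_ppm * 1e-6 seconds of audio per
    second of playback, so it is used up after buffer_s / (drift_ppm * 1e-6).
    """
    if drift_ppm == 0:
        return float("inf")  # perfectly matched clocks never drift apart
    return buffer_s / (abs(drift_ppm) * 1e-6)


# Illustrative assumptions: 44100 Hz stream, card 100 ppm fast, 2 s buffer.
print(round(effective_rate(44100, 100), 2))                   # 44104.41
print(round(seconds_until_buffer_fault(2.0, 100) / 3600, 2))  # 5.56 (hours)
```

So even a fairly generous buffer only postpones the underrun by a few hours; it never removes it.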

I think that this would be a good opportunity to design a solution
that is specific to broadcast. At the sending end, the server should
have knowledge of when there are breaks in the content. If the
stream could send flags at these breaks, then the receiving client
could go silent and reset the synchronization. As you describe, the
situation only becomes a problem after long periods of time, but I
would guess that there are enough station breaks (or at least song
breaks) in a long broadcast that there would be a chance for a reset.

CoreAudio uses a pull model, and the API provides a time line that can
be used to find the audio samples for a specific time. However,
there are many cases where this time line gets reset. Usually, each
callback has a time stamp that begins exactly where the previous
callback's buffer ended. Obviously, the audio should not glitch when
the time line is contiguous, and thus the data must be sample-accurate.
However, CoreAudio code must also deal with situations where the time
line starts over from 0, usually under control of the host application.
CoreAudio also has a flag in the callback to indicate when the
buffers are totally silent. I'd like to borrow these ideas, or at
least similarly inspired ideas, and design FLAC streaming so that
the stream can tell the playback software when to reset.
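A minimal sketch of the CoreAudio-inspired idea, assuming each stream block carries the sender's running sample counter at its first frame, and that the sender resets that counter to 0 at content breaks. Both the field and the reset convention are hypothetical parts of this proposal, as are the names `Block` and `TimelineTracker`. The client treats a block that starts where the previous one ended as contiguous, a counter of 0 as a sender-requested reset, and anything else as a discontinuity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical stream block header (not part of FLAC or OggFLAC)."""
    sample_time: int  # sender's running sample counter at the block's first frame
    frames: int       # number of sample frames in the block


class TimelineTracker:
    """Classifies incoming blocks against the sender's time line."""

    def __init__(self) -> None:
        self.expected: Optional[int] = None  # where the next block should start

    def accept(self, block: Block) -> str:
        if self.expected is None or block.sample_time == 0:
            # First block, or the sender reset its counter at a content break:
            # the client may start its synchronization over here.
            status = "reset"
        elif block.sample_time == self.expected:
            # Contiguous time line: play sample-accurately, never glitch.
            status = "contiguous"
        else:
            # Unexpected jump (e.g. lost data): the client must resync.
            status = "discontinuity"
        self.expected = block.sample_time + block.frames
        return status


tracker = TimelineTracker()
for b in [Block(0, 4096), Block(4096, 4096), Block(0, 4096)]:
    print(tracker.accept(b))  # reset, contiguous, reset
```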

The typical way to deal with synchronization of separate systems
is sample rate conversion (SRC). However, this introduces distortion
into the audio, especially with real-time SRC. The only way to avoid
SRC is to reset the alignment without dropping or adding samples of
actual content. As I said above, if the broadcast server were to put
flags in the stream to indicate silent breaks in the audio, then the
playback client could drop or insert silent samples until the two
time lines are resynchronized. Since this would only add or remove
silence, there should be absolutely no audible glitch. Perhaps the
stream would need more than simple silence flags or resync flags.
It might be necessary to transmit an actual running time line
counter, with enough bits to count the longest stretch of
contiguously clocked audio blocks. When the broadcast server sees a
break in the content material, the time code could be reset to zero,
and this would tell the client to start the sync over, thus avoiding
dropped samples in the middle of real audio content.
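Here is a sketch of how a client might spend a flagged silent break, assuming it already knows how many samples its buffer sits above or below its target level (the `drift` argument, a hypothetical measurement). Samples are only ever removed from or added to the silence, never to program material. The second helper addresses the counter-width question: a 32-bit sample counter at 44.1 kHz runs for about 27 hours before wrapping, whereas a 24-bit one would wrap after a little over six minutes. Function names are illustrative only.

```python
from typing import Tuple


def resync_silence(break_len: int, drift: int) -> Tuple[int, int]:
    """Decide how many silent samples to play for a flagged silent break.

    break_len: length in samples of the silence the sender flagged
               (hypothetical hint carried in the stream).
    drift:     samples the client's buffer sits above its target level
               (positive: buffer growing, so drop silence to catch up;
               negative: buffer draining, so insert extra silence).
    Returns (samples_to_play, drift_remaining). Only silence is ever
    shortened or lengthened, so program material is never touched.
    """
    if break_len < 0:
        raise ValueError("break_len must be non-negative")
    if drift > 0:
        dropped = min(drift, break_len)  # can't drop more silence than the break has
        return break_len - dropped, drift - dropped
    # Inserting silence is not limited by the break length; pad the full amount.
    return break_len - drift, 0


def counter_wrap_hours(bits: int, rate_hz: int) -> float:
    """Hours a `bits`-wide sample counter can run at `rate_hz` before wrapping."""
    return (2 ** bits) / rate_hz / 3600


print(resync_silence(22050, 5000))   # (17050, 0): half-second break absorbs it all
print(resync_silence(1000, 5000))    # (0, 4000): remainder waits for the next break
print(resync_silence(22050, -3000))  # (25050, 0): extra silence inserted
print(round(counter_wrap_hours(32, 44100), 1))  # 27.1
```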

>>> Anyway, what could happen is the client could do a little bit of
>>> resampling here or there to ensure it's in sync with the server's
>>> clock.
>>
>> That is how streaming clients usually solve this problem, although it
>> does not really improve sound quality.
>
> It's probably not a big deal if you don't resample all the time, just
> when you're off by X amount; all of this would just be client-side
> preferences. As long as the client side "knows" it's off by X amount,
> you could handle it in any number of ways. I'd be fine if it's just
> crossfaded to the correct timing when it's off by more than half a
> second; then no resampling would ever happen, and you would just get a
> weird effect about once an hour, which is better than a buffer underrun
> or lag. Or perhaps the client could look for a half second of silence
> and just cut it out.
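David's crossfade idea can be sketched without any resampling: once the measured offset exceeds a threshold (half a second here, following his suggestion), the client fades from the audio at its drifted read position to the same audio at the corrected position. Each output sample is just a weighted sum of two input samples at the original rate. This is a linear crossfade chosen for simplicity; the function names and the use of plain float lists are assumptions for illustration.

```python
from typing import List


def should_jump(offset_samples: int, rate_hz: int, threshold_s: float = 0.5) -> bool:
    """True once the client's measured offset exceeds the threshold (in seconds)."""
    return abs(offset_samples) > threshold_s * rate_hz


def crossfade(old: List[float], new: List[float]) -> List[float]:
    """Linear crossfade from `old` (audio at the drifted read position) to
    `new` (the same audio at the corrected position). No resampling: each
    output sample mixes one sample from each input at the original rate."""
    n = len(old)
    if n != len(new):
        raise ValueError("segments must have equal length")
    out = []
    for i in range(n):
        g = i / (n - 1) if n > 1 else 1.0  # gain on `new` ramps from 0 to 1
        out.append((1.0 - g) * old[i] + g * new[i])
    return out


print(should_jump(30000, 44100))        # True: more than 22050 samples off
print(crossfade([1.0] * 5, [0.0] * 5))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```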

I don't think it's a good idea to resample just some of the time,
although your idea to crossfade would work since it never resamples.
I think that this kind of intermittent resampling is done in a number
of PC-based digital audio playback systems, and perhaps even in the
television broadcast industry. I hear a regular glitch in audio about
once per second in many syndicated television shows, and my suspicion
is that they are speeding up the show so that they can sell more
commercial time. Another place where I hear this glitching is in some
of the PC audio software aimed at DJs, which can play MP3 files at
different speeds and mix them together. I hear the same sound, one
glitch per second, and it is very annoying.

But, as you said, a crossfade once per hour would not be as bad.
Also, the stream could be completely resynchronized even without a
crossfade. Some streaming servers are so bad that they can't run for
hours without rebuffering, but I guess it would be pretty lazy to
design something that does that on purpose (the rebuffering, that
is). However, as I suggested, it might be better if the broadcast
server gives hints so that the client player can do these crossfades
during the silence between tracks. Using my idea, you'd need to
"crossfade" more than once per hour, because a single gap between
tracks probably doesn't contain enough silence to absorb a whole
hour's worth of drift. But a fraction of a second between tracks
several times per hour would never be noticed, unless there is a
continuous audio broadcast with absolutely no silence.
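A quick check on why a fraction of a second per break should be enough, assuming (purely for illustration) a 100 ppm clock mismatch and four song breaks per hour: the drift accumulated in an hour is 100e-6 * 3600 = 0.36 seconds, so each break only has to absorb about 0.09 seconds. The helper names are hypothetical.

```python
def drift_seconds_per_hour(drift_ppm: float) -> float:
    """Seconds of timing error that accumulate per hour at drift_ppm."""
    return abs(drift_ppm) * 1e-6 * 3600


def correction_per_break(drift_ppm: float, breaks_per_hour: int) -> float:
    """Seconds of silence each break must absorb to cancel an hour's drift."""
    if breaks_per_hour <= 0:
        raise ValueError("need at least one break per hour")
    return drift_seconds_per_hour(drift_ppm) / breaks_per_hour


print(round(drift_seconds_per_hour(100), 2))   # 0.36
print(round(correction_per_break(100, 4), 3))  # 0.09
```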

Brian Willoughby
Sound Consulting