[Flac-dev] Synchronizing a streaming client to the server Was: Idea to possibly improve flac?

Fri Jan 7 19:33:47 PST 2011

On Fri, Jan 7, 2011 at 7:36 PM, Brian Willoughby <brianw at sounds.wa.com> wrote:
> This thread has raised several good topics.  It's surprising that the
> FLAC-Dev list has been silent for years, and now suddenly there are several
> good ideas to discuss.

I'll take credit for this, toot toot toot :D

>
> On Jan 7, 2011, at 15:04, David Richards wrote:
>>
>> I am interested in streaming lossless audio; FLAC is probably the best
>> option for that. Currently the OggFLAC way of doing it mostly works
>> with a few hacks in libflac and my version of edcast. It might be that
>> the Ogg packaging layer is ill suited for this purpose and that an
>> alternative model should be developed. I've seen that it's possible to
>> stream native flac with netcat, but that's not really the solution I'm
>> looking for.
>
> I have not done much work with streaming.  I have written a lot of serious
> code that uses the FLAC library.  I remember that there used to be separate
> objects in the FLAC library for streams, and they were distinct from the file
> objects because you can seek backwards in a file, but you cannot seek
> backwards in a stream.  For some reason, it seems that these objects have
> been removed in the latest versions of the FLAC library.
>
> Can anyone explain the issues with streaming pure FLAC?  What does OggFLAC
> add to make streaming possible, or even easier than pure FLAC?  I thought
> that OggFLAC was just a way to put FLAC blocks into the Ogg file format.
>  Apple's CAF specification would also allow FLAC blocks to be placed inside
> their file container, although this still would not force iTunes to play
> FLAC unless a decoder were installed in the system.
>
> What is it about netcat that you don't like?  Can you describe what you're
> looking for, and why the specific details are important?  I was always under
> the impression that the FLAC format was already designed for streaming, but
> I must admit that I've never studied the issue.
>
>
Okie dokie. Basically, libflac provides functions for working with
streams as well as files, but that's as far as it goes, and a bit more
is needed for an entire solution. So libflac is ready to go, but you
need more than libflac to stream, in the same way that you need more
than libflac to hear a FLAC file being played. What I mean by this, for
those reading, is that libflac provides functions for encoding and
decoding from an arbitrary buffer rather than a file.
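To make the "arbitrary buffer" point concrete, here is a minimal sketch of the kind of read callback libflac's stream decoder pulls its input through. It deliberately uses plain C types and a made-up status enum (read_status, mem_source and mem_read are hypothetical names) so it compiles without libFLAC. The real callback also receives the decoder pointer, takes FLAC__byte, returns a FLAC__StreamDecoderReadStatus, and is registered with FLAC__stream_decoder_init_stream().

```c
#include <stddef.h>
#include <string.h>

/* Plain-C stand-ins for libFLAC's read status values, so this sketch
 * compiles on its own. Real code would use FLAC__StreamDecoderReadStatus. */
typedef enum {
    READ_CONTINUE,
    READ_END_OF_STREAM
} read_status;

/* Hypothetical source: bytes already received from the network. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} mem_source;

/* Same contract as libflac's read callback: on entry *bytes holds the
 * space available in buffer; on return it holds the bytes delivered. */
read_status mem_read(unsigned char *buffer, size_t *bytes, void *client_data)
{
    mem_source *src = client_data;
    size_t remaining = src->len - src->pos;

    if (remaining == 0) {
        *bytes = 0;
        return READ_END_OF_STREAM;
    }
    if (*bytes > remaining)
        *bytes = remaining;
    memcpy(buffer, src->data + src->pos, *bytes);
    src->pos += *bytes;
    return READ_CONTINUE;
}
```

For a live network source, a real callback would normally block until more bytes arrive instead of reporting end-of-stream as soon as the buffer runs dry.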

netcat isn't right for what I think are obvious reasons, namely
one-to-many transmission: it just doesn't scale the way something like
Icecast and a Vorbis stream would. I've never actually bothered to try
it, but I think you have to hack libflac even for that, because many
programs that play FLAC will wait until the end of a run of silent
(constant) subframes before continuing to decode, so any silence gets
you out of sync...

Compatibility with Icecast would really be the ideal way to handle it,
I think. I'm not sure whether it would get in the way if your goal were
super low latency, but I think it may still work in that case. OggFLAC
currently works with Icecast; that's why I started messing with it.
From what I've seen, it somewhat hijacks the Ogg part of the Vorbis
code in Icecast. I've been able to use it with about 1 second of
latency just fine on localhost and on other machines on my network. It
doesn't support metadata updates, i.e. chained Ogg decoding, but I
don't think any clients do either. I think I wrote the only client that
doesn't crash when listening to OggFLAC for extended periods, too, lol.
MPlayer will crash randomly, and VLC will stop and rebuffer randomly.

So, in summary: Icecast is an HTTP 1.0 streaming music server, and
whether it's OggFLAC getting fixed or native FLAC with some kind of
small wrapper or modification, I think Icecast should be a target
protocol. Everyone uses Icecast.

And if for some reason Icecast doesn't suit the lowest-latency
situations, perhaps a small protocol could be designed to wrap native
FLAC and provide network streaming, with a library and some sample
senders/receivers.

A problem with native FLAC and streaming, IIRC, is that you only get
metadata blocks at the beginning of the stream. I really don't think
metadata et al., such as cover art and all of that, needs to be fretted
over. Really, all that is needed is a string or two, maybe 255 chars
each: one can be the title or currently-playing information and the
second one arbitrary, plus something to handle the time sync stuff we
have been talking about.
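As a rough illustration of how small such an in-band packet could be, here is a sketch of a hypothetical wire format. Nothing like it exists in FLAC or Icecast, and stream_meta, meta_pack and meta_unpack are made-up names. It carries a 64-bit running timeline counter followed by two length-prefixed strings of at most 255 bytes each.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-band metadata packet for a native-FLAC stream wrapper.
 * Layout: 8-byte big-endian timeline counter, then two strings, each a
 * 1-byte length (max 255) followed by that many bytes. */
typedef struct {
    uint64_t timeline;  /* running sample counter, reset at breaks */
    char title[256];    /* now-playing text, NUL-terminated */
    char extra[256];    /* arbitrary second string */
} stream_meta;

/* Writes one length-prefixed string, truncating past 255 bytes. */
static size_t put_string(unsigned char *out, const char *s)
{
    size_t n = strlen(s);
    if (n > 255)
        n = 255;
    out[0] = (unsigned char)n;
    memcpy(out + 1, s, n);
    return n + 1;
}

/* Returns bytes written; out must hold at least 8 + 2 * 256 bytes. */
size_t meta_pack(const stream_meta *m, unsigned char *out)
{
    size_t pos = 0;
    for (int i = 7; i >= 0; i--)
        out[pos++] = (unsigned char)(m->timeline >> (8 * i));
    pos += put_string(out + pos, m->title);
    pos += put_string(out + pos, m->extra);
    return pos;
}

/* Returns bytes consumed, or 0 if the packet is truncated. */
size_t meta_unpack(const unsigned char *in, size_t len, stream_meta *m)
{
    size_t pos = 0;
    if (len < 8)
        return 0;
    m->timeline = 0;
    for (int i = 0; i < 8; i++)
        m->timeline = (m->timeline << 8) | in[pos++];
    char *fields[2] = { m->title, m->extra };
    for (int f = 0; f < 2; f++) {
        /* Need the length byte plus that many payload bytes. */
        if (pos >= len || pos + 1 + in[pos] > len)
            return 0;
        size_t n = in[pos++];
        memcpy(fields[f], in + pos, n);
        fields[f][n] = '\0';
        pos += n;
    }
    return pos;
}
```

The one-byte length prefix is what caps each string at 255 bytes, matching the limit suggested above, and it keeps the parser trivial.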

Of course any of this should handle all the capabilities of the FLAC
format, i.e. streaming 8-channel 24-bit/96 kHz audio. I've been doing
two-channel 24-bit/44.1 kHz, and the bitrate is about 200-250K (that's
big K) per second. I haven't tried 4 or 5 channels yet, but I intend to
(I will be building support into edcast2 for this). Also, BTW, if
anyone has a link to some information about why 96 kHz is better than
48/44.1 kHz, I'd like to see it, because I don't comprehend why it
would be. 24-bit is a clear win to my ears... (but don't bring it up on
this list, because I'd rather not digress into madness)
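For scale, the uncompressed PCM rate is just sample rate times channels times bytes per sample, and FLAC's output lands somewhere below that depending on the material. A small helper (hypothetical name) makes the arithmetic explicit:

```c
#include <stdint.h>

/* Uncompressed PCM data rate in bytes per second. FLAC's lossless output
 * is some fraction of this, depending on how compressible the audio is. */
uint64_t pcm_bytes_per_second(uint32_t sample_rate, uint32_t bits_per_sample,
                              uint32_t channels)
{
    return (uint64_t)sample_rate * channels * bits_per_sample / 8;
}
```

That gives 264,600 bytes/s for two-channel 24-bit/44.1 kHz, so if "big K" means kilobytes, 200-250K per second is roughly 75-95% of the raw rate. Eight-channel 24-bit/96 kHz comes to 2,304,000 bytes/s before compression.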

>> On Fri, Jan 7, 2011 at 5:58 PM, Tor-Einar Jarnbjo <tor-einar at jarnbjo.name>
>> wrote:
>>>
>>> Am 07.01.2011 23:38, schrieb David Richards:
>>>>
>>>> I'm also interested in another concept of lossless streaming with
>>>> FLAC. Let's call it broadcast FLAC. A problem with streaming for long
>>>> periods of time is that the sending and receiving computers' clocks go
>>>> out of sync, for example even if I stream myself on localhost, with
>>>
>>> This is not a FLAC-specific problem, but it has to be handled in all
>>> situations where the streaming server is in control of the
>>> transmitted data rate. It's caused by a playback device whose actual
>>> sample rate is slightly higher than the sample rate requested, or by
>>> a streaming source whose system clock is running slowly. Since these
>>> parameters (at least an exact playback sample rate) are hard to
>>> achieve, this is a rather common problem. Or, to put it briefly: if
>>> the data has a sample rate of 44100 and your sound card consumes
>>> more than 44100 samples per "sender-time" second, your buffer will
>>> eventually be exhausted. If it's the other way around, your buffer
>>> may overflow if the client does not handle these cases properly.
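The size of the effect is easy to estimate. Assuming the two clocks differ by a fixed number of parts per million (ppm), the accumulated drift after t seconds is sample_rate * t * ppm / 10^6 samples. The sketch below (hypothetical function name) just evaluates that:

```c
/* Samples of drift accumulated after `seconds` of playback when the
 * receiver's clock differs from the sender's by `ppm` parts per million.
 * Positive ppm: the sound card consumes faster than the sender produces,
 * so the receive buffer drains. Negative ppm: the buffer fills up. */
double drift_samples(double sample_rate, double ppm, double seconds)
{
    return sample_rate * seconds * ppm / 1e6;
}
```

With an illustrative 100 ppm mismatch at 44100 Hz, one hour of playback drifts by 15,876 samples, about 0.36 s, which is in the same ballpark as the "about once an hour" half-second figures discussed below.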
>>
>> I am well aware it's not FLAC-specific, but a standard way of
>> handling such a matter could be part of the packaging for streaming
>> FLAC.
>
> I think that this would be a good opportunity to design a solution that is
> specific to broadcast.  At the sending end, the server should have knowledge
> of when there are breaks in the content.  If the stream could send flags at
> these breaks, then the receiving client could go silent and reset the
> synchronization.  As you describe, the situation only becomes a problem
> after long periods of time, but I would guess that there are enough station
> breaks (or at least song breaks) in a long broadcast that there would be a
> chance for a reset.
>
> CoreAudio is a pull model, and the API provides a time line that can be used
> to find the audio samples for a specific time.  However, there are many
> cases where this time line gets reset.  Usually, each callback has a time
> stamp that occurs precisely after the previous callback.  Obviously, the
> audio should not glitch when the time line is contiguous, and thus the data
> must be sample-accurate.  However, CoreAudio code must also deal with
> situations where the time line starts over from 0, usually under control of
> the host application.  CoreAudio also has a flag in the callback to indicate
> when the buffers are totally silent.  I'd like to borrow these ideas, or at
> least similarly-inspired ideas, and have FLAC streaming designed such that
> the stream can tell the playback software when to reset.
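The contiguity check described here can be sketched in a few lines. This is only inspired by CoreAudio, not its API: CoreAudio reports sample time as a floating-point field of its AudioTimeStamp, whereas this sketch uses an integer counter, and timeline_state and timeline_advance are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-callback bookkeeping for a timestamped pull model. */
typedef struct {
    bool have_prev;
    uint64_t expected;  /* sample time the next callback should start at */
} timeline_state;

/* Returns true if this callback continues the previous one exactly;
 * false on the first call or whenever the time line jumps or restarts,
 * which is the cue to reset synchronization. */
bool timeline_advance(timeline_state *st, uint64_t sample_time, uint32_t frames)
{
    bool contiguous = st->have_prev && sample_time == st->expected;
    st->have_prev = true;
    st->expected = sample_time + frames;
    return contiguous;
}
```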
>
> The typical process to deal with synchronization of separate systems is
> sample rate conversion.  However, this introduces distortion into the audio,
> especially with real-time SRC.  The only way to avoid SRC is to have some
> way to reset the alignment without dropping or adding samples.  As I said
> above, if the broadcast server were to put flags in the stream to indicate
> silent breaks in the audio, then the playback client could drop silent
> samples or insert silent samples until the two time lines are
> resynchronized.  But, since this would only add or remove silence, there
> should be absolutely no audible glitch.  Perhaps the stream would need more
> than simple silent flags, or resync flags.  It might be necessary to
> transmit an actual running time line counter, with enough bits to count the
> longest stretch of contiguously-clocked audio blocks.  When the broadcast
> server sees a break in the content material, the time code could be reset to
> zero, and this would tell the client to start the sync over, thus avoiding
> dropped samples in the middle of real audio content.
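A minimal sketch of the client side of this idea, assuming a hypothetical protocol in which the server flags each silent gap and its length, and the client tracks how many samples it has queued against a target latency. Only the flagged silence is shortened or lengthened, so program audio is never dropped or duplicated. resync_gap and its parameters are illustrative names, not part of any existing library.

```c
#include <stdint.h>

/* Hypothetical client-side resync at a server-flagged silent gap.
 *   buffered: samples currently queued at the client
 *   target:   queue depth (latency) the client wants to hold
 *   gap_len:  length of the silent gap the server flagged
 * Returns how many silent samples to actually play for this gap. */
uint64_t resync_gap(uint64_t buffered, uint64_t target, uint64_t gap_len)
{
    if (buffered > target) {
        /* Queue has grown (sound card slow): drop silence, but never
         * more than the gap contains; any remainder waits for the
         * next flagged break. */
        uint64_t excess = buffered - target;
        return excess >= gap_len ? 0 : gap_len - excess;
    }
    /* Queue has shrunk (sound card fast): pad the gap with silence. */
    return gap_len + (target - buffered);
}
```

Because an excess larger than one gap is simply handled over several breaks, frequent short gaps between tracks suit this approach better than one long gap per hour.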
>
>
>>>> Anyway what could happen is the client could do a little bit of
>>>> re-sampling here or there to ensure its in sync with the servers
>>>> clock.
>>>
>>> That is how streaming clients usually solve this problem, although it
>>> does not really improve sound quality.
>>
>> It's probably not a big deal if you don't resample all the time, just
>> when you're off by X amount; all of this would just be client-side
>> preferences. As long as the client side "knows" it's off by X amount,
>> you could handle it in any number of ways. I'd be fine if it just
>> crossfaded to the correct timing when it was off by more than half a
>> second; then no resampling would ever happen, and you would just get a
>> weird effect about once an hour, which is better than a buffer
>> underrun or lag. Or perhaps the client could look for half a second
>> of silence and just cut it out.
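Finding "half a second of silence" needs some working definition of silence. A sketch, assuming one channel of JACK-style float samples and a fixed amplitude threshold (both the threshold and the function name find_silence are illustrative choices, not from the thread):

```c
#include <stddef.h>

/* Returns the index where the first run of at least `min_len` consecutive
 * samples with magnitude <= `threshold` begins, or -1 if there is none.
 * Samples are floats nominally in [-1, 1], as JACK uses. */
long find_silence(const float *buf, size_t n, float threshold, size_t min_len)
{
    size_t run = 0;  /* length of the current quiet run */
    for (size_t i = 0; i < n; i++) {
        float mag = buf[i] < 0 ? -buf[i] : buf[i];
        if (mag <= threshold) {
            if (++run >= min_len)
                return (long)(i + 1 - run);  /* start of the run */
        } else {
            run = 0;
        }
    }
    return -1;
}
```

At 44.1 kHz, half a second is min_len = 22050 samples; for multi-channel audio, a run would have to be quiet on every channel before it is safe to cut.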
>
> I don't think it's a good idea to resample just some of the time, although
> your idea to crossfade would work since it never resamples.  I think that
> there are a number of PC-based digital audio playback systems, and perhaps
> even in the television broadcast industry, where this idea of intermittent
> resampling is done.

When I googled this a few weeks ago, I came up with only software
patents, haha! I know we all love those.

HDTV looks pretty good, but it sounds like crap IMO because the Dolby
Digital is hardly better than a 128k MP3. Whenever I listen to FLAC
over the same speakers I watch TV on, I'm blown away by the disparity.

> I hear a regular glitch in audio about once per second
> in many syndicated television shows, and my suspicion is that they are
> speeding up the show so that they can sell more commercial time.  Another
> place that I hear this glitching is in some of the PC audio software
> oriented for DJs which can play MP3 files at different speeds and mix them
> together.  I hear the same sound - one glitch per second - and it is very
> annoying.

That's a lot of glitching. Every second??? Jeez, I'm skeptical it's
that bad, but I'll take your word on it until I know different.

>
> But, as you said, a crossfade once per hour would not be as bad.  Also, the
> stream could be completely resynchronized even without a crossfade.  Some
> streaming servers are so bad that they can't run for hours without
> rebuffering, but I guess it's probably pretty lazy to design something that
> does that on purpose (the rebuffering, that is).

I wonder as well; I can only suspect ignorance, laziness or apathy. A
3-4 second rebuffering every few hours probably didn't concern Mr.
Winamp when he invented SHOUTcast. Honestly, I don't even know what
reminded me that this kind of thing happens; I think most people
suspect it's just a network issue.

> However, as I suggested,
> it might be better if the broadcast server gives hints so that the client
> player can do these crossfades during the silence between tracks.  Using my
> idea, you'd need to "crossfade" more than once per hour, because there
> probably isn't enough silence to handle it that seldom.  But a fraction of a
> second between tracks several times per hour would never be noticed, unless
> there is a continuous audio broadcast with absolutely no silence.

Yeah, resampling is such a silly thing. Other than not liking the idea
of it, I also don't know how to implement it.
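For reference, the simplest possible resampler is linear interpolation: step through the input at a fractional rate and blend neighboring samples. The sketch below (hypothetical name resample_linear) does exactly that for one channel. It is crude and does color the audio, which is part of why this thread prefers dropping or inserting silence; higher-quality SRC typically uses windowed-sinc filters instead.

```c
#include <stddef.h>

/* Crude linear-interpolation resampler for one channel. `ratio` is the
 * number of input samples advanced per output sample, e.g. 1.0001 to
 * consume input very slightly faster than real time. Returns the number
 * of samples written to `out` (at most max_out). */
size_t resample_linear(const float *in, size_t n_in, double ratio,
                       float *out, size_t max_out)
{
    size_t n_out = 0;
    double pos = 0.0;  /* fractional read position in `in` */

    if (n_in < 2 || ratio <= 0.0)
        return 0;
    /* Stop before the last input sample so in[i + 1] is always valid. */
    while (n_out < max_out && pos < (double)(n_in - 1)) {
        size_t i = (size_t)pos;
        double frac = pos - (double)i;
        out[n_out++] = (float)(in[i] * (1.0 - frac) + in[i + 1] * frac);
        pos += ratio;
    }
    return n_out;
}
```

A ratio slightly above 1.0 drains a growing buffer; slightly below 1.0 stretches the audio for a buffer that is running low. A real streaming version would also carry the fractional position and the last input sample across block boundaries, which this sketch omits.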

As for fading and crossfading, well, I have some experience there. I
have written a little library that can fade up and down and crossfade
between two sources, and it doesn't even need the whole set of samples
you intend to fade to be given to it at one time, as long as you know
the total number of samples you want to crossfade. It works with your
typical JACK floats.

https://github.com/oneman/libfaded/blob/master/libfaded.c
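The property described above (the fade length is known up front, but samples may arrive in blocks of any size) comes down to keeping a position counter between calls. Here is a minimal sketch with a linear gain ramp on float samples; the xfade struct and xfade_process are hypothetical and are not libfaded's actual API, which is at the link above.

```c
#include <stddef.h>

/* Hypothetical incremental crossfade state: total length known up front,
 * samples delivered in blocks of any size (e.g. one JACK period). */
typedef struct {
    size_t total;  /* total samples the crossfade spans */
    size_t done;   /* samples already processed */
} xfade;

/* Mixes one block: output fades linearly from `a` toward `b`.
 * Once the fade is complete, the output is simply `b`. */
void xfade_process(xfade *xf, const float *a, const float *b, float *out,
                   size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float g = xf->done >= xf->total
                      ? 1.0f
                      : (float)xf->done / (float)xf->total;
        out[i] = a[i] * (1.0f - g) + b[i] * g;
        if (xf->done < xf->total)
            xf->done++;
    }
}
```

Splitting a fade into blocks of 3 and 2 samples produces the same output as one call over all 5, which is what makes this safe to drive from a real-time callback.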

P.S. I would really, really like some help converting this to work with
all the other sample types... From what I can tell, in order to keep it
C and efficient, I would need a different function for each sample type,
since I can't recast the void *s dynamically... (this would not affect
the API, but it would be tedious...)
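One common way to avoid writing each per-type function by hand is to have a macro stamp out the typed variants and, with a C11 compiler, let _Generic pick one at compile time. The names below (DEFINE_APPLY_GAIN, apply_gain and the generated functions) are hypothetical, and the gain is assumed to lie in [0, 1], as it does in a fade, so integer samples cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Stamps out one gain function per sample type. The math is done in
 * double; conversion back to an integer type truncates toward zero. */
#define DEFINE_APPLY_GAIN(NAME, T)                        \
    static void NAME(T *buf, size_t n, double gain)       \
    {                                                     \
        for (size_t i = 0; i < n; i++)                    \
            buf[i] = (T)((double)buf[i] * gain);          \
    }

DEFINE_APPLY_GAIN(apply_gain_s16, int16_t)
DEFINE_APPLY_GAIN(apply_gain_s32, int32_t)
DEFINE_APPLY_GAIN(apply_gain_f32, float)

/* apply_gain(buf, n, g) selects the variant from buf's pointer type. */
#define apply_gain(buf, n, g)              \
    _Generic((buf),                        \
        int16_t *: apply_gain_s16,         \
        int32_t *: apply_gain_s32,         \
        float *: apply_gain_f32)(buf, n, g)
```

_Generic only helps when the caller holds a typed pointer. If the public API keeps void * buffers, a switch on a sample-format argument that calls the macro-generated functions is the run-time equivalent, and the API itself stays unchanged.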

Anyways, happy to hear someone else is interested in this AND can code! Yay!

Cheers,

David

>
> Brian Willoughby
> Sound Consulting
>
>