[vorbis-dev] Re: Ogg format and latency

Beni Cherniavsky cben at techunix.technion.ac.il
Sun May 18 04:58:18 PDT 2003



Silvia.Pfeiffer at csiro.au wrote on 2003-05-17:

> Hi Aaron,
>
> thanks for your thoughts on improving the Ogg encapsulation format. As
> Monty was the person who developed the specification, I am forwarding
> your suggestions to him. Maybe some of these changes can be adopted into
> a future version of the Ogg encapsulation format.
>
I'm not Monty but I have some comments ;-).

> Aaron Williams wrote:
> > Hi Silvia,
> >
> > After reading the RFC on the ogg streaming format I see an improvement
> > that can be made to reduce encoding latency on the sending end.  It also
> > reduces the amount of memory required as well.
> >
> > In your page format, you store the segment count and a segment table and a
> > CRC in the front of each page.  Having experience with ATM networking and
> > some hardware, it is actually better to place any checksums or CRCs at the
> > end of a packet.
> >
Moving the CRC to the end seems a sound idea.  One must have the whole
page to generate it and must read the whole page to check it anyway.
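
To illustrate why a trailing checksum suits a streaming sender, here
is a minimal sketch.  It uses zlib's CRC-32 purely as a stand-in (Ogg
uses a different CRC-32 variant), and the names page_with_trailing_crc
and check_page are made up for this example.  The sender keeps only a
running CRC while bytes go out; the receiver still has to read the
whole page before it can verify it.

```python
import zlib


def page_with_trailing_crc(chunks):
    """Yield each chunk as soon as it is available, then a 4-byte CRC trailer.

    Only the running CRC is kept between chunks, not the page contents.
    zlib.crc32 is a stand-in here, not the CRC variant Ogg actually uses.
    """
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # update the running CRC incrementally
        yield chunk
    yield crc.to_bytes(4, "big")


def check_page(data):
    """Verify a page whose last 4 bytes are the CRC of everything before them."""
    body, trailer = data[:-4], data[-4:]
    return zlib.crc32(body) == int.from_bytes(trailer, "big")


page = b"".join(page_with_trailing_crc([b"header", b"packet one", b"packet two"]))
corrupted = page[:3] + bytes([page[3] ^ 0x01]) + page[4:]
```

Here check_page(page) holds and check_page(corrupted) does not, but
neither answer is available until the final 4 bytes have arrived.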

> > Another improvement can be made to the segments.
> >
> > Instead of a segment count and a segment table in the beginning, just
> > encode the length of each segment at the front of every segment.  Segments
> > can be anywhere from 1 to 255 bytes not including the length field.  If
> > the length field is zero then this is the end of the page and the CRC
> > would immediately follow.
> >
This seems a bad idea to me.  The "segments" in Ogg are AFAIK a purely
imaginary notion, invoked only to simplify the explanation of lacing
values in the docs.  Since segments are adjacent in the stream, what
you really have is a continuous packet (or the part of it that fits on
the page) whose length is encoded in a variable-length encoding - the
"lacing values".  You can copy/process this part as a whole.  This is
good because one usually needs to have the packet (or at least some
parts of it) contiguous in memory; you still need to assemble long
packets from several pages, but in each page you have a single part.
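
As a rough sketch of the lacing values read as a variable-length
encoding of a packet length (the function names are mine, not from any
Ogg library): a complete packet is written as a run of 255s followed
by one value below 255, and its length is simply the sum.

```python
def lacing_values(packet_len):
    """Return the lacing values for a complete packet of packet_len bytes.

    A run of 255s followed by a final value < 255 ends the packet, so a
    length that is an exact multiple of 255 ends with an explicit 0.
    """
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]


def packet_length(values):
    """Recover the packet length from its lacing values."""
    return sum(values)


examples = {n: lacing_values(n) for n in (0, 190, 255, 277)}
```

For instance, lacing_values(277) gives [255, 22], and a zero-length
packet is the single value [0].  A packet that continues on the next
page is not covered here; its part on the current page ends with 255.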

With your proposed change, however, there would be a length byte
between every two segments, meaning that the assembly would have to be
done segment-by-segment.  What *could* be done without sacrificing
this is to put all lacing values for each packet (or the part of it
that fits on the page) right before it.  That would allow
packet-per-packet output but not segment-per-segment (see arguments
below).  For example:

255 22 <277-byte long packet> 190 <190-byte long packet> ...
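
A small sketch of this layout, with hypothetical helper names and with
the end-of-page marker deliberately left out (the buffer is assumed to
hold only whole packets, each preceded by its own lacing values):

```python
def join_packets(packets):
    """Write each packet preceded by its own lacing values."""
    out = bytearray()
    for packet in packets:
        full, rest = divmod(len(packet), 255)
        out += bytes([255] * full + [rest])  # lacing values for this packet
        out += packet                        # packet body stays contiguous
    return bytes(out)


def split_packets(buf):
    """Inverse of join_packets: read lacing values, then slice the packet."""
    packets = []
    pos = 0
    while pos < len(buf):
        length = 0
        while True:
            value = buf[pos]
            pos += 1
            length += value
            if value < 255:  # a value below 255 ends the length encoding
                break
        packets.append(buf[pos:pos + length])
        pos += length
    return packets


buf = join_packets([b"a" * 277, b"b" * 190])
```

Here buf starts with the bytes 255 22, the value 190 sits right after
the first packet, and each packet can be sliced out in one piece.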

As in your proposal, to get any benefit from this we also need to give
up the number-of-segments field in the header and use some
self-terminating notation.  Note that using a zero as you propose
would forbid the use of empty packets, currently legal:

    Note also that a 'nil' (zero length) packet is not an error; it
    consists of nothing more than a lacing value of zero in the
    header.

    -- http://xiph.org/ogg/doc/framing.html

Not that somebody really needs them ;-).

All this however would have the drawback of complicating page length
calculation, which would complicate seeking a bit.
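
For comparison, in the current format the page length follows from the
header alone: a 27-byte fixed header whose last byte is the segment
count, then the segment table.  A minimal sketch (page_length is a
name I made up; it does no CRC or version checking):

```python
def page_length(header):
    """Total page size in bytes, computed from the page header only.

    `header` must start at the "OggS" capture pattern and contain the
    27-byte fixed header plus the complete segment table.
    """
    if header[:4] != b"OggS":
        raise ValueError("not at a page boundary")
    nsegs = header[26]                    # number of lacing values
    table = header[27:27 + nsegs]         # the segment table itself
    if len(table) < nsegs:
        raise ValueError("segment table is truncated")
    return 27 + nsegs + sum(table)


# Fake header: capture pattern, 22 zeroed bytes for the fields this
# sketch ignores, a segment count of 3, then three lacing values.
fake_header = b"OggS" + bytes(22) + bytes([3]) + bytes([255, 22, 190])
size = page_length(fake_header)
```

With lengths scattered through the page body, a seeker would instead
have to walk the page packet by packet to find where it ends.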

> > For a system generating a low-speed Ogg stream it reduces latency in that
> > a page can be sent out on a per-segment basis without having to wait for
> > the end of a page.  Another benefit is that it is no longer necessary to
> > store the entire page in memory.
> >
I'm afraid you misunderstood the intended usage of Ogg framing.  As
far as I understand, a packet is intended to be the minimal amount of
data that makes sense to feed at once to a decoder.  We are talking
about modern compression codecs that typically involve some transform
on a whole block.  Any application in which it makes sense to
produce/consume partial packets should instead rethink its concept of
"Ogg packet".  Note also that segments are fixed to 255 bytes (except
the last), further limiting the usefulness of this change...

So with an appropriate definition of "packet", my alternative idea of
grouping lacing values per packet should resolve your desire for
minimal latency - but you can already do it by using a page per
packet.  Of course, this costs some extra overhead.  But there is a
deeper issue making this irrelevant: Ogg framing is intended for TCP
streaming and disk storage.  Both involve buffering latencies on many
levels.  So if you really want low latencies, just drop the Ogg
framing on the wire and use for example the proposed Vorbis-over-RTP
transport.

> > When networking, it is always a problem when fields like the length and
> > checksum are stored at the beginning of a packet because the entire packet
> > must be stored in memory before being sent.  ATM (Asynchronous Transfer
> > Mode) AAL5 solved this problem by placing the length and CRC at the end of
> > the packet.  In ATM, each packet is chopped up into 48 byte cells where a
> > 5 byte header is prepended.  A bit in the 5 byte header indicates that the
> > cell is the last cell of a PDU and the last 8 bytes contain a trailer, 2
> > unused bytes, a 16-bit length, and a 32-bit CRC.
> >
> > The advantage of this is that the latency is reduced for the sending side
> > since it can transmit data as soon as it has at least one cell's worth of
> > data.  When the MTU size is reached it just sets a bit and fills in the
> > trailer.  This also means that the sending hardware does not need to store
> > potentially 64K of data since it only needs to keep track of the number of
> > bytes sent and the running CRC.
> >
The sending side is less important; senders are a minority ;-).  More
seriously, as said above it usually needs to store a whole packet at a
time anyway, and some packets can be big.  Also, the sender is free to
choose small pages (e.g. the current encoder does ~4KB).

> > With my proposed change, the page size stays the same, only the segment
> > information is distributed throughout the page and the CRC is moved to the
> > end.
> >
> > If the transport medium is reliable the receiving end can begin decoding
> > data before the end of the page is received, as is possible with the
> > current Ogg format.
> >
ATM doesn't have to cope with sync loss/recapture (all cells are
fixed-length and separate) and seeking (nobody stores ATM cells on
disk :).  At the receiving end you ideally don't want to start
accepting packets until you know the whole page has a good CRC, so
your solution (vs. page-per-packet) would mean some sacrifice of clean
error recovery.  You can gamble that it will work while the medium is
reliable but when it eventually fails, you *must* have stored the
whole last page for optimal sync recapture (granted, in real-time
applications you can sacrifice this).

In other words: in Ogg, you probably don't want to deal with less than
a single page at a time.  If you do, you probably don't want to deal
with Ogg but with e.g. RTP ;-).


-- 
Beni Cherniavsky <cben at users.sf.net>

The Three Laws of Copy-Protechnics:
http://www.technion.ac.il/~cben/threelaws.html