[xiph-rtp] A few other comments

Tue Jan 4 15:56:01 PST 2005

Hi,

a few things I stumbled across in the latest draft version, related to new
paragraphs or things I haven't noticed before:

* 2.2  Payload Header

*    Codebook Ident: 32 bits
*
*    This 32 bit field is used to associate the Vorbis data to a decoding
*    Codebook.  It is created by making a CRC32 checksum of the codebook
*    required to decode the particular Vorbis audio stream.

It should be defined exactly which data is used to create the codebook
checksum. One thing that is not obvious to me is whether the packet type
descriptor is to be included when calculating the checksum. I am also
wondering whether it would be better (to avoid accidental checksum
collisions) to use a stronger hash algorithm than CRC32.
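
To illustrate the ambiguity, here is a minimal Python sketch. The header
bytes are made up (this is not a real codebook), and the layout of a
one-octet packet type followed by the string "vorbis" is an assumption about
what "packet type descriptor" refers to. The draft does not say which bytes
feed the CRC, which is exactly the problem.

```python
import zlib

# Hypothetical stand-in for a Vorbis setup (codebook) header; not real data.
PACKET_TYPE = b"\x05"                 # assumed one-octet packet type descriptor
BODY = b"vorbis" + bytes(range(32))   # assumed remainder of the header


def codebook_ident(data: bytes) -> int:
    """Return the CRC32 of the given bytes as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


ident_with_type = codebook_ident(PACKET_TYPE + BODY)
ident_without_type = codebook_ident(BODY)

print(f"with packet type:    {ident_with_type:08x}")
print(f"without packet type: {ident_without_type:08x}")
```

A sender and a receiver that disagree on this single octet will in general
compute different idents for the same codebook.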

* 3.  Frame Packetizing
*
*    Any Vorbis data packet that is 256 octets or less SHOULD be bundled
*    in the RTP packet with as many Vorbis packets as will fit, up to a
*    maximum of 16.

Wouldn't it make sense here to have some more formal description of the RTP
packet length than just "as will fit"?
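
One possible formalization, as a hedged sketch only: bundle packets greedily
until either the limit of 16 or an octet budget is reached. The
`payload_budget` parameter (for example path MTU minus headers) and the
one-octet length field per bundled packet are assumptions made for this
example, not rules taken from the draft.

```python
MAX_BUNDLED = 16       # maximum number of packets, from the quoted draft text
MAX_VORBIS_LEN = 256   # larger packets have to be fragmented instead
LENGTH_FIELD = 1       # assumed one-octet length field per bundled packet


def bundle(packets, payload_budget):
    """Greedily take leading packets while they fit.

    Returns (bundled, remaining). `payload_budget` is a hypothetical limit
    on the octets available for length fields plus Vorbis data.
    """
    bundled, used = [], 0
    for pkt in packets:
        cost = LENGTH_FIELD + len(pkt)
        if (len(pkt) > MAX_VORBIS_LEN
                or len(bundled) == MAX_BUNDLED
                or used + cost > payload_budget):
            break
        bundled.append(pkt)
        used += cost
    return bundled, packets[len(bundled):]


# Twenty 100-octet packets against a 1400-octet budget.
packets = [bytes(100)] * 20
first, rest = bundle(packets, 1400)
print(len(first), len(rest))
```

With a rule like this, "as will fit" becomes a testable condition instead of
something each implementation has to guess.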

*    If a Vorbis packet is larger than 256 octets it MUST be fragmented.

Is it really necessary to have a one octet Vorbis packet length field in the
RTP packet and limit the Vorbis packet length to 256 octets? Most Vorbis
packets are longer (at common bitrates, I would estimate 500-1000 bytes) and
most network paths have a higher MTU, so they can transport common Vorbis
packets without splitting them. Forcing a split only causes unnecessary
bandwidth overhead. The UDP packet header, the RTP packet header and the 6
octets for the Vorbis specific data add up to about 50 octets. That is
roughly 20% or more if the rest of the data is limited to 256 octets.
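
The arithmetic can be checked with a short sketch. It takes the estimate of
about 50 header octets at face value (the exact figure depends on headers
not itemized here) and compares a 256-octet payload with the packet sizes
estimated above.

```python
# Rough per-packet header cost in octets, taken from the estimate above.
OVERHEAD = 50


def overhead_ratio(payload_octets: int) -> float:
    """Header octets as a fraction of the Vorbis payload octets."""
    return OVERHEAD / payload_octets


for payload in (256, 500, 1000):
    print(f"{payload:5d} octets payload: {overhead_ratio(payload):.1%} overhead")
```

With these numbers the overhead is about 19.5% at 256 octets, 10% at 500
octets and 5% at 1000 octets.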

* 4.  Configuration Headers
*
*    To decode a Vorbis stream three configuration header blocks are
*    needed.  The first header indicates the sample and bitrates, the
*    number of channels and the version of the Vorbis encoder used.  The
*    second header contains the decoders probability model, or codebook
*    and the third header details stream metadata.

The metadata header block is not needed for stream decoding. It has even
been discussed to recommend using some other metadata format together with
Vorbis/RTP, as the Vorbis metadata header is rather limited.

* 4.1  In-band Header Transmission
*
*    The three header data blocks are sent in-band with the packet type
*    bits set to match the payload type.  The transmission sequence for
*    the headers MUST be in this order:  configuration, codebook,
*    metadata.

This MUST is technically irrelevant. Even if the Vorbis spec itself has this
unnecessary "MUST", I don't see any point in copying it into the RTP RFC. As
long as the configuration and the codebook headers have been received
properly by the client (in any order), it is able to decode the audio stream.
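
A minimal sketch of a receiver that does not care about arrival order. The
header labels and the `HeaderCollector` class are hypothetical names for this
example and do not come from the draft. A real client could buffer the blocks
like this and hand them to its decoder in whatever order the decoder expects.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three header kinds; the draft's actual
# packet type values are not reproduced here.
CONFIGURATION, CODEBOOK, METADATA = "configuration", "codebook", "metadata"
REQUIRED = frozenset({CONFIGURATION, CODEBOOK})


@dataclass
class HeaderCollector:
    """Stores header blocks in whatever order they arrive."""
    headers: dict = field(default_factory=dict)

    def receive(self, kind: str, data: bytes) -> None:
        self.headers[kind] = data

    def ready_to_decode(self) -> bool:
        # Metadata is deliberately not required for decoding.
        return REQUIRED.issubset(self.headers)


collector = HeaderCollector()
collector.receive(CODEBOOK, b"placeholder codebook")       # arrives first
ready_early = collector.ready_to_decode()
collector.receive(CONFIGURATION, b"placeholder configuration")
print(ready_early, collector.ready_to_decode())
```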

*    A 16 bit codebook length field precedes the codebook datablock.  The
*    length field allows for codebooks to be up to 64K in size.

Is it ok to limit the codebook length to 64 kilobytes? Is it ok to
abbreviate kilobyte with K? The text graphic on page 13 shows a 32 bit
codebook length field.
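
For reference, a small sketch of what the two field widths allow. Network
byte order (`!` in the format string) is an assumption here; the draft text
quoted above does not state the byte order.

```python
import struct


def max_length(field_bits: int) -> int:
    """Largest value an unsigned length field of the given width can hold."""
    return (1 << field_bits) - 1


def fits(length: int, fmt: str) -> bool:
    """Return True if `length` can be packed with the given struct format."""
    try:
        struct.pack(fmt, length)
        return True
    except struct.error:
        return False


print(max_length(16))   # 16 bit field, as in the text
print(max_length(32))   # 32 bit field, as in the diagram
print(fits(70000, "!H"), fits(70000, "!I"))
```

A 16 bit field tops out at 65535 octets, so a 70000 octet codebook would
only fit the 32 bit variant.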

Why is there a codebook ident in the configuration header packet?

The first paragraph on page 13 is not clear to me. Will the codebook ident
of the codebook packet itself in some cases differ from that of the codebook
actually transmitted? In any case, what's the point in including it? AFAIK,
RTP depends on reliable packet transport anyway, so there is no need to add
checksums just to detect corrupted data.

For the same reason that a more detailed description of the checksum
calculation is necessary, section 4.3 should also describe the format in
which the codebook header will be delivered from the specified URI (at
least whether it comes with or without the packet type descriptor).

Tor