[vorbis-dev] Updated Vorbis RTP I-D

Phil Kerr philkerr at elec.gla.ac.uk
Mon Dec 16 07:07:00 PST 2002



Hi all,

Apologies in advance, this email is quite long.

I've prepared an updated Vorbis RTP Internet Draft, which is a
continuation of draft-moffitt-vorbis-rtp-00.txt which can be found
below.

If this new draft gets the ok I'd like to submit this to the AVT WG
later this week.

There are a number of changes over the original I-D, notably the
changing of the M bit function in the RTP header to match current AVT
practice, together with an initial suggestion for codebook delivery.

Other changes are the expansion of previous sections and the explicit
declarations of certain requirements.

For background you may want to read the minutes of the IETF meeting
where Jack presented the initial draft:

http://www.ietf.org/proceedings/01mar/ietf50-135.htm

In updating this I-D I've trawled the Vorbis-dev ML and I think I've
taken a reasonable consensus line, but comments and feedback are
needed.

The biggest area is the transmission of the codebooks.  I've taken the
approach of transmitting them using RTCP, and added a checksum block for
integrity checking.  There was discussion on the AVT list a few
weeks ago concerning TCP over RTP and there are plans for an I-D to
cover this.  I think this could be a better solution than using RTSP
for codebook delivery.  I have nothing against RTSP, but I wanted to
keep the focus of this I-D as close as possible to that of RFC 1889.
This approach fits well with both unicast and multicast models; however,
comments and alternative suggestions from others are most welcome.

Another area where the Vorbis spec and current practices of the AVT WG
differ is the output channel order.

The output channel order in draft-ietf-avt-profile-new-12 I-D, sect 4.1,
defines the order as:

   l  left
   r  right
   c  center
   S  surround
   F  front
   R  rear

   channels  description   channel
                              1     2   3   4   5   6
   __________________________________________________
   2         stereo           l     r
   3                          l     r   c
   4         quadrophonic    Fl     Fr  Rl  Rr
   4                          l     c   r   S
   5                         Fl     Fr  Fc  Sl  Sr
   6                          l     lc  c   r   rc  S

The 3-, 5- and 6-channel layouts do not match those in the Vorbis spec.

The I-D does state that the channel ordering SHOULD follow the AIFF-C
format, so we can use the Vorbis layout if we push it, but this may be
something we may want to review.

An entry for Vorbis RTP streams should be named in the Audio Encodings
table in section 4.5 of draft-ietf-avt-profile-new-12, together with a
MIME type.  A draft MIME type document should be ready either later this
week, or just after the holidays.

Comments, feedback and minor flames welcomed.

Regards

Phil

---------------------8<------------------------8<-----------------------

Network Working Group                           Phil Kerr
Internet-Draft                                  The Ogg Vorbis Community
December 20, 2002                               / OpenDrama
Expires: June 20, 2003      

               RTP Payload Format for Vorbis Encoded Audio

                 <draft-kerr-avt-vorbis-rtp-00.txt>

Status of this Memo

   This document is an Internet-Draft and is in full conformance
   with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as
   "work in progress".

   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Copyright Notice

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

Abstract
        
   This document describes an RTP payload format for transporting Vorbis
   encoded audio.

Table of Contents

   1.         Introduction
   2.         Background
   3.         Payload Format
   3.1        RTP Header
   3.2        Payload Header
   3.3        Payload Data
   3.4        Example RTP Packet
   4.         Frame Packetizing
   4.1        Example Fragmented Vorbis Packet
   5.         Codebooks
   6.         Security Considerations
   7.         Acknowledgments
   8.         References
   9.         Full Copyright Statement
   10.        Authors Address

1 Introduction

   This document describes how Vorbis encoded audio may be formatted for
   use as an RTP payload type. 

2 Background

   The Xiph.org Foundation creates and defines codecs for use in 
   multimedia that are not encumbered by patents and thus may be freely 
   implemented by any individual or organization.

   Vorbis is the general purpose multi-channel audio codec created by 
   the Xiph.org Foundation.

   Vorbis encoded audio is generally found within an Ogg format 
   bitstream, which provides framing and synchronization.  For the 
   purposes of RTP transport, this layer is unnecessary, and so raw 
   Vorbis packets are used in the payload.

   Vorbis packets are unbounded in length currently.  At some future
   point there will likely be a practical limit placed on packet
   length.  

   Typical Vorbis packet sizes range from very small (2-3 bytes) to 
   quite large (8-12 kilobytes).  The reference implementation [2] 
   seems to make every packet less than ~800 bytes, except for the
   codebook packets, which are ~8-12 kilobytes.

   Within an RTP context the maximum Vorbis packet size SHOULD be kept
   below the MTU size of 1500 octets, including the RTP and payload
   headers, to avoid fragmentation.

3 Payload Format

   The standard RTP header is followed by an 8 bit payload header, and 
   then the payload data.

3.1 RTP Header

     0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         synchronization source (SSRC) identifier              |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |         contributing source (CSRC) identifiers                |
   |                      ...                                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The RTP header begins with an octet of fields (V, P, X, and CC) to   
   support specialized RTP uses (see [4] and [5] for details). For
   Vorbis RTP applications, V is set to 2, and the P, X, and CC fields
   are set to 0. 

   Marker (M): 1 bit
      Set to zero.  Audio silence suppression is not used.  This
      conforms to section 4.1 of [5].

   Payload Type (PT): 7 bits
      An RTP profile for a class of applications is expected to assign a
      payload type for this format, or a dynamically allocated payload 
      type should be chosen which designates the payload as Vorbis.

   Sequence number: 16 bits
      The sequence number increments by one for each RTP data packet
      sent, and may be used by the receiver to detect packet loss and
      to restore packet sequence. This field is detailed further in
      [3].

   Timestamp: 32 bits
      A timestamp representing the sampling time of the first sample of
      the first Vorbis packet in the RTP packet.  The clock frequency 
      MUST be set to the sample rate of the encoded audio data and is 
      conveyed out-of-band.

   SSRC/CSRC identifiers: 
      These two fields, 32 bits each with one SSRC field and a maximum 
      of 15 CSRC fields, are as defined in [3].  
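
   As an illustration of the field layout above, here is a minimal
   Python sketch that packs the fixed 12-octet RTP header with the
   values this section prescribes (V=2, P=X=CC=0, M=0).  The helper
   name build_rtp_header is hypothetical, and the caller must supply a
   payload type, since this draft does not assign one.

```python
import struct


def build_rtp_header(payload_type, sequence, timestamp, ssrc):
    """Pack the fixed 12-octet RTP header for a Vorbis payload.

    Hypothetical helper: V=2, P=0, X=0, CC=0 and M=0, as described
    in the text above.  No CSRC entries are appended.
    """
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type is a 7-bit field")
    version = 2
    first_octet = version << 6          # P, X and CC bits are all zero
    second_octet = payload_type & 0x7F  # marker bit (top bit) stays zero
    return struct.pack(
        "!BBHII",                       # network byte order
        first_octet,
        second_octet,
        sequence & 0xFFFF,              # 16-bit sequence number wraps
        timestamp & 0xFFFFFFFF,         # 32 bits, in sample-rate units
        ssrc & 0xFFFFFFFF,
    )
```

   With a 44100 Hz stream, a timestamp of 44100 in this sketch would
   correspond to one second of audio after timestamp 0.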

3.2 Payload Header

   The first octet of the payload data is the payload header:

     1   2   3   4   5   6   7   8
   +---+---+---+---+---+---+---+---+
   | C | F | R |  # of packets     |
   +---+---+---+---+---+---+---+---+

   C: 1 bit
      Set to one if this is a continuation of a fragmented packet.

   F: 1 bit
      Set to one if the payload contains complete packets or if it
      contains the last fragment of a fragmented packet. 

   R: 1 bit
      Reserved, must be set to zero by senders, and ignored by 
      receivers.

   The last 5 bits are the number of complete packets in this payload.  
   A 5 bit field holds values from 0 to 31, so this provides for a
   maximum of 31 Vorbis packets in the payload. If C is set to one,
   this number should be 0.
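
   The payload header octet can be sketched in Python as follows.
   This assumes the leftmost bit in the diagram (C) is the most
   significant bit of the octet, which is the usual convention but is
   not stated explicitly here.  The function names are hypothetical.

```python
def pack_payload_header(continuation, final, packet_count):
    """Build the one-octet payload header: C, F, R, then a 5-bit count."""
    if not 0 <= packet_count <= 31:
        raise ValueError("packet count must fit in 5 bits")
    octet = packet_count
    if continuation:
        octet |= 0x80   # C: continuation of a fragmented packet
    if final:
        octet |= 0x40   # F: complete packets, or last fragment
    return octet        # R (0x20) is always sent as zero


def parse_payload_header(octet):
    """Split the payload header octet; the reserved R bit is ignored."""
    return {
        "continuation": bool(octet & 0x80),
        "final": bool(octet & 0x40),
        "packet_count": octet & 0x1F,
    }
```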

3.3 Payload Data

   If the payload contains a single Vorbis packet or a Vorbis packet
   fragment, the Vorbis packet data follows the payload header.

   For payloads which consist of multiple Vorbis packets, payload data 
   consists of one octet representing the packet length followed by the 
   packet data for each of the Vorbis packets in the payload.

   The Vorbis packet length octet is the length minus one.  A value of 
   0 means a length of 1.

   The payload packing of the Vorbis data packets SHOULD follow the
   guidelines set out in section 4.4 of [5] where the oldest packet
   occurs immediately after the RTP packet header.
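
   The multi-packet layout described above can be sketched in Python
   as shown below.  The function names are hypothetical; the sketch
   covers only payloads holding several complete Vorbis packets, each
   preceded by a length-minus-one octet, with the oldest packet first.

```python
def pack_vorbis_packets(packets):
    """Build payload data for several complete Vorbis packets."""
    out = bytearray()
    for packet in packets:                 # caller supplies oldest first
        if not 1 <= len(packet) <= 256:
            raise ValueError("length octet can only describe 1..256 octets")
        out.append(len(packet) - 1)        # stored as length minus one
        out += packet
    return bytes(out)


def unpack_vorbis_packets(data, packet_count):
    """Recover packet_count Vorbis packets from multi-packet payload data."""
    packets = []
    pos = 0
    for _ in range(packet_count):
        if pos >= len(data):
            raise ValueError("payload truncated before length octet")
        length = data[pos] + 1             # a value of 0 means length 1
        pos += 1
        if pos + length > len(data):
            raise ValueError("payload truncated inside packet")
        packets.append(data[pos:pos + length])
        pos += length
    return packets
```

   Because the length octet stores the length minus one, a single octet
   covers packet sizes from 1 to 256 octets.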

3.4 Example RTP Packet

   Here is an example RTP packet containing two Vorbis packets.

   RTP Packet Header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 2 |0|0|  0    |0|      PT     |      sequence number          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 timestamp (in sample rate units)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         synchronization source (SSRC) identifier              |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |         contributing source (CSRC) identifiers                |
   |                      ...                                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload Data:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|1|0| # pks: 2|      len      |  vorbis data ...              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  ...vorbis data...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     ...       |      len      | next vorbis packet data...    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4 Frame Packetizing

   Each RTP packet contains either one complete Vorbis packet, one 
   Vorbis packet fragment, or an integer number of complete Vorbis 
   packets (up to a maximum of 31 packets, since the number of packets
   is defined by a 5 bit value).

   Any Vorbis packet that is larger than 256 octets and less than the
   path-MTU should be placed in an RTP packet by itself.

   Any Vorbis packet that is 256 octets or less should be bundled in
   the RTP packet with as many Vorbis packets as will fit, up to a
   maximum of 31.

   If a Vorbis packet will not fit into the RTP packet, it must be 
   fragmented.  A fragmented packet has a zero in the last five bits 
   of the payload header.  Each fragment after the first will also set 
   the Continued (C) bit to one in the payload header.  The RTP packet 
   containing the last fragment of the Vorbis packet will have the 
   F bit set to one.
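
   The fragmentation rule can be sketched in Python as follows.  This
   is an illustration under stated assumptions, not a definitive
   implementation: the 1500 octet budget is taken to cover only the
   12 octet fixed RTP header, the payload header and the data (as the
   text of this draft counts it), fragment data is assumed to follow
   the payload header directly as section 3.3 describes, and the
   function name is hypothetical.

```python
RTP_HEADER_LEN = 12      # fixed RTP header, assuming no CSRC entries
PAYLOAD_HEADER_LEN = 1   # the C/F/R/count octet


def fragment_vorbis_packet(packet, mtu=1500):
    """Split one oversized Vorbis packet into RTP payloads.

    Each returned item is the payload header octet followed by a slice
    of the Vorbis packet.  The packet count stays 0 in every fragment.
    """
    room = mtu - RTP_HEADER_LEN - PAYLOAD_HEADER_LEN
    if room <= 0:
        raise ValueError("MTU too small for the headers")
    if len(packet) <= room:
        raise ValueError("packet fits in one RTP packet; do not fragment")
    chunks = [packet[i:i + room] for i in range(0, len(packet), room)]
    payloads = []
    for index, chunk in enumerate(chunks):
        header = 0
        if index > 0:
            header |= 0x80              # C set on every fragment but the first
        if index == len(chunks) - 1:
            header |= 0x40              # F set on the last fragment only
        payloads.append(bytes([header]) + chunk)
    return payloads
```

   For a 4000 octet Vorbis packet this sketch yields three payloads
   whose header bits follow the first/middle/last pattern described
   above.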

4.1 Example Fragmented Vorbis Packet

   Here is an example fragmented Vorbis packet split over three RTP
   packets.

   RTP packet header details have been excluded from this example.

   Packet 1:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0|0|        0|      len      |  vorbis data ...              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  ...vorbis data...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The number of packets field is set to 0.

   Packet 2:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|0|0|        0|      len      |  vorbis data ...              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  ...vorbis data...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The C bit is set to 1 and the number of packets field is set to 0.
   For large Vorbis packets there can be several payload packets of
   this type.  The maximum packet size should be no greater than the
   MTU of 1500 octets, including all RTP and payload headers.

   Packet 3:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|1|0|        0|      len      |  vorbis data ...              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  ...vorbis data...                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This is the last Vorbis fragment packet.  The C and F bits are 
   set and the packet count remains set to 0.

5 Codebooks

   To decode a Vorbis stream, a set of codebooks is required.  These
   codebooks are allowed to change for each logical bitstream (for
   example, for each song encoded in a radio stream).

   The codebooks must be completely intact; a client cannot decode
   a stream with an incomplete or corrupted set.

   A client connecting to a multicast RTP Vorbis session needs to get 
   the first set of codebooks in some manner.  These codebooks are
   typically between 4 kilobytes and 8 kilobytes in size.

   On joining a session the first packet sent MUST be a Vorbis 
   codebook message.

   When the codebooks change, a new set is sent as an SR just prior to 
   the Vorbis bitstream change, as an APP defined RTCP message with 
   the 4 octet name field set to VORC.  This is the same format as
   the initial codebook packet.

   Codebook RTCP packets MUST set the padding (P) flag and add the
   appropriate padding octets needed to conform with section 6.6 
   of [3].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P| subtype |   PT=APP=204  |             length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SSRC/CSRC                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              VORC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     codebook checksum         |             codebook        ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   A 16 bit one's complement checksum of the codebook precedes the 
   codebook datablock.  This checksum is used to detect a corrupted 
   codebook.  If a checksum failure is detected, an empty RR RTCP 
   message, of APP type with the 4 octet name field set to VORR, is
   sent from the client.  Transmission of the codebook back to the
   client SHOULD be handled as a unicast delivery to prevent a 
   rogue client from generating an excessive number of codebook
   requests within a multicast stream.  However, multicast
   transmission of codebook request replies SHOULD be catered for at
   the application level.
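
   The checksum test can be sketched in Python as shown below.  The
   draft does not spell out the algorithm, so this sketch assumes the
   familiar Internet checksum construction (the one's complement of
   the one's complement sum of 16 bit big-endian words, with an odd
   trailing octet padded with zero).  Both function names are
   hypothetical.

```python
def ones_complement_checksum(data):
    """16-bit one's complement checksum (assumed Internet-checksum style)."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length input with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF


def codebook_is_intact(checksum, codebook):
    """Return True if the received checksum matches the codebook data."""
    return ones_complement_checksum(codebook) == checksum
```

   A client following this sketch would send the VORR request
   described above whenever codebook_is_intact returns False.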
   

6 Security Considerations

   RTP packets using this payload format are subject to the security 
   considerations discussed in the RTP specification [3].  This implies 
   that the confidentiality of the media stream is achieved by using
   encryption.  Because the data compression used with this payload
   format is applied end-to-end, encryption may be performed on the 
   compressed data.

7 Acknowledgments

   This I-D is a continuation of draft-moffitt-vorbis-rtp-00.txt.   

   Thanks to the Ogg Vorbis Community and to the Xiph.org team,
   especially Jack Moffitt <jack at xiph.org>.

8 References

   1. Key words for use in RFCs to Indicate Requirement Levels 
      (RFC 2119).

   2. libvorbis: Available from the Xiph website, http://www.xiph.org

   3. RTP: A Transport Protocol for Real-Time Applications (RFC 1889).
  
   4. RTP: A transport protocol for real-time applications. Work   
      in progress, draft-ietf-avt-rtp-new-11.txt.

   5. RTP Profile for Audio and Video Conferences with Minimal Control. 
      Work in progress, draft-ietf-avt-profile-new-12.txt.

9 Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

10 Authors Address

   Phil Kerr
   Centre for Music Technology
   University of Glasgow
   email: philkerr at elec.gla.ac.uk
   WWW: http://www.xiph.org/
