[vorbis-dev] Updated Vorbis RTP I-D

Phil Kerr philkerr at elec.gla.ac.uk
Mon Feb 17 07:40:55 PST 2003

Hi all,

Below is a final draft of the updated Vorbis RTP Internet Draft which
I'll send to the IETF in a few days.  The changes include:

Added IANA MIME type section
Redesigned setup, codebook and comment metadata packet, fixing bugs
Added SDP section
Added congestion section
Extended acknowledgments section
Various textual tweaks

Thanks to everyone who has contributed; of course, feedback is welcome.




Network Working Group                                          Phil Kerr
Internet-Draft		 			  Ogg Vorbis Community /
February 20, 2003                                              OpenDrama
Expires: August 20, 2003      

                RTP Payload Format for Vorbis Encoded Audio


Status of this Memo

   This document is an Internet-Draft and is in full conformance
   with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as
   "work in progress".

   The list of current Internet-Drafts can be accessed at 

   The list of Internet-Draft Shadow Directories can be accessed at

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document describes an RTP payload format for transporting
   Vorbis encoded audio.  It details the encapsulation mechanism for
   raw Vorbis data and the delivery mechanism for the decoder
   probability model, referred to as a codebook, and other decoder
   setup information.

Kerr                        Expires August 20, 2003             [Page 1]
Internet Draft      draft-kerr-avt-rtp-vorbis-01.txt   February 20, 2003

Table of Contents

   1.         Introduction
   2.         Payload Format
   2.1        RTP Header
   2.2        Payload Header
   2.3        Payload Data
   2.4        Example RTP Packet
   3.         Frame Packetizing
   3.1        Example Fragmented Vorbis Packet
   4.         IANA Considerations
   5.         Configuration headers
   6.         Session Description
   7.         Congestion Control
   8.         Security Considerations
   9.         Acknowledgments
   10.        References
   11.        Full Copyright Statement
   12.        Author's Address

1 Introduction

   The Xiph.org Foundation creates and defines codecs for use in 
   multimedia that are not encumbered by patents and thus may be freely 
   implemented by any individual or organization.

   Vorbis is the general purpose multi-channel audio codec created by 
   the Xiph.org Foundation.

   Vorbis encoded audio is generally encapsulated within an Ogg format 
   bitstream, which provides framing and synchronization.  For the 
   purposes of RTP transport, this layer is unnecessary, and so raw 
   Vorbis packets are used in the payload.

   Vorbis packets are currently unbounded in length.  At some future
   point a practical limit will likely be placed on packet length.

   Typical Vorbis packet sizes range from very small (2-3 bytes) to
   quite large (8-12 kilobytes).  The reference implementation [2]
   typically produces packets of less than ~800 bytes, except for the
   header packets, which are ~4-12 kilobytes.

   Within an RTP context the maximum Vorbis packet size SHOULD be kept
   below the MTU size of 1500 octets, including the RTP and payload
   headers, to avoid fragmentation.  For the delivery of Vorbis audio
   using RTP the maximum size of the header block is limited to 64K.
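   As a rough illustration of the 1500 octet budget, the following
   Python sketch computes how many octets remain for Vorbis data in
   one packet.  It assumes IPv4 without options (20 octets) and UDP
   (8 octets) underneath RTP, which this draft does not specify,
   together with the 12 octet RTP fixed header and the 1 octet payload
   header.  The helper name vorbis_budget is hypothetical.

```python
# Hypothetical helper: octets left for Vorbis data in one RTP packet.
# Assumes IPv4 without options and UDP; neither is mandated by this draft.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_FIXED_HEADER = 12   # V/P/X/CC, M/PT, sequence, timestamp, SSRC
PAYLOAD_HEADER = 1      # C, F, R bits plus the 5-bit packet count


def vorbis_budget(mtu: int = 1500, csrc_count: int = 0) -> int:
    """Return the maximum number of Vorbis octets that fit in one packet."""
    overhead = (IPV4_HEADER + UDP_HEADER + RTP_FIXED_HEADER
                + 4 * csrc_count          # each CSRC identifier is 32 bits
                + PAYLOAD_HEADER)
    return mtu - overhead


print(vorbis_budget())  # prints 1459
```

   If the transport differs (IPv6, or IP options), only the constants
   change; the subtraction stays the same.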

2 Payload Format

   The standard RTP header is followed by an 8 bit payload header, 
   then the payload data.

2.1 RTP Header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   |                           timestamp                           |
   |         synchronization source (SSRC) identifier              |
   |         contributing source (CSRC) identifiers                |
   |                      ...                                      |

   The RTP header begins with an octet of fields (V, P, X, and CC) to   
   support specialized RTP uses (see [4] and [5] for details). For 
   Vorbis RTP applications, V is set to 2, and the P, X, and CC fields 
   are set to 0. 

   Marker (M): 1 bit
      Set to zero.  Audio silence suppression not used.  This conforms
      to section 4.1 of [6].

   Payload Type (PT): 7 bits
      An RTP profile for a class of applications is expected to assign 
      a payload type for this format, or a dynamically allocated 
      payload type should be chosen which designates the payload as
      Vorbis.

   Sequence number: 16 bits
      The sequence number increments by one for each RTP data packet
      sent, and may be used by the receiver to detect packet loss and
      to restore packet sequence. This field is detailed further in

   Timestamp: 32 bits
      A timestamp representing the sampling time of the first sample of
      the first Vorbis packet in the RTP packet.  The clock frequency 
      MUST be set to the sample rate of the encoded audio data and is 
      conveyed out-of-band.

   SSRC/CSRC identifiers:
      These two fields, 32 bits each, with one SSRC field and a maximum
      of 15 CSRC fields (the CC count is 4 bits), are as defined in [3].
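   The fixed header described above can be assembled with Python's
   struct module.  This is a minimal sketch, assuming network
   (big-endian) byte order as in [3] and no CSRC list; pack_rtp_header
   is a hypothetical helper, and payload type 98 is simply the dynamic
   value used in the SDP example of section 6.

```python
import struct


def pack_rtp_header(payload_type: int, seq: int, timestamp: int,
                    ssrc: int) -> bytes:
    """Build a 12-octet RTP fixed header for a Vorbis payload.

    V=2 with P, X and CC all zero; the marker bit M is always zero
    because silence suppression is not used.
    """
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("PT is a 7-bit field")
    first_octet = 2 << 6            # V=2, P=0, X=0, CC=0
    second_octet = payload_type     # M=0 occupies the top bit
    return struct.pack("!BBHII", first_octet, second_octet,
                       seq & 0xFFFF,               # wraps at 16 bits
                       timestamp & 0xFFFFFFFF,     # sample-rate units
                       ssrc & 0xFFFFFFFF)


header = pack_rtp_header(98, seq=1, timestamp=44100, ssrc=0x1234)
print(header.hex())  # prints 806200010000ac4400001234
```

   Because the clock runs at the sample rate, a timestamp of 44100 in
   a 44.1 kHz stream marks a point one second after timestamp zero.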

2.2 Payload Header

   The first octet of the payload data is the payload header:

     1   2   3   4   5   6   7   8
   | C | F | R |  # of packets     |

   C: 1 bit
      Set to one if this is a continuation of a fragmented packet.

   F: 1 bit
      Set to one if the payload contains complete packets or if it
      contains the last fragment of a fragmented packet. 

   R: 1 bit
      Reserved, must be set to zero by senders, and ignored by
      receivers.

   The last 5 bits are the number of complete packets in this payload.
   This provides for a maximum of 31 Vorbis packets in the payload.
   If the payload carries a Vorbis packet fragment, including the
   first fragment where C is zero, this number should be 0.
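   A minimal sketch of encoding and decoding the payload header octet
   follows, assuming C is the most significant bit as drawn above.
   The function names are hypothetical.  With C=0, F=1 and a count of
   two it yields 0x42, the first octet of the example in section 2.4.

```python
from typing import Tuple


def pack_payload_header(continuation: bool, final: bool, count: int) -> int:
    """Encode C, F, R (always 0) and the 5-bit packet count."""
    if not 0 <= count <= 0x1F:
        raise ValueError("the packet count is a 5-bit field (0-31)")
    return (int(continuation) << 7) | (int(final) << 6) | count


def unpack_payload_header(octet: int) -> Tuple[bool, bool, int]:
    """Decode (C, F, count); the reserved R bit is ignored."""
    return bool(octet & 0x80), bool(octet & 0x40), octet & 0x1F


print(hex(pack_payload_header(False, True, 2)))  # prints 0x42
```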

2.3 Payload Data

   If the payload contains a single Vorbis packet or a Vorbis packet
   fragment, the Vorbis packet data follows the payload header.

   For payloads which consist of multiple Vorbis packets, payload data 
   consists of one octet representing the packet length followed by the 
   packet data for each of the Vorbis packets in the payload.

   The Vorbis packet length octet is the length of the data block 
   minus one.   

   The payload packing of the Vorbis data packets SHOULD follow the
   guidelines set out in section 4.4 of [5], where the oldest packet
   occurs immediately after the RTP packet header.
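   The packing rules of this section can be sketched as follows.  This
   is an illustration, not a reference implementation: a lone packet
   is copied directly after the payload header, while two or more
   packets are each preceded by a length octet holding the length
   minus one, so each bundled packet must be 1 to 256 octets.  The
   helper names are hypothetical.  The decoder checks every length
   against the buffer, which is the kind of care the security
   considerations ask for.

```python
from typing import List


def pack_payload(packets: List[bytes]) -> bytes:
    """Build a payload of complete Vorbis packets, oldest first (F=1, C=0)."""
    if not 1 <= len(packets) <= 31:
        raise ValueError("a payload carries 1-31 complete packets")
    out = bytearray([0x40 | len(packets)])      # C=0, F=1, R=0, count
    if len(packets) == 1:
        return bytes(out + packets[0])          # data follows the header
    for pkt in packets:
        if not 1 <= len(pkt) <= 256:
            raise ValueError("bundled packets must be 1-256 octets")
        out.append(len(pkt) - 1)                # length octet = length - 1
        out += pkt
    return bytes(out)


def unpack_payload(payload: bytes) -> List[bytes]:
    """Split a payload of complete packets, bounds-checking each length."""
    count = payload[0] & 0x1F
    if count == 0:
        raise ValueError("fragment payload; reassemble it instead")
    if count == 1:
        return [payload[1:]]
    packets, pos = [], 1
    for _ in range(count):
        if pos >= len(payload):
            raise ValueError("truncated payload: missing length octet")
        length = payload[pos] + 1
        pos += 1
        if pos + length > len(payload):
            raise ValueError("truncated payload: length exceeds buffer")
        packets.append(payload[pos:pos + length])
        pos += length
    return packets
```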

   Channel mapping of the audio is in accordance with ITU-R BS.775-1.

2.4 Example RTP Packet

   Here is an example RTP packet containing two Vorbis packets.

   RTP Packet Header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   | 2 |0|0|  0    |0|      PT     |       sequence number         |
   |                 timestamp (in sample rate units)              |
   |          synchronization source (SSRC) identifier             |
   |            contributing source (CSRC) identifiers             |
   |                      ...                                      |

   Payload Data:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |0|1|0| # pks: 2|      len      |         vorbis data ...       |
   |                      ...vorbis data...                        |
   |     ...       |      len      |   next vorbis packet data...  |

3 Frame Packetizing

   Each RTP packet contains either one complete Vorbis packet, one
   Vorbis packet fragment, or an integer number of complete Vorbis
   packets (up to a maximum of 31 packets, since the number of packets
   is defined by a 5-bit value).

   Any Vorbis packet that is larger than 256 octets and smaller than
   the path-MTU should be placed in an RTP packet by itself.

   Any Vorbis packet that is 256 octets or less should be bundled in
   the RTP packet with as many Vorbis packets as will fit, up to a
   maximum of 31.  The 256 octet threshold follows from the length
   octet, which holds the packet length minus one.
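   The bundling guidelines above amount to a greedy grouping pass.  A
   minimal sketch, assuming max_data is the number of octets available
   after the payload header and charging one length octet per packet;
   group_packets is a hypothetical name, and packets too large for
   max_data are left for the fragmentation rule that follows.

```python
from typing import List


def group_packets(packets: List[bytes], max_data: int) -> List[List[bytes]]:
    """Group Vorbis packets, oldest first, into per-payload bundles."""
    groups: List[List[bytes]] = []
    current: List[bytes] = []
    used = 0
    for pkt in packets:
        if len(pkt) > 256:
            # Larger packets travel alone; oversized ones are fragmented later.
            if current:
                groups.append(current)
                current, used = [], 0
            groups.append([pkt])
            continue
        cost = 1 + len(pkt)                 # length octet plus packet data
        if current and (used + cost > max_data or len(current) == 31):
            groups.append(current)          # payload full: start a new one
            current, used = [], 0
        current.append(pkt)
        used += cost
    if current:
        groups.append(current)
    return groups
```

   Charging a length octet even for a packet that ends up alone is
   slightly conservative but keeps the loop simple.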

   If a Vorbis packet will not fit into the RTP packet, it must be
   fragmented.  A fragmented packet has a zero in the last five bits
   of the payload header.  Each fragment after the first will also set
   the Continued (C) bit to one in the payload header.  The RTP packet
   containing the last fragment of the Vorbis packet will have the
   F bit set to one.
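   The fragmentation rule can be sketched as follows.  This sketch
   follows the text of section 2.3, which places fragment data
   directly after the payload header with no length octet; max_data
   is an assumed per-payload limit and fragment is a hypothetical
   name.

```python
from typing import List


def fragment(vorbis_packet: bytes, max_data: int) -> List[bytes]:
    """Split one oversized Vorbis packet into RTP payloads.

    First fragment: C=0, F=0; middle fragments: C=1, F=0;
    last fragment: C=1, F=1.  The packet count is always 0.
    """
    if max_data < 1:
        raise ValueError("max_data must be positive")
    if len(vorbis_packet) <= max_data:
        raise ValueError("packet fits in one payload; do not fragment")
    chunks = [vorbis_packet[i:i + max_data]
              for i in range(0, len(vorbis_packet), max_data)]
    payloads = []
    for index, chunk in enumerate(chunks):
        c_bit = 0x80 if index > 0 else 0
        f_bit = 0x40 if index == len(chunks) - 1 else 0
        payloads.append(bytes([c_bit | f_bit]) + chunk)
    return payloads
```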

3.1 Example Fragmented Vorbis Packet

   Here is an example fragmented Vorbis packet split over three RTP
   packets.

   RTP packet header details have been excluded from this example.

   Packet 1:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |0|0|0|        0|      len      |         vorbis data ...       |
   |                       ..vorbis data..                         |

   The number of packets field is set to 0.

   Packet 2:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |1|0|0|        0|      len      |         vorbis data ...       |
   |                       ..vorbis data..                         |

   The C bit is set to 1 and the number of packets field is set to 0.
   For large Vorbis packets there can be several payloads of this
   type.  The maximum packet size should be no greater than the MTU
   of 1500 octets, including all RTP and payload headers.

   Packet 3:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |1|1|0|        0|      len      |         vorbis data ..        |
   |                       ..vorbis data..                         |

   This is the last Vorbis fragment packet.  The C and F bits are 
   set and the packet count remains set to 0.
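   On the receiving side, the C and F bits are enough to rebuild a
   fragmented packet.  A minimal sketch, assuming payloads are
   delivered in sequence-number order and that the caller invokes
   reset() when the RTP sequence number reveals a loss, in which case
   the partial packet is discarded; the class name Reassembler is
   hypothetical.

```python
from typing import List, Optional


class Reassembler:
    """Rebuild one fragmented Vorbis packet at a time from its payloads."""

    def __init__(self) -> None:
        self._parts: Optional[List[bytes]] = None

    def push(self, payload: bytes) -> Optional[bytes]:
        """Feed one fragment payload; return the packet once F is seen."""
        header = payload[0]
        if header & 0x1F:
            raise ValueError("payload carries complete packets, not a fragment")
        continuation, final = header & 0x80, header & 0x40
        data = payload[1:]
        if not continuation:
            self._parts = [data]            # first fragment starts a packet
        elif self._parts is None:
            return None                     # start was lost: drop the rest
        else:
            self._parts.append(data)
        if final:
            packet = b"".join(self._parts)
            self._parts = None
            return packet
        return None

    def reset(self) -> None:
        """Discard any partial packet, e.g. after a sequence-number gap."""
        self._parts = None
```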

4 IANA Considerations

   Media MIME type name: audio

   Media MIME subtype name: vorbis

   Required Parameters: none

   Optional Parameters: none

5 Configuration headers

   To decode a Vorbis stream three configuration header information 
   blocks are needed.  This data is sent out-of-band and is defined 
   below as an APP defined RTCP message with the 4 octet name field 
   set to VORB. 

   On joining a session the first packet sent back to the client
   MUST be a Vorbis message containing the codec setup and codebook
   headers.

   VORB RTCP packets MUST set the padding (P) flag and add the
   appropriate padding octets needed to conform with section 6.6
   of [3].  Synchronising the configuration headers to the RTP stream
   is critical.  A 32 bit timestamp field is used to indicate the
   timepoint when a VORB header MUST be applied to the RTP stream.
   VORB RTCP packets MUST be sent just ahead of the change in the RTP
   stream.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |V=2|P| subtype |   PT=APP=204  |             Length            |
   |                           SSRC/CSRC                           |
   |                             VORB                              |
   |                 Timestamp (in sample rate units)              |
   |                        Vorbis Version                         |
   |                       Audio Sample Rate                       |
   |                        Bitrate Maximum                        |
   |                        Bitrate Nominal                        |
   |                        Bitrate Minimum                        |
   | bsz 0 | bsz 1 |       Num Audio Channels      |c|m|o|x|x|x|x|x|
<p>   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |     Codebook length           |      Codebook checksum        |
   ..                          Codebook                            |
   |                      Vendor string length                     |
   |                         Vendor string                        ..
   |                    User comments list length                  |
   ..               User comment length / User comment             |
   ..                          URI string                          |

   The first Vorbis config header defines the Vorbis stream 
   attributes.  The Vorbis version MUST be set to zero to comply with
   this document.  The fields Sample Rate up to Num Audio Channels 
   are set in accordance with [6] with the bsz fields above referring
   to the blocksize parameters.  The framing bit is not used for RTP
   transportation and so applications constructing Vorbis files MUST
   take care to set this if required.

   The next 8 bits are used to indicate the presence of the two 
   other Vorbis stream config headers and the size overflow header.

   The c flag indicates the presence of a Codebook header block, the
   m flag indicates the presence of a comment metadata block.  The o
   flag indicates if the size of either of the c and m headers would
   make the VORB packet greater than that allowed for an RTCP message.

   The remaining five bits, indicated with an x, are reserved/unused
   and MUST be set to 0.
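   The fixed part of a VORB packet can be assembled as below.  This is
   a sketch under several assumptions the draft leaves open: all
   fields are in network byte order, the APP subtype is 0, bsz 0 and
   bsz 1 carry the 4-bit blocksize exponents from the Vorbis
   identification header [6], the bitrate fields are signed as in
   [6], and body already holds any encoded codebook, comment or URI
   blocks.  Padding follows the RTCP convention in [3], where the last
   padding octet holds the padding count, so at least one octet is
   always added when P is set.  build_vorb_packet is a hypothetical
   name.

```python
import struct

RTCP_APP = 204  # packet type for application-defined RTCP packets


def build_vorb_packet(ssrc: int, timestamp: int, sample_rate: int,
                      bitrate_max: int, bitrate_nominal: int,
                      bitrate_min: int, bsz0: int, bsz1: int,
                      channels: int, has_codebook: bool,
                      has_comments: bool, overflow: bool,
                      body: bytes = b"") -> bytes:
    """Assemble a VORB RTCP APP packet as laid out in section 5."""
    flags = (has_codebook << 7) | (has_comments << 6) | (overflow << 5)
    payload = struct.pack(
        "!I4sIIIiiiBHB",
        ssrc, b"VORB", timestamp,
        0,                                  # Vorbis version MUST be zero
        sample_rate,
        bitrate_max, bitrate_nominal, bitrate_min,
        (bsz0 << 4) | bsz1,                 # two 4-bit blocksize fields
        channels,                           # 16-bit channel count
        flags)                              # c, m, o, then 5 zero bits
    payload += body                         # codebook / comments / URI
    unpadded = 4 + len(payload)             # include the 4-octet RTCP header
    pad = 4 - unpadded % 4                  # 1-4 octets; never 0 when P=1
    payload += bytes(pad - 1) + bytes([pad])
    length_words = (unpadded + pad) // 4 - 1    # RTCP length: words - 1
    first_octet = (2 << 6) | (1 << 5)       # V=2, P=1, subtype=0
    return struct.pack("!BBH", first_octet, RTCP_APP, length_words) + payload
```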

   If the c flag is set then the next header block will contain the
   codebook configuration data.  Unlike other mainstream audio codecs,
   Vorbis has no statically configured probability model; instead, it
   packs all entropy decoding configuration, VQ and Huffman models
   into a self-contained codebook.  This codebook block also requires
   additional identification information detailing the number of audio
   channels, bit rates and other information used to initialise the
   Vorbis stream.

   This setup information MUST be completely intact; a client cannot
   decode a stream with an incomplete or corrupted codebook set.

   A 16 bit codebook length field and a 16 bit 1's complement checksum
   of the codebook precede the codebook data block.  The length field
   allows for codebooks of up to 64k in size.  The checksum is used to
   detect a corrupted codebook.  If a checksum failure is detected then
   a new config header file SHOULD be obtained from SDP.  If no SDP
   value is set and no other method for obtaining the config headers
   exists then this is considered to be a failure and should be
   reported to the client application.
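   The draft does not pin down the exact checksum algorithm or byte
   order.  A plausible reading, assumed here, is the 16-bit Internet
   checksum of RFC 1071: the one's complement of the one's complement
   sum of the codebook taken as big-endian 16-bit words, with an odd
   final octet padded with zero.  The function name is hypothetical.

```python
def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd length with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF
```

   A receiver would compute this over the received codebook and, on a
   mismatch with the transmitted value, fall back to fetching the
   header file as described above.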

   If the m flag is set then the next header block will contain the 
   comment metadata, such as artist name, track title and so on.  These
   metadata messages are not intended to be fully descriptive but to 
   offer basic track/song information.  This message MUST be sent at 
   the start of the stream, together with the setup and codebook 
   headers, even if it contains no information.  During a session the
   metadata associated with the stream may change from that specified 
   at the start, e.g. a live concert broadcast changing acts/scenes, so
   clients MUST have the ability to receive m header blocks.  Details
   on the format of the comments can be found in the Vorbis 
   documentation [7].

   The format for the data takes the form of a 32 bit codec vendor's
   name length field followed by the name encoded in UTF-8.  The next
   field denotes the number of user comments, followed by the user
   comment length and text field pairs, up to the number indicated by
   the user comment list length.
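   A sketch of serialising and parsing this block follows.  The draft
   does not state a byte order for the 32-bit length fields;
   little-endian is assumed here because that is what the Vorbis
   comment header in [7] uses.  The helper names are hypothetical, and
   entries such as "ARTIST=..." follow the FIELD=value convention
   described in [7].

```python
import struct
from typing import List, Tuple


def pack_comments(vendor: str, comments: List[str]) -> bytes:
    """Serialise the vendor string, comment count and comment pairs."""
    out = bytearray()
    vendor_bytes = vendor.encode("utf-8")
    out += struct.pack("<I", len(vendor_bytes)) + vendor_bytes
    out += struct.pack("<I", len(comments))      # user comment list length
    for comment in comments:
        data = comment.encode("utf-8")
        out += struct.pack("<I", len(data)) + data
    return bytes(out)


def unpack_comments(block: bytes) -> Tuple[str, List[str]]:
    """Parse the block, bounds-checking every declared length."""
    pos = 0

    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(block):
            raise ValueError("declared length runs past the end of the block")
        chunk = block[pos:pos + n]
        pos += n
        return chunk

    vendor = take(struct.unpack("<I", take(4))[0]).decode("utf-8")
    count = struct.unpack("<I", take(4))[0]
    comments = [take(struct.unpack("<I", take(4))[0]).decode("utf-8")
                for _ in range(count)]
    return vendor, comments
```

   An empty comment list still produces a valid block, which suits the
   requirement to send this message even when it carries no
   information.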

   The framing bit is not used for RTP transportation and so 
   applications constructing Vorbis files MUST take care to set 
   this if required.

   If the o, overflow, bit is set then the URI of a whole header block
   is specified in an overflow URI field, which is a null terminated 
   UTF-8 string.  The header file specified at the URI MUST NOT have 
   the overflow flag set, otherwise a loop condition will occur. If 
   SDP information is available then the URI value set there MUST take
   precedence.
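   When the o flag is set, a receiver has to pick the header-file URI
   and guard against loops.  A minimal sketch, assuming a URI supplied
   via SDP is preferred and that the fetched header's flag octet has
   already been extracted; both function names are hypothetical, and
   the o flag is taken to be bit 0x20 of that octet (third from the
   top, after c and m).

```python
from typing import Optional


def overflow_uri(field: bytes, sdp_uri: Optional[str] = None) -> str:
    """Choose the URI from which to fetch the complete header blocks."""
    if sdp_uri:                          # prefer the SDP-supplied value
        return sdp_uri
    end = field.find(b"\x00")
    if end < 0:
        raise ValueError("overflow URI is not NUL terminated")
    return field[:end].decode("utf-8")


def check_fetched_flags(flag_octet: int) -> None:
    """Reject a fetched header that itself sets o, which would loop."""
    if flag_octet & 0x20:
        raise ValueError("fetched header has the overflow flag set")
```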

6 Session Description for Vorbis RTP Streams

   Session description information concerning the Vorbis stream 
   SHOULD be provided if possible and must be in accordance with 
   [8].  The contents of the Vorbis Header file referred to in the 
   u attribute must contain all three of the config header blocks 
   as specified above.  The overflow bit of the header packet must 
   not be set.

   u=<URI of Vorbis header file>
   m=audio <port> RTP/AVP 98
   c=IN IP4 <URI of Vorbis stream>
   a=rtpmap:98 vorbis/<sample rate>

   The port value is specified by the server application bound to
   the URI specified in the c attribute.  The clock rate value
   specified in the a=rtpmap attribute MUST match the Vorbis sample
   rate value.
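   The session description lines above can be generated mechanically.
   This is a sketch only: it emits just the four lines shown, not a
   complete SDP description (which per [8] also needs v=, o=, s= and
   t= lines), uses the CRLF line endings [8] specifies, and passes the
   c= value through unchanged.  The function name and example values
   are hypothetical; 96-127 is the dynamic payload type range.

```python
def vorbis_sdp_lines(header_uri: str, port: int, connection: str,
                     sample_rate: int, payload_type: int = 98) -> str:
    """Render the u=, m=, c= and a=rtpmap lines for one Vorbis stream."""
    if not 96 <= payload_type <= 127:
        raise ValueError("expected a dynamic payload type (96-127)")
    lines = [
        f"u={header_uri}",
        f"m=audio {port} RTP/AVP {payload_type}",
        f"c=IN IP4 {connection}",
        # The rtpmap clock rate must equal the Vorbis sample rate.
        f"a=rtpmap:{payload_type} vorbis/{sample_rate}",
    ]
    return "\r\n".join(lines) + "\r\n"
```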

7 Congestion Control

   Vorbis clients SHOULD send regular receiver reports detailing 
   congestion.  A mechanism for dynamically downgrading the stream, 
   known as bitrate peeling, will allow for a graceful backing off
   of the stream bitrate.  This feature is not available at present
   so an alternative would be to redirect the client to a lower 
   bitrate stream if one is available. 

8 Security Considerations

   RTP packets using this payload format are subject to the security 
   considerations discussed in the RTP specification [3].  This implies 
   that the confidentiality of the media stream is achieved by using
   encryption.  Because the data compression used with this payload
   format is applied end-to-end, encryption may be performed on the 
   compressed data.  Where the size of a data block is specified, care
   must be taken to prevent buffer overflows in client applications.

9 Acknowledgments

   This I-D is a continuation of draft-moffitt-vorbis-rtp-00.txt.  The
   MIME type section is a continuation of
   draft-short-avt-rtp-vorbis-mime-00.txt.

   Thanks to the AVT and Ogg Vorbis communities and the Xiph.org team,
   including Steve Casner, Ralph Giles, Tor-Einar Jarnbjo, John
   Lazzaro, Jack Moffitt, Colin Perkins, Barry Short and Mike Smith.

10 References

   1. Key words for use in RFCs to Indicate Requirement Levels
      (RFC 2119), S. Bradner.

   2. libvorbis: Available from the Xiph website, http://www.xiph.org

   3. RTP: A Transport Protocol for Real-Time Applications (RFC 1889),
      H. Schulzrinne, et al.

   4. RTP: A Transport Protocol for Real-Time Applications.  Work in
      progress, draft-ietf-avt-rtp-new-11.txt.

   5. RTP Profile for Audio and Video Conferences with Minimal Control.
      Work in progress, draft-ietf-avt-profile-new-12.txt.

   6. Ogg Vorbis I spec: Codec setup and packet decode.

   7. Ogg Vorbis I spec: Comment field and header specification.

   8. SDP: Session Description Protocol (RFC 2327), M. Handley and
      V. Jacobson.

11 Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

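   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.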

12 Author's Address

   Phil Kerr
   Centre for Music Technology
   University of Glasgow
   Glasgow, Scotland
   UK, G12 8LT
   Phone: +44 141 330 5740
   Email: philkerr at elec.gla.ac.uk
   WWW: http://www.xiph.org/
