[cvs-annodex] commit (/annodex): standards/draft-pfeiffer-cmml-current.xml

silvia nobody at lists.annodex.net
Tue Feb 15 04:48:47 EST 2005


Update of /annodex (new revision 897)

Modified files:
   standards/draft-pfeiffer-cmml-current.xml

Log Message:
Fixed up the encoding side of CMML into Annodex. Yay!
Still missing:
- decoding Annodex to CMML
- additional new tags for CMML 2.1 (encoding hints)
- moving back from v3 to v2 for I-D submission




Modified: standards/draft-pfeiffer-cmml-current.xml
===================================================================
--- standards/draft-pfeiffer-cmml-current.xml	2005-02-14 16:37:18 UTC (rev 896)
+++ standards/draft-pfeiffer-cmml-current.xml	2005-02-14 17:48:46 UTC (rev 897)
@@ -660,7 +660,8 @@
       time of 350 seconds is to be included 50 seconds into the
       Annodex bitstream.  If no basetime (or no stream tag) is given,
       the basetime defaults to 0 npt. The basetime can be given as a
-      SMPTE or NPT time, but not as a utc time.
+      SMPTE or NPT time, or as a rational number as in 5/1300, but
+      not as a utc time.
       </t>
 
       <t>The "utc" attribute associates a calendar date and a
@@ -1300,13 +1301,28 @@
       <t>CMML is serialised by having some initial header pages that
       set up the CMML decoding environment, and contain header type
       information. The content of a CMML bitstream then consists of
-      "clip" tags.
+      "clip" tags. The "stream" tag is not represented in the CMML
+      bitstream as it controls the authoring of the bitstream that is
+      created by interleaving the CMML with the media streams listed
+      in the "stream" tag. Its information is meant to be stored in the
+      encapsulation format.
       </t>
 
+      <t>All of the CMML bitstream information is text. As it gets
+      encoded into a binary bitstream, an encoding format has to be
+      specified. To simplify things, UTF-8 is defined as the mandatory
+      encoding format for all data in a CMML binary bitstream. Also,
+      the encoding process MUST ensure that newline characters are
+      represented as LF (or "\n" in C) only and replace any newline
+      representations that come as CR LF combinations (or "\r\n" in C)
+      with LF only.
+      </t>
+
       <section title="The format of the CMML ident header packet">
 
-	<t>The ident header packet of a logical bitstream contains all
-        information required to set up a CMML decoder. It has the
+	<t>The first header packet of a CMML logical bitstream is the
+        CMML ident header. It contains all information required to identify
+        the CMML bitstream and to set up a CMML decoder. It has the
         following format:
 	</t>
 
@@ -1321,23 +1337,11 @@
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Version major                 | Version minor                 | 8-11
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-   | Granulerate numerator                                         | 12-15
-   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-   |                                                               | 16-19
-   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-   | Granulerate denominator                                       | 20-23
-   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-   |                                                               | 24-27
-   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-   | Granuleshift  |                                                 28
-   +-+-+-+-+-+-+-+-+
+   | ...
 
 	    ]]></artwork>
         </figure>
 
-	<t>Fields with more than one byte length are encoded LSB
-	  (least significant byte) first.
-	</t>
  
         <t>The fields in an CMML ident header packet have the following
         meaning:
@@ -1365,44 +1369,33 @@
           minor version number of the CMML format
           bitstream.
           </t>
-	  <t>Granule rate numerator &amp; denominater: 8 Byte integer
-	  number each. They represent the temporal resolution of the
-	  logical bitstream in Hz given as a rational number in the
-	  same way as the fishead basetime field above.
-	  </t>
-          <t>Granuleshift: a 1 Byte integer number describing whether to
-          partition the granule_position into two for that logical
-          bitstream, and how many of the lower bits to use for the
-          partitioning. The upper bits then still signify a
-          time-continuous granule positions for a directly decodable
-          and presentable data granule. The lower bits allow for
-          specification of a finer resolution such that for example
-          predicted frames of a video can be addressed as well, though
-          not decoded without tracing back to the last fully decodable
-          data granule. This is e.g. the case with Ogg theora.</t>
         </list>
 
+        <t>When encapsulating a CMML bitstream, more fields may be added
+        to this header as required by the encapsulation or exchange format.
+        </t>
+
       </section>
 
       <section title="The format of the CMML secondary headers">
 
 	<t>The CMML secondary headers are a sequence of
-        two packets that contain the CMML "setup" information and
-        are getting mapped into (at least) two Ogg pages:
+        two packets that contain the CMML and XML "setup" information:
           <list typs="symbols">
-            <t>one packet with the CMML xml preamble.</t>
+            <t>one packet with the CMML xml preamble and "cmml" tag.</t>
             <t>one packet with the CMML "head" tag.</t>
           </list>
-        These packets contain textual, not binary information. All
-        characters MUST be encoded in UTF-8 as transport format.
+        These packets contain textual, not binary information.
 	</t>
 
         <t>The CMML preamble tags are all single-line tags, such as the
         xml processing instruction (<![CDATA[<?xml...>]]>) and the
         document type declaration (<![CDATA[<!DOCTYPE...>]]>).
-        The only CMML tag that is not already serialized from a
+        </t>
+
+        <t>The only CMML tag that is not already serialized from a
         CMML file is the "cmml" tag, as it encloses all the other
-        content tags. To include it into the Ogg stream, the "cmml"
+        content tags. To serialise it, the "cmml"
         start tag is transformed into a processing instruction,
         retaining all its attributes (<![CDATA[<?cmml ...>]]>), and
         the "cmml" end tag is deleted.
@@ -1430,8 +1423,9 @@
 	    ]]></artwork>
         </figure>
 
-        <t>The second CMML secondary header packet has the following
-           format.
+        <t>The second CMML secondary header packet contains the
+        CMML head element with all its attributes and
+        contained elements and has the following format.
         </t>
 
 	<figure>
@@ -1439,7 +1433,7 @@
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
-   | <head> ...                                                    | 0-
+   | <head ...                                                     | 0-
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | ...                                                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -1451,6 +1445,49 @@
 
       </section>
 
+      <section title="The format of the CMML data packets">
+
+        <t>The data packets of the CMML bitstream contain the
+        CMML clip elements. Their "start" and "end" attributes
+        however only exist for authoring purposes and are not
+        copied into the bitstream, but are rather represented
+        through the time mapping of the encapsulation format that
+        interleaves CMML data with data from other time-continuous
+        bitstreams. This avoids contradictory doubly represented
+        timing information. Generally the time mapping is done through
+        some timestamp representation and through the position in
+        the stream.
+        </t>
+
+        <t>A "clip" tag is encoded with all its attributes and child
+        elements (except for the "start" and "end" attributes) as a string
+        printed into a clip packet. The "clip" tag's "start" attribute tells
+        the encapsulator at what time to insert the clip packet into
+        the bitstream. If an "end" attribute is present, it leads to
+        the creation of another clip packet, unless another clip packet
+        starts on the same track beforehand. This clip packet contains
+        an empty "clip" tag, i.e. a "clip" tag without "meta", "a",
+        "img" or "desc" elements and no attribute values except for a
+        copy of the "track" attribute from the original "clip" tag.
+        </t>
+
+	<figure>
+	  <artwork><![CDATA[
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | <clip ...                                                     | 0-
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | ...                                                           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | </clip>                                                       |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+	    ]]></artwork>
+        </figure>
+
+      </section>
+
     </section>
 
 
@@ -1461,59 +1498,231 @@
 
       <section title="Media mapping for a CMML logical bitstream inside Ogg">
 
-        <t>As CMML is an authoring format for Annodex bitstreams, there
-        is a simple way to map the annotations and meta information
-        contained in a CMML instance document to the annotation
-        bitstream and header fields of an Annodex format bitstream.
-        Please be aware that some of the encoding rules given here are a MUST,
-        and others a SHOULD. As the binary header format for the annotation
-        and media bitstreams provide for an extensible list of message
-        header fields, an encoder MAY however add some or all of the
-        non-used tags in there and even add others. For this section a
-        detailed understanding of the <xref target="ANX">Annodex format
-        bitstream</xref> is necessary.
+        <t>When mapping a CMML logical bitstream into Ogg, the 
+        serialisation as described in the previous section is used as
+        a logical bitstream. The ident packet is extended by a few
+        fields that are necessary for handling the time stamping of
+        the content packets (i.e. the clips) for Ogg. Here is its format:
         </t>
 
-        <t>The "head" and "clip" tags of a CMML document are mapped as
-        codec data into the annotation bitstream of an Annodex bitstream,
-        where the "head" tag is regarded as a secondary header.
+	<figure>
+	  <artwork><![CDATA[
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | Identifier 'CMML\0\0\0\0'                                     | 0-3
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                                                               | 4-7
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | Version major                 | Version minor                 | 8-11
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | Granulerate numerator                                         | 12-15
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                                                               | 16-19
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | Granulerate denominator                                       | 20-23
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                                                               | 24-27
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | Granuleshift  |                                                 28
+   +-+-+-+-+-+-+-+-+
 
+	    ]]></artwork>
+        </figure>
 
- Thus,
-      the rest of the information in a CMML file, i.e. the "stream" tag,
-      the "cmml" tag and the preamble information, MUST be handled as 
-      binary header type information. Header type information in Annodex
-      is generally regarded as non-human readable information, therefore
-      by default language and directionality information will not be 
-      encoded. The character set used in the Annodex header fields is 
-      UTF-8, but the mandatory header fields are all covered by US-ASCII
-      code points and for the optional ones it is recommended to do the
-      same as much as possible. User defined optional message header
-      fields MUST follow the naming standard given in RFC2822.
-      </t>
-<!--
-<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
-<!DOCTYPE cmml SYSTEM "cmml.dtd">
+	<t>Fields with more than one byte length are encoded LSB
+	  (least significant byte) first.
+	</t>
 
-<cmml lang="en">
--->
+        <t>The additional fields in a CMML ident header packet for Ogg
+        have the following meaning:
+        </t>
+        <list style="numbers">
+	  <t>Granule rate numerator &amp; denominator: 8 Byte integer
+	  number each. They represent the temporal resolution of the
+	  logical bitstream in Hz given as a rational number in the
+	  same way as the fishead basetime field above.
+	  </t>
+          <t>Granuleshift: a 1 Byte integer number describing whether to
+          partition the granule_position into two for the CMML logical
+          bitstream, and how many of the lower bits to use for the
+          partitioning. The upper bits then still signify a
+          time-continuous granule position for a directly decodable
+          and presentable data granule. The lower bits allow for
+          specification of the granule position of a previous CMML
+          data packet (i.e. "clip" element), which helps to identify
+          how much backwards seeking is necessary to get to the last
+          and still active "clip" element (of the given track).
+          </t>
+        </list>
 
-        <section title="Encoding the 'stream' tag">
- 
-          <t>A CMML instance document contains in its "stream" tag
-          information that is relevant to the authoring process of Annodex
-          format bitstreams.
+        <t>A default granule rate for CMML is: 1/1000. The default
+        granule shift used is 32, which halves the granule position to
+        allow for the backwards pointer to be public.
+        </t>
+
+        <t>The ident header packet is encapsulated into the bos page of 
+        the CMML logical bitstream in Ogg. The other header packets are
+        included as secondary header packets. The content packets are
+        also included into Ogg by encapsulating them into Ogg pages and
+        providing them with the accurate offset time.
+        </t>
+
+      </section>
+
+      <section title="Using CMML to author Annodex bitstreams">
+
+        <t>As CMML contains authoring information for Annodex bitstreams,
+        a CMML instance document contains more than just the annotation
+        information necessary for the CMML logical bitstream. It also
+        contains control information to create the control section of an
+        Annodex bitstream, i.e. the skeleton bitstream with its secondary
+        header packets describing each of the contained logical bitstreams.
+        Note that we only describe the creation of Annodex Version 3.0
+        bitstreams here.
+        </t>
+
+        <t>The authoring information stems in particular from the "stream" tag
+        plus some specific information from the "cmml" tag. Generally,
+        the "stream" tag's attributes contribute to the skeleton fishead
+        packet, the "import" tag's attributes to the skeleton fisbone
+        packets of each logical bitstream, and the "cmml" tag's attributes
+        to the fisbone of the CMML logical bitstream. While the "cmml" tag
+        is represented in full as a processing instruction in the secondary
+        header packets of the CMML logical bitstream (see above), this is
+        not the case for the "stream" tag. Therefore, this section also
+        contains a description of which attributes of the "stream" tag are not
+        used inside an Annodex bitstream.
+        </t>
+
+        <section title="Creating the skeleton ident packet">
+
+          <t>The skeleton ident packet receives the "basetime" and the
+          "utc" field information from the "stream" tag.
           </t>
 
-          <t>The "stream" tag itself finds no representation in the
-	  Annodex bitstream. Rather, it contains both, information on
-	  the complete Annodex bitstream, and information on the
-	  different input documents. This is information that finds
-	  a representation in the Skeleton logical bitstream of an
-          Annodex bitstream. The second information is also used
-          during the encoding process of each media bitstream.
+          <t>"Basetime numerator &amp; denominator": if the "basetime"
+          attribute is given in a CMML instance document, it MUST be
+          represented in the skeleton ident header in the fields
+          "Basetime numerator" and "Basetime denominator". It is converted
+          from a possible NPT or SMPTE representation to a rational number
+          to be stored in these fishead fields.
 	  </t>
 
+          <t>"Presentationtime numerator &amp; denominator": if the "basetime"
+          attribute is given in a CMML instance document, it also
+          determines the presentation time of the interleaved bitstream and
+          the "Basetime numerator" and "Basetime denominator" MUST be
+          copied to the "Presentationtime numerator" and "Presentationtime
+          denominator" fields of the skeleton ident header.
+	  </t>
+
+          <t>"UTC": if the "utc" attribute is given in a CMML instance document,
+          it MUST be represented in the skeleton ident header in the "UTC" field.
+          </t>
+
+        </section>
+
+        <section title="Creating the skeleton fisbone packets">
+
+          <t>A fisbone packet for a logical bitstream is created through
+          the authoring information of an "import" tag in a CMML instance
+          document's "stream" tag. One "import" tag contains information
+          on one particular logical bitstream in the interleaved bitstream
+          and thus creates one particular skeleton fisbone packet.
+          </t>
+
+          <t>"Granulerate numerator &amp; denominator": if the "granulerate"
+          attribute is present in the "import" tag, it MUST be represented 
+          in the fisbone header for the respective media bitstream in the
+          fields "Granulerate numerator" and "Granulerate denominator".
+          The encoder MUST however ascertain that the values are sensible,
+          and if it knows the accurate granule rate for a logical bitstream,
+          override the user input with the one that was used during creation
+          of the interleaved bitstream.
+          </t>
+
+          <t>"Content-type" message header field: this attribute MUST be
+          represented in the respective skeleton fisbone packet as a message header
+          field with name "Content-type", as it signifies the MIME type
+          of the media bitstream, providing for a decoding hint. If the user
+          does not specify the "contenttype" attribute, the encoder
+          MUST provide it during the interleaving process.
+          </t>
+
+          <t>"ID" message header field: if an "id" attribute is specified
+          for an "import" tag, it SHOULD be represented in the skeleton
+	  fisbone header for the respective media bitstream as a message
+          header field with name "ID", as it signifies a short identifying 
+	  machine-readable string for the import media bitstream.
+	  </t>
+
+          <t>User specified message header fields: if "name" and "value"
+          attributes are specified in the "param" tags of the "import" tag,
+          these MAY be represented in the skeleton fisbone packet of the respective
+          media bitstream as a message header field with the given name-value pair.
+          These fields are highly dependent on the type of media bitstream
+          handled and it therefore depends on the encoding tool to make
+          a selection of the parameters acquired. For example, an
+          audio bitstream that contains speech in a specific language may
+          be identified during CMML authoring through a param element with
+          "Content-Language" name, and acquired into the media bitstream
+           message header field of the same name.
+	  </t>
+
+        </section>
+
+        <section title="The CMML fisbone packet fields">
+
+          <t>A CMML instance document that specifies annotations in "head"
+          and "clip" elements does not get to use the "stream" tag to
+          provide encoding hints for its CMML logical bitstream. Its
+          encoding hints come from the "cmml" tag and the "encoding"
+          attribute of the xml processing directive.
+          </t>
+
+          <t>"Number of header packets": this field has a fixed value of 3
+          for the CMML specification given in this document. It counts the
+          CMML ident packet, the XML preamble packet and the head tag packet.
+          </t>
+
+          <t>"Granulerate numerator &amp; denominator": if the "granulerate"
+          attribute is present in the "cmml" tag, it MUST be represented 
+          in the fisbone header in the fields "Granulerate numerator" and
+          "Granulerate denominator". The encoder MUST however ascertain
+          that the values are sensible. The value defaults to "1/1000" if
+          it is not specified by the user.
+          </t>
+
+          <t>"Content-type" message header field: the content type for
+          the fisbone packet that describes the CMML logical bitstream is
+          fixed at "text/x-cmml" (or "text/cmml" after IANA registration
+          of the MIME type).
+          </t>
+
+          <t>"charset": if the xml processing directive contains an "encoding"
+          attribute, this MUST be represented in the CMML fisbone packet as an
+          addendum to the message header field "Content-type" as a charset. For
+          example: "Content-type: text/x-cmml; charset=UTF-8".
+          </t>
+
+          <t>"ID" message header field: if an "id" attribute is specified
+          for the "cmml" tag, it SHOULD be represented in the skeleton
+	  fisbone header for CMML as a message
+          header field with name "ID", as it signifies a short identifying
+	  machine-readable string for the CMML logical bitstream.
+	  </t>
+
+          <t>"Content-Language" and "Content-Dir" message header fields: if
+          the "lang" and "dir" attributes are given in a "cmml" tag, they
+          MUST be represented in the fisbone packet of the CMML bitstream
+          as message header fields with name "Content-Language" and "Content-Dir".
+	  </t>
+
+        </section>
+
+        <section title="Usage of the 'stream' tag">
+ 
 	  <t>Here is a list of the attribute values of the
 	  "stream" tag and how they are being used:
 	  <list>
@@ -1523,41 +1732,36 @@
 	    therefore be lost on encoding.
 	    </t>
 
-	    <t>basetime: this attribute MUST be represented in the Skeleton
-	    ident header in the fields "Basetime numerator" and "Basetime
+	    <t>basetime: this attribute maps to the skeleton
+	    ident header fields "Basetime numerator" and "Basetime
 	    denominator".
 	    </t>
 
-            <t>utc: this attribute MUST be represented in the Skeleton ident
-            header in the field "utc".</t>
+            <t>utc: this attribute maps to the skeleton ident
+            header field "UTC".</t>
 	  </list>
 	  </t>
 
 	  <t>Here is a list of the attribute values of the
 	  "import" tag and how they are being used:
 	  <list>
-	    <t>id: this attribute SHOULD be represented in the Skeleton
-	    secondary header for the respecitve media bitstream as a message
-	    header field with name "ID", as it signifies a short identifying 
-	    machine-readable string for the import media bitstream.
+	    <t>id: this attribute may be represented as a message header field
+            in the respective skeleton fisbone packet.
 	    </t>
 
 	    <t>lang, dir: not used, as these attributes signify the language 
 	    and directionality of the human readable texts in the stream tag
 	    which are not acquired into the Annodex bitstream.</t>
 
-	    <t>granulerate: this attribute MUST be represented in the Skeleton
-	    secondary header for the respective media bitstream in the
-            fields "Granule rate numerator" and "Granule
-	    rate denominator". The encoder MUST however ascertain that
-	    the values are corrected with the exact granule rate that was
-	    used during creation of the Annodex bitstream.
+	    <t>granulerate: this attribute is used in the skeleton
+	    fisbone header fields "Granule rate numerator" and "Granule
+	    rate denominator" as well as for the "Presentationtime numerator"
+            and "Presentationtime denominator".
 	    </t>
 
-            <t>contenttype: this attribute MUST be represented in the 
-	    respective Skeleton secondary header packet as a message header
-            field with name "Content-type", as it signifies the MIME type
-            of the media bitstream, providing for a decoding hint.
+            <t>contenttype: this attribute is represented in the 
+	    respective skeleton fisbone packet as a message header
+            field with name "Content-type".
             </t>
 
 	    <t>src: not used, as this attribute only points to the location
@@ -1582,126 +1786,14 @@
 	    therefore be lost on encoding.
 	    </t>
 
-	    <t>name, value: these attributes MAY be represented in the
-            Skeleton secondary header packet of the respective media bitstream
+	    <t>name, value: these attributes may be represented in the
+            skeleton fisbone packet of the respective media bitstream
             as a message header field with the given name-value pair.
-            These are highly dependent on the type of media bitstream
-            handled and it therefore depends on the encoding tool to make
-            a selection of the parameters acquired. E.g. lets regard an
-            audio bitstream containing speech in a specific language.
-            This language MAY be identified during CMML authoring as a
-            param element with "Content-Language" name, and acquired into
-            the media bitstream message header field of the same name.
 	    </t>
 	  </list>
 	  </t>
         </section>
 
-        <section title="Encoding the preamble and the 'cmml' tag">
-      
-	  <t>While the "stream" tag contained meta data on the different
-	  input media bitstreams, the preamble and the "cmml" tag contain
-	  meta data on the annotation bitstream and therefore end up in the
-	  Skeleton secondary header packet of the cmml bitstream.</t>
-
-	  <t>Here is a list of the attribute values of the preamble and
-	  how they are being acquired:
-	  <list>
-	    <t>xml version: without loss of generality, for simplicity
-	    this is fixed to version "1.0" for the current versions of 
-	    CMML 2.0 and Annodex 2.0. Therefore, this attribute
-	    does not get represented in the Annodex bitstream and MUST be
-	    auto recreated during ripping of annotations out of the
-	    Annodex bitstream.</t>
-
-	    <t>xml encoding: this attribute MUST be represented in the
-	    CMML fisbone packet as a message header field with name
-	    "Content-type" and the encoding format being the charset
-	    value following "text/x-cmml;" (or "text/cmml;" after IANA
-	    registration of the MIME type).</t>
-
-	    <t>xml standalone: this is fixed to "yes" for the current versions
-	    of CMML 2.0 and Annodex 2.0. There is a need to explore how
-	    to include data of general xml documents that conform to a
-	    different DTD into CMML and ultimately Annodex. Until then,
-	    standalone is fixed to "yes" and does not get represented in
-	    the Annodex bitstream, but MUST be auto recreated during
-	    ripping of annotations out of it.</t>
-
-	    <t>DOCTYPE declaration: this is fixed to 
-	    <![CDATA[<!DOCTYPE cmml SYSTEM "cmml.dtd">]]> and thus
-	    again does not get represented in the Annodex bitstream
-	    but MUST be auto recreated during ripping.</t>
-	  </list>
-	  </t>
-
-	  <t>Here is a list of the attribute values of the "cmml" tag and
-	  how they are being acquired:
-	  <list>
-	    <t>id: this attribute SHOULD be represented in the fisbone packet 
-	    of the annotation bitstream as a message header field with 
-	    name "ID", as it signifies a short identifying 
-            machine-readable string for the annotation bitstream (in
-	    analogy to the id field of the import tags).
-	    </t>
-
-	    <t>lang, dir: these attributes MUST be represented in the
-	    fishbone packet of the annotation bitstream as message header
-	    fields with name "Content-Language" and "Content-Dir".
-	    </t>
-
-	    <t>xmlns: this attribute is fixed to "http://www.annodex.net/cmml"
-	    and thus does not get represented in the Annodex bitstream
-	    but must be auto recreated during ripping.
-	    </t>
-	  </list>
-          </t>
-        </section>
-
-        <section title="Encoding the 'head' tag">
-	  <t>The CMML "head" tag is printed as a string into the first
-	  secondary header packet of the annotation bitstream. Thus,
-	  the value of the field named "number of header packets"
-	  in the fisbone page for the annotation bitstream will be 1, unless
-	  the "head" tag turns out to be too big for one Ogg page (i.e.
-	  larger than about 64K).
-	  </t>
-	
-	  <t>Note that the encoding process must ensure that newline
-	  characters are represented as LF (or "\n" in C) only. As some 
-	  systems represent the new line as CR LF combinations (or
-	  "\r\n" in C), the encoding process MAY need to strip out
-	  the CR character.
-	  </t>
-        </section>
-
-        <section title="Encoding the 'clip' tags">
-	  <t>The "clip" tags are the real content of an annotation
-	  bitstream. Their "start" and "end" attributes only exist for
-	  authoring purposes and are not copied into the annotation
-	  bitstream to avoid contradictory doubly represented information as
-	  their position in the stream already represents this timing information.
-	  </t>
-
-	  <t>A "clip" tag is encoded with all tags (except for the 
-	  "start" and "end" attributes) as a string printed into a 
-	  clip packet in the annotation bitstream. The "clip"
-	  tag's "start" attribute tells the Annodex encoder at what
-	  time to insert the clip packet into the bitstream. Its "end" 
-	  attribute (if present) leads to the creation of another 
-	  clip packet at the given end time in the Annodex bitstream, 
-	  unless another clip packet starts on the same track beforehand. 
-	  This clip packet contains an empty "clip"	tag, i.e. a "clip" 
-	  tag without "meta", "a", "img" or "desc" elements and no 
-	  attribute values except for a copy of the "track" attribute
-	  from the original "clip" tag.
-	  </t>
-
-	  <t>Again, the encoding process must ensure that newline
-	  characters are represented as LF (or "\n" in C) only.
-	  </t>
-        </section>
-
       </section>
 
       <!--**************************-->
@@ -1726,7 +1818,7 @@
         </t>
 
         <t>If the Annodex bitstream has a non-zero basetime or a non-null
-        utc time in the Skeleton ident header, a "stream" tag MUST be
+        utc time in the skeleton ident header, a "stream" tag MUST be
         created with these attribute values. That "stream" tag is empty
         by default. A ripping application MAY however extract all the data
         bitstreams out of the Annodex bitstream into files, and then reference
@@ -1737,13 +1829,13 @@
         the logical bitstreams:
         <list style="symbols">
 	  <t>the "contenttype" attribute from the "Content-type" Message 
-	  header field of the respecitve Skeleton secondary header packet,</t>
+	  header field of the respective skeleton secondary header packet,</t>
 	  <t>the "granulerate" attribute from the Granulerate fields of 
-	  the respecitive Skeleton secondary header packet,</t>
+	  the respective skeleton secondary header packet,</t>
 	  <t>the "id" attribute from a Message header field called "ID"
 	  if available,</t>
 	  <t>and "param" elements from all the remaining Message header fields
-	  of the respective Skeleton secondary header packet, where the field
+	  of the respective skeleton secondary header packet, where the field
           name gets stored in the "name" attribute and the value in the
           "value" attribute.</t>
 	</list>
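
The text-encoding rule this revision adds to the serialisation section (UTF-8 for all data, CR LF collapsed to LF) could be applied by an encoder roughly as follows. This is only a sketch: the function name encode_cmml_text is made up for illustration and does not come from any Annodex tool.

```python
def encode_cmml_text(text: str) -> bytes:
    """Return CMML text as UTF-8 bytes with CR LF collapsed to LF."""
    # The draft only names CR LF combinations, so a lone CR is left untouched.
    return text.replace("\r\n", "\n").encode("utf-8")
```

Normalising before encoding keeps the replacement on characters rather than bytes, so it cannot split a multi-byte UTF-8 sequence.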

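The Ogg ident header layout shown in the diff (bytes 0-28, multi-byte fields LSB first) can be exercised with a small pack/unpack pair. The sketch below rests on assumptions the draft does not state: the version fields are taken as unsigned 16-bit integers, the granulerate fields as signed 64-bit integers, and the function names are hypothetical.

```python
import struct

# Assumed field types: 8-byte identifier, two unsigned 16-bit version
# fields, two signed 64-bit granulerate fields, one unsigned 8-bit shift.
IDENT_FORMAT = "<8sHHqqB"                    # "<" = LSB first, no padding
IDENT_SIZE = struct.calcsize(IDENT_FORMAT)   # 29 bytes, i.e. bytes 0-28
IDENTIFIER = b"CMML\0\0\0\0"


def pack_ident(major, minor, rate_num, rate_den, granuleshift):
    """Build the 29-byte CMML ident header packet for Ogg."""
    return struct.pack(IDENT_FORMAT, IDENTIFIER, major, minor,
                       rate_num, rate_den, granuleshift)


def unpack_ident(packet):
    """Parse an ident header packet into a dict of its fields."""
    if len(packet) < IDENT_SIZE:
        raise ValueError("ident header packet too short")
    ident, major, minor, num, den, shift = struct.unpack(
        IDENT_FORMAT, packet[:IDENT_SIZE])
    if ident != IDENTIFIER:
        raise ValueError("not a CMML ident header")
    return {"version": (major, minor),
            "granulerate": (num, den),
            "granuleshift": shift}


# Arbitrary example values; 1 and 1000 read the draft's "1/1000" default
# as numerator/denominator, and 32 is the default granule shift it names.
packet = pack_ident(2, 1, 1, 1000, 32)
fields = unpack_ident(packet)
```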

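The clip packet rule from the new "data packets" section (drop "start" and "end", emit an empty clip carrying only "track" at the end time) might be prototyped like this. It is a rough sketch under stated simplifications: the function name clip_packets is hypothetical, times are passed through as the raw attribute strings, and the "unless another clip packet starts on the same track beforehand" check is left to the caller.

```python
import xml.etree.ElementTree as ET


def clip_packets(clip_xml):
    """Return a list of (time, packet_bytes) pairs for one "clip" element."""
    clip = ET.fromstring(clip_xml)
    # "start" and "end" only steer the encapsulator; they are not serialised.
    start = clip.attrib.pop("start", None)
    end = clip.attrib.pop("end", None)
    packets = [(start, ET.tostring(clip, encoding="unicode").encode("utf-8"))]
    if end is not None:
        # Empty clip: no child elements, only a copy of the "track" attribute.
        attrs = {"track": clip.attrib["track"]} if "track" in clip.attrib else {}
        empty = ET.Element("clip", attrs)
        packets.append(
            (end, ET.tostring(empty, encoding="unicode").encode("utf-8")))
    return packets


example = clip_packets(
    '<clip id="intro" track="news" start="npt:10" end="npt:20">'
    '<desc>Hello</desc></clip>')
```

A real encapsulator would additionally convert the time strings to granule positions and suppress the end packet when a later clip on the same track begins first.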
-- 
silvia


