[theora-dev] Ogg questions

Sun Jan 5 23:49:51 PST 2003

On Tue, Jan 07, 2003 at 01:41:11AM +1100, Silvia.Pfeiffer at csiro.au wrote:
> 
> 2) an I-D describing the "Ogg file format" written by me:
> http://www.ietf.org/internet-drafts/draft-pfeiffer-ogg-fileformat-00.txt
> 

There's at least one error in this document:

   Ogg Theora encapsulates a Vorbis-encoded audio bitstream and a
   Tarkin-encoded video bitstream in a single physical Ogg bitstream.

That's incorrect.  Tarkin is a completely different codec; Theora is
based on VP3.  Also, I do not believe that Theora encapsulates Vorbis;
rather, the two exist as concurrent, synced logical bitstreams within Ogg.
Monty would know more about this, as I have done very little studying of
the Theora format.

Also, if Monty would care to clarify...

   Ogg Vorbis puts a further constraint onto Ogg by specifying that
   concurrent multiplexing is not allowed in Ogg Vorbis files.

Is this true?  I've been wondering about this.
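For what it's worth, the distinction can be checked mechanically.  Every
Ogg page header carries a serial number for its logical bitstream and
flags marking the first (BOS) and last (EOS) page of that stream.  In a
purely chained file at most one logical stream is open at a time; with
concurrent multiplexing several are open at once.  The sketch below is
an illustration under those assumptions, not code from libogg: it walks
back-to-back pages using the 27-byte fixed header layout ("OggS",
version, flags, 8-byte granule position, 4-byte serial, 4-byte sequence
number, 4-byte CRC, segment count, then the segment table) and reports
the peak number of open streams.  The function names are hypothetical
and CRCs are not checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OGG_FLAG_BOS 0x02  /* first page of a logical bitstream */
#define OGG_FLAG_EOS 0x04  /* last page of a logical bitstream */
#define MAX_OPEN 64        /* arbitrary limit for this sketch */

/* Read a little-endian 32-bit value (Ogg header fields are little-endian). */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: walks a buffer of back-to-back Ogg pages and returns
 * the peak number of logical bitstreams open at once (BOS seen, EOS not yet
 * seen).  1 means purely chained; >1 means concurrent multiplexing.
 * Returns -1 on a malformed or truncated page.  CRCs are NOT verified. */
int ogg_peak_concurrency(const unsigned char *buf, size_t len) {
    uint32_t open[MAX_OPEN];
    int nopen = 0, peak = 0;
    size_t pos = 0;

    while (pos < len) {
        if (len - pos < 27 || memcmp(buf + pos, "OggS", 4) != 0)
            return -1;
        const unsigned char *h = buf + pos;
        unsigned flags = h[5];                /* header_type flags */
        uint32_t serial = read_le32(h + 14);  /* bitstream serial number */
        size_t nsegs = h[26];                 /* entries in the segment table */
        if (len - pos < 27 + nsegs)
            return -1;
        size_t body = 0;
        for (size_t i = 0; i < nsegs; i++)
            body += h[27 + i];                /* lacing values sum to body size */
        size_t page_len = 27 + nsegs + body;
        if (len - pos < page_len)
            return -1;

        if (flags & OGG_FLAG_BOS) {
            if (nopen == MAX_OPEN)
                return -1;
            open[nopen++] = serial;
            if (nopen > peak)
                peak = nopen;
        }
        if (flags & OGG_FLAG_EOS) {
            for (int i = 0; i < nopen; i++) {
                if (open[i] == serial) {
                    open[i] = open[--nopen];  /* remove from the open set */
                    break;
                }
            }
        }
        pos += page_len;
    }
    return peak;
}
```

A file where a Vorbis stream's EOS page always precedes the next BOS page
would report 1; a Theora+Vorbis file would report at least 2.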

I'm setting up an Ogg "editor" for the Freeform codebase (detailed at
http://savannah.gnu.org/projects/freeform ) which will allow encoded
Ogg files to be published, then allow the publisher, or another
publisher, to crop, chain together, etc., different pieces to form a new
work.

A good example of this is the Indymedia Newsreal project.  This project
has people from around the world submit segments of up to 5 minutes,
which are combined into a monthly news program that is distributed both
online and on physical media (VHS, VCD, DVD, etc.) for viewing.
Currently, these segments are submitted on DV tapes, and someone in a
centralised location goes through the tapes and combines them.  The
tapes are then archived digitally and, at a lower bitrate, offered for
download.

An alternative to this would be to have the publishers encode high
bitrate Theora (Q6+) and upload it as Ogg.  Then someone, somewhere, would
go through these segments and, using the above-mentioned codebase, merge
the show's intro, the segments, any media that would go between the
segments, and the closing into one larger Ogg file.  This file could
then be used to generate the physical media, be archived itself, and
generate the lower bitrate (Q1) stream version (peeling???).

One issue that comes up is the NTSC/PAL deal.  Some media will be
submitted as NTSC, some as PAL, and there are actually two physical
versions: NTSC and PAL.  If two Theora streams of different formats (one
NTSC, one PAL) are chained together, what will happen when you run the
result through a decoder?
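Part of the answer is arithmetic.  Each link of a chain starts with its
own headers, and its frame numbering starts over, so a player has to
rebase timestamps at every link boundary; whether a given decoder also
copes with the frame size changing there is a separate question.  The
sketch below assumes each link reports its frame rate as a fraction
(NTSC as 30000/1001, PAL as 25/1) and computes where each link starts on
the overall timeline.  The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical description of one link in a chained Ogg file. */
typedef struct {
    long frames;   /* number of video frames in this link */
    long fps_num;  /* frame rate numerator, e.g. 30000 (NTSC) or 25 (PAL) */
    long fps_den;  /* frame rate denominator, e.g. 1001 (NTSC) or 1 (PAL) */
} chain_link;

/* Duration of one link in seconds: frames * den / num. */
double link_duration(const chain_link *l) {
    return (double)l->frames * (double)l->fps_den / (double)l->fps_num;
}

/* Fills start[i] with the time (in seconds) at which link i begins, by
 * summing the durations of all earlier links.  Frame numbers restart in
 * each link, so per-link times must be offset this way. */
void chain_start_times(const chain_link *links, size_t n, double *start) {
    double t = 0.0;
    for (size_t i = 0; i < n; i++) {
        start[i] = t;
        t += link_duration(&links[i]);
    }
}
```

For example, 250 PAL frames followed by 300 NTSC frames places the third
link at roughly 20.01 seconds, not at frame 550 of either rate.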

Another limitation here, for Vorbis streams as well, is the lack of
elegant handling of stream-switching.  Similar to QuickTime's "Effects",
which I've been reading up on, I think it'd be worthwhile to have an
"effects" codec which did things like handle audio mixing (multiple
concurrent Vorbis, Speex, and FLAC streams), video fading (overlapping
Theora streams), scaling/cropping/spacing Theora streams like layers
over each other (i.e., one frame can shrink or be pushed off the
screen), and other basic video-editing effects.  Applying these to
pre-encoded media would allow cooperative projects to happen without the
generation loss that would result from decoding/editing/re-encoding, and
would allow the separation of different elements, such as a movie with
the soundtrack and vocals separated so that the user could choose
between English and Spanish, while only needing a simple Speex stream
for each language.

Even when we're just talking Vorbis, it'd be nice to be able to have
overlapping streams which an effects codec caused to crossfade, or, with
Speex, to have voice over music at the tail end (both at full volume),
then fade out the Vorbis stream, continue with the Speex stream, then
chain into a new Speex stream at the same time as starting a new Vorbis
stream which is faded in.  I'm imagining a cool mixing board/effects
icecast2 stream source for handling all this on the fly with pre-encoded
media.

I'm saying all this to ask: is this possible to do within the current
Ogg Vorbis specifications, if we added a codec to handle it?  If a
player (such as XMMS) runs into a codec it doesn't understand, will it
play the rest?  If it finds two Vorbis layers that overlap, will it die?
It would kinda look like this:

          Track 1             Crossfade            Track 2
[AAAAAAAAAAAAAAAAAAAAAAAAAA][BBBBBBBBBBB]
                            [CCCCCCCCCCC][DDDDDDDDDDDDDDDDDDDDDDDDDDDD]
                            [ZZZZZZZZZZZ]

A = Track 1 with n sec cut from tail
B = n sec tail of Track 1
C = n sec lead of Track 2
D = the rest of Track 2
Z = effects codec
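To pin down what Z would actually ask a player to do over the overlap:
once B and C are decoded to PCM, the player mixes them sample by sample
with complementary gains.  This is a minimal sketch assuming both
streams are already decoded to interleaved float PCM with the same
sample rate and channel layout; it uses a linear ramp for simplicity (an
equal-power curve is a common alternative).  The function name is
hypothetical and is not part of any Vorbis or Speex API.

```c
#include <stddef.h>

/* Linear crossfade over `frames` frames of interleaved PCM:
 * out = (1 - g) * tail + g * lead, where g ramps from 0 at the first
 * frame to 1 at the last.  The gain is per frame, so every channel in a
 * frame gets the same gain. */
void crossfade_linear(const float *tail, const float *lead, float *out,
                      size_t frames, size_t channels) {
    for (size_t f = 0; f < frames; f++) {
        /* gain for the incoming stream (C); the outgoing one (B) gets 1 - g */
        float g = frames > 1 ? (float)f / (float)(frames - 1) : 1.0f;
        for (size_t c = 0; c < channels; c++) {
            size_t i = f * channels + c;
            out[i] = (1.0f - g) * tail[i] + g * lead[i];
        }
    }
}
```

Here `tail` would hold the decoded B section and `lead` the decoded C
section; how Z would signal the overlap length and curve is left open.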

I can understand if XMMS would silently ignore Z, but what is going to
happen when it hits the end of A and finds B and C, both Vorbis streams,
playing concurrently?  Will it silently ignore C or crash?  I guess it
wouldn't be so bad if old players simply chopped off the "crossfade"
beginning of songs, since it would be typically less than two seconds,
but crashing or dropping the live stream would be quite bad.
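On the "codec it doesn't understand" part: because every page carries a
serial number, a demuxer can in principle inspect the first packet of
each new logical stream, decide whether it recognises the codec, and
discard pages for serial numbers it doesn't, rather than aborting.
Whether a particular player does this is up to its demuxer.  The sketch
below identifies a stream from the start of its first packet using the
header signatures Vorbis (0x01 followed by "vorbis"), Speex ("Speex"
followed by three spaces) and Theora (0x80 followed by "theora", per the
Theora specification) use; the enum and function names are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum { CODEC_UNKNOWN, CODEC_VORBIS, CODEC_SPEEX, CODEC_THEORA } codec_id;

/* Identify the codec from the first packet of a logical stream (the one
 * carried on its BOS page).  Anything unrecognised, such as the
 * hypothetical effects stream Z, returns CODEC_UNKNOWN so the caller can
 * skip that serial number instead of failing. */
codec_id identify_codec(const unsigned char *pkt, size_t len) {
    if (len >= 7 && pkt[0] == 0x01 && memcmp(pkt + 1, "vorbis", 6) == 0)
        return CODEC_VORBIS;
    if (len >= 8 && memcmp(pkt, "Speex   ", 8) == 0)
        return CODEC_SPEEX;
    if (len >= 7 && pkt[0] == 0x80 && memcmp(pkt + 1, "theora", 6) == 0)
        return CODEC_THEORA;
    return CODEC_UNKNOWN;
}
```

A demuxer would record this result per serial number and drop pages
whose serial maps to CODEC_UNKNOWN; the overlapping B/C case is a
different problem, since both are recognised Vorbis streams and the
player must either mix them or pick one.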

I assume that only Monty knows enough about this to answer definitively.
Perhaps some of these questions can be incorporated into the Theora specs?
