[annodex-dev] RE: ogg skeleton/ basegranule (was Re: zen's comments
on new anx spec)
Silvia.Pfeiffer at csiro.au
Sun Jan 30 12:40:22 EST 2005
Specifying a hole at the beginning of a vorbis track in Ogg is not really catered for IMHO. Maybe you can create a page without any content that declares at what time (or rather: granulepos) it ends (and thus the data starts). Not sure what decoders would do with pages without content (i.e. silence) though.
With the basegranule in skeleton it seems we agree on how it would work - thanks!
Cheers,
Silvia.
-----Original Message-----
From: Conrad Parker on behalf of Conrad Parker
Sent: Sun 1/30/2005 12:32 PM
To: Pfeiffer, Silvia (ICT Centre, Marsfield)
Cc: annodex-dev at lists.annodex.net
Subject: ogg skeleton/ basegranule (was Re: zen's comments on new anx spec)
[moved to annodex-dev]
On Sun, Jan 30, 2005 at 10:51:45AM +1100, Silvia.Pfeiffer at csiro.au wrote:
> Hi all,
>
> hmm, I think we need to take up this discussion again. Let me know if I'm talking crap - I'm trying to resync my mind...
>
> The basis for the design that we came up with in December was that there is an original file format (no subparts ripped out) which has the following properties:
> * all logical bitstreams start with a granulepos of 0
> * all these granuleposes map to the same basetime (i.e. all logical bitstreams start at the same basetime)
> *
>
> Is my understanding correct? Are these assumptions sensible?
I don't think it's necessary that such an original file actually exists,
but that concept describes the relationship between the various fields.
>
> I'm wondering for a reason: on Friday Jan asked me about a way to cope in ogg/theora with bitstreams that you transcode from a different format (such as mpeg), where the audio bitstream starts later than the data of the video bitstream because there is silence at the beginning of the audio. This seems to happen frequently when you're ripping DVDs. In this case there is no such thing as an original file that adheres to the properties mentioned above. What you have is different basetimes for each logical bitstream. Ogg cannot currently cope with that AFAIK, but Ogg Skeleton should. I'm just trying to figure out how.
>
> Is this how it would work? :
> - one basetime in the fishead
> - basegranule in theora fisbone is 0
> - basegranule in vorbis fisbone is whatever granulepos the basetime at which it starts maps to
>
> Let me know if I'm still sane.
for the Ogg Vorbis track, wouldn't it simply be a normal track with a
huge "hole" at the beginning (ie. between the codebooks and the first
data packet?)
in which case such data is valid ogg even without a skeleton?
(and, the definition of basegranule as "the granule that would be just
before the first packet in the stream" is still valid, as it marks the
"granulepos" of the end of the hole).
Conrad.
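To make the hole concrete, here is a minimal Python sketch of the basegranule that a late-starting Vorbis track would carry. The function name and the 0.5 s / 44100 Hz figures are illustrative assumptions, not values from this thread or from any Ogg library.

```python
from fractions import Fraction

def basegranule_for_late_start(start_time, basetime, granulerate):
    """Granulepos at the end of the hole, i.e. just before the first data packet.

    Times are in seconds, granulerate in granules per second (all exact rationals).
    """
    offset = Fraction(start_time) - Fraction(basetime)
    if offset < 0:
        raise ValueError("track cannot start before basetime")
    # granulepos is an integer count, so round down to a whole granule.
    return int(offset * Fraction(granulerate))

# Hypothetical DVD rip: video starts at basetime, audio 0.5 s later at 44.1 kHz.
video_basegranule = basegranule_for_late_start(0, 0, 25)
audio_basegranule = basegranule_for_late_start(Fraction(1, 2), 0, 44100)
```

Under these assumptions the video fisbone keeps basegranule 0 and the audio fisbone carries the granulepos at which the silence ends.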
>
> Cheers,
> Silvia.
>
> -----Original Message-----
> From: Pfeiffer, Silvia (ICT Centre, Marsfield)
> Sent: Sat 12/25/2004 7:45 AM
> To: Pfeiffer, Silvia (ICT Centre, Marsfield); Conrad Parker; illiminable
> Cc: conrad at metadecks.org; Pang, Andre (ICT Centre, Marsfield)
> Subject: RE: zen's comments on new anx spec
>
> Shit I did not realize how screwed up this picture turned out. *That* would have confused you!
> I've attached a fixed up one to help you understand better.
> Really sorry for this!
>
> Silvia.
>
> -----Original Message-----
> From: Pfeiffer, Silvia (ICT Centre, Marsfield)
> Sent: Fri 12/24/2004 6:40 AM
> To: Conrad Parker; illiminable
> Cc: conrad at metadecks.org; Pang, Andre (ICT Centre, Marsfield)
> Subject: RE: zen's comments on new anx spec
>
> OK, next round of specs.
>
> Conrad and I discussed this new scheme heaps yesterday. I've been trying to get all of that into a consistent representation and here is what I'm proposing. Conrad, I hope this represents what you were after with as few relative numbers as possible.
>
>
>                        data         presentation time
>                        starttime A  |
>                             |       |
> Stream                      v       v
> --------------------------------------------------------------------------------
> | A |           |           |               |               |                  |
> --------------------------------------------------------------------------------
> | B |                 |
> --------------------------------------------------------------------------------
>                       ^
>                       |
>                       data starttime B
>
> A remuxed stream from the requested presentation time onwards will contain data from starttime A for stream A and from starttime B for stream B. The presentation time is the time at which a player should start rendering data. As this may be halfway through a packet or page, it is necessary to also state for each logical bitstream when its data starts. This effectively attributes a starttime to the first packet/page of a logical bitstream. The starttime is a mapping of the virtual granule position that the logical bitstream has at its beginning to a basetime. In the original bitstream this maps granulepos 0 to some basetime. In a remuxed bitstream the starttime is the basetime plus an offset to a later packet (page). Disregarded in this picture are preroll and keyindex, which may cause several earlier pages (and packets) to also be included in the remuxed stream, producing an even earlier basetime.
>
> For a remuxed stream the granule positions won't change (such that simple byte-copying can be retained). To achieve accuracy on the time stamp that the data starttimes (A or B) represent, one needs to know the mapping of granulepos 0 to a basetime plus the number of granules discarded until the data starttime:
>
> presentation time (i.e. time to start rendering) = absolute time
> basetime (i.e. mapping of granulepos 0 to a time) = absolute time
> data starttime = basetime + (basegranule / granulerate) = relative time
>
> This can be represented in the skeleton track with an extra
> - presentation time in the fishead and
> - basegranule in the fisbone
> The basetime will remain in the fishead as it is a mapping of granulepos 0 to the presentation time of the original bitstream and is thus the same for all of the logical bitstreams.
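As a worked example of the data starttime formula above, here is a minimal Python sketch. The function name and the sample numbers are assumptions made for illustration; exact rationals are used so that no rounding creeps in.

```python
from fractions import Fraction

def data_starttime(basetime, basegranule, granulerate):
    """data starttime = basetime + (basegranule / granulerate), in seconds."""
    return Fraction(basetime) + Fraction(basegranule, granulerate)

# Hypothetical remuxed file with basetime 0:
# video cut at granule 250 (25 fps), audio cut at granule 440000 (44100 Hz).
start_a = data_starttime(0, 250, 25)        # exactly 10 s
start_b = data_starttime(0, 440000, 44100)  # slightly before 10 s
```

The two tracks share one basetime but get different data starttimes, which is why the basegranule lives in each fisbone rather than in the fishead.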
>
> Now, when seeking, one needs to calculate the time that a reached granulepos represents, in order to decide whether one has reached the right page or has to go back or forward. This still works as before, since it is independent of how the stream is chopped up:
> current time = (granulepos / granulerate) + basetime
> (where granulepos = keyindex + keyoffset).
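The seek decision described above could be sketched as follows. This is only an illustration: the function names are made up, and granulepos is assumed to be the already combined keyindex + keyoffset value.

```python
from fractions import Fraction

def granule_time(granulepos, granulerate, basetime):
    """current time = (granulepos / granulerate) + basetime, in seconds."""
    return Fraction(granulepos, granulerate) + Fraction(basetime)

def seek_direction(granulepos, granulerate, basetime, target_time):
    """Return +1 to seek forward, -1 to seek back, 0 if exactly on target."""
    now = granule_time(granulepos, granulerate, basetime)
    return (now < target_time) - (now > target_time)
```

For example, landing on a page at 5 s when the target is 10 s yields +1, so a bisection search would continue in the later half of the file.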
>
> Now, when doing remuxing, here's what you have to do:
> 1) Seek to the requested time offset
> 2) Wind back until, across all bitstreams, you reach the earliest point required by their keyoffsets and prerolls
> 3) Get the basegranule for each of the logical bitstreams at this point in the stream (i.e. the granulepos of the previous page in each of these bitstreams)
> 4) Copy the control section (the section enclosed by the skeleton track) with adjustment of the presentation time and of all the basegranules
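Steps 3 and 4 above can be sketched on a toy in-memory model. Everything here is an assumption for illustration: pages are reduced to (serialno, granulepos) tuples, the cut index is taken as given from steps 1 and 2, and every page is assumed to carry a valid granulepos.

```python
def remux_from(pages, cut_index, old_basegranules):
    """Pick new basegranules and the pages to copy for a remuxed stream.

    pages: list of (serialno, granulepos) tuples in file order.
    cut_index: index of the first page to keep (result of steps 1 and 2).
    old_basegranules: {serialno: basegranule} from the source fisbones.
    Returns (new_basegranules, kept_pages).
    """
    new_basegranules = dict(old_basegranules)
    for serialno, granulepos in pages[:cut_index]:
        # Step 3: the last page before the cut wins for each bitstream.
        new_basegranules[serialno] = granulepos
    # Granulepos values are untouched, so kept pages can be byte-copied.
    return new_basegranules, pages[cut_index:]

# Hypothetical file with a video stream (serial 1) and an audio stream (serial 2).
pages = [(1, 0), (2, 0), (1, 5), (2, 4096), (1, 10), (2, 8192)]
new_base, kept = remux_from(pages, 4, {1: 0, 2: 0})
```

The new basegranules would then be written into the fisbones of the copied control section, together with the adjusted presentation time in the fishead.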
>
> Now, when doing playback, start decoding the stream from the beginning, but only render on screen when the time reached in each of the logical bitstreams reaches the presentation time. The time reached at the beginning for each logical bitstream is determined by
> data starttime = basetime + (basegranule / granulerate)
> and decoding from there onwards provides more information on which time is being reached.
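A minimal sketch of the playback rule above: decode everything, but render only once a bitstream has reached the presentation time. The function name and the numbers are hypothetical.

```python
from fractions import Fraction

def should_render(granulepos, granulerate, basetime, presentation_time):
    """True once the decoded position has reached the presentation time."""
    current = Fraction(basetime) + Fraction(granulepos, granulerate)
    return current >= presentation_time

# Hypothetical 25 fps stream whose data starts at granule 240 (9.6 s),
# with a presentation time of 10 s.
hidden = [g for g in range(240, 260) if not should_render(g, 25, 0, 10)]
```

Here granules 240 to 249 are decoded but not shown, and rendering begins at granule 250.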
>
> OK, time for feedback from you all. Have a happy Christmas Day!
>
> Cheers,
> Silvia.
>
>
> -----Original Message-----
> From: Conrad Parker on behalf of Conrad Parker
> Sent: Thu 12/23/2004 9:33 AM
> To: illiminable
> Cc: Pfeiffer, Silvia (ICT Centre, Marsfield); conrad at metadecks.org; Pang, Andre (ICT Centre, Marsfield)
> Subject: Re: zen's comments on new anx spec
>
> On Wed, Dec 22, 2004 at 08:16:47PM +0800, illiminable wrote:
> > Also... further to my last comment... i don't think using "granules" as the
> > extra resyncing mechanism is a good choice... since for one... for
> > resynching of presentation time... you could care less about granules...
> > you want to work in real times... and more importantly... granules are
> > not.... well granular enough.
> >
> > Say a 5fps movie... means you can only resynch with 200ms accuracy (because
> > 1 frame = 1 granule ignoring the shift shenanigans)... which is no good for
> > resynching av.
>
> err, no, if you have a constant granuleoffset in each track, then each
> track can be sync'd to its own granularity -- ie. such a video at 5fps
> can be synched to its audio at the audio samplerate.
>
> however i do agree that granules aren't granular enough, and it can be
> difficult (at best!) to determine a timestamp for the first packet using
> only granulepos. If we make that offset an absolute time, then it should
> be given as a rational at the same accuracy as granulerate.
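The granularity point can be checked with a few lines of Python. This is only an illustration under assumed rates (5 fps video, 44100 Hz audio) and an arbitrary offset of 1234 samples.

```python
from fractions import Fraction

def granule_duration(granulerate):
    """Length of one granule in seconds, as an exact rational."""
    return 1 / Fraction(granulerate)

video_step = granule_duration(5)      # 1/5 s, i.e. 200 ms per frame
audio_step = granule_duration(44100)  # 1/44100 s per sample

# An offset of 1234 audio samples is exact as a rational over the audio
# granulerate, but is not a whole number of 5 fps video granules.
offset = Fraction(1234, 44100)
fits_video_grid = (offset / video_step).denominator == 1
fits_audio_grid = (offset / audio_step).denominator == 1
```

A rational offset expressed at the finer rate keeps the audio accuracy even when the video track itself only has 200 ms steps.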
>
> kfish.
>