[ogg-dev] New Ogg Dirac mapping draft
davidf+nntp at woaf.net
Wed Aug 13 13:08:15 PDT 2008
On 2008-08-12, Ralph Giles <giles at xiph.org> wrote:
> David Flynn has proposed a new Ogg Dirac mapping.
I thought it'd be a good idea to explain some of the rationale for why we
want to change the definition of granulepos in the ogg-dirac mapping.
Terms used in this document:
- GP64 = The 64bit granule_position as found in the page header.
- GPH+L = Granule pos high + low as split by granule_shift.
- ST = System Time; this is the monotonically increasing decoder clock
- PT = Presentation Time; Picture is displayed when PT = ST, which implies AV sync.
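As a rough illustration of the GP64 and GPH+L terms above, here is a minimal Python sketch of the split. The function names and the granule_shift value of 6 are made up for the example; in a real stream the shift comes from the codec headers.

```python
def split_granulepos(gp64, granule_shift):
    """Split a 64bit granule_position into (high, low) at granule_shift."""
    high = gp64 >> granule_shift               # bits above the shift
    low = gp64 & ((1 << granule_shift) - 1)    # bits below the shift
    return high, low


def gph_plus_l(gp64, granule_shift):
    """GPH+L: the sum of the two halves of the granule position."""
    high, low = split_granulepos(gp64, granule_shift)
    return high + low


# Hypothetical values, only to exercise the functions.
EXAMPLE_SHIFT = 6
example_gp64 = (6 << EXAMPLE_SHIFT) | 3
print(split_granulepos(example_gp64, EXAMPLE_SHIFT))  # (6, 3)
print(gph_plus_l(example_gp64, EXAMPLE_SHIFT))        # 9
```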
NOTE, we will not use the terms I,P,B -- they are mpeg2 terms which do not
map to constructs in dirac or h264 properly.
Properties of an Out-of-order video codec (dirac,h264,vc-1,mpeg2)
- Each picture has a unique PT.
- Pictures in the stream are not in PT order.
- The decoder reorders pictures at output into PT order.
- ST != PT in stream order (ie, input to decoder).
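To illustrate the reordering step, the sketch below pushes pictures through a small reorder buffer and emits them in PT order. It is not taken from any real decoder: the function name and the buffer depth of 2 are assumptions, and the PT sequence is the stream-order row used in the xiph mapping example later in this mail (hex digits written as integers).

```python
import heapq


def reorder_to_pt_order(pt_in_stream_order, depth):
    """Emit PTs in presentation order using a reorder buffer of `depth` pictures.

    `depth` is an assumed parameter; a real codec signals or implies it.
    """
    heap = []
    output = []
    for pt in pt_in_stream_order:
        heapq.heappush(heap, pt)
        if len(heap) > depth:
            # Buffer is full: the smallest PT held is the next to display.
            output.append(heapq.heappop(heap))
    while heap:
        # End of stream: flush what is left, smallest PT first.
        output.append(heapq.heappop(heap))
    return output


stream_pt = [0, 3, 1, 2, 6, 4, 5, 9, 7, 8, 12, 10, 11, 13]
print(reorder_to_pt_order(stream_pt, depth=2))
```

With a depth of 0 the function just returns stream order, which is the out-of-order sequence the decoder must not display as-is.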
De facto rules of ogg (I've not found these actually written down anywhere):
[A1] One of GP64 or GPH+L must increase for each packet
For in-order codecs using keyframe-granuleshift, both are true.
[A2] GPH+L == time.
All codecs so far are in-order, so ST=PT=time.
[A3] Page flushes are NOT invariant across remuxes.
The ogg RFC does state that GP64 is codec specific without any
What is needed to decode & display Out-of-order coded video?
Each picture must have a unique & accurate(correct) PT.
ST needs to be derived from the stream correctly:
- Can interpolate ST for a particular picture
- Cannot determine the starting value of ST from the first picture.
- This happens in streaming, example:
PT: 14 10 11 12 13
ST: 10 11 12 13 14
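The streaming example above can be poked at with a few lines of Python. The two lists are the values from the example; the "naive" function is a hypothetical strategy, shown only to make clear why the first picture's PT is not a usable starting value for ST.

```python
# Values from the streaming example, in stream (arrival) order.
pt_stream = [14, 10, 11, 12, 13]
st_stream = [10, 11, 12, 13, 14]


def naive_clock_start(pt_in_stream_order):
    """Hypothetical naive approach: start the clock at the first PT seen."""
    return pt_in_stream_order[0]


guess = naive_clock_start(pt_stream)

# Pictures that would already be late if the clock started at the guess.
late = [pt for pt in pt_stream[1:] if pt < guess]

print("guessed start:", guess)        # 14
print("actual start: ", st_stream[0]) # 10
print("late pictures:", late)         # [10, 11, 12, 13]
```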
What is problematic with the xiph mapping?
- Here is an example using the xiph mapping:
Sync point:  V           V                          V
PT(actual):  0  3  1  2  6  4  5  9  7  8  c  a  b  d
GP_high:     0  0  0  0  6  6  6  6  6  6  6  6  6  d
GP_low:      1  1  2  3  1  1  1  1  2  3  3  4  5  1
GPH+L-1:     0  0  1  2  6  6  6  6  7  8  8  9  a  d
- Each picture does not have a unique value for granulepos
=> Cannot determine unique&correct PT
=> Cannot determine correct ST
=> If (due to paging) no GP64 is available for a frame,
it is impossible to correctly interpolate the value of PT.
=> Don't know when to display pictures
- Seeking is difficult:
- Want to seek to frame N
- GPH+L is non-unique (don't know if the right one has been found)
=> Some values of GPH+L do not exist (searches may fail)
- and GPH+L != N (ie, may find the wrong frame)
- Locating the sync point (eg, after seek) is irritating
- GP_low != number of packets(pictures) since sync point.
=> Have to search backwards until GP_high changes
- Copes badly with open gop:
To correctly decode picture(PT=4) in above example, the sync point it
depends upon is picture(PT=0).
However, this would violate the property of GP64(n) > GP64(n-1).
- It requires that a page is flushed before transmitting a sync point
so that a syncpoint is guaranteed to have a valid GP64.
This violates axiom A3
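The claims about the example can be checked mechanically. The sketch below copies the PT, GP_high and GP_low rows from the table (hex digits written as Python hex literals) and looks for duplicate values, values that never occur, and packets where GPH+L-1 disagrees with the real PT. The helper name is made up, and it assumes a sync point is wherever GP_high changes.

```python
from collections import Counter

# Rows copied from the example above, in stream order.
pt      = [0x0, 0x3, 0x1, 0x2, 0x6, 0x4, 0x5, 0x9, 0x7, 0x8, 0xc, 0xa, 0xb, 0xd]
gp_high = [0x0, 0x0, 0x0, 0x0, 0x6, 0x6, 0x6, 0x6, 0x6, 0x6, 0x6, 0x6, 0x6, 0xd]
gp_low  = [1, 1, 2, 3, 1, 1, 1, 1, 2, 3, 3, 4, 5, 1]

# GPH+L-1 as in the last row of the table.
derived = [h + l - 1 for h, l in zip(gp_high, gp_low)]

# Values of GPH+L-1 shared by more than one picture (non-unique).
counts = Counter(derived)
duplicated = sorted(v for v, n in counts.items() if n > 1)

# Real PTs that never appear as a GPH+L-1 value (a search for them fails).
missing = sorted(set(pt) - set(derived))

# Stream positions where GPH+L-1 is not the real PT (wrong frame found).
wrong = [i for i, (d, p) in enumerate(zip(derived, pt)) if d != p]


def find_sync_index(i):
    """Search backwards from packet i until GP_high changes.

    Assumes the sync point is the first packet carrying a given GP_high.
    """
    while i > 0 and gp_high[i - 1] == gp_high[i]:
        i -= 1
    return i


print("duplicated:", duplicated)  # [0, 6, 8]
print("missing:   ", missing)     # [3, 4, 5, 11, 12]
print("wrong at:  ", wrong)       # [1, 5, 6, 7, 10, 11, 12]
print("packet 9: sync at", find_sync_index(9),
      "distance", 9 - find_sync_index(9), "GP_low", gp_low[9])
```

For packet 9 the backwards search lands on packet 4, five packets earlier, while GP_low for that packet is 3; that is the mismatch the "Locating the sync point" item describes.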
Some comments on choice of GP64 in bbc mapping:
Consider axiom A2 (GPH+L == time), assume this is PT.
- PT (not ST) makes sense for AV sync
- PT (not ST) makes sense for locating pictures (seek)
although a naive sync will find the wrong picture.
- The stream is in the order required to satisfy decoding
dependencies, ie PT jumps around.
This violates axiom A1 (GPH+L(n) > GPH+L(n-1)).
This violates axiom A1 (GP64(n) > GP64(n-1)).
Consider axiom A2 (GPH+L == time), assume this is ST.
- Complies with axiom A1 (GPH+L(n) > GPH+L(n-1)).
- Complies with axiom A1 (GP64(n) > GP64(n-1)).
- Is not useful for AV sync.
- Is not useful for seeking (you will end up with the wrong picture).
=> No good reason for GPH+L == ST
.'. choose GPH+L = PT.
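The trade-off between the two readings of A2 can be seen with a trivial monotonicity check. This is only a sketch: the PT list is the stream-order row from the xiph mapping example, and ST is assumed (for illustration) to tick once per packet starting at 0.

```python
def strictly_increasing(values):
    """True if every value is greater than the one before it (axiom A1)."""
    return all(a < b for a, b in zip(values, values[1:]))


# PT in stream order, from the earlier example (hex digits as integers).
pt_stream = [0, 3, 1, 2, 6, 4, 5, 9, 7, 8, 12, 10, 11, 13]

# Assumed ST: one tick per packet, starting at 0.
st_stream = list(range(len(pt_stream)))

print("GPH+L = PT satisfies A1:", strictly_increasing(pt_stream))  # False
print("GPH+L = ST satisfies A1:", strictly_increasing(st_stream))  # True
```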
Some interactions with skeleton:
> ... allowing to map a granule position [GPH+L] to time by calculating
> "granulepos [GPH+L] / granulerate"
'.' the only useful time to decoding is the PT
=> GPH+L = PT.
Ie, you can seek based upon presentation time, however a binary
search can hit a reordered picture and therefore choose the wrong
picture at the end. The error is +/- one GOP.
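Both points can be sketched in Python. The first helper is the skeleton-style mapping quoted above (granulepos / granulerate), with the rate given as a hypothetical numerator/denominator pair. The second runs a plain bisection over PT values in stream order (the row from the xiph mapping example) to show how it can stop on the wrong picture. Function names are made up for the illustration.

```python
from bisect import bisect_left
from fractions import Fraction


def granule_to_seconds(granule, rate_num, rate_den):
    """Time = granulepos [GPH+L] / granulerate, with granulerate = num/den."""
    return Fraction(granule) * rate_den / rate_num


def seek_by_bisection(pt_in_stream_order, target_pt):
    """Bisect as if the PTs were sorted; returns the stream index it stops at.

    Stream order is NOT sorted by PT, so the result may be the wrong picture.
    """
    return bisect_left(pt_in_stream_order, target_pt)


# PT in stream order (hex digits written as integers).
pt_stream = [0, 3, 1, 2, 6, 4, 5, 9, 7, 8, 12, 10, 11, 13]

target = 4
found = seek_by_bisection(pt_stream, target)
actual = pt_stream.index(target)

print("search stopped at index", found, "with PT", pt_stream[found])
print("picture with PT", target, "is really at index", actual)
print("time of granule 50 at 25/1:", granule_to_seconds(50, 25, 1))
```

Here the bisection ends one packet away from the wanted picture, inside the same GOP, which fits the +/- one GOP error noted above.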
> Restart after seek still requires new code; that part of skeleton
> doesn't work.
Actually, ogg skeleton does not provide such information to any GOP
based video codec.
- It only has Preroll, which in a GOP based video
codec is constantly varying.
- Preroll only makes sense for video when using something such as
Ponly-with-intra-slice-refresh, where there are no keyframes.
Some final remarks:
It has been said that one-packet-per-page (ie, a page flush per
packet) upsets remuxing due to axiom A3. However, it is a requirement
of the xiph mapping that a page flush occurs before a sync point.
- To resolve the above contradiction, I assume that axiom A3 is invalid
- The bbc mapping allows reconstruction of PT, ST and distance to
syncpoint without any a priori information.
- The bbc mapping does not require peeking into the packet payload to
fill in the blanks
- If GPH+L is to be useful, it is not possible to comply with
axiom A1 (GPH+L(n) > GPH+L(n-1)).
- I've realised that it is possible to rearrange GP64 in such a way that:
+ Complies with axiom A1 (GP64(n) > GP64(n-1)).
+ violates axiom A1 (GPH+L(n) > GPH+L(n-1)).
However, I doubt that is of any use, since I hope any sane demuxer searches
based upon GPH+L.