[Tremor] looking for test vectors that hit specific parts of codebook.c and res012.c for memory analysis

Ethan Bordeaux ethan.bordeaux at gmail.com
Tue Feb 3 09:38:46 PST 2009


Hi Monty, thanks for the reply - at least there weren't any huge surprises.
As for your last statement on memory concerns, it is a big deal for our
implementation.  The two areas I need to watch out for seem to be the large
codebooks and large comment headers.  I know the ogg spec says that you're
not supposed to embed images, but the first ogg tagger I downloaded gave that
option, so it's bound to happen.  It seems that the decoder allocates enough
pages to gather the full comment header and then puts that into the comment
structure in _vorbis_unpack_comment(), right?  If so, then this is an
unbounded memory allocation issue and I need to avoid that scenario.  I know
that some work has been done recently on-list to handle large comment
headers, but if you have any additional ideas on how to deal with this
without unbounded memory usage, that would be great.
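
To make that concrete, the kind of guard I have in mind looks roughly like
this - just a sketch against my reading of info.c, reusing the helpers that
are already there (oggpack_read, _ogg_calloc, _v_readstring), and
MAX_COMMENT_FIELD_BYTES is a budget I made up for my build, not anything in
Tremor:

#define MAX_COMMENT_FIELD_BYTES (4*1024)  /* made-up per-field budget */

/* Read one length-prefixed comment field, refusing anything over budget
   (e.g. an embedded image) instead of handing the length to _ogg_calloc. */
static int bounded_read_field(oggpack_buffer *opb, char **dest){
  long len = oggpack_read(opb, 32);       /* field length from the packet */
  if(len < 0) return -1;                  /* truncated/corrupt packet */
  if(len > MAX_COMMENT_FIELD_BYTES) return -1;
  *dest = (char *)_ogg_calloc(len + 1, 1);
  if(!*dest) return -1;
  _v_readstring(opb, *dest, (int)len);    /* copy the field out of the bitpacker */
  return 0;
}

That only caps the allocations inside _vorbis_unpack_comment() though; the
pages needed to assemble the header packet still get buffered first, which
is the part I don't see how to bound.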

Since I don't need to worry about comments at all, I tried to create a dummy
page that read everything from the comments into a single static ogg
page/reference, but that ended up failing badly.  I didn't like the idea of
fighting against the spec that much, so I dropped the attempt for the moment.
If you have any bright ideas I'd love to hear them.
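
For completeness, the bare-minimum alternative I can picture is walking the
comment packet and throwing the fields away as they're read - again only a
sketch, and with the same limitation that the header pages themselves still
get buffered before the packet is handed over:

/* Parse the comment packet without storing any of it.  Assumes the same
   oggpack_buffer that _vorbis_unpack_comment() is handed. */
static int skip_comment_packet(oggpack_buffer *opb){
  long i, j, len, count;

  len = oggpack_read(opb, 32);            /* vendor string length */
  if(len < 0) return -1;
  for(j = 0; j < len; j++)
    if(oggpack_read(opb, 8) < 0) return -1;    /* discard vendor bytes */

  count = oggpack_read(opb, 32);          /* number of user comments */
  if(count < 0) return -1;
  for(i = 0; i < count; i++){
    len = oggpack_read(opb, 32);          /* this comment's length */
    if(len < 0) return -1;
    for(j = 0; j < len; j++)
      if(oggpack_read(opb, 8) < 0) return -1;  /* discard comment bytes */
  }
  return (oggpack_read(opb, 1) == 1) ? 0 : -1; /* framing bit must be set */
}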

As for the large codebooks, my only idea there was to look at the libVorbis
name at the beginning of the ogg file.  If the date is past the point where
the codebooks are guaranteed to be smaller, then I would decode; otherwise
I'd dealloc and bail immediately.  Any idea if that's a workable solution?
I'm concerned that some encoder implementations might not have well-formed
names in them.  The version I am doing most of my testing on is "Xiph.Org
libVorbis I 20070622" and the name for the beta4 ogg file is "Xiphophorus
libVorbis I 20010225" - all I'd do is look at the value of the numerical
string and make a decision from there.  This would avoid all of the extra
pages required to read the large codebooks at init time, but if the check
fails because someone's encoder doesn't populate this field properly, then
I'm screwed.  Is the name placed at that point under any sort of control, or
do different implementations put all sorts of odd things in there?
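
Something like this is what I mean by looking at the value of the numerical
string - a sketch only, and the 20020000 cutoff is a placeholder until I pin
down the real date after which the small codebooks are guaranteed:

#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Pull the trailing YYYYMMDD out of a vendor string like
   "Xiph.Org libVorbis I 20070622".  Returns -1 if it isn't date-shaped. */
static long vendor_build_date(const char *vendor){
  size_t end = strlen(vendor);
  size_t start = end;
  while(start > 0 && isdigit((unsigned char)vendor[start - 1]))
    start--;
  if(end - start != 8) return -1;         /* not 8 trailing digits */
  return strtol(vendor + start, NULL, 10);
}

/* Placeholder policy: only decode if the encoder looks newer than beta 4. */
static int vendor_is_decodable(const char *vendor){
  return vendor_build_date(vendor) >= 20020000;  /* made-up threshold */
}

Anything appended after the date, or no date at all, makes this bail, which
is exactly the failure case I'm worried about above.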

Ethan


> > I'd really like to avoid needing to allocate all this extra memory for a
> > very old encoder, but if there are a lot of ogg files out there then we're
> > going to at least need to come up with some elegant solution.  When is the
> > earliest the decoder realizes that the input file is from beta 4?  Is it
> > when it captures the vendor name in _vorbis_unpack_comment()?
>
> well, *I* have a bunch, but I don't count :-)  And yes, this was back
> in 2001ish.
>
> The decoder doesn't really know/care about specific releases.
> However, if memory usage is the concern, you can watch for large
> tesselated codebooks. Codebook maptype 0 == tesselation.
>
> Monty
>