[Vorbis-dev] Cover art

Ian Malone ibmalone at gmail.com
Wed Apr 1 17:05:23 PDT 2009


2009/4/1 Tor-Einar Jarnbjo <tor-einar at jarnbjo.name>:
> Ian Malone wrote:
>> 2009/4/1 Ian Malone <ibmalone at gmail.com>:
>>> 2009/4/1 Mathias Kunter <mathiaskunter at yahoo.de>:

>> So what we have is: base64-encode the FLAC block and place it in the
>> contents of a BINARY_COVERART tag.
>>
>> On the wiki the points against the Audio Shell etc. COVERART approach
>> were:
>>    *  no additional information like a description about the cover
>> art is provided,
>>
>
> This is not related to Base64 vs direct UTF-8 encoding of the data.

No, I simply included it because I wanted to outline why we might
still want to retain the FLAC block vs simple JFIF/PNG.

>>
>>    * it may break playback on hardware players because of a
>> large vorbis comment header
>>
>
> This is also a potential problem with the FLAC structure.

Yes, no difference (well, actually a little worse), but there doesn't
appear to be a more widely compatible way of doing it.

>>
>>    * Old C / C++ based implementations don't display the binary data
>> as string since it always starts with a zero byte at the first
>> position, which is an empty string when interpreted as UTF-8.
>>
>
> As shown with WinAmp, the leading \0 may however cause existing software to
> completely ignore the comment and remove it, if the file for some reason is
> written back to disk.

Okay, no '\0'.  It's also easier to transform with command line tools,
which is nice for prototyping and things, but not a primary
consideration.

>>
>>    * All common picture file formats are supported (jpg, gif, whatever).
>>
>


> This is actually not clear from the FLAC specification. Which MIME types are
> a compliant Vorbis player required to support? For some picture types like
> cover art or leaflet pages, it might even be reasonable to embed a PDF file.
>
> Unfortunately, the FLAC specification is not very specific and allows
> for quite a few ambiguities and implementation specific details:
>
> - The byte order is not clearly stated. It's only mentioned in the
> METADATA_BLOCK_VORBIS_COMMENT field description, that that field uses
> little-endian order as opposed to the "usual" big-endian order in other FLAC
> fields.
>
> - Why is "-->" used as a pseudo mime type to indicate that the binary
> picture data contains a URL to the image instead of the registered MIME type
> "text/uri-list"? Which character encoding is used for the URL? Which
> protocols must be supported by a Vorbis player: http://, ftp://, file://
> ...?
>

The FLAC block is definitely unclear.  However parts of it seem to be
based on ID3v2, which would provide answers to quite a few of your
questions.  I think we should reference the relevant section in the
ID3v2.4.0 informal spec and highlight relevant points.  In order then:

'The "image/png" [PNG] or "image/jpeg" [JFIF] picture format should be
used when interoperability is wanted.'
I think we can let that stand or strengthen it.  I know people keep
suggesting things like SVG, PDF etc., but it will never be implemented
if we keep throwing things into a wishlist specification.  PNG and
JPEG implementations are widespread and well known.

The vorbiscomment length fields are little endian.  I don't know why,
but assuming you can already read vorbiscomments this shouldn't worry
you.  The FLAC block fields will follow the FLAC format and therefore
be big endian.  PNG and JFIF define their own byte order (and are BE
IIRC, but this shouldn't worry anyone either because you just pass
them off to an appropriate library).
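
To make the byte-order point concrete, here is a minimal Python sketch,
assuming the picture block uses the field order given in the FLAC format
description (type, MIME, description, width, height, depth, colours,
data) with big-endian 32-bit lengths. The function names are
hypothetical, not part of any existing library.

```python
import struct

def read_comment_length(buf, offset=0):
    # Vorbis comment length fields are 32-bit little-endian.
    (length,) = struct.unpack_from("<I", buf, offset)
    return length

def parse_picture_block(block):
    # FLAC picture block fields are 32-bit big-endian (assumed layout).
    pos = 0

    def u32():
        nonlocal pos
        (value,) = struct.unpack_from(">I", block, pos)
        pos += 4
        return value

    def take(n):
        nonlocal pos
        chunk = block[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("truncated picture block")
        pos += n
        return chunk

    picture_type = u32()
    mime = take(u32()).decode("ascii")
    description = take(u32()).decode("utf-8")
    width, height, depth, colours = u32(), u32(), u32(), u32()
    data = take(u32())
    return {
        "type": picture_type, "mime": mime, "description": description,
        "width": width, "height": height, "depth": depth,
        "colours": colours, "data": data,
    }
```

The image bytes in "data" are returned untouched, so they can be handed
to a PNG or JPEG library as-is.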

It seems "-->" is taken from ID3v2.  It would not be a valid MIME type
so far as I can tell (see http://tools.ietf.org/html/rfc2045#section-5.1;
the IETF would be unlikely ever to use it).  Even the ID3v2
standard doesn't seem very optimistic about its use ('The use of
linked files should however be used sparingly since there is the risk
of separation of files.').  There are also security implications
attached to retrieving remote URLs, particularly hidden ones.
Supporting file:// could be a 'may', I think; other schemes probably
not.  Relative references do not get a scheme name, so "-->" with
"my.jpg" might be the usual case.  If this is a 'may' it can be left to
later, but I think we should recommend against retrieving other schemes
without user permission.
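
One way a player might apply that recommendation, as a rough Python
sketch. The function name and the returned labels are hypothetical, and
decoding the link as UTF-8 is an assumption, since the character
encoding of the URL is one of the open questions.

```python
from urllib.parse import urlsplit

def link_policy(mime, payload):
    # Anything other than the "-->" pseudo MIME type is embedded image data.
    if mime != "-->":
        return "embedded"
    # Assumption: the link is UTF-8 text; the spec does not say.
    url = payload.decode("utf-8", errors="replace")
    scheme = urlsplit(url).scheme.lower()
    if scheme in ("", "file"):
        # Relative reference or file://: optional ('may') local lookup.
        return "local"
    # http://, ftp:// and so on: do not fetch without asking the user.
    return "ask-user"
```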

Aside: you mention the leaflet type later, this might be a fairly
valid use of a link (local or remote) to a different media type (e.g.
pdf) which could be presented to a user as an item to retrieve and
view separately (e.g. launch a pdf reader).

> - How should the fields "width", "height", "colour depth" and "number of
> colours" be set for image formats, to which they don't apply? E.g. SVG, to
> which none of the fields are applicable, or "colour depth" to bi-colour
> (pure black and white) image formats. Are these fields reasonable at all, if
> an external image is linked from a URL, since the image may be replaced and
> not match the embedded information anymore?
>

Colour depth for binary is 1... Actually that field is a bit weak
because you can't distinguish colour types e.g. 8bit indexed RGB vs 8
bit greyscale, 32bit RGBA vs floating point...
They are probably useful info if you want to check without decoding
the image, though applications should not be surprised if they are
incorrect (producing an error message would be an acceptable response,
applications might want to display anyway). They are certainly less
meaningful for linked files, but could still inform decisions about
placeholder images for example.
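
As an illustration of treating the declared values as hints, this Python
sketch compares the block's width and height with what an embedded PNG
reports in its IHDR chunk. It only handles PNG and the function names
are hypothetical; a real application would more likely ask its image
library.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data):
    # A PNG starts with an 8-byte signature followed by the IHDR chunk:
    # 4-byte length, b"IHDR", then big-endian width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    return struct.unpack_from(">II", data, 16)

def dimensions_match(declared_width, declared_height, data):
    # True/False when the image can be checked, None when it cannot
    # (not a PNG, or too short to contain IHDR).
    actual = png_dimensions(data)
    if actual is None:
        return None
    return actual == (declared_width, declared_height)
```

A False result need not be fatal: the application can warn and display
the image anyway.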

While some of these are of doubtful use I think it is helpful to use
the FLAC block since it does provide some slightly more useful
information and is consistent with FLAC, which would save applications
supporting Vorbis and FLAC a (very) little effort in implementation.

> And lastly, a specific question about Vorbis integration:
>
> - How should multiple images be embedded? Several METADATA_BLOCK_PICTURE
> structs concatenated in one comment field, or several comment fields with
> the same name, each containing one METADATA_BLOCK_PICTURE struct? Some of
> the picture types indicate that the image order is relevant, e.g. leaflet
> pages. If each image is put in separate comment fields, will e.g. libvorbis
> (or other Vorbis decoders) retain the comment field order, so that
> supporting software is able to show the images in the correct order?
>

Does anyone know an implementation that messes with the order of
comment tags?  I've found them to be pretty stable.  Multiple comment
fields are the usual Vorbis tradition; I think separating concatenated
blocks might become unpleasant.
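
With one picture per comment field, reading them back in order is
straightforward. A small Python sketch, assuming the comments are
already available as a list of "NAME=value" strings in file order and
that the tag is called METADATA_BLOCK_PICTURE; the function name is
hypothetical.

```python
import base64

def picture_blocks(comments):
    # Collect every picture block, keeping the order of the comment list.
    blocks = []
    for comment in comments:
        name, sep, value = comment.partition("=")
        # Vorbis comment field names are case-insensitive.
        if sep and name.upper() == "METADATA_BLOCK_PICTURE":
            blocks.append(base64.b64decode(value))
    return blocks
```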

If I'm right then the score currently stands at: base64 encoded FLAC
metadata block in a tag named 'METADATA_BLOCK_PICTURE'.
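
For prototyping, that could look roughly like the Python sketch below.
It assumes the FLAC picture field layout (big-endian 32-bit fields) and
that only the block body is encoded; whether the 4-byte FLAC metadata
block header belongs in the tag has not been settled here. Function
names are hypothetical.

```python
import base64
import struct

def build_picture_block(picture_type, mime, description, width, height,
                        depth, colours, data):
    # All integer fields are 32-bit big-endian, as in other FLAC fields.
    mime_bytes = mime.encode("ascii")
    desc_bytes = description.encode("utf-8")
    return b"".join([
        struct.pack(">II", picture_type, len(mime_bytes)), mime_bytes,
        struct.pack(">I", len(desc_bytes)), desc_bytes,
        struct.pack(">IIIII", width, height, depth, colours, len(data)),
        data,
    ])

def to_comment(block):
    # Base64 output is plain ASCII, so the comment value has no '\0'.
    return "METADATA_BLOCK_PICTURE=" + base64.b64encode(block).decode("ascii")
```

For example, to_comment(build_picture_block(3, "image/png", "front
cover", 300, 300, 24, 0, png_bytes)) gives a comment string for a front
cover (picture type 3), where png_bytes holds the image file contents.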

-- 
imalone

