[Vorbis-dev] Cover art
Ian Malone
ibmalone at gmail.com
Tue Mar 31 16:02:47 PDT 2009
2009/3/31 Tor-Einar Jarnbjo <tor-einar at jarnbjo.name>:
> Ian Malone wrote:
>>
>> Yes it's not ideal, but I'm reasoning on the basis that all the specs
>> dealing with the vorbis comment headers (or its clones in other
>> formats) require the comment contents to be UTF-8 encoded. This
>> presumably means the picture description is UTF-8 encoded twice.
>>
>
> I was wondering about this myself, as the proposal on
> http://wiki.xiph.org/index.php/VorbisComment didn't mention anything about
> UTF-8 encoding the binary content of the coverart structure, although it's
> done in the demo file. If not doing it, arbitrary binary content is most
> likely not a valid UTF-8 sequence and may cause current software to fail.
>
> I'm not sure, however, whether a leading 0 byte in the "string" solves many
> incompatibilities with old software, or whether this approach is much better
> than Base64-encoding the data. If the byte values in the image structure are
> equally distributed (with JPEG they most likely are), the UTF-8 encoding
> will add an overhead of 50%, while the Base64 encoding adds an overhead of
Pretty much spot on: the overhead is 50% in the example.
> 33%. Even if older software displays the Base64 comment value as a string,
> it's unlikely that the comment is ignored completely, as is the case with
> e.g. WinAmp. Not only is the BINARY_COVERART not shown in the file info
> dialog, but it's removed from the file if other comments are edited with
> WinAmp.
>
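For anyone who wants to check the 50% and 33% figures, here is a rough sketch. It assumes the "UTF-8 encoding" of the binary data means mapping each byte value to the code point with the same number (U+0000 to U+00FF) and then UTF-8 encoding the result; that reading is my assumption, not something the wiki proposal states. The helper names are made up for illustration.

```python
import base64


def utf8_wrapped_size(data: bytes) -> int:
    # Assumption: each byte is treated as the code point with the same value.
    # Values below 0x80 encode to one byte, values 0x80..0xFF to two bytes.
    return len(data.decode("latin-1").encode("utf-8"))


def base64_size(data: bytes) -> int:
    # Base64 emits 4 output characters for every 3 input bytes.
    return len(base64.b64encode(data))


def overhead(encoded_size: int, raw_size: int) -> float:
    # Fractional growth relative to the raw size (0.5 means 50% larger).
    return encoded_size / raw_size - 1.0


# Every byte value equally often, as a stand-in for compressed image data.
all_bytes = bytes(range(256)) * 3

utf8_overhead = overhead(utf8_wrapped_size(all_bytes), len(all_bytes))
b64_overhead = overhead(base64_size(all_bytes), len(all_bytes))

print(f"UTF-8 wrapped: {utf8_overhead:.1%} overhead")
print(f"Base64:        {b64_overhead:.1%} overhead")
```

Half of the byte values need two bytes after wrapping, which is where the 50% comes from; real image data will only approximate the uniform case.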
> I'm also honestly not 100% sure how C/C++ handles Unicode strings (it
> was not really a topic the last time I wrote anything in C), but 0x00 is
> actually a valid Unicode control character and will not always be treated as
> an end-of-string marker by current software, so using UTF-8 encoding instead
> of Base64 does not guarantee that the content is not treated as a string
> and shown:
>
UTF-8 was designed partly to avoid breaking 8-bit string handling, so
a UTF-8 string in C is well behaved (until you have to worry about
encoding/decoding and translation to locales). To C string functions,
code point 0 always looks like '\0'.
I'm not really familiar with Base64; can 0x00 occur in its output?
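A quick sketch of both points, in Python rather than C for brevity. The `c_strlen` helper is hypothetical and only mimics what C's `strlen()` does (count bytes up to the first 0x00). As far as I understand the standard Base64 alphabet (A-Z, a-z, 0-9, '+', '/' and '=' padding), its output never contains a 0x00 byte, which the last check illustrates for one input.

```python
import base64


def c_strlen(buf: bytes) -> int:
    # Hypothetical stand-in for C strlen(): length up to the first 0x00 byte.
    end = buf.find(b"\x00")
    return len(buf) if end < 0 else end


# Multi-byte UTF-8 sequences only use bytes >= 0x80, so ordinary text
# never contains a stray 0x00 and byte-oriented string code keeps working.
plain = "caf\u00e9".encode("utf-8")      # 5 bytes: c, a, f, 0xC3, 0xA9
assert 0 not in plain
print(c_strlen(plain))

# Code point 0 itself encodes as the single byte 0x00, so a strlen-style
# scan stops there and never sees what follows.
with_nul = "ab\u0000cd".encode("utf-8")
print(c_strlen(with_nul), len(with_nul))

# Base64 output stays within printable ASCII, even for input full of zeros.
b64 = base64.b64encode(b"\x00" * 6 + bytes(range(256)))
print(0 in b64)
```

So a leading 0 byte hides the data from C-style string handling, while Base64 data would be shown as ordinary (if long and meaningless) text.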
Completely off-message suggestion follows:
It's things like this that really argue for the sense of doing the
multiplexed approach...
How about: FLAC picture block packets plus some kind of modified FLAC BOS
(beginning-of-stream) header to identify the stream as cover art? It would be
a compromise between quick hack and technically elegant. (This is for those
people upset by the other off-message suggestion that both PNG and JNG could
be encapsulated following the MNG-Ogg format.)
--
imalone