[vorbis] Format converters

R.J.J.H. van Son Rob.van.Son at hum.uva.nl
Mon Aug 19 02:37:05 PDT 2002



Are direct bit-rate converters possible for Ogg Vorbis? Or do they
already exist?

Even more, is it possible to directly convert MP3/ATRAC3 (i.e., Sony
Minidisc) encoding to and from Ogg Vorbis?

There has been an earlier discussion of this question. But that
discussion centered on people not seeing the point, and was rather
hostile to the idea.

There is a very good reason to want this. I can best illustrate this
with an example case:

A while ago, a colleague of mine spent several months a year, for a few
years in a row, in northern Irian Jaya, which is as remote as the name
suggests. She recorded a lot of speech from local villagers on her
lap-top, running on solar power. She was REALLY isolated.

Now, a new project would switch to Minidisc (ATRAC3) recordings,
transferring them to the lap-top/CD-ROM in compressed form (e.g., Ogg
Vorbis, 80 kbs would do). Finally this speech would end up in a huge
speech corpus for small languages, which uses a different compression
codec and another bit-rate for archiving, say MP3 at 192 kbs.

Note that the researcher in question cannot influence the encoding in
the recording device nor the final compression format of the corpus. So
answers like, "Don't do this", and "Let them switch to Ogg Vorbis" are
not productive. In that case she would simply switch to the corpus
codec, e.g., MP3 at 192 kbs.

This is not as far-fetched as it might seem. In Europe and the USA, lots
of money is currently spent on building large corpora of small
languages. These languages are spoken in jungles (Amazonia, Southeast
Asia), tundras (northern Siberia, Kamchatka), and high mountain areas
(the Himalayas). Furthermore, large corpora (>100 GB) of natural speech
are collected by volunteers carrying minidisc equipment. All this speech
will end up in archives using some kind of compression.

I have done some studies of the effects of compression on speech
acoustics. Each of these compression steps, taken individually, would
not introduce distortions large enough to matter (i.e., RMS error < 1
semitone for standard speech analysis results). However, used in
cascade, the distortion explodes for whole-spectrum measures. The
results suggest that the problem lies in the accumulation of
quantization noise, but I could be wrong. This explosion can be
prevented, I think, by NOT doing decoding->encoding steps, but by doing
direct format translations, WITHOUT decoding.
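
As a rough illustration of the accumulation argument above, here is a
toy sketch in Python. It is NOT a model of Vorbis, MP3, or ATRAC3: it
uses plain uniform scalar quantizers on random numbers, and the three
step sizes are made-up stand-ins for the recorder, transfer, and archive
stages. It only shows that requantizing on a different grid at each
stage piles up error, while requantizing on an unchanged grid adds none.

```python
import math
import random


def quantize(samples, step):
    """Uniform scalar quantizer: snap each value to the nearest multiple of step."""
    return [step * round(v / step) for v in samples]


def rms_error(reference, test):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(
        sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    )


random.seed(0)  # fixed seed so repeated runs give the same numbers
signal = [random.gauss(0.0, 1.0) for _ in range(10000)]

# Hypothetical step sizes standing in for three lossy stages
# (recorder, transfer format, archive format). Not real codec settings.
steps = [0.10, 0.13, 0.07]

# Cascade: each stage requantizes the previous stage's output.
cascaded = signal
for step in steps:
    cascaded = quantize(cascaded, step)
cascade_err = rms_error(signal, cascaded)

# Reference: the original quantized once, on the final grid only.
direct = quantize(signal, steps[-1])
direct_err = rms_error(signal, direct)

# Requantizing on an unchanged grid adds no further error.
same_grid_again = quantize(direct, steps[-1])

print("cascade RMS error:     ", cascade_err)
print("single-stage RMS error:", direct_err)
print("same grid adds nothing:", same_grid_again == direct)
```

Whether a real translator could keep the already-quantized values of one
codec when writing another is exactly the open question here. The sketch
only motivates why avoiding a fresh quantization step might help.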

However, there seem to be no such translators and I do not know whether
they are even possible.

Can anyone help?

Rob

-- 
           Rob van Son
  Institute of Phonetic Sciences/ACLC
       University of Amsterdam
 
