[vorbis-dev] On the "broken" .WAV files issue

Fabian Giesen ryg at gmx.net
Sun Aug 26 03:08:12 PDT 2001



A friend of mine recently had a problem with a "broken" .WAV file
(as you call them): oggenc first printed a warning and then rejected
the file with an "unexpected EOF" error.

Because I was interested in the issue, I took a look at the oggenc
source, and in fact it is your .WAV reader that's wrong. More precisely:
there are two versions of the format chunk. The old one belongs to the
original .WAV specification (the corresponding structure is called
WAVEFORMAT in the Win32 SDK and has 14 bytes; the variant you'll
typically see in PCM wave files is called PCMWAVEFORMAT and has 16
bytes, the size that you expect the format chunk to be). The newer one,
called WAVEFORMATEX, has 18 bytes and was created to better suit the
requirements of non-PCM codecs. The major differences are that the
bits-per-sample value is now officially part of the wave format
structure and that extra header information is supported if a codec
needs it. The size of this extra information is stored in the
additional two bytes that are "always" zero (actually not always, which
is why oggenc refused to read my friend's WAV file).
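
To make the size distinction concrete, here is a minimal sketch of a
format chunk parser that accepts all three layouts (14, 16, or 18+ bytes)
and does not insist that the trailing two bytes are zero. It is not the
oggenc code; the names wav_fmt, rd16, rd32 and parse_fmt_chunk are
hypothetical and only serve the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the parsed format chunk (not oggenc's own type). */
typedef struct {
    uint16_t format_tag;
    uint16_t channels;
    uint32_t samples_per_sec;
    uint32_t avg_bytes_per_sec;
    uint16_t block_align;
    uint16_t bits_per_sample; /* 0 if the chunk is a bare 14-byte WAVEFORMAT */
    uint16_t extra_size;      /* cbSize; 0 if the chunk has fewer than 18 bytes */
} wav_fmt;

/* WAV header fields are stored little-endian. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse a format chunk body of len bytes.
 * Returns 0 on success, -1 if the chunk is shorter than WAVEFORMAT. */
static int parse_fmt_chunk(const unsigned char *buf, size_t len, wav_fmt *out)
{
    if (len < 14)
        return -1;

    out->format_tag        = rd16(buf);
    out->channels          = rd16(buf + 2);
    out->samples_per_sec   = rd32(buf + 4);
    out->avg_bytes_per_sec = rd32(buf + 8);
    out->block_align       = rd16(buf + 12);

    /* PCMWAVEFORMAT and WAVEFORMATEX add wBitsPerSample at offset 14. */
    out->bits_per_sample = (len >= 16) ? rd16(buf + 14) : 0;

    /* WAVEFORMATEX adds cbSize at offset 16; it need not be zero. */
    out->extra_size = (len >= 18) ? rd16(buf + 16) : 0;

    return 0;
}
```

A caller would pass the chunk length read from the RIFF chunk header and
then skip any bytes of the chunk that were not consumed, rather than
assuming a fixed 16-byte body.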

That should be sufficient to fix the .WAV file reader - I'll include
the structure definitions and corresponding remarks from MSDN below
(those are copy&pasted and I use outlook express, so it *may* contain
fancy tags or something like that - I hope you can tolerate it, but
I'm too lazy to clean it up right now :)

So, keep up the excellent work on vorbis,

Fabian Giesen

------

1. WAVEFORMAT structure

The WAVEFORMAT structure describes the format of waveform-audio data. Only
format information common to all waveform-audio data formats is included in this
structure. This structure has been superseded by the WAVEFORMATEX structure.

typedef struct {
    WORD  wFormatTag;
    WORD  nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD  nBlockAlign;
} WAVEFORMAT;

Members
  wFormatTag
  Format type. The following type is defined:
    WAVE_FORMAT_PCM
    Waveform-audio data is PCM.
  nChannels
  Number of channels in the waveform-audio data. Mono data uses one channel and
stereo data uses two channels.
  nSamplesPerSec
  Sample rate, in samples per second.
  nAvgBytesPerSec
  Required average data transfer rate, in bytes per second. For example, 16-bit
stereo at 44.1 kHz has an average data rate of 176,400 bytes per second (2
channels x 2 bytes per sample per channel x 44,100 samples per second).
  nBlockAlign
  Block alignment, in bytes. The block alignment is the minimum atomic unit of
data. For PCM data, the block alignment is the number of bytes used by a single
sample, including data for both channels if the data is stereo. For example, the
block alignment for 16-bit stereo PCM is 4 bytes (2 channels x 2 bytes per
sample).

Remarks
For formats that require additional information, this structure is included as a
member in another structure along with the additional information.
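
The two worked figures for PCM data (4 bytes per block and 176,400 bytes
per second for 16-bit stereo at 44.1 kHz) can be reproduced with a small
sketch like the following. The helper names pcm_block_align and
pcm_avg_bytes_per_sec are hypothetical, and the formulas only hold for
PCM data.

```c
#include <stdint.h>

/* Block alignment for PCM: bytes for one sample across all channels. */
static uint16_t pcm_block_align(uint16_t channels, uint16_t bits_per_sample)
{
    return (uint16_t)(channels * bits_per_sample / 8);
}

/* Average data rate for PCM: one block per sample period. */
static uint32_t pcm_avg_bytes_per_sec(uint32_t samples_per_sec,
                                      uint16_t block_align)
{
    return samples_per_sec * block_align;
}
```

A reader could use these values as a sanity check against the
nBlockAlign and nAvgBytesPerSec fields stored in a PCM file.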

2. PCMWAVEFORMAT structure

The PCMWAVEFORMAT structure describes the data format for PCM waveform-audio
data. This structure has been superseded by the WAVEFORMATEX structure.

typedef struct {
    WAVEFORMAT wf;
    WORD       wBitsPerSample;
} PCMWAVEFORMAT;

Members
  wf
  A WAVEFORMAT structure containing general information about the format of the
data.
  wBitsPerSample
  Number of bits per sample.

3. WAVEFORMATEX structure

The WAVEFORMATEX structure defines the format of waveform-audio data. Only
format information common to all waveform-audio data formats is included in this
structure. For formats that require additional information, this structure is
included as the first member in another structure, along with the additional
information.

typedef struct {
    WORD  wFormatTag;
    WORD  nChannels;
    DWORD nSamplesPerSec;
    DWORD nAvgBytesPerSec;
    WORD  nBlockAlign;
    WORD  wBitsPerSample;
    WORD  cbSize;
} WAVEFORMATEX;

Members
  wFormatTag
  Waveform-audio format type. Format tags are registered with Microsoft
Corporation for many compression algorithms. A complete list of format tags can
be found in the MMREG.H header file.
  nChannels
  Number of channels in the waveform-audio data. Monaural data uses one channel
and stereo data uses two channels.
  nSamplesPerSec
  Sample rate, in samples per second (hertz), that each channel should be played
or recorded. If wFormatTag is WAVE_FORMAT_PCM, then common values for
nSamplesPerSec are 8.0 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz. For non-PCM
formats, this member must be computed according to the manufacturer's
specification of the format tag.
  nAvgBytesPerSec
  Required average data-transfer rate, in bytes per second, for the format tag.
If wFormatTag is WAVE_FORMAT_PCM, nAvgBytesPerSec should be equal to the product
of nSamplesPerSec and nBlockAlign. For non-PCM formats, this member must be
computed according to the manufacturer's specification of the format tag.
  Playback and record software can estimate buffer sizes by using the
nAvgBytesPerSec member.

  nBlockAlign
  Block alignment, in bytes. The block alignment is the minimum atomic unit of
data for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM,
nBlockAlign should be equal to the product of nChannels and wBitsPerSample
divided by 8 (bits per byte). For non-PCM formats, this member must be computed
according to the manufacturer's specification of the format tag.
  Playback and record software must process a multiple of nBlockAlign bytes of
data at a time. Data written and read from a device must always start at the
beginning of a block. For example, it is illegal to start playback of PCM data
in the middle of a sample (that is, on a non-block-aligned boundary).

  wBitsPerSample
  Bits per sample for the wFormatTag format type. If wFormatTag is
WAVE_FORMAT_PCM, then wBitsPerSample should be equal to 8 or 16. For non-PCM
formats, this member must be set according to the manufacturer's specification
of the format tag. Note that some compression schemes cannot define a value for
wBitsPerSample, so this member can be zero.
  cbSize
  Size, in bytes, of extra format information appended to the end of the
WAVEFORMATEX structure. This information can be used by non-PCM formats to store
extra attributes for the wFormatTag. If no extra information is required by the
wFormatTag, this member must be set to zero. Note that for WAVE_FORMAT_PCM
formats (and only WAVE_FORMAT_PCM formats), this member is ignored.

Remarks
An example of a format that uses extra information is the Microsoft Adaptive
Delta Pulse Code Modulation (MS-ADPCM) format. The wFormatTag for MS-ADPCM is
WAVE_FORMAT_ADPCM. The cbSize member will typically be set to 32. The extra
information stored for WAVE_FORMAT_ADPCM is coefficient pairs required for
encoding and decoding the waveform-audio data.
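
For a non-PCM format such as MS-ADPCM with cbSize set to 32, the format
data occupies 18 + 32 = 50 bytes. A reader might check that the declared
extra size actually fits in the chunk before using it; the sketch below
shows one way to do that. The names WAVEFORMATEX_SIZE and fmt_extra_fits
are hypothetical, and the check should be skipped for WAVE_FORMAT_PCM,
where cbSize is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the fixed WAVEFORMATEX part: 2+2+4+4+2+2+2 bytes. */
#define WAVEFORMATEX_SIZE 18u

/* Returns 1 if cb_size extra bytes fit in a format chunk of chunk_len
 * bytes, 0 otherwise. Intended for non-PCM formats only. */
static int fmt_extra_fits(size_t chunk_len, uint16_t cb_size)
{
    return chunk_len >= WAVEFORMATEX_SIZE &&
           (size_t)cb_size <= chunk_len - WAVEFORMATEX_SIZE;
}
```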
