[xiph-cvs] cvs commit: speex/doc sampledec.c sampleenc.c manual.lyx
Jean-Marc Valin
jm at xiph.org
Sun Feb 16 21:02:04 PST 2003
jm 03/02/17 00:02:03
Modified: doc manual.lyx
Added: doc sampledec.c sampleenc.c
Log:
Reorganization of the doc, added sample source code
Revision Changes Path
1.51 +2409 -2374 speex/doc/manual.lyx
Index: manual.lyx
===================================================================
RCS file: /usr/local/cvsroot/speex/doc/manual.lyx,v
retrieving revision 1.50
retrieving revision 1.51
diff -u -r1.50 -r1.51
--- manual.lyx 31 Jan 2003 01:42:53 -0000 1.50
+++ manual.lyx 17 Feb 2003 05:02:03 -0000 1.51
@@ -29,7 +29,7 @@
The Speex Codec Manual
\newline
-(for version 1.0rc2)
+(for version 1.0rc3)
\layout Author
Jean-Marc Valin
@@ -167,1090 +167,980 @@
\layout Itemize
Intensity stereo encoding option
+\layout Section
+\pagebreak_top
+Feature description
\layout Standard
-The next two sections describe the internals of the codec and require some
- signal processing knowledge.
- If you are only interested in using Speex, you can skip to section
-\begin_inset LatexCommand \ref{sec:Command-line-encoder/decoder}
-
-\end_inset
-
-.
+This section explains the main Speex features, as well as some concepts
+ in speech coding that help better understand the next sections.
-\layout Section
-\pagebreak_top
-Introduction to CELP Coding
-\begin_inset LatexCommand \index{CELP}
+\layout Subsection*
+
+Sampling rate
+\begin_inset LatexCommand \index{sampling rate}
\end_inset
\layout Standard
-Speex is based on CELP, which stands for Code Excited Linear Prediction.
- This section attempts to introduce the principles behind CELP, so if you
- are already familiar with CELP, you can safely skip to section
-\begin_inset LatexCommand \ref{sec:Speex-narrowband-mode}
+Speex is mainly designed for 3 different sampling rates: 8 kHz, 16 kHz,
+ and 32 kHz.
+ These are respectively referred to as narrowband
+\begin_inset LatexCommand \index{narrowband}
\end_inset
-.
- The CELP technique is based on three ideas:
-\layout Enumerate
+, wideband
+\begin_inset LatexCommand \index{wideband}
-The use of a linear prediction (LP) model to model the vocal tract
-\layout Enumerate
+\end_inset
-The use of (adaptive and fixed) codebook entries as input (excitation) of
- the LP model
-\layout Enumerate
+ and ultra-wideband
+\begin_inset LatexCommand \index{ultra-wideband}
-The search performed in closed-loop in a
-\begin_inset Quotes eld
\end_inset
-perceptually weighted domain
-\begin_inset Quotes erd
+.
+
+\layout Subsection*
+
+Quality
+\begin_inset LatexCommand \index{quality}
+
\end_inset
\layout Standard
-This section describes the basic ideas behind CELP.
- Note that it's still incomplete.
-\layout Subsection
-
-Linear Prediction (LPC)
-\begin_inset LatexCommand \index{linear prediction}
+Speex encoding is controlled most of the time by a quality parameter that
+ ranges from 0 to 10.
+ In constant bit-rate
+\begin_inset LatexCommand \index{constant bit-rate}
\end_inset
+ (CBR) operation, the quality parameter is an integer, while for variable
+ bit-rate (VBR), the parameter is a float.
+
+\layout Subsection*
-\layout Standard
+Complexity
+\begin_inset LatexCommand \index{complexity}
-Linear prediction is at the base of may speech coding techniques, including
- CELP.
- The idea behind it is to predict the signal
-\begin_inset Formula $x(n)$
\end_inset
- using a linear combination of its past samples:
+ (variable)
\layout Standard
+With Speex, it is possible to vary the complexity allowed for the encoder.
+ This is done by controlling how the search is performed with an integer
+ ranging from 1 to 10 in a way that's similar to the -1 to -9 options to
+
+\emph on
+gzip
+\emph default
+ and
+\emph on
+bzip2
+\emph default
+ compression utilities.
+ For normal use, the noise level at complexity 1 is between 1 and 2 dB higher
+ than at complexity 10, but the CPU requirements for complexity 10 are about
+ 5 times higher than for complexity 1.
+ In practice, the best trade-off is between complexity 2 and 4, though higher
+ settings are often useful when encoding non-speech sounds like DTMF
+\begin_inset LatexCommand \index{DTMF}
-\begin_inset Formula \[
-y[n]=\sum _{i=1}^{N}a_{i}x[n-i]\]
-
-\end_inset
-
-where
-\begin_inset Formula $y[n]$
\end_inset
- is the linear prediction of
-\begin_inset Formula $x[n]$
-\end_inset
+ tones.
+\layout Subsection*
-.
- The prediction error is thus given by:
-\begin_inset Formula \[
-e[n]=x[n]-y[n]=x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\]
+Variable Bit-Rate
+\begin_inset LatexCommand \index{variable bit-rate}
\end_inset
-
+ (VBR)
\layout Standard
-The goal of the LPC analysis is to find the best prediction coefficients
-
-\begin_inset Formula $a_{i}$
+Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically
+ to adapt to the
+\begin_inset Quotes eld
\end_inset
- which minimize the quadratic error function:
-\begin_inset Formula \[
-E=\sum _{n=0}^{L-1}\left[e[n]\right]^{2}=\sum _{n=0}^{L-1}\left[x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\right]^{2}\]
-
+difficulty
+\begin_inset Quotes erd
\end_inset
-That can be done by making all derivatives
-\begin_inset Formula $\frac{\partial E}{\partial a_{i}}$
-\end_inset
+ of the audio being encoded.
+ In the case of Speex, sounds like vowels and high-energy transients
+ require a higher bit-rate to achieve good quality, while fricatives (e.g.
+ s,f sounds) can be coded adequately with fewer bits.
+ For this reason, VBR can achieve a lower bit-rate for the same quality, or
+ a better quality for a certain bit-rate.
+ Despite its advantages, VBR has two main drawbacks: first, by only specifying
+ quality, there's no guarantee about the final average bit-rate.
+ Second, for some real-time applications like voice over IP (VoIP), what
+ counts is the maximum bit-rate, which must be low enough for the communication
+ channel.
+\layout Subsection*
- equal to zero:
-\begin_inset Formula \[
-\frac{\partial E}{\partial a_{i}}=\frac{\partial }{\partial a_{i}}\sum _{n=0}^{L-1}\left[x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\right]^{2}=0\]
+Average Bit-Rate
+\begin_inset LatexCommand \index{average bit-rate}
\end_inset
-
+ (ABR)
\layout Standard
-The
-\begin_inset Formula $a_{i}$
-\end_inset
+Average bit-rate solves one of the problems of VBR, as it dynamically adjusts
+ VBR quality in order to meet a specific target bit-rate.
+ Because the quality/bit-rate is adjusted in real-time (open-loop), the
+ global quality will be slightly lower than that obtained by encoding in
+ VBR with exactly the right quality setting to meet the target average bit-rate.
+\layout Subsection*
- filter coefficients are computed using the Levinson-Durbin
-\begin_inset LatexCommand \index{Levinson-Durbin}
+Voice Activity Detection
+\begin_inset LatexCommand \index{voice activity detection}
\end_inset
- algorithm, which starts from the auto-correlation
-\begin_inset LatexCommand \index{auto-correlation}
+ (VAD)
+\layout Standard
+When enabled, voice activity detection detects whether the audio being encoded
+ is speech or silence/background noise.
+ VAD is always implicitly activated when encoding in VBR, so the option
+ is only useful in non-VBR operation.
+ In this case, Speex detects non-speech periods and encodes them with just
+ enough bits to reproduce the background noise.
+ This is called
+\begin_inset Quotes eld
\end_inset
-
-\begin_inset Formula $R(m)$
+comfort noise generation
+\begin_inset Quotes erd
\end_inset
- of the signal
-\begin_inset Formula $x[n]$
+ (CNG).
+\layout Subsection*
+
+Discontinuous Transmission
+\begin_inset LatexCommand \index{discontinuous transmission}
+
\end_inset
-.
+ (DTX)
\layout Standard
+Discontinuous transmission is an addition to VAD operation that makes it
+ possible to stop transmitting completely when the background noise is stationary.
+ In file-based operation, since we cannot just stop writing to the file,
+ only 5 bits are used for such frames (corresponding to 250 bps).
+\layout Subsection*
-\begin_inset Formula \[
-R(m)=\sum _{i=0}^{N-1}x[i]x[i-m]\]
+Perceptual enhancement
+\begin_inset LatexCommand \index{perceptual enhancement}
\end_inset
\layout Standard
-For an order
-\begin_inset Formula $N$
-\end_inset
-
- filter, we have:
-\begin_inset Formula \[
-\mathbf{R}=\left[\begin{array}{cccc}
- R(0) & R(1) & \cdots & R(N-1)\\
- R(1) & R(0) & \cdots & R(N-2)\\
- \vdots & \vdots & \ddots & \vdots \\
- R(N-1) & R(N-2) & \cdots & R(0)\end{array}
-\right]\]
-
-\end_inset
-
+Perceptual enhancement is a part of the decoder which, when turned on, tries
+ to reduce (the perception of) the noise produced by the coding/decoding
+ process.
+ In most cases, perceptual enhancement makes the sound further from the original
+
+\emph on
+objectively
+\emph default
+ (if you use SNR), but in the end it still
+\emph on
+sounds
+\emph default
+ better (subjective improvement).
+\layout Subsection*
-\begin_inset Formula \[
-\mathbf{r}=\left[\begin{array}{c}
- R(1)\\
- R(2)\\
- \vdots \\
- R(N)\end{array}
-\right]\]
+Algorithmic delay
+\begin_inset LatexCommand \index{algorithmic delay}
\end_inset
\layout Standard
-The filter coefficients
-\begin_inset Formula $a_{i}$
+Every speech codec introduces a delay in the transmission.
+ For Speex, this delay is equal to the frame size, plus some amount of
+\begin_inset Quotes eld
\end_inset
- are found by solving the system
-\begin_inset Formula $\mathbf{Ra}=\mathbf{r}$
+look-ahead
+\begin_inset Quotes erd
\end_inset
-.
- What the Levinson-Durbin algorithm does here is making the solution to
- the problem
-\begin_inset Formula $\mathcal{O}\left(N^{2}\right)$
-\end_inset
-
- instead of
-\begin_inset Formula $\mathcal{O}\left(N^{3}\right)$
-\end_inset
-
- by exploiting the fact that matrix
-\begin_inset Formula $\mathbf{R}$
-\end_inset
-
- is toeplitz hermitian.
- Also, it can be proved that all the roots of
-\begin_inset Formula $A(z)$
-\end_inset
-
- are withing the unit circle, which means that
-\begin_inset Formula $1/A(z)$
-\end_inset
-
- is always stable.
- This is in theory; in practice because of finite precision, there are two
- commonly used techniques to make sure we have a stable filter.
- First, we multiply
-\begin_inset Formula $R(0)$
-\end_inset
-
- by a number slightly above one (such as 1.0001), which is equivalent to
- adding noise to the signal.
- Also, we can apply a window the the auto-correlation, which is equivalent
- to filtering in the frequency domain, reducing sharp resonances.
-\layout Standard
-
-The linear prediction model represents each speech sample as linear combination
- of past samples, plus an error signal called the excitation (or residual).
-\begin_inset Formula \[
-x[n]=\sum _{i=1}^{N}a_{i}x[n-i]+e[n]\]
+ required to process each frame.
+ In narrowband operation (8 kHz), the delay is 30 ms, while for wideband
+ (16 kHz), the delay is 34 ms.
+ These values don't account for the CPU time it takes to encode or decode
+ the frames.
+\layout Section
+\pagebreak_top
+Command-line encoder/decoder
+\begin_inset LatexCommand \label{sec:Command-line-encoder/decoder}
\end_inset
\layout Standard
-In the
+The base Speex distribution includes a command-line encoder (
\emph on
-z
+speexenc
\emph default
--domain, this can be expressed as
-\layout Standard
+) and decoder (
+\emph on
+speexdec
+\emph default
+).
+ This section describes how to use these tools.
+\layout Subsection
-\begin_inset Formula \[
-x(z)=\frac{1}{A(z)}\: e(z)\]
+\emph on
+speexenc
+\begin_inset LatexCommand \index{speexenc}
\end_inset
\layout Standard
-where
-\begin_inset Formula $A(z)$
-\end_inset
+The
+\emph on
+speexenc
+\emph default
+ utility is used to create Speex files from raw PCM or wave files.
+ It can be used by calling:
+\layout LyX-Code
- is defined as
+speexenc [options] input_file output_file
\layout Standard
+The value '-' for input_file or output_file corresponds respectively to
+ stdin and stdout.
+ The valid options are:
+\layout Description
+
+--narrowband\SpecialChar ~
+(-n) Tell Speex to treat the input as narrowband (8 kHz).
+ This is the default
+\layout Description
-\begin_inset Formula \[
-A(z)=1-\sum _{i=1}^{N}a_{i}z^{-i}\]
+--wideband\SpecialChar ~
+(-w) Tell Speex to treat the input as wideband (16 kHz)
+\layout Description
+--ultra-wideband\SpecialChar ~
+(-u) Tell Speex to treat the input as
+\begin_inset Quotes eld
\end_inset
+ultra-wideband
+\begin_inset Quotes erd
+\end_inset
-\layout Standard
+ (32 kHz)
+\layout Description
-We usually refer to
-\begin_inset Formula $A(z)$
-\end_inset
+--quality\SpecialChar ~
+n Set the encoding quality (0-10), default is 8
+\layout Description
- as the analysis filter and
-\begin_inset Formula $1/A(z)$
-\end_inset
+--bitrate\SpecialChar ~
+n Encoding bit-rate (use bit-rate n or lower)
+\layout Description
- as the synthesis filter.
- The whole process is called short-term prediction as it predicts the signal
-
-\begin_inset Formula $x[n]$
-\end_inset
+--vbr Enable VBR (Variable Bit-Rate), disabled by default
+\layout Description
- using a prediction using only the
-\begin_inset Formula $N$
-\end_inset
+--abr\SpecialChar ~
+n Enable ABR (Average Bit-Rate) at n kbps, disabled by default
+\layout Description
- past samples, where
-\begin_inset Formula $N$
-\end_inset
+--vad Enable VAD (Voice Activity Detection), disabled by default
+\layout Description
- is usually around 10.
-\layout Standard
+--dtx Enable DTX (Discontinuous Transmission), disabled by default
+\layout Description
-Because LPC coefficients have very little robustness to quantization, they
- are converted to Line Spectral Pair
-\begin_inset LatexCommand \index{line spectral pair}
+--nframes\SpecialChar ~
+n Pack n frames in each Ogg packet (this saves space at low bit-rates)
+\layout Description
-\end_inset
+--comp\SpecialChar ~
+n Set encoding speed/quality tradeoff.
+ The higher the value of n, the slower the encoding (default is 3)
+\layout Description
- (LSP) coefficients which have a much better behaviour with quantization,
- one of them being that it's easy to keep the filter stable.
-
-\layout Subsection
+-V Verbose operation, print bit-rate currently in use
+\layout Description
-Pitch Prediction
-\begin_inset LatexCommand \index{pitch}
+--help\SpecialChar ~
+(-h) Print the help
+\layout Description
-\end_inset
+--version\SpecialChar ~
+(-v) Print version information
+\layout Subsubsection*
+Speex comments
+\layout Description
-\layout Standard
+--comment Add the given string as an extra comment.
+ This may be used multiple times.
+
+\layout Description
-During voiced segments, the speech signal is periodic, so it is possible
- to take advantage of that property by approximating the excitation signal
+--author Author of this track.
-\begin_inset Formula $e[n]$
-\end_inset
+\layout Description
- by a gain times the past of the excitation:
-\layout Standard
+--title Title for this track.
+
+\layout Subsubsection*
+Raw input options
+\layout Description
-\begin_inset Formula \[
-e[n]\simeq p[n]=\beta e[n-T]\]
+--rate\SpecialChar ~
+n Sampling rate for raw input
+\layout Description
-\end_inset
+--stereo Consider raw input as stereo
+\layout Description
+--le Raw input is little-endian
+\layout Description
-\layout Standard
+--be Raw input is big-endian
+\layout Description
-where
-\begin_inset Formula $T$
-\end_inset
+--8bit Raw input is 8-bit unsigned
+\layout Description
- is the pitch period,
-\begin_inset Formula $\beta $
-\end_inset
+--16bit Raw input is 16-bit signed
+\layout Subsection
- is the pitch gain and
-\begin_inset Formula $c(n)$
-\end_inset
- is taken from the
\emph on
-innovation codebook
-\emph default
-.
- We call that long-term prediction since the excitation is predicted from
-
-\begin_inset Formula $e[n-T]$
-\end_inset
+speexdec
+\begin_inset LatexCommand \index{speexdec}
- with
-\begin_inset Formula $T\gg N$
\end_inset
-.
-\layout Subsection
-Innovation Codebook
\layout Standard
-The final excitation
-\begin_inset Formula $e[n]$
-\end_inset
-
- will be the sum of the pitch prediction and an
+The
\emph on
-innovation
+speexdec
\emph default
- signal
-\begin_inset Formula $c[n]$
-\end_inset
+ utility is used to decode Speex files and can be used by calling:
+\layout LyX-Code
- taken from a fixed codebook.
+speexdec [options] speex_file [output_file]
\layout Standard
+The value '-' for speex_file or output_file corresponds respectively to
+ stdin and stdout.
+ Also, when no output_file is specified, the file is played to the soundcard.
+ The valid options are:
+\layout Description
-\begin_inset Formula \[
-e[n]=p[n]+c[n]=\beta e[n-T]+c[n]\]
+--enh enable post-filter (default)
+\layout Description
-\end_inset
+--no-enh disable post-filter
+\layout Description
-This is where most of the bits in a CELP codec are allocated.
- It represents the information that couldn't be obtained either from linear
- prediction or pitch prediction.
- In the
-\emph on
-z
-\emph default
--domain we can represent the final signal
-\begin_inset Formula $X(z)$
-\end_inset
+--force-nb Force decoding in narrowband
+\layout Description
- as
-\begin_inset Formula \[
-X(z)=\frac{C(z)}{A(z)\left(1-\beta z^{-T}\right)}\]
+--force-wb Force decoding in wideband
+\layout Description
-\end_inset
+--force-uwb Force decoding in ultra-wideband
+\layout Description
+--mono Force decoding in mono
+\layout Description
-\layout Subsection
+--stereo Force decoding in stereo
+\layout Description
-Analysis-by-Synthesis and Error Weighting
-\begin_inset LatexCommand \index{error weighting}
-
-\end_inset
-
-
-\begin_inset LatexCommand \index{analysis-by-synthesis}
-
-\end_inset
-
-
-\layout Standard
+--rate\SpecialChar ~
+n For decoding at n Hz sampling rate
+\layout Description
-Most (if not all) modern audio codecs attempt to
-\begin_inset Quotes eld
-\end_inset
+--packet-loss\SpecialChar ~
+n Simulate n % random packet loss
+\layout Description
-shape
-\begin_inset Quotes erd
-\end_inset
+-V Verbose operation, print bit-rate currently in use
+\layout Description
- the noise so that it appears mostly in the frequency regions where the
- ear cannot detect it.
- For example, the ear is more tolerant to noise in parts of the spectrum
- that are louder and
-\emph on
-vice versa
-\emph default
-.
- That's why instead of minimizing the simple quadratic error
-\begin_inset Formula \[
-E=\sum _{n}\left(x[n]-\overline{x}[n]\right)^{2}\]
+--help\SpecialChar ~
+(-h) Print the help
+\layout Description
-\end_inset
+--version\SpecialChar ~
+(-v) Print version information
+\layout Section
+\pagebreak_top
+Programming with Speex (the libspeex
+\begin_inset LatexCommand \index{libspeex}
-where
-\begin_inset Formula $\overline{x}[n]$
\end_inset
- is the encoder signal, we minimize the error for the perceptually weighted
- signal
-\begin_inset Formula \[
-X_{w}(z)=W(z)X(z)\]
+ API
+\begin_inset LatexCommand \index{API}
\end_inset
-where
-\begin_inset Formula $W(z)$
-\end_inset
+)
+\layout Subsection
- is the weighting filter, usually of the form
+Encoding
\layout Standard
+In order to encode speech using Speex, you first need to:
+\layout LyX-Code
-\begin_inset Formula \begin{equation}
-W(z)=\frac{A\left(\frac{z}{\gamma _{1}}\right)}{A\left(\frac{z}{\gamma _{2}}\right)}\label{eq:weighting_filter}\end{equation}
-
-\end_inset
-
-
+#include <speex.h>
\layout Standard
-with control parameters
-\begin_inset Formula $\gamma _{1}>\gamma _{2}$
-\end_inset
-
-.
- If the noise is white in the perceptually weighted domain, then in the
- signal domain its spectral shape will be of the form
-\begin_inset Formula \[
-A_{noise}(z)=\frac{1}{W(z)}=\frac{A\left(\frac{z}{\gamma _{2}}\right)}{A\left(\frac{z}{\gamma _{1}}\right)}\]
-
-\end_inset
-
+You then need to declare a Speex bit-packing struct
+\layout LyX-Code
+SpeexBits bits;
\layout Standard
-If a filter
-\begin_inset Formula $A(z)$
-\end_inset
-
- has (complex) poles at
-\begin_inset Formula $p_{i}$
-\end_inset
+and a Speex encoder state
+\layout LyX-Code
- in the
-\begin_inset Formula $z$
-\end_inset
+void *enc_state;
+\layout Standard
--plane, the filter
-\begin_inset Formula $A(z/\gamma )$
-\end_inset
+The two are initialized by:
+\layout LyX-Code
- filter will have its poles at
-\begin_inset Formula $p_{i}^{'}=\gamma p_{i}$
-\end_inset
+speex_bits_init(&bits);
+\layout LyX-Code
-, making it a flatter version of
-\begin_inset Formula $A(z)$
-\end_inset
+enc_state = speex_encoder_init(&speex_nb_mode);
+\layout Standard
+For wideband coding,
+\emph on
+speex_nb_mode
+\emph default
+ will be replaced by
+\emph on
+speex_wb_mode
+\emph default
.
-\layout Section
-\pagebreak_top
-Speex narrowband mode
-\begin_inset LatexCommand \label{sec:Speex-narrowband-mode}
-
-\end_inset
+ In most cases, you will need to know the frame size used by the mode you
+ are using.
+ You can get that value in the
+\emph on
+frame_size
+\emph default
+ variable with:
+\layout LyX-Code
+speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size);
+\layout Standard
-\begin_inset LatexCommand \index{narrowband}
+Once the initialization is done, for every input frame:
+\layout LyX-Code
-\end_inset
+speex_bits_reset(&bits);
+\layout LyX-Code
+speex_encode(enc_state, input_frame, &bits);
+\layout LyX-Code
+nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);
\layout Standard
-This section looks at how Speex works for narrowband (
-\begin_inset Formula $8\: \mathrm{kHz}$
-\end_inset
-
- sampling rate) operation.
- The frame size for this mode is
-\begin_inset Formula $20\: \mathrm{ms}$
-\end_inset
-
-, corresponding to 160 samples.
- Each frame is also subdivided into 4 sub-frames of 40 samples each.
+where
+\emph on
+input_frame
+\emph default
+ is a
+\emph on
+(float *)
+\emph default
+ pointing to the beginning of a speech frame,
+\emph on
+byte_ptr
+\emph default
+ is a
+\emph on
+(char *)
+\emph default
+ where the encoded frame will be written,
+\emph on
+MAX_NB_BYTES
+\emph default
+ is the maximum number of bytes that can be written to
+\emph on
+byte_ptr
+\emph default
+ without causing an overflow and
+\emph on
+nbBytes
+\emph default
+ is the number of bytes actually written to
+\emph on
+byte_ptr
+\emph default
+ (the encoded size in bytes).
+ Before calling speex_bits_write, it is possible to find the number of bytes
+ that need to be written by calling
+\family typewriter
+speex_bits_nbytes(&bits)
+\family default
+, which returns a number of bytes.
+
\layout Standard
-Also many design decisions were based on the original goals and assumptions:
-\layout Itemize
+After you're done with the encoding, free all resources with:
+\layout LyX-Code
-Minimizing the amount of information extracted from past frames (for robustness
- to packet loss)
-\layout Itemize
+speex_bits_destroy(&bits);
+\layout LyX-Code
-Dynamically-selectable codebooks (LSP, pitch and innovation)
-\layout Itemize
+speex_encoder_destroy(enc_state);
+\layout Standard
-sub-vector fixed (innovation) codebooks
+That's about it for the encoder.
+
\layout Subsection
-LPC Analysis
-\begin_inset LatexCommand \index{linear prediction}
-
-\end_inset
+Decoding
+\layout Standard
+In order to decode speech using Speex, you first need to:
+\layout LyX-Code
+#include <speex.h>
\layout Standard
-An LPC analysis is first performed on a (asymetric Hamming) window that
- spans all the current frame and half a frame in advance.
- The LPC coefficients are then converted to Line Spectral Pair
-\begin_inset LatexCommand \index{line spectral pair}
+You also need to declare a Speex bit-packing struct
+\layout LyX-Code
-\end_inset
+SpeexBits bits;
+\layout Standard
- (LSP), a representation that is more robust to quantization.
- The LSP's are considered to be associated to the
-\begin_inset Formula $4^{th}$
-\end_inset
+and a Speex decoder state
+\layout LyX-Code
- sub-frames and the LSP's associated to the first 3 sub-frames are linearly
- interpolated using the current and previous LSP's.
+void *dec_state;
\layout Standard
-The LSP's are encoded using 30 bits for higher quality modes and 18 bits
- for lower quality, through the use of a multi-stage split-vector quantizer.
- For the lower quality modes, the 10 coefficients are first quantized with
- 6 bits and the error is then divided in two 5-coefficient sub-vectors.
- Each of them is quantized with 6 bits, for a total of 18 bits.
- For the higher quality modes, the remaining error on both sub-vectors is
- further quantized with 6 bits each, for a total of 30 bits.
+The two are initialized by:
+\layout LyX-Code
+
+speex_bits_init(&bits);
+\layout LyX-Code
+
+dec_state = speex_decoder_init(&speex_nb_mode);
\layout Standard
-The perceptual weighting filter
-\begin_inset Formula $W(z)$
-\end_inset
-
- used by Speex is derived from the LPC filter
-\begin_inset Formula $A(z)$
-\end_inset
-
- and corresponds to the one described by eq.
-
-\begin_inset LatexCommand \ref{eq:weighting_filter}
-
-\end_inset
-
- with
-\begin_inset Formula $\gamma _{1}=0.9$
-\end_inset
-
- and
-\begin_inset Formula $\gamma _{2}=0.6$
-\end_inset
-
+For wideband decoding,
+\emph on
+speex_nb_mode
+\emph default
+ will be replaced by
+\emph on
+speex_wb_mode
+\emph default
.
- We can use the unquantized
-\begin_inset Formula $A(z)$
-\end_inset
-
- filter since the weighting filter is only used in the encoder.
-\layout Subsection
-
-Pitch Prediction (adaptive codebook)
-\begin_inset LatexCommand \index{pitch}
-
-\end_inset
-
+ If you need to obtain the size of the frames that will be used by the decoder,
+ you can get that value in the
+\emph on
+frame_size
+\emph default
+ variable with:
+\layout LyX-Code
+speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);
\layout Standard
-Speex uses a 3-tap prediction for pitch.
- That is, the pitch prediction signal
-\begin_inset Formula $p[n]$
-\end_inset
-
- is obtained by the past of the excitation by:
-\begin_inset Formula \[
-p[n]=\beta _{0}e[n-T-1]+\beta _{1}e[n-T]+\beta _{2}e[n-T+1]\]
-
-\end_inset
-
+There is also a parameter that can be set for the decoder: whether or not
+ to use a perceptual post-filter.
+ This can be set by:
+\layout LyX-Code
+speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);
\layout Standard
where
-\begin_inset Formula $T$
-\end_inset
-
- is the pitch period and the
-\begin_inset Formula $\beta _{i}$
-\end_inset
-
- are the prediction (filter) taps.
- It is worth noting that when the pitch is smaller than the sub-frame size,
- we repeat the excitation at a period
-\begin_inset Formula $T$
-\end_inset
-
-.
- For example, when
-\begin_inset Formula $n-T+1$
-\end_inset
-
-, we use
-\begin_inset Formula $n-2T+1$
-\end_inset
-
- instead.
- The period and quantized gains are determined in closed loop.
- In most modes, the pitch period is encoded with 7 bits in the
-\begin_inset Formula $\left[17,144\right]$
-\end_inset
+\emph on
+enh
+\emph default
+ is an int with value 0 to disable the post-filter and 1 to enable
+ it.
+\layout Standard
- range and the
-\begin_inset Formula $\beta _{i}$
-\end_inset
+Again, once the decoder initialization is done, for every input frame:
+\layout LyX-Code
- coefficients are vector-quantized using 7 bits (15 kbps narrowband and
- above) at higher bit-rates and 5 bits at lower bit-rates (11 kbps narrowband
- and below).
-\layout Subsection
+speex_bits_read_from(&bits, input_bytes, nbBytes);
+\layout LyX-Code
-Innovation Codebook
+speex_decode(dec_state, &bits, output_frame);
\layout Standard
-In Speex, the innovation signal is quantized using shape-only vector quantizatio
-n (VQ).
- That means that the codebooks that are used represent both the shape and
- the gain at the same time.
- This save many bits that would otherwise be allocated for a separate gain
- at the price of a slight increase in complexity.
-
-\layout Subsection
-
-Bit allocation
+where input_bytes is a
+\emph on
+(char *)
+\emph default
+ containing the bit-stream data received for a frame,
+\emph on
+nbBytes
+\emph default
+ is the size (in bytes) of that bit-stream, and
+\emph on
+output_frame
+\emph default
+ is a
+\emph on
+(float *)
+\emph default
+ and points to the area where the decoded speech frame will be written.
+ A NULL value as the second argument indicates that we don't have the bits
+ for the current frame.
+ When a frame is lost, the Speex decoder will do its best to "guess" the
+ correct signal.
\layout Standard
-There are 7 different narrowband bit-rates defined for Speex, ranging from
- 200 bps to 18.15 kbps, although the modes below 5.9 kbps should not be used
- for speech.
- The bit-allocation for each mode is detailed in table
-\begin_inset LatexCommand \ref{cap:bits-narrowband}
-
-\end_inset
+After you're done with the decoding, free all resources with:
+\layout LyX-Code
-.
- Each frame starts with the mode ID encoded with 4 bits which allows a range
- from 0 to 15, though only the first 7 values are used (the others are reserved).
- The parameters are listed in the table in the order they are packed in
- the bit-stream.
- All frame-based parameters are packed before sub-frame parameters.
- The parameters for a certain sub-frame are all packed before the following
- sub-frame is packed.
- Note that the
-\begin_inset Quotes eld
-\end_inset
+speex_bits_destroy(&bits);
+\layout LyX-Code
-OL
-\begin_inset Quotes erd
-\end_inset
+speex_decoder_destroy(dec_state);
+\layout Subsection
- in the parameter description means the the parameter is an open loop estimation
- based on the whole frame.
+Codec Options (speex_*_ctl)
\layout Standard
+The Speex encoder and decoder support many options and requests that can
+ be accessed through the
+\emph on
+speex_encoder_ctl
+\emph default
+ and
+\emph on
+speex_decoder_ctl
+\emph default
+ functions.
+ These functions are similar to the
+\emph on
+ioctl
+\emph default
+ system call and their prototypes are:
+\layout LyX-Code
-\begin_inset Float table
-placement h
-wide true
-collapsed false
+void speex_encoder_ctl(void *encoder, int request, void *ptr);
+\layout LyX-Code
+void speex_decoder_ctl(void *decoder, int request, void *ptr);
\layout Standard
+The different values of request allowed are (note that some only apply to
+ the encoder or the decoder):
+\layout Description
-\begin_inset Tabular
-<lyxtabular version="3" rows="12" columns="11">
-<features>
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" rightline="true" width="0pt">
-<row topline="true" bottomline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
+SPEEX_SET_ENH** Set perceptual enhancer
+\begin_inset LatexCommand \index{perceptual enhancement}
-Parameter
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ to on (1) or off (0) (integer)
+\layout Description
-Update rate
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+SPEEX_GET_ENH** Get perceptual enhancer status (integer)
+\layout Description
-\layout Standard
+SPEEX_GET_FRAME_SIZE Get the frame size used for the current mode (integer)
+\layout Description
-0
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+SPEEX_SET_QUALITY* Set the encoder speech quality (integer 0 to 10)
+\layout Description
-\layout Standard
+SPEEX_GET_QUALITY* Get the current encoder speech quality (integer 0 to
+ 10)
+\layout Description
-1
+SPEEX_SET_MODE*
+\begin_inset Formula $\dagger $
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
-2
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
+\layout Description
-3
+SPEEX_GET_MODE*
+\begin_inset Formula $\dagger $
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
-
-4
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+\layout Description
-5
+SPEEX_SET_LOW_MODE*
+\begin_inset Formula $\dagger $
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-\layout Standard
-6
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
+\layout Description
-7
+SPEEX_GET_LOW_MODE*
+\begin_inset Formula $\dagger $
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-\layout Standard
-
-8
-\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+\layout Description
-Wideband bit
+SPEEX_SET_HIGH_MODE*
+\begin_inset Formula $\dagger $
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
-frame
+\layout Description
+
+SPEEX_GET_HIGH_MODE*
+\begin_inset Formula $\dagger $
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
-1
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+\layout Description
-\layout Standard
+SPEEX_SET_VBR* Set variable bit-rate (VBR) to on (1) or off (0) (integer)
+\layout Description
+
+SPEEX_GET_VBR* Get variable bit-rate
+\begin_inset LatexCommand \index{variable bit-rate}
-1
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ (VBR) status (integer)
+\layout Description
-1
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+SPEEX_SET_VBR_QUALITY* Set the encoder VBR speech quality (float 0 to 10)
+\layout Description
-\layout Standard
+SPEEX_GET_VBR_QUALITY* Get the current encoder VBR speech quality (float
+ 0 to 10)
+\layout Description
-1
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+SPEEX_SET_COMPLEXITY* Set the CPU resources allowed for the encoder (integer
+ 1 to 10)
+\layout Description
-\layout Standard
+SPEEX_GET_COMPLEXITY* Get the CPU resources allowed for the encoder (integer
+ 1 to 10)
+\layout Description
-1
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+SPEEX_SET_BITRATE* Set the bit-rate to use to the closest value not exceeding
+ the parameter (integer in bps)
+\layout Description
-\layout Standard
+SPEEX_GET_BITRATE Get the current bit-rate in use (integer in bps)
+\layout Description
-1
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+SPEEX_SET_SAMPLING_RATE Set real sampling rate (integer in Hz)
+\layout Description
-\layout Standard
+SPEEX_GET_SAMPLING_RATE Get real sampling rate (integer in Hz)
+\layout Description
-1
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+SPEEX_RESET_STATE Reset the encoder/decoder state to its original state
+ (zeros all memories)
+\layout Description
-\layout Standard
+SPEEX_SET_VAD* Set voice activity detection
+\begin_inset LatexCommand \index{voice activity detection}
-1
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ (VAD) to on (1) or off (0) (integer)
+\layout Description
-1
-\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+SPEEX_GET_VAD* Get voice activity detection (VAD) status (integer)
+\layout Description
-\layout Standard
+SPEEX_SET_DTX* Set discontinuous transmission
+\begin_inset LatexCommand \index{discontinuous transmission}
-Mode ID
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ (DTX) to on (1) or off (0) (integer)
+\layout Description
-frame
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+SPEEX_GET_DTX* Get discontinuous transmission (DTX) status (integer)
+\layout Description
-\layout Standard
+SPEEX_SET_ABR* Set average bit-rate
+\begin_inset LatexCommand \index{average bit-rate}
-4
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ (ABR) to a value n in bits per second (integer in bps)
+\layout Description
-4
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+SPEEX_GET_ABR* Get average bit-rate (ABR) setting (integer in bps)
+\layout Description
-\layout Standard
+* applies only to the encoder
+\layout Description
-4
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+** applies only to the decoder
+\layout Description
-\layout Standard
-4
+\begin_inset Formula $\dagger $
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-4
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+ normally only used internally
+\layout Subsection
+Mode queries
\layout Standard
-4
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+Speex modes have a query system similar to the speex_encoder_ctl and speex_deco
+der_ctl calls.
+ Since modes are read-only, it is only possible to get information about
+ a particular mode.
+ The function used to do that is:
+\layout LyX-Code
+void speex_mode_query(SpeexMode *mode, int request, void *ptr);
\layout Standard
-4
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+The admissible values for request are (unless otherwise noted, the values
+ are returned through
+\emph on
+ptr
+\emph default
+):
+\layout Description
-\layout Standard
+SPEEX_MODE_FRAME_SIZE Get the frame size (in samples) for the mode
+\layout Description
-4
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+SPEEX_SUBMODE_BITRATE Get the bit-rate for a submode number specified through
+
+\emph on
+ptr
+\emph default
+ (integer in bps).
+
+\layout Subsection
-\layout Standard
+Packing and in-band signalling
+\begin_inset LatexCommand \index{in-band signalling}
-4
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-LSP
+Sometimes it is desirable to pack more than one frame per packet (or other
+ basic unit of storage).
+ The proper way to do it is to call speex_encode
+\begin_inset Formula $N$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ times before writing the stream with speex_bits_write.
+ In cases where the number of frames is not determined by an out-of-band
+ mechanism, it is possible to include a terminator code.
+ That terminator consists of the code 15 (decimal) encoded with 5 bits,
+ as shown in figure
+\begin_inset LatexCommand \ref{cap:quality_vs_bps}
-frame
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+.
+
\layout Standard
-0
+It is also possible to send in-band
+\begin_inset Quotes eld
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
-
-18
+messages
+\begin_inset Quotes erd
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ to the other side.
+ All these messages are encoded as a
+\begin_inset Quotes eld
+\end_inset
-18
+pseudo-frame
+\begin_inset Quotes erd
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ of mode 14 which contains a 4-bit message type code, followed by the message.
+ Table
+\begin_inset LatexCommand \ref{cap:In-band-signalling-codes}
-18
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+ lists the available codes, their meaning and the size of the message that
+ follows.
+ Most of these messages are requests that are sent to the encoder or decoder
+ on the other end, which is free to comply or ignore them.
+ By default, all in-band messages are ignored.
\layout Standard
-18
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
+\begin_inset Float table
+placement htbp
+wide false
+collapsed false
\layout Standard
-30
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+
+\begin_inset Tabular
+<lyxtabular version="3" rows="17" columns="3">
+<features>
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" rightline="true" width="0pt">
+<row topline="true" bottomline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-30
+code
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-30
+Size (bits)
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -1258,7 +1148,7 @@
\layout Standard
-18
+Content
\end_inset
</cell>
</row>
@@ -1268,7 +1158,7 @@
\layout Standard
-OL pitch
+0
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1276,23 +1166,25 @@
\layout Standard
-frame
+1
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+Asks decoder to set perceptual enhancement off (0) or on (1)
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-7
+1
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1300,23 +1192,25 @@
\layout Standard
-7
+1
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+reserved
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+2
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1324,15 +1218,7 @@
\layout Standard
-0
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-
-0
+4
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -1340,41 +1226,43 @@
\layout Standard
-0
+Asks encoder to switch to mode N
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-7
+3
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-OL pitch gain
+4
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-frame
+Asks encoder to switch to mode N for low-band
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+4
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1385,20 +1273,22 @@
4
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+Asks encoder to switch to mode N for high-band
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+5
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1406,31 +1296,33 @@
\layout Standard
-0
+4
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+Asks encoder to switch to quality N for VBR
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+6
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+4
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -1438,7 +1330,7 @@
\layout Standard
-4
+Request acknowledge (0=no, 1=all, 2=only for in-band data)
\end_inset
</cell>
</row>
@@ -1448,7 +1340,7 @@
\layout Standard
-OL Exc gain
+7
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1456,23 +1348,25 @@
\layout Standard
-frame
+4
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+Asks encoder to set VBR off (0), on (1), VAD (2), DTX (3)
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-5
+8
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1480,23 +1374,25 @@
\layout Standard
-5
+8
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-5
+Transmit (8-bit) character to the other end
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-5
+9
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1504,7 +1400,7 @@
\layout Standard
-5
+8
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -1512,15 +1408,25 @@
\layout Standard
-5
+Intensity stereo information
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-5
+10
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+16
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -1528,7 +1434,7 @@
\layout Standard
-5
+Announce maximum bit-rate acceptable (N in bytes/second)
\end_inset
</cell>
</row>
@@ -1538,7 +1444,7 @@
\layout Standard
-Fine pitch
+11
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1546,23 +1452,25 @@
\layout Standard
-sub-frame
+16
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+reserved
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+12
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1570,23 +1478,25 @@
\layout Standard
-0
+32
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-7
+Acknowledge receiving packet N
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-7
+13
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1594,7 +1504,7 @@
\layout Standard
-7
+32
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -1602,41 +1512,43 @@
\layout Standard
-7
+reserved
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-7
+14
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+64
\end_inset
</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-Pitch gain
+reserved
\end_inset
</cell>
+</row>
+<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-sub-frame
+15
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1644,129 +1556,240 @@
\layout Standard
-0
+64
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+reserved
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+</row>
+</lyxtabular>
+
+\end_inset
+
+
+\layout Caption
+
+In-band signalling codes
+\begin_inset LatexCommand \label{cap:In-band-signalling-codes}
+
+\end_inset
+
+
+\end_inset
+
\layout Standard
-5
+Finally, applications may define custom in-band messages using mode 13.
+ The size of the message in bytes is encoded with 5 bits, so that the decoder
+ can skip it if it doesn't know how to interpret it.
+\layout Section
+\pagebreak_top
+Formats and standards
+\begin_inset LatexCommand \index{standards}
+
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-5
+Speex can encode speech in both narrowband and wideband and provides different
+ bit-rates.
+ However, not all features need to be supported by a given implementation
+ or device.
+ In order to be called
+\begin_inset Quotes eld
+\end_inset
+
+Speex compatible
+\begin_inset Quotes erd
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+ (whatever that means), an implementation must implement at least a basic
+ set of features.
\layout Standard
-5
+At the minimum, all narrowband modes of operation MUST be supported at the
+ decoder.
+ This includes the decoding of a wideband bit-stream by the narrowband decoder
+\begin_inset Foot
+collapsed true
+
+\layout Standard
+
+The wideband bit-stream contains an embedded narrowband bit-stream which
+ can be decoded alone
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+.
+ If present, a wideband decoder MUST be able to decode a narrowband stream,
+ and MAY either be able to decode all wideband modes or be able to decode
+ the embedded narrowband part of all modes (which includes ignoring the
+ high-band bits).
\layout Standard
-7
+For encoders, at least one narrowband or wideband mode MUST be supported.
+ The main reason why all encoding modes do not have to be supported is that
+ some platforms may not be able to handle the complexity of encoding in
+ some modes.
+\layout Subsection
+
+RTP
+\begin_inset LatexCommand \index{RTP}
+
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+ Payload Format
\layout Standard
-7
+The latest RTP payload draft can be found at
+\begin_inset LatexCommand \url{http://www.speex.org/drafts/latest}
+
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+.
+ We are (2003/01/14) about to send the latest draft to the IETF for comments.
+
+\layout Subsection
+
+MIME Type
\layout Standard
-7
+Speex will use the MIME type
+\family typewriter
+audio/speex
+\family default
+.
+ We will apply for that type in the near future.
+\layout Subsection
+
+Ogg
+\begin_inset LatexCommand \index{Ogg}
+
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+ file format
\layout Standard
-0
+Speex bit-streams can be stored in Ogg files.
+ In this case, the first packet of the Ogg file contains the Speex header
+ described in table
+\begin_inset LatexCommand \ref{cap:ogg_speex_header}
+
+\end_inset
+
+.
+ All integer fields in the headers are stored as little-endian.
+ The
+\family typewriter
+speex_string
+\family default
+ field must contain the
+\begin_inset Quotes eld
+\end_inset
+
+
+\family typewriter
+Speex
+\family default
+\SpecialChar ~
+\SpecialChar ~
+\SpecialChar ~
+
+\begin_inset Quotes erd
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+ (with 3 trailing spaces), which identifies the bit-stream.
+ The next field,
+\family typewriter
+speex_version
+\family default
+ contains the version of Speex that encoded the file.
+ For now, refer to speex_header.[ch] for more info.
+ The
+\emph on
+beginning of stream
+\emph default
+ (
+\family typewriter
+b_o_s
+\family default
+) flag is set to 1 for the header.
+ The header packet has
+\family typewriter
+packetno=0
+\family default
+ and
+\family typewriter
+granulepos=0
+\family default
+.
\layout Standard
-Innovation gain
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
+The second packet contains the Speex comment header.
+ The format used is the Vorbis comment format described here: http://www.xiph.org/
+ogg/vorbis/doc/v-comment.html .
+ This packet has
+\family typewriter
+packetno=1
+\family default
+ and
+\family typewriter
+granulepos=0
+\family default
+.
\layout Standard
-sub-frame
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
+The third and subsequent packets each contain one or more (number found
+ in header) Speex frames.
+ These are identified with
+\family typewriter
+packetno
+\family default
+ starting from 2 and the
+\family typewriter
+granulepos
+\family default
+ is the number of the last sample encoded in that packet.
+ The last of these packets has the
+\emph on
+end of stream
+\emph default
+ (
+\family typewriter
+e_o_s
+\family default
+) flag set to 1.
\layout Standard
-0
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-1
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+\begin_inset Float table
+placement htbp
+wide true
+collapsed false
\layout Standard
-0
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-1
-\end_inset
-</cell>
+\begin_inset Tabular
+<lyxtabular version="3" rows="16" columns="3">
+<features>
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" rightline="true" width="0pt">
+<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-1
+Field
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1774,23 +1797,7 @@
\layout Standard
-3
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-
-3
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-
-3
+Type
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -1798,17 +1805,17 @@
\layout Standard
-0
+Size
\end_inset
</cell>
</row>
-<row topline="true" bottomline="true">
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-Innovation VQ
+speex_string
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1816,23 +1823,25 @@
\layout Standard
-sub-frame
+char[]
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+8
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+speex_version
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1840,10 +1849,10 @@
\layout Standard
-16
+char[]
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
@@ -1851,12 +1860,14 @@
20
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-35
+speex_version_id
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1864,23 +1875,7 @@
\layout Standard
-48
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-
-64
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-
-96
+int
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -1888,17 +1883,17 @@
\layout Standard
-10
+4
\end_inset
</cell>
</row>
-<row topline="true" bottomline="true">
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-Total
+header_size
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1906,23 +1901,25 @@
\layout Standard
-frame
+int
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-5
+4
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-43
+rate
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1930,23 +1927,25 @@
\layout Standard
-119
+int
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-160
+4
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-220
+mode
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -1954,23 +1953,7 @@
\layout Standard
-300
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-
-364
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-
-492
+int
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -1978,78 +1961,17 @@
\layout Standard
-79
+4
\end_inset
</cell>
-</row>
-</lyxtabular>
-
-\end_inset
-
-
-\layout Caption
-
-Bit allocation for narrowband modes
-\begin_inset LatexCommand \label{cap:bits-narrowband}
-
-\end_inset
-
-
-\end_inset
-
-
-\layout Standard
-
-So far, no MOS (Mean Opinion Score
-\begin_inset LatexCommand \index{mean opinion score}
-
-\end_inset
-
-) subjective evaluation has been performed for Speex.
- In order to give an idea of the quality achivable with it, table
-\begin_inset LatexCommand \ref{cap:quality_vs_bps}
-
-\end_inset
-
- presents my own subjective opinion on it.
- It sould be noted that different people will perceive the quality differently
- and that the person that designed the codec often has a bias (one way or
- another) when it comes to subjective evaluation.
- Last thing, it should be noted that for most codecs (including Speex) encoding
- quality sometimes varies depending on the input.
- Note that the complexity is only approximate (withing 0.5 mflops and using
- the lowers complexity setting).
- Decoding requires approximately 0.5 mflops
-\begin_inset LatexCommand \index{complexity}
-
-\end_inset
-
- in most modes (1 mflops with perceptual enhancement).
-\layout Standard
-
-
-\begin_inset Float table
-placement h
-wide true
-collapsed false
-
-\layout Standard
-
-
-\begin_inset Tabular
-<lyxtabular version="3" rows="17" columns="4">
-<features>
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" rightline="true" width="0pt">
-<row topline="true" bottomline="true">
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-Mode
+mode_bitstream_version
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -2057,25 +1979,33 @@
\layout Standard
-Bit-rate
-\begin_inset LatexCommand \index{bit-rate}
-
+int
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
- (bps)
+\layout Standard
+
+4
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-mflops
-\begin_inset LatexCommand \index{complexity}
-
+nb_channels
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+\layout Standard
+int
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -2083,7 +2013,7 @@
\layout Standard
-Quality/description
+4
\end_inset
</cell>
</row>
@@ -2093,7 +2023,7 @@
\layout Standard
-0
+bitrate
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -2101,75 +2031,77 @@
\layout Standard
-250
+int
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-N/A
+4
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-No sound (VBR only)
+frame_size
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-1
+int
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-2,150
+4
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-6
+vbr
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-Vocoder (mostly for comfort noise)
+int
\end_inset
</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-2
+4
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-5,950
+frames_per_packet
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -2177,7 +2109,7 @@
\layout Standard
-9
+int
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -2185,7 +2117,7 @@
\layout Standard
-Very noticeable artifacts/noise, good intelligibility
+4
\end_inset
</cell>
</row>
@@ -2195,7 +2127,7 @@
\layout Standard
-3
+extra_headers
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -2203,41 +2135,51 @@
\layout Standard
-8,000
+int
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-10
+4
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-Artifacts/noise sometimes noticeable
+reserved1
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
+int
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
4
\end_inset
</cell>
+</row>
+<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-11,000
+reserved2
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -2245,7 +2187,7 @@
\layout Standard
-14
+int
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -2253,713 +2195,762 @@
\layout Standard
-Artifacts usually noticeable only with headphones
+4
\end_inset
</cell>
</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+</lyxtabular>
+
+\end_inset
+
+
+\layout Caption
+
+Ogg/Speex header packet
+\begin_inset LatexCommand \label{cap:ogg_speex_header}
+
+\end_inset
+
+
+\end_inset
+
+
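The field list in the header table maps naturally onto a C structure. The sketch below is only an illustration and not the library's own declaration: the names and sizes of the two leading character arrays, the use of fixed-width `int32_t`, the type name `header_sketch` and the helper `read_le32` (which assumes the integers are stored little-endian in the packet) are all assumptions made for this example. The thirteen 4-byte integer fields follow the table rows.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mirror of the Ogg/Speex header packet layout. */
typedef struct {
    char    speex_string[8];    /* assumed: identification string */
    char    speex_version[20];  /* assumed name for the char[] field of size 20 */
    int32_t speex_version_id;
    int32_t header_size;
    int32_t rate;
    int32_t mode;
    int32_t mode_bitstream_version;
    int32_t nb_channels;
    int32_t bitrate;
    int32_t frame_size;
    int32_t vbr;
    int32_t frames_per_packet;
    int32_t extra_headers;
    int32_t reserved1;
    int32_t reserved2;
} header_sketch;

/* Read a 32-bit integer stored little-endian (assumed) at byte offset off. */
int32_t read_le32(const unsigned char *p, size_t off)
{
    uint32_t v = (uint32_t)p[off]
               | ((uint32_t)p[off + 1] << 8)
               | ((uint32_t)p[off + 2] << 16)
               | ((uint32_t)p[off + 3] << 24);
    return (int32_t)v;
}
```

Reading each field byte by byte, rather than casting the packet buffer to the structure, avoids depending on the host byte order or on structure padding.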
+\layout Section
+\pagebreak_top
+Introduction to CELP Coding
+\begin_inset LatexCommand \index{CELP}
+
+\end_inset
+
\layout Standard
-5
+The three following sections describe the internals of the codec and require
+ some signal processing knowledge.
+ If you are only interested in using Speex, they are not required.
+\layout Standard
+
+Speex is based on CELP, which stands for Code Excited Linear Prediction.
+ This section attempts to introduce the principles behind CELP, so if you
+ are already familiar with CELP, you can safely skip to section
+\begin_inset LatexCommand \ref{sec:Speex-narrowband-mode}
+
+\end_inset
+
+.
+ The CELP technique is based on three ideas:
+\layout Enumerate
+
+The use of a linear prediction (LP) model to model the vocal tract
+\layout Enumerate
+
+The use of (adaptive and fixed) codebook entries as input (excitation) of
+ the LP model
+\layout Enumerate
+
+The search performed in closed-loop in a
+\begin_inset Quotes eld
+\end_inset
+
+perceptually weighted domain
+\begin_inset Quotes erd
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-15,000
+This section describes the basic ideas behind CELP.
+ Note that it's still incomplete.
+\layout Subsection
+
+Linear Prediction (LPC)
+\begin_inset LatexCommand \index{linear prediction}
+
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-11
+Linear prediction is at the base of many speech coding techniques, including
+ CELP.
+ The idea behind it is to predict the signal
+\begin_inset Formula $x[n]$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+ using a linear combination of its past samples:
\layout Standard
-Need good headphones to tell the difference
+
+\begin_inset Formula \[
+y[n]=\sum _{i=1}^{N}a_{i}x[n-i]\]
+
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
+where
+\begin_inset Formula $y[n]$
+\end_inset
+
+ is the linear prediction of
+\begin_inset Formula $x[n]$
+\end_inset
+
+.
+ The prediction error is thus given by:
+\begin_inset Formula \[
+e[n]=x[n]-y[n]=x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\]
+
+\end_inset
+
\layout Standard
-6
+The goal of the LPC analysis is to find the best prediction coefficients
+
+\begin_inset Formula $a_{i}$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
+ which minimize the quadratic error function:
+\begin_inset Formula \[
+E=\sum _{n=0}^{L-1}\left[e[n]\right]^{2}=\sum _{n=0}^{L-1}\left[x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\right]^{2}\]
+
+\end_inset
+
+That can be done by making all derivatives
+\begin_inset Formula $\frac{\partial E}{\partial a_{i}}$
+\end_inset
+
+ equal to zero:
+\begin_inset Formula \[
+\frac{\partial E}{\partial a_{i}}=\frac{\partial }{\partial a_{i}}\sum _{n=0}^{L-1}\left[x[n]-\sum _{k=1}^{N}a_{k}x[n-k]\right]^{2}=0\]
+
+\end_inset
+
\layout Standard
-18,200
+The
+\begin_inset Formula $a_{i}$
+\end_inset
+
+ filter coefficients are computed using the Levinson-Durbin
+\begin_inset LatexCommand \index{Levinson-Durbin}
+
+\end_inset
+
+ algorithm, which starts from the auto-correlation
+\begin_inset LatexCommand \index{auto-correlation}
+
+\end_inset
+
+
+\begin_inset Formula $R(m)$
+\end_inset
+
+ of the signal
+\begin_inset Formula $x[n]$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+.
\layout Standard
-17.5
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+\begin_inset Formula \[
+R(m)=\sum _{i=0}^{N-1}x[i]x[i-m]\]
-Hard to tell the difference even with good headphones
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-7
+For an order
+\begin_inset Formula $N$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ filter, we have:
+\begin_inset Formula \[
+\mathbf{R}=\left[\begin{array}{cccc}
+ R(0) & R(1) & \cdots & R(N-1)\\
+ R(1) & R(0) & \cdots & R(N-2)\\
+ \vdots & \vdots & \ddots & \vdots \\
+ R(N-1) & R(N-2) & \cdots & R(0)\end{array}
+\right]\]
-24,600
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
-14.5
+\begin_inset Formula \[
+\mathbf{r}=\left[\begin{array}{c}
+ R(1)\\
+ R(2)\\
+ \vdots \\
+ R(N)\end{array}
+\right]\]
+
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-Completely transparent for voice, good quality music
+The filter coefficients
+\begin_inset Formula $a_{i}$
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-8
+ are found by solving the system
+\begin_inset Formula $\mathbf{Ra}=\mathbf{r}$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+.
+ Here the Levinson-Durbin algorithm reduces the cost of solving this system
+ to
+\begin_inset Formula $\mathcal{O}\left(N^{2}\right)$
+\end_inset
-3,950
+ instead of
+\begin_inset Formula $\mathcal{O}\left(N^{3}\right)$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ by exploiting the fact that matrix
+\begin_inset Formula $\mathbf{R}$
+\end_inset
--
+ is Toeplitz and Hermitian.
+ Also, it can be proved that all the roots of
+\begin_inset Formula $A(z)$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ are within the unit circle, which means that
+\begin_inset Formula $1/A(z)$
+\end_inset
-Very noticeable artifacts/noise, good intelligibility
+ is always stable.
+ This holds in theory; in practice, because of finite precision, two techniques
+ are commonly used to make sure the filter is stable.
+ First, we multiply
+\begin_inset Formula $R(0)$
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+ by a number slightly above one (such as 1.0001), which is equivalent to
+ adding noise to the signal.
+ Also, we can apply a window to the auto-correlation, which is equivalent
+ to filtering in the frequency domain, reducing sharp resonances.
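The recursion described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions and not the code used in Speex: the function names, the `LPC_MAX_ORDER` limit and the use of double precision are all hypothetical. The coefficients follow the sign convention of this section, so that the prediction is the sum of `a[i-1]` times `x[n-i]`.

```c
#include <stddef.h>

#define LPC_MAX_ORDER 32  /* hypothetical limit for this sketch */

/* R[m] = sum over n of x[n] * x[n-m], for m = 0..order. */
void autocorr_sketch(const double *x, size_t len, double *R, int order)
{
    for (int m = 0; m <= order; m++) {
        double acc = 0.0;
        for (size_t n = (size_t)m; n < len; n++)
            acc += x[n] * x[n - (size_t)m];
        R[m] = acc;
    }
}

/* Solve R a = r in O(order^2). a[0..order-1] receives a_1..a_N.
 * Returns the final prediction error energy, or -1.0 on failure. */
double levinson_durbin_sketch(const double *R, double *a, int order)
{
    double tmp[LPC_MAX_ORDER];
    double err = R[0];

    if (order < 1 || order > LPC_MAX_ORDER)
        return -1.0;
    for (int i = 0; i < order; i++) {
        if (err <= 0.0)
            return -1.0;            /* degenerate input, give up */
        /* Reflection coefficient for order i+1. */
        double acc = R[i + 1];
        for (int j = 0; j < i; j++)
            acc -= a[j] * R[i - j];
        double k = acc / err;
        /* Update the lower-order coefficients, then append the new one. */
        for (int j = 0; j < i; j++)
            tmp[j] = a[j] - k * a[i - 1 - j];
        for (int j = 0; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= (1.0 - k * k);
    }
    return err;
}
```

The stabilization step mentioned in the text would be applied to `R` before the call, for example by scaling `R[0]` by a factor such as 1.0001.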
\layout Standard
-9
+The linear prediction model represents each speech sample as a linear combination
+ of past samples, plus an error signal called the excitation (or residual).
+\begin_inset Formula \[
+x[n]=\sum _{i=1}^{N}a_{i}x[n-i]+e[n]\]
+
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-N/A
+In the
+\emph on
+z
+\emph default
+-domain, this can be expressed as
+\layout Standard
+
+
+\begin_inset Formula \[
+x(z)=\frac{1}{A(z)}\: e(z)\]
+
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-N/A
+where
+\begin_inset Formula $A(z)$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+ is defined as
\layout Standard
-reserved
+
+\begin_inset Formula \[
+A(z)=1-\sum _{i=1}^{N}a_{i}z^{-i}\]
+
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-10
+We usually refer to
+\begin_inset Formula $A(z)$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ as the analysis filter and
+\begin_inset Formula $1/A(z)$
+\end_inset
-N/A
+ as the synthesis filter.
+ The whole process is called short-term prediction as it predicts the signal
+
+\begin_inset Formula $x[n]$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ using only the
+\begin_inset Formula $N$
+\end_inset
-N/A
+ past samples, where
+\begin_inset Formula $N$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+ is usually around 10.
\layout Standard
-reserved
+Because LPC coefficients have very little robustness to quantization, they
+ are converted to Line Spectral Pair
+\begin_inset LatexCommand \index{line spectral pair}
+
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ (LSP) coefficients, which behave much better under quantization; in particular,
+ it is easy to keep the filter stable.
+
+\layout Subsection
+
+Pitch Prediction
+\begin_inset LatexCommand \index{pitch}
-11
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-N/A
+During voiced segments, the speech signal is periodic, so it is possible
+ to take advantage of that property by approximating the excitation signal
+
+\begin_inset Formula $e[n]$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+ by a gain times the past of the excitation:
\layout Standard
-N/A
+
+\begin_inset Formula \[
+e[n]\simeq p[n]=\beta e[n-T]\]
+
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-reserved
+where
+\begin_inset Formula $T$
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ is the pitch period,
+\begin_inset Formula $\beta $
+\end_inset
-12
+ is the pitch gain and
+\begin_inset Formula $c[n]$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ is taken from the
+\emph on
+innovation codebook
+\emph default
+.
+ We call that long-term prediction since the excitation is predicted from
+
+\begin_inset Formula $e[n-T]$
+\end_inset
-N/A
+ with
+\begin_inset Formula $T\gg N$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+.
+\layout Subsection
+
+Innovation Codebook
\layout Standard
-N/A
+The final excitation
+\begin_inset Formula $e[n]$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-reserved
+ will be the sum of the pitch prediction and an
+\emph on
+innovation
+\emph default
+ signal
+\begin_inset Formula $c[n]$
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+ taken from a fixed codebook.
\layout Standard
-13
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+\begin_inset Formula \[
+e[n]=p[n]+c[n]=\beta e[n-T]+c[n]\]
-N/A
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-N/A
+This is where most of the bits in a CELP codec are allocated.
+ It represents the information that couldn't be obtained either from linear
+ prediction or pitch prediction.
+ In the
+\emph on
+z
+\emph default
+-domain we can represent the final signal
+\begin_inset Formula $X(z)$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ as
+\begin_inset Formula \[
+X(z)=\frac{C(z)}{A(z)\left(1-\beta z^{-T}\right)}\]
-Application-defined, interpreted by callback or skipped
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
-14
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
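The long-term prediction and the added innovation amount to a short recursion over the excitation buffer. The C sketch below assumes a single-tap predictor as in the equations of this section; the function name, the buffer layout (`hist` past samples followed by `len` new ones) and the restriction that the pitch period must not exceed `hist` are assumptions made for the example.

```c
/* Fill exc[hist .. hist+len-1] with e[n] = beta * e[n-T] + c[n].
 * exc[0 .. hist-1] must already hold the past excitation.
 * Returns 0 on success, -1 if the pitch period T does not fit the history. */
int build_excitation_sketch(double *exc, int hist, int len,
                            int T, double beta, const double *c)
{
    if (T < 1 || T > hist)
        return -1;
    double *e = exc + hist;          /* e[0] is the first new sample */
    for (int n = 0; n < len; n++)
        e[n] = beta * e[n - T] + c[n];
    return 0;
}
```

When `T` is shorter than `len`, the loop reads samples it has just written, so the past excitation is repeated with period `T` inside the new block.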
+\layout Subsection
-\layout Standard
+Analysis-by-Synthesis and Error Weighting
+\begin_inset LatexCommand \index{error weighting}
-N/A
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
-N/A
+\begin_inset LatexCommand \index{analysis-by-synthesis}
+
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-Speex in-band signaling
+Most (if not all) modern audio codecs attempt to
+\begin_inset Quotes eld
\end_inset
-</cell>
-</row>
-<row topline="true" bottomline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-15
+shape
+\begin_inset Quotes erd
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ the noise so that it appears mostly in the frequency regions where the
+ ear cannot detect it.
+ For example, the ear is more tolerant to noise in parts of the spectrum
+ that are louder and
+\emph on
+vice versa
+\emph default
+.
+ That's why instead of minimizing the simple quadratic error
+\begin_inset Formula \[
+E=\sum _{n}\left(x[n]-\overline{x}[n]\right)^{2}\]
-N/A
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-N/A
+where
+\begin_inset Formula $\overline{x}[n]$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ is the encoded signal, we minimize the error for the perceptually weighted
+ signal
+\begin_inset Formula \[
+X_{w}(z)=W(z)X(z)\]
-Terminator code
\end_inset
-</cell>
-</row>
-</lyxtabular>
+where
+\begin_inset Formula $W(z)$
\end_inset
+ is the weighting filter, usually of the form
+\layout Standard
-\layout Caption
-Quality versus bit-rate
-\begin_inset LatexCommand \label{cap:quality_vs_bps}
+\begin_inset Formula \begin{equation}
+W(z)=\frac{A\left(\frac{z}{\gamma _{1}}\right)}{A\left(\frac{z}{\gamma _{2}}\right)}\label{eq:weighting_filter}\end{equation}
\end_inset
-\end_inset
-
+\layout Standard
-\layout Subsection
+with control parameters
+\begin_inset Formula $\gamma _{1}>\gamma _{2}$
+\end_inset
-Perceptual enhancement
-\begin_inset LatexCommand \index{perceptual enhancement}
+.
+ If the noise is white in the perceptually weighted domain, then in the
+ signal domain its spectral shape will be of the form
+\begin_inset Formula \[
+A_{noise}(z)=\frac{1}{W(z)}=\frac{A\left(\frac{z}{\gamma _{2}}\right)}{A\left(\frac{z}{\gamma _{1}}\right)}\]
\end_inset
\layout Standard
-This part of the codec only applies to the decoder and can even be changed
- without affecting inter-operability.
- For that reason, the implementation provided and described here should
- only be considered as a reference implementation.
- The enhancement system is devided in two parts.
- First, the synthesis filter
-\begin_inset Formula $S(z)=1/A(z)$
+If a filter
+\begin_inset Formula $A(z)$
\end_inset
- is replaced by an enhanced filter
-\begin_inset Formula \[
-S'(z)=\frac{A\left(z/a_{2}\right)A\left(z/a_{3}\right)}{A\left(z\right)A\left(z/a_{1}\right)}\]
-
+ has (complex) poles at
+\begin_inset Formula $p_{i}$
\end_inset
-where
-\begin_inset Formula $a_{1}$
+ in the
+\begin_inset Formula $z$
\end_inset
- and
-\begin_inset Formula $a_{2}$
+-plane, the filter
+\begin_inset Formula $A(z/\gamma )$
\end_inset
- depend on the mode in use and
-\begin_inset Formula $a_{3}=\frac{1}{r}\left(1-\frac{1-ra_{1}}{1-ra_{2}}\right)$
+ will have its poles at
+\begin_inset Formula $p_{i}^{'}=\gamma p_{i}$
\end_inset
- with
-\begin_inset Formula $r=.9$
+, making it a flatter version of
+\begin_inset Formula $A(z)$
\end_inset
.
- The second part of the enhancement consists of using a comb filter to enhance
- the pitch in the excitation domain.
-
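Substituting z/gamma for z in A(z) = 1 - sum of a_i z^-i simply scales each coefficient a_i by gamma^i, so both halves of the weighting filter can be obtained from the same LPC coefficients. The helper below is a hypothetical sketch of that scaling (the name `bw_expand_sketch` and the double-precision arrays are assumptions), not the routine used in Speex.

```c
/* Compute the coefficients of A(z/gamma) from those of A(z).
 * a[0..order-1] holds a_1..a_N; out[i-1] receives a_i * gamma^i. */
void bw_expand_sketch(const double *a, double *out, int order, double gamma)
{
    double g = gamma;
    for (int i = 0; i < order; i++) {
        out[i] = a[i] * g;
        g *= gamma;
    }
}
```

Calling it once with gamma_1 gives the numerator of W(z) and once with gamma_2 gives the denominator.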
\layout Section
\pagebreak_top
-Speex wideband mode (sub-band CELP)
-\begin_inset LatexCommand \index{wideband}
+Speex narrowband mode
+\begin_inset LatexCommand \label{sec:Speex-narrowband-mode}
\end_inset
-\layout Standard
-
-For wideband, the Speex approach uses a
-\emph on
-q
-\emph default
-uadrature
-\emph on
-m
-\emph default
-irror
-\emph on
-f
-\emph default
-ilter
-\begin_inset LatexCommand \index{quadrature mirror filter}
+\begin_inset LatexCommand \index{narrowband}
\end_inset
- (QMF) to split the band in two.
- The 16 kHz signal is thus divided into two 8 kHz signals, one representing
- the low band (0-4 kHz), the other the high band (4-8 kHz).
- The low band is encoded with the narrowband mode described in section
-\begin_inset LatexCommand \ref{sec:Speex-narrowband-mode}
-\end_inset
+\layout Standard
- in such a way that the resulting
-\begin_inset Quotes eld
+This section looks at how Speex works for narrowband (
+\begin_inset Formula $8\: \mathrm{kHz}$
\end_inset
-embedded narrowband bit-stream
-\begin_inset Quotes erd
+ sampling rate) operation.
+ The frame size for this mode is
+\begin_inset Formula $20\: \mathrm{ms}$
\end_inset
- can also be decoded with the narrowband decoder.
- Since the low band encoding has already been described only the high band
- encoding is described in this section.
-\layout Subsection
-
-Linear Prediction
+, corresponding to 160 samples.
+ Each frame is also subdivided into 4 sub-frames of 40 samples each.
\layout Standard
-The linear prediction part used for the high-band is very similar to what
- is done for narrowband.
- The only difference is that we use only 12 bits to encode the high-band
- LSP's using a multi-stage vector quantizer (MSVQ).
- The first level quantizes the 10 coefficients with 6 bits and the error
- is then quantized using 6 bits too.
-\layout Subsection
-
-Pitch Prediction
-\layout Standard
+Many design decisions were also based on the original goals and assumptions:
+\layout Itemize
-That part is easy: there's no pitch prediction for the high-band.
- There are two reasons for that.
- First, there is usually little harmonic structure in this band (above 4
- kHz).
- Second, it would be very hard to implement since the QMF folds the 4-8
- kHz band into 4-0 kHz (reversing the frequency axis), which means that
- the location of the harmonics are no longer at multiples of the fundamental
- (pitch).
-\layout Subsection
+Minimizing the amount of information extracted from past frames (for robustness
+ to packet loss)
+\layout Itemize
-Excitation Quantization
-\layout Standard
+Dynamically-selectable codebooks (LSP, pitch and innovation)
+\layout Itemize
-The high-band excitation is coded in the same way as for narrowband.
-
+Sub-vector fixed (innovation) codebooks
\layout Subsection
-Bit allocation
-\layout Standard
-
-For the wideband mode, all the narrowband frame is packed before the high-band
- is encoded.
- The narrowband part of the bit-stream is as defined in table
-\begin_inset LatexCommand \ref{cap:bits-narrowband}
+LPC Analysis
+\begin_inset LatexCommand \index{linear prediction}
\end_inset
-.
- The high-band follows, as described in table
-\begin_inset LatexCommand \ref{cap:bits-wideband}
-
-\end_inset
-.
- This also means that a wideband frame may be correctly decoded by a narrowband
- decoder with the only caveat that if more than one frame is packed in the
- same packet, the decoder will need to skip the high-band parts in order
- to sync with the bit-stream.
\layout Standard
+An LPC analysis is first performed on an (asymmetric Hamming) window that
+ spans all the current frame and half a frame in advance.
+ The LPC coefficients are then converted to Line Spectral Pair
+\begin_inset LatexCommand \index{line spectral pair}
-\begin_inset Float table
-placement h
-wide true
-collapsed false
-
-\layout Standard
+\end_inset
+ (LSP), a representation that is more robust to quantization.
+ The LSP's are considered to be associated with the
+\begin_inset Formula $4^{th}$
+\end_inset
-\begin_inset Tabular
-<lyxtabular version="3" rows="7" columns="7">
-<features>
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" width="0pt">
-<column alignment="center" valignment="top" leftline="true" rightline="true" width="0pt">
-<row topline="true" bottomline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+ sub-frame, and the LSP's associated with the first 3 sub-frames are linearly
+ interpolated using the current and previous LSP's.
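The interpolation can be sketched as follows. The text only states that the first three sub-frames are linearly interpolated, so the exact weights used here (1/4, 2/4, 3/4 and 1 for the four sub-frames) are an assumption, as are the function name and the double-precision arrays.

```c
/* Interpolate LSP vectors for sub-frame sub (0-based) out of nb_sub.
 * prev and cur are the LSP's of the previous and current frames. */
void lsp_interp_sketch(const double *prev, const double *cur, double *out,
                       int order, int sub, int nb_sub)
{
    double w = (1.0 + sub) / nb_sub;  /* assumed weight on the current LSP's */
    for (int i = 0; i < order; i++)
        out[i] = (1.0 - w) * prev[i] + w * cur[i];
}
```

With these weights the last sub-frame gets the current LSP's unchanged, which matches the statement that they are associated with the 4th sub-frame.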
+\layout Standard
+The LSP's are encoded using 30 bits for higher quality modes and 18 bits
+ for lower quality, through the use of a multi-stage split-vector quantizer.
+ For the lower quality modes, the 10 coefficients are first quantized with
+ 6 bits and the error is then divided in two 5-coefficient sub-vectors.
+ Each of them is quantized with 6 bits, for a total of 18 bits.
+ For the higher quality modes, the remaining error on both sub-vectors is
+ further quantized with 6 bits each, for a total of 30 bits.
\layout Standard
-Parameter
+The perceptual weighting filter
+\begin_inset Formula $W(z)$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-Update rate
+ used by Speex is derived from the LPC filter
+\begin_inset Formula $A(z)$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ and corresponds to the one described by eq.
+
+\begin_inset LatexCommand \ref{eq:weighting_filter}
-0
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-
-\layout Standard
-1
+ with
+\begin_inset Formula $\gamma _{1}=0.9$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ and
+\begin_inset Formula $\gamma _{2}=0.6$
+\end_inset
-2
+.
+ We can use the unquantized
+\begin_inset Formula $A(z)$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ filter since the weighting filter is only used in the encoder.
+\layout Subsection
+
+Pitch Prediction (adaptive codebook)
+\begin_inset LatexCommand \index{pitch}
-3
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-4
+Speex uses a 3-tap prediction for pitch.
+ That is, the pitch prediction signal
+\begin_inset Formula $p[n]$
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ is obtained from the past of the excitation by:
+\begin_inset Formula \[
+p[n]=\beta _{0}e[n-T-1]+\beta _{1}e[n-T]+\beta _{2}e[n-T+1]\]
-Wideband bit
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
\layout Standard
-frame
+where
+\begin_inset Formula $T$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ is the pitch period and the
+\begin_inset Formula $\beta _{i}$
+\end_inset
-1
+ are the prediction (filter) taps.
+ It is worth noting that when the pitch is smaller than the sub-frame size,
+ we repeat the excitation at a period
+\begin_inset Formula $T$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+.
+ For example, when
+\begin_inset Formula $n-T+1$
+\end_inset
-1
+ falls within the current sub-frame, we use
+\begin_inset Formula $n-2T+1$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+ instead.
+ The period and quantized gains are determined in closed loop.
+ In most modes, the pitch period is encoded with 7 bits in the
+\begin_inset Formula $\left[17,144\right]$
+\end_inset
-1
+ range and the
+\begin_inset Formula $\beta _{i}$
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+ coefficients are vector-quantized using 7 bits at higher bit-rates (15 kbps
+ narrowband and above) and 5 bits at lower bit-rates (11 kbps narrowband
+ and below).
+\layout Subsection
+
+Innovation Codebook
\layout Standard
-1
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+In Speex, the innovation signal is quantized using shape-only vector quantizatio
+n (VQ).
+ That means that the codebooks that are used represent both the shape and
+ the gain at the same time.
+ This saves many bits that would otherwise be allocated for a separate gain
+ at the price of a slight increase in complexity.
+
+\layout Subsection
+Bit allocation
\layout Standard
-1
+There are 7 different narrowband bit-rates defined for Speex, ranging from
+ 250 bps to 18.15 kbps, although the modes below 5.9 kbps should not be used
+ for speech.
+ The bit-allocation for each mode is detailed in table
+\begin_inset LatexCommand \ref{cap:bits-narrowband}
+
\end_inset
-</cell>
-</row>
-<row topline="true">
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
-\layout Standard
+.
+ Each frame starts with the mode ID encoded with 4 bits which allows a range
+ from 0 to 15, though only the first 7 values are used (the others are reserved).
+ The parameters are listed in the table in the order they are packed in
+ the bit-stream.
+ All frame-based parameters are packed before sub-frame parameters.
+ The parameters for a certain sub-frame are all packed before the following
+ sub-frame is packed.
+ Note that the
+\begin_inset Quotes eld
+\end_inset
-Mode ID
+OL
+\begin_inset Quotes erd
\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+ in the parameter description means that the parameter is an open loop estimatio
+n based on the whole frame.
\layout Standard
-frame
-\end_inset
-</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
-\begin_inset Text
+
+\begin_inset Float table
+placement h
+wide true
+collapsed false
\layout Standard
-3
-\end_inset
-</cell>
+
+\begin_inset Tabular
+<lyxtabular version="3" rows="12" columns="11">
+<features>
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" rightline="true" width="0pt">
+<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-3
+Parameter
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -2967,7 +2958,7 @@
\layout Standard
-3
+Update rate
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -2975,7 +2966,7 @@
\layout Standard
-3
+0
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -2983,17 +2974,15 @@
\layout Standard
-3
+1
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-LSP
+2
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3001,7 +2990,7 @@
\layout Standard
-frame
+3
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3009,7 +2998,7 @@
\layout Standard
-0
+4
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3017,31 +3006,31 @@
\layout Standard
-12
+5
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-12
+6
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-12
+7
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-12
+8
\end_inset
</cell>
</row>
@@ -3051,7 +3040,7 @@
\layout Standard
-Excitation gain
+Wideband bit
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3059,7 +3048,7 @@
\layout Standard
-sub-frame
+frame
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3067,7 +3056,7 @@
\layout Standard
-0
+1
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3075,7 +3064,7 @@
\layout Standard
-5
+1
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3083,7 +3072,7 @@
\layout Standard
-4
+1
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3091,7 +3080,7 @@
\layout Standard
-4
+1
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3099,49 +3088,49 @@
\layout Standard
-4
+1
\end_inset
</cell>
-</row>
-<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-Excitation VQ
+1
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-sub-frame
+1
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+1
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-0
+1
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-20
+Mode ID
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3149,7 +3138,7 @@
\layout Standard
-40
+frame
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3157,17 +3146,15 @@
\layout Standard
-80
+4
\end_inset
</cell>
-</row>
-<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-Total
+4
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3175,7 +3162,7 @@
\layout Standard
-frame
+4
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3191,7 +3178,7 @@
\layout Standard
-36
+4
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -3199,1013 +3186,904 @@
\layout Standard
-112
+4
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-192
+4
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
\begin_inset Text
\layout Standard
-352
+4
\end_inset
</cell>
-</row>
-</lyxtabular>
-
-\end_inset
-
-
-\layout Caption
-
-Bit allocation for high-band in wideband mode
-\begin_inset LatexCommand \label{cap:bits-wideband}
-
-\end_inset
-
-
-\end_inset
-
-
-\layout Standard
-
-
-\begin_inset ERT
-status Open
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
\layout Standard
-\backslash
-clearpage
+4
\end_inset
+</cell>
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-
-\layout Section
-\pagebreak_top
-Feature description
\layout Standard
-This section explains the main Speex features, as well as some concepts
- in speech coding that help better understand the next sections.
-
-\layout Subsection*
-
-Sampling rate
-\begin_inset LatexCommand \index{sampling rate}
-
+LSP
\end_inset
-
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
\layout Standard
-Speex is mainly designed for 3 different sampling rates: 8 kHz, 16 kHz,
- and 32 kHz.
- These are respectively referred to as narrowband
-\begin_inset LatexCommand \index{narrowband}
-
-\end_inset
-
-, wideband
-\begin_inset LatexCommand \index{wideband}
-
-\end_inset
-
- and ultra-wideband
-\begin_inset LatexCommand \index{ultra-wideband}
-
-\end_inset
-
-.
-
-\layout Subsection*
-
-Quality
-\begin_inset LatexCommand \index{quality}
-
+frame
\end_inset
-
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
\layout Standard
-Speex encoding is controlled most of the time by a quality parameter that
- ranges from 0 to 10.
- In constant bit-rate
-\begin_inset LatexCommand \index{constant bit-rate}
-
-\end_inset
-
- (CBR) operation, the quality parameter is an integer, while for variable
- bit-rate (VBR), the parameter is a float.
-
-\layout Subsection*
-
-Complexity
-\begin_inset LatexCommand \index{complexity}
-
+0
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- (variable)
\layout Standard
-With Speex, it is possible to vary the complexity allowed for the encoder.
- This is done by controlling how the search is performed with an integer
- ranging from 1 to 10 in a way that's similar to the -1 to -9 options to
-
-\emph on
-gzip
-\emph default
- and
-\emph on
-bzip2
-\emph default
- compression utilities.
- For normal use, the noise level at complexity 1 is between 1 and 2 dB higher
- than at complexity 10, but the CPU requirements for complexity 10 are about
- 5 times higher than for complexity 1.
- In practice, the best trade-off is between complexity 2 and 4, though higher
- settings are often useful when encoding non-speech sounds like DTMF
-\begin_inset LatexCommand \index{DTMF}
-
-\end_inset
-
- tones.
-\layout Subsection*
-
-Variable Bit-Rate
-\begin_inset LatexCommand \index{variable bit-rate}
-
+18
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- (VBR)
\layout Standard
-Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically
- to adapt to the
-\begin_inset Quotes eld
-\end_inset
-
-difficulty
-\begin_inset Quotes erd
-\end_inset
-
- of the audio being encoded.
- In the example of Speex, sounds like vowels and high-energy transients
- require a higher bit-rate to achieve good quality, while fricatives (e.g.
- s,f sounds) can be coded adequately with fewer bits.
- For this reason, VBR can achieve a lower bit-rate for the same quality, or
- a better quality for a certain bit-rate.
- Despite its advantages, VBR has two main drawbacks: first, by only specifying
- quality, there's no guarantee about the final average bit-rate.
- Second, for some real-time applications like voice over IP (VoIP), what
- counts is the maximum bit-rate, which must be low enough for the communication
- channel.
-\layout Subsection*
-
-Average Bit-Rate
-\begin_inset LatexCommand \index{average bit-rate}
-
+18
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- (ABR)
-\layout Standard
-
-Average bit-rate solves one of the problems of VBR, as it dynamically adjusts
- VBR quality in order to meet a specific target bit-rate.
- Because the quality/bit-rate is adjusted in real-time (open-loop), the
- global quality will be slightly lower than that obtained by encoding in
- VBR with exactly the right quality setting to meet the target average bit-rate.
-\layout Subsection*
-
-Voice Activity Detection
-\begin_inset LatexCommand \index{voice activity detection}
-
+\layout Standard
+
+18
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- (VAD)
\layout Standard
-When enabled, voice activity detection detects whether the audio being encoded
- is speech or silence/background noise.
- VAD is always implicitly activated when encoding in VBR, so the option
- is only useful in non-VBR operation.
- In this case, Speex detects non-speech periods and encodes them with just
- enough bits to reproduce the background noise.
- This is called
-\begin_inset Quotes eld
+18
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-comfort noise generation
-\begin_inset Quotes erd
-\end_inset
+\layout Standard
- (CNG).
-\layout Subsection*
+30
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-Discontinuous Transmission
-\begin_inset LatexCommand \index{discontinuous transmission}
+\layout Standard
+30
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
- (DTX)
\layout Standard
-Discontinuous transmission is an addition to VAD operation that makes it
- possible to stop transmitting completely when the background noise is stationary.
- In file-based operation, since we cannot just stop writing to the file,
- only 5 bits are used for such frames (corresponding to 250 bps).
-\layout Subsection*
+30
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-Perceptual enhancement
-\begin_inset LatexCommand \index{perceptual enhancement}
+\layout Standard
+18
\end_inset
-
+</cell>
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
\layout Standard
-Perceptual enhancement is a part of the decoder which, when turned on, tries
- to reduce (the perception of) the noise produced by the coding/decoding
- process.
- In most cases, perceptual enhancement makes the sound further from the original
-
-\emph on
-objectively
-\emph default
- (if you use SNR), but in the end it still
-\emph on
-sounds
-\emph default
- better (subjective improvement).
-\layout Subsection*
+OL pitch
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-Algorithmic delay
-\begin_inset LatexCommand \index{algorithmic delay}
+\layout Standard
+frame
\end_inset
-
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
\layout Standard
-Every speech codec introduces a delay in the transmission.
- For Speex, this delay is equal to the frame size, plus some amount of
-\begin_inset Quotes eld
+0
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-look-ahead
-\begin_inset Quotes erd
+\layout Standard
+
+7
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- required to process each frame.
- In narrowband operation (8 kHz), the delay is 30 ms, while for wideband
- (16 kHz), the delay is 34 ms.
- These values don't account for the CPU time it takes to encode or decode
- the frames.
-\layout Section
-\pagebreak_top
-Command-line encoder/decoder
-\begin_inset LatexCommand \label{sec:Command-line-encoder/decoder}
+\layout Standard
+7
\end_inset
-
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
\layout Standard
-The base Speex distribution includes a command-line encoder (
-\emph on
-speexenc
-\emph default
-) and decoder (
-\emph on
-speexdec
-\emph default
-).
- This section describes how to use these tools.
-\layout Subsection
-
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-\emph on
-speexenc
-\begin_inset LatexCommand \index{speexenc}
+\layout Standard
+0
\end_inset
-
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
\layout Standard
-The
-\emph on
-speexenc
-\emph default
- utility is used to create Speex files from raw PCM or wave files.
- It can be used by calling:
-\layout LyX-Code
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-speexenc [options] input_file output_file
\layout Standard
-The value '-' for input_file or output_file corresponds respectively to
- stdin and stdout.
- The valid options are:
-\layout Description
-
---narrowband\SpecialChar ~
-(-n) Tell Speex to treat the input as narrowband (8 kHz).
- This is the default
-\layout Description
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
---wideband\SpecialChar ~
-(-w) Tell Speex to treat the input as wideband (16 kHz)
-\layout Description
+\layout Standard
---ultra-wideband\SpecialChar ~
-(-u) Tell Speex to treat the input as
-\begin_inset Quotes eld
+0
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-ultra-wideband
-\begin_inset Quotes erd
+\layout Standard
+
+7
\end_inset
+</cell>
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- (32 kHz)
-\layout Description
+\layout Standard
---quality\SpecialChar ~
-n Set the encoding quality (0-10), default is 8
-\layout Description
+OL pitch gain
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
---bitrate\SpecialChar ~
-n Encoding bit-rate (use bit-rate n or lower)
-\layout Description
+\layout Standard
---vbr Enable VBR (Variable Bit-Rate), disabled by default
-\layout Description
+frame
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
---abr\SpecialChar ~
-n Enable ABR (Average Bit-Rate) at n kbps, disabled by default
-\layout Description
+\layout Standard
---vad Enable VAD (Voice Activity Detection), disabled by default
-\layout Description
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
---dtx Enable DTX (Discontinuous Transmission), disabled by default
-\layout Description
+\layout Standard
---nframes\SpecialChar ~
-n Pack n frames in each Ogg packet (this saves space at low bit-rates)
-\layout Description
+4
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
---comp\SpecialChar ~
-n Set encoding speed/quality tradeoff.
- The higher the value of n, the slower the encoding (default is 3)
-\layout Description
+\layout Standard
--V Verbose operation, print bit-rate currently in use
-\layout Description
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
---help\SpecialChar ~
-(-h) Print the help
-\layout Description
+\layout Standard
---version\SpecialChar ~
-(-v) Print version information
-\layout Subsubsection*
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-Speex comments
-\layout Description
+\layout Standard
---comment Add the given string as an extra comment.
- This may be used multiple times.
-
-\layout Description
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
---author Author of this track.
-
-\layout Description
+\layout Standard
---title Title for this track.
-
-\layout Subsubsection*
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-Raw input options
-\layout Description
+\layout Standard
---rate\SpecialChar ~
-n Sampling rate for raw input
-\layout Description
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
---stereo Consider raw input as stereo
-\layout Description
+\layout Standard
---le Raw input is little-endian
-\layout Description
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
---be Raw input is big-endian
-\layout Description
+\layout Standard
---8bit Raw input is 8-bit unsigned
-\layout Description
+4
+\end_inset
+</cell>
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
---16bit Raw input is 16-bit signed
-\layout Subsection
+\layout Standard
+OL Exc gain
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-\emph on
-speexdec
-\begin_inset LatexCommand \index{speexdec}
+\layout Standard
+frame
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
\layout Standard
-The
-\emph on
-speexdec
-\emph default
- utility is used to decode Speex files and can be used by calling:
-\layout LyX-Code
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-speexdec [options] speex_file [output_file]
\layout Standard
-The value '-' for input_file or output_file corresponds respectively to
- stdin and stdout.
- Also, when no output_file is specified, the file is played to the soundcard.
- The valid options are:
-\layout Description
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
---enh enable post-filter (default)
-\layout Description
+\layout Standard
---no-enh disable post-filter
-\layout Description
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
---force-nb Force decoding in narrowband
-\layout Description
+\layout Standard
---force-wb Force decoding in wideband
-\layout Description
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
---force-uwb Force decoding in ultra-wideband
-\layout Description
+\layout Standard
---mono Force decoding in mono
-\layout Description
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
---stereo Force decoding in stereo
-\layout Description
+\layout Standard
---rate\SpecialChar ~
-n Force decoding at n Hz sampling rate
-\layout Description
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
---packet-loss\SpecialChar ~
-n Simulate n % random packet loss
-\layout Description
+\layout Standard
--V Verbose operation, print bit-rate currently in use
-\layout Description
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
---help\SpecialChar ~
-(-h) Print the help
-\layout Description
+\layout Standard
---version\SpecialChar ~
-(-v) Print version information
-\layout Section
-\pagebreak_top
-Programming with Speex (the libspeex
-\begin_inset LatexCommand \index{libspeex}
+5
+\end_inset
+</cell>
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+\layout Standard
+
+Fine pitch
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- API
-\begin_inset LatexCommand \index{API}
+\layout Standard
+sub-frame
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-)
-\layout Subsection
+\layout Standard
+
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-Encoding
\layout Standard
-In order to encode speech using Speex, you first need to:
-\layout LyX-Code
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-#include <speex.h>
\layout Standard
-You then need to declare a Speex bit-packing struct
-\layout LyX-Code
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-SpeexBits bits;
\layout Standard
-and a Speex encoder state
-\layout LyX-Code
+7
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-void *enc_state;
\layout Standard
-The two are initialized by:
-\layout LyX-Code
+7
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-speex_bits_init(&bits);
-\layout LyX-Code
+\layout Standard
+
+7
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-enc_state = speex_encoder_init(&speex_nb_mode);
\layout Standard
-For wideband coding,
-\emph on
-speex_nb_mode
-\emph default
- will be replaced by
-\emph on
-speex_wb_mode
-\emph default
-.
- In most cases, you will need to know the frame size used by the mode you
- are using.
- You can get that value in the
-\emph on
-frame_size
-\emph default
- variable with:
-\layout LyX-Code
+7
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-speex_encoder_ctl(enc_state,SPEEX_GET_FRAME_SIZE,&frame_size);
\layout Standard
-Once the initialization is done, for every input frame:
-\layout LyX-Code
+7
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-speex_bits_reset(&bits);
-\layout LyX-Code
+\layout Standard
-speex_encode(enc_state, input_frame, &bits);
-\layout LyX-Code
+0
+\end_inset
+</cell>
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);
\layout Standard
-where
-\emph on
-input_frame
-\emph default
- is a
-\emph on
-(float *)
-\emph default
- pointing to the beginning of a speech frame,
-\emph on
-byte_ptr
-\emph default
- is a
-\emph on
-(char *)
-\emph default
- where the encoded frame will be written,
-\emph on
-MAX_NB_BYTES
-\emph default
- is the maximum number of bytes that can be written to
-\emph on
-byte_ptr
-\emph default
- without causing an overflow and
-\emph on
-nbBytes
-\emph default
- is the number of bytes actually written to
-\emph on
-byte_ptr
-\emph default
- (the encoded size in bytes).
- Before calling speex_bits_write, it is possible to find the number of bytes
- that need to be written by calling
-\family typewriter
-speex_bits_nbytes(&bits)
-\family default
-, which returns a number of bytes.
-
-\layout Standard
+Pitch gain
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-After you're done with the encoding, free all resources with:
-\layout LyX-Code
+\layout Standard
-speex_bits_destroy(&bits);
-\layout LyX-Code
+sub-frame
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-speex_encoder_destroy(enc_state);
\layout Standard
-That's about it for the encoder.
-
-\layout Subsection
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-Decoding
\layout Standard
-In order to decode speech using Speex, you first need to:
-\layout LyX-Code
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-#include <speex.h>
\layout Standard
-You also need to declare a Speex bit-packing struct
-\layout LyX-Code
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-SpeexBits bits;
\layout Standard
-and a Speex decoder state
-\layout LyX-Code
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-void *dec_state;
\layout Standard
-The two are initialized by:
-\layout LyX-Code
-
-speex_bits_init(&bits);
-\layout LyX-Code
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-dec_state = speex_decoder_init(&speex_nb_mode);
\layout Standard
-For wideband decoding,
-\emph on
-speex_nb_mode
-\emph default
- will be replaced by
-\emph on
-speex_wb_mode
-\emph default
-.
- If you need to obtain the size of the frames that will be used by the decoder,
- you can get that value in the
-\emph on
-frame_size
-\emph default
- variable with:
-\layout LyX-Code
+7
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);
\layout Standard
-There is also a parameter that can be set for the decoder: whether or not
- to use a perceptual post-filter.
- This can be set by:
-\layout LyX-Code
+7
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);
\layout Standard
-where
-\emph on
-enh
-\emph default
- is an int with value 0 to have the post-filter disabled and 1 to have
- it enabled.
-\layout Standard
+7
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-Again, once the decoder initialization is done, for every input frame:
-\layout LyX-Code
+\layout Standard
-speex_bits_read_from(&bits, input_bytes, nbBytes);
-\layout LyX-Code
+0
+\end_inset
+</cell>
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-speex_decode(dec_state, &bits, output_frame);
\layout Standard
-where input_bytes is a
-\emph on
-(char *)
-\emph default
- containing the bit-stream data received for a frame,
-\emph on
-nbBytes
-\emph default
- is the size (in bytes) of that bit-stream, and
-\emph on
-output_frame
-\emph default
- is a
-\emph on
-(float *)
-\emph default
- and points to the area where the decoded speech frame will be written.
- A NULL value as the second argument indicates that we don't have the bits
- for the current frame.
- When a frame is lost, the Speex decoder will do its best to "guess" the
- correct signal.
+Innovation gain
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
\layout Standard
-After you're done with the decoding, free all resources with:
-\layout LyX-Code
+sub-frame
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-speex_bits_destroy(&bits);
-\layout LyX-Code
+\layout Standard
-speex_decoder_destroy(dec_state);
-\layout Subsection
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-Codec Options (speex_*_ctl)
\layout Standard
-The Speex encoder and decoder support many options and requests that can
- be accessed through the
-\emph on
-speex_encoder_ctl
-\emph default
- and
-\emph on
-speex_decoder_ctl
-\emph default
- functions.
- These functions are similar to the
-\emph on
-ioctl
-\emph default
- system call and their prototypes are:
-\layout LyX-Code
-
-void speex_encoder_ctl(void *encoder, int request, void *ptr);
-\layout LyX-Code
+1
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-void speex_decoder_ctl(void *encoder, int request, void *ptr);
\layout Standard
-The different values of request allowed are (note that some only apply to
- the encoder or the decoder):
-\layout Description
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-SPEEX_SET_ENH** Set perceptual enhancer
-\begin_inset LatexCommand \index{perceptual enhancement}
+\layout Standard
+1
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- to on (1) or off (0) (integer)
-\layout Description
+\layout Standard
-SPEEX_GET_ENH** Get perceptual enhancer status (integer)
-\layout Description
+1
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-SPEEX_GET_FRAME_SIZE Get the frame size used for the current mode (integer)
-\layout Description
+\layout Standard
-SPEEX_SET_QUALITY* Set the encoder speech quality (integer 0 to 10)
-\layout Description
+3
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-SPEEX_GET_QUALITY* Get the current encoder speech quality (integer 0 to
- 10)
-\layout Description
+\layout Standard
-SPEEX_SET_MODE*
-\begin_inset Formula $\dagger $
+3
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+\layout Standard
-\layout Description
-
-SPEEX_GET_MODE*
-\begin_inset Formula $\dagger $
+3
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+\layout Standard
-\layout Description
-
-SPEEX_SET_LOW_MODE*
-\begin_inset Formula $\dagger $
+0
\end_inset
+</cell>
+</row>
+<row topline="true" bottomline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+\layout Standard
-\layout Description
-
-SPEEX_GET_LOW_MODE*
-\begin_inset Formula $\dagger $
+Innovation VQ
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+\layout Standard
-\layout Description
-
-SPEEX_SET_HIGH_MODE*
-\begin_inset Formula $\dagger $
+sub-frame
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+\layout Standard
-\layout Description
-
-SPEEX_GET_HIGH_MODE*
-\begin_inset Formula $\dagger $
+0
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+\layout Standard
-\layout Description
-
-SPEEX_SET_VBR* Set variable bit-rate (VBR) to on (1) or off (0) (integer)
-\layout Description
+0
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-SPEEX_GET_VBR* Get variable bit-rate
-\begin_inset LatexCommand \index{variable bit-rate}
+\layout Standard
+16
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- (VBR) status (integer)
-\layout Description
+\layout Standard
-SPEEX_SET_VBR_QUALITY* Set the encoder VBR speech quality (float 0 to 10)
-\layout Description
+20
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-SPEEX_GET_VBR_QUALITY* Get the current encoder VBR speech quality (float
- 0 to 10)
-\layout Description
+\layout Standard
-SPEEX_SET_COMPLEXITY* Set the CPU resources allowed for the encoder (integer
- 1 to 10)
-\layout Description
+35
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-SPEEX_GET_COMPLEXITY* Get the CPU resources allowed for the encoder (integer
- 1 to 10)
-\layout Description
+\layout Standard
-SPEEX_SET_BITRATE* Set the bit-rate to use to the closest value not exceeding
- the parameter (integer in bps)
-\layout Description
+48
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-SPEEX_GET_BITRATE Get the current bit-rate in use (integer in bps)
-\layout Description
+\layout Standard
-SPEEX_SET_SAMPLING_RATE Set real sampling rate (integer in Hz)
-\layout Description
+64
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-SPEEX_GET_SAMPLING_RATE Get real sampling rate (integer in Hz)
-\layout Description
+\layout Standard
-SPEEX_RESET_STATE Reset the encoder/decoder state to its original state
- (zeros all memories)
-\layout Description
+96
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-SPEEX_SET_VAD* Set voice activity detection
-\begin_inset LatexCommand \index{voice activity detection}
+\layout Standard
+10
\end_inset
+</cell>
+</row>
+<row topline="true" bottomline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- (VAD) to on (1) or off (0) (integer)
-\layout Description
+\layout Standard
-SPEEX_GET_VAD* Get voice activity detection (VAD) status (integer)
-\layout Description
+Total
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-SPEEX_SET_DTX* Set discontinuous transmission
-\begin_inset LatexCommand \index{discontinuous transmission}
+\layout Standard
+frame
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- (DTX) to on (1) or off (0) (integer)
-\layout Description
+\layout Standard
-SPEEX_GET_DTX* Get discontinuous transmission (DTX) status (integer)
-\layout Description
+5
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-SPEEX_SET_ABR* Set average bit-rate
-\begin_inset LatexCommand \index{average bit-rate}
+\layout Standard
+43
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- (ABR) to a value n in bits per second (integer in bps)
-\layout Description
+\layout Standard
-SPEEX_GET_ABR* Get average bit-rate (ABR) setting (integer in bps)
-\layout Description
+119
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
-* applies only to the encoder
-\layout Description
+\layout Standard
-** applies only to the decoder
-\layout Description
+160
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+\layout Standard
-\begin_inset Formula $\dagger $
+220
\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
- normally only used internally
-\layout Subsection
-
-Mode queries
\layout Standard
-Speex modes have a query system similar to the speex_encoder_ctl and speex_deco
-der_ctl calls.
- Since modes are read-only, it is only possible to get information about
- a particular mode.
- The function used to do that is:
-\layout LyX-Code
+300
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-void speex_mode_query(SpeexMode *mode, int request, void *ptr);
\layout Standard
-The admissible values for request are (unless otherwise noted, the values
- are returned through
-\emph on
-ptr
-\emph default
-):
-\layout Description
+364
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
-SPEEX_MODE_FRAME_SIZE Get the frame size (in samples) for the mode
-\layout Description
+492
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
-SPEEX_SUBMODE_BITRATE Get the bit-rate for a submode number specified through
-
-\emph on
-ptr
-\emph default
- (integer in bps).
-
-\layout Subsection
+\layout Standard
-Packing and in-band signalling
-\begin_inset LatexCommand \index{in-band signalling}
+79
+\end_inset
+</cell>
+</row>
+</lyxtabular>
\end_inset
-\layout Standard
+\layout Caption
+
+Bit allocation for narrowband modes
+\begin_inset LatexCommand \label{cap:bits-narrowband}
-Sometimes it is desirable to pack more than one frame per packet (or other
- basic unit of storage).
- The proper way to do it is to call speex_encode
-\begin_inset Formula $N$
\end_inset
- times before writing the stream with speex_bits_write.
- In cases where the number of frames is not determined by an out-of-band
- mechanism, it is possible to include a terminator code.
- That terminator consists of the code 15 (decimal) encoded with 5 bits,
- as shown in figure
-\begin_inset LatexCommand \ref{cap:quality_vs_bps}
\end_inset
-.
-
+
\layout Standard
-It is also possible to send in-band
-\begin_inset Quotes eld
-\end_inset
+So far, no MOS (Mean Opinion Score
+\begin_inset LatexCommand \index{mean opinion score}
-messages
-\begin_inset Quotes erd
\end_inset
- to the other side.
- All these messages are encoded as a
-\begin_inset Quotes eld
-\end_inset
+) subjective evaluation has been performed for Speex.
+ In order to give an idea of the quality achievable with it, table
+\begin_inset LatexCommand \ref{cap:quality_vs_bps}
-pseudo-frame
-\begin_inset Quotes erd
\end_inset
- of mode 14 which contain a 4-bit message type code, followed by the message.
- Table
-\begin_inset LatexCommand \ref{cap:In-band-signalling-codes}
+ presents my own subjective opinion on it.
+ It should be noted that different people will perceive the quality differently
+ and that the person that designed the codec often has a bias (one way or
+ another) when it comes to subjective evaluation.
+ Finally, for most codecs (including Speex), encoding
+ quality sometimes varies depending on the input.
+ Note that the complexity is only approximate (within 0.5 mflops and using
+ the lowest complexity setting).
+ Decoding requires approximately 0.5 mflops
+\begin_inset LatexCommand \index{complexity}
\end_inset
- lists the available codes, their meaning and the size of the message that
- follow.
- Most of these messages are requests that are sent to the encoder or decoder
- on the other end, which is free to comply or ignore them.
- By default, all in-band messages are ignored.
+ in most modes (1 mflops with perceptual enhancement).
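As a rough cross-check, the per-frame totals in the narrowband bit-allocation table can be converted to the bit-rates listed in the quality table that follows. The sketch below assumes 20 ms narrowband frames (50 frames per second), which this section does not state explicitly. The function name is hypothetical and is not part of the Speex API.

```c
#include <assert.h>

/* Assumption: one narrowband frame covers 20 ms, i.e. 50 frames per second. */
#define FRAMES_PER_SECOND 50

/* Convert a per-frame bit count (the "Total" row of the bit-allocation
 * table) into a bit-rate in bits per second. Hypothetical helper. */
int bits_per_frame_to_bps(int bits_per_frame)
{
    assert(bits_per_frame >= 0);
    return bits_per_frame * FRAMES_PER_SECOND;
}
```

Under that assumption, a total of 43 bits per frame gives 2,150 bps and 492 bits per frame gives 24,600 bps, which agrees with the bit-rate column of the quality table.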
\layout Standard
\begin_inset Float table
-placement htbp
-wide false
+placement h
+wide true
collapsed false
\layout Standard
\begin_inset Tabular
-<lyxtabular version="3" rows="17" columns="3">
+<lyxtabular version="3" rows="17" columns="4">
<features>
<column alignment="center" valignment="top" leftline="true" width="0pt">
<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
<column alignment="center" valignment="top" leftline="true" rightline="true" width="0pt">
<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -4213,7 +4091,7 @@
\layout Standard
-code
+Mode
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -4221,7 +4099,25 @@
\layout Standard
-Size (bits)
+Bit-rate
+\begin_inset LatexCommand \index{bit-rate}
+
+\end_inset
+
+ (bps)
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+mflops
+\begin_inset LatexCommand \index{complexity}
+
+\end_inset
+
+
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4229,7 +4125,7 @@
\layout Standard
-Content
+Quality/description
\end_inset
</cell>
</row>
@@ -4247,7 +4143,15 @@
\layout Standard
-1
+250
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+N/A
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4255,7 +4159,7 @@
\layout Standard
-Asks decoder to set perceptual enhancement off (0) or on(1)
+No sound (VBR only)
\end_inset
</cell>
</row>
@@ -4273,7 +4177,15 @@
\layout Standard
-1
+2,150
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+6
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4281,7 +4193,7 @@
\layout Standard
-reserved
+Vocoder (mostly for comfort noise)
\end_inset
</cell>
</row>
@@ -4299,7 +4211,15 @@
\layout Standard
-4
+5,950
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+9
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4307,7 +4227,7 @@
\layout Standard
-Asks encoder to switch to mode N
+Very noticeable artifacts/noise, good intelligibility
\end_inset
</cell>
</row>
@@ -4325,7 +4245,15 @@
\layout Standard
-4
+8,000
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+10
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4333,7 +4261,7 @@
\layout Standard
-Asks encoder to switch to mode N for low-band
+Artifacts/noise sometimes noticeable
\end_inset
</cell>
</row>
@@ -4351,7 +4279,15 @@
\layout Standard
-4
+11,000
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+14
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4359,7 +4295,7 @@
\layout Standard
-Asks encoder to switch to mode N for high-band
+Artifacts usually noticeable only with headphones
\end_inset
</cell>
</row>
@@ -4377,7 +4313,15 @@
\layout Standard
-4
+15,000
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+11
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4385,7 +4329,7 @@
\layout Standard
-Asks encoder to switch to quality N for VBR
+Need good headphones to tell the difference
\end_inset
</cell>
</row>
@@ -4403,7 +4347,15 @@
\layout Standard
-4
+18,200
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+17.5
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4411,7 +4363,7 @@
\layout Standard
-Request acknowledge (0=no, 1=all, 2=only for in-band data)
+Hard to tell the difference even with good headphones
\end_inset
</cell>
</row>
@@ -4429,7 +4381,15 @@
\layout Standard
-4
+24,600
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+14.5
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4437,7 +4397,7 @@
\layout Standard
-Asks encoder to set VBR off (0), on(1), VAD(2), DTX(3)
+Completely transparent for voice, good quality music
\end_inset
</cell>
</row>
@@ -4455,7 +4415,15 @@
\layout Standard
-8
+3,950
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+-
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4463,7 +4431,7 @@
\layout Standard
-Transmit (8-bit) character to the other end
+Very noticeable artifacts/noise, good intelligibility
\end_inset
</cell>
</row>
@@ -4481,7 +4449,15 @@
\layout Standard
-8
+N/A
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+N/A
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4489,7 +4465,7 @@
\layout Standard
-Intensity stereo information
+reserved
\end_inset
</cell>
</row>
@@ -4507,7 +4483,15 @@
\layout Standard
-16
+N/A
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+N/A
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4515,7 +4499,7 @@
\layout Standard
-Announce maximum bit-rate acceptable (N in bytes/second)
+reserved
\end_inset
</cell>
</row>
@@ -4533,7 +4517,15 @@
\layout Standard
-16
+N/A
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+N/A
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4559,7 +4551,15 @@
\layout Standard
-32
+N/A
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+N/A
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4567,7 +4567,7 @@
\layout Standard
-Acknowledge receiving packet N
+reserved
\end_inset
</cell>
</row>
@@ -4585,7 +4585,15 @@
\layout Standard
-32
+N/A
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+N/A
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4593,7 +4601,7 @@
\layout Standard
-reserved
+Application-defined, interpreted by callback or skipped
\end_inset
</cell>
</row>
@@ -4611,7 +4619,15 @@
\layout Standard
-64
+N/A
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+N/A
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4619,7 +4635,7 @@
\layout Standard
-reserved
+Speex in-band signaling
\end_inset
</cell>
</row>
@@ -4637,7 +4653,15 @@
\layout Standard
-64
+N/A
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+N/A
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
@@ -4645,7 +4669,7 @@
\layout Standard
-reserved
+Terminator code
\end_inset
</cell>
</row>
@@ -4656,8 +4680,8 @@
\layout Caption
-In-band signalling codes
-\begin_inset LatexCommand \label{cap:In-band-signalling-codes}
+Quality versus bit-rate
+\begin_inset LatexCommand \label{cap:quality_vs_bps}
\end_inset
@@ -4665,193 +4689,156 @@
\end_inset
-\layout Standard
+\layout Subsection
-Finally, applications may define custom in-band messages using mode 13.
- The size of the message in bytes is encoded with 5 bits, so that the decoder
- can skip it if it doesn't know how to interpret it.
-\layout Section
-\pagebreak_top
-Formats and standards
-\begin_inset LatexCommand \index{standards}
+Perceptual enhancement
+\begin_inset LatexCommand \index{perceptual enhancement}
\end_inset
\layout Standard
-Speex can encode speech in both narrowband and wideband and provides different
- bit-rates.
- However not all features must be supported by a certain implementation
- or device.
- In order to be said
-\begin_inset Quotes eld
+This part of the codec only applies to the decoder and can even be changed
+ without affecting inter-operability.
+ For that reason, the implementation provided and described here should
+ only be considered as a reference implementation.
+ The enhancement system is divided in two parts.
+ First, the synthesis filter
+\begin_inset Formula $S(z)=1/A(z)$
\end_inset
-Speex compatible
-\begin_inset Quotes erd
+ is replaced by an enhanced filter
+\begin_inset Formula \[
+S'(z)=\frac{A\left(z/a_{2}\right)A\left(z/a_{3}\right)}{A\left(z\right)A\left(z/a_{1}\right)}\]
+
\end_inset
- (whatever that means), an implementation must implement at least a basic
- set of features.
-\layout Standard
+where
+\begin_inset Formula $a_{1}$
+\end_inset
-At the minimum, all narrowband modes of operation MUST be supported at the
- decoder.
- This includes the decoding of a wideband bit-stream by the narrowband decoder
-\begin_inset Foot
-collapsed true
+ and
+\begin_inset Formula $a_{2}$
+\end_inset
-\layout Standard
+ depend on the mode in use and
+\begin_inset Formula $a_{3}=\frac{1}{r}\left(1-\frac{1-ra_{1}}{1-ra_{2}}\right)$
+\end_inset
-The wideband bit-stream contains an embedded narrowband bit-stream which
- can be decoded alone
+ with
+\begin_inset Formula $r=.9$
\end_inset
.
- If present, a wideband decoder MUST be able to decode a narrowband stream,
- and MAY either be able to decode all wideband modes or be able to decode
- the embedded narrowband part of all modes (which includes ignoring the
- high-band bits).
-\layout Standard
+ The second part of the enhancement consists of using a comb filter to enhance
+ the pitch in the excitation domain.
+
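The relation between the three enhancement parameters can be written out directly. The following is only a sketch of the formula for a3 given above. The function name is hypothetical, and the values of a1 and a2 actually used by each mode are not given in this section, so any values passed in are illustrative.

```c
#include <assert.h>

/* Compute a3 = (1/r) * (1 - (1 - r*a1) / (1 - r*a2)).
 * The text uses r = 0.9; a1 and a2 depend on the mode in use.
 * Hypothetical helper, not part of the Speex API. */
double enhancer_a3(double a1, double a2, double r)
{
    /* Guard against division by zero in either quotient. */
    assert(r != 0.0);
    assert(r * a2 != 1.0);
    return (1.0 - (1.0 - r * a1) / (1.0 - r * a2)) / r;
}
```

One property that follows from the formula: when a1 equals a2 the result is 0, so A(z/a3) reduces to 1 and the remaining numerator and denominator terms cancel, leaving the plain synthesis filter 1/A(z).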
+\layout Section
+\pagebreak_top
+Speex wideband mode (sub-band CELP)
+\begin_inset LatexCommand \index{wideband}
-For encoders, at least one narrowband or wideband mode MUST be supported.
- The main reason why all encoding modes do not have to be supported is that
- some platforms may not be able to handle the complexity of encoding in
- some modes.
-\layout Subsection
+\end_inset
-RTP
-\begin_inset LatexCommand \index{RTP}
+
+\layout Standard
+
+For wideband, the Speex approach uses a
+\emph on
+q
+\emph default
+uadrature
+\emph on
+m
+\emph default
+irror
+\emph on
+f
+\emph default
+ilter
+\begin_inset LatexCommand \index{quadrature mirror filter}
\end_inset
- Payload Format
-\layout Standard
+ (QMF) to split the band in two.
+ The 16 kHz signal is thus divided into two 8 kHz signals, one representing
+ the low band (0-4 kHz), the other the high band (4-8 kHz).
+ The low band is encoded with the narrowband mode described in section
+\begin_inset LatexCommand \ref{sec:Speex-narrowband-mode}
-The latest RTP payload draft can be found at
-\begin_inset LatexCommand \url{http://www.speex.org/drafts/latest}
+\end_inset
+ in such a way that the resulting
+\begin_inset Quotes eld
\end_inset
-.
- We are (2003/01/14) about to send the latest draft to the IETF for comments.
-
+embedded narrowband bit-stream
+\begin_inset Quotes erd
+\end_inset
+
+ can also be decoded with the narrowband decoder.
+ Since the low band encoding has already been described, only the high band
+ encoding is described in this section.
\layout Subsection
-MIME Type
+Linear Prediction
\layout Standard
-Speex will use the MIME type
-\family typewriter
-audio/speex
-\family default
-.
- We will apply for that type in the near future.
+The linear prediction part used for the high-band is very similar to what
+ is done for narrowband.
+ The only difference is that we use only 12 bits to encode the high-band
+ LSP's using a multi-stage vector quantizer (MSVQ).
+ The first level quantizes the 10 coefficients with 6 bits and the error
+ is then quantized using 6 bits too.
\layout Subsection
-Ogg
-\begin_inset LatexCommand \index{Ogg}
-
-\end_inset
-
- file format
+Pitch Prediction
\layout Standard
-Speex bit-streams can be stored in Ogg files.
- In this case, the first packet of the Ogg file contains the Speex header
- described in table
-\begin_inset LatexCommand \ref{cap:ogg_speex_header}
+That part is easy: there's no pitch prediction for the high-band.
+ There are two reasons for that.
+ First, there is usually little harmonic structure in this band (above 4
+ kHz).
+ Second, it would be very hard to implement since the QMF folds the 4-8
+ kHz band into 4-0 kHz (reversing the frequency axis), which means that
+ the harmonics are no longer located at multiples of the fundamental
+ (pitch).
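The second reason can be illustrated with a little arithmetic. The sketch below assumes the folding described above maps a frequency f in the 4-8 kHz band to 8000 - f. Both function names and the 300 Hz pitch used in the note afterwards are hypothetical choices for illustration.

```c
#include <assert.h>

/* Frequency (in Hz) at which a 4-8 kHz component appears after the QMF
 * folds the high band into 4-0 kHz, reversing the frequency axis. */
int folded_frequency_hz(int f_hz)
{
    assert(f_hz >= 4000 && f_hz <= 8000);
    return 8000 - f_hz;
}

/* Return 1 if f_hz is an exact multiple of the pitch, 0 otherwise. */
int is_harmonic_of(int f_hz, int pitch_hz)
{
    assert(pitch_hz > 0);
    return f_hz % pitch_hz == 0;
}
```

For a 300 Hz pitch, the harmonic at 4200 Hz lands at 3800 Hz after folding, which is not a multiple of 300 Hz. Harmonics stay on the pitch grid only in the special case where the pitch divides 8000 Hz evenly.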
+\layout Subsection
-\end_inset
+Excitation Quantization
+\layout Standard
-.
- All integer fields in the headers are stored as little-endian.
- The
-\family typewriter
-speex_string
-\family default
- field must contain the
-\begin_inset Quotes eld
-\end_inset
+The high-band excitation is coded in the same way as for narrowband.
+
+\layout Subsection
+Bit allocation
+\layout Standard
-\family typewriter
-Speex
-\family default
-\SpecialChar ~
-\SpecialChar ~
-\SpecialChar ~
+For the wideband mode, the entire narrowband frame is packed before the high-band
+ is encoded.
+ The narrowband part of the bit-stream is as defined in table
+\begin_inset LatexCommand \ref{cap:bits-narrowband}
-\begin_inset Quotes eld
\end_inset
- (with 3 trailing spaces), which identifies the bit-stream.
- The next field,
-\family typewriter
-speex_version
-\family default
- contains the version of Speex that encoded the file.
- For now, refer to speex_header.[ch] for more info.
- The
-\emph on
-beginning of stream
-\emph default
- (
-\family typewriter
-b_o_s
-\family default
-) flag is set to 1 for the header.
- The header packet has
-\family typewriter
-packetno=0
-\family default
- and
-\family typewriter
-granulepos=0
-\family default
-.
-\layout Standard
-
-The second packet contains the Speex comment header.
- The format used is the Vorbis comment format described here: http://www.xiph.org/
-ogg/vorbis/doc/v-comment.html .
- This packet has
-\family typewriter
-packetno=1
-\family default
- and
-\family typewriter
-granulepos=0
-\family default
.
-\layout Standard
+ The high-band follows, as described in table
+\begin_inset LatexCommand \ref{cap:bits-wideband}
-The third and subsequent packets each contain one or more (number found
- in header) Speex frames.
- These are identified with
-\family typewriter
-packetno
-\family default
- starting from 2 and the
-\family typewriter
-granulepos
-\family default
- is the number of the last sample encoded in that packet.
- The last of these packets has the
-\emph on
-end of stream
-\emph default
- (
-\family typewriter
-e_o_s
-\family default
-) flag set to 1.
+\end_inset
+
+.
+ This also means that a wideband frame may be correctly decoded by a narrowband
+ decoder with the only caveat that if more than one frame is packed in the
+ same packet, the decoder will need to skip the high-band parts in order
+ to sync with the bit-stream.
\layout Standard
\begin_inset Float table
-placement htbp
+placement h
wide true
collapsed false
@@ -4859,10 +4846,14 @@
\begin_inset Tabular
-<lyxtabular version="3" rows="16" columns="3">
+<lyxtabular version="3" rows="7" columns="7">
<features>
<column alignment="center" valignment="top" leftline="true" width="0pt">
<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
+<column alignment="center" valignment="top" leftline="true" width="0pt">
<column alignment="center" valignment="top" leftline="true" rightline="true" width="0pt">
<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -4870,7 +4861,7 @@
\layout Standard
-Field
+Parameter
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -4878,25 +4869,23 @@
\layout Standard
-Type
+Update rate
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-Size
+0
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-speex_string
+1
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -4904,51 +4893,49 @@
\layout Standard
-char[]
+2
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-8
+3
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-speex_version
+4
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-char[]
+Wideband bit
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-20
+frame
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-speex_version_id
+1
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -4956,25 +4943,23 @@
\layout Standard
-int
+1
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+1
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-header_size
+1
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -4982,25 +4967,25 @@
\layout Standard
-int
+1
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+</row>
+<row topline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+Mode ID
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-rate
+frame
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -5008,25 +4993,23 @@
\layout Standard
-int
+3
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+3
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-mode
+3
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -5034,15 +5017,15 @@
\layout Standard
-int
+3
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+3
\end_inset
</cell>
</row>
@@ -5052,7 +5035,7 @@
\layout Standard
-mode_bitstream_version
+LSP
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -5060,25 +5043,23 @@
\layout Standard
-int
+frame
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+0
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-nb_channels
+12
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -5086,51 +5067,49 @@
\layout Standard
-int
+12
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+12
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-bitrate
+12
\end_inset
</cell>
+</row>
+<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-int
+Excitation gain
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+sub-frame
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-frame_size
+0
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -5138,10 +5117,10 @@
\layout Standard
-int
+5
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
@@ -5149,14 +5128,12 @@
4
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-vbr
+4
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -5164,25 +5141,25 @@
\layout Standard
-int
+4
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+</row>
+<row topline="true" bottomline="true">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+Excitation VQ
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-frames_per_packet
+sub-frame
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -5190,25 +5167,23 @@
\layout Standard
-int
+0
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+0
\end_inset
</cell>
-</row>
-<row topline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-extra_headers
+20
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -5216,25 +5191,25 @@
\layout Standard
-int
+40
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+80
\end_inset
</cell>
</row>
-<row topline="true">
+<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-reserved1
+Total
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -5242,10 +5217,10 @@
\layout Standard
-int
+frame
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
@@ -5253,14 +5228,12 @@
4
\end_inset
</cell>
-</row>
-<row topline="true" bottomline="true">
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-reserved2
+36
\end_inset
</cell>
<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
@@ -5268,15 +5241,23 @@
\layout Standard
-int
+112
\end_inset
</cell>
-<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
\begin_inset Text
\layout Standard
-4
+192
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\layout Standard
+
+352
\end_inset
</cell>
</row>
@@ -5287,12 +5268,25 @@
\layout Caption
-Ogg/Speex header packet
-\begin_inset LatexCommand \label{cap:ogg_speex_header}
+Bit allocation for high-band in wideband mode
+\begin_inset LatexCommand \label{cap:bits-wideband}
+
+\end_inset
+
\end_inset
+\layout Standard
+
+
+\begin_inset ERT
+status Open
+
+\layout Standard
+
+\backslash
+clearpage
\end_inset
@@ -5558,6 +5552,47 @@
This technique was invented at the University of Sherbrooke and is now
 one of the most widely used forms of CELP.
Unfortunately, since it is patented, it cannot be used in Speex.
+\layout Section
+\pagebreak_top
+Sample code
+\layout Subsection
+
+sampleenc.c
+\layout Standard
+
+
+\begin_inset Include \verbatiminput{sampleenc.c}
+
+\end_inset
+
+
+\layout Subsection
+
+sampledec.c
+\layout Standard
+
+
+\begin_inset Include \verbatiminput{sampledec.c}
+
+\end_inset
+
+
+\layout Section
+\pagebreak_top
+IETF RTP Profile
+\begin_inset LatexCommand \label{sec:IETF-draft}
+
+\end_inset
+
+
+\layout Standard
+
+
+\begin_inset Include \verbatiminput{draft-herlein-speex-rtp-profile-07.txt}
+
+\end_inset
+
+
\layout Section
\pagebreak_top
GNU Free Documentation License
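
As a cross-check of the "Bit allocation for high-band in wideband mode" table added to manual.lyx above, here is a minimal C sketch that recomputes the Total row from the per-parameter rows. It rests on two assumptions that the diff does not state outright: the column headings 0 to 4 are taken to be the high-band mode numbers, and a frame is taken to hold 4 sub-frames (this is the only value for which the per-sub-frame rows add up to the printed totals). The function names are hypothetical and are not part of libspeex.

```c
#include <stdio.h>

#define HB_NUM_MODES        5  /* columns 0..4 of the table */
#define SUBFRAMES_PER_FRAME 4  /* assumption: inferred from the Total row */

/* Per-frame parameters, copied from the table */
const int hb_wideband_bit[HB_NUM_MODES] = { 1,  1,  1,  1,  1 };
const int hb_mode_id[HB_NUM_MODES]      = { 3,  3,  3,  3,  3 };
const int hb_lsp[HB_NUM_MODES]          = { 0, 12, 12, 12, 12 };

/* Per-sub-frame parameters, copied from the table */
const int hb_exc_gain[HB_NUM_MODES]     = { 0,  5,  4,  4,  4 };
const int hb_exc_vq[HB_NUM_MODES]       = { 0,  0, 20, 40, 80 };

/* Total number of high-band bits per frame for a mode, or -1 if the
   mode is outside the table. */
int highband_frame_bits(int mode)
{
   if (mode < 0 || mode >= HB_NUM_MODES)
      return -1;
   return hb_wideband_bit[mode] + hb_mode_id[mode] + hb_lsp[mode]
        + SUBFRAMES_PER_FRAME * (hb_exc_gain[mode] + hb_exc_vq[mode]);
}

/* Bits left to skip once the wideband bit and the mode ID have been
   read, which is what a narrowband decoder needs when several wideband
   frames share one packet. Returns -1 for an unknown mode. */
int highband_skip_bits(int mode)
{
   int total = highband_frame_bits(mode);
   if (total < 0)
      return -1;
   return total - hb_wideband_bit[mode] - hb_mode_id[mode];
}

/* Print the recomputed Total row */
void print_highband_totals(void)
{
   int mode;
   for (mode = 0; mode < HB_NUM_MODES; mode++)
      printf("mode %d: %d bits\n", mode, highband_frame_bits(mode));
}
```

With these assumptions the recomputed totals agree with the table (4, 36, 112, 192 and 352 bits), which supports the 4 sub-frame reading.
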
1.1 speex/doc/sampledec.c
Index: sampledec.c
===================================================================
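#include <speex.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of samples per narrowband frame */
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
   char *outFile;
   FILE *fout;
   /* Decoded frame as 16-bit samples, as written to the output file */
   short out[FRAME_SIZE];
   /* Decoded frame as floats, as produced by speex_decode() */
   float output[FRAME_SIZE];
   char cbits[200];
   int nbBytes;
   /* Decoder state */
   void *state;
   /* Bit-stream holder */
   SpeexBits bits;
   int i, tmp;

   /* Create a new decoder state in narrowband mode */
   state = speex_decoder_init(&speex_nb_mode);

   /* Turn on the perceptual enhancement */
   tmp=1;
   speex_decoder_ctl(state, SPEEX_SET_ENH, &tmp);

   outFile = argv[1];
   fout = fopen(outFile, "w");

   speex_bits_init(&bits);
   while (1)
   {
      /* Read the size of the next encoded frame, as written by sampleenc */
      fread(&nbBytes, sizeof(int), 1, stdin);
      fprintf (stderr, "nbBytes: %d\n", nbBytes);
      if (feof(stdin))
         break;

      /* Read the encoded frame itself */
      fread(cbits, 1, nbBytes, stdin);
      /* Copy the data into the bit-stream struct */
      speex_bits_read_from(&bits, cbits, nbBytes);

      /* Decode the frame */
      speex_decode(state, &bits, output);

      /* Convert from float to short before writing */
      for (i=0;i<FRAME_SIZE;i++)
         out[i]=output[i];
      fwrite(out, sizeof(short), FRAME_SIZE, fout);
   }

   /* Destroy the decoder state (the decoder call, not the encoder one) */
   speex_decoder_destroy(state);
   /* Destroy the bit-stream struct */
   speex_bits_destroy(&bits);
   fclose(fout);
   return 0;
}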
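
The decoder loop above trusts the length prefix it reads from stdin: a corrupt or oversized nbBytes would overrun cbits[200]. A minimal sketch of a bounds-checked reader for the same framing (one native int byte count followed by that many bytes) could look like the following. The names read_frame and MAX_FRAME_BYTES are hypothetical and are not part of the committed samples or of libspeex.

```c
#include <stdio.h>

#define MAX_FRAME_BYTES 200  /* same size as cbits[] in the samples */

/* Reads one length-prefixed frame from 'in' into 'buf' (capacity 'cap').
   Returns the payload size (>= 0) on success,
          -1 if no complete length field could be read (end of input),
          -2 if the length is negative, larger than 'cap', or the
             payload is truncated. */
int read_frame(FILE *in, char *buf, int cap)
{
   int n;

   /* The length is stored as a native int, exactly as sampleenc writes it */
   if (fread(&n, sizeof(int), 1, in) != 1)
      return -1;

   /* Reject sizes that would not fit in the caller's buffer */
   if (n < 0 || n > cap)
      return -2;

   /* Read the payload and make sure all of it arrived */
   if (fread(buf, 1, (size_t)n, in) != (size_t)n)
      return -2;

   return n;
}
```

In the decoder, the two fread() calls and the feof() test could then be replaced by a single call such as nbBytes = read_frame(stdin, cbits, 200), leaving the loop when the result is negative.
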
1.1 speex/doc/sampleenc.c
Index: sampleenc.c
===================================================================
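#include <speex.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of samples per narrowband frame */
#define FRAME_SIZE 160

int main(int argc, char **argv)
{
   char *inFile;
   FILE *fin;
   /* Input frame as 16-bit samples, as read from the input file */
   short in[FRAME_SIZE];
   /* Input frame as floats, as expected by speex_encode() */
   float input[FRAME_SIZE];
   char cbits[200];
   int nbBytes;
   /* Encoder state */
   void *state;
   /* Bit-stream holder */
   SpeexBits bits;
   int i, tmp;

   /* Create a new encoder state in narrowband mode */
   state = speex_encoder_init(&speex_nb_mode);

   /* Set the quality to 8 */
   tmp=8;
   speex_encoder_ctl(state, SPEEX_SET_QUALITY, &tmp);

   inFile = argv[1];
   fin = fopen(inFile, "r");

   speex_bits_init(&bits);
   while (1)
   {
      /* Read one frame of 16-bit samples */
      fread(in, sizeof(short), FRAME_SIZE, fin);
      if (feof(fin))
         break;

      /* Convert from short to float before encoding */
      for (i=0;i<FRAME_SIZE;i++)
         input[i]=in[i];

      /* Flush the bit-stream struct so a new frame can be encoded */
      speex_bits_reset(&bits);

      /* Encode the frame */
      speex_encode(state, input, &bits);
      /* Copy the bits to an array of char that can be written */
      nbBytes = speex_bits_write(&bits, cbits, 200);

      /* Write the size of the frame first, then the compressed data */
      fwrite(&nbBytes, sizeof(int), 1, stdout);
      fwrite(cbits, 1, nbBytes, stdout);
      speex_bits_rewind(&bits);
   }

   /* Destroy the encoder state */
   speex_encoder_destroy(state);
   /* Destroy the bit-stream struct */
   speex_bits_destroy(&bits);
   fclose(fin);
   return 0;
}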
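
One property of the encoder loop above is that it breaks as soon as feof(fin) is true, so when the input length is not a multiple of FRAME_SIZE samples the trailing partial frame is dropped. If that matters, one option is to zero-pad the last frame instead. The sketch below illustrates that idea only; read_padded_frame is a hypothetical helper, not something the committed sample or libspeex provides.

```c
#include <stdio.h>
#include <string.h>

/* Reads up to 'n' 16-bit samples from 'in' into 'frame' and fills any
   missing samples with zeros (silence). Returns the number of samples
   actually read, so 0 means there is nothing left to encode. */
size_t read_padded_frame(FILE *in, short *frame, size_t n)
{
   size_t got = fread(frame, sizeof(short), n, in);

   /* Pad a short final frame so the encoder always sees 'n' samples */
   if (got < n)
      memset(frame + got, 0, (n - got) * sizeof(short));

   return got;
}
```

The encoder loop would then call read_padded_frame(fin, in, FRAME_SIZE) and stop when it returns 0, rather than testing feof() after every read.
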