[xiph-cvs] cvs commit: speex/doc manual.lyx

Jean-Marc Valin jm at xiph.org
Tue Nov 12 20:03:04 PST 2002


jm          02/11/12 23:03:04

Modified:    doc      manual.lyx
Log:

Revision  Changes    Path
1.35      +87 -27    speex/doc/manual.lyx

Index: manual.lyx
===================================================================
RCS file: /usr/local/cvsroot/speex/doc/manual.lyx,v
retrieving revision 1.34
retrieving revision 1.35
diff -u -r1.34 -r1.35
--- manual.lyx	11 Nov 2002 06:05:21 -0000	1.34
+++ manual.lyx	13 Nov 2002 04:03:04 -0000	1.35
@@ -29,7 +29,7 @@

The Speex Codec Manual
\newline
-(draft for Speex 1.0beta3)
+(draft for Speex 1.0beta4)
\layout Author

Jean-Marc Valin
@@ -255,22 +255,22 @@

\begin_inset Formula \[
-y(n)=\sum _{i=1}^{N}a_{i}x(n-i)\]
+y[n]=\sum _{i=1}^{N}a_{i}x[n-i]\]

\end_inset

where
-\begin_inset Formula $y(n)$
+\begin_inset Formula $y[n]$
\end_inset

is the linear prediction of
-\begin_inset Formula $x(n)$
+\begin_inset Formula $x[n]$
\end_inset

.
The prediction error is thus given by:
\begin_inset Formula \[
-e(n)=x(n)-y(n)=x(n)-\sum _{i=1}^{N}a_{i}x(n-i)\]
+e[n]=x[n]-y[n]=x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\]

\end_inset

@@ -284,7 +284,7 @@

which minimize the quadratic error function:
\begin_inset Formula \[
-E=\sum _{n=0}^{L-1}\left[e(n)\right]^{2}=\sum _{n=0}^{L-1}\left[x(n)-\sum _{i=1}^{N}a_{i}x(n-i)\right]^{2}\]
+E=\sum _{n=0}^{L-1}\left[e[n]\right]^{2}=\sum _{n=0}^{L-1}\left[x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\right]^{2}\]

\end_inset

@@ -294,7 +294,7 @@

equal to zero:
\begin_inset Formula \[
-\frac{\partial E}{\partial a_{i}}=\frac{\partial }{\partial a_{i}}\sum _{n=0}^{L-1}\left[x(n)-\sum _{i=1}^{N}a_{i}x(n-i)\right]^{2}=0\]
+\frac{\partial E}{\partial a_{i}}=\frac{\partial }{\partial a_{i}}\sum _{n=0}^{L-1}\left[x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\right]^{2}=0\]

\end_inset

@@ -320,7 +320,7 @@
\end_inset

of the signal
-\begin_inset Formula $x(n)$
+\begin_inset Formula $x[n]$
\end_inset

.
@@ -328,7 +328,7 @@

\begin_inset Formula \[
-R(m)=\sum _{i=0}^{N-1}x(i)x(i-m)\]
+R(m)=\sum _{i=0}^{N-1}x[i]x[i-m]\]

\end_inset

@@ -411,7 +411,7 @@
The linear prediction model represents each speech sample as linear combination
of past samples, plus an error signal called the excitation (or residual).
\begin_inset Formula \[
-x(n)=\sum _{i=1}^{N}a_{i}x(n-i)+e(n)\]
+x[n]=\sum _{i=1}^{N}a_{i}x[n-i]+e[n]\]

\end_inset

@@ -459,6 +459,20 @@
\end_inset

as the synthesis filter.
+ The whole process is called short-term prediction as it predicts the signal
+
+\begin_inset Formula $x[n]$
+\end_inset
+
+ using only the
+\begin_inset Formula $N$
+\end_inset
+
+ past samples, where
+\begin_inset Formula $N$
+\end_inset
+
+ is usually around 10.
\layout Standard

Because LPC coefficients have very little robustness to quantization, they
@@ -490,7 +504,7 @@

\begin_inset Formula \[
-e(n)=\beta e(n-T)+c(n)\]
+e[n]=\beta e[n-T]+c[n]\]

\end_inset

@@ -552,17 +566,33 @@
\layout Standard

Most (if not all) modern audio codecs attempt to
-\emph on
+\begin_inset Quotes eld
+\end_inset
+
shape
-\emph default
- the noise so that it is the hardest to detect with the ear.
- That means that more noise can be tolerated in parts of the spectrum that
- are louder and
+\begin_inset Quotes erd
+\end_inset
+
+ the noise so that it appears mostly in the frequency regions where the
+ ear cannot detect it.
+ For example, the ear is more tolerant to noise in parts of the spectrum
+ that are louder and
\emph on
vice versa
\emph default
.
- That's why the error is minimized for the perceptually weighted signal
\begin_inset Formula \[
+E=\sum _{n}\left(x[n]-\overline{x}[n]\right)^{2}\]
+
+\end_inset
+
+where
+\begin_inset Formula $\overline{x}[n]$
+\end_inset
+
+ is the encoder signal, we minimize the error for the perceptually weighted
+ signal
\begin_inset Formula $X_{w}(z)=W(z)X(z)$

@@ -662,7 +692,7 @@
Dynamically-selectable codebooks (LSP, pitch and innovation)
\layout Itemize

-G.728-like fixed codebooks (without backward-adaptive grains because of patent
+G.728-like fixed codebooks (without backward-adaptive gains because of patent
issues)
\layout Subsection

@@ -674,8 +704,8 @@

\layout Standard

-An LPC analysis is first performed on a (Hamming) window that spans all
- the current frame and half a frame in advance.
+An LPC analysis is first performed on an (asymmetric Hamming) window that
+ spans all the current frame and half a frame in advance.
The LPC coefficients are then converted to Line Spectral Pair
\begin_inset LatexCommand \index{line spectral pair}

@@ -696,15 +726,18 @@
6 bits and the error is then divided in two 5-coefficient sub-vectors.
Each of them is quantized with 6 bits, for a total of 18 bits.
For the higher quality modes, the remaining error on both sub-vectors is
- turther quantized with 6 bits each, for a total of 30 bits.
+ further quantized with 6 bits each, for a total of 30 bits.
\layout Standard

The perceptual weighting filter
\begin_inset Formula $W(z)$
\end_inset

- used by Speex is derived from the LPC analysis and corresponds to the one
- described by eq.
+ used by Speex is derived from the LPC filter
+\begin_inset Formula $A(z)$
+\end_inset
+
+ and corresponds to the one described by eq.

\begin_inset LatexCommand \ref{eq:weighting_filter}

@@ -735,12 +768,13 @@
\layout Standard

Speex uses a 3-tap prediction for pitch.
- That is,
-\layout Standard
-
+ That is, the pitch prediction signal
+\begin_inset Formula $p[n]$
+\end_inset

+ is obtained from the past of the excitation by:
\begin_inset Formula \[
-e(n)=\beta _{0}e(n-T-1)+\beta _{1}e(n-T)+\beta _{2}e(n-T+1)+c(n)\]
+p[n]=\beta _{0}e[n-T-1]+\beta _{1}e[n-T]+\beta _{2}e[n-T+1]\]

\end_inset

@@ -756,7 +790,33 @@
\end_inset

are the prediction (filter) taps.
+ It is worth noting that when the pitch is smaller than the sub-frame size,
+ we repeat the excitation at a period
+\begin_inset Formula $T$
+\end_inset
+
+.
+ For example, when
+\begin_inset Formula $n-T+1$
+\end_inset
+
+ is non-negative, we use
+\begin_inset Formula $n-2T+1$
+\end_inset
+
The period and quantized gains are determined in closed loop.
+ In most modes, the pitch period is encoded with 7 bits in the
+\begin_inset Formula $\left[17,144\right]$
+\end_inset
+
+ range and the
+\begin_inset Formula $\beta _{i}$
+\end_inset
+
+ coefficients are vector-quantized using 7 bits at higher bit-rates (15 kbps
+ narrowband and above) and 5 bits at lower bit-rates (11 kbps narrowband
+ and below).
\layout Subsection

Innovation Codebook
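
For readers who want to check the short-term prediction equations touched by this diff, here is a minimal Python sketch of e[n] = x[n] - sum_{i=1..N} a_i x[n-i]. It is an illustration only and is not taken from the Speex sources. The function name prediction_error is hypothetical, and treating samples before the start of the buffer as zero is an assumption made for the example.

```python
def prediction_error(x, a):
    """Return e[n] = x[n] - sum_{i=1..N} a[i-1] * x[n-i] for every n.

    x: list of samples.
    a: list of N prediction coefficients (a[0] is a_1).
    Assumption: samples before the start of x are treated as zero.
    """
    N = len(a)
    e = []
    for n in range(len(x)):
        # y[n] is the linear prediction of x[n] from the N past samples.
        y = sum(a[i - 1] * x[n - i] for i in range(1, N + 1) if n - i >= 0)
        e.append(x[n] - y)
    return e


# A signal that doubles at each step is predicted exactly by a_1 = 2,
# so only the first sample (which has no past) leaves a residual.
print(prediction_error([1, 2, 4, 8], [2.0]))
```

With a good set of coefficients the residual carries much less energy than the signal itself, which is why the codec quantizes the excitation rather than the samples.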

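The 3-tap pitch prediction added in the last hunk, p[n] = beta_0 e[n-T-1] + beta_1 e[n-T] + beta_2 e[n-T+1], can be sketched in the same way. This is a hedged reading of the text, not the Speex implementation. The name pitch_prediction and its arguments are hypothetical, and the wrap-around rule (step back by T until the index falls in the past excitation) is one interpretation of "we repeat the excitation at a period T".

```python
def pitch_prediction(past_exc, T, betas, subframe_len):
    """Return p[n] for n in [0, subframe_len).

    past_exc: excitation samples before the current sub-frame
              (past_exc[-1] is e[-1]); needs at least T + 1 samples.
    T: pitch period in samples.
    betas: the three prediction taps (beta_0, beta_1, beta_2).
    """
    if len(past_exc) < T + 1:
        raise ValueError("need at least T + 1 past excitation samples")
    hist = len(past_exc)

    def e(k):
        # k is relative to the sub-frame start; k < 0 is past excitation.
        # Assumption: an index inside the current sub-frame is stepped
        # back by T until it refers to an already known sample.
        while k >= 0:
            k -= T
        return past_exc[hist + k]

    b0, b1, b2 = betas
    return [b0 * e(n - T - 1) + b1 * e(n - T) + b2 * e(n - T + 1)
            for n in range(subframe_len)]


# With only the centre tap active and T = 2, the last two past samples
# are repeated across the sub-frame.
print(pitch_prediction([1, 2, 3, 4, 5], 2, (0.0, 1.0, 0.0), 4))
# prints [4.0, 5.0, 4.0, 5.0]
```

In the codec the period and gains are chosen in closed loop; this sketch only evaluates the prediction for given values.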