[xiph-cvs] cvs commit: speex/doc manual.lyx
Jean-Marc Valin
jm at xiph.org
Tue Nov 12 20:03:04 PST 2002
jm 02/11/12 23:03:04
Modified: doc manual.lyx
Log:
Updated doc about CELP
Revision Changes Path
1.35 +87 -27 speex/doc/manual.lyx
Index: manual.lyx
===================================================================
RCS file: /usr/local/cvsroot/speex/doc/manual.lyx,v
retrieving revision 1.34
retrieving revision 1.35
diff -u -r1.34 -r1.35
--- manual.lyx 11 Nov 2002 06:05:21 -0000 1.34
+++ manual.lyx 13 Nov 2002 04:03:04 -0000 1.35
@@ -29,7 +29,7 @@
The Speex Codec Manual
\newline
-(draft for Speex 1.0beta3)
+(draft for Speex 1.0beta4)
\layout Author
Jean-Marc Valin
@@ -255,22 +255,22 @@
\begin_inset Formula \[
-y(n)=\sum _{i=1}^{N}a_{i}x(n-i)\]
+y[n]=\sum _{i=1}^{N}a_{i}x[n-i]\]
\end_inset
where
-\begin_inset Formula $y(n)$
+\begin_inset Formula $y[n]$
\end_inset
is the linear prediction of
-\begin_inset Formula $x(n)$
+\begin_inset Formula $x[n]$
\end_inset
.
The prediction error is thus given by:
\begin_inset Formula \[
-e(n)=x(n)-y(n)=x(n)-\sum _{i=1}^{N}a_{i}x(n-i)\]
+e[n]=x[n]-y[n]=x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\]
\end_inset
@@ -284,7 +284,7 @@
which minimize the quadratic error function:
\begin_inset Formula \[
-E=\sum _{n=0}^{L-1}\left[e(n)\right]^{2}=\sum _{n=0}^{L-1}\left[x(n)-\sum _{i=1}^{N}a_{i}x(n-i)\right]^{2}\]
+E=\sum _{n=0}^{L-1}\left[e[n]\right]^{2}=\sum _{n=0}^{L-1}\left[x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\right]^{2}\]
\end_inset
@@ -294,7 +294,7 @@
equal to zero:
\begin_inset Formula \[
-\frac{\partial E}{\partial a_{i}}=\frac{\partial }{\partial a_{i}}\sum _{n=0}^{L-1}\left[x(n)-\sum _{i=1}^{N}a_{i}x(n-i)\right]^{2}=0\]
+\frac{\partial E}{\partial a_{i}}=\frac{\partial }{\partial a_{i}}\sum _{n=0}^{L-1}\left[x[n]-\sum _{i=1}^{N}a_{i}x[n-i]\right]^{2}=0\]
\end_inset
@@ -320,7 +320,7 @@
\end_inset
of the signal
-\begin_inset Formula $x(n)$
+\begin_inset Formula $x[n]$
\end_inset
.
@@ -328,7 +328,7 @@
\begin_inset Formula \[
-R(m)=\sum _{i=0}^{N-1}x(i)x(i-m)\]
+R(m)=\sum _{i=0}^{N-1}x[i]x[i-m]\]
\end_inset
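The hunks above restate the least-squares LPC derivation: setting the derivatives of E to zero gives normal equations built from the autocorrelation R(m). Below is a minimal C sketch of that pipeline. It assumes the usual autocorrelation method solved with the Levinson-Durbin recursion (the recursion is not part of this diff) and treats samples before the start of the frame as zero. It also uses the sign convention y[n] = sum a_i x[n-i] from the formulas above. The names `lpc_autocorr`, `lpc_levinson`, `lpc_residual` and `LPC_MAX_ORDER` are hypothetical and are not the Speex API.

```c
#include <assert.h>
#include <stddef.h>

#define LPC_MAX_ORDER 32  /* hypothetical limit chosen for this sketch */

/* Autocorrelation R(m) = sum_{i=m}^{len-1} x[i] x[i-m], m = 0..order.
 * Samples before x[0] are taken as zero (the frame is assumed windowed). */
void lpc_autocorr(const double *x, size_t len, double *R, int order)
{
    for (int m = 0; m <= order; m++) {
        double sum = 0.0;
        for (size_t i = (size_t)m; i < len; i++)
            sum += x[i] * x[i - (size_t)m];
        R[m] = sum;
    }
}

/* Levinson-Durbin recursion: solves sum_j a_j R(|i-j|) = R(i), i = 1..order,
 * for the predictor y[n] = sum_{i=1}^{order} a[i] x[n-i].
 * a[0] is unused and set to 0. Returns the final prediction error energy. */
double lpc_levinson(const double *R, double *a, int order)
{
    double prev[LPC_MAX_ORDER + 1];
    double err = R[0];

    assert(order >= 1 && order <= LPC_MAX_ORDER);
    for (int i = 0; i <= order; i++)
        a[i] = 0.0;
    if (err <= 0.0)
        return 0.0;                      /* silent frame: leave a[] at zero */

    for (int i = 1; i <= order; i++) {
        double acc = R[i];
        for (int j = 1; j < i; j++)
            acc -= a[j] * R[i - j];
        double k = acc / err;            /* reflection coefficient */
        for (int j = 1; j < i; j++)
            prev[j] = a[j];
        for (int j = 1; j < i; j++)
            a[j] = prev[j] - k * prev[i - j];
        a[i] = k;
        err *= 1.0 - k * k;              /* error energy shrinks each order */
    }
    return err;
}

/* Prediction error e[n] = x[n] - sum_{i=1}^{order} a[i] x[n-i]. */
void lpc_residual(const double *x, size_t len, const double *a, int order, double *e)
{
    for (size_t n = 0; n < len; n++) {
        double y = 0.0;                  /* linear prediction y[n] */
        for (int i = 1; i <= order; i++)
            if (n >= (size_t)i)
                y += a[i] * x[n - (size_t)i];
        e[n] = x[n] - y;
    }
}
```

As a quick check, a decaying exponential x[n] = 0.9^n fitted with order 1 gives a_1 very close to 0.9. The residual is then essentially zero after the first sample.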
@@ -411,7 +411,7 @@
The linear prediction model represents each speech sample as linear combination
of past samples, plus an error signal called the excitation (or residual).
\begin_inset Formula \[
-x(n)=\sum _{i=1}^{N}a_{i}x(n-i)+e(n)\]
+x[n]=\sum _{i=1}^{N}a_{i}x[n-i]+e[n]\]
\end_inset
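Because e[n] = x[n] - sum a_i x[n-i], the analysis filter A(z) = 1 - sum a_i z^{-i} and the synthesis filter 1/A(z) undo each other. Below is a minimal C sketch of the pair. It assumes zero filter memory at the start of the buffer; a real codec carries the filter state from one frame to the next. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Analysis filter A(z): e[n] = x[n] - sum_{i=1}^{order} a[i] x[n-i]. */
void lpc_analysis_filter(const double *a, int order, const double *x,
                         double *e, size_t len)
{
    assert(order >= 1);
    for (size_t n = 0; n < len; n++) {
        double y = 0.0;                  /* short-term prediction of x[n] */
        for (int i = 1; i <= order; i++)
            if (n >= (size_t)i)
                y += a[i] * x[n - (size_t)i];
        e[n] = x[n] - y;
    }
}

/* Synthesis filter 1/A(z): x[n] = sum_{i=1}^{order} a[i] x[n-i] + e[n]. */
void lpc_synthesis_filter(const double *a, int order, const double *e,
                          double *x, size_t len)
{
    assert(order >= 1);
    for (size_t n = 0; n < len; n++) {
        double y = 0.0;
        for (int i = 1; i <= order; i++)
            if (n >= (size_t)i)
                y += a[i] * x[n - (size_t)i];  /* already-synthesized output */
        x[n] = y + e[n];
    }
}
```

Run the analysis filter and then the synthesis filter with the same coefficients. The output matches the input up to floating-point rounding, provided 1/A(z) is stable.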
@@ -459,6 +459,20 @@
\end_inset
as the synthesis filter.
+ The whole process is called short-term prediction as it predicts the signal
+
+\begin_inset Formula $x[n]$
+\end_inset
+
+ using only the
+\begin_inset Formula $N$
+\end_inset
+
+ past samples, where
+\begin_inset Formula $N$
+\end_inset
+
+ is usually around 10.
\layout Standard
Because LPC coefficients have very little robustness to quantization, they
@@ -490,7 +504,7 @@
\begin_inset Formula \[
-e(n)=\beta e(n-T)+c(n)\]
+e[n]=\beta e[n-T]+c[n]\]
\end_inset
@@ -552,17 +566,33 @@
\layout Standard
Most (if not all) modern audio codecs attempt to
\emph on
+\begin_inset Quotes eld
+\end_inset
+
shape
\emph default
- the noise so that it is the hardest to detect with the ear.
- That means that more noise can be tolerated in parts of the spectrum that
- are louder and
+\begin_inset Quotes erd
+\end_inset
+
+ the noise so that it appears mostly in the frequency regions where the
+ ear cannot detect it.
+ For example, the ear is more tolerant to noise in parts of the spectrum
+ that are louder and
\emph on
vice versa
\emph default
.
- That's why the error is minimized for the perceptually weighted signal
+ That's why instead of minimizing the simple quadratic error
+\begin_inset Formula \[
+E=\sum _{n}\left(x[n]-\overline{x}[n]\right)^{2}\]
+
+\end_inset
+
+where
+\begin_inset Formula $\overline{x}[n]$
+\end_inset
+
+ is the encoded signal, we minimize the error for the perceptually weighted
+ signal
\begin_inset Formula \[
X_{w}(z)=W(z)X(z)\]
@@ -662,7 +692,7 @@
Dynamically-selectable codebooks (LSP, pitch and innovation)
\layout Itemize
-G.728-like fixed codebooks (without backward-adaptive grains because of patent
+G.728-like fixed codebooks (without backward-adaptive gains because of patent
issues)
\layout Subsection
@@ -674,8 +704,8 @@
\layout Standard
-An LPC analysis is first performed on a (Hamming) window that spans all
- the current frame and half a frame in advance.
+An LPC analysis is first performed on an (asymmetric Hamming) window that
+ spans the whole current frame and half a frame in advance.
The LPC coefficients are then converted to Line Spectral Pair
\begin_inset LatexCommand \index{line spectral pair}
@@ -696,15 +726,18 @@
6 bits and the error is then divided in two 5-coefficient subvectors.
Each of them is quantized with 6 bits, for a total of 18 bits.
For the higher quality modes, the remaining error on both subvectors is
- turther quantized with 6 bits each, for a total of 30 bits.
+ further quantized with 6 bits each, for a total of 30 bits.
\layout Standard
The perceptual weighting filter
\begin_inset Formula $W(z)$
\end_inset
- used by Speex is derived from the LPC analysis and corresponds to the one
- described by eq.
+ used by Speex is derived from the LPC filter
+\begin_inset Formula $A(z)$
+\end_inset
+
+ and corresponds to the one described by eq.
\begin_inset LatexCommand \ref{eq:weighting_filter}
@@ -735,12 +768,13 @@
\layout Standard
Speex uses a 3-tap prediction for pitch.
- That is,
-\layout Standard
-
+ That is, the pitch prediction signal
+\begin_inset Formula $p[n]$
+\end_inset
+ is obtained from the past excitation by:
\begin_inset Formula \[
-e(n)=\beta _{0}e(n-T-1)+\beta _{1}e(n-T)+\beta _{2}e(n-T+1)+c(n)\]
+p[n]=\beta _{0}e[n-T-1]+\beta _{1}e[n-T]+\beta _{2}e[n-T+1]\]
\end_inset
@@ -756,7 +790,33 @@
\end_inset
are the prediction (filter) taps.
+ It is worth noting that when the pitch is smaller than the subframe size,
+ we repeat the excitation at a period
+\begin_inset Formula $T$
+\end_inset
+
+.
+ For example, when
+\begin_inset Formula $n-T+1\geq 0$
+\end_inset
+
+, we use
+\begin_inset Formula $n-2T+1$
+\end_inset
+
+ instead.
The period and quantized gains are determined in closed loop.
+ In most modes, the pitch period is encoded with 7 bits in the
+\begin_inset Formula $\left[17,144\right]$
+\end_inset
+
+ range and the
+\begin_inset Formula $\beta _{i}$
+\end_inset
+
+ coefficients are vector-quantized using 7 bits at higher bitrates (15 kbps
+ narrowband and above) and 5 bits at lower bitrates (11 kbps narrowband
+ and below).
\layout Subsection
Innovation Codebook
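The 3-tap pitch predictor and the period-repetition rule described above can be sketched in C as follows. This is a hypothetical illustration, not the Speex implementation, and it rests on three assumptions:

- The past excitation is stored in a buffer whose last element is e[-1].
- Any index that lands inside the current subframe is folded back by T as many times as needed. The text only spells out the single n-2T+1 case.
- The [17,144] pitch range is mapped onto a 7-bit index as T - 17. This encoding is also assumed.

All function names are illustrative.

```c
#include <assert.h>

#define PITCH_MIN 17   /* pitch range quoted above: [17, 144] */
#define PITCH_MAX 144

/* Hypothetical 7-bit index: T - 17 maps [17, 144] onto [0, 127]. */
int pitch_to_index(int T)
{
    if (T < PITCH_MIN) T = PITCH_MIN;
    if (T > PITCH_MAX) T = PITCH_MAX;
    return T - PITCH_MIN;
}

int index_to_pitch(int idx)
{
    assert(idx >= 0 && idx < 128);
    return idx + PITCH_MIN;
}

/* Excitation sample e[k], with k relative to the start of the subframe and
 * past[past_len-1] holding e[-1]. Indices k >= 0 fall inside the current
 * subframe, which is not known yet, so the past excitation is repeated with
 * period T: k is moved back by T until it refers to the past (assumption). */
static double past_exc(const double *past, int past_len, int T, int k)
{
    while (k >= 0)
        k -= T;
    assert(past_len + k >= 0);
    return past[past_len + k];
}

/* 3-tap pitch prediction:
 * p[n] = beta0 e[n-T-1] + beta1 e[n-T] + beta2 e[n-T+1], n = 0..sub_len-1 */
void pitch_predict3(const double *past, int past_len, int T,
                    const double beta[3], double *p, int sub_len)
{
    assert(T >= 1 && past_len >= T + 1);
    for (int n = 0; n < sub_len; n++) {
        p[n] = beta[0] * past_exc(past, past_len, T, n - T - 1)
             + beta[1] * past_exc(past, past_len, T, n - T)
             + beta[2] * past_exc(past, past_len, T, n - T + 1);
    }
}
```

The range arithmetic works out exactly: 144 - 17 + 1 = 128 = 2^7 possible periods, so every 7-bit index is used. The repeated folding matters when T is below half the subframe length. In that case a single subtraction of T can still leave the index inside the current subframe.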
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'cvs-request at xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.