#LyX 2.2 created this file. For more info see http://www.lyx.org/
\lyxformat 508
\begin_document
\begin_header
\save_transient_properties true
\origin unavailable
\textclass article
\use_default_options true
\maintain_unincluded_children false
\language english
\language_package default
\inputencoding auto
\fontencoding global
\font_roman "default" "default"
\font_sans "default" "default"
\font_typewriter "default" "default"
\font_math "auto" "auto"
\font_default_family default
\use_non_tex_fonts false
\font_sc false
\font_osf false
\font_sf_scale 100 100
\font_tt_scale 100 100
\graphics default
\default_output_format default
\output_sync 0
\bibtex_command default
\index_command default
\paperfontsize default
\spacing single
\use_hyperref false
\papersize default
\use_geometry true
\use_package amsmath 1
\use_package amssymb 1
\use_package cancel 1
\use_package esint 1
\use_package mathdots 1
\use_package mathtools 1
\use_package mhchem 1
\use_package stackrel 1
\use_package stmaryrd 1
\use_package undertilde 1
\cite_engine basic
\cite_engine_type default
\biblio_style plain
\use_bibtopic false
\use_indices false
\paperorientation portrait
\suppress_date false
\justification true
\use_refstyle 1
\index Index
\shortcut idx
\color #008000
\end_index
\leftmargin 2cm
\topmargin 2cm
\rightmargin 2cm
\bottommargin 2cm
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\paragraph_indentation default
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
\end_header
\begin_body
\begin_layout Title
Stereo Quantization Improvements in Opus/CELT
\end_layout
\begin_layout Author
Jean-Marc Valin
\end_layout
\begin_layout Section
Introduction
\end_layout
\begin_layout Standard
Stereo coding in Opus is performed very differently from other audio codecs.
In the CELT coding scheme used for music, the energy of both channels is
coded explicitly to avoid energy
\emph on
leaking
\emph default
from one channel to another.
This makes it possible to use mid-side stereo even when the energy of two
channels differs significantly.
The correlation between the two channels is also explicitly coded, reducing
the risk of
\emph on
stereo unmasking
\emph default
[].
Further reducing that risk is the fact that the use of dual (left-right) stereo
is limited to only the cases where the two channels have nearly no correlation.
\end_layout
\begin_layout Standard
A side effect of how CELT works is that by default the number of bits allocated
to a band does not depend on the inter-channel correlation, nor on the
intensity difference.
The encoder will also attempt to maintain the same noise-to-mask ratio,
independently of the intensity difference, i.e.
it ignores inter-channel masking.
\end_layout
\begin_layout Standard
In this paper, we investigate how to take into account inter-channel masking
to make better encoding decisions.
\end_layout
\begin_layout Section
Inter-channel masking
\end_layout
\begin_layout Standard
Despite decades of research and measurements on psycho-acoustic masking,
there appears to be a complete lack of research into inter-channel masking.
We define inter-channel masking as the effect where the presence of a sound
in one ear changes the masking thresholds for the other ear.
It would appear as common sense that a loud sound in one ear would reduce
one's ability to detect artefacts in the other ear's more quiet signal.
Quantifying that effect is unfortunately not an easy task.
\end_layout
\begin_layout Section
Modifying stereo input vectors
\end_layout
\begin_layout Standard
Let
\begin_inset Formula $\mathbf{x}$
\end_inset
denote the normalized vector for a band of the left channel and
\begin_inset Formula $\mathbf{y}$
\end_inset
denote the corresponding vector for the right channel.
When quantizing stereo, the first step is to quantize the angle derived
from the ratio of the magnitude of the side to the magnitude of the mid
\begin_inset Formula
\[
\theta=\arctan\frac{\left\Vert \mathbf{S}\right\Vert }{\left\Vert \mathbf{M}\right\Vert }\,,
\]
\end_inset
where
\begin_inset Formula $\mathbf{M}=\mathbf{x}+\mathbf{y}$
\end_inset
and
\begin_inset Formula $\mathbf{S}=\mathbf{x}-\mathbf{y}$
\end_inset
.
\end_layout
\begin_layout Standard
It can be shown that the angle
\begin_inset Formula $\theta$
\end_inset
is related to the angle
\begin_inset Formula $\phi$
\end_inset
between
\begin_inset Formula $\mathbf{x}$
\end_inset
and
\begin_inset Formula $\mathbf{y}$
\end_inset
by
\begin_inset Formula $\phi=2\theta$
\end_inset
, where
\begin_inset Formula
\[
\cos\phi=\mathbf{x}^{T}\mathbf{y}\,.
\]
\end_inset
\end_layout
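\begin_layout Standard
The relation between the two angles can be checked numerically.
The following is a minimal sketch, not code from the Opus encoder: the helper
names (norm, normalize, stereo_angles) and the test vectors are assumptions
made for illustration, and theta is taken as the arctangent of the side
norm over the mid norm, the convention under which phi equals twice theta
for unit-norm inputs.
\end_layout

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))

def normalize(v):
    """Scale v to unit norm."""
    n = norm(v)
    return [a / n for a in v]

def stereo_angles(x, y):
    """Return (theta, phi) for two unit-norm band vectors x and y."""
    mid = [a + b for a, b in zip(x, y)]
    side = [a - b for a, b in zip(x, y)]
    # theta = arctan(||S|| / ||M||); atan2 also handles ||M|| = 0.
    theta = math.atan2(norm(side), norm(mid))
    # phi is the angle between x and y, from cos(phi) = x^T y.
    dot = sum(a * b for a, b in zip(x, y))
    phi = math.acos(max(-1.0, min(1.0, dot)))
    return theta, phi

# Arbitrary illustrative vectors (hypothetical data).
x = normalize([0.9, 0.3, -0.2, 0.1])
y = normalize([0.5, 0.6, 0.1, -0.4])
theta, phi = stereo_angles(x, y)
print("theta =", theta, "phi =", phi, "2*theta =", 2 * theta)
```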
\begin_layout Standard
When
\begin_inset Formula $\theta$
\end_inset
is quantized to
\begin_inset Formula $\hat{\theta}$
\end_inset
, it causes distortion to both channels.
The distortion (sum of squared errors) for each channel is given by the
law of cosines to be
\begin_inset Formula
\[
D=2-2\cos\delta\,,
\]
\end_inset
where
\begin_inset Formula $\delta$
\end_inset
is the angle by which each of the vectors was
\emph on
moved
\emph default
by the quantization.
Since both channels are affected by the same amount,
\begin_inset Formula $\delta=\frac{\hat{\phi}-\phi}{2}=\hat{\theta}-\theta$
\end_inset
.
\end_layout
\begin_layout Standard
However, we may want to change that behaviour when the two channels differ
in loudness.
Let
\begin_inset Formula $w_{x}$
\end_inset
and
\begin_inset Formula $w_{y}$
\end_inset
be the weights we assign to the two channels.
The total weighted distortion then becomes
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
D=w_{x}\left(2-2\cos\delta_{x}\right)+w_{y}\left(2-2\cos\delta_{y}\right)\,.
\]
\end_inset
\end_layout
\begin_layout Standard
Let
\begin_inset Formula $S=\delta_{x}+\delta_{y}=\hat{\phi}-\phi$
\end_inset
be a known value (from the quantization process).
We can minimize the weighted distortion by substituting
\begin_inset Formula $\delta_{y}=S-\delta_{x}$
\end_inset
and solving:
\begin_inset Formula
\begin{align*}
\frac{\partial D}{\partial\delta_{x}}=2w_{x}\sin\delta_{x}-2w_{y}\sin\left(S-\delta_{x}\right) & =0\\
2w_{x}\sin\delta_{x}-2w_{y}\left(\sin S\cos\delta_{x}-\cos S\sin\delta_{x}\right) & =0\\
w_{x}\sin\delta_{x}+w_{y}\cos S\sin\delta_{x} & =w_{y}\sin S\cos\delta_{x}\\
\sin\delta_{x}\left(w_{x}+w_{y}\cos S\right) & =w_{y}\sin S\cos\delta_{x}\\
\tan\delta_{x} & =\frac{w_{y}\sin S}{w_{x}+w_{y}\cos S}\,.
\end{align*}
\end_inset
Using a similar derivation, we can find
\begin_inset Formula
\[
\tan\delta_{y}=\frac{w_{x}\sin S}{w_{y}+w_{x}\cos S}\,.
\]
\end_inset
\end_layout
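\begin_layout Standard
As a sanity check on the two closed-form expressions, the sketch below
splits a total angle change S between the channels and compares the result
against a brute-force search.
The function names and the example weights are assumptions chosen for
illustration, and atan2 is used in place of a plain arctangent so that
the result stays well defined when the denominator is not positive.
\end_layout

```python
import math

def split_rotation(S, w_x, w_y):
    """Split the total angle change S into (delta_x, delta_y).

    Implements tan(delta_x) = w_y sin S / (w_x + w_y cos S) and the
    symmetric expression for delta_y.
    """
    delta_x = math.atan2(w_y * math.sin(S), w_x + w_y * math.cos(S))
    delta_y = math.atan2(w_x * math.sin(S), w_y + w_x * math.cos(S))
    return delta_x, delta_y

def weighted_distortion(delta_x, delta_y, w_x, w_y):
    """Weighted sum of the per-channel distortions 2 - 2 cos(delta)."""
    return (w_x * (2.0 - 2.0 * math.cos(delta_x))
            + w_y * (2.0 - 2.0 * math.cos(delta_y)))

# Hypothetical example: channel x is weighted four times more than y.
S, w_x, w_y = 0.3, 4.0, 1.0
delta_x, delta_y = split_rotation(S, w_x, w_y)
best = weighted_distortion(delta_x, delta_y, w_x, w_y)
even = weighted_distortion(S / 2.0, S / 2.0, w_x, w_y)
print("delta_x =", delta_x, "delta_y =", delta_y)
print("weighted distortion:", best, "vs even split:", even)
```

\begin_layout Standard
With these weights the more heavily weighted channel is moved less, and
the two angles still add up to S.
\end_layout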
\begin_layout Standard
Given these values, we want to compute
\begin_inset Formula $\tilde{\mathbf{x}}$
\end_inset
and
\begin_inset Formula $\tilde{\mathbf{y}}$
\end_inset
that will be quantized instead of
\begin_inset Formula $\mathbf{x}$
\end_inset
and
\begin_inset Formula $\mathbf{y}$
\end_inset
.
Since quantizing
\begin_inset Formula $\theta$
\end_inset
keeps
\begin_inset Formula $\mathbf{x}$
\end_inset
and
\begin_inset Formula $\mathbf{y}$
\end_inset
in the same plane, we also want
\begin_inset Formula $\tilde{\mathbf{x}}$
\end_inset
and
\begin_inset Formula $\tilde{\mathbf{y}}$
\end_inset
to lie on the same plane as
\begin_inset Formula $\mathbf{x}$
\end_inset
and
\begin_inset Formula $\mathbf{y}$
\end_inset
.
We express them as linear combinations of
\begin_inset Formula $\mathbf{x}$
\end_inset
and
\begin_inset Formula $\mathbf{y}$
\end_inset
such that the angle between
\begin_inset Formula $\tilde{\mathbf{x}}$
\end_inset
and
\begin_inset Formula $\mathbf{x}$
\end_inset
is
\begin_inset Formula $\delta_{x}$
\end_inset
and the angle between
\begin_inset Formula $\tilde{\mathbf{y}}$
\end_inset
and
\begin_inset Formula $\mathbf{y}$
\end_inset
is
\begin_inset Formula $\delta_{y}$
\end_inset
.
To make the calculation easier, we are not yet concerned about the norm
of
\begin_inset Formula $\tilde{\mathbf{x}}$
\end_inset
and
\begin_inset Formula $\tilde{\mathbf{y}}$
\end_inset
.
Let us consider
\begin_inset Formula $\tilde{\mathbf{x}}=\mathbf{x}+\alpha_{x}\mathbf{y}$
\end_inset
.
The angle between
\begin_inset Formula $\tilde{\mathbf{x}}$
\end_inset
and
\begin_inset Formula $\mathbf{x}$
\end_inset
is given by
\begin_inset Formula
\[
\delta_{x}=\arctan\frac{\alpha_{x}\sin\phi}{1+\alpha_{x}\cos\phi}\,,
\]
\end_inset
where again
\begin_inset Formula $\phi$
\end_inset
is the angle between
\begin_inset Formula $\mathbf{x}$
\end_inset
and
\begin_inset Formula $\mathbf{y}$
\end_inset
.
Solving for
\begin_inset Formula $\alpha_{x}$
\end_inset
, we get
\begin_inset Formula
\begin{align*}
\tan\delta_{x}\left(1+\alpha_{x}\cos\phi\right) & =\alpha_{x}\sin\phi\\
\tan\delta_{x} & =\alpha_{x}\sin\phi-\alpha_{x}\cos\phi\tan\delta_{x}\\
\alpha_{x} & =\frac{\tan\delta_{x}}{\sin\phi-\cos\phi\tan\delta_{x}}\,.
\end{align*}
\end_inset
\end_layout
\begin_layout Standard
Since we are not concerned with scaling, we can avoid the division by simply
defining a denormalized
\begin_inset Formula
\[
\tilde{\mathbf{x}}_{d}=g_{xx}\mathbf{x}+g_{xy}\mathbf{y}\,,
\]
\end_inset
with
\begin_inset Formula
\begin{align*}
g_{xx} & =\sin\phi-\cos\phi\tan\delta_{x}\\
g_{xy} & =\tan\delta_{x}\,.
\end{align*}
\end_inset
\end_layout
\begin_layout Standard
Using the law of cosines, the squared magnitude of
\begin_inset Formula $\tilde{\mathbf{x}}_{d}$
\end_inset
is given by
\begin_inset Formula
\begin{align*}
\left\Vert \tilde{\mathbf{x}}_{d}\right\Vert ^{2} & =\tan^{2}\delta_{x}+\left(\sin\phi-\cos\phi\tan\delta_{x}\right)^{2}+2\cos\phi\tan\delta_{x}\left(\sin\phi-\cos\phi\tan\delta_{x}\right)\\
& =\tan^{2}\delta_{x}+\sin^{2}\phi+\cos^{2}\phi\tan^{2}\delta_{x}-2\sin\phi\cos\phi\tan\delta_{x}+2\cos\phi\tan\delta_{x}\sin\phi-2\cos^{2}\phi\tan^{2}\delta_{x}\\
& =\tan^{2}\delta_{x}+\sin^{2}\phi-\cos^{2}\phi\tan^{2}\delta_{x}\\
& =\left(1-\cos^{2}\phi\right)\tan^{2}\delta_{x}+\sin^{2}\phi\\
& =\sin^{2}\phi\left(1+\tan^{2}\delta_{x}\right)\\
& =\frac{\sin^{2}\phi}{\cos^{2}\delta_{x}}\,.
\end{align*}
\end_inset
Knowing this, we can compute a normalized
\begin_inset Formula $\tilde{\mathbf{x}}$
\end_inset
as
\begin_inset Formula
\[
\tilde{\mathbf{x}}=\frac{\cos\delta_{x}}{\sin\phi}\tilde{\mathbf{x}}_{d}\,.
\]
\end_inset
\end_layout
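\begin_layout Standard
The construction of the modified vector can be sketched as follows.
This is an illustrative sketch under stated assumptions rather than the
encoder's implementation: the function name modified_vector and the test
vectors are hypothetical, the inputs are assumed to be unit-norm and not
collinear (so that the sine of phi is nonzero), and the magnitude of delta_x
is assumed to be below a quarter turn.
\end_layout

```python
import math

def dot(a, b):
    """Inner product of two lists of floats."""
    return sum(p * q for p, q in zip(a, b))

def normalize(v):
    """Scale v to unit norm."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def modified_vector(x, y, delta_x):
    """Rotate unit vector x toward unit vector y by delta_x in their plane."""
    cos_phi = dot(x, y)
    sin_phi = math.sqrt(max(0.0, 1.0 - cos_phi * cos_phi))
    t = math.tan(delta_x)
    g_xx = sin_phi - cos_phi * t   # gain applied to x
    g_xy = t                       # gain applied to y
    # Normalization factor cos(delta_x) / sin(phi); undefined if sin_phi == 0.
    scale = math.cos(delta_x) / sin_phi
    return [scale * (g_xx * a + g_xy * b) for a, b in zip(x, y)]

# Arbitrary illustrative vectors and angle (hypothetical data).
x = normalize([0.9, 0.3, -0.2, 0.1])
y = normalize([0.5, 0.6, 0.1, -0.4])
delta_x = 0.1
x_tilde = modified_vector(x, y, delta_x)
phi = math.acos(dot(x, y))
print("norm of x_tilde:", math.sqrt(dot(x_tilde, x_tilde)))
print("angle to x:", math.acos(dot(x_tilde, x)))
print("angle to y:", math.acos(dot(x_tilde, y)), "phi - delta_x:", phi - delta_x)
```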
\begin_layout Standard
We can then compute
\begin_inset Formula $\tilde{\mathbf{y}}$
\end_inset
similarly.
Replacing
\begin_inset Formula $\mathbf{x}$
\end_inset
and
\begin_inset Formula $\mathbf{y}$
\end_inset
with
\begin_inset Formula $\tilde{\mathbf{x}}$
\end_inset
and
\begin_inset Formula $\tilde{\mathbf{y}}$
\end_inset
in the quantization process, we can give more weight to one channel or
the other.
When trying multiple values of
\begin_inset Formula $\hat{\theta}$
\end_inset
, we will derive a different value of
\begin_inset Formula $\tilde{\mathbf{x}}$
\end_inset
and
\begin_inset Formula $\tilde{\mathbf{y}}$
\end_inset
for each
\begin_inset Formula $\hat{\theta}$
\end_inset
.
\end_layout
\begin_layout Section
Stereo bit allocation
\end_layout
\begin_layout Standard
By dumping quantization data from the encoder and looking at the normalized
distortion as a function of the angle
\begin_inset Formula $\phi$
\end_inset
and the rate, we have come up with the following approximation that best
fits the data with a simple enough function:
\end_layout
\begin_layout Standard
\begin_inset Formula
\[
D=3\left(4^{-r}\sin\phi+4^{-2r}\left(1-\sin\phi\right)\right)\,,
\]
\end_inset
where
\begin_inset Formula $r$
\end_inset
is the bit depth
\begin_inset Formula
\[
r=\frac{b}{2N-1}\,.
\]
\end_inset
\end_layout
\begin_layout Standard
If instead we want a fixed distortion and find the corresponding bit depth,
we get
\begin_inset Formula
\[
R=\frac{-3\sin\phi+\sqrt{9\sin^{2}\phi+12D\left(1-\sin\phi\right)}}{6\left(1-\sin\phi\right)}\,,
\]
\end_inset
with
\begin_inset Formula $r=-\log_{4}R$
\end_inset
.
\end_layout
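\begin_layout Standard
The distortion model and its inverse can be exercised with a short round-trip
check.
In this sketch the function names are assumptions, and the inverse is written
in a rationalized form that is algebraically equivalent to the expression
for R above (numerator and denominator multiplied by the conjugate), which
avoids the division by zero when the sine of phi equals one.
\end_layout

```python
import math

def distortion(r, phi):
    """Approximate normalized distortion at bit depth r and angle phi."""
    s = math.sin(phi)
    return 3.0 * (4.0 ** (-r) * s + 4.0 ** (-2.0 * r) * (1.0 - s))

def bit_depth(D, phi):
    """Bit depth r that yields distortion D at angle phi.

    Solves 3 (R s + R^2 (1 - s)) = D for R = 4^(-r), using
    R = 2 D / (3 s + sqrt(9 s^2 + 12 D (1 - s))), the rationalized
    equivalent of the quadratic-formula root.
    """
    s = math.sin(phi)
    R = 2.0 * D / (3.0 * s + math.sqrt(9.0 * s * s + 12.0 * D * (1.0 - s)))
    return -math.log(R, 4.0)

# Round trip over a few illustrative (hypothetical) operating points.
for phi in (0.1, 0.7, 1.3, math.pi / 2.0):
    for r in (0.5, 2.0, 4.0):
        D = distortion(r, phi)
        print("phi =", phi, "r =", r, "D =", D, "recovered r =", bit_depth(D, phi))
```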
\begin_layout Standard
Let
\begin_inset Formula $D=3R_{0}$
\end_inset
be the distortion we obtain for
\begin_inset Formula $\phi=\pi/2$
\end_inset
,
\begin_inset Formula
\begin{align*}
R & =\frac{-3\sin\phi+\sqrt{9\sin^{2}\phi+12\cdot3R_{0}\left(1-\sin\phi\right)}}{6\left(1-\sin\phi\right)}\\
& =\sin\phi\cdot\frac{-1+\sqrt{1+\frac{4R_{0}\left(1-\sin\phi\right)}{\sin^{2}\phi}}}{2-2\sin\phi}
\end{align*}
\end_inset
\end_layout
\begin_layout Standard
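At high rate,
\begin_inset Formula $R_{0}$
\end_inset
is small and a first-order expansion of the square root gives:
\end_layout
\begin_layout Standard
\begin_inset Formula
\begin{align*}
R & \approx\sin\phi\frac{\frac{2R_{0}\left(1-\sin\phi\right)}{\sin^{2}\phi}}{2-2\sin\phi}\\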
& =\frac{R_{0}}{\sin\phi}\\
r & =-\log_{4}\frac{R_{0}}{\sin\phi}\\
& =r_{0}+\log_{4}\sin\phi\\
& =r_{0}+\frac{1}{2}\log_{2}\sin\phi
\end{align*}
\end_inset
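At low rate and small
\begin_inset Formula $\sin\phi$
\end_inset
, the square-root term dominates and we instead have
\begin_inset Formula
\begin{align*}
R & \approx\frac{\sqrt{4R_{0}\left(1-\sin\phi\right)}}{2-2\sin\phi}\\
& =\sqrt{\frac{R_{0}}{\left(1-\sin\phi\right)}}\\
& \approx\sqrt{R_{0}}\\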
r & =-\log_{4}\sqrt{R_{0}}\\
& =r_{0}/2
\end{align*}
\end_inset
\end_layout
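\begin_layout Standard
The two limiting expressions can be compared against the exact solution
numerically.
The sketch below is only an illustration: the function names and the chosen
operating points are assumptions, the exact solution is written in a rationalized
form equivalent to the expression for R in terms of R_0, and the low-rate
case is evaluated at a small angle, where dropping the factor involving
the sine of phi is also justified.
\end_layout

```python
import math

def exact_depth(r0, phi):
    """Bit depth at angle phi matching the distortion of depth r0 at phi = pi/2."""
    s = math.sin(phi)
    R0 = 4.0 ** (-r0)
    # Rationalized form of R = s (-1 + sqrt(1 + 4 R0 (1 - s) / s^2)) / (2 - 2 s).
    R = 2.0 * R0 / (s + math.sqrt(s * s + 4.0 * R0 * (1.0 - s)))
    return -math.log(R, 4.0)

def high_rate_depth(r0, phi):
    """High-rate approximation: r = r0 + (1/2) log2(sin phi)."""
    return r0 + 0.5 * math.log2(math.sin(phi))

def low_rate_depth(r0):
    """Low-rate, small-angle approximation: r = r0 / 2."""
    return r0 / 2.0

# Hypothetical operating points chosen to sit inside each regime.
hi_exact = exact_depth(6.0, 1.0)
hi_approx = high_rate_depth(6.0, 1.0)
lo_exact = exact_depth(2.0, 0.001)
lo_approx = low_rate_depth(2.0)
print("high rate: exact", hi_exact, "approx", hi_approx)
print("low rate:  exact", lo_exact, "approx", lo_approx)
```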
\end_body
\end_document