| #LyX 2.2 created this file. For more info see http://www.lyx.org/ |
| \lyxformat 508 |
| \begin_document |
| \begin_header |
| \save_transient_properties true |
| \origin unavailable |
| \textclass article |
| \use_default_options true |
| \maintain_unincluded_children false |
| \language english |
| \language_package default |
| \inputencoding auto |
| \fontencoding global |
| \font_roman "default" "default" |
| \font_sans "default" "default" |
| \font_typewriter "default" "default" |
| \font_math "auto" "auto" |
| \font_default_family default |
| \use_non_tex_fonts false |
| \font_sc false |
| \font_osf false |
| \font_sf_scale 100 100 |
| \font_tt_scale 100 100 |
| \graphics default |
| \default_output_format default |
| \output_sync 0 |
| \bibtex_command default |
| \index_command default |
| \paperfontsize default |
| \spacing single |
| \use_hyperref false |
| \papersize default |
| \use_geometry true |
| \use_package amsmath 1 |
| \use_package amssymb 1 |
| \use_package cancel 1 |
| \use_package esint 1 |
| \use_package mathdots 1 |
| \use_package mathtools 1 |
| \use_package mhchem 1 |
| \use_package stackrel 1 |
| \use_package stmaryrd 1 |
| \use_package undertilde 1 |
| \cite_engine basic |
| \cite_engine_type default |
| \biblio_style plain |
| \use_bibtopic false |
| \use_indices false |
| \paperorientation portrait |
| \suppress_date false |
| \justification true |
| \use_refstyle 1 |
| \index Index |
| \shortcut idx |
| \color #008000 |
| \end_index |
| \leftmargin 2cm |
| \topmargin 2cm |
| \rightmargin 2cm |
| \bottommargin 2cm |
| \secnumdepth 3 |
| \tocdepth 3 |
| \paragraph_separation indent |
| \paragraph_indentation default |
| \quotes_language english |
| \papercolumns 1 |
| \papersides 1 |
| \paperpagestyle default |
| \tracking_changes false |
| \output_changes false |
| \html_math_output 0 |
| \html_css_as_file 0 |
| \html_be_strict false |
| \end_header |
| |
| \begin_body |
| |
| \begin_layout Title |
| Stereo Quantization Improvements in Opus/CELT |
| \end_layout |
| |
| \begin_layout Author |
| Jean-Marc Valin |
| \end_layout |
| |
| \begin_layout Section |
| Introduction |
| \end_layout |
| |
| \begin_layout Standard |
| Stereo coding in Opus is performed very differently from other audio codecs. |
| In the CELT coding scheme used for music, the energy of both channels is |
| coded explicitly to avoid energy |
| \emph on |
| leaking |
| \emph default |
| from one channel to another. |
| This makes it possible to use mid-side stereo even when the energy of the |
| two channels differs significantly. |
| The correlation between the two channels is also explicitly coded, reducing |
| the risk of |
| \emph on |
| stereo unmasking |
| \emph default |
| []. |
| Further reducing that risk is the fact that the use of dual (left-right) |
| stereo is limited to the cases where the two channels have nearly no correlation. |
| |
| \end_layout |
| |
| \begin_layout Standard |
| A side effect of how CELT works is that by default the number of bits allocated |
| to a band does not depend on the inter-channel correlation, nor on the |
| intensity difference. |
| The encoder will also attempt to maintain the same noise-to-mask ratio, |
| independently of the intensity difference, i.e. |
| it ignores inter-channel masking. |
| |
| \end_layout |
| |
| \begin_layout Standard |
| In this paper, we investigate how to take into account inter-channel masking |
| to make better encoding decisions. |
| \end_layout |
| |
| \begin_layout Section |
| Inter-channel masking |
| \end_layout |
| |
| \begin_layout Standard |
| Despite decades of research and measurements on psycho-acoustic masking, |
| there appears to be a complete lack of research into inter-channel masking. |
| We define inter-channel masking as the effect where the presence of a sound |
| in one ear changes the masking thresholds for the other ear. |
| It seems common sense that a loud sound in one ear would reduce |
| one's ability to detect artefacts in the other ear's quieter signal. |
| Quantifying that effect is unfortunately not an easy task. |
| \end_layout |
| |
| \begin_layout Section |
| Modifying stereo input vectors |
| \end_layout |
| |
| \begin_layout Standard |
| Let |
| \begin_inset Formula $\mathbf{x}$ |
| \end_inset |
| |
| denote the normalized vector for a band of the left channel and |
| \begin_inset Formula $\mathbf{y}$ |
| \end_inset |
| |
| denote the corresponding vector for the right channel. |
| When quantizing stereo, the first step is to quantize the angle derived |
| from the ratio of the magnitude of the side to the magnitude of the mid |
| \begin_inset Formula |
| \[ |
| \theta=\arctan\frac{\left\Vert \mathbf{S}\right\Vert }{\left\Vert \mathbf{M}\right\Vert }\,, |
| \] |
| |
| \end_inset |
| |
| where |
| \begin_inset Formula $\mathbf{M}=\mathbf{x}+\mathbf{y}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{S}=\mathbf{x}-\mathbf{y}$ |
| \end_inset |
| |
| . |
| |
| \end_layout |
| |
| \begin_layout Standard |
| It can be shown that the angle |
| \begin_inset Formula $\theta$ |
| \end_inset |
| |
| is related to the angle |
| \begin_inset Formula $\phi$ |
| \end_inset |
| |
| between |
| \begin_inset Formula $\mathbf{x}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{y}$ |
| \end_inset |
| |
| by |
| \begin_inset Formula $\phi=2\theta$ |
| \end_inset |
| |
| , where |
| \begin_inset Formula |
| \[ |
| \cos\phi=\mathbf{x}^{T}\mathbf{y}\,. |
| \] |
| |
| \end_inset |
| |
| |
| \end_layout |
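The relation between the two angles can be checked numerically. The following is a minimal sketch, assuming unit-norm band vectors and taking theta as the arctangent of the side magnitude over the mid magnitude, which is the ordering under which phi = 2 theta holds. The helper names `unit` and `stereo_angles` and the sample vectors are illustrative and not taken from the Opus source.

```python
import math

def unit(v):
    """Scale a vector to unit norm, as CELT does for each band."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def stereo_angles(x, y):
    """Return (theta, phi) for two unit-norm band vectors x and y."""
    mid = [a + b for a, b in zip(x, y)]
    side = [a - b for a, b in zip(x, y)]
    norm_mid = math.sqrt(sum(v * v for v in mid))
    norm_side = math.sqrt(sum(v * v for v in side))
    # theta from the side/mid magnitude ratio
    theta = math.atan2(norm_side, norm_mid)
    # phi from the inner product, clamped against rounding error
    dot = sum(a * b for a, b in zip(x, y))
    phi = math.acos(max(-1.0, min(1.0, dot)))
    return theta, phi

# Arbitrary example vectors (hypothetical data)
x = unit([1.0, 2.0, -0.5, 0.3])
y = unit([0.8, 1.5, 0.4, -0.2])
theta, phi = stereo_angles(x, y)
print(theta, phi / 2)
```

The two printed values should agree up to floating-point rounding.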
| |
| \begin_layout Standard |
| When |
| \begin_inset Formula $\theta$ |
| \end_inset |
| |
| is quantized to |
| \begin_inset Formula $\hat{\theta}$ |
| \end_inset |
| |
| , it causes distortion to both channels. |
| The distortion (sum of squared errors) for each channel is given by the |
| law of cosines to be |
| \begin_inset Formula |
| \[ |
| D=2-2\cos\delta\,, |
| \] |
| |
| \end_inset |
| |
| where |
| \begin_inset Formula $\delta$ |
| \end_inset |
| |
| is the angle by which each of the vectors was |
| \emph on |
| moved |
| \emph default |
| by the quantization. |
| Since both channels are affected by the same amount, |
| \begin_inset Formula $\delta=\frac{\hat{\phi}-\phi}{2}=\hat{\theta}-\theta$ |
| \end_inset |
| |
| . |
| \end_layout |
| |
| \begin_layout Standard |
| However, we may want to change that behaviour when the two channels differ |
| in loudness. |
| Let |
| \begin_inset Formula $w_{x}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $w_{y}$ |
| \end_inset |
| |
| be the weight we assign to each of the channels. |
| The total weighted distortion then becomes |
| \end_layout |
| |
| \begin_layout Standard |
| \begin_inset Formula |
| \[ |
| D=w_{x}\left(2-2\cos\delta_{x}\right)+w_{y}\left(2-2\cos\delta_{y}\right)\,. |
| \] |
| |
| \end_inset |
| |
| |
| \end_layout |
| |
| \begin_layout Standard |
| Let |
| \begin_inset Formula $S=\delta_{x}+\delta_{y}=\hat{\phi}-\phi$ |
| \end_inset |
| |
| be a known value (from the quantization process). |
| We can minimize the weighted distortion by substituting |
| \begin_inset Formula $\delta_{y}=S-\delta_{x}$ |
| \end_inset |
| |
| and solving: |
| \begin_inset Formula |
| \begin{align*} |
| \frac{\partial D}{\partial\delta_{x}}=2w_{x}\sin\delta_{x}-2w_{y}\sin\left(S-\delta_{x}\right) & =0\\ |
| 2w_{x}\sin\delta_{x}-2w_{y}\left(\sin S\cos\delta_{x}-\cos S\sin\delta_{x}\right) & =0\\ |
| w_{x}\sin\delta_{x}+w_{y}\cos S\sin\delta_{x} & =w_{y}\sin S\cos\delta_{x}\\ |
| \sin\delta_{x}\cdot & \left(w_{x}+w_{y}\cos S\right)=w_{y}\sin S\cos\delta_{x}\\ |
| \tan\delta_{x} & =\frac{w_{y}\sin S}{w_{x}+w_{y}\cos S}\,. |
| \end{align*} |
| |
| \end_inset |
| |
| Using a similar derivation, we can find |
| \begin_inset Formula |
| \[ |
| \tan\delta_{y}=\frac{w_{x}\sin S}{w_{y}+w_{x}\cos S}\,. |
| \] |
| |
| \end_inset |
| |
| |
| \end_layout |
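As an illustration of the two closed-form solutions, the sketch below computes the split for a given total angle S and a pair of weights, then compares the weighted distortion with that of an even split. Using `atan2` to invert the tangent is a choice made here, not something stated in the text, and the function names and numeric values are hypothetical.

```python
import math

def split_angle(S, w_x, w_y):
    """Split the total angle S into (delta_x, delta_y) using the tangent formulas."""
    delta_x = math.atan2(w_y * math.sin(S), w_x + w_y * math.cos(S))
    delta_y = math.atan2(w_x * math.sin(S), w_y + w_x * math.cos(S))
    return delta_x, delta_y

def weighted_distortion(delta_x, delta_y, w_x, w_y):
    """Weighted sum of the per-channel distortions 2 - 2*cos(delta)."""
    return (w_x * (2.0 - 2.0 * math.cos(delta_x))
            + w_y * (2.0 - 2.0 * math.cos(delta_y)))

S = 0.2                # total angle error, in radians (example value)
w_x, w_y = 4.0, 1.0    # x is weighted four times more than y (example values)
delta_x, delta_y = split_angle(S, w_x, w_y)
d_opt = weighted_distortion(delta_x, delta_y, w_x, w_y)
d_even = weighted_distortion(S / 2, S / 2, w_x, w_y)
print(delta_x, delta_y, d_opt, d_even)
```

With these values the more heavily weighted channel is moved by the smaller angle, the two angles sum to S, and the weighted distortion is lower than for the even split.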
| |
| \begin_layout Standard |
| Given these values, we want to compute |
| \begin_inset Formula $\tilde{\mathbf{x}}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\tilde{\mathbf{y}}$ |
| \end_inset |
| |
| that will be quantized instead of |
| \begin_inset Formula $\mathbf{x}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{y}$ |
| \end_inset |
| |
| . |
| Since quantizing |
| \begin_inset Formula $\theta$ |
| \end_inset |
| |
| keeps |
| \begin_inset Formula $\mathbf{x}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{y}$ |
| \end_inset |
| |
| in the same plane, we also want |
| \begin_inset Formula $\tilde{\mathbf{x}}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\tilde{\mathbf{y}}$ |
| \end_inset |
| |
| to lie on the same plane as |
| \begin_inset Formula $\mathbf{x}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{y}$ |
| \end_inset |
| |
| . |
| We express them as linear combinations of |
| \begin_inset Formula $\mathbf{x}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{y}$ |
| \end_inset |
| |
| such that the angle between |
| \begin_inset Formula $\tilde{\mathbf{x}}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{x}$ |
| \end_inset |
| |
| is |
| \begin_inset Formula $\delta_{x}$ |
| \end_inset |
| |
| and the angle between |
| \begin_inset Formula $\tilde{\mathbf{y}}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{y}$ |
| \end_inset |
| |
| is |
| \begin_inset Formula $\delta_{y}$ |
| \end_inset |
| |
| . |
| To make the calculation easier, we are not yet concerned about the norm |
| of |
| \begin_inset Formula $\tilde{\mathbf{x}}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\tilde{\mathbf{y}}$ |
| \end_inset |
| |
| . |
| Let us consider |
| \begin_inset Formula $\tilde{\mathbf{x}}=\mathbf{x}+\alpha_{x}\mathbf{y}$ |
| \end_inset |
| |
| . |
| The angle between |
| \begin_inset Formula $\tilde{\mathbf{x}}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{x}$ |
| \end_inset |
| |
| is given by |
| \begin_inset Formula |
| \[ |
| \delta_{x}=\arctan\frac{\alpha_{x}\sin\phi}{1+\alpha_{x}\cos\phi}\,, |
| \] |
| |
| \end_inset |
| |
| where again |
| \begin_inset Formula $\phi$ |
| \end_inset |
| |
| is the angle between |
| \begin_inset Formula $\mathbf{x}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{y}$ |
| \end_inset |
| |
| . |
| Solving for |
| \begin_inset Formula $\alpha_{x}$ |
| \end_inset |
| |
| , we get |
| \begin_inset Formula |
| \begin{align*} |
| \tan\delta_{x}\left(1+\alpha_{x}\cos\phi\right) & =\alpha_{x}\sin\phi\\ |
| \tan\delta_{x} & =\alpha_{x}\sin\phi-\alpha_{x}\cos\phi\tan\delta_{x}\\ |
| \alpha_{x} & =\frac{\tan\delta_{x}}{\sin\phi-\cos\phi\tan\delta_{x}}\,. |
| \end{align*} |
| |
| \end_inset |
| |
| |
| \end_layout |
| |
| \begin_layout Standard |
| Since we are not concerned with scaling, we can avoid the division by simply |
| defining a denormalized |
| \begin_inset Formula |
| \[ |
| \tilde{\mathbf{x}}_{d}=g_{xx}\mathbf{x}+g_{xy}\mathbf{y}\,, |
| \] |
| |
| \end_inset |
| |
| with |
| \begin_inset Formula |
| \begin{align*} |
| g_{xx} & =\sin\phi-\cos\phi\tan\delta_{x}\\ |
| g_{xy} & =\tan\delta_{x}\,. |
| \end{align*} |
| |
| \end_inset |
| |
| |
| \end_layout |
| |
| \begin_layout Standard |
| Using the law of cosines, the squared magnitude of |
| \begin_inset Formula $\tilde{\mathbf{x}}_{d}$ |
| \end_inset |
| |
| is given by |
| \begin_inset Formula |
| \begin{align*} |
| \left\Vert \tilde{\mathbf{x}}_{d}\right\Vert ^{2} & =\tan^{2}\delta_{x}+\left(\sin\phi-\cos\phi\tan\delta_{x}\right)^{2}+2\cos\phi\tan\delta_{x}\left(\sin\phi-\cos\phi\tan\delta_{x}\right)\\ |
| & =\tan^{2}\delta_{x}+\sin^{2}\phi+\cos^{2}\phi\tan^{2}\delta_{x}-2\sin\phi\cos\phi\tan\delta_{x}+2\cos\phi\tan\delta_{x}\sin\phi-2\cos^{2}\phi\tan^{2}\delta_{x}\\ |
| & =\tan^{2}\delta_{x}+\sin^{2}\phi-\cos^{2}\phi\tan^{2}\delta_{x}\\ |
| & =\left(1-\cos^{2}\phi\right)\tan^{2}\delta_{x}+\sin^{2}\phi\\ |
| & =\sin^{2}\phi\left(1+\tan^{2}\delta_{x}\right)\\ |
| & =\frac{\sin^{2}\phi}{\cos^{2}\delta_{x}}\,. |
| \end{align*} |
| |
| \end_inset |
| |
| Knowing this, we can compute a normalized |
| \begin_inset Formula $\tilde{\mathbf{x}}$ |
| \end_inset |
| |
| as |
| \begin_inset Formula |
| \[ |
| \tilde{\mathbf{x}}=\frac{\cos\delta_{x}}{\sin\phi}\tilde{\mathbf{x}}_{d}\,. |
| \] |
| |
| \end_inset |
| |
| |
| \end_layout |
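The construction of the modified vector can be sketched as follows, assuming unit-norm inputs that are not collinear (sin phi nonzero) and an angle delta_x of magnitude below pi/2. The function name `modified_vector` and the example vectors are hypothetical.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def modified_vector(x, y, delta_x):
    """Unit vector in the plane of x and y at angle delta_x from x.

    Assumes x and y are unit-norm and not collinear.
    """
    cos_phi = max(-1.0, min(1.0, dot(x, y)))
    sin_phi = math.sqrt(1.0 - cos_phi * cos_phi)
    t = math.tan(delta_x)
    g_xx = sin_phi - cos_phi * t          # gain applied to x
    g_xy = t                              # gain applied to y
    scale = math.cos(delta_x) / sin_phi   # normalization factor
    return [scale * (g_xx * a + g_xy * b) for a, b in zip(x, y)]

# Example: x and y separated by 0.9 rad; move x by 0.1 rad
x = [1.0, 0.0, 0.0]
y = [math.cos(0.9), math.sin(0.9), 0.0]
x_tilde = modified_vector(x, y, 0.1)
print(x_tilde, math.sqrt(dot(x_tilde, x_tilde)))
```

The result has unit norm and its inner product with x equals cos(delta_x), which matches the normalization derived above.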
| |
| \begin_layout Standard |
| We can then compute |
| \begin_inset Formula $\tilde{\mathbf{y}}$ |
| \end_inset |
| |
| similarly. |
| Replacing |
| \begin_inset Formula $\mathbf{x}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\mathbf{y}$ |
| \end_inset |
| |
| with |
| \begin_inset Formula $\tilde{\mathbf{x}}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\tilde{\mathbf{y}}$ |
| \end_inset |
| |
| in the quantization process, we can give more weight to one channel or |
| the other. |
| When trying multiple values of |
| \begin_inset Formula $\hat{\theta}$ |
| \end_inset |
| |
| , we will derive a different value of |
| \begin_inset Formula $\tilde{\mathbf{x}}$ |
| \end_inset |
| |
| and |
| \begin_inset Formula $\tilde{\mathbf{y}}$ |
| \end_inset |
| |
| for each |
| \begin_inset Formula $\hat{\theta}$ |
| \end_inset |
| |
| . |
| |
| \end_layout |
| |
| \begin_layout Section |
| Stereo bit allocation |
| \end_layout |
| |
| \begin_layout Standard |
| By dumping quantization data from the encoder and looking at the normalized |
| distortion as a function of the angle |
| \begin_inset Formula $\phi$ |
| \end_inset |
| |
| and the rate, we have come up with the following approximation that best |
| fits the data with a simple enough function: |
| \end_layout |
| |
| \begin_layout Standard |
| \begin_inset Formula |
| \[ |
| D=3\left(4^{-r}\sin\phi+4^{-2r}\left(1-\sin\phi\right)\right)\,, |
| \] |
| |
| \end_inset |
| |
| where |
| \begin_inset Formula $r$ |
| \end_inset |
| |
| is the bit depth |
| \begin_inset Formula |
| \[ |
| r=\frac{b}{2N-1}\,. |
| \] |
| |
| \end_inset |
| |
| |
| \end_layout |
| |
| \begin_layout Standard |
| If instead we want a fixed distortion and find the corresponding bit depth, |
| we get |
| \begin_inset Formula |
| \[ |
| R=\frac{-3\sin\phi+\sqrt{9\sin^{2}\phi+12D\left(1-\sin\phi\right)}}{6\left(1-\sin\phi\right)}\,, |
| \] |
| |
| \end_inset |
| |
| with |
| \begin_inset Formula $r=-\log_{4}R$ |
| \end_inset |
| |
| . |
| \end_layout |
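The distortion model and its inverse can be checked against each other with a short round trip. This is a sketch under stated assumptions: the function names are hypothetical, and the special case for sin phi = 1 (where the quadratic degenerates to a linear equation) is a guard added here that the text does not discuss.

```python
import math

def distortion(phi, r):
    """Approximate normalized distortion at angle phi and bit depth r."""
    s = math.sin(phi)
    return 3.0 * (4.0 ** (-r) * s + 4.0 ** (-2.0 * r) * (1.0 - s))

def bit_depth(phi, D):
    """Bit depth r that yields distortion D at angle phi (inverse of the model)."""
    s = math.sin(phi)
    if 1.0 - s < 1e-12:
        # Assumption: with sin(phi) = 1 the model reduces to D = 3*R
        R = D / (3.0 * s)
    else:
        R = ((-3.0 * s + math.sqrt(9.0 * s * s + 12.0 * D * (1.0 - s)))
             / (6.0 * (1.0 - s)))
    return -math.log(R, 4.0)

phi, r = 0.7, 1.5          # example values
D = distortion(phi, r)
r_back = bit_depth(phi, D)
print(D, r_back)
```

Recovering the original bit depth from the computed distortion confirms that the positive root of the quadratic is the relevant one.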
| |
| \begin_layout Standard |
| Let |
| \begin_inset Formula $D=3R_{0}$ |
| \end_inset |
| |
| be the distortion we obtain for |
| \begin_inset Formula $\phi=\pi/2$ |
| \end_inset |
| |
| , |
| \begin_inset Formula |
| \begin{align*} |
| R & =\frac{-3\sin\phi+\sqrt{9\sin^{2}\phi+12\cdot3R_{0}\left(1-\sin\phi\right)}}{6\left(1-\sin\phi\right)}\\ |
| & =\sin\phi\cdot\frac{-1+\sqrt{1+\frac{4R_{0}\left(1-\sin\phi\right)}{\sin^{2}\phi}}}{2-2\sin\phi} |
| \end{align*} |
| |
| \end_inset |
| |
| |
| \end_layout |
| |
| \begin_layout Standard |
| At high rate, we have: |
| \end_layout |
| |
| \begin_layout Standard |
| \begin_inset Formula |
| \begin{align*} |
| R & =\sin\phi\frac{\frac{2R_{0}\left(1-\sin\phi\right)}{\sin^{2}\phi}}{2-2\sin\phi}\\ |
| & =\frac{R_{0}}{\sin\phi}\\ |
| r & =-\log_{4}\frac{R_{0}}{\sin\phi}\\ |
| & =r_{0}+\log_{4}\sin\phi\\ |
| & =r_{0}+\frac{1}{2}\log_{2}\sin\phi |
| \end{align*} |
| |
| \end_inset |
| |
| At low rate we instead have |
| \begin_inset Formula |
| \begin{align*} |
| R & =\frac{\sqrt{4R_{0}\left(1-\sin\phi\right)}}{2-2\sin\phi}\\ |
| & =\sqrt{\frac{R_{0}}{\left(1-\sin\phi\right)}}\\ |
| & \approx\sqrt{R_{0}}\\ |
| r & =-\log_{4}\sqrt{R_{0}}\\ |
| & =r_{0}/2 |
| \end{align*} |
| |
| \end_inset |
| |
| |
| \end_layout |
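To get a feel for how close the two asymptotic forms are to the exact expression, the sketch below evaluates all three. It assumes sin phi is strictly below 1 so that the exact formula is well defined. The function names and test points are hypothetical, chosen so that one point lies in the high-rate regime and the other in the low-rate, small-angle regime.

```python
import math

def exact_bit_depth(phi, r0):
    """Exact r from the quadratic, with R0 = 4**(-r0). Requires sin(phi) < 1."""
    s = math.sin(phi)
    R0 = 4.0 ** (-r0)
    R = (-s + math.sqrt(s * s + 4.0 * R0 * (1.0 - s))) / (2.0 - 2.0 * s)
    return -math.log(R, 4.0)

def high_rate_bit_depth(phi, r0):
    """High-rate approximation: r0 + 0.5*log2(sin(phi))."""
    return r0 + 0.5 * math.log2(math.sin(phi))

def low_rate_bit_depth(r0):
    """Low-rate approximation: r0 / 2."""
    return r0 / 2.0

# High-rate regime: large r0, moderate angle
hi_exact = exact_bit_depth(1.0, 6.0)
hi_approx = high_rate_bit_depth(1.0, 6.0)

# Low-rate regime: small r0, small angle
lo_exact = exact_bit_depth(0.01, 1.0)
lo_approx = low_rate_bit_depth(1.0)

print(hi_exact, hi_approx, lo_exact, lo_approx)
```

In both regimes the approximation stays within a small fraction of a bit of the exact value for these example points. Intermediate rates and angles are not covered by either form.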
| |
| \end_body |
| \end_document |