Audio Mixer tests

These tests validate the core of Fuchsia's system audio mixing (our Mixer, OutputProducer and Gain objects) at a unit level, using tests in these areas:

1. DataFormats
2. Pass-Thru
3. Gain/Mute
4. Volume Ramping
5. Timing
6. Numerical Analysis
7. Noise Floor
8. Frequency Response
9. Signal-to-Noise-and-Distortion (SINAD)
10. Dynamic Range

Items 1 & 2 have been grouped into a file of bitwise tests. Items 3 & 4 are located in a file of gain tests and include overflow and underflow validation. Item 5 relates to interpolation precision and is included in the resampling tests. Item 6 is a set of functions used to analyze our results (such as the Fast Fourier Transform); these have been separated into their own analysis source file and are tested in their own right. Items 7, 8 & 9 (the response tests) use these functions to perform audio fidelity testing in the frequency domain, ensuring that our processing neither colors the input (frequency response) nor adds artifacts (signal-to-noise-and-distortion, or SINAD). Item 10 (range tests) measures the accuracy of our gain control, as well as its impact on our noise floor.

Future areas for mixer evaluation may include:

Out-of-Band Rejection
Impulse Response
Phase Response

Frequency Response/SINAD and Dynamic Range tests (as well as Noise Floor tests, which were previously considered transparency tests) have been added as normal unit tests, because they are tightly related to the mixer and gain objects respectively. Fuller versions of the frequency response, SINAD and dynamic range tests are included in the audio mixer “full profile” tests, which can be executed from the command line by adding the --full flag; they are not part of the CQ test set.

FrequencySet

The frequency-based tests (noise floor, frequency response, SINAD and dynamic range) use a series of individual sinusoids as inputs to the audio subsystem. Sinusoids are universally used in this type of testing because they are easily and repeatably generated, and they produce predictable responses.

Note that although we use waves of various frequencies and amplitudes, we always send only a single wave at a time. Future tests such as Intermodulation (SMPTE IM) or Difference Frequency Distortion (DFD) may use multiple frequencies to target the effects that signals may have on each other.

The summary versions of these tests use either a single frequency (1000 Hz) or a short list (40, 1000 and 12000 Hz). The full versions use a list of 40 frequencies across the audible spectrum, using the standard set of 3 frequencies per octave (20, 25, 31, 40, 50, 63, 80, 100, 125, 160, 200, ...) plus a few extra frequencies between 20 and 24 kHz. Eight additional out-of-band frequencies are included, spanning from 25 kHz to 96 kHz, always taking frequency aliasing into account for those above the sample rate. These out-of-band frequencies are used only in Out-of-Band Rejection tests (formerly measured as part of the SINAD tests).
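To see where an out-of-band test frequency lands once it is sampled, the standard folding formula can be applied. The following is a minimal sketch, assuming ideal sampling with no filtering by the resampler; the function name `AliasedFrequency` is hypothetical and not part of the test code.

```cpp
#include <cassert>
#include <cmath>

// Returns the frequency (in Hz) at which a sinusoid of `freq_hz` appears after
// ideal sampling at `sample_rate_hz`. Content above the Nyquist frequency
// (sample_rate_hz / 2) folds back into the range [0, sample_rate_hz / 2].
double AliasedFrequency(double freq_hz, double sample_rate_hz) {
  assert(sample_rate_hz > 0.0);
  // Distance from the nearest integer multiple of the sample rate.
  double nearest_multiple = sample_rate_hz * std::round(freq_hz / sample_rate_hz);
  return std::fabs(freq_hz - nearest_multiple);
}
```

At a 48 kHz sample rate, a 25 kHz tone folds to 23 kHz, and a 96 kHz tone (exactly twice the sample rate) folds all the way down to 0 Hz.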

Although sinusoids are continuous, we use them to generate discrete signals: a series of snapshots sampled at specific instants in time. To characterize a waveform most effectively, it is best to sample it at many different points throughout its complete cycle (as opposed to just a few locations on the waveform). This leads us to prefer test frequencies that are not closely related to the sample rate. For this reason, keeping our 48 kHz sample rate in mind, we choose 39 Hz instead of 40 Hz, 997 Hz instead of 1000 Hz, and so on.
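The benefit of such frequencies can be quantified. For an integer frequency f sampled at rate fs, the sampled phase repeats after fs / gcd(f, fs) samples, so that is the number of distinct points on the cycle that ever get sampled. A minimal sketch (the function name is hypothetical; C++17 is assumed for `std::gcd`):

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// Number of distinct positions within one cycle at which a sinusoid of
// integer frequency `freq_hz` is sampled at `sample_rate_hz`.
// The per-sample phase step is freq/rate; reduced to lowest terms, its
// denominator rate / gcd(freq, rate) is the number of distinct phases visited.
int64_t DistinctSamplePhases(int64_t freq_hz, int64_t sample_rate_hz) {
  assert(freq_hz > 0 && sample_rate_hz > 0);
  return sample_rate_hz / std::gcd(freq_hz, sample_rate_hz);
}
```

At 48 kHz, a 1000 Hz tone is only ever sampled at 48 points on its cycle, while 997 Hz (a prime) is sampled at 48000 distinct points. Similarly, 40 Hz yields 1200 distinct points and 39 Hz yields 16000.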

These reference frequencies are stored in the array kReferenceFreqs. Because the summary frequencies are always a subset of the reference frequencies, we store the summary frequencies as an array of indices into kReferenceFreqs.

A bool UseFullFrequencySet specifies whether the full frequency range should be used. It is set in main.cc during test app startup and is referenced by the frequency tests as well as by the recap section. This flag, the frequency arrays described above, and constants for their lengths are found in the static class FrequencySet.
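The index-based arrangement can be sketched as follows. This is a minimal illustration, not the real class: only `FrequencySet`, `kReferenceFreqs` and `UseFullFrequencySet` come from the text above; `kSummaryIdxs` and `FrequenciesToTest()` are hypothetical names, and the frequency values are placeholders rather than the actual table.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the FrequencySet idea (C++17). The values below are placeholders,
// NOT the actual contents of kReferenceFreqs.
class FrequencySet {
 public:
  static constexpr std::array<double, 5> kReferenceFreqs = {39.0, 251.0, 997.0,
                                                            3981.0, 11987.0};
  // Summary tests reuse a subset, stored as indices into kReferenceFreqs.
  static constexpr std::array<size_t, 3> kSummaryIdxs = {0, 2, 4};
  // Set once at startup (e.g. when --full is passed) before any test runs.
  static bool UseFullFrequencySet;

  // Returns the frequencies that a frequency-based test should iterate over.
  static std::vector<double> FrequenciesToTest() {
    std::vector<double> freqs;
    if (UseFullFrequencySet) {
      freqs.assign(kReferenceFreqs.begin(), kReferenceFreqs.end());
    } else {
      for (size_t idx : kSummaryIdxs) {
        assert(idx < kReferenceFreqs.size());
        freqs.push_back(kReferenceFreqs[idx]);
      }
    }
    return freqs;
  }
};

bool FrequencySet::UseFullFrequencySet = false;
```

Storing indices rather than a second list of values guarantees that every summary frequency is also a reference frequency, and lets summary results be written into the same per-index result arrays as full results.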

AudioResult

For each of the frequency tests, the results are saved in various members of the static class AudioResult. For multi-frequency tests, these are stored in arrays of length kNumInBandReferenceFreqs (if only audible frequencies are measured) or kNumReferenceFreqs (if out-of-band frequencies are included, e.g. when measuring aliasing). Results are stored as double-precision floating-point values and are compared against previous results, which are also stored in constexpr arrays within AudioResult. In the absence of code changes, the measurements should be identical on every run, so they are compared strictly against the previous results, and any regression causes a failure. We expect that any code change that regresses these metrics would come from the media team; if such a change is important enough to justify a regression in our audio quality, the CL should include a corresponding update to the AudioResult thresholds.

The terminology used in the audio tests is specific and intentional. A level is the magnitude (in decibels) of the response when a test signal is provided. The term noise often refers to the magnitude of all frequencies other than the intended one (in dB RMS, i.e. combined via root-sum-square). Some people exclude from noise the frequencies that are harmonics (multiples) of the signal frequency, calling these distortion instead. SINAD is the precise term for what we measure: the ratio of signal to noise and distortion.
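As an illustration of these definitions, SINAD in dB can be computed from per-frequency-bin magnitudes (e.g. the output of an FFT). This is a minimal sketch that assumes clean bins with no spectral leakage and treats every non-signal bin as noise or distortion; `SinadDb` is a hypothetical helper, not the analysis code's actual function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns SINAD in dB: the magnitude of the signal bin relative to the
// root-sum-square of every other bin (noise plus harmonic distortion).
// If all other bins are zero, the result is +infinity.
double SinadDb(const std::vector<double>& bin_magnitudes, size_t signal_bin) {
  assert(signal_bin < bin_magnitudes.size());
  double other_sum_sq = 0.0;
  for (size_t i = 0; i < bin_magnitudes.size(); ++i) {
    if (i != signal_bin) other_sum_sq += bin_magnitudes[i] * bin_magnitudes[i];
  }
  double noise_and_distortion = std::sqrt(other_sum_sq);
  return 20.0 * std::log10(bin_magnitudes[signal_bin] / noise_and_distortion);
}
```

For example, a signal bin of magnitude 1.0 alongside other bins of 0.006 and 0.008 gives a combined noise-and-distortion magnitude of 0.01, and therefore a SINAD of 40 dB.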

The limits stored in AudioResult are all either minimum values or tolerances. The minimum values include frequency response and SINAD; all test code referencing these values should use EXPECT_GE. Tolerances (always explicitly named as such) are always applied symmetrically, on both sides of the expected level.
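The two kinds of limit translate into two kinds of check. With googletest these correspond to `EXPECT_GE(measured, limit)` and `EXPECT_NEAR(measured, expected, tolerance)`. The plain-function sketch below (hypothetical names) spells out the comparisons.

```cpp
#include <cassert>
#include <cmath>

// Minimum values (e.g. frequency response, SINAD): a measurement passes if it
// is at least the stored limit, mirroring EXPECT_GE(measured, limit).
bool MeetsMinimum(double measured, double limit) { return measured >= limit; }

// Tolerances: a measurement passes if it lies within `tolerance` on either
// side of the expected level, mirroring EXPECT_NEAR(measured, expected, tol).
bool WithinTolerance(double measured, double expected, double tolerance) {
  assert(tolerance >= 0.0);
  return std::fabs(measured - expected) <= tolerance;
}
```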

Updating AudioResult thresholds

Frequency response or SINAD failures include the measured value in the log at the point where the failure is surfaced. If the intention is to update AudioResult in a way that essentially accepts the new result as the expected value, that value (at eight total digits of precision) can be used. For more significant updates to AudioResult values, use the --dump flag. This option automatically includes all frequencies (i.e. it implies --full); after the run, all measured values are displayed in a format that is easily copied into audio_result.cc. Note that these values are displayed with 9 digits of precision, so take care when copying them into audio_result.cc. The rule of thumb is to keep only eight total digits of precision, and to err on the side of “more loose” when dropping the extra digit. For tolerance thresholds, this means rounding up, so that the tolerance becomes slightly wider. For minimum values (frequency response, as well as noise floor, SINAD, out-of-band rejection and dynamic range), it means rounding toward negative infinity, so that the minimum becomes slightly lower: a frequency response measurement of -1.57207701 should be saved as -1.5720771, a SINAD of 19.3948736 as the slightly-less-strict 19.394873, and a SINAD of -19.3948736 as -19.394874, also slightly less tight.
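The rounding rule can be expressed as a small helper. This is a hypothetical sketch (not part of the test binary) that assumes the input carries exactly one more significant digit than is kept, as with the --dump output.

```cpp
#include <cassert>
#include <cmath>

// Reduces `value` to `sig_digits` significant digits, rounding in whichever
// direction loosens the threshold: toward +infinity for tolerances (which must
// not shrink) and toward -infinity for minimum values (which must not grow).
double LoosenToSignificantDigits(double value, int sig_digits, bool is_tolerance) {
  assert(sig_digits > 0);
  if (value == 0.0) return 0.0;
  // Power of ten of the leading digit: 0 for -1.57..., 1 for 19.39...
  int exponent = static_cast<int>(std::floor(std::log10(std::fabs(value))));
  double scale = std::pow(10.0, sig_digits - 1 - exponent);
  // The input has one extra digit; snap to it first so that binary
  // representation error (e.g. 19394872.9999999) can't flip the result.
  double scaled = std::round(value * scale * 10.0) / 10.0;
  double rounded = is_tolerance ? std::ceil(scaled) : std::floor(scaled);
  return rounded / scale;
}
```

Applied to the examples above with eight digits and `is_tolerance = false`, this turns -1.57207701 into -1.5720771, 19.3948736 into 19.394873 and -19.3948736 into -19.394874. With `is_tolerance = true`, 1.23456789 becomes 1.2345679.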

Performance Profiling

The audio_mixer_tests binary can also profile the performance of the Mixer, Gain and OutputProducer classes. Use the --profile flag to run these micro-benchmarks, which use zx::clock::get_monotonic() to measure the time a target takes to execute Mix() or ProduceOutput() calls (for Mixer/Gain or OutputProducer objects, respectively) that generate 64k frames. The aggregated results displayed for each permutation of parameters represent the time consumed per call; to obtain a reasonably reliable mean, we run each micro-benchmark many tens or even hundreds of times. As is usual with performance profiling, avoid directly comparing results from different machines. This profiling functionality is best used to get a general sense of “before versus after” for a specific change that affects the mixer pipeline.
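The timing approach can be sketched as a simple loop that averages per-call durations. This is a minimal illustration, not the profiler's code: `std::chrono::steady_clock` stands in for `zx::clock::get_monotonic()` so that the sketch builds off-Fuchsia, and `MeanMicrosPerCall` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `target` `iterations` times and returns the mean duration of one call,
// in microseconds. steady_clock (monotonic) stands in for
// zx::clock::get_monotonic() here.
double MeanMicrosPerCall(const std::function<void()>& target, int iterations) {
  assert(iterations > 0);
  using Clock = std::chrono::steady_clock;
  std::chrono::duration<double, std::micro> total{0};
  for (int i = 0; i < iterations; ++i) {
    auto start = Clock::now();
    target();  // e.g. one Mix() or ProduceOutput() call that produces 64k frames
    total += Clock::now() - start;
  }
  return total.count() / iterations;
}
```

Timing each call separately, rather than the whole loop at once, would also make it straightforward to track other statistics such as a minimum alongside the mean.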

Issues

Each Jira issue below represents a system behavior encountered while creating these tests. Presumably, if and when each product issue is addressed, the related tests will need some rework as well; all of these tests have been annotated with the related Jira item. That said, these tests focus tightly on current system behavior: as a rule, they demonstrate how the current system behaves as implemented.

Below, the existing mixer-related bugs are classified in groups related to their stage in the flow of audio through the mixer:

Normalize (Ingest)

Rechannel

Interpolate

MTWN-87
Today, interpolation and media scheduling are performed using audio sample positions represented by a 32-bit integer in fixed-point form: 19 bits of integer and 13 bits of fraction. By design, this limits the accuracy of our interpolating sample-rate converters (SRCs). Increasing the number of fractional bits would improve our SRC quality.
MTWN-75
Enabling NxN channel passthru in our resamplers introduced significant code duplication. This could be refactored for greater code reuse, making the code more resilient and easier to extend in the future.
MTWN-45
In addition to the existing resamplers (SampleAndHold, LinearInterpolation), we should create new ones with higher fidelity. This would give clients more control over the quality-versus-performance tradeoff.
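To illustrate the position format in MTWN-87 and how the two existing resampler types use it, here is a minimal sketch. It assumes non-negative positions and mono float samples. All names are hypothetical, and the sample-and-hold shown (taking the sample at the integer part of the position) is one common formulation, not necessarily the one audio_core uses. With 13 fractional bits, the finest position step is 1/8192 of a frame, about 2.54 ns at 48 kHz.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Source position as 32-bit fixed point: 19 integer bits, 13 fractional bits.
constexpr int kFracBits = 13;
constexpr int32_t kFracOne = 1 << kFracBits;  // 1.0 frame == 8192
constexpr int32_t kFracMask = kFracOne - 1;   // selects the 13 fraction bits

// Quantizes a frame position; detail finer than 1/8192 frame is lost.
int32_t FramesToFixed(double frames) {
  assert(frames >= 0.0 && frames * kFracOne < 2147483647.0);
  return static_cast<int32_t>(std::lround(frames * kFracOne));
}

// Sample-and-hold: use the source sample at the integer part of the position.
float SampleAndHold(const std::vector<float>& src, int32_t pos) {
  size_t idx = static_cast<size_t>(pos >> kFracBits);
  assert(pos >= 0 && idx < src.size());
  return src[idx];
}

// Linear interpolation: blend the two neighboring samples by the fraction.
float LinearInterpolate(const std::vector<float>& src, int32_t pos) {
  size_t idx = static_cast<size_t>(pos >> kFracBits);
  assert(pos >= 0 && idx + 1 < src.size());
  float frac = static_cast<float>(pos & kFracMask) / kFracOne;
  return src[idx] + frac * (src[idx + 1] - src[idx]);
}
```

For a source of {0, 1, 2, 3} and a position of 1.5 frames (12288 in fixed point), sample-and-hold returns 1.0 while linear interpolation returns 1.5. Adding fractional bits (MTWN-87) would shrink the quantization step in `FramesToFixed`, and higher-order resamplers (MTWN-45) would blend more than two neighbors.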

Gain

MTWN-70
The Gain object contains two functions through which clients provide two (float) values and receive a (fixed-point) representation of their product. The documented behavior of this object in multi-threaded scenarios should be clarified (this might be as simple as changing a “should” to a “must”). Depending on whether single-threaded use is a requirement, we may need additional product code and tests.
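If concurrent use were permitted rather than forbidden, one option would be to store the client-set value atomically, so that a setter on one thread and a reader on the mixing thread do not race. The sketch below is hypothetical throughout: the class and method names, the use of linear scale factors (rather than dB) as inputs, and the fixed-point format with 12 fractional bits are all assumptions, not audio_core's Gain API.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical fixed-point format for the combined scale: 1.0 == 1 << 12.
constexpr int kScaleFracBits = 12;

class GainSketch {
 public:
  // May be called from a client-facing thread.
  void SetSourceScale(float scale) {
    source_scale_.store(scale, std::memory_order_relaxed);
  }

  // May be called concurrently from the mixing thread. Reading through
  // std::atomic avoids the data race (undefined behavior) that a plain float
  // member would have with SetSourceScale.
  int32_t GetFixedPointScale(float output_scale) const {
    float product = source_scale_.load(std::memory_order_relaxed) * output_scale;
    assert(product >= 0.0f);
    return static_cast<int32_t>(std::lround(product * (1 << kScaleFracBits)));
  }

 private:
  std::atomic<float> source_scale_{1.0f};
};
```

If the documentation instead requires single-threaded use (the “must” option), a plain member suffices and the atomic is unnecessary.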

Accumulate

Denormalize (Output)

Numerous Stages

Interface to AudioRenderers (or other parts of audio_core)

MTWN-88
The AudioRenderer API schedules each incoming audio packet on an integer sample boundary, even when a fractional sample position would better represent the specified timestamp. This bug covers investigating (and potentially enabling) the scheduling of packets on fractional sample positions.
MTWN-93
The AudioSampleFormat enum includes ALL and NONE entries, which are intended for future scenarios. Until then, removing these entries might make things clearer for client developers.
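The effect described in MTWN-88 can be illustrated by converting a presentation timestamp into a frame position in two ways. The sketch below is hypothetical: the function names are invented, timestamps are in nanoseconds relative to stream start, and 64-bit integers are used for simplicity rather than the 32-bit positions described in MTWN-87.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kFracBits = 13;  // matches the 13 fractional bits of MTWN-87

// Integer-boundary scheduling: the timestamp is rounded to a whole frame.
int64_t PtsToWholeFrames(int64_t pts_ns, int32_t frame_rate) {
  assert(frame_rate > 0);
  return std::llround(static_cast<double>(pts_ns) * frame_rate / 1e9);
}

// Fractional scheduling: the position keeps kFracBits of sub-frame precision.
int64_t PtsToFixedFrames(int64_t pts_ns, int32_t frame_rate) {
  assert(frame_rate > 0);
  return std::llround(static_cast<double>(pts_ns) * frame_rate / 1e9 *
                      (1 << kFracBits));
}
```

A packet timestamped 10 µs into a 48 kHz stream falls 0.48 frames in. Whole-frame scheduling places it at frame 0 (10 µs early), while a fractional position keeps it at 3932/8192 of a frame.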