| # Audio Driver Streaming Interface |
| |
| This document describes the audio streaming interface exposed by audio drivers |
| in Zircon. It is meant to serve as a reference for both users and |
| driver-authors, and to unambiguously define the interface contract which drivers |
| must implement and users must follow. |
| |
| ## Overview |
| |
Audio streams are device nodes published by driver services, intended to be used
by applications in order to capture and/or render audio on a Zircon device.
Each stream in the system (input or output) represents a stream of digital audio
information which may be either received or transmitted by the device. Streams are
dynamic and may be created or destroyed by the system at any time. Which streams
exist at any given point in time, and what controls their lifecycles, are
considered to be issues of audio policy and codec management and are not
discussed in this document. Additionally, the information present in audio
output streams is exclusive to the application owner of the stream. Mixing of
audio is not a service provided by the audio stream interface.
| |
| > TODO(fxbug.dev/35523): |
| > The pre-FIDL serialization still in use as of 2020/02/04 |
| > is defined in [audio_streaming_original.md](audio_streaming_original.md), |
> that file is a copy of this file at revision 34d6afd26d0, kept for reference.
| > Remove this paragraph once conversion of all users to FIDL is done. |
| |
| > TODO: extend this interface to support the concept of low-latency hardware |
| > mixers. |
| |
| ### Definitions |
| |
| Term | Definition |
| ---- | ---------- |
| Sample | A representation of the sound rendered by a single speaker, or captured by a single microphone, at a single instant in time. |
| LPCM | Linear pulse code modulation. The specific representation of audio samples present in all Zircon uncompressed audio streams. LPCM audio samples are representations of the amplitude of the audio signal at an instant in time where the numeric values of the encoded audio are linearly distributed across the amplitude levels of the rendering or capture device. This is in contrast to A-law and μ-law encodings, which have non-linear mappings from numeric value to amplitude level. |
| Channel | Within an audio stream, the subset of information which will be rendered by a single speaker, or which was captured by a single microphone in a stream. |
| Frame | A set of audio samples for every channel of an audio stream captured/rendered at a single instant in time. |
| Frame Rate | a.k.a. "Sample Rate". The rate (in Hz) at which audio frames are produced or consumed. Common frame rates include 44.1 kHz, 48 kHz, 96 kHz, and so on. |
| Client or User or Application | These terms are used interchangeably in this document. They refer to modules that use these interfaces to communicate with an audio driver/device. |
| |
| > TODO: do we need to extend this interface to support non-linear audio sample |
| > encodings? This may be important for telephony oriented microphones which |
| > deliver μ-law encoded samples. |
| |
| ### Basic Operation |
| |
| Communication with an audio stream device is performed using messages sent over |
| a [channel](/docs/reference/kernel_objects/channel.md). Applications open the device node for a |
| stream and obtain a channel by issuing a FIDL request. After obtaining the |
| channel, the device node may be closed. All subsequent communication with the |
| stream occurs using channels. |
| |
| The stream channel is used for most command and control tasks, including: |
| |
| * Capability interrogation |
| * Format negotiation |
| * Hardware gain control |
| * Determining outboard latency |
| * Plug detection notification |
| * Access control capability detection and signalling |
| |
> TODO: Should plug/unplug detection be done by sending notifications over the
> stream channel (as it is today), or by publishing/unpublishing the device
> nodes (and closing all channels in the case of unpublished devices)?
| |
| In order to actually send or receive audio information on the stream, the |
| specific format to be used must first be set. The response to a successful |
| `CreateRingBuffer` operation will contain a new "ring-buffer" channel. The ring-buffer |
| channel may be used to request a shared buffer from the stream (delivered in the |
| form of a [VMO](/docs/reference/kernel_objects/vm_object.md)) which may be mapped into the address |
| space of the application and used to send or receive audio data as appropriate. |
| Generally, the operations conducted over the ring buffer channel include: |
| |
| * Requesting a shared buffer |
| * Starting and Stopping stream playback and capture |
| * Receiving notifications of playback and capture progress |
| * Receiving clock recovery information in the case that the audio output clock |
| is based on a different oscillator than the oscillator which backs |
| the [monotonic clock](/docs/reference/syscalls/clock_get_monotonic.md) |
| |
| ## Operational Details |
| |
| ### Device nodes |
| |
| Audio stream device nodes must be published by drivers using the protocol |
| preprocessor symbol given in the table below. This will cause stream device |
| nodes to be published in the locations given in the table. Applications can |
| monitor these directories in order to discover new streams as they are published |
| by the drivers. |
| |
| Stream Type | Protocol | Location |
| ----------- | -------------------------- | ----------------------- |
| Input | `ZX_PROTOCOL_AUDIO_INPUT` | /dev/class/audio-input-2 |
| Output | `ZX_PROTOCOL_AUDIO_OUTPUT` | /dev/class/audio-output-2 |
| |
| ### Establishing the stream channel |
| |
| After opening the device node, client applications may obtain a stream channel |
| for subsequent communication using the |
| `fuchsia.hardware.audio.Device/GetChannel` FIDL message. |
| |
| ### Client side termination of the stream channel |
| |
| Clients may terminate the connection to the stream at any time simply by |
| calling [zx_handle_close(...)](/docs/reference/syscalls/handle_close.md) on the stream |
| channel. Drivers must close any active ring-buffer channels established |
| using this stream channel and must make every attempt to gracefully quiesce |
| any on-going streaming operations in the process. |
| |
| ### Sending and receiving messages on the stream and ring-buffer channels |
| |
| All of the messages and message payloads which may be sent or received over |
| stream and ring buffer channels are defined in |
| [stream.fidl](/sdk/fidl/fuchsia.hardware.audio/stream.fidl) |
| and [ring_buffer.fidl](/sdk/fidl/fuchsia.hardware.audio/ring_buffer.fidl). |
Messages may be sent to the driver using the
[zx_channel_write(...)](/docs/reference/syscalls/channel_write.md) syscall. If a response is
expected, it may be read using the
[zx_channel_read(...)](/docs/reference/syscalls/channel_read.md) syscall. Best practice,
however, is to bind your [channel(s)](/docs/reference/kernel_objects/channel.md) to a
[port](/docs/reference/kernel_objects/port.md) using the
[zx_object_wait_async(...)](/docs/reference/syscalls/object_wait_async.md) syscall, and use the
[zx_port_wait(...)](/docs/reference/syscalls/port_wait.md) syscall to determine when your set
of channels has messages (either expected responses or asynchronous
notifications) to be read.
There are bindings for different languages to facilitate sending and receiving
FIDL messages. In particular, for C++ drivers there is also a library,
[SimpleAudioStream](/src/media/audio/lib/simple-audio-stream), that facilitates the creation
of drivers in C++; this library uses the
[LLCPP](/docs/reference/fidl/bindings/llcpp-bindings.md) bindings to send and receive
FIDL messages.
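
The sketch below illustrates the recommended port-based wait loop using raw
Zircon syscalls. It is a minimal, hedged example rather than production code:
the function name `MessageLoop`, the handle names `stream_ch` and `rb_ch`, and
the key constants are all hypothetical, and FIDL decoding is elided.

```cpp
#include <zircon/syscalls.h>
#include <zircon/syscalls/port.h>
#include <zircon/types.h>

constexpr uint64_t kStreamKey = 1;      // arbitrary keys chosen for this sketch
constexpr uint64_t kRingBufferKey = 2;

void MessageLoop(zx_handle_t stream_ch, zx_handle_t rb_ch) {
  zx_handle_t port;
  if (zx_port_create(0, &port) != ZX_OK) return;

  // Register both channels so readable/closed signals arrive as port packets.
  zx_object_wait_async(stream_ch, port, kStreamKey,
                       ZX_CHANNEL_READABLE | ZX_CHANNEL_PEER_CLOSED, 0);
  zx_object_wait_async(rb_ch, port, kRingBufferKey,
                       ZX_CHANNEL_READABLE | ZX_CHANNEL_PEER_CLOSED, 0);

  for (;;) {
    zx_port_packet_t packet;
    if (zx_port_wait(port, ZX_TIME_INFINITE, &packet) != ZX_OK) break;

    zx_handle_t ch = (packet.key == kStreamKey) ? stream_ch : rb_ch;
    if (packet.signal.observed & ZX_CHANNEL_READABLE) {
      // Static buffers keep the 64 KiB worst-case message off the stack
      // (single-threaded sketch only).
      static uint8_t bytes[ZX_CHANNEL_MAX_MSG_BYTES];
      static zx_handle_t handles[ZX_CHANNEL_MAX_MSG_HANDLES];
      uint32_t num_bytes, num_handles;
      zx_channel_read(ch, 0, bytes, handles, sizeof(bytes),
                      ZX_CHANNEL_MAX_MSG_HANDLES, &num_bytes, &num_handles);
      // ... decode the FIDL message in `bytes` here ...
      // Re-arm the wait; zx_object_wait_async is one-shot.
      zx_object_wait_async(ch, port, packet.key,
                           ZX_CHANNEL_READABLE | ZX_CHANNEL_PEER_CLOSED, 0);
    }
    if (packet.signal.observed & ZX_CHANNEL_PEER_CLOSED) break;
  }
  zx_handle_close(port);
}
```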
| |
| ## Format Negotiation |
| |
| ### Sample Formats |
| |
The `Format` related protocol messages allow the driver to list its supported
formats to the client. The supported formats may include multiple frame rates, bits
per sample, etc. Each driver advertises what it can support and the client selects
the format to be used for each driver.
| |
| To find out what formats are supported by a given driver, the client uses the |
| `GetSupportedFormats` function. The driver replies with a vector of |
| `SupportedFormats`, where each `SupportedFormats` includes a `PcmSupportedFormats` with: |
| |
* A vector of channel counts. This lists the numbers of channels supported
  by the driver, for example `<2,4,6,8>`. A driver that supports either two or
  four channels would report a vector with two elements, `<2,4>`. Must be in ascending order.
* A vector of sample formats, e.g. `PCM_SIGNED`.
* A vector of frame rates, for example 44100, 48000, and 96000. Must be in ascending order.
* A vector of bits per channel/slot/container. The number of bits in each channel allocated
  to hold a sample, e.g. 32 bits per channel. Must be in ascending order.
* A vector of bits per sample. Sample widths; these may be smaller than the channel,
  e.g. 24 bits per sample in a 32-bit channel. Must be in ascending order.
| |
When not all combinations supported by the driver can be described with one
`PcmSupportedFormats`, the driver returns more than one `PcmSupportedFormats` in
the returned vector. For example, if one `PcmSupportedFormats` allows for 16- or 32-bit
samples at 48 kHz, and 16-bit samples at 96 kHz, but not 32-bit samples at 96 kHz, then the
driver replies with two `PcmSupportedFormats`: `<<16bits,32bits>,<48KHz>>` and
`<<16bits>,<96KHz>>`. For simplicity, this example ignores parameters other than
rate and bits per sample. In the case where the driver supports either 16- or 32-bit
samples at either 48 or 96 kHz, the driver would reply with one
`PcmSupportedFormats`: `<<16bits,32bits>,<48KHz,96KHz>>`.
| |
Additionally, it is assumed that the bits per sample value is always less than or
equal to the bits per channel value. Hence, a driver can report
`<<16bits_per_channel,32bits_per_channel>,<16bits_per_sample,32bits_per_sample>>`
without implying that 32 bits per sample in 16-bit channels is valid; the report
specifies only the 3 valid combinations:

* 16-bit channels with 16-bit samples
* 32-bit channels with 32-bit samples
* 32-bit channels with 16-bit samples
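
The following C++ sketch illustrates these cross-product semantics. The struct
is a hypothetical, simplified mirror of the `PcmSupportedFormats` table
described above (field names are approximate); real clients would consume the
FIDL-generated types instead.

```cpp
#include <cstdint>
#include <vector>

struct PcmSupportedFormats {
  std::vector<uint8_t> number_of_channels;  // ascending, e.g. {2, 4}
  std::vector<uint32_t> frame_rates;        // ascending, e.g. {48000, 96000}
  std::vector<uint8_t> bits_per_channel;    // ascending
  std::vector<uint8_t> bits_per_sample;     // ascending
};

// A (rate, bits_per_channel, bits_per_sample) combination is supported if a
// single PcmSupportedFormats entry lists all three values, and the sample
// width fits within the channel width.
bool IsSupported(const std::vector<PcmSupportedFormats>& formats,
                 uint32_t rate, uint8_t channel_bits, uint8_t sample_bits) {
  auto contains = [](const auto& v, auto x) {
    for (auto e : v) {
      if (e == x) return true;
    }
    return false;
  };
  for (const auto& f : formats) {
    if (contains(f.frame_rates, rate) &&
        contains(f.bits_per_channel, channel_bits) &&
        contains(f.bits_per_sample, sample_bits) &&
        sample_bits <= channel_bits) {
      return true;
    }
  }
  return false;
}

// The example from the text (16 or 32 bits at 48 kHz, but only 16 bits at
// 96 kHz) requires two entries, one with frame_rates {48000} and
// bits_per_sample {16, 32}, and one with frame_rates {96000} and
// bits_per_sample {16}.
```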
| |
The client specifies the format to use with the `CreateRingBuffer` function, based on
the information that the driver provides in its `GetSupportedFormats` reply, what is
supported by the client, and any other requirements. This function takes a parameter
that specifies:

* A number of channels. This is the number of channels available in the buffer.
* A bitmask of the channels to use. These are the channels in the buffer to be used by
  the driver. For instance, for stereo this must be a bitmask with 2 bits enabled
  (`0x3`), i.e. both channels 0 and 1 are used.
* A sample format.
* A frame rate.
* A number of bits per channel.
* A number of bits per sample.
| |
| Notes: |
| |
| * By default, multi-byte sample formats are assumed to use host-endianness. |
| * The `PCM_FLOAT` encoding uses specifically the |
| [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floating point |
| representation. |
* By default, non-floating point PCM encodings are assumed to be expressed using
  [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement) signed
  integers. E.g. the bit values for a 16-bit PCM sample format would range from
  [0x8000, 0x7FFF] with 0x0000 representing zero speaker deflection. If the
  `PCM_UNSIGNED` sample format is used, the bit values would range from [0x0000,
  0xFFFF] with 0x8000 representing zero deflection.
* When encoding a smaller sample size in a larger channel (e.g. 20 or 24 bits in
  32), the most significant bits of the 32-bit container are used while the
  least significant bits are ignored (left justified). E.g. a 20-bit sample would be
  mapped onto the range [12,31] (bits [0,11] would be ignored) of the 32-bit container.
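
The following minimal C++ sketch shows the left-justified mapping from the last
note: a signed 20-bit sample placed into (and recovered from) a 32-bit
container. The helper names are illustrative only.

```cpp
#include <cstdint>

// Place a signed 20-bit sample (range [-2^19, 2^19 - 1]) into bits [12,31] of
// a 32-bit container; bits [0,11] are left zero and will be ignored.
uint32_t Pack20In32(int32_t sample20) {
  return static_cast<uint32_t>(sample20) << 12;
}

// Recover the sample: an arithmetic right shift of the container sign-extends
// bit 31 back down, discarding the unused low bits.
int32_t Unpack20From32(uint32_t container) {
  return static_cast<int32_t>(container) >> 12;
}
```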
| |
| ### Setting the desired stream format |
| |
| In order to select a stream format, applications send a `CreateRingBuffer` message over the |
| stream channel. In the message, the application specifies the format to be used. |
| |
As part of this message, the client provides the new ring-buffer channel over which
streaming operations will be conducted. If a previous ring-buffer channel had been
established and was still active, the driver must close that channel and
make every attempt to gracefully quiesce any on-going streaming operations in
the process.
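
As a concrete illustration of the parameters listed above, the sketch below
shows a plausible format selection for a 48 kHz stereo stream carrying 24-bit
samples in 32-bit channels. The struct and field names are hypothetical, not
the actual FIDL type.

```cpp
#include <cstdint>

struct SelectedFormat {
  uint8_t number_of_channels = 2;
  uint64_t channels_to_use_bitmask = 0x3;  // channels 0 and 1 are both in use
  // sample format: PCM_SIGNED (elided; not modeled in this sketch)
  uint32_t frame_rate = 48000;
  uint8_t bits_per_channel = 32;
  uint8_t bits_per_sample = 24;  // left-justified within the 32-bit channel
};
```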
| |
| > TODO: specify how compressed bitstream formats will be set |
| |
| ## Determining external latency |
| |
| The external latency of an audio stream is defined as the amount of time it |
| takes outbound audio to travel from the system's interconnect to the speakers |
| themselves, or inbound audio to travel from the microphone to the system's |
| interconnect. As an example, consider an external codec connected to the system |
| using a TDM interconnect: if this interconnect introduces a 4 frame delay |
| between the reception of a TDM frame and the rendering of that frame at the |
| speakers themselves, then the external delay of this audio path is the time |
| duration equivalent to 4 audio frames. |
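
For that example, the delay works out as follows (a worked computation, not
part of the contract; the constant names are illustrative):

```cpp
#include <cstdint>

// A 4-frame interconnect delay at a 48 kHz frame rate.
constexpr int64_t kFramesOfDelay = 4;
constexpr int64_t kFrameRateHz = 48000;
constexpr int64_t kExternalDelayNsec =
    kFramesOfDelay * 1'000'000'000 / kFrameRateHz;  // 83,333 ns (~83.3 us)
```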
| |
External delay is reported in the `external_delay` field of a `RingBufferProperties`
response to a `GetProperties` request. Drivers should make their best attempt to
accurately report the total of all of the sources of delay they know about.
Information about this delay can frequently be found in codec data sheets, can be
dynamically reported as codec properties using protocols such as Intel HDA
or the USB Audio specifications, or can be reported by downstream devices using
mechanisms such as EDID when using HDMI or DisplayPort interconnects.
| |
| ## Hardware Gain Control |
| |
| ### Hardware gain control capability reporting |
| |
In order to determine a stream's gain control capabilities, if it has not done
so yet, an application sends a `GetProperties` message over the stream channel.
No parameters need to be supplied with this message. The driver replies with a
`StreamProperties` that includes gain capabilities among other fields. All stream
drivers must respond to this message, regardless of whether or not the stream
hardware is capable of any gain control. All gain values are expressed in dB using
32-bit floating point numbers.
| |
Drivers respond to this message with values which indicate the stream's
gain control capabilities: a bool indicating whether the stream can be muted, a
bool indicating whether the stream supports AGC, the minimum and maximum gain
settings, and a `gain_step_db`. The `gain_step_db` indicates the smallest increment
with which the gain can be controlled, counting from the minimum gain value.
| |
| For example, an amplifier which has 5 gain steps of 7.5 dB each and a maximum 0 |
| dB gain would indicate a range of (-30.0, 0.0) and a step size of 7.5. |
| Amplifiers capable of functionally continuous gain control may encode their |
| gain step size as 0.0. |
| |
| Regardless of mute capabilities, drivers for fixed gain streams must report |
| their min and max gain as (0.0, 0.0). `gain_step_db` is meaningless in this |
| situation, but drivers should report it as 0.0. |
| |
| ### Setting hardware gain control levels |
| |
In order to change a stream's current gain settings, applications send a
`SetGain` message over the stream channel. This message includes a `GainState`
parameter indicating the gain settings to be configured: the dB gain to be applied
to the stream, and the mute and AGC enablement states.
| |
| Presuming that the request is valid, drivers should round the request to the |
| nearest supported gain step size. For example, if a stream can control its gain |
| on the range from -60.0 to 0.0 dB, using a gain step size of 0.5 dB, then a |
| request to set the gain to -33.3 dB should result in a gain of -33.5 being |
| applied. A request to that same stream for a gain of -33.2 dB should result |
| in a gain of -33.0 being applied. |
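
A minimal sketch of this rounding rule, snapping a requested gain to the
nearest step counting from the minimum gain (the function name and the
clamping behavior for out-of-range requests are this sketch's assumptions):

```cpp
#include <cmath>

float RoundGain(float requested_db, float min_db, float max_db, float step_db) {
  if (step_db == 0.0f) {  // functionally continuous gain control
    return std::fmax(min_db, std::fmin(max_db, requested_db));
  }
  // Round to the nearest whole number of steps above the minimum gain.
  float steps = std::round((requested_db - min_db) / step_db);
  float gain = min_db + steps * step_db;
  return std::fmax(min_db, std::fmin(max_db, gain));
}

// Matching the example above, with a -60.0 to 0.0 dB range and 0.5 dB steps:
// RoundGain(-33.3f, -60.0f, 0.0f, 0.5f) == -33.5f
// RoundGain(-33.2f, -60.0f, 0.0f, 0.5f) == -33.0f
```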
| |
| ### Gain state notifications |
| |
Clients may request that streams send them asynchronous notifications of
gain state changes by using the `WatchGainState` command. The driver will reply to the
first `WatchGainState` sent by the client and will not respond to subsequent
client `WatchGainState` calls until the gain state changes from what was most recently
reported.
| |
| ## Plug detection |
| |
| In addition to streams being published/unpublished in response to being |
| connected or disconnected to/from their bus, streams may have the ability to be |
| plugged or unplugged at any given point in time. For example, a set of USB |
| headphones may publish a new output stream when connected to USB, but choose to |
| be "hardwired" from a plug detection standpoint. A different USB audio adapter |
| with a standard 3.5mm phono jack might publish an output stream when connected |
| via USB, but choose to change its plugged/unplugged state as the user plugs and |
| unplugs an analog device via the 3.5mm jack. |
| |
The ability to query the currently plugged or unplugged state of a stream, and
to register for asynchronous notifications of plug state changes (if supported),
is handled via plug detection messages.
| |
| ### Plug detect capabilities |
| |
In order to determine a stream's plug detection capabilities, if it has not done
so yet, an application sends a `GetProperties` command over the stream channel.
The driver replies with a `StreamProperties` that includes the plug detect
capabilities in the `plug_detect_capabilities` field, among other fields.
| |
| Valid plug-detect capabilities flags currently defined are: |
| |
| * `HARDWIRED` is set when the stream hardware is considered to be |
| "hardwired". In other words, the stream is considered to be connected as |
| long as the device is published. Examples include a set of built-in |
| speakers, a pair of USB headphones, or a pluggable audio device with no plug |
| detection functionality. |
| * `CAN_ASYNC_NOTIFY` is set when the stream hardware is capable of both |
| asynchronously detecting that a device's plug state has changed, and sending |
| a notification message if the client has requested these notifications. |
| |
| ### Plug state notifications |
| |
Clients may request that streams send them asynchronous notifications of
plug state changes by using the `WatchPlugState` command, if the `CAN_ASYNC_NOTIFY`
flag was sent by the driver in `StreamProperties`. I.e. drivers for streams which
do not set the `CAN_ASYNC_NOTIFY` flag are free to ignore any `WatchPlugState` sent
by applications. Drivers with `CAN_ASYNC_NOTIFY` set will reply to the first
`WatchPlugState` sent by the client and will not respond to subsequent client
`WatchPlugState` calls until the plug state changes from what was most recently reported.
| |
| ## Access control capability detection and signaling |
| |
| > TODO: specify how this works. In particular, specify how drivers indicate to |
| > applications support for various digital access control mechanisms such as |
| > S/PDIF control words and HDCP. |
| |
| ## Stream purpose and association |
| |
| > TODO: specify how drivers can indicate the general "purpose" of an audio |
| > stream in the system (if known), as well as its relationship to other streams |
| > (if known). For example, an embedded target like a phone or a tablet needs to |
| > indicate which output stream is the built-in speaker vs. which is the headset |
| > jack output. In addition, it needs to make clear which input stream is the |
| > microphone associated with the headset output vs. the builtin speaker. |
| |
| ## Ring-Buffer Channels |
| |
| ### Overview |
| |
| Once an application has successfully set the format of a stream, it receives in |
| the response a new [channel](/docs/reference/kernel_objects/channel.md) representing its connection |
| to the stream's ring-buffer. Clients use the ring-buffer channel to establish a |
| shared memory buffer and start and stop playback and capture of audio stream data. |
| |
The ring buffer contents are produced by the client side (for playback) and by the
driver side (for capture). Hence, a client is a producer for playback and a consumer
for capture, and a driver is a producer for capture and a consumer for playback.
The ring buffer contents may be directly consumed or produced by the audio hardware,
or each sample may pass through software processing performed by the driver.
| |
Ring buffer data production proceeds at the nominal rate from the point in time
given in a successful response to the `Start` command. Note though that there
will almost certainly be some form of FIFO buffer
between the memory bus and the audio hardware, which causes the hardware to either
read ahead in the stream (in the case of playback), or to hold onto
data for a time (in the case of capture). It is important for clients to query the
size of this buffer before beginning
operation, so they know how far ahead of/behind the stream's nominal inferred
read/write position they need to stay in order to prevent audio glitching.
Also note that because of the shared buffer nature of the system, and the fact
that drivers are likely to be DMA-ing directly from this buffer to hardware, it
is important for clients running on architectures which are not automatically
cache coherent to be sure that they have properly written back their cache after
writing playback data to the buffer, or invalidated their cache before reading
captured data.
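
A minimal sketch of that cache maintenance, assuming the ring buffer VMO is
already mapped at `ring_base` and that the flushed region does not wrap (a
wrapping region would need two calls). The helper names are illustrative;
`zx_cache_flush` operates on the calling process's mapped addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <zircon/syscalls.h>

// After writing `len` bytes of playback data at `offset`, clean the cache so
// the DMA engine observes the new data.
void CleanAfterWrite(uint8_t* ring_base, size_t offset, size_t len) {
  zx_cache_flush(ring_base + offset, len, ZX_CACHE_FLUSH_DATA);
}

// Before reading `len` bytes of captured data at `offset`, invalidate the
// cache so stale lines are not read in place of the DMA'd data.
void InvalidateBeforeRead(uint8_t* ring_base, size_t offset, size_t len) {
  zx_cache_flush(ring_base + offset, len,
                 ZX_CACHE_FLUSH_DATA | ZX_CACHE_FLUSH_INVALIDATE);
}
```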
| |
| ### Determining the FIFO depth |
| |
In order to determine a stream's `fifo_depth`, if it has not done so yet, an application
sends a `GetProperties` command over the stream channel. The driver replies with a
`StreamProperties` that includes `fifo_depth` among other fields. To ensure proper
playback or capture of audio, applications and drivers must be careful to
respect this value. Drivers must not read beyond the nominal playback position
of the stream plus this number of bytes when playing audio stream data.
Applications must stay this number of bytes behind the nominal capture point of
the stream when capturing audio stream data.
| |
The ring buffer data may be directly consumed/generated by hardware. In this case,
`fifo_depth` maps directly to the size of a hardware FIFO block, since the hardware
FIFO block determines the amount of data read ahead or held back.
| |
The ring buffer data may instead be consumed/generated by audio driver software that is
conceptually situated between the ring buffer and the audio hardware. In this case, for
playback, the `fifo_depth` read-ahead amount is set large enough that the driver
guarantees no undetected underruns, assuming the client generates the data as
determined by the `CreateRingBuffer` and `Start` commands. For capture, the
`fifo_depth` held-back amount is set large enough that the driver guarantees no
undetected underruns when generating the data as determined by the `CreateRingBuffer`
and `Start` commands. The driver must set `fifo_depth` large enough that the potential
delays added by any software interfacing with the audio hardware do not cause problems
under most scenarios, and must detect and report underruns. How an underrun is reported
is not defined here.
| |
| Once the format of a stream is set and a ring-buffer channel has been opened, |
| the driver must not change this value. From an application's point of view, |
| it is a constant property of the ring-buffer channel. |
| |
| ### Obtaining a shared buffer |
| |
To send or receive audio, the application must first establish a shared memory
buffer. This is done by sending a `CreateRingBuffer` request over the
ring-buffer channel. This may only be done while the ring-buffer is stopped.
| |
If the channel created via `CreateRingBuffer` is closed by the driver, for instance
because a buffer has already been established and the ring-buffer has already
been started, this must not stop the ring-buffer or discard the
existing shared memory. If the application requests a new buffer after having
already established a buffer while the ring buffer is stopped, it must
consider the existing buffer it has to be invalid; the old buffer is now gone.
| |
| Applications must specify two parameters when requesting a ring buffer: |
| `min_frames` and `clock_recovery_notifications_per_ring`. |
| |
| #### `min_frames` |
| |
The minimum number of frames of audio the client needs allocated for the ring
buffer. Drivers may make this buffer larger to meet hardware requirements.
Clients must use the returned VMO's size (in bytes) to determine the actual
size of the ring buffer. Clients must not assume that the size of the buffer
(as determined by the driver) is exactly the size they requested. Drivers
must ensure that the size of the ring buffer is an integral number of audio
frames.
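
As an illustrative sizing computation (the constant names are this sketch's
own), a client that wants at least 20 ms of buffer at 48 kHz would request 960
frames, then use the returned VMO size to learn what was actually allocated:

```cpp
#include <cstdint>

constexpr uint32_t kFrameRateHz = 48000;
constexpr uint32_t kWantedMsec = 20;
constexpr uint32_t kMinFrames = kFrameRateHz * kWantedMsec / 1000;  // 960
```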
| |
> TODO: Is it reasonable to require that drivers produce buffers which are an
| > integral number of audio frames in length? It certainly makes the audio |
| > client's life easier (client code never needs to split or re-assemble a frame |
| > before processing), but it might make it difficult for some audio hardware to |
| > meet its requirements without making the buffer significantly larger than the |
| > client asked for. |
| |
| #### `clock_recovery_notifications_per_ring` |
| |
The number of position update notifications the client would like the driver to
send per cycle through the ring buffer (optional). These notifications are meant to be
used for clock recovery. Drivers must only send these as replies to
`WatchClockRecoveryPositionInfo` requests.
Drivers should attempt to space the notifications uniformly throughout the ring;
however, clients must not rely on perfectly uniform spacing of the update notifications.
| |
| #### `ring_buffer` |
| |
| If the request succeeds, the driver must return a handle to a |
| [VMO](/docs/reference/kernel_objects/vm_object.md) with permissions which allow applications to map |
| the VMO into their address space using [zx_vmar_map](/docs/reference/syscalls/vmar_map.md), |
| and to read/write data in the buffer in the case of playback, or simply to read |
| the data in the buffer in the case of capture. |
| |
| #### `num_frames` |
| |
If the request succeeds, the driver will also return the actual number of frames of audio
it will use in the buffer. The size of the VMO returned (as reported
by [zx_vmo_get_size()](/docs/reference/syscalls/vmo_get_size.md)) must not be larger than
this number of frames (when converted to bytes). This number may be larger
than the `min_frames` requested by the client, but must not be smaller.
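
A minimal sketch of mapping the returned ring-buffer VMO for a playback stream,
using raw Zircon syscalls. The function name is illustrative, and `num_frames`
and `frame_size` (bytes per frame) are assumed to be known from the
`CreateRingBuffer` exchange.

```cpp
#include <cstdint>
#include <zircon/process.h>
#include <zircon/syscalls.h>

zx_status_t MapRingBuffer(zx_handle_t vmo, uint32_t num_frames,
                          uint32_t frame_size, zx_vaddr_t* out_base,
                          uint64_t* out_size) {
  uint64_t vmo_size = 0;
  zx_status_t status = zx_vmo_get_size(vmo, &vmo_size);
  if (status != ZX_OK) {
    return status;
  }
  // Per the contract above, the VMO must not be larger than num_frames worth
  // of audio when converted to bytes.
  if (vmo_size > static_cast<uint64_t>(num_frames) * frame_size) {
    return ZX_ERR_INTERNAL;
  }
  // Playback streams need read/write; capture streams would map read-only.
  status = zx_vmar_map(zx_vmar_root_self(), ZX_VM_PERM_READ | ZX_VM_PERM_WRITE,
                       0, vmo, 0, vmo_size, out_base);
  if (status == ZX_OK) {
    *out_size = vmo_size;
  }
  return status;
}
```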
| |
| ### Starting and Stopping the ring-buffer |
| |
| Clients may request that a ring-buffer start or stop using the `Start` and `Stop` |
| commands. Attempting to start a stream |
| which is already started must be considered a failure. Attempting to stop a |
| stream which is already stopped should be considered a success. Ring-buffers |
| cannot be either stopped or started until after a shared buffer has been |
| established using the `CreateRingBuffer` operation. |
| |
| Upon successfully starting a stream, drivers must provide their best estimate of |
| the time at which their hardware began to transmit or capture the stream in the |
| `start_time` field of the response. This time stamp must be taken from the clock |
| exposed via the |
| [zx_clock_get_monotonic()](/docs/reference/syscalls/clock_get_monotonic.md) |
| syscall. Along with the FIFO depth property of the ring buffer, this timestamp |
| allows applications to send or receive stream data without the need for periodic |
| position updates from the driver. Along with the outboard latency estimate |
| provided by the stream channel, this timestamp allows applications to |
| synchronize presentation of audio information across multiple streams, or even |
| multiple devices (provided that an external time synchronization protocol is |
| used to synchronize the |
| [monotonic](/docs/reference/syscalls/clock_get_monotonic.md) timelines across |
| the cohort of synchronized devices). |
| |
| > TODO: Redefine `start_time` to allow it to be an arbitrary 'audio stream |
| > clock' instead of the `zx_clock_get_monotonic()` clock. If the stream clock is |
| > made to count in audio frames since start, then this `start_time` can be |
| > replaced with the terms for a segment of a piecewise linear transformation |
| > which can be subsequently updated via notifications sent by the driver in the |
| > case that the audio hardware clock is rooted in a different oscillator from |
| > the system's tick counter. Clients can then use this transformation either to |
| > control the rate of consumption of input streams, or to determine where to |
| > sample in the input stream to effect clock correction. |
| |
| Upon successfully starting a stream, drivers must guarantee that no position |
| notifications will be sent before the start response has been enqueued into the |
| ring-buffer channel. |
| |
| Upon successfully stopping a stream, drivers must guarantee that no position |
| notifications will be enqueued into the ring-buffer channel after the stop |
| response has been enqueued. |
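
The sketch below shows how a client might combine `start_time`, the frame rate,
and `fifo_depth` to operate without periodic position updates, as described
above. It is a simplified illustration (the function name is hypothetical, it
assumes the current time is at or after `start_time`, and real code would guard
against arithmetic overflow over long run times).

```cpp
#include <cstdint>
#include <zircon/syscalls.h>

// Nominal playback position (in bytes) inferred from the start time and the
// nominal frame rate, wrapped to the ring size.
uint64_t NominalPositionBytes(zx_time_t start_time, uint32_t frame_rate,
                              uint32_t frame_size, uint64_t ring_size_bytes) {
  zx_time_t now = zx_clock_get_monotonic();
  uint64_t elapsed_ns = static_cast<uint64_t>(now - start_time);
  uint64_t frames = elapsed_ns * frame_rate / 1'000'000'000;
  return (frames * frame_size) % ring_size_bytes;
}

// A playback client must keep valid data written at least fifo_depth bytes
// ahead of this nominal position, since the driver may already have read that
// far ahead into its FIFO.
```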
| |
| ### Position notifications |
| |
| If requested by the client via a non-zero `clock_recovery_notifications_per_ring` in the |
| `CreateRingBuffer` operation, the driver will |
| periodically send updates to the client informing it of its current production |
| or consumption position in the buffer. This position is expressed in bytes in |
| the `position` field of the `RingBufferPositionInfo` struct sent on |
| a reply to the `WatchClockRecoveryPositionInfo` message. The |
| message also includes a `timestamp` field that contains the time (as |
| zx::time) that this byte position was valid. `WatchClockRecoveryPositionInfo` replies |
| must only be sent while the ring-buffer is started. Note, these position |
| notifications indicate where in the buffer the driver has consumed or produced |
| data, *not* the nominal playback or capture position (sometimes called the |
| "write cursor" or "read cursor" respectively). The timing of their arrival is |
| not guaranteed to be perfectly uniform and should not be used to effect clock |
recovery. However, the corresponding (`timestamp`, `position`) pairs
themselves ARE intended to be used to recover the clock for the audio
| stream. If a client discovers that a driver has consumed past the point in the |
| ring buffer where that client has written playback data, audio presentation is |
| undefined. Clients should increase their clock lead time and be certain to stay |
| ahead of this point in the stream in the future. Likewise, clients which capture |
| audio should not attempt to read beyond the point in the ring buffer |
| indicated by the most recent position notification sent by the driver. |
| |
| Driver playback and capture position must *always* begin at ring buffer byte 0, |
| immediately following a successful `Start` command. When the ring |
| buffer position reaches the end of the VMO (as indicated by |
| [zx_vmo_get_size(...)](/docs/reference/syscalls/vmo_get_size.md)), the ring buffer position |
| wraps back to zero. Drivers are not required to consume or produce data in |
| integral numbers of audio frames. Clients whose notion of stream position |
| depends on position notifications should take care to request that a sufficient |
| number of notifications per ring be sent (minimum 2) and to process them quickly |
| enough that aliasing does not occur. |
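
As an illustration of using these pairs for clock recovery, the sketch below
estimates the device's observed frame rate from two successive notifications
and compares it with the nominal rate. The function name is hypothetical; real
clients would filter over many pairs rather than trusting a single interval.

```cpp
#include <cstdint>

// Assumes less than one full trip around the ring elapsed between the two
// notifications (hence the minimum of 2 notifications per ring noted above).
double EstimateRateRatio(int64_t t0_ns, uint64_t pos0_bytes,
                         int64_t t1_ns, uint64_t pos1_bytes,
                         uint64_t ring_size_bytes, uint32_t frame_size,
                         uint32_t nominal_frame_rate) {
  // Unwrap the byte positions, which run modulo the ring size.
  uint64_t delta_bytes =
      (pos1_bytes + ring_size_bytes - pos0_bytes) % ring_size_bytes;
  double frames = static_cast<double>(delta_bytes) / frame_size;
  double seconds = static_cast<double>(t1_ns - t0_ns) / 1e9;
  double observed_rate = frames / seconds;
  return observed_rate / nominal_frame_rate;  // ~1.0; deviation = clock skew
}
```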
| |
| ### Clock recovery and synchronization |
| |
Upon receiving a `GetProperties` request, the driver must
respond with the identifier of the clock domain containing that device, in the
`clock_domain` field of `StreamProperties`. If the
audio device is locked to the local system monotonic clock and does not expose a
mechanism by which its rate would be fine-tuned, then it should return the value
0 to represent the local CLOCK_MONOTONIC domain. A client may use this
information (in addition to `WatchClockRecoveryPositionInfo` replies) to simplify
the process of recovering the audio device's clock.
| |
| > TODO: extend this section to include how clock recovery occurs, and how this |
| > is exposed to clients. Also, detail how slewable oscillators are discovered |
| > and controlled. We may need rate-change notifications to clients of slewable |
| > clocks. |
| > |
| > Previous content: TODO: define a way that clock recovery information can be |
| > sent to clients in the case that the audio output oscillator is not derived |
| > from the `zx_clock_get_monotonic()` oscillator. In addition, if the oscillator |
| > is slew-able in hardware, provide the ability to discover this capability and |
| > control the slew rate. Given the fact that this oscillator is likely to be |
| > shared by multiple streams, it might be best to return some form of system |
| > wide clock identifier and provide the ability to obtain a channel on which |
| > clock recovery notifications can be delivered to clients and hardware slewing |
| > command can be sent from clients to the clock. |
| |
| ### Error notifications |
| |
| > TODO: define these and what driver behavior should be, if/when they occur. |
| |
| ### Unexpected client termination |
| |
| If the client side of a ring buffer control channel is closed for any reason, |
| drivers must immediately close the control channel and shut down the ring |
| buffer, such that no further audio is emitted nor captured. While drivers are |
| encouraged to do so in a way which produces a graceful transition to silence, |
| they must ensure that the audio stream goes silent instead of looping. Once |
| the transition to silence is complete, resources associated with playback or |
| capture may be released and reused by the driver. |
| |
This way, if a playback client terminates unexpectedly, the system will close the
client channels, causing audio playback to stop instead of continuing to loop.