| # Audio Driver Streaming Interface |
| |
| This document describes the audio streaming interface exposed by audio drivers |
| in Zircon. It is meant to serve as a reference for both users and |
| driver-authors, and to unambiguously define the interface contract which drivers |
| must implement and users must follow. |
| |
| ## Overview |
| |
| Audio streams are device nodes published by driver services intended to be used |
| by applications in order to capture or render audio on a Zircon device, or both. |
| Each stream in the system (input or output) represents a stream of digital audio |
| information which may be either received or transmitted by a device. Streams are |
| dynamic and may be created or destroyed by the system at any time. Which streams |
| exist at any given point in time, and what controls their lifecycles, are |
| considered to be issues of audio policy and codec management and are not |
| discussed in this document. Additionally, the information present in audio |
| output streams is exclusive to the application owner of the stream. Mixing of |
| audio is not a service provided by the audio stream interface. |
| |
| > TODO: extend this interface to support the concept of low-latency hardware |
| > mixers. |
| |
| ### Basic Vocabulary |
| |
| | Term | Definition | |
| | ----------------------------- | -------------------------------------------- | |
| | Sample | A representation of the sound rendered by a | |
| : : single speaker, or captured by a single : |
| : : microphone, at a single instant in time. : |
| | LPCM | Linear pulse code modulation. The specific | |
| : : representation of audio samples present in : |
| : : all Zircon uncompressed audio streams. LPCM : |
| : : audio samples are representations of the : |
| : : amplitude of the audio signal at an instant : |
| : : in time where the numeric values of the : |
| : : encoded audio are linearly distributed : |
| : : across the amplitude levels of the rendering : |
| : : or capture device. This is in contrast to : |
| : : A-law and μ-law encodings which have : |
| : : non-linear mappings from numeric value to : |
| : : amplitude level. : |
| | Channel | Within an audio stream, the subset of | |
| : : information which will be rendered by a : |
| : : single speaker, or which was captured by a : |
| : : single microphone in a stream. : |
| | Frame | A set of audio samples for every channel of | |
: : an audio stream captured/rendered at a : |
: : single instant in time. : |
| Frame Rate | a.k.a. "Sample Rate". The rate (in Hz) at | |
: : which audio frames are produced or consumed. : |
: : Common frame rates include 44.1 kHz, 48 : |
: : kHz, 96 kHz, and so on. : |
| | Client or User or Application | These terms are used interchangeably in this | |
| : : document. They refer to modules that use : |
| : : these interfaces to communicate with an : |
| : : audio driver/device. : |
| |
| > TODO: do we need to extend this interface to support non-linear audio sample |
| > encodings? This may be important for telephony oriented microphones which |
| > deliver μ-law encoded samples. |
| |
| ### Basic Operation |
| |
| Communication with an audio stream device is performed using messages sent over |
| a [channel](/docs/reference/kernel_objects/channel.md). Applications open the device node for a |
| stream and obtain a channel by issuing a FIDL request. After obtaining the |
| channel, the device node may be closed. All subsequent communication with the |
| stream occurs using channels. |
| |
| The stream channel is used for most command and control tasks, including: |
| |
| * Capability interrogation |
| * Format negotiation |
| * Hardware gain control |
| * Determining outboard latency |
| * Plug detection notification |
| * Access control capability detection and signalling |
| |
| > TODO: Should plug/unplug detection be done by sending notifications over the |
| > stream channel (as it is today), or by publishing/unpublishing the device |
| > nodes (and closing all channels in the case of unpublished channels)? |
| |
| In order to actually send or receive audio information on the stream, the |
| specific format to be used must first be set. The response to a successful |
| `SetFormat` operation will contain a new "ring-buffer" channel. The ring-buffer |
| channel may be used to request a shared buffer from the stream (delivered in the |
| form of a [VMO](/docs/reference/kernel_objects/vm_object.md)) which may be mapped into the address |
| space of the application and used to send or receive audio data as appropriate. |
| Generally, the operations conducted over the ring buffer channel include: |
| |
| * Requesting a shared buffer |
| * Starting and Stopping stream playback/capture |
| * Receiving notifications of playback/capture progress |
| * Receiving notifications of error conditions such as HW FIFO under/overflow, |
| bus transaction failure, etc. |
| * Receiving clock recovery information in the case that the audio output clock |
| is based on a different oscillator than the oscillator which backs |
| the [monotonic clock](/docs/reference/syscalls/clock_get_monotonic.md) |
| |
| ## Operational Details |
| |
| ### Protocol definition |
| |
| In order to use the C API definitions of the |
| [audio](/zircon/system/public/zircon/device/audio.h) protocol, applications and |
| drivers include the protocol header: |
| |
```C
#include <zircon/device/audio.h>
```
| |
| ### Device nodes |
| |
| Audio stream device nodes **must** be published by drivers using the protocol |
| preprocessor symbol given in the table below. This will cause stream device |
| nodes to be published in the locations given in the table. Applications can |
| monitor these directories in order to discover new streams as they are published |
| by the drivers. |
| |
| Stream Type | Protocol | Location |
| ----------- | -------------------------- | ----------------------- |
| Input | `ZX_PROTOCOL_AUDIO_INPUT` | /dev/class/audio-input |
| Output | `ZX_PROTOCOL_AUDIO_OUTPUT` | /dev/class/audio-output |
| |
| ### Establishing the stream channel |
| |
| After opening the device node, client applications may obtain a stream channel |
| for subsequent communication using the |
| `fuchsia.hardware.audio.Device/GetChannel` FIDL message. For example: |
| |
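```C
// Assumes the Zircon syscall, fdio, and fuchsia.hardware.audio C binding
// headers, plus <stdio.h>, have already been included.

// Opens the stream device node at dev_node_path and returns a handle to the
// stream channel, or ZX_HANDLE_INVALID on failure.
zx_handle_t OpenStream(const char* dev_node_path) {
    zx_handle_t local, remote;
    zx_status_t status = zx_channel_create(0, &local, &remote);
    if (status != ZX_OK) {
        return ZX_HANDLE_INVALID;
    }

    // Connect the remote end of the channel to the device node.
    status = fdio_service_connect(dev_node_path, remote);
    if (status != ZX_OK) {
        printf("Failed to open \"%s\" (res %d)\n", dev_node_path, status);
        zx_handle_close(local);
        return ZX_HANDLE_INVALID;
    }

    // Ask the device for the stream channel used for all later communication.
    zx_handle_t audio_channel = ZX_HANDLE_INVALID;
    status = fuchsia_hardware_audio_DeviceGetChannel(local, &audio_channel);

    // The device node connection is no longer needed, whether or not the
    // request succeeded, so close it before checking the result.
    zx_handle_close(local);
    if (status != ZX_OK) {
        printf("Failed to obtain channel (res %d)\n", status);
        return ZX_HANDLE_INVALID;
    }
    return audio_channel;
}
```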
| |
| ### Client side termination of the stream channel |
| |
| Clients **may** terminate the connection to the stream at any time simply by |
| calling [zx_handle_close(...)](/docs/reference/syscalls/handle_close.md) on the stream |
| channel. Drivers **must** close any active ring-buffer channels established |
| using this stream channel and **must** make every attempt to gracefully quiesce |
| any on-going streaming operations in the process. |
| |
| ### Sending and receiving messages on the stream and ring-buffer channels |
| |
| All of the messages and message payloads which may be sent or received over |
| stream and ring buffer channels are defined in the |
| [audio](/zircon/system/public/zircon/device/audio.h) protocol header. Messages |
| may be sent to the driver using the |
| [zx_channel_write(...)](/docs/reference/syscalls/channel_write.md) syscall. If a response is |
| expected, it may be read using the |
| [zx_channel_read(...)](/docs/reference/syscalls/channel_read.md) syscall. Best practice, |
| however, is to queue packets for your [channel(s)](/docs/reference/kernel_objects/channel.md) |
| [port](/docs/reference/kernel_objects/port.md) using the |
| [zx_port_queue(...)](/docs/reference/syscalls/port_queue.md) syscall, and use the |
| [zx_port_wait(...)](/docs/reference/syscalls/port_wait.md) syscall to determine when your set |
| of channels have messages (either expected responses or asynchronous |
| notifications) to be read. |
| |
| All messages either sent or received over stream and ring buffer channels are |
| prefaced with an `audio_cmd_hdr_t` structure which contains a 32-bit transaction |
| ID and an `audio_cmd_t` enumeration value indicating the specific command |
| being requested by the application, the specific command being responded to by |
| the driver, or the asynchronous notification being delivered by the driver to |
| the application. |
| |
| When sending a command to the driver, applications **must** place a transaction |
| ID in the header's `transaction_id` field which is not equal to |
| `AUDIO_INVALID_TRANSACTION_ID`. If a response to a command needs to be sent by |
| the driver to the application, the driver **must** use the transaction ID and |
| `audio_cmd_t` values sent by the client during the request. When sending |
| asynchronous notification to the application, the driver **must** use |
| `AUDIO_INVALID_TRANSACTION_ID` as the transaction ID for the message. |
| Transaction IDs may be used by clients for whatever purpose they desire; however, |
| if the IDs are kept unique across all transactions in-flight, the |
| [zx_channel_call(...)](/docs/reference/syscalls/channel_call.md) syscall may be used to |
| implement a simple synchronous calling interface. |
| |
| ### Validation requirements |
| |
| All drivers **must** validate requests and enforce the protocol described above. |
| In case of any violation, drivers **should** immediately quiesce their hardware |
| and **must** close the channel, terminating any operations which happen to be in |
| flight at the time. Additionally, they **may** log a message to a central |
| logging service to assist application developers in debugging the cause of |
| the protocol violation. Examples of protocol violation include: |
| |
| * Using `AUDIO_INVALID_TRANSACTION_ID` as the value of |
| `message.hdr.transaction_id` |
| * Using a value not present in the `audio_cmd_t` enumeration as the value of |
| `message.hdr.cmd` |
| * Supplying a payload whose size does not match the size of the request |
| payload for a given command. |
| |
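As a rough illustration of the checks listed above, the following sketch validates an incoming request before it is dispatched. Every type, constant, and function name in it is a hypothetical stand-in declared locally so that the example is self-contained; the real structures, command values, and the actual value of `AUDIO_INVALID_TRANSACTION_ID` are defined by the audio protocol header and may differ.

```C
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-ins for definitions provided by the audio protocol header.
typedef uint32_t example_txid_t;
#define EXAMPLE_INVALID_TRANSACTION_ID ((example_txid_t)0)

typedef enum {
    EXAMPLE_CMD_GET_FORMATS = 0x1000,
    EXAMPLE_CMD_SET_FORMAT = 0x1001,
} example_cmd_t;

typedef struct {
    example_txid_t transaction_id;
    uint32_t cmd;
} example_cmd_hdr_t;

typedef struct {
    example_cmd_hdr_t hdr;
} example_get_formats_req_t;

typedef struct {
    example_cmd_hdr_t hdr;
    uint32_t frames_per_second;
    uint32_t sample_format;
    uint16_t channels;
} example_set_format_req_t;

// Returns true if the request passes the three checks listed above. A driver
// would close the channel when this returns false.
bool example_request_is_valid(const void* msg, size_t msg_size) {
    if (msg_size < sizeof(example_cmd_hdr_t)) {
        return false;  // Too short to even contain a header.
    }
    const example_cmd_hdr_t* hdr = (const example_cmd_hdr_t*)msg;
    if (hdr->transaction_id == EXAMPLE_INVALID_TRANSACTION_ID) {
        return false;  // Reserved for asynchronous notifications.
    }
    switch (hdr->cmd) {
    case EXAMPLE_CMD_GET_FORMATS:
        return msg_size == sizeof(example_get_formats_req_t);
    case EXAMPLE_CMD_SET_FORMAT:
        return msg_size == sizeof(example_set_format_req_t);
    default:
        return false;  // Not a known command.
    }
}
```

Checking the payload size per command, rather than only a minimum size, mirrors the requirement that the payload size match the request payload for the given command.
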
| ## Format Negotiation |
| |
| ### Sample Formats |
| |
| Sample formats are described using the `audio_sample_format_t` type. It is a |
| bitfield style enumeration which describes either the numeric encoding of the |
| uncompressed LPCM audio samples as they reside in memory, or indicating that the |
| audio stream consists of a compressed bitstream instead of uncompressed LPCM |
| samples. Refer to the [audio](/zircon/system/public/zircon/device/audio.h) |
| protocol header for exact symbol definitions. |
| |
| The formats described by `audio_sample_format_t` have the following properties: |
| |
| * With the exception of `FORMAT_BITSTREAM`, samples are always assumed to use |
| linear PCM encoding. BITSTREAM is used for transporting compressed audio |
| encodings (such as AC3, DTS, and so on) over a digital interconnect to a |
| decoder device somewhere outside of the system. |
| * By default, multi-byte sample formats are assumed to use host-endianness. If |
| the `INVERT_ENDIAN` flag is set on the format, the format uses the opposite |
of host endianness. eg. a 16 bit little endian PCM audio format would have |
the `INVERT_ENDIAN` flag set on it when used on a big endian host. The |
| `INVERT_ENDIAN` flag has no effect on COMPRESSED, 8BIT or FLOAT encodings. |
| * The `32BIT_FLOAT` encoding uses specifically the |
| [IEEE 754](https://en.wikipedia.org/wiki/IEEE_754) floating point |
| representation. |
* By default, non-floating point PCM encodings are assumed to be expressed using |
| [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement) signed |
| integers. eg. the bit values for a 16 bit PCM sample format would range from |
| [0x8000, 0x7FFF] with 0x0000 representing zero speaker deflection. If the |
| UNSIGNED flag is set on the format, the bit values would range from [0x0000, |
| 0xFFFF] with 0x8000 representing zero deflection. |
| * When used to set formats, exactly one non-flag bit **must** be set. |
| * When used to describe supported formats, any number of non-flag bits **may** |
| be set. Flags (when present) apply to all of the relevant non-flag bits in |
| the bitfield. eg. If a stream supports BITSTREAM, 16BIT and 32BIT_FLOAT, and |
| the UNSIGNED bit is set, it applies only to the 16BIT format. |
| * When encoding a smaller sample size in a larger container (eg 20 or 24bit in |
| 32), the most significant bits of the 32 bit container are used while the |
| least significant bits should be zero. eg. a 20 bit sample would be mapped |
onto the range [12,31] of the 32 bit container. |
| |
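The signed, unsigned, endianness, and container conventions described above come down to a few bit operations. The sketch below is only an illustration; the helper names are hypothetical and are not part of the audio protocol header.

```C
#include <stdint.h>

// Places a signed 20 bit sample in the upper 20 bits of a 32 bit container,
// leaving the 12 least significant bits zero.
uint32_t example_pack_20_in_32(int32_t sample_20) {
    return ((uint32_t)sample_20 & 0xFFFFFu) << 12;
}

// Converts a two's complement 16 bit sample to the UNSIGNED representation,
// in which 0x8000 represents zero deflection.
uint16_t example_signed_to_unsigned_16(int16_t sample) {
    return (uint16_t)((uint16_t)sample ^ 0x8000u);
}

// Swaps the two bytes of a 16 bit sample, as would be needed to produce or
// consume a format which has the INVERT_ENDIAN flag set.
uint16_t example_swap_16(uint16_t sample) {
    return (uint16_t)((sample << 8) | (sample >> 8));
}
```

For instance, the most negative signed 16 bit value maps to 0x0000 in the unsigned representation, and the most positive maps to 0xFFFF.
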
| > TODO: can we make the claim that the LSBs will be ignored, or do we have to |
| > require that they be zero? |
| |
| > TODO: describe what 20-bit packed audio looks like in memory. Does it need to |
| > have an even number of channels in the overall format? Should we strike it |
| > from this list if we cannot find a piece of hardware which demands this format |
| > in memory? |
| |
| ### Enumeration of supported formats |
| |
| In order to determine the formats supported by a given audio stream, |
| applications send an `AUDIO_STREAM_CMD_GET_FORMATS` message over the stream |
| channel. No additional parameters are required. Drivers **must** respond to this |
| request using one or more `audio_stream_cmd_get_formats_resp_t` messages, even |
| if only to report that there are no formats currently supported. |
| |
| ### Range structures |
| |
| Drivers indicate support for formats by sending messages containing zero or more |
| `audio_stream_format_range_t` structures. Each structure contains fields which |
| describe: |
| |
| * A bitmask of supported sample formats. |
| * A minimum and maximum number of channels. |
| * A set of frame rates. |
| |
| A single range structure indicates support for each of the combinations of the |
| three different sets of values (sample formats, channel counts, and frame |
| rates). For example, if a range structure indicated support for: |
| |
| * 16 bit signed LPCM samples |
| * 48000, and 44100 Hz frame rates |
| * 1 and 2 channels |
| |
| Then the fully expanded set of supported formats indicated by the range |
| structure would be: |
| |
* Stereo 16-bit 48 kHz audio |
* Stereo 16-bit 44.1 kHz audio |
* Mono 16-bit 48 kHz audio |
* Mono 16-bit 44.1 kHz audio |
| |
| See the Sample Formats section (above) for a description of how sample formats |
| are encoded in the `sample_formats` member of a range structure. |
| |
| Supported channel counts are indicated using a pair of min/max channels fields |
| which indicate an inclusive range of channel counts which apply to this range. |
| For example, a min/max channels range of [1, 4] would indicate that this audio |
| stream supports 1, 2, 3 or 4 channels. A range of [2, 2] would indicate that |
| this audio stream supports only stereo audio. |
| |
| Supported frame rates are signalled similarly to channel counts using a pair of |
| min/max frame per second fields along with a flags field. While the min/max |
| values provide an inclusive range of frame rates, the flags determine how to |
| interpret this range. Currently defined flags include: |
| |
| | Flag | Definition | |
| | --------------------------------- | ---------------------------------------- | |
| | `ASF_RANGE_FLAG_FPS_CONTINUOUS` | The frame rate range is continuous. All | |
| : : frame rates in the range [min, max] are : |
| : : valid. : |
| | `ASF_RANGE_FLAG_FPS_48000_FAMILY` | The frame rate range includes the | |
: : members of the 48 kHz family which exist : |
: : in the range [min, max] : |
| `ASF_RANGE_FLAG_FPS_44100_FAMILY` | The frame rate range includes the | |
: : members of the 44.1 kHz family which : |
| : : exist in the range [min, max] : |
| |
| So, conceptually, the valid frame rates are the union of the sets produced by |
| applying each of the flags which are set to the inclusive [min, max] range. For |
| example, if both the 48 kHz and 44.1 kHz family flags were set, and the range |
| given was [16000, 47999], then the supported frame rates for this range would be: |
| |
| * 16000 Hz |
| * 22050 Hz |
| * 32000 Hz |
| * 44100 Hz |
| |
| The official members of the 48 kHz and 44.1 kHz families are: |
| |
| | Family | Frame Rates | |
| | --------------------------------- | ---------------------------------- | |
| | `ASF_RANGE_FLAG_FPS_48000_FAMILY` | 8000, 16000, 32000, 48000, 96000, | |
| : : 192000, 384000, 768000 : |
| | `ASF_RANGE_FLAG_FPS_44100_FAMILY` | 11025, 22050, 44100, 88200, 176400 | |
| |
| Drivers **must** set at least one of the flags, or else the set of supported |
| frame rates is empty and this range structure is not allowed. Also note that the |
| set of valid frame rates is the union of the frame rates produced by applying |
| each of the set flags. If the `ASF_RANGE_FLAG_FPS_CONTINUOUS` flag is set, the |
| other flags have no effect. While it is legal to set the family flags together |
| with the continuous flag, drivers **should** avoid doing so. |
| |
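The frame rate rules above can be sketched as a small membership test. The flag bit values and all names below are placeholders chosen for this example (the real values come from the audio protocol header); the family tables are copied from the table above.

```C
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Placeholder bit values; the real ones are defined by the protocol header.
#define EXAMPLE_FPS_CONTINUOUS   (1u << 0)
#define EXAMPLE_FPS_48000_FAMILY (1u << 1)
#define EXAMPLE_FPS_44100_FAMILY (1u << 2)

static const uint32_t kExample48kFamily[] = {8000,  16000,  32000,  48000,
                                             96000, 192000, 384000, 768000};
static const uint32_t kExample44kFamily[] = {11025, 22050, 44100, 88200, 176400};

static bool example_in_family(const uint32_t* family, size_t count, uint32_t rate) {
    for (size_t i = 0; i < count; ++i) {
        if (family[i] == rate) {
            return true;
        }
    }
    return false;
}

// Returns true if |rate| belongs to the set described by the inclusive
// [min, max] range and |flags|. With no flags set, the set is empty.
bool example_frame_rate_supported(uint32_t min, uint32_t max, uint32_t flags,
                                  uint32_t rate) {
    if (rate < min || rate > max) {
        return false;
    }
    if (flags & EXAMPLE_FPS_CONTINUOUS) {
        return true;  // Every rate in [min, max] is valid.
    }
    if ((flags & EXAMPLE_FPS_48000_FAMILY) &&
        example_in_family(kExample48kFamily,
                          sizeof(kExample48kFamily) / sizeof(kExample48kFamily[0]), rate)) {
        return true;
    }
    if ((flags & EXAMPLE_FPS_44100_FAMILY) &&
        example_in_family(kExample44kFamily,
                          sizeof(kExample44kFamily) / sizeof(kExample44kFamily[0]), rate)) {
        return true;
    }
    return false;
}
```

With both family flags set and a range of [16000, 47999], this accepts 16000, 22050, 32000, and 44100 Hz and rejects 48000 Hz, matching the worked example earlier in this section.
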
| ### Transporting range structures |
| |
| Range structures are transmitted from drivers to applications within the |
| `audio_stream_cmd_get_formats_resp_t` message. If a large number of formats are |
| supported by a stream, drivers may need to send multiple messages to enumerate |
| all available modes. Messages include the following fields: |
| |
| * A standard `audio_cmd_hdr_t` header. **All** messages involved in the |
| response to an `AUDIO_STREAM_CMD_GET_FORMATS` request **must** use the |
| transaction ID of the original request, and the cmd field in the header |
| **must** be `AUDIO_STREAM_CMD_GET_FORMATS`. |
| * A `format_range_count` field. This indicates the total number of format |
| range structures which will be sent in this response to the application. |
| This number **must** be present in **all** messages involved in the |
| response, and **must not** change from message to message. |
| * A `first_format_range_ndx` field indicating the zero-based index of the |
| first format range being specified in this particular message. See below for |
| details. |
* An array of `audio_stream_format_range_t` structures which is at |
| most `AUDIO_STREAM_CMD_GET_FORMATS_MAX_RANGES_PER_RESPONSE` elements long. |
| |
| Drivers **must**: |
| |
| * Always transmit all of the available audio format ranges. |
| * Always transmit the available audio format ranges in ascending index order. |
| * Always pack as many ranges as possible in the fixed size message structure. |
| * Never overlap index regions or leave gaps. |
| |
| Given these requirements, if the maximum number of ranges per response were 15, |
| and a driver needed to send 35 ranges in response to an application's request, |
| then 3 messages in total would be needed, and the `format_range_count` and |
| `first_format_range_ndx` fields for each message would be as follows. |
| |
| Msg # | `format_range_count` | `first_format_range_ndx` |
| ----- | -------------------- | ------------------------ |
| 1 | 35 | 0 |
| 2 | 35 | 15 |
| 3 | 35 | 30 |
| |
| `first_format_range_ndx` **must** never be greater than `format_range_count`, |
| however `format_range_count` **may** be zero if an audio stream currently |
| supports no formats. The total number of `audio_stream_format_range_t` |
| structures in an `audio_stream_cmd_get_formats_resp_t` message is given by the |
| formula |
| |
| ```C |
| valid_ranges = MIN(AUDIO_STREAM_CMD_GET_FORMATS_MAX_RANGES_PER_RESPONSE, |
| msg.format_range_count - msg.first_format_range_ndx); |
| ``` |
| |
| Drivers **may** choose to always send an entire |
| `audio_stream_cmd_get_formats_resp_t` message, or to send a truncated message |
| which ends after the last valid range structure in the `format_ranges` array. |
| Applications **must** be prepared to receive up to |
| `sizeof(audio_stream_cmd_get_formats_resp_t)` bytes for each message, but also |
| accept messages as short as `offsetof(audio_stream_cmd_get_formats_resp_t, |
| format_ranges)`. |
| |
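The packing rules above determine how many messages a response needs and how many valid ranges each message carries. The following sketch works through that arithmetic. The per-message limit of 15 is taken from the hypothetical example above, not from the protocol header, and the function names are made up for this illustration.

```C
#include <stdint.h>

// Hypothetical per-message capacity. The real limit is
// AUDIO_STREAM_CMD_GET_FORMATS_MAX_RANGES_PER_RESPONSE in the protocol header.
#define EXAMPLE_MAX_RANGES_PER_RESPONSE 15u

// Number of messages needed to report |format_range_count| ranges. An empty
// list still requires one response message.
uint32_t example_response_count(uint32_t format_range_count) {
    if (format_range_count == 0) {
        return 1;
    }
    return (format_range_count + EXAMPLE_MAX_RANGES_PER_RESPONSE - 1) /
           EXAMPLE_MAX_RANGES_PER_RESPONSE;
}

// Value of first_format_range_ndx for the zero-based message |msg_index|,
// given that every message but the last is packed full.
uint32_t example_first_ndx(uint32_t msg_index) {
    return msg_index * EXAMPLE_MAX_RANGES_PER_RESPONSE;
}

// Number of valid range structures in one message. Assumes the index is not
// greater than the count, as the protocol requires.
uint32_t example_valid_ranges(uint32_t format_range_count,
                              uint32_t first_format_range_ndx) {
    uint32_t remaining = format_range_count - first_format_range_ndx;
    return remaining < EXAMPLE_MAX_RANGES_PER_RESPONSE
               ? remaining
               : EXAMPLE_MAX_RANGES_PER_RESPONSE;
}
```

For 35 ranges this yields three messages starting at indices 0, 15, and 30, which carry 15, 15, and 5 valid ranges respectively.
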
| > TODO: how do devices signal a change of supported formats (e.g., HDMI hot-plug |
| > event)? Are such devices required to simply remove and republish the device? |
| |
| > TODO: define how to enumerate supported compressed bitstream formats. |
| |
| ### Setting the desired stream format |
| |
| In order to select a stream format, applications send an |
| `AUDIO_STREAM_CMD_SET_FORMAT` message over the stream channel. In the message, |
| for uncompressed audio streams, the application specifies: |
| |
* The frame rate of the stream in Hz using the `frames_per_second` field. |
| * The number of channels packed into each frame using the `channels` field. |
| * The format of the samples in the frame using the `sample_format` field (see |
| Sample Formats, above) |
| |
| Success or failure, drivers **must** respond to a request to set format using an |
| `audio_stream_cmd_set_format_resp_t`. |
| |
| In the case of success, drivers **must** set the `result` field of the response |
| to `ZX_OK` and **must** return a new ring buffer channel over which streaming |
| operations will be conducted. If a previous ring buffer channel had been |
| established and was still active, the driver **must** close this channel and |
| make every attempt to gracefully quiesce any on-going streaming operations in |
| the process. |
| |
| In the case of failure, drivers **must** indicate the cause of failure using the |
| `result` field of the message and **must not** simply close the stream channel |
| as is done for a generic protocol violation. Additionally, they **may** choose |
| to preserve a pre-existing ring-buffer channel, or to simply close such a |
| channel as is mandated for a successful operation. |
| |
| > TODO: specify how compressed bitstream formats will be set |
| |
| ## Determining external latency |
| |
| The external latency of an audio stream is defined as the amount of time it |
| takes outbound audio to travel from the system's interconnect to the speakers |
| themselves, or inbound audio to travel from the microphone to the system's |
| interconnect. As an example, consider an external codec connected to the system |
| using a TDM interconnect: if this interconnect introduces a 4 frame delay |
| between the reception of a TDM frame and the rendering of that frame at the |
| speakers themselves, then the external delay of this audio path is the time |
| duration equivalent to 4 audio frames. |
| |
| External delay is reported in the `external_delay_nsec` field of a successful |
| `AUDIO_STREAM_CMD_SET_FORMAT` response as a non-negative number of nanoseconds. |
| Drivers **should** make their best attempt to accurately report the total of all |
| of the sources of delay the driver knows about. Information about this delay can |
| frequently be found in codec data sheets, dynamically reported as properties of |
| codecs using protocols such as Intel HDA or the USB Audio specifications, or |
| reported by down stream devices using mechanisms such as EDID when using HDMI or |
| DisplayPort interconnects. |
| |
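A delay that is known in frames, such as the 4 frame TDM delay in the example above, has to be converted to nanoseconds before it can be reported. A minimal sketch of that conversion follows; the function name is hypothetical, and truncating the fractional nanosecond is a choice made for this example rather than something the protocol specifies.

```C
#include <stdint.h>

// Converts a delay of |frames| audio frames at |frames_per_second| into
// nanoseconds, truncating any fractional nanosecond. |frames_per_second|
// must be non-zero.
int64_t example_frames_to_nsec(uint32_t frames, uint32_t frames_per_second) {
    return ((int64_t)frames * 1000000000LL) / (int64_t)frames_per_second;
}
```

At 48000 frames per second, a 4 frame delay comes to 83333 ns, or roughly 83 microseconds.
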
| ## Hardware Gain Control |
| |
| ### Hardware gain control capability reporting |
| |
| In order to determine a stream's gain control capabilities, applications send an |
| `AUDIO_STREAM_CMD_GET_GAIN` message over the stream channel. No parameters need |
| to be supplied with this message. All stream drivers **must** respond to this |
| message, regardless of whether or not the stream hardware is capable of any gain |
| control. All gain values are expressed in dB using 32 bit floating point |
| numbers. |
| |
| Drivers respond to this message with values which indicate the current gain |
| settings of the stream, as well as the stream's gain control capabilities. |
| Current gain settings are expressed using a bool/float tuple indicating if the |
| stream is currently muted or not along with the current dB gain of the stream. |
| Gain capabilities consist of a bool and 3 floats. The bool indicates whether or |
| not the stream can be muted. The floats give the minimum and maximum gain |
| settings, along with the `gain step size`. The `gain step size` indicates the |
| smallest increment with which the gain can be controlled counting from the |
| minimum gain value. |
| |
| For example, an amplifier which has 5 gain steps of 7.5 dB each and a maximum 0 |
| dB gain would indicate a range of (-30.0, 0.0) and a step size of 7.5. |
| Amplifiers capable of functionally continuous gain control **may** encode their |
| gain step size as 0.0. |
| |
| Regardless of mute capabilities, drivers for fixed gain streams **must** report |
| their min/max gain as (0.0, 0.0). The gain step size is meaningless in this |
| situation, but drivers **should** report their step size as 0.0. |
| |
| ### Setting hardware gain control levels |
| |
| In order to change a stream's current gain settings, applications send an |
| `AUDIO_STREAM_CMD_SET_GAIN` message over the stream channel. Two parameters are |
| supplied with this message, a set of flags which control the request, and a |
| float indicating the dB gain which should be applied to the stream. |
| |
| Three valid flags are currently defined: |
| |
| * `AUDIO_SGF_MUTE_VALID`. Set when the application wishes to set the |
| muted/un-muted state of the stream. Clear if the application wishes to |
| preserve the current muted/un-muted state. |
| * `AUDIO_SGF_GAIN_VALID`. Set when the application wishes to set the dB gain |
| state of the stream. Clear if the application wishes to preserve the current |
| gain state. |
| * `AUDIO_SGF_MUTE`. Indicates the application's desired mute/un-mute state for |
| the stream. Significant only if `AUDIO_SGF_MUTE_VALID` is also set. |
| |
| Drivers **must** fail the request with a `ZX_ERR_INVALID_ARGS` result if the |
| application's request is incompatible with the stream's capabilities. |
| Incompatible requests include: |
| |
* The requested gain is less than the minimum supported gain for the stream. |
* The requested gain is more than the maximum supported gain for the stream. |
| * Mute was requested, but the stream does not support an explicit mute. |
| |
| Presuming that the request is valid, drivers **should** round the request to the |
| nearest supported gain step. For example, if a stream can control its gain |
| on the range from -60.0 to 0.0 dB, using a gain step size of 0.5 dB, then a |
| request to set the gain to -33.3 dB **should** result in a gain of -33.5 being |
| applied. A request to that same stream for a gain of -33.2 dB **should** result |
| in a gain of -33.0 being applied. |
| |
| Applications **may** choose not to receive an acknowledgement of a `SET_GAIN` |
| command by setting the `AUDIO_FLAG_NO_ACK` flag on their command. No response |
| message will be sent to the application, regardless of the success or failure of |
| the command. If an acknowledgement was requested by the application, drivers |
| respond with a message indicating the success or failure of the operation as |
| well as the current gain/mute status of the system (regardless of whether the |
| request was a success). |
| |
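The validation and rounding rules for `SET_GAIN` requests can be sketched from the driver's point of view as follows. The state structure, the flag bit values, and the function name are assumptions made for this example and do not reflect the definitions in the protocol header. The rounding counts steps from the minimum gain, as described for the gain step size.

```C
#include <stdbool.h>
#include <stdint.h>

// Placeholder flag values; the real ones are defined by the protocol header.
#define EXAMPLE_SGF_MUTE_VALID (1u << 0)
#define EXAMPLE_SGF_GAIN_VALID (1u << 1)
#define EXAMPLE_SGF_MUTE       (1u << 2)

// Hypothetical driver-side view of a stream's gain capabilities and state.
typedef struct {
    bool can_mute;
    float min_gain;   // dB
    float max_gain;   // dB
    float gain_step;  // dB, 0.0 for functionally continuous control
    bool cur_mute;
    float cur_gain;   // dB
} example_gain_state_t;

// Applies a SET_GAIN style request. Returns false, leaving the state
// untouched, if the request is incompatible with the stream's capabilities.
bool example_set_gain(example_gain_state_t* s, uint32_t flags, float gain) {
    bool mute = s->cur_mute;
    float new_gain = s->cur_gain;

    if (flags & EXAMPLE_SGF_MUTE_VALID) {
        mute = (flags & EXAMPLE_SGF_MUTE) != 0;
        if (mute && !s->can_mute) {
            return false;
        }
    }
    if (flags & EXAMPLE_SGF_GAIN_VALID) {
        if (gain < s->min_gain || gain > s->max_gain) {
            return false;
        }
        new_gain = gain;
        if (s->gain_step > 0.0f) {
            // Round to the nearest step, counting from the minimum gain.
            uint32_t steps = (uint32_t)((gain - s->min_gain) / s->gain_step + 0.5f);
            new_gain = s->min_gain + (float)steps * s->gain_step;
            if (new_gain > s->max_gain) {
                new_gain = s->max_gain;
            }
        }
    }
    s->cur_mute = mute;
    s->cur_gain = new_gain;
    return true;
}
```

For a stream with a range of -60.0 to 0.0 dB and a 0.5 dB step, a request for -33.3 dB lands on -33.5 dB and a request for -33.2 dB lands on -33.0 dB.
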
| ## Plug Detection |
| |
| In addition to streams being published/unpublished in response to being |
| connected or disconnected to/from their bus, streams may have the ability to be |
| plugged or unplugged at any given point in time. For example, a set of USB |
| headphones may publish a new output stream when connected to USB, but choose to |
| be "hardwired" from a plug detection standpoint. A different USB audio adapter |
| with a standard 3.5mm phono jack might publish an output stream when connected |
| via USB, but choose to change its plugged/unplugged state as the user plugs and |
| unplugs an analog device via the 3.5mm jack. |
| |
| The ability to query the currently plugged or unplugged state of a stream, and |
| to register for asynchronous notifications of plug state changes (if supported) |
| is handled via plug detection messages. |
| |
| ### AUDIO_STREAM_CMD_PLUG_DETECT |
| |
| In order to determine a stream's plug detection capabilities and current plug |
| state, and to enable or disable asynchronous plug detection notifications, |
| applications send an `AUDIO_STREAM_CMD_PLUG_DETECT` command over the stream |
| channel. Drivers respond with a set of `audio_pd_notify_flags_t`, along with a |
| timestamp referenced from the system's monotonic clock indicating the last time |
| the plug state changed. |
| |
| Three valid plug-detect notification flags (PDNF) are currently defined: |
| |
| * `AUDIO_PDNF_HARDWIRED` is set when the stream hardware is considered to be |
| "hardwired". In other words, the stream is considered to be connected as |
| long as the device is published. Examples include a set of built-in |
| speakers, a pair of USB headphones, or a pluggable audio device with no plug |
| detection functionality. |
| * `AUDIO_PDNF_CAN_NOTIFY` is set when the stream hardware is capable of both |
| asynchronously detecting that a device's plug state has changed, and sending |
| a notification message if the client has requested these notifications. |
| * `AUDIO_PDNF_PLUGGED` is set when the stream hardware considers the stream to |
| be currently in the "plugged-in" state. |
| |
| When responding to the `PLUG_DETECT` message, drivers for "hardwired" streams |
| **must not** set the `CAN_NOTIFY` flag, and **must** set the `PLUGGED` flag. |
| Additionally, these drivers **should** always set the plug state time to the |
| time at which the stream device was published by the driver. |
| |
| Applications **may** choose not to receive an acknowledgement of a `PLUG_DETECT` |
| command by setting the `AUDIO_FLAG_NO_ACK` flag on their command. No response |
| message will be sent to the application, regardless of the success or failure of |
| the command. The most common use for this would be when an application wanted to |
| disable asynchronous plug state detection messages and was not actually |
| interested in the current plugged/unplugged state of the stream. |
| |
| ### AUDIO_STREAM_PLUG_DETECT_NOTIFY |
| |
| Applications may request that streams send them asynchronous notifications of |
| plug state changes, using the flags field of the `AUDIO_STREAM_CMD_PLUG_DETECT` |
| command. |
| |
| Two valid flags are currently defined: |
| |
* `AUDIO_PDF_ENABLE_NOTIFICATIONS` is set by clients in order to request that |
| the stream proactively generate `AUDIO_STREAM_PLUG_DETECT_NOTIFY` messages |
| when its plug state changes, if the stream has this capability. |
| * `AUDIO_PDF_DISABLE_NOTIFICATIONS` is set by clients in order to request that |
| NO subsequent `AUDIO_STREAM_PLUG_DETECT_NOTIFY` messages should be sent, |
| regardless of the stream's ability to generate them. |
| |
| In order to request the current plug state without altering the current |
| notification behavior, clients simply set neither `ENABLE` nor `DISABLE` -- |
| passing either 0, or the value `AUDIO_PDF_NONE`. Clients **should not** set both |
| flags at the same time. If they do, drivers **must** interpret this to mean that |
| the final state of the system should be _disabled_. |
| |
| Clients which request asynchronous notifications of plug state changes |
| **should** always check the `CAN_NOTIFY` flag in the driver response. Streams |
| may be capable of plug detection (i.e. if `HARDWIRED` is not set), yet be |
| incapable of detecting plug state changes asynchronously. Clients may still |
| learn of plug state changes, but only by periodically polling the state with |
| `PLUG_DETECT` commands. Drivers for streams which do not set the `CAN_NOTIFY` |
| flag are free to ignore enable/disable notification requests from applications, |
| and **must not** ever send an `AUDIO_STREAM_PLUG_DETECT_NOTIFY` message. Note |
| that even such a driver must always respond to an `AUDIO_STREAM_CMD_PLUG_DETECT` |
| message. |
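| |
| A client might pick its plug-monitoring strategy from the response flags |
| roughly as follows. This is a sketch only: the `SKETCH_*` names and bit values |
| are hypothetical stand-ins for the `HARDWIRED` and `CAN_NOTIFY` flags and are |
| not taken from the real header. |
| |
```c
#include <stdint.h>

/* Placeholder bit values for illustration only; the real plug-state flags
 * come from the audio driver header and may differ. */
#define SKETCH_FLAG_HARDWIRED  ((uint32_t)1u << 0)
#define SKETCH_FLAG_CAN_NOTIFY ((uint32_t)1u << 1)

typedef enum {
  SKETCH_PLUG_STATIC, /* hardwired: the plug state never changes */
  SKETCH_PLUG_NOTIFY, /* wait for asynchronous PLUG_DETECT_NOTIFY messages */
  SKETCH_PLUG_POLL,   /* periodically re-send PLUG_DETECT commands */
} sketch_plug_strategy_t;

/* Chooses how a client learns of plug state changes, given the flags
 * reported in a PLUG_DETECT response. */
sketch_plug_strategy_t sketch_choose_plug_strategy(uint32_t resp_flags) {
  if (resp_flags & SKETCH_FLAG_HARDWIRED) return SKETCH_PLUG_STATIC;
  if (resp_flags & SKETCH_FLAG_CAN_NOTIFY) return SKETCH_PLUG_NOTIFY;
  /* Plug detection exists, but only synchronously: fall back to polling. */
  return SKETCH_PLUG_POLL;
}
```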
| |
| ## Access control capability detection and signaling |
| |
| > TODO: specify how this works. In particular, specify how drivers indicate to |
| > applications support for various digital access control mechanisms such as |
| > S/PDIF control words and HDCP. |
| |
| ## Stream purpose and association |
| |
| > TODO: specify how drivers can indicate the general "purpose" of an audio |
| > stream in the system (if known), as well as its relationship to other streams |
| > (if known). For example, an embedded target like a phone or a tablet needs to |
| > indicate which output stream is the built-in speaker vs. which is the headset |
| > jack output. In addition, it needs to make clear which input stream is the |
| > microphone associated with the headset output vs. the built-in speaker. |
| |
| ## Ring-Buffer Channels |
| |
| ### Overview |
| |
| Once an application has successfully set the format of a stream, it receives in |
| the response a new [channel](/docs/reference/kernel_objects/channel.md) representing its connection |
| to the stream's ring-buffer. Clients use the ring-buffer channel to establish a |
| shared memory buffer and start/stop playback/capture of audio stream data. |
| |
| Once started, stream consumption/production is assumed to proceed at the nominal |
| rate from the point in time given in a successful response to the start command, |
| allowing clients to operate without the need to receive any periodic |
| notifications about consumption/production position from the ring buffer itself. |
| Note that the ring-buffer will almost certainly have some form of FIFO buffer |
| between the memory bus and the audio hardware which causes it to either |
| read-ahead in the stream (in the case of playback), or potentially hold onto |
| data (in the case of capturing). In the case of open-loop operation, it is |
| important for clients to query the size of this buffer before beginning |
| operation so they know how far ahead/behind the stream's nominal inferred |
| read/write position they need to stay in order to prevent audio glitching. |
| |
| Also note that because of the shared buffer nature of the system, and the fact |
| that drivers are likely to be DMA-ing directly from this buffer to hardware, it |
| is important for clients running on architectures which are not automatically |
| cache coherent to be sure that they have properly written back their cache after |
| writing playback data to the buffer, or invalidated their cache before reading |
| captured data. |
| |
| ### Determining the FIFO depth |
| |
| Applications determine a stream's FIFO depth using the |
| `AUDIO_RB_CMD_GET_FIFO_DEPTH` command. Drivers **must** return their FIFO depth, |
| expressed in bytes, in the `fifo_depth` field of the response. To ensure proper |
| playback or capture of audio, applications and drivers must be careful to |
| respect this value. Drivers **must not** read beyond the nominal playback position |
| of the stream plus this number of bytes when playing audio stream data. |
| Applications **must** stay this number of bytes behind the nominal capture point of |
| the stream when capturing audio stream data. |
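| |
| As a rough illustration of how a client operating without position |
| notifications might apply the FIFO depth, the sketch below derives the nominal |
| position from the time elapsed since the start time, then offsets it by the |
| FIFO depth. All function and parameter names are hypothetical, and the ring is |
| assumed to hold an integral number of frames. |
| |
```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NS_PER_SEC UINT64_C(1000000000)

/* Nominal stream position, in bytes from the start of the ring, `elapsed_ns`
 * nanoseconds after the start time reported by the driver. */
uint64_t sketch_nominal_pos_bytes(uint64_t elapsed_ns, uint32_t frames_per_sec,
                                  uint32_t bytes_per_frame, uint64_t ring_frames) {
  assert(ring_frames > 0);
  /* Split into whole seconds and a remainder to avoid 64-bit overflow. */
  uint64_t frames =
      (elapsed_ns / SKETCH_NS_PER_SEC) * frames_per_sec +
      (elapsed_ns % SKETCH_NS_PER_SEC) * frames_per_sec / SKETCH_NS_PER_SEC;
  return (frames % ring_frames) * bytes_per_frame;
}

/* Playback: the driver may already have read up to this ring offset, so a
 * client's write position needs to stay ahead of it. */
uint64_t sketch_playback_read_ahead_pos(uint64_t nominal_pos, uint64_t fifo_depth,
                                        uint64_t ring_bytes) {
  assert(ring_bytes > 0 && nominal_pos < ring_bytes);
  return (nominal_pos + fifo_depth % ring_bytes) % ring_bytes;
}

/* Capture: data before this ring offset has left the FIFO and may be read.
 * The caller must separately make sure that at least `fifo_depth` bytes have
 * elapsed since start; the modular arithmetic here cannot tell. */
uint64_t sketch_capture_safe_pos(uint64_t nominal_pos, uint64_t fifo_depth,
                                 uint64_t ring_bytes) {
  assert(ring_bytes > 0 && nominal_pos < ring_bytes);
  return (nominal_pos + ring_bytes - fifo_depth % ring_bytes) % ring_bytes;
}
```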
| |
| Once the format of a stream is set and a ring-buffer channel has been opened, |
| the driver **must not** change this value. From an application's point of view, |
| it is a constant property of the ring-buffer channel. |
| |
| ### Obtaining a shared buffer |
| |
| To send or receive audio, the application must first establish a shared memory |
| buffer. This is done by sending an `AUDIO_RB_CMD_GET_BUFFER` request over the |
| ring-buffer channel. This may only be done while the ring-buffer is stopped. |
| Applications **must** specify two parameters when requesting a ring buffer: |
| `min_ring_buffer_frames` and `notifications_per_ring`. |
| |
| #### `min_ring_buffer_frames` |
| |
| The minimum number of frames of audio the client needs allocated for the ring |
| buffer. Drivers **may** make this buffer larger to meet hardware requirements. |
| Clients **must** use the returned VMO's size (in bytes) to determine the actual |
| size of the ring buffer. Clients **must not** assume that the size of the buffer |
| (as determined by the driver) is exactly the size they requested. Drivers |
| **must** ensure that the size of the ring buffer is an integral number of audio |
| frames. |
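| |
| One way a driver could honor both rules is sketched below: it rounds the |
| request up to the smallest size that is a whole number of frames and also |
| satisfies a hardware byte alignment. The alignment requirement and all names |
| here are assumptions made for illustration; real hardware constraints vary. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor (Euclid's algorithm). */
uint64_t sketch_gcd(uint64_t a, uint64_t b) {
  while (b != 0) {
    uint64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

/* Smallest ring size, in frames, that holds at least `min_frames` and whose
 * byte length is a multiple of `hw_align_bytes` (a hypothetical hardware
 * requirement). The result is always a whole number of frames. */
uint64_t sketch_ring_frames(uint64_t min_frames, uint32_t bytes_per_frame,
                            uint32_t hw_align_bytes) {
  assert(bytes_per_frame > 0 && hw_align_bytes > 0);
  /* Any valid size is a multiple of lcm(frame size, alignment). */
  uint64_t unit = (uint64_t)bytes_per_frame /
                  sketch_gcd(bytes_per_frame, hw_align_bytes) * hw_align_bytes;
  uint64_t min_bytes = min_frames * bytes_per_frame;
  uint64_t units = (min_bytes + unit - 1) / unit; /* round up */
  return units * unit / bytes_per_frame;
}
```
| |
| With 6-byte frames and a 128-byte alignment, for example, a request for 1000 |
| frames grows to 1024 frames under this scheme. |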
| |
| > TODO: Is it reasonable to require that drivers produce buffers which are an |
| > integral number of audio frames in length? It certainly makes the audio |
| > client's life easier (client code never needs to split or re-assemble a frame |
| > before processing), but it might make it difficult for some audio hardware to |
| > meet its requirements without making the buffer significantly larger than the |
| > client asked for. |
| |
| #### `notifications_per_ring` |
| |
| The number of position update notifications (`audio_rb_position_notify_t`) the |
| client would like the driver to send per cycle through the ring buffer. Drivers |
| should attempt to space notifications uniformly throughout the ring. Clients |
| **must not** rely on perfectly uniform spacing of the update notifications. |
| Clients are not required to request any notifications and may use only start |
| time and FIFO depth information to determine the driver's playout or capture |
| position. |
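| |
| To get a feel for the notification cadence a given request implies, a client |
| could estimate the nominal interval between notifications as sketched below. |
| The function name is hypothetical, and the result is only the ideal spacing; |
| actual arrival times are not guaranteed to be uniform. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Ideal time between position notifications, in nanoseconds, for a ring of
 * `ring_frames` frames played or captured at `frames_per_sec`. Returns 0 when
 * the client requested no notifications. */
uint64_t sketch_notify_period_ns(uint64_t ring_frames, uint32_t frames_per_sec,
                                 uint32_t notifications_per_ring) {
  assert(frames_per_sec > 0);
  if (notifications_per_ring == 0) return 0;
  /* Duration of one full cycle through the ring. */
  uint64_t ring_ns = ring_frames * UINT64_C(1000000000) / frames_per_sec;
  return ring_ns / notifications_per_ring;
}
```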
| |
| Success or failure, drivers **must** respond to a `GET_BUFFER` request using an |
| `audio_rb_cmd_get_buffer_resp_t` message. If the driver fails the request |
| because a buffer has already been established and the ring-buffer has already |
| been started, it **must not** stop the ring-buffer or discard the |
| existing shared memory. If the application requests a new buffer after having |
| already established a buffer while the ring buffer is stopped, it **must** |
| consider its existing buffer to be invalid. Success or failure, the old |
| buffer is now gone. |
| |
| If the request succeeds, the driver **must** return a handle to a |
| [VMO](/docs/reference/kernel_objects/vm_object.md) with permissions which allow applications to map |
| the VMO into their address space using [zx_vmar_map](/docs/reference/syscalls/vmar_map.md), |
| and to read/write data in the buffer in the case of playback, or simply to read |
| the data in the buffer in the case of capture. Additionally, the driver **must** |
| report the actual number of frames of audio it will use in the buffer via the |
| `num_ring_buffer_frames` field of the `audio_rb_cmd_get_buffer_resp_t` message. |
| The size of the VMO returned (as reported by |
| [zx_vmo_get_size()](/docs/reference/syscalls/vmo_get_size.md)) **must not** be larger than |
| this number of frames (when converted to bytes). `num_ring_buffer_frames` **may** |
| be larger than the `min_ring_buffer_frames` requested by the client but |
| **must not** be smaller. |
| |
| ### Starting and Stopping the ring-buffer |
| |
| Clients may request that a ring-buffer start or stop using the |
| `AUDIO_RB_CMD_START` and `AUDIO_RB_CMD_STOP` commands. Success or failure, |
| drivers **must** send a response to these requests. Attempting to start a stream |
| which is already started **must** be considered a failure. Attempting to stop a |
| stream which is already stopped **should** be considered a success. Ring-buffers |
| cannot be either stopped or started until after a shared buffer has been |
| established using the `GET_BUFFER` operation. |
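| |
| The start/stop rules above amount to a small state machine. The sketch below |
| models only the success/failure outcome; the type and function names are |
| hypothetical and no real driver API is implied. |
| |
```c
#include <stdbool.h>

typedef enum {
  SKETCH_RB_NO_BUFFER, /* GET_BUFFER has not yet succeeded */
  SKETCH_RB_STOPPED,   /* buffer established, ring-buffer stopped */
  SKETCH_RB_STARTED,   /* ring-buffer running */
} sketch_rb_state_t;

/* Handles a start request. Fails if no buffer has been established or if the
 * ring-buffer is already started. */
bool sketch_rb_start(sketch_rb_state_t* state) {
  if (*state != SKETCH_RB_STOPPED) return false;
  *state = SKETCH_RB_STARTED;
  return true;
}

/* Handles a stop request. Fails only if no buffer has been established;
 * stopping an already-stopped ring-buffer is treated as success. */
bool sketch_rb_stop(sketch_rb_state_t* state) {
  if (*state == SKETCH_RB_NO_BUFFER) return false;
  *state = SKETCH_RB_STOPPED;
  return true;
}
```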
| |
| Upon successfully starting a stream, drivers **must** provide their best |
| estimate of the time at which their hardware began to transmit or capture the |
| stream in the `start_time` field of the response. This time stamp **must** be |
| taken from the clock exposed via the |
| [`zx_clock_get_monotonic()`](/docs/reference/syscalls/clock_get_monotonic.md) |
| syscall. Along with the FIFO depth property of the ring buffer, this timestamp |
| allows applications to send or receive stream data without the need for periodic |
| position updates from the driver. Along with the outboard latency estimate |
| provided by the stream channel, this timestamp allows applications to |
| synchronize presentation of audio information across multiple streams, or even |
| multiple devices (provided that an external time synchronization protocol is |
| used to synchronize the [clock |
| monotonic](/docs/reference/syscalls/clock_get_monotonic.md) timelines across the |
| cohort of synchronized devices). |
| |
| > TODO: Redefine `start_time` to allow it to be an arbitrary 'audio stream |
| > clock' instead of the `zx_clock_get_monotonic()` clock. If the stream clock is |
| > made to count in audio frames since start, then this `start_time` can be |
| > replaced with the terms for a segment of a piecewise linear transformation |
| > which can be subsequently updated via notifications sent by the driver in the |
| > case that the audio hardware clock is rooted in a different oscillator from |
| > the system's tick counter. Clients can then use this transformation either to |
| > control the rate of consumption of input streams, or to determine where to |
| > sample in the input stream to effect clock correction. |
| |
| Upon successfully starting a stream, drivers **must** guarantee that no position |
| notifications will be sent before the start response has been enqueued into the |
| ring-buffer channel. |
| |
| Upon successfully stopping a stream, drivers **must** guarantee that no position |
| notifications will be enqueued into the ring-buffer channel after the stop |
| response has been enqueued. |
| |
| ### Position notifications |
| |
| If requested by the client during the `GET_BUFFER` operation, the driver will |
| periodically send updates to the client informing it of its current production |
| or consumption position in the buffer. This position is expressed in bytes in |
| the `ring_buffer_pos` field of the `audio_rb_position_notify_t` message. The |
| message also includes a `monotonic_time` field that contains the time (as a |
| `zx_time_t`) at which this byte position was valid. `AUDIO_RB_POSITION_NOTIFY` |
| messages **must** only be sent while the ring-buffer is started. Note, these position |
| notifications indicate where in the buffer the driver has consumed or produced |
| data, *not* the nominal playback or capture position (sometimes called the |
| "write cursor" or "read cursor" respectively). The timing of their arrival is |
| not guaranteed to be perfectly uniform and should not be used to effect clock |
| recovery. However, the corresponding (`monotonic_time`, `ring_buffer_pos`) value |
| pairs themselves ARE intended to be used to recover the clock for the audio |
| stream. If a client discovers that a driver has consumed past the point in the |
| ring buffer where that client has written playback data, audio presentation is |
| undefined. Clients should increase their clock lead time and be certain to stay |
| ahead of this point in the stream in the future. Likewise, clients which capture |
| audio **should not** attempt to read beyond the point in the ring buffer |
| indicated by the most recent position notification sent by the driver. |
| |
| Driver playback/capture position **must** *always* begin at ring buffer byte 0, |
| immediately following a successful `AUDIO_RB_CMD_START` command. When the ring |
| buffer position reaches the end of the VMO (as indicated by |
| [zx_vmo_get_size(...)](/docs/reference/syscalls/vmo_get_size.md)), the ring buffer position |
| wraps back to zero. Drivers are not required to consume or produce data in |
| integral numbers of audio frames. Clients whose notion of stream position |
| depends on position notifications should take care to request that a sufficient |
| number of notifications per ring be sent (minimum 2) and to process them quickly |
| enough that aliasing does not occur. |
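| |
| The aliasing concern can be made concrete with a small sketch. Because |
| `ring_buffer_pos` wraps, a client can only recover the distance between two |
| notifications modulo the ring size, and it can turn pairs of |
| (`monotonic_time`, accumulated bytes) into a rate estimate. The helper names |
| below are hypothetical. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Bytes the driver advanced between two notifications, assuming it advanced
 * by less than one full ring. If a whole ring (or more) elapsed, the result
 * aliases: a full cycle is indistinguishable from no progress at all. */
uint64_t sketch_unwrap_delta(uint64_t prev_pos, uint64_t pos, uint64_t ring_bytes) {
  assert(ring_bytes > 0 && prev_pos < ring_bytes && pos < ring_bytes);
  return (pos + ring_bytes - prev_pos) % ring_bytes;
}

/* Average rate, in bytes per second, between two observations of
 * (monotonic time in nanoseconds, accumulated byte count). */
double sketch_rate_bytes_per_sec(int64_t t0_ns, uint64_t bytes0,
                                 int64_t t1_ns, uint64_t bytes1) {
  assert(t1_ns > t0_ns && bytes1 >= bytes0);
  return (double)(bytes1 - bytes0) * 1e9 / (double)(t1_ns - t0_ns);
}
```
| |
| With a single notification per ring, every notification reports the same |
| position and the delta is always zero, which is why at least two per ring are |
| needed for this approach to work. |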
| |
| ### Clock recovery |
| |
| > TODO: rewrite this section to include how clock recovery occurs, and how this |
| > is exposed to clients. Also, detail how slewable oscillators are discovered |
| > and controlled. We may need rate-change notifications to clients of slewable |
| > clocks. |
| > |
| > Previous content: TODO: define a way that clock recovery information can be |
| > sent to clients in the case that the audio output oscillator is not derived |
| > from the monotonic clock's oscillator. In addition, if the oscillator is |
| > slew-able in hardware, provide the ability to discover this capability and |
| > control the slew rate. Given the fact that this oscillator is likely to be |
| > shared by multiple streams, it might be best to return some form of system |
| > wide clock identifier and provide the ability to obtain a channel on which |
| > clock recovery notifications can be delivered to clients and HW slewing |
| > commands can be sent from clients to the clock. |
| |
| ### Error notifications |
| |
| > TODO: define these and what driver behavior should be, if/when they occur. |
| |
| ### Unexpected client termination |
| |
| If the client side of a ring buffer control channel is closed for any reason, |
| drivers **must** immediately close the control channel and shut down the ring |
| buffer, such that no further audio is emitted nor captured. While drivers are |
| encouraged to do so in a way which produces a graceful transition to silence, |
| they **must** ensure that the audio stream goes silent instead of looping. Once |
| the transition to silence is complete, resources associated with playback or |
| capture **may** be released and reused by the driver. |
| |
| This way, if a playback client terminates unexpectedly, the system will close the |
| client channels, causing audio playback to stop instead of continuing to loop. |