<!--
(C) Copyright 2018 The Fuchsia Authors. All rights reserved.
Use of this source code is governed by a BSD-style license that can be
found in the LICENSE file.
-->

# Hardware Interfacing

This document is part of the [Driver Development Kit tutorial](ddk-tutorial.md) documentation.

## Overview

In past chapters, we saw how the protocol stack was organized within a devhost,
and some of the work that goes into binding the individual driver protocols into
a device driver.

In this section, we'll look at practical considerations of dealing with hardware
such as determining configuration, binding to interrupts, allocating memory,
and performing DMA operations.

Here, we'll look at the concepts involved, and show snippets of code as required.
Complete working code is shown in subsequent chapters (e.g., [Ethernet Devices](ethernet.md)).

For the most part, we'll focus on the PCI bus, and we'll cover the following
functions:

* Access related:
  * **pci_map_bar()**
* Interrupt related:
  * **pci_map_interrupt()**
  * **pci_query_irq_mode()**
  * **pci_set_irq_mode()**
* DMA related:
  * **pci_enable_bus_master()**
  * **pci_get_bti()**

# Configuration

Hardware peripherals are attached to the CPU via a bus, such as the PCI bus.

During bootup, the BIOS (or equivalent platform startup software)
discovers all of the peripherals attached to the PCI bus.
Each peripheral is assigned resources (notably interrupt vectors,
and address ranges for configuration registers).

The impact of this is that the actual resources assigned to each peripheral may
be different across reboots.
When the operating system software starts up, it enumerates
the bus and starts drivers for all supported devices.
The drivers then call PCI functions in order to obtain configuration information about
their device(s) so that they can map registers and bind to interrupts.

## Base address register

The Base Address Register (**BAR**) is a configuration register that exists on each
PCI device.
It's where the BIOS stores information about the device, such as the assigned interrupt vector
and the addresses of its control registers.
Other device-specific information is stored there as well.

Call **pci_map_bar()**
to map the region described by a BAR into the devhost's address space:

```c
zx_status_t pci_map_bar(const pci_protocol_t* pci, uint32_t bar_id,
                        uint32_t cache_policy, void** vaddr, size_t* size,
                        zx_handle_t* out_handle);
```

The first parameter, `pci`, is a pointer to the PCI protocol.
Typically, you obtain this in your **bind()** function via
**device_get_protocol()**.

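For example, a **bind()** hook might fetch the protocol like this (a minimal sketch;
the surrounding driver structure and error handling are assumed):

```c
static zx_status_t my_bind(void* ctx, zx_device_t* parent) {
    pci_protocol_t pci;

    // Ask our parent device for the PCI protocol.
    zx_status_t status = device_get_protocol(parent, ZX_PROTOCOL_PCI, &pci);
    if (status != ZX_OK) {
        return status;  // the parent doesn't speak PCI
    }

    // ... stash `pci` in the device context and continue binding ...
    return ZX_OK;
}
```
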
The second parameter, `bar_id`, is the BAR register number, starting with `0`.

The third parameter, `cache_policy`, determines the caching policy for access,
and can take on the following values:

`cache_policy` value               | Meaning
-----------------------------------|---------------------------------------------
`ZX_CACHE_POLICY_CACHED`           | use hardware caching
`ZX_CACHE_POLICY_UNCACHED`         | disable caching
`ZX_CACHE_POLICY_UNCACHED_DEVICE`  | disable caching, and treat as device memory
`ZX_CACHE_POLICY_WRITE_COMBINING`  | uncached with write combining

Note that `ZX_CACHE_POLICY_UNCACHED_DEVICE` is architecture dependent
and may in fact be equivalent to `ZX_CACHE_POLICY_UNCACHED` on some architectures.

The next three arguments are return values.
`vaddr` and `size` return a pointer to (and the size of) the register region, while
`out_handle` stores the created handle to the
[VMO](/docs/reference/kernel_objects/vm_object.md).

## Reading and writing memory

Once the **pci_map_bar()**
function returns with a valid result, you can access the BAR via simple pointer
operations, for example:

```c
volatile uint32_t* base;
size_t size;
zx_handle_t handle;
...
zx_status_t rc;
rc = pci_map_bar(dev->pci, 0, ZX_CACHE_POLICY_UNCACHED_DEVICE, (void**)&base, &size, &handle);
if (rc == ZX_OK) {
    base[REGISTER_X] = 0x1234;  // configure register X for deep sleep mode
}
```

It's important to declare `base` as `volatile` — this tells the compiler not to
make any assumptions about the contents of the data that `base` points to.
For example:

```c
int timeout = 1000;
while (timeout-- > 0 && !(base[REGISTER_READY] & READY_BIT)) ;
```

is a typical (bounded) polling loop, intended for short polling sequences.
Without the `volatile` keyword in the declaration, the compiler would have no reason
to believe that the value at `base[REGISTER_READY]` could ever change, so it would
be free to read it just once and hoist the read out of the loop.

# Interrupts

An interrupt is an asynchronous event, generated by a device when it needs servicing.
For example, an interrupt is generated when data is available on a serial port,
or an ethernet packet has arrived.
Interrupts allow a driver to know about an event as soon as it
occurs, but without the driver spending time polling (actively waiting) for it.

The general architecture of a driver that uses interrupts is that a background
Interrupt Handling Thread (**IHT**) is created during the driver startup / binding
operation.
This thread waits for an interrupt to happen, and, when it does, performs some
kind of servicing action.

As an example, consider a serial port driver.
It may receive interrupts due to any of the following events happening:

* one or more characters have arrived,
* room is now available to transmit one or more characters,
* a control line (like `DTR`, for example) has changed state.

The interrupt wakes up the IHT.
The IHT determines the cause of the event, usually by reading some status registers.
Then, it runs an appropriate service function to handle the event.
Once done, the IHT goes back to sleep, waiting for the next interrupt.

For example, if a character arrives, the IHT wakes up, reads a status register that
indicates "data is available," and then calls a function that drains all available
characters from the serial port FIFO into the driver's buffer.

## No kernel-level code required

You may be familiar with other operating systems which use Interrupt
Service Routines (**ISR**).
These are kernel-level handlers that run in privileged mode and interface with
the interrupt controller hardware.

In Fuchsia, the kernel deals with the privileged part of the interrupt
handling, and provides thread-level functions for driver use.

The difference is that the IHT runs at thread level, whereas the ISR runs
at kernel level in a very restricted (and sometimes fragile) environment.
A principal advantage is that if the IHT crashes, it takes out only the
driver, whereas a failing ISR can take out the entire operating system.

## Attaching to an interrupt

Currently, the only bus that provides interrupts is the PCI bus.
It supports two kinds: legacy and Message Signaled Interrupts (**MSI**).

Therefore, in order to use interrupts on PCI:

1. determine which kind your device supports (legacy or MSI),
2. set the interrupt mode to match,
3. get a handle to your device's interrupt vector (usually one, but there may be multiple),
4. start an IHT background thread,
5. arrange for the IHT to wait for interrupts (on the handle(s) from step 3).

Steps `1` and `2` are usually done closely together, for example:

```c
// Query whether we have MSI or Legacy interrupts.
uint32_t irq_cnt = 0;
if ((pci_query_irq_mode(&edev->pci, ZX_PCIE_IRQ_MODE_MSI, &irq_cnt) == ZX_OK) &&
    (pci_set_irq_mode(&edev->pci, ZX_PCIE_IRQ_MODE_MSI, 1) == ZX_OK)) {
    // using MSI interrupts
} else if ((pci_query_irq_mode(&edev->pci, ZX_PCIE_IRQ_MODE_LEGACY, &irq_cnt) == ZX_OK) &&
           (pci_set_irq_mode(&edev->pci, ZX_PCIE_IRQ_MODE_LEGACY, 1) == ZX_OK)) {
    // using legacy interrupts
} else {
    // an error
}
```

The **pci_query_irq_mode()**
function takes three arguments:

```c
zx_status_t pci_query_irq_mode(const pci_protocol_t* pci,
                               zx_pci_irq_mode_t mode,
                               uint32_t* out_max_irqs);
```

The first argument, `pci`, is a pointer to the PCI protocol stack bound to your device,
just like we saw above in the BAR documentation.

The second argument, `mode`, is the kind of interrupt that you are interested in;
it's one of the two constants shown in the example.

The third argument is a pointer to an integer that returns how many
interrupts of the specified type your device supports.

Having determined the kind of interrupt supported, you then call
**pci_set_irq_mode()**
to indicate that this is indeed the kind of interrupt that you wish to use.

Finally, you call **pci_map_interrupt()**
to create a handle to the selected interrupt. Note that
**pci_map_interrupt()** has the following prototype:

```c
zx_status_t pci_map_interrupt(const pci_protocol_t* pci,
                              int which_irq,
                              zx_handle_t* out_handle);
```

The first argument is the same as in the previous call.
The second argument, `which_irq`, indicates the device-relative interrupt number you'd like,
and the third argument is a pointer to the created interrupt handle.

You now have an interrupt handle.

> Note that the vast majority of devices have just one interrupt, so simply passing
> `0` for `which_irq` is normal.
> If your device does have more than one interrupt, the common practice is to run the
> **pci_map_interrupt()** function in a `for` loop
> and bind handles to each interrupt, as sketched below.

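The following is a minimal sketch of that loop; the `irq_handles` array in the device
context and the `irq_cnt` value (from **pci_query_irq_mode()**) are assumed names:

```c
// Map each of the device's interrupts and save the handles for later use.
for (uint32_t i = 0; i < irq_cnt; i++) {
    zx_status_t status = pci_map_interrupt(&dev->pci, i, &dev->irq_handles[i]);
    if (status != ZX_OK) {
        // Clean up any handles mapped so far and fail the bind.
        return status;
    }
}
```
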
## Waiting for the interrupt

In your IHT, you call [**zx_interrupt_wait()**](/docs/reference/syscalls/interrupt_wait.md)
to wait for the interrupt.
The following prototype applies:

```c
zx_status_t zx_interrupt_wait(zx_handle_t handle,
                              zx_time_t* out_timestamp);
```

The first argument is the handle you obtained via the call to
**pci_map_interrupt()**,
and the second parameter can be `NULL` (typical), or it can be a pointer to a time
stamp that indicates when the interrupt was triggered (in nanoseconds,
relative to the clock source `ZX_CLOCK_MONOTONIC`).

Therefore, a typical IHT would have the following shape:

```c
static int irq_thread(void* arg) {
    my_device_t* dev = arg;
    for (;;) {
        zx_status_t rc;
        rc = zx_interrupt_wait(dev->irq_handle, NULL);
        // do stuff
    }
}
```

The convention is that the argument passed to the IHT is your device context block.
The context block has a member (here `irq_handle`) that is the handle you obtained via
**pci_map_interrupt()**.

## Edge vs level interrupt mode

The interrupt hardware can operate in one of two modes: "edge" or "level".

In edge mode, the interrupt is armed on the active-going edge (when the hardware
signal goes from inactive to active), and works as a one-shot.
That is, the signal must go back to inactive before it can be recognized again.

In level mode, the interrupt is active while the hardware signal is in the
active state.

Typically, edge mode is used when the interrupt is dedicated, and level mode is
used when the interrupt is shared by multiple devices (because you want the
interrupt to remain active until *all* devices have de-asserted their request line).

The Zircon kernel automatically masks and unmasks the interrupt as appropriate.
For level-triggered hardware interrupts,
[**zx_interrupt_wait()**](/docs/reference/syscalls/interrupt_wait.md)
masks the interrupt before returning, and unmasks it when called the next time.
For edge-triggered interrupts, the interrupt remains unmasked.

> The IHT should not perform any long-running tasks.
> For drivers that perform lengthy tasks, use a worker thread.

## Shutting down a driver that uses interrupts

In order to cleanly shut down a driver that uses interrupts, you can use
[**zx_interrupt_destroy()**](/docs/reference/syscalls/interrupt_destroy.md)
to abort the
[**zx_interrupt_wait()**](/docs/reference/syscalls/interrupt_wait.md)
call.

The idea is that when the foreground thread determines that the driver should be
shut down, it simply destroys the interrupt handle, causing the IHT to shut down:

```c
static void main_thread() {
    ...
    if (shutdown_requested) {
        // destroy the handle, this will cause zx_interrupt_wait() to pop
        zx_interrupt_destroy(dev->irq_handle);

        // wait for the IHT to finish
        thrd_join(dev->iht, NULL);
    }
    ...
}

static int irq_thread(void* arg) {
    ...
    for (;;) {
        zx_status_t rc;
        rc = zx_interrupt_wait(dev->irq_handle, NULL);
        if (rc == ZX_ERR_CANCELED) {
            // we are being shut down, do any cleanups required
            ...
            return 0;
        }
        ...
    }
}
```

The main thread, when requested to shut down, destroys the interrupt handle.
This causes the IHT's
[**zx_interrupt_wait()**](/docs/reference/syscalls/interrupt_wait.md)
call to wake up with an error code.
The IHT looks at the error code (in this case, `ZX_ERR_CANCELED`) and makes
the decision to end.
Meanwhile, the main thread is waiting to join the IHT via the call
to **thrd_join()**.
Once the IHT exits, **thrd_join()** returns, and the main
thread can finish its processing.

The advanced reader is invited to look at some of the other interrupt related
functions available:

* [**zx_interrupt_ack()**](/docs/reference/syscalls/interrupt_ack.md)
* [**zx_interrupt_bind()**](/docs/reference/syscalls/interrupt_bind.md)
* [**zx_interrupt_create()**](/docs/reference/syscalls/interrupt_create.md)
* [**zx_interrupt_trigger()**](/docs/reference/syscalls/interrupt_trigger.md)

# DMA

Direct Memory Access (**DMA**) is a feature that allows hardware to access
memory without CPU intervention.
At the highest level, the hardware is given the source and destination of the
memory region to transfer (along with its size) and told to copy the data.
Some hardware peripherals even support the ability to do multiple
"scatter / gather" style operations, where several copy operations
can be performed, one after the other, without additional CPU intervention.

## DMA considerations

In order to fully appreciate the issues involved, it's important to
keep the following in mind:

* each process operates in a virtual address space,
* an MMU can map a contiguous virtual address range onto multiple,
  discontiguous physical address ranges (and vice-versa),
* each process has a limited window into physical address space,
* some peripherals support their own virtual addresses
  via an Input / Output Memory Management Unit (**IOMMU**).

Let's discuss each point in turn.

### Virtual, physical, and device-physical addresses

The addresses that the process has access to are virtual; that is, they are
an illusion created by the CPU's Memory Management Unit (**MMU**).
A virtual address is mapped by the MMU into a physical address.
The mapping granularity is based on a parameter called "page size," which
is at least 4k bytes, though larger sizes are available on modern processors.



In the diagram above, we show a specific process (process 12) with a number of
virtual addresses (in blue).
The MMU is responsible for mapping the blue virtual addresses into CPU physical
bus addresses (red).
Each process has its own mapping; so even though process 12 has a virtual address
`300`, some other process may also have a virtual address `300`.
That other process's virtual address `300` (if it exists) would be mapped
to a different physical address than the one in process 12.

> Note that we've used small decimal numbers as "addresses" to keep the discussion simple.
> In reality, each square shown above represents a page of memory (4k or more),
> and is identified by a 32 or 64 bit value (depending on the platform).

The key points shown in the diagram are:

1. virtual addresses can be allocated in groups (three are shown, `300`-`303`, `420`-`421`,
   and `770`-`771`),
2. virtually contiguous (e.g., `300`-`303`) is not necessarily physically contiguous,
3. some virtual addresses are not mapped (for example, there is no virtual address
   `304`),
4. not all physical addresses are available to each process (for example, process
   `12` doesn't have access to physical address `120`).

Depending on the hardware available on the platform, a device's address space
may or may not follow a similar translation.
Without an IOMMU, the addresses that the peripheral uses are the same as
the physical addresses used by the CPU:



In the diagram above, portions of the device's address space (for example, a
frame buffer, or control registers), appear directly in the CPU's physical
address range.
That is to say, the device occupies physical addresses `122` through `125`
inclusive.

In order for the process to access the device's memory, it would need to create
an MMU mapping from some virtual addresses to the physical addresses `122` through
`125`.
We'll see how to do that below.

But with an IOMMU, the addresses seen by a peripheral may be different than
the CPU's physical addresses:



Here, the device has its own "device-physical" addresses that it knows about,
that is, addresses `0` through `3` inclusive.
It's up to the IOMMU to map the device-physical addresses `0` through `3`
into CPU physical addresses `109`, `110`, `101`, and `119`, respectively.

In this scenario, in order for the process to use the device's memory, it needs
to arrange two mappings:

* one set from the virtual address space (e.g., `300` through `303`) to the
  CPU physical address space (`109`, `110`, `101`, and `119`, respectively),
  via the MMU, and
* one set from the CPU physical address space (addresses `109`, `110`, `101`,
  and `119`) to the device-physical addresses (`0` through `3`) via the IOMMU.

While this may seem complicated, Zircon provides an abstraction that removes
the complexity.

Also, as we'll see below, the reason for having an IOMMU, and the benefits provided,
are similar to those obtained by having an MMU.

### Contiguity of memory

When you allocate a large chunk of memory (e.g. via **calloc()**),
your process will, of course, see a large, contiguous virtual address range.
The MMU creates the illusion of contiguous memory at the virtual addressing
level, even though the MMU may choose to back that memory area with physically
discontiguous memory at the physical address level.

Furthermore, as processes allocate and deallocate memory, the mapping of
physical memory to virtual address space tends to become more
complex, causing more "swiss cheese" holes to appear (that is,
more discontiguities in the mapping).

Therefore, it's important to keep in mind that contiguous virtual addresses
are not necessarily contiguous physical addresses, and indeed that contiguous
physical memory becomes more precious over time.

### Access controls

Another benefit of the MMU is that processes are limited in their view of
physical memory (for security and reliability reasons).
The impact on drivers, though, is that a process has to specifically request
a mapping from virtual address space to physical address space, and
have the requisite privilege in order to do so.

### IOMMU

Contiguous physical memory is generally preferred.
It's more efficient to do one transfer (with one source address and one
destination address) than it is to set up and manage multiple individual
transfers (which may require CPU intervention between each transfer in
order to set up the next one).

The IOMMU, if available, alleviates this problem by doing the same thing for
the peripherals that the CPU's MMU does for the process — it gives the peripheral
the illusion that it's dealing with a contiguous address space by
mapping multiple discontiguous chunks into a virtually contiguous space.
By limiting the mapping region, the IOMMU also provides security (in the same way as
the MMU does), by preventing the peripheral from accessing memory that's not "in scope"
for the current operation.

### Tying it all together

So, it may appear that you need to worry about virtual, physical, and device-physical
address spaces when you are writing your driver.
But that's not the case.

## DMA and your driver

Zircon provides a set of functions that allow you to cleanly deal with all of the
above.
The following work together:

* a Bus Transaction Initiator ([BTI](/docs/reference/kernel_objects/bus_transaction_initiator.md)), and
* a Virtual Memory Object ([VMO](/docs/reference/kernel_objects/vm_object.md)).

The [BTI](/docs/reference/kernel_objects/bus_transaction_initiator.md)
kernel object provides an abstraction of the model, and an API to deal with
physical (or device-physical) addresses associated with
[VMO](/docs/reference/kernel_objects/vm_object.md)s.

In your driver's initialization, call
**pci_get_bti()**
to obtain a [BTI](/docs/reference/kernel_objects/bus_transaction_initiator.md) handle:

```c
zx_status_t pci_get_bti(const pci_protocol_t* pci,
                        uint32_t index,
                        zx_handle_t* bti_handle);
```

The **pci_get_bti()**
function takes a `pci` protocol pointer (just like all the other **pci_...()** functions
discussed above) and an `index` (reserved for future use, use `0`).
It returns a [BTI](/docs/reference/kernel_objects/bus_transaction_initiator.md)
handle through the `bti_handle` pointer argument.

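For example (a sketch; `dev->pci` and `dev->bti_handle` are assumed fields in your
device context):

```c
// During initialization: get the BTI for this device.
zx_status_t status = pci_get_bti(&dev->pci, 0, &dev->bti_handle);
if (status != ZX_OK) {
    // fail initialization
}
```
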
Next, you need a [VMO](/docs/reference/kernel_objects/vm_object.md).
Simplistically, you can think of the [VMO](/docs/reference/kernel_objects/vm_object.md)
as a pointer to a chunk of memory,
but it's more than that — it's a kernel object that represents a set
of virtual pages (that may or may not have physical pages committed to them),
which can be mapped into the virtual address space of the driver process.
(It's even more than that, but that's a discussion for a different chapter.)

Ultimately, these pages serve as the source or destination of the DMA transfer.

There are two functions,
[**zx_vmo_create()**](/docs/reference/syscalls/vmo_create.md)
and
[**zx_vmo_create_contiguous()**](/docs/reference/syscalls/vmo_create_contiguous.md),
that allocate memory and bind it to a [VMO](/docs/reference/kernel_objects/vm_object.md):

```c
zx_status_t zx_vmo_create(uint64_t size,
                          uint32_t options,
                          zx_handle_t* out);

zx_status_t zx_vmo_create_contiguous(zx_handle_t bti,
                                     size_t size,
                                     uint32_t alignment_log2,
                                     zx_handle_t* out);
```

As you can see, they both take a `size` parameter indicating the number of bytes required,
and they both return a [VMO](/docs/reference/kernel_objects/vm_object.md) (via `out`).
They both allocate virtually contiguous pages for the given size.

| |
| > Note that this differs from the standard C library memory allocation functions, |
| > (e.g., **malloc()**), which allocate virtually contiguous memory, but without |
| > regard to page boundaries. Two small **malloc()** calls in a row might allocate |
| > two memory regions from the *same* page, for instance, whereas |
| > the [VMO](/docs/reference/kernel_objects/vm_object.md) |
| > creation functions will always allocate memory starting with a *new* page. |
| |
| The |
| [**zx_vmo_create_contiguous()**](/docs/reference/syscalls/vmo_create_contiguous.md) |
| function does what |
| [**zx_vmo_create()**](/docs/reference/syscalls/vmo_create.md) |
| does, *and* ensures that the pages are suitably |
| organized for use with the specified [BTI](/docs/reference/kernel_objects/bus_transaction_initiator.md) |
| (which is why it needs the [BTI](/docs/reference/kernel_objects/bus_transaction_initiator.md) handle). |
| It also features an `alignment_log2` parameter that can be used to specify a minimum |
| alignment requirement. |
| As the name suggests, it must be an integer power of 2 (with the value `0` indicating |
| page aligned). |
| |
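For example, a driver might allocate a DMA buffer like this (a sketch; `DMA_BUF_SIZE`
and the `dev->bti_handle` field are assumed names):

```c
zx_handle_t dma_vmo;

// Allocate a physically contiguous, page-aligned buffer usable with our BTI.
zx_status_t status = zx_vmo_create_contiguous(dev->bti_handle, DMA_BUF_SIZE, 0, &dma_vmo);
if (status != ZX_OK) {
    // handle the allocation failure
}
```
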
At this point, you have two "views" of the allocated memory:

* one contiguous virtual address space that represents memory
  from the point of view of the driver, and
* a set of physical pages (which may or may not be contiguous, and may not
  yet be committed) for use by the peripheral.

Before using these pages, you need to ensure that they are present in memory (that is,
"committed" — the physical pages are accessible to your process), and that the
peripheral has access to them (via the IOMMU if present).
You will also need the addresses of the pages (from the point of view of the device)
so that you can program the DMA controller on your device to access them.


The
[**zx_bti_pin()**](/docs/reference/syscalls/bti_pin.md)
function is used to do all that:

```c
#include <zircon/syscalls.h>

zx_status_t zx_bti_pin(zx_handle_t bti, uint32_t options,
                       zx_handle_t vmo, uint64_t offset, uint64_t size,
                       zx_paddr_t* addrs, size_t addrs_count,
                       zx_handle_t* pmt);
```

There are 8 parameters to this function:

Parameter       | Purpose
----------------|------------------------------------
`bti`           | the [BTI](/docs/reference/kernel_objects/bus_transaction_initiator.md) for this peripheral
`options`       | options (see below)
`vmo`           | the [VMO](/docs/reference/kernel_objects/vm_object.md) for this memory region
`offset`        | offset from the start of the [VMO](/docs/reference/kernel_objects/vm_object.md)
`size`          | total number of bytes in the [VMO](/docs/reference/kernel_objects/vm_object.md)
`addrs`         | list of return addresses
`addrs_count`   | number of elements in `addrs`
`pmt`           | returned [PMT](/docs/reference/kernel_objects/pinned_memory_token.md) (see below)

The `addrs` parameter is a pointer to an array of `zx_paddr_t` that you supply.
This is where the peripheral addresses for each page are returned.
The array is `addrs_count` elements long, and `addrs_count` must match the number of
entries that
[**zx_bti_pin()**](/docs/reference/syscalls/bti_pin.md)
is expected to return.

> The values written into `addrs` are suitable for programming the peripheral's
> DMA controller — that is, they take into account any translations that
> may be performed by an IOMMU, if present.

On a technical note, the other effect of
[**zx_bti_pin()**](/docs/reference/syscalls/bti_pin.md)
is that the kernel will ensure those pages are not decommitted
(i.e., moved or reused) while pinned.

The `options` argument is actually a bitmap of options:

Option                   | Purpose
-------------------------|--------------------------------
`ZX_BTI_PERM_READ`       | pages can be read by the peripheral (written by the driver)
`ZX_BTI_PERM_WRITE`      | pages can be written by the peripheral (read by the driver)
`ZX_BTI_COMPRESS`        | (see "Minimum contiguity property," below)

For example, refer to the diagrams above showing "Device #3".
If an IOMMU is present, `addrs` would contain `0`, `1`, `2`, and `3` (that is,
the device-physical addresses).
If no IOMMU is present, `addrs` would contain `109`, `110`, `101`, and `119` (that is,
the physical addresses).

### Permissions

Keep in mind that the permissions are from the perspective
*of the peripheral*, and not the driver.
For example, in a block device **write** operation, the device **reads** from memory pages, and
therefore the driver specifies `ZX_BTI_PERM_READ`; the reverse applies to a block device **read**.

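A sketch of that flag choice (the `is_write` flag describing the block operation is an
assumed name):

```c
// For a block WRITE, the device reads from memory; for a block READ, it writes to memory.
uint32_t options = is_write ? ZX_BTI_PERM_READ : ZX_BTI_PERM_WRITE;
```
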
### Minimum contiguity property

By default, each entry returned through `addrs` covers one page.
Larger chunks may be requested by setting the `ZX_BTI_COMPRESS` option
in the `options` argument.
In that case, the length of each entry returned corresponds to the "minimum contiguity" property.
While you can't set this property, you can read it via
[**zx_object_get_info()**](/docs/reference/syscalls/object_get_info.md).
Effectively, the minimum contiguity property is a guarantee that
[**zx_bti_pin()**](/docs/reference/syscalls/bti_pin.md)
will always be able to return addresses that are contiguous for at least that many bytes.

For example, if the property had the value 1MB, then a call to
[**zx_bti_pin()**](/docs/reference/syscalls/bti_pin.md)
with a requested size of 2MB would return at most two physically-contiguous runs.
If the requested size was 2.5MB, it would return at most three physically-contiguous runs,
and so on.

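For instance, the property could be read like this (a sketch; it assumes the
`ZX_INFO_BTI` topic and `zx_info_bti_t` structure from `zircon/syscalls/object.h`,
and the `dev->bti_handle` field is an assumed name):

```c
#include <zircon/syscalls.h>
#include <zircon/syscalls/object.h>

// Read the BTI's minimum contiguity guarantee.
zx_info_bti_t info;
zx_status_t status = zx_object_get_info(dev->bti_handle, ZX_INFO_BTI,
                                        &info, sizeof(info), NULL, NULL);
if (status == ZX_OK) {
    // info.minimum_contiguity is the guaranteed contiguous run length, in bytes.
}
```
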
### Pinned Memory Token (PMT)

[**zx_bti_pin()**](/docs/reference/syscalls/bti_pin.md) returns a Pinned Memory Token
([PMT](/docs/reference/kernel_objects/pinned_memory_token.md))
upon success in the *pmt* argument.
The driver must call [**zx_pmt_unpin()**](/docs/reference/syscalls/pmt_unpin.md) when the device is done with
the memory transaction to unpin and revoke access to the memory pages by the device.

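Putting the pieces together, a DMA transaction might look like the following sketch
(the `dev->...` fields, `DMA_BUF_SIZE`, and the device programming step are assumptions;
`DMA_BUF_SIZE` is taken to be a whole number of pages):

```c
#include <zircon/limits.h>
#include <zircon/syscalls.h>

zx_paddr_t addrs[DMA_BUF_SIZE / ZX_PAGE_SIZE];
zx_handle_t pmt;

// Grant the device read access to the buffer and get its device-visible addresses.
zx_status_t status = zx_bti_pin(dev->bti_handle, ZX_BTI_PERM_READ,
                                dev->dma_vmo, 0, DMA_BUF_SIZE,
                                addrs, DMA_BUF_SIZE / ZX_PAGE_SIZE, &pmt);
if (status != ZX_OK) {
    return status;
}

// ... program the device's DMA engine with the addresses in addrs[] and
// wait for the transfer to complete ...

// Release the pages once the device is finished with them.
zx_pmt_unpin(pmt);
```
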
## Advanced topics

### Cache Coherency

On fully DMA-coherent architectures, hardware ensures the data in the CPU cache is the same
as the data in main memory without software intervention. Not all architectures are
DMA-coherent. On these systems, the driver must ensure the CPU cache is made coherent by
invoking appropriate cache operations on the memory range before performing DMA operations,
so that no stale data will be accessed.

To invoke cache operations on the memory represented by [VMO](/docs/reference/kernel_objects/vm_object.md)s, use the
[**zx_vmo_op_range()**](/docs/reference/syscalls/vmo_op_range.md)
syscall.
Prior to a peripheral-read
(driver-write) operation, clean the cache using `ZX_VMO_OP_CACHE_CLEAN` to write out dirty
data to main memory. Prior to a peripheral-write (driver-read), mark the cache lines
as invalid using `ZX_VMO_OP_CACHE_INVALIDATE` to ensure data is fetched from main
memory on the next access.

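For example, before handing a buffer to the device for a peripheral-read (driver-write)
operation, a driver on a non-DMA-coherent system might do the following (a sketch; the
`dev->dma_vmo` handle and `DMA_BUF_SIZE` are assumed names):

```c
// Write back any dirty cache lines covering the buffer so the device sees the latest data.
zx_status_t status = zx_vmo_op_range(dev->dma_vmo, ZX_VMO_OP_CACHE_CLEAN,
                                     0, DMA_BUF_SIZE, NULL, 0);
if (status != ZX_OK) {
    // handle the error
}
```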