| # Data Layout Modeling |
| |
| Data layout information allows the compiler to answer questions about how a |
| value of a particular type is stored in memory, for example, the size of a value |
| or its address alignment requirements. It enables, among other things, the |
| generation of various linear memory addressing schemes for containers of |
| abstract types and deeper reasoning about vectors. |
| |
| The data layout subsystem is designed to scale to MLIR's open type and operation |
| system. At the top level, it consists of: |
| |
| * attribute interfaces that can be implemented by concrete data layout |
| specifications; |
| * type interfaces that should be implemented by types subject to data layout; |
| * operation interfaces that must be implemented by operations that can serve |
| as data layout scopes (e.g., modules); |
| * and dialect interfaces for data layout properties unrelated to specific |
| types. |
| |
| Built-in types are handled specially to decrease the overall query cost. |
| Similarly, built-in `ModuleOp` supports data layouts without going through the |
| interface. |
| |
| [TOC] |
| |
| ## Usage |
| |
| ### Scoping |
| |
| Following MLIR's nested structure, data layout properties are _scoped_ to |
| regions belonging to either operations that implement the |
| `DataLayoutOpInterface` or `ModuleOp` operations. Such scoping operations |
| partially control the data layout properties and may have attributes that affect |
| them, typically organized in a data layout specification. |
| |
| Types may have a different data layout in different scopes, including scopes |
| that are nested in other scopes such as modules contained in other modules. At |
| the same time, within the given scope excluding any nested scope, a given type |
| has fixed data layout properties. Types are also expected to have a default, |
| "natural" data layout in case they are used outside of any operation that |
| provides data layout scope for them. This ensures that data layout queries |
| always have a valid result. |
| |
| ### Compatibility and Transformations |
| |
| The information necessary to compute layout properties can be combined from |
| nested scopes. For example, an outer scope can define layout properties for a |
| subset of types while inner scopes define them for a disjoint subset, or scopes |
| can progressively relax alignment requirements on a type. This mechanism is |
| supported by the notion of data layout _compatibility_: the layout defined in a |
| nested scope is expected to be compatible with that of the outer scope. MLIR |
| does not prescribe what compatibility means for particular ops and types but |
| provides hooks for them to provide target- and type-specific checks. For |
| example, one may want to only allow relaxation of alignment constraints (i.e., |
| smaller alignment) in nested modules or, alternatively, one may require nested |
| modules to fully redefine all constraints of the outer scope. |
| |
| Data layout compatibility is also relevant during IR transformation. Any |
| transformation that affects the data layout scoping operation is expected to |
| maintain data layout compatibility. It is the responsibility of the |
| transformation to ensure that this is indeed the case. |
| |
| ### Queries |
| |
| Data layout property queries are performed on a dedicated `DataLayout` object, |
| which can be created for the given scoping operation. These objects allow one |
| to interface with the data layout infrastructure and query properties of given |
| types in the scope of the object. The signature of the `DataLayout` class is as |
| follows. |
| |
| ```c++ |
| class DataLayout { |
| public: |
| explicit DataLayout(DataLayoutOpInterface scope); |
| |
| unsigned getTypeSize(Type type) const; |
| unsigned getTypeSizeInBits(Type type) const; |
| unsigned getTypeABIAlignment(Type type) const; |
| unsigned getTypePreferredAlignment(Type type) const; |
| }; |
| ``` |
| |
| The user can construct the `DataLayout` object for the scope of interest. Since |
| the data layout properties are fixed in the scope, they will be computed only |
| once upon first request and cached for further use. Therefore, |
| `DataLayout(op.getParentOfType<DataLayoutOpInterface>()).getTypeSize(type)` is |
| considered an anti-pattern since it discards the cache after use. Because of |
| caching, a `DataLayout` object returns valid results as long as the data layout |
| properties of enclosing scopes remain the same, that is, as long as none of the |
| ancestor operations are modified in a way that affects data layout. After such a |
| modification, the user is expected to create a fresh `DataLayout` object. To aid |
| with this, `DataLayout` asserts that the scope remains identical if MLIR is |
| compiled with assertions enabled. |
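
The cost of the anti-pattern above can be illustrated with a toy model. The sketch below is not the MLIR `DataLayout` class: `ToyLayout` and `countComputations` are hypothetical names, and types are represented as plain strings. It only models the "compute once per object, then serve from the cache" behavior described above.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Toy stand-in for a layout object (hypothetical, not the MLIR class):
// results are computed on first request and cached per object.
class ToyLayout {
public:
  explicit ToyLayout(std::function<uint64_t(const std::string &)> fn)
      : compute(std::move(fn)) {}

  uint64_t getTypeSize(const std::string &type) {
    auto it = cache.find(type);
    if (it != cache.end())
      return it->second; // Served from the per-object cache.
    uint64_t size = compute(type);
    cache.emplace(type, size);
    return size;
  }

private:
  std::function<uint64_t(const std::string &)> compute;
  std::unordered_map<std::string, uint64_t> cache;
};

// Counts how often the underlying computation runs when the same type is
// queried `queries` times, with or without reusing the layout object.
inline int countComputations(int queries, bool reuseObject) {
  int computations = 0;
  auto expensive = [&computations](const std::string &) -> uint64_t {
    ++computations;
    return 4; // Pretend every type is 4 bytes wide.
  };
  if (reuseObject) {
    ToyLayout layout(expensive);
    for (int i = 0; i < queries; ++i)
      layout.getTypeSize("i32");
  } else {
    // Anti-pattern: a temporary object per query discards the cache.
    for (int i = 0; i < queries; ++i)
      ToyLayout(expensive).getTypeSize("i32");
  }
  return computations;
}
```

In this model, reusing one object for five queries triggers a single computation, while constructing a temporary per query triggers five.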
| |
| ## Custom Implementations |
| |
| Extensibility of the data layout modeling is provided through a set of MLIR |
| [Interfaces](Interfaces.md). |
| |
| ### Data Layout Specifications |
| |
| A data layout specification is an [attribute](LangRef.md/#attributes) that is |
| conceptually a collection of key-value pairs called data layout specification |
| _entries_. Data layout specification attributes implement the |
| `DataLayoutSpecInterface`, described below. Each entry is itself an attribute |
| that implements the `DataLayoutEntryInterface`. Entries have a key, either a |
| `Type` or a `StringAttr`, and a value. Keys are used to associate entries with |
| specific types or dialects: when handling a data layout properties request, a |
| type or a dialect can only see the specification entries relevant to them and |
| must go through the supplied `DataLayout` object for any recursive query. This |
| supports and enforces better composability because types cannot (and should not) |
| understand layout details of other types. Entry values are arbitrary attributes, |
| specific to the type. |
| |
| For example, a data layout specification may be an actual list of pairs with |
| simple custom syntax resembling the following: |
| |
| ```mlir |
| #my_dialect.layout_spec< |
| #my_dialect.layout_entry<!my_dialect.type, size=42>, |
| #my_dialect.layout_entry<"my_dialect.endianness", "little">, |
| #my_dialect.layout_entry<!my_dialect.vector, prefer_large_alignment>> |
| ``` |
| |
| The exact details of the specification and entry attributes, as well as their |
| syntax, are up to implementations. |
| |
| We use the notion of _type class_ throughout the data layout subsystem. It |
| corresponds to the C++ class of the given type, e.g., `IntegerType` for built-in |
| integers. MLIR does not have a mechanism to represent type classes in the IR. |
| Instead, data layout entries contain specific _instances_ of a type class, for |
| example, `IntegerType{signedness=signless, bitwidth=8}` (or `i8` in the IR) or |
| `IntegerType{signedness=unsigned, bitwidth=32}` (or `ui32` in the IR). When |
| handling a data layout property query, a type class will be supplied with _all_ |
| entries with keys belonging to this type class. For example, `IntegerType` will |
| see the entries for `i8`, `si16` and `ui32`, but will _not_ see those for `f32` |
| or `memref<?xi32>` (neither will `MemRefType` see the entry for `i32`). This |
| allows for type-specific "interpolation" behavior where a type class can compute |
| data layout properties of _any_ specific type instance given properties of other |
| instances. Using integers as an example again, their alignment could be computed |
| by taking that of the closest from above integer type with power-of-two |
| bitwidth. |
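
The integer "interpolation" mentioned above could look roughly as follows. This is a standalone sketch, not MLIR code: `IntegerEntries`, `interpolateAlignment`, and the `fallback` parameter are hypothetical, and entries are modeled as a map from bitwidth to alignment in bytes.

```cpp
#include <cstdint>
#include <map>

// Hypothetical model of the entries an integer type class would see:
// bitwidth of the keyed integer type -> alignment in bytes.
using IntegerEntries = std::map<unsigned, uint64_t>;

constexpr bool isPowerOfTwo(unsigned v) { return v != 0 && (v & (v - 1)) == 0; }

// Returns the alignment of the closest-from-above entry whose bitwidth is a
// power of two, or `fallback` when no such entry exists.
inline uint64_t interpolateAlignment(const IntegerEntries &entries,
                                     unsigned bitwidth, uint64_t fallback) {
  for (auto it = entries.lower_bound(bitwidth); it != entries.end(); ++it)
    if (isPowerOfTwo(it->first))
      return it->second;
  return fallback;
}
```

With entries for `i8`, `i24`, `i32`, and `i64`, a query for `i17` skips the `i24` entry (24 is not a power of two) and uses the alignment recorded for `i32`.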
| |
| [include "Interfaces/DataLayoutAttrInterface.md"] |
| |
| ### Data Layout Scoping Operations |
| |
| Operations that define a scope for data layout queries, and that can be used to |
| create a `DataLayout` object, are expected to implement the |
| `DataLayoutOpInterface`. Such ops must provide at least a way of obtaining the |
| data layout specification. The specification need not necessarily be attached to |
| the operation as an attribute and may be constructed on-the-fly; it is only |
| fetched once per `DataLayout` object and cached. Such ops may also provide |
| custom handlers for data layout queries that either provide results without |
| forwarding the queries down to specific types or post-process the results |
| returned by types in target- or scope-specific ways. These custom handlers make it possible |
| for scoping operations to (re)define data layout properties for types without |
| having to modify the types themselves, e.g., when types are defined in another |
| dialect. |
| |
| [include "Interfaces/DataLayoutOpInterface.md"] |
| |
| ### Types with Data Layout |
| |
| Type classes that intend to handle data layout queries themselves are expected |
| to implement the `DataLayoutTypeInterface`. This interface provides overridable |
| hooks for each data layout query. Each of these hooks is supplied with the type |
| instance, a `DataLayout` object suitable for recursive queries, and a list of |
| data layout entries relevant for the type class. It is expected to provide a |
| valid result even if the list of entries is empty. These hooks do not have |
| access to the operation in the scope of which the query is handled and should |
| use the supplied entries instead. |
| |
| [include "Interfaces/DataLayoutTypeInterface.md"] |
| |
| ### Dialects with Data Layout Identifiers |
| |
| For data layout entries that are not related to a particular type class, the key |
| of the entry is an identifier (a `StringAttr`) that belongs to some dialect. In |
| this case, the dialect is expected to implement the |
| `DataLayoutDialectInterface`. This interface provides hooks for verifying the |
| validity of the entry value attributes and the compatibility of nested entries. |
| |
| ### Bits and Bytes |
| |
| Two versions of hooks are provided for sizes: in bits and in bytes. The version |
| in bytes has a default implementation that derives the size in bytes by rounding |
| up the result of division of the size in bits by 8. Types exclusively targeting |
| architectures with different assumptions can override this. Operations can |
| redefine this for all types, providing scoped versions for cases of byte sizes |
| other than eight without having to modify types, including built-in types. |
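
The default derivation of the byte size is a rounded-up division. A minimal standalone sketch follows, where `sizeInBytes` and its `bitsPerByte` parameter are illustrative names rather than part of the interfaces:

```cpp
#include <cstdint>

// Rounds the size in bits up to a whole number of bytes. `bitsPerByte`
// defaults to 8; a scope with wider bytes would pass a different value.
constexpr uint64_t sizeInBytes(uint64_t sizeInBits, uint64_t bitsPerByte = 8) {
  return (sizeInBits + bitsPerByte - 1) / bitsPerByte;
}

static_assert(sizeInBytes(1) == 1, "a 1-bit value occupies one byte");
static_assert(sizeInBytes(17) == 3, "17 bits round up to three 8-bit bytes");
static_assert(sizeInBytes(64) == 8, "64 bits are exactly eight bytes");
static_assert(sizeInBytes(17, 16) == 2, "17 bits need two 16-bit bytes");
```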
| |
| ### Query Dispatch |
| |
| The overall flow of a data layout property query is as follows. |
| |
| 1. The user constructs a `DataLayout` at the given scope. The constructor |
| fetches the data layout specification and combines it with those of |
| enclosing scopes (layouts are expected to be compatible). |
| 2. The user calls `DataLayout::query(Type ty)`. |
| 3. If `DataLayout` has a cached response, this response is returned |
| immediately. |
| 4. Otherwise, the query is handed down by `DataLayout` to the closest layout |
| scoping operation. If it implements `DataLayoutOpInterface`, then the query |
| is forwarded to `DataLayoutOpInterface::query(ty, *this, relevantEntries)` |
| where the relevant entries are computed as described above. If it does not |
| implement `DataLayoutOpInterface`, it must be a `ModuleOp`, and the query is |
| forwarded to `DataLayoutTypeInterface::query(dataLayout, relevantEntries)` |
| after casting `ty` to the type interface. |
| 5. Unless the `query` hook is reimplemented by the op interface, the query is |
| handed further down to `DataLayoutTypeInterface::query(dataLayout, |
| relevantEntries)` after casting `ty` to the type interface. If the type does |
| not implement the interface, an unrecoverable fatal error is produced. |
| 6. The type is expected to always provide the response, which is returned up |
| the call stack and cached by the `DataLayout`. |
| |
| ## Default Implementation |
| |
| The default implementation of the data layout interfaces directly handles |
| queries for a subset of built-in types. |
| |
| ### Built-in Modules |
| |
| Built-in `ModuleOp` allows at most one attribute that implements |
| `DataLayoutSpecInterface`. It does not implement the entire interface for |
| efficiency and layering reasons. Instead, `DataLayout` can be constructed for |
| `ModuleOp` and handles modules transparently alongside other operations that |
| implement the interface. |
| |
| ### Built-in Types |
| |
| The following describes the default properties of built-in types. |
| |
| The size of built-in integers and floats in bytes is computed as |
| `ceildiv(bitwidth, 8)`. The ABI alignment of integer types with bitwidth below |
| 64 and of the float types is the closest from above power-of-two number of |
| bytes. The ABI alignment of integer types with bitwidth 64 and above is 4 bytes |
| (32 bits). |
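
The alignment rules for integers and floats stated above can be written out as a small standalone sketch. The function names are illustrative and are not the names used by the MLIR implementation:

```cpp
#include <cstdint>

constexpr uint64_t ceilDiv(uint64_t a, uint64_t b) { return (a + b - 1) / b; }

// Smallest power of two that is greater than or equal to `v`.
constexpr uint64_t nextPowerOfTwo(uint64_t v, uint64_t r = 1) {
  return r >= v ? r : nextPowerOfTwo(v, r * 2);
}

// Integers below 64 bits: byte size rounded up to a power of two.
// Integers of 64 bits and above: 4 bytes, as stated in the text.
constexpr uint64_t integerABIAlignment(unsigned bitwidth) {
  return bitwidth < 64 ? nextPowerOfTwo(ceilDiv(bitwidth, 8)) : 4;
}

// Floats: byte size rounded up to a power of two.
constexpr uint64_t floatABIAlignment(unsigned bitwidth) {
  return nextPowerOfTwo(ceilDiv(bitwidth, 8));
}

static_assert(integerABIAlignment(1) == 1, "i1 is byte-aligned");
static_assert(integerABIAlignment(17) == 4, "3 bytes round up to 4");
static_assert(integerABIAlignment(64) == 4, "i64 uses the 4-byte rule");
static_assert(floatABIAlignment(80) == 16, "10 bytes round up to 16");
```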
| |
| The size of built-in vectors is computed by first rounding their number of |
| elements in the _innermost_ dimension to the closest power-of-two from above, |
| then getting the total number of elements, and finally multiplying it with the |
| element size. For example, `vector<3xi32>` and `vector<4xi32>` have the same |
| size. So do `vector<2x3xf32>` and `vector<2x4xf32>`, but `vector<3x4xf32>` and |
| `vector<4x4xf32>` have different sizes. The ABI and preferred alignment of |
| vector types is computed by taking the innermost dimension of the vector, |
| rounding it up to the closest power-of-two, taking a product of that with |
| element size in bytes, and rounding the result up again to the closest |
| power-of-two. |
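
The vector rules above can be checked with a short standalone sketch. Shapes are modeled as a list of dimension sizes and the element size is passed in bytes; `vectorSizeInBytes` and `vectorAlignmentInBytes` are hypothetical helper names, not MLIR functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Smallest power of two that is greater than or equal to `v`.
inline uint64_t nextPowerOfTwo(uint64_t v) {
  uint64_t r = 1;
  while (r < v)
    r *= 2;
  return r;
}

// Size: round the innermost dimension up to a power of two, multiply by the
// remaining dimensions, then by the element size.
inline uint64_t vectorSizeInBytes(const std::vector<uint64_t> &shape,
                                  uint64_t elementSizeInBytes) {
  if (shape.empty())
    return elementSizeInBytes; // Assumption: treat 0-D as a single element.
  uint64_t total = elementSizeInBytes * nextPowerOfTwo(shape.back());
  for (std::size_t i = 0; i + 1 < shape.size(); ++i)
    total *= shape[i];
  return total;
}

// Alignment: padded innermost dimension times the element size, rounded up
// once more to a power of two.
inline uint64_t vectorAlignmentInBytes(const std::vector<uint64_t> &shape,
                                       uint64_t elementSizeInBytes) {
  uint64_t innermost = shape.empty() ? 1 : shape.back();
  return nextPowerOfTwo(nextPowerOfTwo(innermost) * elementSizeInBytes);
}
```

Under these rules, `vector<3xi32>` and `vector<4xi32>` both occupy 16 bytes, while `vector<3x4xf32>` occupies 48 bytes and `vector<4x4xf32>` occupies 64.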
| |
| Note: these values are selected for consistency with the |
| [default data layout in LLVM](https://llvm.org/docs/LangRef.html#data-layout), |
| which MLIR assumed until the introduction of proper data layout modeling, and |
| with the |
| [modeling of n-D vectors](https://mlir.llvm.org/docs/Dialects/Vector/#deeperdive). |
| They **may change** in the future. |
| |
| #### `index` type |
| |
| Index type is an integer type used for target-specific size information in, |
| e.g., `memref` operations. Its data layout is parameterized by a single integer |
| data layout entry that specifies its bitwidth. For example, |
| |
| ```mlir |
| module attributes { dlti.dl_spec = #dlti.dl_spec< |
| #dlti.dl_entry<index, 32> |
| >} {} |
| ``` |
| |
| specifies that `index` has 32 bits. All other layout properties of `index` match |
| those of the integer type with the same bitwidth defined above. |
| |
| In the absence of the corresponding entry, `index` is assumed to be a 64-bit |
| integer. |
| |
| #### `complex` type |
| |
| By default, the complex type is treated like a two-element structure of its |
| element type. That is, each of its elements is aligned to its preferred |
| alignment, the entire complex type is also aligned to this preference, and the |
| size of the complex type includes any padding between elements needed to |
| enforce alignment. |
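
One way to read the size rule for complex types is sketched below: the first element is padded up to the preferred alignment so that the second element starts aligned, and the second element is then added unpadded. The helper names are illustrative, and the absence of trailing padding is an assumption drawn from the wording "padding between elements".

```cpp
#include <cstdint>

// Rounds `value` up to the nearest multiple of `align`.
constexpr uint64_t alignTo(uint64_t value, uint64_t align) {
  return (value + align - 1) / align * align;
}

// Size of a two-element structure with padding only between the elements.
constexpr uint64_t complexSizeInBytes(uint64_t elementSize,
                                      uint64_t elementPreferredAlignment) {
  return alignTo(elementSize, elementPreferredAlignment) + elementSize;
}

static_assert(complexSizeInBytes(4, 4) == 8, "two 4-byte elements, no padding");
static_assert(complexSizeInBytes(8, 8) == 16, "two 8-byte elements, no padding");
static_assert(complexSizeInBytes(10, 16) == 26, "6 bytes of inner padding");
```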
| |
| ### Byte Size |
| |
| The default data layout assumes 8-bit bytes. |
| |
| ### DLTI Dialect |
| |
| The [DLTI](Dialects/DLTI.md) dialect provides the attributes implementing |
| `DataLayoutSpecInterface` and `DataLayoutEntryInterface`, as well as a dialect |
| attribute that can be used to attach the specification to a given operation. The |
| verifier of this attribute triggers those of the specification and checks the |
| compatibility of nested specifications. |