|  | # Introduction to `fbl` intrusive containers | 
|  |  | 
|  | Intrusive containers (or intrusive data structures) are the type of containers | 
|  | offered by the Fuchsia Base Library (`fbl::`). | 
|  |  | 
|  | This document will: | 
|  |  | 
|  | 1. [Define what an intrusive container is.](#what-is-an-intrusive-container) | 
|  | 2. [Compare the differences between intrusive and non-intrusive containers.](#intrusive-vs-not) | 
|  | 3. [Discuss the similarities and differences between `fbl::` and `std::` containers.](#fbl-vs-std) | 
|  | 4. [Introduce some basic terminology which will be used throughout this guide.](#terminology) | 
|  |  | 
|  | ## What is an intrusive container? {#what-is-an-intrusive-container} | 
|  |  | 
|  | An intrusive container is a data structure which can be used to hold a | 
|  | collection of objects where the bookkeeping used to track membership in the | 
|  | container is stored in the objects themselves instead of holding the object | 
|  | alongside of the bookkeeping in a structure. To highlight the distinction, | 
|  | consider a structure which defines a point in a 2-dimensional coordinate system. | 
|  |  | 
|  | ```cpp | 
|  | struct Point { | 
|  | float x, y; | 
|  | }; | 
|  | ``` | 
|  |  | 
|  | In a traditional non-intrusive implementation of a doubly linked list, a node in | 
|  | the list might look something like this | 
|  |  | 
|  | ```cpp | 
|  | struct ListNode { | 
|  | Point val; | 
|  | ListNode *next, *prev; | 
|  | }; | 
|  | ``` | 
|  |  | 
|  | While an intrusive version of the same list would be | 
|  |  | 
|  | ```cpp | 
|  | struct Point { | 
|  | float x, y; | 
|  | Point *next, *prev; | 
|  | }; | 
|  | ``` | 
|  |  | 
|  | Initially, these appear to be very similar approaches. The distinction seems | 
|  | mostly like a question of which object is contained within the other. As time goes | 
|  | on, however, the differences become more apparent. | 
|  |  | 
|  | Adding a point to a non-intrusive list of points means: | 
|  |  | 
|  | 1. Allocating a `ListNode` | 
|  | 1. Copying the `Point` you want to add to the list into the `val` member of the `ListNode` | 
|  | 1. Linking the `next`/`prev` pointers of the `ListNode` into the list. | 
|  |  | 
|  | Removing the point is the same process in reverse. | 
|  |  | 
|  | 1. The pointers are unlinked. | 
|  | 1. The contents of the `Point` are copied out to local storage. | 
|  | 1. Finally, the `ListNode` is de-allocated. | 
|  |  | 
|  | When using an intrusive list, the add/remove operations skip the | 
|  | allocation/deallocation and copying of the `Point`'s members. The `next`/`prev` | 
|  | pointers, which exist in the point object already, are simply linked into the | 
|  | list during the add operation and unlinked during the remove. While the | 
|  | intrusive form does not have to perform any allocations or copy any structures | 
|  | to add elements to a list, every `Point` in the system is must carry around the | 
|  | overhead of the next/prev bookkeeping, even while not in a list. | 
|  |  | 
|  | While a small structure or primitive data type might be held by value in a | 
|  | non-intrusive list, when objects become larger, it becomes more common to track | 
|  | the object using some appropriate pointer type instead of storing the object by | 
|  | value. For example | 
|  |  | 
|  | ```cpp | 
|  | struct ListNode { | 
|  | std::shared_ptr<LargeObject> ptr; | 
|  | ListNode *next, *prev; | 
|  | }; | 
|  | ``` | 
|  |  | 
|  | The "value" being stored in this list is an instance of a shared pointer, so | 
|  | while adding or removing a pointer to a `LargeObject` to a list still requires | 
|  | management of the allocation used to store the node state, the object itself no | 
|  | longer needs to be copied when being added to a list. Additionally, this | 
|  | approach allows a single instance of a `LargeObject` to exist in multiple | 
|  | containers at the same time as the nodes simply track references to the objects | 
|  | and not copies of the objects. A simple implementation of an intrusive form of | 
|  | this would put the `next`/`prev` pointers into `LargeObject` directly implying | 
|  | that an instance of a `LargeObject` could be a member of exactly 0 or 1 | 
|  | intrusive lists, but not more. | 
|  |  | 
|  | ## Intrusive vs. non-intrusive containers {#intrusive-vs-not} | 
|  |  | 
|  | The primary advantage of an intrusive container approach to tracking objects is | 
|  | the lack of allocations when adding/removing elements to/from a container. In | 
|  | some (usually specialized) environments, heap interactions can introduce a | 
|  | potential failure path which would ideally not exist. Additionally, | 
|  | interactions with heaps usually involve interactions with locks/mutexes which | 
|  | could result in involuntary blocking. This might introduce undesirable timing | 
|  | indeterminism, or in some cases may even not be an option if the code is running | 
|  | in a context in which blocking is not allowed. | 
|  |  | 
|  | For example, what happens if code needs to add a bookkeeping structure to a list | 
|  | during a hard interrupt handler in the kernel if the allocation for the | 
|  | bookkeeping structure cannot be allocated, or if allocation of the bookkeeping | 
|  | would require the IRQ handler to block due to lock contention (something that | 
|  | cannot happen in the exception handler)? By allocating the container | 
|  | bookkeeping as part of the object itself ahead of time, an intrusive container | 
|  | approach to this problem allows these issues to be avoided. | 
|  |  | 
|  | In general, intrusive containers can provide performance advantages when the | 
|  | design of the system allows the implementer to know at compile time all of the | 
|  | various containers that an object will ever exist in. This does not mean that they | 
|  | are always the best choice, however, as they come with limitations as well. | 
|  |  | 
|  | Finally, for many types of intrusive containers, holding a reference to an | 
|  | object implies that you have located that object in all of the container types | 
|  | it can possible exist in where the same is not true when an object is tracked by | 
|  | reference in multiple non-intrusive container instances. For example, an object | 
|  | which could exist in a balanced binary search tree as well as in a doubly linked | 
|  | list (both intrusive) could be located in O(log n) time in the tree, and then | 
|  | removed from the list in O(1) time, instead of needing to find the object in the | 
|  | list in O(n) time before removing it. | 
|  |  | 
|  | A short list of some pros and cons of the two approaches might look something | 
|  | like this: | 
|  |  | 
|  | Non-intrusive: | 
|  | * Pro: Can be used to track simple objects by value, even primitive data types | 
|  | like `int` or `float` | 
|  | * Pro: Does not require changing the definition of the object itself in order to | 
|  | manage it in a different container. This is especially helpful when an object | 
|  | `O` is defined in Library A, but a user wishes to create a collection of | 
|  | `A::O`s in their own program and cannot reasonably re-define `O`. | 
|  | * Pro: Objects tracked by reference can easily exist in multiple containers | 
|  | independently. | 
|  | * Con: Requires independent allocation management of the bookkeeping, usually on | 
|  | every add/remove operation. This management frequently implies hidden heap | 
|  | interactions which can introduce overhead, timing uncertainty, and additional | 
|  | complexity as users need to manage potential failure paths. | 
|  | * Con: Finding the location of an object in container A provides no information | 
|  | about the location of the object in container B. For example, finding an | 
|  | object which is contained in a `std::map` with an O(log n) operation does not | 
|  | help me to locate the same object which is simultaneously contained in a | 
|  | `std::deque`. If I want to remove the object from the `std::deque`, I must find | 
|  | it there with a separate O(n) operation first. | 
|  |  | 
|  | Intrusive: | 
|  | * Pro: Minimal overhead to join or leave a container, and no chance of failure | 
|  | or involuntary yielding due to lock contention as the bookkeeping overhead was | 
|  | paid up-front during object creation. | 
|  | * Pro: Finding an instance of an object implicitly finds the instance of the | 
|  | object in all of the containers it can exist in. | 
|  | * Pro: Knowledge of whether an object is currently container in container type A | 
|  | is a property of the object and can always be tested (from the object) in O(1) | 
|  | time. | 
|  | * Con: The ability to be contained is a fundamental property of the object. In | 
|  | order to change the number or types of the containers an object may exist in, | 
|  | the definition of the object itself must be changed and all of the code which | 
|  | depends on that definition must be re-compiled. | 
|  | * Con: Only objects themselves can be tracked. Primitive data types do not have | 
|  | any place to store extra bookkeeping information and cannot be tracked in an | 
|  | intrusive fashion. | 
|  |  | 
|  | ## `fbl::` vs. `std::` {#fbl-vs-std} | 
|  |  | 
|  | The C++ Standard Library offers a large set of implementation agnostic | 
|  | non-intrusive containers. For example, `std::map` is an ordered associative | 
|  | container, however there are no requirements that an implementation of the | 
|  | standard library use any specific data structure in order to implement | 
|  | `std::map`, only that the implementation chosen provides certain | 
|  | guarantees for various container operations as specified by the standard, such | 
|  | as worst case algorithmic complexities, or guarantees around iterator | 
|  | invalidation. | 
|  |  | 
|  | If you need a non-intrusive container, this is where you should be looking. | 
|  |  | 
|  | While there have been draft proposals, the C++ standard library does _not_ | 
|  | currently offer any intrusive container implementations. Similar to `boost::`, | 
|  | `fbl::` attempts to fill this gap, but in a much more limited/focused fashion. | 
|  | The algorithms backing the types of containers types offered by `fbl::` are | 
|  | explicit and embedded in the names of the container types themselves. The APIs | 
|  | are inspired by and attempt to emulate those offered by `std::`, however there | 
|  | are some differences as a result of the fundamental differences between | 
|  | intrusive and non-intrusive containers. There are currently 3 primary container | 
|  | types defined by `fbl::` and one composed container type. They are: | 
|  |  | 
|  | * `fbl::SinglyLinkedList<>` | 
|  | * `fbl::DoublyLinkedList<>` | 
|  | * `fbl::WAVLTree<>` | 
|  | * `fbl::HashTable<>` | 
|  |  | 
|  | `fbl::HashTable<>` is considered to be a composed container type as it offers a | 
|  | fixed bucket count hash table which resolves collision using chaining where the | 
|  | bucket type may be either `fbl::SinglyLinkedList<>` or | 
|  | `fbl::DoublyLinkedList<>`. Please refer to [Setting up build | 
|  | dependencies](getting_started.md#build-dependencies) for details on which files | 
|  | need to be included in order to use the various container types. | 
|  |  | 
|  | ## Terminology {#terminology} | 
|  |  | 
|  | Here are some terms and their definitions which will be used throughout this | 
|  | guide. | 
|  |  | 
|  | * Container : An object which tracks a collection of references to objects using | 
|  | a specific algorithm. | 
|  | * Node or Node Storage : The set of data which serve as the bookkeeping which | 
|  | allows an object to exist in a specific container type. | 
|  | * Listable, Containable, Mix-in : An object which can be used as a base class to | 
|  | allow an object to be easily stored in a container of a specific type. | 
|  | * Raw pointer : A non-managed `T*`-style pointer to an object. |