docs/development/languages/c-cpp/fbl_containers_guide/introduction.md - fuchsia - Git at Google

 # Introduction to `fbl` intrusive containers

 Intrusive containers (or intrusive data structures) are the type of containers
 offered by the Fuchsia Base Library (`fbl::`).

 This document will:

 1. [Define what an intrusive container is.](#what-is-an-intrusive-container)
 2. [Compare the differences between intrusive and non-intrusive containers.](#intrusive-vs-not)
 3. [Discuss the similarities and differences between `fbl::` and `std::` containers.](#fbl-vs-std)
 4. [Introduce some basic terminology that will be used throughout this guide.](#terminology)

 ## What is an intrusive container? {#what-is-an-intrusive-container}

 An intrusive container is a data structure that can be used to hold a
 collection of objects where the bookkeeping used to track membership in the
 container is stored in the objects themselves instead of holding the object
 alongside of the bookkeeping in a structure. To highlight the distinction,
 consider a structure that defines a point in a 2-dimensional coordinate system.

 ```cpp
 struct Point {
   float x, y;
 };
 ```

 In a traditional non-intrusive implementation of a doubly linked list, a node in
 the list might look something like this

 ```cpp
 struct ListNode {
   Point val;
   ListNode *next, *prev;
 };
 ```

 While an intrusive version of the same list would be

 ```cpp
 struct Point {
   float x, y;
   Point *next, *prev;
 };
 ```

 Initially, these appear to be very similar approaches. The distinction seems
 mostly like a question of which object is contained within the other. As time goes
 on, however, the differences become more apparent.

 Adding a point to a non-intrusive list of points means:

 1. Allocating a `ListNode`
 1. Copying the `Point` you want to add to the list into the `val` member of the `ListNode`
 1. Linking the `next`/`prev` pointers of the `ListNode` into the list.

 Removing the point is the same process in reverse.

 1. The pointers are unlinked.
 1. The contents of the `Point` are copied out to local storage.
 1. Finally, the `ListNode` is de-allocated.

 When using an intrusive list, the add/remove operations skip the
 allocation/deallocation and copying of the `Point`'s members. The `next`/`prev`
 pointers, which exist in the point object already, are simply linked into the
 list during the add operation and unlinked during the remove. While the
 intrusive form does not have to perform any allocations or copy any structures
 to add elements to a list, every `Point` in the system is must carry around the
 overhead of the next/prev bookkeeping, even while not in a list.

 While a small structure or primitive data type might be held by value in a
 non-intrusive list, when objects become larger, it becomes more common to track
 the object using some appropriate pointer type instead of storing the object by
 value. For example

 ```cpp
 struct ListNode {
   std::shared_ptr<LargeObject> ptr;
   ListNode *next, *prev;
 };
 ```

 The "value" being stored in this list is an instance of a shared pointer, so
 while adding or removing a pointer to a `LargeObject` to a list still requires
 management of the allocation used to store the node state, the object itself no
 longer needs to be copied when being added to a list. Additionally, this
 approach allows a single instance of a `LargeObject` to exist in multiple
 containers at the same time as the nodes simply track references to the objects
 and not copies of the objects. A simple implementation of an intrusive form of
 this would put the `next`/`prev` pointers into `LargeObject` directly implying
 that an instance of a `LargeObject` could be a member of exactly 0 or 1
 intrusive lists, but not more.

 ## Intrusive vs. non-intrusive containers {#intrusive-vs-not}

 The primary advantage of an intrusive container approach to tracking objects is
 the lack of allocations when adding/removing elements to/from a container. In
 some (usually specialized) environments, heap interactions can introduce a
 potential failure path, which would ideally not exist. Additionally,
 interactions with heaps usually involve interactions with locks/mutexes, which
 could result in involuntary blocking. This might introduce undesirable timing
 indeterminism, or in some cases may even not be an option if the code is running
 in a context in which blocking is not allowed.

 For example, what happens if code needs to add a bookkeeping structure to a list
 during a hard interrupt handler in the kernel if the allocation for the
 bookkeeping structure cannot be allocated, or if allocation of the bookkeeping
 would require the IRQ handler to block due to lock contention (something that
 cannot happen in the exception handler)? By allocating the container
 bookkeeping as part of the object itself ahead of time, an intrusive container
 approach to this problem allows these issues to be avoided.

 In general, intrusive containers can provide performance advantages when the
 design of the system allows the implementer to know at compile time all of the
 various containers that an object will ever exist in. This does not mean that they
 are always the best choice, however, as they come with limitations as well.

 Finally, for many types of intrusive containers, holding a reference to an
 object implies that you have located that object in all of the container types
 it can possible exist in where the same is not true when an object is tracked by
 reference in multiple non-intrusive container instances. For example, an object
 which could exist in a balanced binary search tree as well as in a doubly linked
 list (both intrusive) could be located in O(log n) time in the tree, and then
 removed from the list in O(1) time, instead of needing to find the object in the
 list in O(n) time before removing it.

 A short list of some pros and cons of the two approaches might look something
 like this:

 Non-intrusive:

 - Pros

   - Can be used to track simple objects by value, even primitive data types
   like `int` or `float`

   - Does not require changing the definition of the object itself in order to
   manage it in a different container. This is especially helpful when an object
   `O` is defined in Library A, but a user wishes to create a collection of
   `A::O`s in their own program and cannot reasonably re-define `O`.

   - Objects tracked by reference can easily exist in multiple containers
   independently.

 - Cons
   - Requires independent allocation management of the bookkeeping, usually on
   every add or remove operation. This management frequently implies hidden heap
   interactions, which can introduce overhead, timing uncertainty, and additional
   complexity as users need to manage potential failure paths.

   - Finding the location of an object in container A provides no information
   about the location of the object in container B. For example, finding an
   object contained in a `std::map` with an O(log n) operation does not
   help me to locate the same object that is simultaneously contained in a
   `std::deque`. If I want to remove the object from the `std::deque`, I must find
   it there with a separate O(n) operation first.

 Intrusive:

 - Pros

   - Minimal overhead to join or leave a container, and no chance of failure
   or involuntary yielding due to lock contention as the bookkeeping overhead was
   paid up-front during object creation.

   - Finding an instance of an object implicitly finds the instance of the
   object in all of the containers it can exist in.

   - Knowledge of whether an object is currently container in container type A
   is a property of the object and can always be tested (from the object) in O(1)
   time.

 - Cons

   - The ability to be contained is a fundamental property of the object. In
   order to change the number or types of the containers an object may exist in,
   the definition of the object itself must be changed, and all of the code that
   depends on that definition must be re-compiled.

   - Only objects themselves can be tracked. Primitive data types do not have
   any place to store extra bookkeeping information and cannot be tracked in an
   intrusive fashion.

 ## `fbl::` vs. `std::` {#fbl-vs-std}

 The C++ Standard Library offers a large set of implementation agnostic
 non-intrusive containers. For example, `std::map` is an ordered associative
 container, however there are no requirements that an implementation of the
 standard library use any specific data structure in order to implement
 `std::map`, only that the implementation chosen provides certain
 guarantees for various container operations as specified by the standard, such
 as worst case algorithmic complexities, or guarantees around iterator
 invalidation.

 If you need a non-intrusive container, this is where you should be looking.

 While there have been draft proposals, the C++ standard library does _not_
 currently offer any intrusive container implementations. Similar to `boost::`,
 `fbl::` attempts to fill this gap, but in a much more limited/focused fashion.
 The algorithms backing the types of containers types offered by `fbl::` are
 explicit and embedded in the names of the container types themselves. The APIs
 are inspired by and attempt to emulate those offered by `std::`, however there
 are some differences as a result of the fundamental differences between
 intrusive and non-intrusive containers. There are currently 3 primary container
 types defined by `fbl::` and one composed container type. They are:

 * `fbl::SinglyLinkedList<>`
 * `fbl::DoublyLinkedList<>`
 * `fbl::WAVLTree<>`
 * `fbl::HashTable<>`

 `fbl::HashTable<>` is considered to be a composed container type as it offers a
 fixed bucket count hash table that resolves collision using chaining where the
 bucket type may be either `fbl::SinglyLinkedList<>` or
 `fbl::DoublyLinkedList<>`. Please refer to [Setting up build
 dependencies](getting_started.md#build-dependencies) for details on which files
 need to be included in order to use the various container types.

 ## Terminology {#terminology}

 Here are some definitions of terms used throughout this
 guide.

 * Container : An object that tracks a collection of references to objects using
   a specific algorithm.
 * Node or Node Storage : The set of data that provides the bookkeeping that
   allows an object to exist in a specific container type.
 * Listable, Containable, Mix-in : An object that can be used as a base class to
   allow an object to be easily stored in a container of a specific type.
 * Raw pointer : A non-managed `T*`-style pointer to an object.
	# Introduction to `fbl` intrusive containers

	Intrusive containers (or intrusive data structures) are the type of containers
	offered by the Fuchsia Base Library (`fbl::`).

	This document will:

	1. [Define what an intrusive container is.](#what-is-an-intrusive-container)
	2. [Compare the differences between intrusive and non-intrusive containers.](#intrusive-vs-not)
	3. [Discuss the similarities and differences between `fbl::` and `std::` containers.](#fbl-vs-std)
	4. [Introduce some basic terminology that will be used throughout this guide.](#terminology)

	## What is an intrusive container? {#what-is-an-intrusive-container}

	An intrusive container is a data structure that can be used to hold a
	collection of objects where the bookkeeping used to track membership in the
	container is stored in the objects themselves instead of holding the object
	alongside of the bookkeeping in a structure. To highlight the distinction,
	consider a structure that defines a point in a 2-dimensional coordinate system.

	```cpp
	struct Point {
	float x, y;
	};
	```

	In a traditional non-intrusive implementation of a doubly linked list, a node in
	the list might look something like this

	```cpp
	struct ListNode {
	Point val;
	ListNode next, prev;
	};
	```

	While an intrusive version of the same list would be

	```cpp
	struct Point {
	float x, y;
	Point next, prev;
	};
	```

	Initially, these appear to be very similar approaches. The distinction seems
	mostly like a question of which object is contained within the other. As time goes
	on, however, the differences become more apparent.

	Adding a point to a non-intrusive list of points means:

	1. Allocating a `ListNode`
	1. Copying the `Point` you want to add to the list into the `val` member of the `ListNode`
	1. Linking the `next`/`prev` pointers of the `ListNode` into the list.

	Removing the point is the same process in reverse.

	1. The pointers are unlinked.
	1. The contents of the `Point` are copied out to local storage.
	1. Finally, the `ListNode` is de-allocated.

	When using an intrusive list, the add/remove operations skip the
	allocation/deallocation and copying of the `Point`'s members. The `next`/`prev`
	pointers, which exist in the point object already, are simply linked into the
	list during the add operation and unlinked during the remove. While the
	intrusive form does not have to perform any allocations or copy any structures
	to add elements to a list, every `Point` in the system is must carry around the
	overhead of the next/prev bookkeeping, even while not in a list.

	While a small structure or primitive data type might be held by value in a
	non-intrusive list, when objects become larger, it becomes more common to track
	the object using some appropriate pointer type instead of storing the object by
	value. For example

	```cpp
	struct ListNode {
	std::shared_ptr<LargeObject> ptr;
	ListNode next, prev;
	};
	```

	The "value" being stored in this list is an instance of a shared pointer, so
	while adding or removing a pointer to a `LargeObject` to a list still requires
	management of the allocation used to store the node state, the object itself no
	longer needs to be copied when being added to a list. Additionally, this
	approach allows a single instance of a `LargeObject` to exist in multiple
	containers at the same time as the nodes simply track references to the objects
	and not copies of the objects. A simple implementation of an intrusive form of
	this would put the `next`/`prev` pointers into `LargeObject` directly implying
	that an instance of a `LargeObject` could be a member of exactly 0 or 1
	intrusive lists, but not more.

	## Intrusive vs. non-intrusive containers {#intrusive-vs-not}

	The primary advantage of an intrusive container approach to tracking objects is
	the lack of allocations when adding/removing elements to/from a container. In
	some (usually specialized) environments, heap interactions can introduce a
	potential failure path, which would ideally not exist. Additionally,
	interactions with heaps usually involve interactions with locks/mutexes, which
	could result in involuntary blocking. This might introduce undesirable timing
	indeterminism, or in some cases may even not be an option if the code is running
	in a context in which blocking is not allowed.

	For example, what happens if code needs to add a bookkeeping structure to a list
	during a hard interrupt handler in the kernel if the allocation for the
	bookkeeping structure cannot be allocated, or if allocation of the bookkeeping
	would require the IRQ handler to block due to lock contention (something that
	cannot happen in the exception handler)? By allocating the container
	bookkeeping as part of the object itself ahead of time, an intrusive container
	approach to this problem allows these issues to be avoided.

	In general, intrusive containers can provide performance advantages when the
	design of the system allows the implementer to know at compile time all of the
	various containers that an object will ever exist in. This does not mean that they
	are always the best choice, however, as they come with limitations as well.

	Finally, for many types of intrusive containers, holding a reference to an
	object implies that you have located that object in all of the container types
	it can possible exist in where the same is not true when an object is tracked by
	reference in multiple non-intrusive container instances. For example, an object
	which could exist in a balanced binary search tree as well as in a doubly linked
	list (both intrusive) could be located in O(log n) time in the tree, and then
	removed from the list in O(1) time, instead of needing to find the object in the
	list in O(n) time before removing it.

	A short list of some pros and cons of the two approaches might look something
	like this:

	Non-intrusive:

	- Pros

	- Can be used to track simple objects by value, even primitive data types
	like `int` or `float`

	- Does not require changing the definition of the object itself in order to
	manage it in a different container. This is especially helpful when an object
	`O` is defined in Library A, but a user wishes to create a collection of
	`A::O`s in their own program and cannot reasonably re-define `O`.

	- Objects tracked by reference can easily exist in multiple containers
	independently.

	- Cons
	- Requires independent allocation management of the bookkeeping, usually on
	every add or remove operation. This management frequently implies hidden heap
	interactions, which can introduce overhead, timing uncertainty, and additional
	complexity as users need to manage potential failure paths.

	- Finding the location of an object in container A provides no information
	about the location of the object in container B. For example, finding an
	object contained in a `std::map` with an O(log n) operation does not
	help me to locate the same object that is simultaneously contained in a
	`std::deque`. If I want to remove the object from the `std::deque`, I must find
	it there with a separate O(n) operation first.

	Intrusive:

	- Pros

	- Minimal overhead to join or leave a container, and no chance of failure
	or involuntary yielding due to lock contention as the bookkeeping overhead was
	paid up-front during object creation.

	- Finding an instance of an object implicitly finds the instance of the
	object in all of the containers it can exist in.

	- Knowledge of whether an object is currently container in container type A
	is a property of the object and can always be tested (from the object) in O(1)
	time.

	- Cons

	- The ability to be contained is a fundamental property of the object. In
	order to change the number or types of the containers an object may exist in,
	the definition of the object itself must be changed, and all of the code that
	depends on that definition must be re-compiled.

	- Only objects themselves can be tracked. Primitive data types do not have
	any place to store extra bookkeeping information and cannot be tracked in an
	intrusive fashion.

	## `fbl::` vs. `std::` {#fbl-vs-std}

	The C++ Standard Library offers a large set of implementation agnostic
	non-intrusive containers. For example, `std::map` is an ordered associative
	container, however there are no requirements that an implementation of the
	standard library use any specific data structure in order to implement
	`std::map`, only that the implementation chosen provides certain
	guarantees for various container operations as specified by the standard, such
	as worst case algorithmic complexities, or guarantees around iterator
	invalidation.

	If you need a non-intrusive container, this is where you should be looking.

	While there have been draft proposals, the C++ standard library does _not_
	currently offer any intrusive container implementations. Similar to `boost::`,
	`fbl::` attempts to fill this gap, but in a much more limited/focused fashion.
	The algorithms backing the types of containers types offered by `fbl::` are
	explicit and embedded in the names of the container types themselves. The APIs
	are inspired by and attempt to emulate those offered by `std::`, however there
	are some differences as a result of the fundamental differences between
	intrusive and non-intrusive containers. There are currently 3 primary container
	types defined by `fbl::` and one composed container type. They are:

	* `fbl::SinglyLinkedList<>`
	* `fbl::DoublyLinkedList<>`
	* `fbl::WAVLTree<>`
	* `fbl::HashTable<>`

	`fbl::HashTable<>` is considered to be a composed container type as it offers a
	fixed bucket count hash table that resolves collision using chaining where the
	bucket type may be either `fbl::SinglyLinkedList<>` or
	`fbl::DoublyLinkedList<>`. Please refer to [Setting up build
	dependencies](getting_started.md#build-dependencies) for details on which files
	need to be included in order to use the various container types.

	## Terminology {#terminology}

	Here are some definitions of terms used throughout this
	guide.

	* Container : An object that tracks a collection of references to objects using
	a specific algorithm.
	* Node or Node Storage : The set of data that provides the bookkeeping that
	allows an object to exist in a specific container type.
	* Listable, Containable, Mix-in : An object that can be used as a base class to
	allow an object to be easily stored in a container of a specific type.
	* Raw pointer : A non-managed `T*`-style pointer to an object.