ACPI ERST DEVICE
================

The ACPI ERST device is utilized to support the ACPI Error Record
Serialization Table, ERST, functionality. This feature is designed for
storing error records in persistent storage for future reference
and/or debugging.

The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
(APEI)", and specifically subsection "Error Serialization", outlines a
method for storing error records into persistent storage.

The format of error records is described in the UEFI specification[2],
in Appendix N "Common Platform Error Record".

While the ACPI specification allows for an NVRAM "mode" (see
GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES) where non-volatile RAM is
exposed for direct access by the OS/guest, this device implements
the non-NVRAM "mode". This non-NVRAM "mode" is what most BIOS
implementations provide (since flash memory requires programming
operations in order to update its contents). Furthermore, as of the
time of this writing, Linux only supports the non-NVRAM "mode".


Background/Motivation
---------------------

Linux uses the persistent storage filesystem, pstore, to record
information (e.g. dmesg tail) upon panics and shutdowns.  Pstore is
independent of, and runs before, kdump.  In certain scenarios (e.g.
hosts/guests with root filesystems on NFS/iSCSI where networking
software and/or hardware fails, and thus kdump fails), pstore may
contain information available for post-mortem debugging.

Two common storage backends for the pstore filesystem are ACPI ERST
and UEFI. Most BIOS implement ACPI ERST. UEFI is not utilized in all
guests. With QEMU supporting ACPI ERST, it becomes a viable pstore
storage backend for virtual machines (as it is now for bare metal
machines).

Enabling support for ACPI ERST facilitates a consistent method to
capture kernel panic information in a wide range of guests: from
resource-constrained microvms to very large guests, and in particular,
in direct-boot environments (which would lack UEFI run-time services).

Note that Microsoft Windows also utilizes the ACPI ERST for certain
crash information, if available[3].


Configuration and Usage
-----------------------

To use ACPI ERST, a memory-backend-file object and acpi-erst device
can be created, for example::

 qemu ... \
 -object memory-backend-file,id=erstnvram,mem-path=acpi-erst.backing,size=0x10000,share=on \
 -device acpi-erst,memdev=erstnvram

For proper operation, the ACPI ERST device needs a memory-backend-file
object with the following parameters:

 - id: The id of the memory-backend-file object is used to associate
   this memory with the acpi-erst device.
 - size: The size of the ACPI ERST backing storage. This parameter is
   required.
 - mem-path: The location of the ACPI ERST backing storage file. This
   parameter is also required.
 - share: The share=on parameter is required so that updates to the
   ERST backing store are written to the file.

The acpi-erst device takes the following parameters:

 - memdev: The object id of the memory-backend-file.
 - record_size: The size of the records (or slots) in the
   backend storage. Must be a power of two greater than or
   equal to 4096 (PAGE_SIZE).
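
The record_size constraint stated above can be checked with a few lines
of Python. This is only an illustrative sketch: the helper name
``is_valid_record_size`` is hypothetical and is not part of QEMU, and the
device may enforce further limits that are not described here.

.. code-block:: python

   def is_valid_record_size(record_size: int) -> bool:
       """Return True if record_size is a power of two and at least 4096."""
       # A positive power of two has exactly one bit set, so clearing the
       # lowest set bit (x & (x - 1)) must leave zero.
       is_power_of_two = record_size > 0 and (record_size & (record_size - 1)) == 0
       return is_power_of_two and record_size >= 4096


   # 8KiB is accepted; 6KiB is not a power of two; 2KiB is too small.
   checks = {size: is_valid_record_size(size) for size in (0x2000, 6144, 2048)}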


PCI Interface
-------------

The ERST device is a PCI device with two BARs, one for accessing the
programming registers, and the other for accessing the record exchange
buffer.

BAR0 contains the programming interface consisting of the 64-bit
ACTION and VALUE registers.  By design, all ERST actions, operations
and side effects happen on the write to ACTION. Any data needed by the
action must be placed into VALUE prior to writing ACTION.  Reading
VALUE simply returns the register contents, which may have been
updated by a previous ACTION.

BAR1 contains the 8KiB record exchange buffer, which is the
implemented maximum record size.
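
The ordering rule for BAR0 (operand into VALUE first, then the write to
ACTION triggers the side effect) can be illustrated with a toy model.
This sketch is not the QEMU device: the class ``ToyErstRegisters`` and
its two action codes are hypothetical and were chosen only to show one
action that consumes VALUE and one that produces it.

.. code-block:: python

   class ToyErstRegisters:
       """Toy model of an ACTION/VALUE register pair (illustration only)."""

       # Hypothetical action codes; these are NOT the ERST action values.
       ACTION_SET_ID = 0x1     # consumes VALUE
       ACTION_GET_COUNT = 0x2  # produces VALUE

       def __init__(self):
           self.value = 0
           self.record_ids = []

       def write_value(self, data):
           # Writing VALUE has no side effect; it only latches the operand.
           self.value = data & 0xFFFFFFFFFFFFFFFF

       def read_value(self):
           # Reading VALUE returns whatever the last write or action left.
           return self.value

       def write_action(self, action):
           # All side effects happen here, on the write to ACTION.
           if action == self.ACTION_SET_ID:
               self.record_ids.append(self.value)
           elif action == self.ACTION_GET_COUNT:
               self.value = len(self.record_ids)


   regs = ToyErstRegisters()
   regs.write_value(0x1234)                  # operand first
   regs.write_action(regs.ACTION_SET_ID)     # then trigger the action
   regs.write_action(regs.ACTION_GET_COUNT)  # result is left in VALUE
   count = regs.read_value()

In this model a driver that wrote ACTION before VALUE would operate on a
stale operand, which is why the operand must be in place first.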


Backend Storage Format
----------------------

The backend storage is divided into fixed size "slots", 8KiB in
length, with each slot storing a single record.  Not all slots need to
be occupied, and they need not be occupied in a contiguous fashion.
The ability to clear/erase specific records allows for the formation
of unoccupied slots.

Slot 0 contains a backend storage header that identifies the contents
as ERST and also facilitates efficient access to the records.
Depending upon the size of the backend storage, additional slots will
be designated to be a part of the slot 0 header. For example, at 8KiB,
the slot 0 header can accommodate 1021 records. Thus a storage size
of 8MiB (8KiB * 1024) requires an additional slot for use by the
header. In this scenario, slot 0 and slot 1 form the backend storage
header, and records can be stored starting at slot 2.

Below is an example layout of the backend storage format (for storage
size less than 8MiB). The size of the storage is a multiple of 8KiB,
and contains N number of slots to store records. The example below
shows two records (in CPER format) in the backend storage, while the
remaining slots are empty/available.

::

 Slot   Record
        <------------------ 8KiB -------------------->
        +--------------------------------------------+
    0   | storage header                             |
        +--------------------------------------------+
    1   | empty/available                            |
        +--------------------------------------------+
    2   | CPER                                       |
        +--------------------------------------------+
    3   | CPER                                       |
        +--------------------------------------------+
  ...   |                                            |
        +--------------------------------------------+
    N   | empty/available                            |
        +--------------------------------------------+

The storage header consists of some basic information and an array
of CPER record_id's to efficiently access records in the backend
storage.

All fields in the header are stored in little endian format.

::

  +--------------------------------------------+
  | magic                                      | 0x0000
  +--------------------------------------------+
  | record_offset        | record_size         | 0x0008
  +--------------------------------------------+
  | record_count         | reserved | version  | 0x0010
  +--------------------------------------------+
  | record_id[0]                               | 0x0018
  +--------------------------------------------+
  | record_id[1]                               | 0x0020
  +--------------------------------------------+
  | record_id[...]                             |
  +--------------------------------------------+
  | record_id[N]                               | 0x1FF8
  +--------------------------------------------+

The 'magic' field contains the value 0x524F545354535245.

The 'record_size' field contains the value 0x2000, 8KiB.

The 'record_offset' field points to the first record_id in the array,
0x0018.

The 'version' field contains 0x0100, the first version.

The 'record_count' field contains the number of valid records in the
backend storage.

The 'record_id' array fields are the 64-bit record identifiers of the
CPER record in the corresponding slot. Stated differently, the
location of a CPER record_id in the record_id[] array provides the
slot index for the corresponding record in the backend storage.

Note that, for example, with a backend storage less than 8MiB, slot 0
contains the header, so record_id[0] will never contain a valid
CPER record_id. Instead slot 1 is the first available slot and thus
record_id[1] may contain a valid CPER record_id.

A 'record_id' of all 0s or all 1s indicates an invalid record (i.e. the
slot is available).
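
As an illustration of the header layout, the following sketch decodes
the fixed fields and lists the occupied slots. It makes one assumption
that the diagram does not spell out: within each 8-byte row, the
rightmost field sits at the lowest address (so record_size precedes
record_offset, and version precedes reserved and record_count). The
function names and the sample record_id values are hypothetical.

.. code-block:: python

   import struct

   ERST_MAGIC = 0x524F545354535245
   # Little endian: magic, record_size, record_offset, version, reserved,
   # record_count. The order within each row is an assumption (see above).
   HEADER_FORMAT = "<QIIHHI"
   INVALID_IDS = (0, 0xFFFFFFFFFFFFFFFF)  # all 0s or all 1s: slot available


   def parse_header(buf):
       """Decode the fixed header fields from the start of the storage."""
       magic, record_size, record_offset, version, _reserved, record_count = \
           struct.unpack_from(HEADER_FORMAT, buf, 0)
       if magic != ERST_MAGIC:
           raise ValueError("not an ERST backend storage image")
       return {"record_size": record_size, "record_offset": record_offset,
               "version": version, "record_count": record_count}


   def occupied_slots(buf, header):
       """Map slot index to record_id for every slot holding a valid record."""
       total_slots = len(buf) // header["record_size"]
       ids = struct.unpack_from("<%dQ" % total_slots, buf,
                                header["record_offset"])
       return {slot: rid for slot, rid in enumerate(ids)
               if rid not in INVALID_IDS}


   # Build a 64KiB sample image with records in slots 2 and 3.
   image = bytearray(0x10000)
   struct.pack_into(HEADER_FORMAT, image, 0,
                    ERST_MAGIC, 0x2000, 0x18, 0x0100, 0, 2)
   struct.pack_into("<Q", image, 0x18 + 2 * 8, 0xAAAA)
   struct.pack_into("<Q", image, 0x18 + 3 * 8, 0xBBBB)

   header = parse_header(image)
   slots = occupied_slots(image, header)

Stored in little endian byte order, the magic value reads as the ASCII
string "ERSTSTOR" at the start of the backing file.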


References
----------

[1] "Advanced Configuration and Power Interface Specification",
    version 4.0, June 2009.

[2] "Unified Extensible Firmware Interface Specification",
    version 2.1, October 2008.

[3] "Windows Hardware Error Architecture", specifically
    "Error Record Persistence Mechanism".
