docs/drivers/freedreno/hw/lrz.rst - third_party/mesa - Git at Google

 Low Resolution Z Buffer
 =======================

 This doc is based on a6xx HW reverse engineering, a5xx should be similar to
 a6xx before gen3.

 Low Resolution Z buffer is very similar to a depth prepass that helps
 the HW to avoid executing the fragment shader on those fragments that will
 be subsequently discarded by the depth test afterwards.

 The interesting part of this feature is that it allows applications
 to submit the vertices in any order.

 Citing official Adreno documentation:

 ::

   [A Low Resolution Z (LRZ)] pass is also referred to as draw order independent
   depth rejection. During the binning pass, a low resolution Z-buffer is constructed,
   and can reject LRZ-tile wide contributions to boost binning performance. This LRZ
   is then used during the rendering pass to reject pixels efficiently before testing
   against the full resolution Z-buffer.

 TODO: a7xx

 Limitations
 -----------

 There are two main limitations of LRZ:

 - Since LRZ is an early depth test, such test cannot be used when late-z is required;
 - LRZ buffer could be formed only in one direction, changing depth comparison directions
   without disabling LRZ would lead to a malformed LRZ buffer.

 Pre-a650 (before gen3)
 ----------------------

 The direction is fully tracked on CPU. In render pass LRZ starts with
 unknown direction, the direction is set first time when depth write occurs
 and if it does change afterwards then the direction becomes invalid and LRZ is
 disabled for the rest of the render pass.

 Since the direction is not tracked by the GPU, it's impossible to know whether
 LRZ is enabled during construction of secondary command buffers.

 For the same reason, it's impossible to reuse LRZ between render passes.

 A650+ (gen3+)
 -------------

 Now LRZ direction can be tracked on GPU. There are two parts:

 - Direction byte which stores current LRZ direction - ``GRAS_LRZ_CNTL.DIR``.
 - Parameters of the last used depth view - ``GRAS_LRZ_DEPTH_VIEW``.

 The idea is the same as when LRZ tracked on CPU: when ``GRAS_LRZ_CNTL``
 is used, its direction is compared to the previously known direction
 and direction byte is set to disabled when directions are incompatible.

 Additionally, to reuse LRZ between render passes, ``GRAS_LRZ_CNTL`` checks
 if the current value of ``GRAS_LRZ_DEPTH_VIEW`` is equal to the value
 stored in the buffer. If not, LRZ is disabled. This is necessary
 because depth buffer may have several layers and mip levels, while the
 LRZ buffer represents only a single layer + mip level.

 LRZ Fast-Clear
 --------------

 The LRZ fast-clear buffer is initialized to zeroes and read/written
 when ``GRAS_LRZ_CNTL.FC_ENABLE`` is set. It appears to store 1b/block.
 ``0`` means block has original depth clear value, and ``1`` means that the
 corresponding block in LRZ has been modified.

 LRZ fast-clear conservatively clears LRZ buffer. At the point where LRZ is
 written the LRZ block which corresponds to a single fast-clear bit is cleared:

 - To ``0.0`` if depth comparison is ``GREATER``
 - To ``1.0`` if depth comparison is ``LESS``

 This way it's always valid to fast-clear.

 LRZ Precision
 -------------

 LRZ always uses ``Z16_UNORM``. The epsilon for it is ``1.f / (1 << 16)`` which is
 not enough to represent all values of ``Z32_UNORM`` or ``Z32_FLOAT``.
 This especially raises questions in context of fast-clear, if fast-clear
 uses a value which cannot be precisely represented by LRZ - we wouldn't
 be able to round it in the correct direction since direction is tracked
 on GPU.

 However, it seems that depth comparisons with LRZ values have some "slack"
 and nothing special should be done for such depth clear values.

 How it was tested:

 - Clear ``Z32_FLOAT`` attachment to ``1.f / (1 << 17)``

   - LRZ buffer contains all zeroes.

 - Do draws and check whether all samples are passing:

   - ``OP_GREATER`` with ``(1.f / (1 << 17) + float32_epsilon)`` - passing;
   - ``OP_GREATER`` with ``(1.f / (1 << 17) - float32_epsilon)`` - not passing;
   - ``OP_LESS`` with ``(1.f / (1 << 17) - float32_epsilon)`` - samples;
   - ``OP_LESS`` with ``(1.f / (1 << 17) + float32_epsilon)``- not passing;
   - ``OP_LESS_OR_EQ`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing.

 In all cases resulting LRZ buffer is all zeroes and LRZ direction is updated.

 LRZ Caches
 ----------

 ``LRZ_FLUSH`` flushes and invalidates LRZ caches, there are two caches:

 - Cache for fast-clear buffer;
 - Cache for direction byte + depth view params.

 They could be cleared by ``LRZ_CLEAR``. To become visible in GPU memory
 the caches should be flushed with ``LRZ_FLUSH`` afterwards.

 ``GRAS_LRZ_CNTL`` reads from these caches.
	Low Resolution Z Buffer
	=======================

	This doc is based on a6xx HW reverse engineering, a5xx should be similar to
	a6xx before gen3.

	Low Resolution Z buffer is very similar to a depth prepass that helps
	the HW to avoid executing the fragment shader on those fragments that will
	be subsequently discarded by the depth test afterwards.

	The interesting part of this feature is that it allows applications
	to submit the vertices in any order.

	Citing official Adreno documentation:

	::

	[A Low Resolution Z (LRZ)] pass is also referred to as draw order independent
	depth rejection. During the binning pass, a low resolution Z-buffer is constructed,
	and can reject LRZ-tile wide contributions to boost binning performance. This LRZ
	is then used during the rendering pass to reject pixels efficiently before testing
	against the full resolution Z-buffer.

	TODO: a7xx

	Limitations
	-----------

	There are two main limitations of LRZ:

	- Since LRZ is an early depth test, such test cannot be used when late-z is required;
	- LRZ buffer could be formed only in one direction, changing depth comparison directions
	without disabling LRZ would lead to a malformed LRZ buffer.

	Pre-a650 (before gen3)
	----------------------

	The direction is fully tracked on CPU. In render pass LRZ starts with
	unknown direction, the direction is set first time when depth write occurs
	and if it does change afterwards then the direction becomes invalid and LRZ is
	disabled for the rest of the render pass.

	Since the direction is not tracked by the GPU, it's impossible to know whether
	LRZ is enabled during construction of secondary command buffers.

	For the same reason, it's impossible to reuse LRZ between render passes.

	A650+ (gen3+)
	-------------

	Now LRZ direction can be tracked on GPU. There are two parts:

	- Direction byte which stores current LRZ direction - ``GRAS_LRZ_CNTL.DIR``.
	- Parameters of the last used depth view - ``GRAS_LRZ_DEPTH_VIEW``.

	The idea is the same as when LRZ tracked on CPU: when ``GRAS_LRZ_CNTL``
	is used, its direction is compared to the previously known direction
	and direction byte is set to disabled when directions are incompatible.

	Additionally, to reuse LRZ between render passes, ``GRAS_LRZ_CNTL`` checks
	if the current value of ``GRAS_LRZ_DEPTH_VIEW`` is equal to the value
	stored in the buffer. If not, LRZ is disabled. This is necessary
	because depth buffer may have several layers and mip levels, while the
	LRZ buffer represents only a single layer + mip level.

	LRZ Fast-Clear
	--------------

	The LRZ fast-clear buffer is initialized to zeroes and read/written
	when ``GRAS_LRZ_CNTL.FC_ENABLE`` is set. It appears to store 1b/block.
	``0`` means block has original depth clear value, and ``1`` means that the
	corresponding block in LRZ has been modified.

	LRZ fast-clear conservatively clears LRZ buffer. At the point where LRZ is
	written the LRZ block which corresponds to a single fast-clear bit is cleared:

	- To ``0.0`` if depth comparison is ``GREATER``
	- To ``1.0`` if depth comparison is ``LESS``

	This way it's always valid to fast-clear.

	LRZ Precision
	-------------

	LRZ always uses ``Z16_UNORM``. The epsilon for it is ``1.f / (1 << 16)`` which is
	not enough to represent all values of ``Z32_UNORM`` or ``Z32_FLOAT``.
	This especially raises questions in context of fast-clear, if fast-clear
	uses a value which cannot be precisely represented by LRZ - we wouldn't
	be able to round it in the correct direction since direction is tracked
	on GPU.

	However, it seems that depth comparisons with LRZ values have some "slack"
	and nothing special should be done for such depth clear values.

	How it was tested:

	- Clear ``Z32_FLOAT`` attachment to ``1.f / (1 << 17)``

	- LRZ buffer contains all zeroes.

	- Do draws and check whether all samples are passing:

	- ``OP_GREATER`` with ``(1.f / (1 << 17) + float32_epsilon)`` - passing;
	- ``OP_GREATER`` with ``(1.f / (1 << 17) - float32_epsilon)`` - not passing;
	- ``OP_LESS`` with ``(1.f / (1 << 17) - float32_epsilon)`` - samples;
	- ``OP_LESS`` with ``(1.f / (1 << 17) + float32_epsilon)``- not passing;
	- ``OP_LESS_OR_EQ`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing.

	In all cases resulting LRZ buffer is all zeroes and LRZ direction is updated.

	LRZ Caches
	----------

	``LRZ_FLUSH`` flushes and invalidates LRZ caches, there are two caches:

	- Cache for fast-clear buffer;
	- Cache for direction byte + depth view params.

	They could be cleared by ``LRZ_CLEAR``. To become visible in GPU memory
	the caches should be flushed with ``LRZ_FLUSH`` afterwards.

	``GRAS_LRZ_CNTL`` reads from these caches.