 Fragment Density Map
 ====================

 ``VK_EXT_fragment_density_map`` is an extension which is intended to allow
 users to render parts of the screen at a lower resolution. It is designed to be
 implemented on tiled rendering GPU architectures such as Adreno, and the
 intention is that it is implemented by rendering some of the tiles at a lower
 resolution and scaling them up when resolving to system memory or when sampling
 the resulting image. This inherently means that it is "all or nothing," that
 is, it must be enabled or disabled for the entire render pass. While the idea is
 simple, the implementation in turnip is subtle and interacts with many other
 features. This page documents the main principles behind the implementation.

 Coordinate Space Soup
 ---------------------

 In order to render a tile at lower resolution, we have to override the user's
 viewport and scissor for each tile depending on the scaling factor provided by
 the user. This becomes complicated fast, so let's start by defining a few
 coordinate spaces that we'll have to work with.

 Framebuffer space
 ^^^^^^^^^^^^^^^^^

 This is the space of the final rendered image. From the user's perspective,
 everything is specified in this space, and fragments created by the rasterizer
 appear to be larger than 1 pixel. This is not what actually happens in the
 hardware, however; it is a fiction created by the driver. The other spaces
 below are what the hardware actually "sees".

 GMEM space
 ^^^^^^^^^^

 This space exists whenever tiled rendering/GMEM is used, even without FDM. It
 is the space used to access GMEM, with the origin at the upper left of the
 tile. The hardware automatically transforms rendering space into GMEM space
 whenever GMEM is accessed using the various ``*_WINDOW_OFFSET`` registers. The
 origin of this space will be called :math:`b_{cs}`, the common bin start, for
 reasons that are explained below. When using FDM, coordinates in this space
 must be multiplied by the scaling factor :math:`s` derived from the fragment
 density map, or equivalently divided by the fragment area (as defined by the
 Vulkan specification), with the origin still at the upper left of the tile. For
 example, if :math:`s_x = 1/2`, then the bin is half as wide as it would've been
 without FDM and all coordinates in this space must be divided by 2.

 Rendering space
 ^^^^^^^^^^^^^^^

 This is the space in which the hardware rasterizer operates and produces
 fragments. Normally this is the same as framebuffer space, but with FDM it is
 not. We transform the viewport and scissor from framebuffer space to
 rendering space by patching them per-tile in the driver. Then, when we resolve
 the tile, we scale it back to the correct resolution by blitting from the
 rendering-space source to the framebuffer-space destination.

 The correct transform from framebuffer space to rendering space has to shrink
 the coordinates by :math:`s` while mapping the original bin start in
 framebuffer space, :math:`b_s`, to
 :math:`b_{cs}`. Since :math:`b_{cs}` is entirely defined by the driver when
 programming ``*_WINDOW_OFFSET``, one tempting way to do this is to just
 multiply by :math:`s` and define :math:`b_{cs} = b_s * s`. It turns out,
 however, that this doesn't work. A key requirement is to handle cases where the
 same scene is rendered in multiple different views at the same time using
 ``VK_KHR_multiview``, as in VR use-cases, and in this case we want :math:`s` to
 vary per view, but :math:`b_{cs}` is always the same for every view because
 there is only one ``*_WINDOW_OFFSET`` register for all layers (hence the name
 "common bin start").

 We follow the blob by leaving :math:`b_{cs}` the same regardless of whether FDM
 is enabled or not. This means that normally :math:`b_s = b_{cs}`, although this
 is not the case if ``VK_EXT_fragment_density_map_offset`` is in use and the
 bins are shifted per-view. Since the coordinates need to be scaled by :math:`s`,
 we know that the transform needs to look like :math:`x' = s * x + o`, where
 only the offset :math:`o` is free. Plugging in the constraint that :math:`b_s`
 maps to :math:`b_{cs}`, we get that :math:`b_{cs} = s * b_s + o` or
 :math:`o = b_{cs} - s * b_s`. This is the function computed by
 ``tu_fdm_per_bin_offset`` and used to calculate the transform for the viewport,
 scissor, and ``gl_FragCoord``. One critical requirement is that the offset must
 be an integer, or in other words the framebuffer-space bin start :math:`b_s`
 must be a multiple of :math:`1 / s`. This is a natural constraint anyway,
 because otherwise the bin would start in the middle of a fragment, which cannot
 be handled correctly.
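
The derivation above can be checked numerically along one axis. The following
is only a minimal sketch in Python, not the code of ``tu_fdm_per_bin_offset``;
the helper names ``per_bin_offset`` and ``to_rendering_space`` are hypothetical,
and :math:`s` is modelled as an exact fraction so that the integer constraint
on :math:`o` can be tested directly.

```python
from fractions import Fraction


def per_bin_offset(b_cs, b_s, s):
    """Return o such that x' = s * x + o maps b_s to b_cs (one axis).

    Hypothetical helper: it only illustrates o = b_cs - s * b_s.
    """
    o = Fraction(b_cs) - Fraction(s) * b_s
    if o.denominator != 1:
        # b_s is not a multiple of 1/s, so the bin would start mid-fragment.
        raise ValueError("bin start must be a multiple of 1/s")
    return int(o)


def to_rendering_space(x, s, o):
    """Apply the framebuffer-space to rendering-space transform to x."""
    return s * x + o


# Without FDM offset, b_s == b_cs. With s = 1/2 and a bin starting at 96:
s = Fraction(1, 2)
o = per_bin_offset(b_cs=96, b_s=96, s=s)
print(o)                              # 48
print(to_rendering_space(96, s, o))   # 96, the bin start maps to b_cs
print(to_rendering_space(128, s, o))  # 112, a 32-pixel bin becomes 16 wide
```

A bin start such as :math:`b_s = 90` with :math:`s = 1/4` is rejected by this
sketch, because :math:`90 / 4` is not an integer.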

 Viewport and Scissor Patching
 -----------------------------

 In order to have :math:`s` differ per view, we have to be able to override the
 viewport per view. That is, we need to transform the viewport for each view
 differently. If there is only one viewport, then we duplicate the user's
 viewport for each view and transform it using the :math:`b_s` and :math:`s` for
 that view, and we set a "per-view viewport" bit to select the viewport per view
 instead of using the default viewport 0. When
 ``VK_VALVE_fragment_density_map_layered`` is in use, we instead have to insert
 shader code to achieve the same thing.
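
As an illustration of the single-viewport case, the sketch below duplicates one
user viewport per view and transforms each copy with that view's :math:`b_s`
and :math:`s`. It is a simplified Python model under stated assumptions, not the
driver code: the ``Viewport`` type and both function names are hypothetical,
only the x/y rectangle is patched, and the depth range is ignored.

```python
from fractions import Fraction
from typing import NamedTuple


class Viewport(NamedTuple):
    """Hypothetical viewport rectangle (depth range omitted)."""
    x: Fraction
    y: Fraction
    width: Fraction
    height: Fraction


def patch_viewport(vp, b_cs, b_s, s):
    """Transform vp from framebuffer space to rendering space for one view.

    b_cs, b_s and s are (x, y) pairs; the offset is o = b_cs - s * b_s.
    """
    ox = b_cs[0] - s[0] * b_s[0]
    oy = b_cs[1] - s[1] * b_s[1]
    return Viewport(s[0] * vp.x + ox, s[1] * vp.y + oy,
                    s[0] * vp.width, s[1] * vp.height)


def per_view_viewports(vp, b_cs, views):
    """Duplicate vp once per view; views is a list of (b_s, s) pairs."""
    return [patch_viewport(vp, b_cs, b_s, s) for b_s, s in views]


half = Fraction(1, 2)
user_vp = Viewport(0, 0, 1920, 1080)
views = [((96, 64), (half, half)),  # view 0 renders this bin at half resolution
         ((96, 64), (1, 1))]        # view 1 renders it at full resolution
for vp in per_view_viewports(user_vp, (96, 64), views):
    print(tuple(vp))
```

In this example view 0 gets a 960x540 viewport at (48, 32), while view 1 keeps
the user's viewport unchanged because :math:`s = 1` gives :math:`o = 0`.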

 If the user specifies multiple viewports but they are per-view because
 ``VK_QCOM_multiview_per_view_viewport`` is enabled, then we can just set the
 per-view viewport bit and transform each user viewport individually by the
 corresponding scale. But if the user explicitly writes ``gl_ViewportIndex``,
 then there is nothing we can do and we have to make :math:`s` the same for all
 views by conservatively taking the minimum. Then we apply :math:`s` to all of
 the user-specified viewports.

 Because the bin size is now per-view, the usual mechanism of
 ``*_WINDOW_SCISSOR`` for clipping fragments outside the bin doesn't work.
 Instead the driver needs to intersect the transformed user-specified scissor
 with the transformed rendering-space bin coordinates, effectively replacing
 ``*_WINDOW_SCISSOR``.
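
The scissor replacement described above amounts to a rectangle intersection in
rendering space. The Python sketch below shows the idea under stated
assumptions: rectangles are hypothetical ``(x0, y0, x1, y1)`` tuples with
exclusive upper bounds, the function names are made up for illustration, and
rounding of scissor edges that are not multiples of :math:`1 / s` is left out.

```python
from fractions import Fraction


def transform_rect(rect, s, o):
    """Map an (x0, y0, x1, y1) rect from framebuffer to rendering space."""
    x0, y0, x1, y1 = rect
    return (s[0] * x0 + o[0], s[1] * y0 + o[1],
            s[0] * x1 + o[0], s[1] * y1 + o[1])


def intersect(a, b):
    """Intersect two rects; a disjoint pair yields an empty rect."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, max(x0, x1), max(y0, y1))


def patched_scissor(user_scissor, bin_rect, s, o):
    """Clip the transformed user scissor to the transformed bin."""
    return intersect(transform_rect(user_scissor, s, o),
                     transform_rect(bin_rect, s, o))


half = Fraction(1, 2)
s, o = (half, half), (48, 32)  # o = b_cs - s * b_s for a bin starting at (96, 64)
bin_rect = (96, 64, 128, 96)   # 32x32 bin in framebuffer space
print(patched_scissor((100, 0, 1920, 1080), bin_rect, s, o))
```

For this bin the result is ``(98, 64, 112, 80)``: the user scissor only trims
the left edge, and the remaining edges come from the bin itself.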

 Fragment density map offset
 ---------------------------

 In order to "properly" implement ``VK_EXT_fragment_density_map_offset``, we
 need to add an extra row/column of bins at the end and then shift the binning
 grid up and to the left by an offset :math:`b_o`. This offset is based on the
 user's offset but has the opposite sign, i.e. when shifting the FDM to the left
 we have to shift the binning grid to the right, and once the user's offset
 becomes large enough then we "wrap around" and shift over the scaling factor
 :math:`s` to the next bin. This has to happen per-view. In turnip the function
 that computes :math:`b_o` is called ``tu_bin_offset``. Each tile then gets an
 offset start :math:`b_s = b_{cs} - b_o`, except for tiles in the first
 row/column, which only shrink in height/width respectively.
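
To make the shifted grid concrete, the sketch below lays out the bins along one
axis for a given :math:`b_o`. It is a hypothetical Python model, not the turnip
implementation: ``shifted_bins`` is an invented name, :math:`b_o` is taken as an
input (how ``tu_bin_offset`` derives it from the user's offset is not shown),
and it is assumed that :math:`0 \le b_o <` bin size.

```python
def shifted_bins(fb_size, bin_size, b_o):
    """Return (b_cs, b_s, size) for each bin along one axis.

    b_cs is the unshifted "common" start, b_s the shifted start in
    framebuffer space, and size the extent covered in framebuffer space.
    """
    if not 0 <= b_o < bin_size:
        raise ValueError("b_o is assumed to lie in [0, bin_size)")
    n = -(-fb_size // bin_size)  # number of bins without any offset (ceiling)
    bins = []
    for i in range(n + 1):       # one extra bin at the end
        b_cs = i * bin_size
        start = max(b_cs - b_o, 0)                   # first bin stays at 0
        end = min(b_cs + bin_size - b_o, fb_size)    # clip to the framebuffer
        if end > start:
            bins.append((b_cs, start, end - start))
    return bins


for b in shifted_bins(fb_size=128, bin_size=32, b_o=8):
    print(b)
```

With a 128-pixel axis, 32-pixel bins and :math:`b_o = 8`, the first bin shrinks
to 24 pixels, the interior bins start at :math:`b_{cs} - 8`, and the extra bin
covers the last 8 pixels.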

 If we cannot make :math:`s` per-view, then we also cannot make :math:`b_s`
 per-view and so we cannot shift the bins over. Therefore we fall back to only
 shifting where :math:`s` is sampled from, which produces jittery and jarring
 transitions when a bin suddenly changes resolution.

 Bin merging
 -----------

 FDM shrinks the size of the bin in GMEM, which leaves much of GMEM unused.
 a7xx mitigates this by introducing "bin merging".
 If two tiles next to each other have the same scaling for each view, then we
 combine them into one tile, as long as the combined size in rendering space
 isn't larger than the original size of an unscaled bin in framebuffer space. We
 can even merge larger groups of tiles. The only hardware feature needed for
 this to work is the ability to merge the visibility streams for the tiles,
 which was added on a7xx by a new bitmask in ``CP_SET_BIN_DATA5`` and variants.
 Only bins within the same visibility stream/VSC pipe can be merged.
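
The merging conditions above can be written as a small predicate. This Python
sketch works along one axis and makes two assumptions that the text does not
spell out: the size limit is checked for every view, and all candidate tiles
have the same unscaled size. The name ``can_merge`` and its parameters are
hypothetical.

```python
from fractions import Fraction


def can_merge(tile_scales, bin_size, same_pipe=True):
    """Decide whether a run of adjacent tiles may be merged (one axis).

    tile_scales holds, for each tile, a tuple with one scale s per view.
    """
    if not same_pipe:
        return False  # only bins within the same VSC pipe can be merged
    first = tile_scales[0]
    if any(scales != first for scales in tile_scales[1:]):
        return False  # the scaling must match for each view
    merged_fb_size = len(tile_scales) * bin_size
    # The merged tile, scaled into rendering space, must still fit in the
    # space of one unscaled bin.
    return all(merged_fb_size * s <= bin_size for s in first)


half, quarter = Fraction(1, 2), Fraction(1, 4)
print(can_merge([(half, half), (half, half)], 32))  # True
print(can_merge([(half, 1), (half, 1)], 32))        # False, view 1 is unscaled
print(can_merge([(quarter,)] * 4, 32))              # True, four tiles fit
```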

 Hardware scaling registers and LRZ
 ----------------------------------

 One disadvantage of FDM on a6xx is that low-resolution tiles cannot use
 LRZ, because the LRZ hardware is not aware of the transform between framebuffer
 space and rendering space and applies the framebuffer-space LRZ values to the
 rendering-space fragments. In order to fix this, a740 adds new offset and scale
 registers. The offset :math:`o'` is applied to fragment coordinates during
 rasterization *after* LRZ, so that viewport, scissor, and LRZ are in a
 new "LRZ space" while the other operations (resolves and unresolves, and
 attachment writes) still happen in the rendering space which is now offset.
 :math:`o'` is specified for each layer. The scale :math:`s` is the same as
 before, and it is used to multiply the fragment area covered by each LRZ pixel.

 Without ``VK_EXT_fragment_density_map_offset``, we can simply make LRZ space
 equal to framebuffer space scaled down by :math:`s`. That is, we can set
 :math:`o'` to what :math:`o` was before and then set :math:`o` to 0, only
 scaling down the viewport but not shifting it and letting the hardware handle
 the shift. Then LRZ pixels will be scaled up appropriately and everything will
 work. However, this doesn't work if there is a bin offset :math:`b_o`. In order
 to make binning work, we shift the viewport and scissor by :math:`b_o` when
 binning. Unfortunately the offset registers do not have any effect when
 binning, so rendering space and LRZ space have to be the same when binning, and
 the visibility stream is generated from rendering space. This means that LRZ
 space also has to be shifted over compared to framebuffer space, and the LRZ
 buffer must be overallocated when FDM offset might be used with it (which is
 signalled by ``VK_IMAGE_CREATE_FRAGMENT_DENSITY_MAP_OFFSET_BIT_EXT``) because
 the LRZ image will be shifted by :math:`b_o`.

 In order for LRZ to work, LRZ space when rendering must be equal to LRZ space
 when binning scaled down by :math:`s`. The origin of LRZ space when binning is
 :math:`-b_o`, and this must be mapped to 0. The transform from
 framebuffer space to LRZ space is :math:`x' = x * s + o`, and the transform
 from framebuffer space to rendering space is :math:`x'' = x * s + o + o'`.
 We get that :math:`o + o' = b_{cs} - b_s * s`, similar to before, and
 :math:`0 = -b_o * s + o` so that :math:`o = b_o * s` and finally
 :math:`o' = b_{cs} - b_s * s - b_o * s`, or after rearranging
 :math:`o' = b_{cs} - (b_s + b_o) * s`. For all tiles except those in the first
 row or column, this simplifies to :math:`o' = b_{cs} - b_{cs} * s` because
 :math:`b_{cs} = b_s + b_o`. For tiles in the first row or column, :math:`b_s`
 and :math:`b_{cs}` are both 0 in one of the coordinates, so it becomes
 :math:`o' = -b_o * s` in that coordinate. This isn't representable in hardware,
 partly because it is negative (which can be worked around by artificially
 shifting :math:`b_{cs}`) but more importantly because it may not meet the
 alignment requirement of the hardware register (which is currently 8 pixels).
 We simply have to disable LRZ in this case.
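
The two offsets and the resulting LRZ restriction can be worked through with
concrete numbers. The Python sketch below evaluates :math:`o = b_o * s` and
:math:`o' = b_{cs} - (b_s + b_o) * s` along one axis. The function names are
hypothetical, and the representability check only models the two conditions
mentioned in the text (non-negative and aligned to 8 pixels); it does not model
the workaround of shifting :math:`b_{cs}`.

```python
from fractions import Fraction

LRZ_OFFSET_ALIGN = 8  # alignment quoted in the text ("currently 8 pixels")


def lrz_offsets(b_cs, b_s, b_o, s):
    """Return (o, o') for one axis.

    o  is the driver-applied offset: framebuffer space -> LRZ space.
    o' is the hardware-applied offset: LRZ space -> rendering space.
    """
    o = b_o * s
    o_prime = b_cs - (b_s + b_o) * s
    return o, o_prime


def lrz_offset_representable(o_prime):
    """Model of the hardware limits on o' described above."""
    return o_prime >= 0 and o_prime % LRZ_OFFSET_ALIGN == 0


s = Fraction(1, 2)

# Interior tile: b_cs = b_s + b_o, so o' = b_cs - b_cs * s.
o, o_prime = lrz_offsets(b_cs=64, b_s=48, b_o=16, s=s)
print(o, o_prime, lrz_offset_representable(o_prime))  # 8 32 True

# First column: b_cs = b_s = 0, so o' = -b_o * s.
o, o_prime = lrz_offsets(b_cs=0, b_s=0, b_o=16, s=s)
print(o, o_prime, lrz_offset_representable(o_prime))  # 8 -8 False
```

In the interior case the combined transform :math:`x'' = x * s + o + o'` maps
:math:`b_s = 48` to :math:`24 + 8 + 32 = 64 = b_{cs}`, as required.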