| // Copyright (c) 2020 NVIDIA Corporation |
| // |
| // SPDX-License-Identifier: CC-BY-4.0 |
| |
| include::{generated}/meta/{refprefix}VK_NV_device_generated_commands.txt[] |
| |
| === Other Extension Metadata |
| |
| *Last Modified Date*:: |
| 2020-02-20 |
| *Interactions and External Dependencies*:: |
| - This extension requires Vulkan 1.1 |
| - This extension requires `VK_EXT_buffer_device_address` or |
| `VK_KHR_buffer_device_address` or Vulkan 1.2 for the ability to bind |
| vertex and index buffers on the device. |
| - This extension interacts with `VK_NV_mesh_shader`. |
| If the latter extension is not supported, remove the command token to |
| initiate mesh tasks drawing in this extension. |
| *Contributors*:: |
| - Christoph Kubisch, NVIDIA |
| - Pierre Boudier, NVIDIA |
| - Jeff Bolz, NVIDIA |
| - Eric Werness, NVIDIA |
| - Yuriy O'Donnell, Epic Games |
| - Baldur Karlsson, Valve |
| - Mathias Schott, NVIDIA |
| - Tyson Smith, NVIDIA |
| - Ingo Esser, NVIDIA |
| |
| === Description |
| |
| This extension allows the device to generate a number of critical graphics |
| commands for command buffers. |
| |
| When rendering a large number of objects, the device can be leveraged to |
| implement a number of critical functions, like updating matrices, or |
| implementing occlusion culling, frustum culling, front to back sorting, etc. |
| Implementing those on the device does not require any special extension, |
| since an application is free to define its own data structures, and just |
| process them using shaders. |
| |
| However, if the application desires to quickly kick off the rendering of the |
| final stream of objects, then unextended Vulkan forces the application to |
| read back the processed stream and issue graphics command from the host. |
| For very large scenes, the synchronization overhead and cost to generate the |
| command buffer can become the bottleneck. |
| This extension allows an application to generate a device side stream of |
| state changes and commands, and convert it efficiently into a command buffer |
| without having to read it back to the host. |
| |
| Furthermore, it allows incremental changes to such command buffers by |
| manipulating only partial sections of a command stream -- for example |
| pipeline bindings. |
| Unextended Vulkan requires re-creation of entire command buffers in such a |
| scenario, or updates synchronized on the host. |
| |
| The intended usage for this extension is for the application to: |
| |
| * create sname:VkBuffer objects and retrieve physical addresses from them |
| via flink:vkGetBufferDeviceAddressEXT |
| * create a graphics pipeline using |
| sname:VkGraphicsPipelineShaderGroupsCreateInfoNV for the ability to |
| change shaders on the device. |
| * create a slink:VkIndirectCommandsLayoutNV, which lists the |
| elink:VkIndirectCommandsTokenTypeNV it wants to dynamically execute as |
| an atomic command sequence. |
| This step likely involves some internal device code compilation, since |
| the intent is for the GPU to generate the command buffer in the |
| pipeline. |
| * fill the input stream buffers with the data for each of the inputs it |
| needs. |
| Each input is an array that will be filled with token-dependent data. |
| * set up a preprocess sname:VkBuffer that uses memory according to the |
| information retrieved via |
| flink:vkGetGeneratedCommandsMemoryRequirementsNV. |
| * optionally preprocess the generated content using |
| flink:vkCmdPreprocessGeneratedCommandsNV, for example on an asynchronous |
| compute queue, or for the purpose of re-using the data in multiple |
| executions. |
| * call flink:vkCmdExecuteGeneratedCommandsNV to create and execute the |
| actual device commands for all sequences based on the inputs provided. |
| |
| For each draw in a sequence, the following can be specified: |
| |
| * a different shader group |
| * a number of vertex buffer bindings |
| * a different index buffer, with an optional dynamic offset and index type |
| * a number of different push constants |
| * a flag that encodes the primitive winding |
| |
| While the GPU can be faster than a CPU to generate the commands, it will not |
| happen asynchronously to the device, therefore the primary use-case is |
| generating "`less`" total work (occlusion culling, classification to use |
| specialized shaders, etc.). |
| |
| include::{generated}/interfaces/VK_NV_device_generated_commands.txt[] |
| |
| === Issues |
| |
| 1) How to name this extension ? |
| |
| `VK_NV_device_generated_commands` |
| |
| As usual, one of the hardest issues ;) |
| |
| Alternatives: `VK_gpu_commands`, `VK_execute_commands`, |
| `VK_device_commands`, `VK_device_execute_commands`, `VK_device_execute`, |
| `VK_device_created_commands`, `VK_device_recorded_commands`, |
| `VK_device_generated_commands` `VK_indirect_generated_commands` |
| |
| 2) Should we use a serial stateful token stream or stateless sequence |
| descriptions? |
| |
| Similarly to slink:VkPipeline, fixed layouts have the most likelihood to be |
| cross-vendor adoptable. |
| They also benefit from being processable in parallel. |
| This is a different design choice compared to the serial command stream |
| generated through `GL_NV_command_list`. |
| |
| 3) How to name a sequence description? |
| |
| `VkIndirectCommandsLayout` as in the NVX extension predecessor. |
| |
| Alternative: `VkGeneratedCommandsLayout` |
| |
| 4) Do we want to provide code:indirectCommands inputs with layout or at |
| code:indirectCommands time? |
| |
| Separate layout from data as Vulkan does. |
| Provide full flexibility for code:indirectCommands. |
| |
| 5) Should the input be provided as SoA or AoS? |
| |
| Both ways are desireable. |
| AoS can provide portability to other APIs and easier to setup, while SoA |
| allows to update individual inputs in a cache-efficient manner, when others |
| remain static. |
| |
| 6) How do we make developers aware of the memory requirements of |
| implementation-dependent data used for the generated commands? |
| |
| Make the API explicit and introduce a `preprocess` slink:VkBuffer. |
| Developers have to allocate it using |
| flink:vkGetGeneratedCommandsMemoryRequirementsNV. |
| |
| In the NVX version the requirements were hidden implicitly as part of the |
| command buffer reservation process, however as the memory requirements can |
| be substantial, we want to give developers the ability to budget the memory |
| themselves. |
| By lowering the `maxSequencesCount` the memory consumption can be reduced. |
| Furthermore reuse of the memory is possible, for example for doing explicit |
| preprocessing and execution in a ping-pong fashion. |
| |
| The actual buffer size is implementation-dependent and may be zero, i.e. not |
| always required. |
| |
| When making use of Graphics Shader Groups, the programs should behave |
| similar with regards to vertex inputs, clipping and culling outputs of the |
| geometry stage, as well as sample shading behavior in fragment shaders, to |
| reduce the amount of the worst-case memory approximation. |
| |
| 7) Should we allow additional per-sequence dynamic state changes? |
| |
| Yes |
| |
| Introduced a lightweight indirect state flag |
| elink:VkIndirectStateFlagBitsNV. |
| So far only switching front face winding state is exposed. |
| Especially in CAD/DCC mirrored transforms that require such changes are |
| common, and similar flexibility is given in the ray tracing instance |
| description. |
| |
| The flag could be extended further, for example to switch between |
| primitive-lists or -strips, or make other state modifications. |
| |
| Furthermore, as new tokens can be added easily, future extension could add |
| the ability to change any elink:VkDynamicState. |
| |
| 8) How do we allow re-using already "`generated`" code:indirectCommands? |
| |
| Expose a `preprocessBuffer` to reuse implementation-dependencyFlags data. |
| Set the `isPreprocessed` to true in flink:vkCmdExecuteGeneratedCommandsNV. |
| |
| 9) Under which conditions is flink:vkCmdExecuteGeneratedCommandsNV legal? |
| |
| It behaves like a regular draw call command. |
| |
| 10) Is flink:vkCmdPreprocessGeneratedCommandsNV copying the input data or |
| referencing it? |
| |
| There are multiple implementations possible: |
| |
| * one could have some emulation code that parses the inputs, and generates |
| an output command buffer, therefore copying the inputs. |
| * one could just reference the inputs, and have the processing done in |
| pipe at execution time. |
| |
| If the data is mandated to be copied, then it puts a penalty on |
| implementation that could process the inputs directly in pipe. |
| If the data is "`referenced`", then it allows both types of implementation. |
| |
| The inputs are "`referenced`", and must: not be modified after the call to |
| flink:vkCmdExecuteGeneratedCommandsNV has completed. |
| |
| 11) Which buffer usage flags are required for the buffers referenced by |
| sname:VkGeneratedCommandsInfoNV ? |
| |
| Reuse existing ename:VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT |
| |
| * slink:VkGeneratedCommandsInfoNV::pname:preprocessBuffer |
| * slink:VkGeneratedCommandsInfoNV::pname:sequencesCountBuffer |
| * slink:VkGeneratedCommandsInfoNV::pname:sequencesIndexBuffer |
| * slink:VkIndirectCommandsStreamNV::pname:buffer |
| |
| 12) In which pipeline stage does the device generated command expansion |
| happen? |
| |
| flink:vkCmdPreprocessGeneratedCommandsNV is treated as if it occurs in a |
| separate logical pipeline from either graphics or compute, and that pipeline |
| only includes ename:VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, a new stage |
| ename:VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_NV, and |
| ename:VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT. |
| This new stage has two corresponding new access types, |
| ename:VK_ACCESS_COMMAND_PREPROCESS_READ_BIT_NV and |
| ename:VK_ACCESS_COMMAND_PREPROCESS_WRITE_BIT_NV, used to synchronize reading |
| the buffer inputs and writing the preprocess memory output. |
| |
| The generated output written in the preprocess buffer memory by |
| flink:vkCmdExecuteGeneratedCommandsNV is considered to be consumed by the |
| ename:VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT pipeline stage. |
| |
| Thus, to synchronize from writing the input buffers to preprocessing via |
| flink:vkCmdPreprocessGeneratedCommandsNV, use: |
| |
| * pname:dstStageMask = ename:VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_NV |
| * pname:dstAccessMask = ename:VK_ACCESS_COMMAND_PREPROCESS_READ_BIT_NV |
| |
| To synchronize from flink:vkCmdPreprocessGeneratedCommandsNV to executing |
| the generated commands by flink:vkCmdExecuteGeneratedCommandsNV, use: |
| |
| * pname:srcStageMask = ename:VK_PIPELINE_STAGE_COMMAND_PREPROCESS_BIT_NV |
| * pname:srcAccessMask = ename:VK_ACCESS_COMMAND_PREPROCESS_WRITE_BIT_NV |
| * pname:dstStageMask = ename:VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT |
| * pname:dstAccessMask = ename:VK_ACCESS_INDIRECT_COMMAND_READ_BIT |
| |
| When flink:vkCmdExecuteGeneratedCommandsNV is used with a |
| pname:isPreprocessed of `VK_FALSE`, the generated commands are implicitly |
| preprocessed, therefore one only needs to synchronize the inputs via: |
| |
| * pname:dstStageMask = ename:VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT |
| * pname:dstAccessMask = ename:VK_ACCESS_INDIRECT_COMMAND_READ_BIT |
| |
| 13) What if most token data is "`static`", but we frequently want to render |
| a subsection? |
| |
| Added "`sequencesIndexBuffer`". |
| This allows to easier sort and filter what should actually be executed. |
| |
| 14) What are the changes compared to the previous NVX extension? |
| |
| * Compute dispatch support was removed (was never implemented in drivers). |
| There are different approaches how dispatching from the device should |
| work, hence we defer this to a future extension. |
| * The `ObjectTableNVX` was replaced by using physical buffer addresses and |
| introducing Shader Groups for the graphics pipeline. |
| * Less state changes are possible overall, but the important operations |
| are still there (reduces complexity of implementation). |
| * The API was redesigned so all inputs must be passed at both |
| preprocessing and execution time (this was implicit in NVX, now it is |
| explicit) |
| * The reservation of intermediate command space is now mandatory and |
| explicit through a preprocess buffer. |
| * The elink:VkIndirectStateFlagBitsNV were introduced |
| |
| 15) When porting from other APIs, their indirect buffers may use different |
| enums, for example for index buffer types. |
| How to solve this? |
| |
| Added "`pIndexTypeValues`" to map custom `uint32_t` values to corresponding |
| ename:VkIndexType. |
| |
| 16) Do we need more shader group state overrides? |
| |
| The NVX version allowed all PSO states to be different, however as the goal |
| is not to replace all state setup, but focus on highly-frequent state |
| changes for drawing lots of objects, we reduced the amount of state |
| overrides. |
| Especially VkPipelineLayout as well as VkRenderPass configuration should be |
| left static, the rest is still open for discussion. |
| |
| The current focus is just to allow VertexInput changes as well as shaders, |
| while all shader groups use the same shader stages. |
| |
| Too much flexibility will increase the test coverage requirement as well. |
| However, further extensions could allow more dynamic state as well. |
| |
| 17) Do we need more detailed physical device feature queries/enables? |
| |
| An EXT version would need detailed implementor feedback to come up with a |
| good set of features. |
| Please contact us if you are interested, we are happy to make more features |
| optional, or add further restrictions to reduce the minimum feature set of |
| an EXT. |
| |
| 18) Is there an interaction with VK_KHR_pipeline_library planned? |
| |
| Yes, a future version of this extension will detail the interaction, once |
| VK_KHR_pipeline_library is no longer provisional. |
| |
| === Example Code |
| |
| Open-Source samples illustrating the usage of the extension can be found at |
| the following location (may not yet exist at time of writing): |
| |
| https://github.com/nvpro-samples/vk_device_generated_cmds |
| |
| |
| === Version History |
| |
| * Revision 1, 2020-02-20 (Christoph Kubisch) |
| - Initial version |
| * Revision 2, 2020-03-09 (Christoph Kubisch) |
| - Remove VK_EXT_debug_report interactions |
| * Revision 3, 2020-03-09 (Christoph Kubisch) |
| - Fix naming VkPhysicalDeviceGenerated to |
| VkPhysicalDeviceDeviceGenerated |