Direct3D 12 Rendering Driver (SPIRV-Cross + DXC approach) #64304

Closed
wants to merge 6 commits into from

RandomShaper
Member

@RandomShaper RandomShaper commented Aug 12, 2022

Superseded by #70315.


Direct3D 12 Rendering Driver (via SPIRV-Cross + DXC)

This is a feature-complete Direct3D 12 RenderingDevice implementation for Godot Engine. It works as a drop-in replacement for the Vulkan one. It is selectable in the project settings as an alternative to use on Windows.

By supporting Direct3D 12, Godot gains support for multiple new platforms, such as:

  • Windows Store (UWP).
  • Windows on ARM.
  • GDK.
  • Xbox (which Godot can't support officially, but for which Direct3D 12 support is essential).

This PR includes some preparatory changes to decouple the RenderingDevice from Vulkan, that is, to abstract the modern Godot rendering architecture from whatever rendering API is used. Moreover, instead of a monolithic commit, the code of the driver itself is split into three much more manageable commits.

Highlights

Performance

Depending on the complexity of the scene, effects used, etc., this first version of the renderer generally performs worse than the Vulkan one. In some tests, D3D12 has not been able to deliver more than 75% of the Vulkan frames per second. In others, D3D12 has been able to outperform Vulkan by a small margin. Performance will be improved over time.

Homogeneity

The D3D12 rendering driver has been written taking the Vulkan one as a basis and keeping as much as possible from the original. This effort gives two-fold benefits: on the one hand, the overall structure of the code files, including auxiliary structures and other elements, is very similar, which makes maintenance easier; on the other hand, both renderers are more similar at the functional level. An example of this is that the D3D12 renderer will be as picky as the Vulkan one when it comes to validation and error checking, even in areas where the Microsoft API wouldn't impose such strict constraints.

Specialization Constants

In Vulkan it is possible to create multiple variations of a pipeline with different values for certain parameters that end up as compile-time constants in the shader generated under the hood. Those parameters are called specialization constants.

In Direct3D there's no counterpart of that mechanism. However, Godot rendering relies on it for some of its shaders. A way to have specialization constants in the Direct3D/DXIL world had to be researched. It was finally found and is used in this code. The technique is explained in this Twitter thread: https://twitter.com/RandomPedroJ/status/1532725156623286272.

Code Comments

To avoid making this PR description unnecessarily long, the reader is advised to find additional insight in the comments.

Assertions

Given that some data crosses many stages from its inception to where it's finally used, the code is full of dev-only checks that ensure the sanity of many different data structures at different points in time. The expectation is that this will make it easier to catch bugs, even subtle ones, in areas of high complexity.

Known Issues

  • Multiview rendering does not work. Only the left eye is rendered.
  • SDFGI glitches on AMD GPUs, at least on integrated AMD Radeon. Deep investigation led to the finding that it's a bug in some third-party element (e.g. the Radeon driver, Direct3D or the DirectX Shader Compiler). The clue is that graphics debugging tools show the pipeline status as if some of the needed bindings hadn't been set. Moreover, the affected shaders work fine if compiled with optimizations disabled.
  • MinGW is not supported. So far only building with MSVC works, due to some vendor-specific stuff in third-party code.

Compilation & Distribution

  • Grab the (main, not PDB) .zip file corresponding to the 1.7.2207 version of the DirectX Shader Compiler from https://github.com/Microsoft/DirectXShaderCompiler/releases.
  • Unzip the file to some path.
  • Optional (only for developers wanting to debug graphics with the PIX tool, only for debug builds):
    • Locate the WinPixEventRuntime package (version 1.0.220810001 is the latest tested) at https://devblogs.microsoft.com/pix/download/. You’ll eventually be taken to a NuGet package page where you can click Download package to get it.
    • Change the file extension to .zip.
    • Unzip the file to some path.
  • Build Godot with the following additional parameters to SCons: d3d12=yes DXC_PATH=<...> plus PIX_PATH=<...> if you want PIX.

NOTE: The build process will copy dxcompiler.dll and dxil.dll from the bin/x64/ directory in the DXC zipfile to the Godot binary directory. D3D12-enabled Godot packages for distribution to end users must include those files, both for the editor and games.

Future Work

Besides fixing the known issues described in another section, there are many options for potential improvement, the most important of which are described below. The code also has a number of TODO items that refer to these and other, generally smaller, potential enhancements or nice-to-haves.

Render Pass API

The D3D12 renderer uses what in the Vulkan world is called dynamic rendering. In other words, it doesn't use render pass —and subpass— APIs. This was done to make things simpler, but came with a couple of downsides.

  • First, a lot of code to do the proper setup of render passes is avoided, but some of the things the API would do by itself (operations like clearing or discarding a framebuffer) still require a certain amount of code.
  • Second, this sort of emulation of built-in render subpasses won't perform as well on TBDR hardware, because the performance gains that input attachments provide can't be obtained from manual management of subpasses. This performance limitation only affects the mobile backend, though.

Actionable item: Re-work render pass management with the proper APIs, which may be needed to squeeze performance from certain kinds of devices.

Enhanced Barriers

Direct3D 12 was released with resource barriers as its way to synchronize GPU work. In short, they are not nearly as fine-grained as Vulkan's memory and pipeline barriers, the biggest consequence being comparatively worse performance. Microsoft has since added the so-called enhanced barriers to Direct3D 12, which are the same as the ones Vulkan has. Recent GPU drivers and Windows versions already support them.

Actionable item: Re-work synchronization based on enhanced barriers, which will give more performance and make the code more similar to the one in the Vulkan renderer.

More Reasonable Dependencies

Currently, this is using SPIRV-Cross for shader translation to HLSL and an important chunk of DXC for the specialization constants hack. When the Microsoft-provided support for DXIL in Mesa is mature (it wasn't yet when checked for the purpose of this work), we may be able to use it, via NIR, instead of those two other dependencies for those purposes. Microsoft is donating engineer time to Mesa for this effort, so we hope it will be in a usable state for us soon.

Actionable item: Watch the status of DXIL in Mesa and replace SPIRV-Cross and the DXC source code as soon as feasible.

Deprecate Texture Aliasing

In Vulkan it is possible to tell upfront which formats a texture will be interpreted as, and it'll just work. In Direct3D 12 there was traditionally no way to do the same. Therefore, there are limitations on which reinterpretations one can do.

Godot needs to do two of them that are illegal in D3D12: write as R32 and read as R9G9B9E5, and write as R16 and read as R4B4G4A4. The Direct3D 12 renderer code works around that limitation by abusing texture aliases, which, according to some tests across different GPUs, seems to work fine in practice.
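To make the R32-to-R9G9B9E5 reinterpretation concrete, here is a minimal C++ sketch of how the same 32 bits written as an integer decode as a shared-exponent color. It assumes the standard shared-exponent layout (9-bit red, green and blue mantissas in the low bits, a 5-bit exponent with bias 15 in the top bits). The names `Rgb` and `decode_r9g9b9e5` are illustrative only and are not taken from the driver.

```cpp
#include <cmath>
#include <cstdint>

struct Rgb {
    float r, g, b;
};

// Decodes one 32-bit shared-exponent texel.
// Assumed layout: bits 0-8 red, 9-17 green, 18-26 blue (mantissas),
// bits 27-31 shared exponent with a bias of 15.
Rgb decode_r9g9b9e5(uint32_t texel) {
    const uint32_t mantissa_mask = 0x1FFu;
    const int exponent = static_cast<int>(texel >> 27);
    // Each channel equals mantissa * 2^(exponent - 15 - 9).
    const int scale = exponent - 15 - 9;
    Rgb out;
    out.r = std::ldexp(static_cast<float>(texel & mantissa_mask), scale);
    out.g = std::ldexp(static_cast<float>((texel >> 9) & mantissa_mask), scale);
    out.b = std::ldexp(static_cast<float>((texel >> 18) & mantissa_mask), scale);
    return out;
}
```

For example, the texel `(16u << 27) | (256u << 18) | (128u << 9) | 256u` decodes to (1.0, 0.5, 1.0): a shader can write that integer through an R32 view and a later pass can sample it as a color, provided the API allows both views of the same memory.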

The legal approach would be to make copies of the textures when the time to read comes. However, that still won't work for the R4B4G4A4 case. Therefore, the aliasing workaround is used for every case for now.

Luckily, Direct3D has recently added a new API, CreateCommittedResource3(), that provides the same nicety as Vulkan, but it's still not widely available and, at the time of this writing, the D3D12 Memory Allocator library still doesn't support it (there's a PR, though: GPUOpen-LibrariesAndSDKs/D3D12MemoryAllocator#44).

Thanks go to Matías N. Goldberg, who was of great help in this investigation.

Actionable item: Add a support check and prefer CreateCommittedResource3() to the aliasing hack where possible.

Further Homogeneity

Actionable item: Fuse, as much as possible, the elements that Vulkan and D3D12 have in common (staging buffer, static arrays of data format names, etc.). This should reduce the codebase size and make it easier to maintain (and eventually add more platforms).

More

Just to make it complete, there are a few more potential improvements that may or may not be already in a TODO in the comments:

  • Try to assign HLSL bindings manually and inform SPIRV-Cross in a deterministic way. That would make reflection, management of root signature and population of handle heaps simpler and more efficient. (Credit: @reduz.)
  • More sensible use of the shared heap (i.e., track which resources/samplers are already bound and reuse somehow). Update: Done.
  • Leverage promotion and decay of the state of buffers.
  • While still using resource barriers, do split barriering where possible. (Maybe by taking the now neglected p_post_barrier parameters as a hint somehow?)
  • A way to use the HLSL implementation of fsr_upscale.h directly, given the appropriate defines.
  • Consider promoting some samplers (material_samplers) to static samplers and/or descriptors to root descriptors when possible.
  • In case of D3D12_FEATURE_DATA_ARCHITECTURE being UMA, use WriteToSubresource() instead of memcpy().
  • Consider D3D12_BUFFER_SRV_FLAG_RAW for CBV, or another usage.

🍀 This work has been financed and kindly donated to the Godot Engine project by W4 Games. 🍀

@GeorgeS2019

Do you have a plan to try Godot using Hololens Emulator?

@RandomShaper
Member Author

Do you have a plan to try Godot using Hololens Emulator?

It would be interesting, but I have other priorities on my horizon.

@reduz
Member

reduz commented Aug 12, 2022

@Lucrecious yeah

@bruvzg
Member

bruvzg commented Aug 12, 2022

Windows on ARM.

I guess it's time to get proper arch instead of bits (#55778) and add support for ARM Windows builds, with DX support it should be a valid option.

For reference, using the https://github.com/mstorsjo/llvm-mingw toolchain, it's possible to build current master for ARM Windows, with the exception of a few modules (as usual: raycast, denoise and theora, which use the wrong assembler code, and mbedtls, which is trying to use UNIX gettimeofday for some reason). But I do not have any ARM Windows setup to test it.

@Calinou
Member

Calinou commented Aug 12, 2022

If someone has an HDR display (preferably OLED or miniLED) and Windows 11, it might be worth trying to enable Auto HDR on Godot while using the Direct3D 12 renderer. This could address godotengine/godot-proposals#1004 on Windows until there is a native implementation 🙂

@DarkMessiah
Contributor

DarkMessiah commented Aug 12, 2022

Are additional SDKs required for compilation? (Windows 11, VS 2019)

Errors

D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2065: 'IDxcUtils': undeclared identifier
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2923: 'Microsoft::WRL::ComPtr': 'IDxcUtils' is not a valid template type argument for parameter 'T'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): note: see declaration of 'IDxcUtils'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2955: 'Microsoft::WRL::ComPtr': use of class template requires template argument list
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt\wrl/client.h(211): note: see declaration of 'Microsoft::WRL::ComPtr'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2065: 'IDxcUtils': undeclared identifier
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2923: 'Microsoft::WRL::ComPtr': 'IDxcUtils' is not a valid template type argument for parameter 'T'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): note: see declaration of 'IDxcUtils'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2955: 'Microsoft::WRL::ComPtr': use of class template requires template argument list
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt\wrl/client.h(211): note: see declaration of 'Microsoft::WRL::ComPtr'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2065: 'IDxcUtils': undeclared identifier
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2923: 'Microsoft::WRL::ComPtr': 'IDxcUtils' is not a valid template type argument for parameter 'T'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): note: see declaration of 'IDxcUtils'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2955: 'Microsoft::WRL::ComPtr': use of class template requires template argument list
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt\wrl/client.h(211): note: see declaration of 'Microsoft::WRL::ComPtr'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2065: 'IDxcUtils': undeclared identifier
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2923: 'Microsoft::WRL::ComPtr': 'IDxcUtils' is not a valid template type argument for parameter 'T'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): note: see declaration of 'IDxcUtils'
D:\godot-dx12\drivers/d3d12/rendering_device_d3d12.h(96): error C2955: 'Microsoft::WRL::ComPtr': use of class template requires template argument list
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt\wrl/client.h(211): note: see declaration of 'Microsoft::WRL::ComPtr'

@jenatali

Actionable item: Watch the status of DXIL in Mesa and replace SPIRV-Cross and the DXC source code as soon as feasible.

Is there a particular feature set you need at which point you'd consider that translator mature?

@0xF6

0xF6 commented Aug 12, 2022

Hmm, UWP is deprecated, isn't it?

@RandomShaper
Member Author

RandomShaper commented Aug 12, 2022

Actionable item: Watch the status of DXIL in Mesa and replace SPIRV-Cross and the DXC source code as soon as feasible.

Is there a particular feature set you need at which point you'd consider that translator mature?

@jenatali, on the one hand, it's stability; last time I checked (months ago) I got the impression that the functions I needed were still changing a lot in the recent history of the repo. On the other hand, it's really about having enough features to replace one or both of SPIRV-Cross and DXC-LLVM's IR handling. The latter (IR parse/serialize) may never be a thing, I guess.

@jenatali

@RandomShaper Understood. If you do take a closer look, and find it lacking for whatever reason, please reach out and let me know what's missing or what you'd need to make it viable for this use case.

@RandomShaper
Member Author

@DarkMessiah, my first impression is that the DXC_PATH is wrong and the compiler is getting an old version of dxcapi.h which doesn't declare IDxcUtils. Can you build with verbose=yes and paste the compiler command that generates errors?

@RandomShaper
Member Author

@RandomShaper Understood. If you do take a closer look, and find it lacking for whatever reason, please reach out and let me know what's missing or what you'd need to make it viable for this use case.

@jenatali, thank you very much. I'll do so as soon as possible.

@GeorgeS2019

GeorgeS2019 commented Aug 12, 2022

For newcomers joining this discussion, here is a related perspective:
Dozen is Vulkan over DirectX 12. It is available on Windows and WSL2 (Linux).

It is great to see @jenatali is here to offer his experience.

How Godot4 vulkan failed to run in Dozen

@Oliver-makes-code

Wouldn't adding DX restrict the platforms that games using it can target?

@LikeLakers2
Contributor

  • Windows Store (UWP).

I thought Godot already supported UWP? Or is there something I've missed?

@RandomShaper
Member Author

I've met a new problem: when the project manager starts, it closes automatically soon after

@John-Gdi, I believe that's already fixed. But I first have to fix a compile error before you can try again.

@RandomShaper
Member Author

@jenatali, I've created a couple of issues (feature requests) in the Mesa repo so I can track the progress of the two remaining features without having to check periodically when they have been added.

@RandomShaper
Member Author

I am hitting the following error in several locations in the editor:

ERROR: Cannot bind uniform set because there's no enough room in current frame's SAMPLERS desciptors heap.
Please increase the value of the rendering/rendering_device/d3d12/max_sampler_descriptors_per_frame project setting.

@lmurray, please test again. I've made a few changes to optimize the usage of the descriptor heap. I have to admit I wasn't entirely happy with the way it was being done before. Now there's a recycling mechanism, plus the determination of how much space is needed in the heaps has been made more accurate.
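As a rough illustration of what recycling within a per-frame descriptor heap can look like, here is a minimal C++ sketch. It is not the driver's code: the class `FrameDescriptorHeap`, its methods and the strategy itself (remember where a uniform set was written during the current frame and hand back the same range when it is bound again) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical bookkeeping for one frame's shader-visible descriptor heap.
class FrameDescriptorHeap {
public:
    explicit FrameDescriptorHeap(uint32_t p_capacity) : capacity(p_capacity) {}

    // Finds room for a uniform set. Returns false when the heap is full,
    // which is the situation the error message above reports.
    bool bind(uint64_t p_uniform_set_id, uint32_t p_descriptor_count, uint32_t *r_offset) {
        assert(r_offset != nullptr);
        auto it = offsets.find(p_uniform_set_id);
        if (it != offsets.end()) {
            *r_offset = it->second; // Recycled: no new heap space is consumed.
            return true;
        }
        if (p_descriptor_count > capacity - used) {
            return false; // Not enough room left in this frame's heap.
        }
        *r_offset = used;
        used += p_descriptor_count;
        offsets.emplace(p_uniform_set_id, *r_offset);
        return true;
    }

    // Meant to be called when a new frame starts.
    void reset() {
        used = 0;
        offsets.clear();
    }

    uint32_t get_used() const { return used; }

private:
    uint32_t capacity = 0;
    uint32_t used = 0;
    std::unordered_map<uint64_t, uint32_t> offsets;
};
```

With a scheme like this, binding the same uniform set many times in a frame costs heap space only once, so the per-frame capacity needs to cover the distinct sets used rather than the number of bind calls.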

@lmurray

lmurray commented Dec 16, 2022

@RandomShaper Thanks for addressing this. I'll test out the changes when I'm next available which is looking like sometime the week after next.

@RandomShaper RandomShaper changed the title from "Direct3D 12 Rendering Driver" to "Direct3D 12 Rendering Driver (SPIRV-Cross + DXC approach)" on Dec 19, 2022
@RandomShaper
Member Author

Superseded by #70315.

@lmurray

lmurray commented Feb 11, 2023

I've finally managed to test the new changes, sorry for the delay. I can confirm that the descriptor heap changes have fixed the issues that I was encountering. As I'm going to be keeping this DXC approach in my project (I trust it more), I won't be able to test the Mesa branch going forward; however, I did backport some of the Mesa branch changes locally into this branch. The SRV/UAV ambiguity feature and the general logic changes that were compatible with DXC seem to work fine in my project, although I honestly don't know how to properly test them with the shader combinations that I have available (it's a 2D project). The new descriptor table strategy used in the Mesa branch doesn't seem to be compatible with DXC, so I'll live with using potentially twice the amount of root signature space compared to the other branch.

@RandomShaper
Member Author

RandomShaper commented Feb 11, 2023

This is very interesting. Thanks for the update.

In order to be able to use what the NIR approach does, and so simplify the structure and population of the root signature, you would need to force your own binding assignments before the shader reaches DXC. Maybe the easiest would be to get the set and binding from SPIR-V and inject them as register() declarations in the HLSL, in a way that every stage gets the same ones. In NIR I'm just computing something like set * 1000000 + binding * 1000, to be sure each binding can take up to 1000 slots (for arrays, which in HLSL/DXIL take one slot per element).
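The register numbering described above can be sketched in a few lines of C++. This only illustrates the quoted formula: the helper names `base_register` and `register_annotation` are hypothetical, the register class letter is left to the caller, and the limit of 1000 bindings per set is an assumption derived from the multipliers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps a SPIR-V (set, binding) pair to a base HLSL register index using
// set * 1000000 + binding * 1000, leaving 1000 consecutive slots per
// binding for array elements.
uint32_t base_register(uint32_t p_set, uint32_t p_binding) {
    // Assumption: fewer than 1000 bindings per set, so sets cannot overlap.
    assert(p_binding < 1000u);
    return p_set * 1000000u + p_binding * 1000u;
}

// Builds the text of a register() annotation. The register class letter
// (e.g. 't', 'u', 'b' or 's') depends on the resource type and is chosen
// by the caller.
std::string register_annotation(char p_class, uint32_t p_set, uint32_t p_binding) {
    return std::string("register(") + p_class + std::to_string(base_register(p_set, p_binding)) + ")";
}
```

Because the result depends only on set and binding, every shader stage derives the same register for the same resource, which is what makes a single, predictable root signature layout possible.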

In fact, I was about to do something like that, since I hadn't thought of such kind of simplification until @reduz hinted at the possibility, but then we found that NIR was already viable, so I didn't do it here in the end.

May I ask why you need a D3D12 renderer for your project?

@GeorgeS2019

This comment was marked as off-topic.

@lmurray

lmurray commented Feb 11, 2023

@RandomShaper Thanks for the pointers. If I ever need to optimize I'll consider that approach. Regarding your question, I am a commercial game dev targeting Windows and the big three consoles only, and since I have a personal preference for using "official" tools and libraries where possible, I intend to ship with D3D12 (maybe exclusively) on Windows instead of shipping Vulkan. This personal preference is also why I prefer DXC over Mesa.

My project isn't that intense visually, so I don't really need the utmost performance and can afford to leave some on the table, but I do need something robust so I don't end up wasting time debugging exotic micro-optimizations when a deadline looms near. Since I've already tested this branch thoroughly over the past four months and it works fine, there is no real need for me to replace it with something else entirely that is known to be more experimental and less battle-tested.
