239 Commits

Author SHA1 Message Date
PabloMK7
5e2161d90b
video_core: Refactor state tracking (#1059) 2025-05-26 14:37:03 +02:00
PabloMK7
bf587885cb
Allow SPIR-V when using Shadow2D texture mapping (#1057) 2025-05-24 00:05:16 +02:00
PabloMK7
fac8ae2682 Fix VS uniform fields type declaration 2025-03-17 22:59:47 +00:00
PabloMK7
0eb4a71720
Implement framebuffer vertical flip flag (#699)
* Implement framebuffer vertical flip flag

* Make VerticalMirror const
2025-03-17 19:39:55 +00:00
Reg Tiangha
d27f322ddd shader_jit_x64_compiler.cpp: Make register conversions more consistent
Needed as xbyak v7.23+ is more strict about register sizes. Not doing so
leads to "bad size of register" errors during the tests phase.
2025-03-03 19:31:44 +00:00
PabloMK7
758ded7fae renderer: Add disable right eye performance hack 2024-11-19 16:36:24 +00:00
Wunk
bb1301b98e shader_jit_a64: Compact host executable memory (#230)
* common/aarch64: Allow generic code generator types

Use the templated `BasicCodeGenerator` type rather than the specialized
`CodeGenerator` type.
Allows `VectorCodeGenerator` to work with these functions.

* common/aarch64: Add `VectorCodeGenerator` to `CallFarFunction`

`VectorCodeGenerator` will always do far-calls since we cannot resolve any absolute addresses here.

* shader_jit_a64: Implement position-independent VectorCodeGenerator

Generates more position-independent assembly to allow for code to be
generated within a resizable vector before copying into executable
memory, allowing for more compact memory allocations and usage rather
than a statically defined worst-case for all-cases.

`VectorCodeGenerator` will need to generate position-independent code
rather than use absolute addresses. Assumes all far function calls in the
case of `VectorCodeGenerator` to use absolute addresses rather than
potentially use a relative `BL` branch after memory relocation.
2024-11-07 14:47:30 +01:00
Wunk
0ea9f23f0c shader_jit: Fix/optimize conditional evaluation (#234)
* shader_jit: Add conditional unit-tests

Tests all permutations of X, Y, AND, OR with each possible input value.

* video_core: Fix shader-interpreter conditional-code initialization

Rather than reserving the incoming state of the conditional codes, the
shader-interpreter was setting them both to false. In pretty much all
cases, the initial state of a shaderunit can be zero-initialized
statically. Just running the interpreter shouldn't necessarily reset the
conditional codes though.  The JIT loads incoming conditional codes
while the shader-interpreter resets them to false. This makes the
interpreter match the behavior of the shader-jit.

* shader_jit_a64: Fix/optimize conditional evaluation

Fix some of the regressions introduced by the previous optimization.
EOR does not support a constant of `0` in its immediate. In these cases
the COND{0,1} registers can be utilized immediately.

* shader_jit_x64: Fix conditional evaluation extended-bit hazard

The unit test seems to have identified a bug in the x64 jit too. The x64
jit was doing 32-bit comparisons despite the condition flags being 8-bit
values and is sensitive to garbage being in the upper 24 bits of the
register. This is fixed by using the proper 8-bit register types rather
than the 32-bit ones(`eax,`ebx` -> `al`, `bl`).

* shader_jit_x64: Zero-extend conditional-code bytes

`mov` was doing a partial update of bits within the register, allowing
garbage to be introduced in the upper bits of the register.
2024-11-07 14:47:27 +01:00
Wunk
cfc74d2d08 shader_jit_a64: Optimize conditional tests (#229)
These conditional tests are a 1:1 translation from the x64 code but do
not have to be. Reference-values are known at emit-time and can be
embedded as an immediate into an `EOR` instruction rather than moved
into a register. The `TST` instruction can be utilized to more optimally
test and update the `EQ`/`NE` status flags.
2024-11-07 14:47:27 +01:00
Reg Tiangha
6f3d11d1ad renderer_vulkan: Rewrite descriptor set management and various small fixes (#77)
* renderer_vulkan: Remove vulkan prefix in SetObjectName

* renderer_vulkan: Rename renderpass cache to render manager

* It is no longer just a cache

* renderer_vulkan: Rewrite descriptor management

* Switch to batched vkUpdateDescriptorSets from cached descriptor sets with templates

* vk_master_semaphore: Remove waitable atomic

* These are buggy on some platforms and regular condition_variables are faster most of the time

* vk_texture_runtime.cpp: remove outdated references

* vk_render_manager: Minor cleanups and rename to RenderManager

* It is no longer just a renderpass cache
* Revert variable name change from render_manager back to renderpass_cache

---------

Co-authored-by: GPUCode <geoster3d@gmail.com>
2024-11-07 14:43:54 +01:00
GPUCode
106364e01e
video_core: Use source3 when GPU_PREVIOUS is used in first stage (#7411) 2024-02-05 09:53:54 -08:00
GPUCode
d5a1bd07f3
glsl_shader_gen: Increase z=0 epsillon (#7408) 2024-02-05 09:53:41 -08:00
GPUCode
480604ec72
glsl_shader_fs_gen: Apply shadow before ambient light (#7404) 2024-01-31 23:29:39 +02:00
merry
63feac6bb3
externals: Update dynarmic to 6.6.1, Update oaknut to 2.0.1 (#7398) 2024-01-30 19:50:39 -08:00
GPUCode
8d82adb3d3
glsl_shader_gen: Remove invariant qualifier (#7376)
* glsl_shader_gen: Remove invariant qualifier

* Causes visual regressions in Pokemon with RADV

* rasterizer_cache: Clear null surface to transparent
2024-01-21 13:39:35 +02:00
GPUCode
8e87bd606c
glsl_shader_gen: Use epsilon for both ends of NDC range (#7355) 2024-01-20 22:13:16 +02:00
Vitor K
c8c2beaeff
misc: fix issues pointed out by msvc (#7316)
* do not move constant variables

* applet_manager: avoid possible use after move

* use constant references where pointed out by msvc

* extra_hid: initialize response

* ValidateSaveState: passing slot separately is not necessary

* common: mark HashCombine as nodiscard

* cityhash: remove use of using namespace std

* Prefix all size_t with std::

done automatically by executing regex replace `([^:0-9a-zA-Z_])size_t([^0-9a-zA-Z_])` -> `$1std::size_t$2`
based on 7d8f115

* shared_memory.cpp: fix log error format

* fix compiling with pch off
2024-01-07 12:37:42 -08:00
GPUCode
2bb7f89c30
video_core: Refactor GPU interface (#7272)
* video_core: Refactor GPU interface

* citra_qt: Better debug widget lifetime
2023-12-28 11:46:57 +01:00
Wunk
ea9f522c0c
shader_jit_a64: Use LDP/STP for address registers (#7225)
Move `address_registers` to be earlier in the `UnitState` structure to allow LDP/STP's 7-bit offset to reach these members.

Follow-up of https://github.com/citra-emu/citra/pull/7002#discussion_r1367270804
2023-12-03 05:07:21 -08:00
Wunk
83b329f6e1
video_core/shader: Refactor JIT-Engines into JitEngine type (#7210) 2023-11-26 15:15:36 -08:00
GPUCode
26d5727b19
video_core: Merge tex0 and tex_cube (#7173) 2023-11-17 03:14:10 -08:00
GPUCode
d5b50a9fc0
spv_fs_shader_gen: Remove OpTypeSampledImage from texture buffers (#7153) 2023-11-12 22:40:30 -08:00
GPUCode
168f168c33
spv_fs_shader_gen: Implement quaternion correction with barycentric extension (#7152) 2023-11-12 22:40:21 -08:00
merry
271218b733
shader_jit_a64_compiler: Improve MAX, MIN (#7137) 2023-11-11 18:27:01 +05:30
merry
80213bf88f
shader_jit_a64_compiler: Improve Compile_SwizzleSrc (#7136) 2023-11-11 18:26:48 +05:30
Steveice10
d4f31bc617
video_core: Fix fragment shader interlock usage on OpenGL. (#7144) 2023-11-10 13:14:52 -08:00
Steveice10
84f9e9a10f
video_core: Perform quaternion correction and interpolation in fragment shader using barycentric extension. (#7126) 2023-11-09 15:23:56 -08:00
Wunk
1d4d421097
shader_jit_a64: Optimize MOVA dest-enable (#7122)
Rather than branching the 3 cases of dest-enablement, just emit a single
move-and-sign-extend instruction for each case.

From this review:
https://github.com/citra-emu/citra/pull/7002#discussion_r1381560584
2023-11-07 11:46:40 -08:00
GPUCode
1f6393e7d5
video_core: Refactor GLSL fragment emitter (#7093)
* video_core: Refactor GLSL fragment emitter

* shader: Add back custom normal maps
2023-11-06 12:26:28 -08:00
Wunk
e13735b624
video_core: Implement an arm64 shader-jit backend (#7002)
* externals: Add oaksim submodule

Used for emitting ARM64 assembly

* common: Implement aarch64 ABI

Utilize oaknut to implement a stack frame.

* tests: Allow shader-jit tests for x64 and a64

Run the shader-jit tests for both x86_64 and arm64 targets

* video_core: Initialize arm64 shader-jit backend

Passes all current unit tests!

* shader_jit_a64: protect/unprotect memory when jit-ing

Required on MacOS. Memory needs to be fully unprotected and then
re-protected when writing or there will be memory access errors on
MacOS.

* shader_jit_a64: Fix ARM64-Imm overflow

These conditionals were throwing exceptions since the immediate values
were overflowing the available space in the `EOR` instructions. Instead
they are generated from `MOV` and then `EOR`-ed after.

* shader_jit_a64: Fix Geometry shader conditional

* shader_jit_a64: Replace `ADRL` with `MOVP2R`

Fixes some immediate-generation exceptions.

* common/aarch64: Fix CallFarFunction

* shader_jit_a64: Optimize `SantitizedMul`

Co-authored-by: merryhime <merryhime@users.noreply.github.com>

* shader_jit_a64: Fix address register offset behavior

Based on https://github.com/citra-emu/citra/pull/6942
Passes unit tests.

* shader_jit_a64: Fix `RET` address offset

A64 stack is 16-byte aligned rather than 8. So a direct port of the x64
code won't work. Fixes weird branches into invalid memory for any
shaders with subroutines.

* shader_jit_a64: Increase max program size

Tuned for A64 program size.

* shader_jit_a64: Use `UBFX` for extracting loop-state

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `SUB+CMP` to `SUBS`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Optimize `CMP+B` to `CBNZ`

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Use `FMOV` for `ONE` vector

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Remove x86-specific documentation

* shader_jit_a64: Use `UBFX` to extract exponent

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit_a64: Remove redundant MIN/MAX `SRC2`-NaN check

Special handling only needs to check SRC1 for NaN, not SRC2.
It would work as follows in the four possible cases:

No NaN: No special handling needed.
Only SRC1 is NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Only SRC2 is NaN: FMAX automatically picks SRC2 because it always picks the NaN if there is one.
Both SRC1 and SRC2 are NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit/tests:: Add catch-stringifier for vec2f/vec3f

* shader_jit/tests: Add Dest Mask unit test

* shader_jit_a64: Fix Dest-Mask `BSL` operand order

Passes the dest-mask unit tests now.

* shader_jit_a64: Use `MOVI` for DestEnable mask

Accelerate certain cases of masking with MOVI as well

Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>

* shader_jit/tests: Add source-swizzle unit test

This is not expansive. Generating all `4^4` cases seems to make Catch2
crash. So I've added some component-masking(non-reordering) tests based
on the Dest-Mask unit-test and some additional ones to test
broadcasts/splats and component re-ordering.

* shader_jit_a64: Fix swizzle index generation

This was still generating `SHUFPS` indices and not the ones that we wanted for the `TBL` instruction. Passes all unit tests now.

* shader_jit/tests: Add `ShaderSetup` constructor to `ShaderTest`

Rather than using the direct output of `CompileShaderSetup` allow a
`ShaderSetup` object to be passed in directly.  This enabled the ability
emit assembly that is not directly supported by nihstro.

* shader_jit/tests: Add `CALL` unit-test

Tests nested `CALL` instructions to eventually reach an `EX2`
instruction.

EX2 is picked in particular since it is implemented as an even deeper
dispatch and ensures subroutines are properly implemented between `CALL`
instructions and implementation-calls.

* shader_jit_a64: Fix nested `BL` subroutines

`lr` was getting writen over by nested calls to `BL`, causing undefined
behavior with mixtures of `CALL`, `EX2`, and `LG2` instructions.

Each usage of `BL` is now protected with a stach push/pop to preserve
and restore teh `lr` register to allow nested subroutines to work
properly.

* shader_jit/tests: Allocate generated tests on heap

Each of these generated shader-test objects were causing the stack to
overflow.  Allocate each of the generated tests on the heap and use
unique_ptr so they only exist within the life-time of the `REQUIRE`
statement.

* shader_jit_a64: Preserve `lr` register from external function calls

`EMIT` makes an external function call, and should be preserving `lr`

* shader_jit/tests: Add `MAD` unit-test

The Inline Asm version requires an upstream fix:
https://github.com/neobrain/nihstro/issues/68

Instead, the program code is manually configured and added.

* shader_jit/tests: Fix uninitialized instructions

These `union`-type instruction-types were uninitialized, causing tests
to indeterminantly fail at times.

* shader_jit_a64: Remove unneeded `MOV`

Residue from the direct-port of x64 code.

* shader_jit_a64: Use `std::array` for `instr_table`

Add some type-safety and const-correctness around this type as well.

* shader_jit_a64: Avoid c-style offset casting

Add some more const-correctness to this function as well.

* video_core: Add arch preprocessor comments

* common/aarch64: Use X16 as the veneer register

https://developer.arm.com/documentation/102374/0101/Procedure-Call-Standard

* shader_jit/tests: Add uniform reading unit-test

Particularly to ensure that addresses are being properly truncated

* common/aarch64: Use `X0` as `ABI_RETURN`

`X8` is used as the indirect return result value in the case that the
result is bigger than 128-bits. Principally `X0` is the general-case
return register though.

* common/aarch64: Add veneer register note

`LR` is generally overwritten by `BLR` anyways, and would also be a safe
veneer to utilize for far-calls.

* shader_jit_a64: Remove unneeded scratch register from `SanitizedMul`

* shader_jit_a64: Fix CALLU condition

Should be `EQ` not `NE`. Fixes the regression on Kid Icarus.
No known regressions anymore!

---------

Co-authored-by: merryhime <merryhime@users.noreply.github.com>
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
2023-11-05 21:40:31 +01:00
GPUCode
ef43776c7b
shader: Fix address register offset behavior in x64 Jit (#6942)
* shader: Fix address register offset behavior in x64 Jit

* shader: Remove redundant jump

* tests: Add address register tests

* shader: Remove additional pre-multiplications by 16

* tests: Add catch-stringifier for vec4f

* tests: Format
2023-10-18 19:41:36 +03:00
Steveice10
4c59443ed2
common: Add more robust ZSTD handling. (#7071) 2023-10-15 14:08:29 -07:00
SachinVin
053b3618dc
Android: gl_shader_decompiler.cpp: get rid of the not-so-accurate multiplication (#7011) 2023-10-05 10:28:26 -07:00
Steveice10
50f22d1f59
video_core: Abstract shader generators. (#6990)
* video_core: Abstract shader generators.

* shader: Extract common generator structures and move generators to specific namespaces.

* shader: Minor fixes and clean-up.
2023-09-30 02:06:06 -07:00
Vitor K
6cfb8e02a8
clang format (#7017) 2023-09-27 13:42:19 -03:00
GPUCode
542209c993
video_core: Initialize unrefered attributes to 1.f (#6966) 2023-09-16 14:46:44 -07:00
GPUCode
1159e4d928
video_core: Take factors into account with min/max blending functions (#6925)
* sw_framebuffer: Take factors into account for min/max blending

* renderer_gl: Take factors into account for min/max blending

* Address review comments

* gl_shader_gen: Fix frambuffer fetch on qcom and mali

* renderer_opengl: Add fallback path for mesa

* gl_shader_gen: Avoid emitting blend emulation if minmax_factor is present
2023-08-30 21:26:28 +03:00
Steveice10
df57012c50
video_core: Correct shader interpeter address register handling. (#6910) 2023-08-27 01:49:31 +03:00
GPUCode
a94297922b
shader_interpreter: Fix control flow edge cases and implement break/breakc (#6844) 2023-08-21 18:12:46 -07:00
Steveice10
6ddf4b241f
renderer/vulkan: Emulate custom border colors in shaders when unavailable. (#6878) 2023-08-17 22:22:25 +02:00
Wunk
a94af8ea62
shader_jit: Add optimizations up to x86-64-v4 (#6668) 2023-07-11 09:21:37 -07:00
Wunk
71aea7e571
shader/jit: Use xbyak::util::Cpu for Host capabilities (#6643)
Xbyak has a complete utility-class for determining the host-processor's
ISA-features such as SSE4.1, AVX, AVX2, AVX512{F,VL,DQ,VBMI,etc}, and so
on for further potential optimizations.
2023-07-02 08:05:07 +05:30
GPUCode
9b82de6b24
Refactor software renderer (#6621) 2023-06-24 00:59:18 +02:00
Steveice10
03dbdfc12f
video_core: Reduce nihstro includes in headers. (#6626) 2023-06-19 15:49:22 -07:00
SachinVin
796e8a9f24 video_core:fix warnings 2023-06-17 21:23:58 +05:30
Vitor K
34de77d429
Various miscelaneous changes (#6496) 2023-05-03 17:24:10 +02:00
SachinVin
41f13456c0
Chore: Enable warnings as errors on MSVC (#6456)
* tests: add Sanity test for SplitFilename83

fix test

fix test

* disable `C4715:not all control paths return a value` for nihstro includes

nihstro: no warn

* Chore: Enable warnings as errors on msvc + fix warnings

fixes

some more warnings

clang-format

* more fixes

* Externals: Add target_compile_options `/W0` nihstro-headers and ...

Revert "disable `C4715:not all control paths return a value` for nihstro includes"
This reverts commit 606d79b55d3044b744fb835025b8eb0f4ea5b757.

* src\citra\config.cpp: ReadSetting: simplify type casting

* settings.cpp: Get*Name: remove superflous logs
2023-05-01 22:38:58 +03:00
GPUCode
b5d6f645bd
Prepare frontend for multiple graphics APIs (#6347)
* externals: Update dynarmic

* settings: Introduce GraphicsAPI enum

* For now it's OpenGL only but will be expanded upon later

* citra_qt: Introduce backend agnostic context management

* Mostly a direct port from yuzu

* core: Simplify context acquire

* settings: Add option to create debug contexts

* renderer_opengl: Abstract initialization to Driver

* This commit also updates glad and adds some useful extensions which we will use in part 2

* Rasterizer construction is moved to the specific renderer instead of RendererBase.
  Software rendering has been disable to achieve this but will be brought back in the next commit.

* video_core: Remove Init/Shutdown methods from renderer

* The constructor and destructor can do the same job

* In addition move opengl function loading to Qt since SDL already does this. Also remove ErrorVideoCore which is never reached

* citra_qt: Decouple software renderer from opengl part 1

* citra: Decouple software renderer from opengl part 2

* android: Decouple software renderer from opengl part 3

* swrasterizer: Decouple software renderer from opengl part 4

* This commit simply enforces the renderer naming conventions in the software renderer

* video_core: Move RendererBase to VideoCore

* video_core: De-globalize screenshot state

* video_core: Pass system to the renderers

* video_core: Commonize shader uniform data

* video_core: Abstract backend agnostic rasterizer operations

* bootmanager: Remove references to OpenGL for macOS

OpenGL macOS headers definitions clash heavily with each other

* citra_qt: Proper title for api settings

* video_core: Reduce boost usage

* bootmanager: Fix hide mouse option

Remove event handlers from RenderWidget for events that are
already handled by the parent GRenderWindow.
Also enable mouse tracking on the RenderWidget.

* android: Remove software from graphics api list

* code: Address review comments

* citra: Port per-game settings read

* Having to update the default value for all backends is a pain so lets centralize it

* android: Rename to OpenGLES

---------

Co-authored-by: MerryMage <MerryMage@users.noreply.github.com>
Co-authored-by: Vitor Kiguchi <vitor-kiguchi@hotmail.com>
2023-03-27 14:29:17 +03:00
Steveice10
a8848cce43 build: Update to support multi-arch builds. 2023-01-07 01:09:32 -08:00
Tobias
bb05d8c12a
src/CMakeLists: Enforce multiple warnings on MSVC (#5692) 2022-11-09 23:14:28 +01:00