* common/aarch64: Allow generic code generator types
Use the templated `BasicCodeGenerator` type rather than the specialized
`CodeGenerator` type.
Allows `VectorCodeGenerator` to work with these functions.
* common/aarch64: Add `VectorCodeGenerator` to `CallFarFunction`
`VectorCodeGenerator` will always do far-calls since we cannot resolve any absolute addresses here.
* shader_jit_a64: Implement position-independent VectorCodeGenerator
Generates more position-independent assembly to allow for code to be
generated within a resizable vector before copying into executable
memory, allowing for more compact memory allocations and usage rather
than a statically defined worst-case for all-cases.
`VectorCodeGenerator` will need to generate position-independent code
rather than use absolute addresses. Assumes all far function calls in the
case of `VectorCodeGenerator` to use absolute addresses rather than
potentially use a relative `BL` branch after memory relocation.
* shader_jit: Add conditional unit-tests
Tests all permutations of X, Y, AND, OR with each possible input value.
* video_core: Fix shader-interpreter conditional-code initialization
Rather than reserving the incoming state of the conditional codes, the
shader-interpreter was setting them both to false. In pretty much all
cases, the initial state of a shaderunit can be zero-initialized
statically. Just running the interpreter shouldn't necessarily reset the
conditional codes though. The JIT loads incoming conditional codes
while the shader-interpreter resets them to false. This makes the
interpreter match the behavior of the shader-jit.
* shader_jit_a64: Fix/optimize conditional evaluation
Fix some of the regressions introduced by the previous optimization.
EOR does not support a constant of `0` in its immediate. In these cases
the COND{0,1} registers can be utilized immediately.
* shader_jit_x64: Fix conditional evaluation extended-bit hazard
The unit test seems to have identified a bug in the x64 jit too. The x64
jit was doing 32-bit comparisons despite the condition flags being 8-bit
values and is sensitive to garbage being in the upper 24 bits of the
register. This is fixed by using the proper 8-bit register types rather
than the 32-bit ones(`eax,`ebx` -> `al`, `bl`).
* shader_jit_x64: Zero-extend conditional-code bytes
`mov` was doing a partial update of bits within the register, allowing
garbage to be introduced in the upper bits of the register.
These conditional tests are a 1:1 translation from the x64 code but do
not have to be. Reference-values are known at emit-time and can be
embedded as an immediate into an `EOR` instruction rather than moved
into a register. The `TST` instruction can be utilized to more optimally
test and update the `EQ`/`NE` status flags.
* renderer_vulkan: Remove vulkan prefix in SetObjectName
* renderer_vulkan: Rename renderpass cache to render manager
* It is no longer just a cache
* renderer_vulkan: Rewrite descriptor management
* Switch to batched vkUpdateDescriptorSets from cached descriptor sets with templates
* vk_master_semaphore: Remove waitable atomic
* These are buggy on some platforms and regular condition_variables are faster most of the time
* vk_texture_runtime.cpp: remove outdated references
* vk_render_manager: Minor cleanups and rename to RenderManager
* It is no longer just a renderpass cache
* Revert variable name change from render_manager back to renderpass_cache
---------
Co-authored-by: GPUCode <geoster3d@gmail.com>
* do not move constant variables
* applet_manager: avoid possible use after move
* use constant references where pointed out by msvc
* extra_hid: initialize response
* ValidateSaveState: passing slot separately is not necessary
* common: mark HashCombine as nodiscard
* cityhash: remove use of using namespace std
* Prefix all size_t with std::
done automatically by executing regex replace `([^:0-9a-zA-Z_])size_t([^0-9a-zA-Z_])` -> `$1std::size_t$2`
based on 7d8f115
* shared_memory.cpp: fix log error format
* fix compiling with pch off
* externals: Add oaksim submodule
Used for emitting ARM64 assembly
* common: Implement aarch64 ABI
Utilize oaknut to implement a stack frame.
* tests: Allow shader-jit tests for x64 and a64
Run the shader-jit tests for both x86_64 and arm64 targets
* video_core: Initialize arm64 shader-jit backend
Passes all current unit tests!
* shader_jit_a64: protect/unprotect memory when jit-ing
Required on MacOS. Memory needs to be fully unprotected and then
re-protected when writing or there will be memory access errors on
MacOS.
* shader_jit_a64: Fix ARM64-Imm overflow
These conditionals were throwing exceptions since the immediate values
were overflowing the available space in the `EOR` instructions. Instead
they are generated from `MOV` and then `EOR`-ed after.
* shader_jit_a64: Fix Geometry shader conditional
* shader_jit_a64: Replace `ADRL` with `MOVP2R`
Fixes some immediate-generation exceptions.
* common/aarch64: Fix CallFarFunction
* shader_jit_a64: Optimize `SantitizedMul`
Co-authored-by: merryhime <merryhime@users.noreply.github.com>
* shader_jit_a64: Fix address register offset behavior
Based on https://github.com/citra-emu/citra/pull/6942
Passes unit tests.
* shader_jit_a64: Fix `RET` address offset
A64 stack is 16-byte aligned rather than 8. So a direct port of the x64
code won't work. Fixes weird branches into invalid memory for any
shaders with subroutines.
* shader_jit_a64: Increase max program size
Tuned for A64 program size.
* shader_jit_a64: Use `UBFX` for extracting loop-state
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Optimize `SUB+CMP` to `SUBS`
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Optimize `CMP+B` to `CBNZ`
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Use `FMOV` for `ONE` vector
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Remove x86-specific documentation
* shader_jit_a64: Use `UBFX` to extract exponent
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Remove redundant MIN/MAX `SRC2`-NaN check
Special handling only needs to check SRC1 for NaN, not SRC2.
It would work as follows in the four possible cases:
No NaN: No special handling needed.
Only SRC1 is NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Only SRC2 is NaN: FMAX automatically picks SRC2 because it always picks the NaN if there is one.
Both SRC1 and SRC2 are NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit/tests:: Add catch-stringifier for vec2f/vec3f
* shader_jit/tests: Add Dest Mask unit test
* shader_jit_a64: Fix Dest-Mask `BSL` operand order
Passes the dest-mask unit tests now.
* shader_jit_a64: Use `MOVI` for DestEnable mask
Accelerate certain cases of masking with MOVI as well
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit/tests: Add source-swizzle unit test
This is not expansive. Generating all `4^4` cases seems to make Catch2
crash. So I've added some component-masking(non-reordering) tests based
on the Dest-Mask unit-test and some additional ones to test
broadcasts/splats and component re-ordering.
* shader_jit_a64: Fix swizzle index generation
This was still generating `SHUFPS` indices and not the ones that we wanted for the `TBL` instruction. Passes all unit tests now.
* shader_jit/tests: Add `ShaderSetup` constructor to `ShaderTest`
Rather than using the direct output of `CompileShaderSetup` allow a
`ShaderSetup` object to be passed in directly. This enabled the ability
emit assembly that is not directly supported by nihstro.
* shader_jit/tests: Add `CALL` unit-test
Tests nested `CALL` instructions to eventually reach an `EX2`
instruction.
EX2 is picked in particular since it is implemented as an even deeper
dispatch and ensures subroutines are properly implemented between `CALL`
instructions and implementation-calls.
* shader_jit_a64: Fix nested `BL` subroutines
`lr` was getting writen over by nested calls to `BL`, causing undefined
behavior with mixtures of `CALL`, `EX2`, and `LG2` instructions.
Each usage of `BL` is now protected with a stach push/pop to preserve
and restore teh `lr` register to allow nested subroutines to work
properly.
* shader_jit/tests: Allocate generated tests on heap
Each of these generated shader-test objects were causing the stack to
overflow. Allocate each of the generated tests on the heap and use
unique_ptr so they only exist within the life-time of the `REQUIRE`
statement.
* shader_jit_a64: Preserve `lr` register from external function calls
`EMIT` makes an external function call, and should be preserving `lr`
* shader_jit/tests: Add `MAD` unit-test
The Inline Asm version requires an upstream fix:
https://github.com/neobrain/nihstro/issues/68
Instead, the program code is manually configured and added.
* shader_jit/tests: Fix uninitialized instructions
These `union`-type instruction-types were uninitialized, causing tests
to indeterminantly fail at times.
* shader_jit_a64: Remove unneeded `MOV`
Residue from the direct-port of x64 code.
* shader_jit_a64: Use `std::array` for `instr_table`
Add some type-safety and const-correctness around this type as well.
* shader_jit_a64: Avoid c-style offset casting
Add some more const-correctness to this function as well.
* video_core: Add arch preprocessor comments
* common/aarch64: Use X16 as the veneer register
https://developer.arm.com/documentation/102374/0101/Procedure-Call-Standard
* shader_jit/tests: Add uniform reading unit-test
Particularly to ensure that addresses are being properly truncated
* common/aarch64: Use `X0` as `ABI_RETURN`
`X8` is used as the indirect return result value in the case that the
result is bigger than 128-bits. Principally `X0` is the general-case
return register though.
* common/aarch64: Add veneer register note
`LR` is generally overwritten by `BLR` anyways, and would also be a safe
veneer to utilize for far-calls.
* shader_jit_a64: Remove unneeded scratch register from `SanitizedMul`
* shader_jit_a64: Fix CALLU condition
Should be `EQ` not `NE`. Fixes the regression on Kid Icarus.
No known regressions anymore!
---------
Co-authored-by: merryhime <merryhime@users.noreply.github.com>
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* video_core: Abstract shader generators.
* shader: Extract common generator structures and move generators to specific namespaces.
* shader: Minor fixes and clean-up.
* sw_framebuffer: Take factors into account for min/max blending
* renderer_gl: Take factors into account for min/max blending
* Address review comments
* gl_shader_gen: Fix frambuffer fetch on qcom and mali
* renderer_opengl: Add fallback path for mesa
* gl_shader_gen: Avoid emitting blend emulation if minmax_factor is present
Xbyak has a complete utility-class for determining the host-processor's
ISA-features such as SSE4.1, AVX, AVX2, AVX512{F,VL,DQ,VBMI,etc}, and so
on for further potential optimizations.
* tests: add Sanity test for SplitFilename83
fix test
fix test
* disable `C4715:not all control paths return a value` for nihstro includes
nihstro: no warn
* Chore: Enable warnings as errors on msvc + fix warnings
fixes
some more warnings
clang-format
* more fixes
* Externals: Add target_compile_options `/W0` nihstro-headers and ...
Revert "disable `C4715:not all control paths return a value` for nihstro includes"
This reverts commit 606d79b55d3044b744fb835025b8eb0f4ea5b757.
* src\citra\config.cpp: ReadSetting: simplify type casting
* settings.cpp: Get*Name: remove superflous logs
* externals: Update dynarmic
* settings: Introduce GraphicsAPI enum
* For now it's OpenGL only but will be expanded upon later
* citra_qt: Introduce backend agnostic context management
* Mostly a direct port from yuzu
* core: Simplify context acquire
* settings: Add option to create debug contexts
* renderer_opengl: Abstract initialization to Driver
* This commit also updates glad and adds some useful extensions which we will use in part 2
* Rasterizer construction is moved to the specific renderer instead of RendererBase.
Software rendering has been disable to achieve this but will be brought back in the next commit.
* video_core: Remove Init/Shutdown methods from renderer
* The constructor and destructor can do the same job
* In addition move opengl function loading to Qt since SDL already does this. Also remove ErrorVideoCore which is never reached
* citra_qt: Decouple software renderer from opengl part 1
* citra: Decouple software renderer from opengl part 2
* android: Decouple software renderer from opengl part 3
* swrasterizer: Decouple software renderer from opengl part 4
* This commit simply enforces the renderer naming conventions in the software renderer
* video_core: Move RendererBase to VideoCore
* video_core: De-globalize screenshot state
* video_core: Pass system to the renderers
* video_core: Commonize shader uniform data
* video_core: Abstract backend agnostic rasterizer operations
* bootmanager: Remove references to OpenGL for macOS
OpenGL macOS headers definitions clash heavily with each other
* citra_qt: Proper title for api settings
* video_core: Reduce boost usage
* bootmanager: Fix hide mouse option
Remove event handlers from RenderWidget for events that are
already handled by the parent GRenderWindow.
Also enable mouse tracking on the RenderWidget.
* android: Remove software from graphics api list
* code: Address review comments
* citra: Port per-game settings read
* Having to update the default value for all backends is a pain so lets centralize it
* android: Rename to OpenGLES
---------
Co-authored-by: MerryMage <MerryMage@users.noreply.github.com>
Co-authored-by: Vitor Kiguchi <vitor-kiguchi@hotmail.com>