76 Commits

Author SHA1 Message Date
merry
bb713194a0 backend/x64: Implement SHA256 polyfills 2022-03-20 13:59:18 +00:00
merry
98cff8dd0d IR: Implement SHA256MessageSchedule{0,1} 2022-03-20 13:59:18 +00:00
merry
f0a4bf1f6a IR: Implement SHA256Hash 2022-03-20 13:59:18 +00:00
merry
a4daad6336 block_of_code: Add HostFeature SHA 2022-03-20 00:13:03 +00:00
Merry
bcfe377aaa x64/reg_alloc: More zero extension paranoia 2022-03-06 12:24:50 +00:00
Merry
316b95bb3f {a32,a64}_emit_x64_memory: Zero extension paranoia 2022-03-06 12:10:40 +00:00
Merry
0fd32c5fa4 a64_emit_x64_memory: Fix bug in 128 bit exclusive write fallback 2022-02-28 19:53:43 +00:00
merry
5ea2b49ef0
backend/x64: Inline exclusive memory access operations (#664)
* a64_emit_x64_memory: Add Unsafe_IgnoreGlobalMonitor optimization

* a32_emit_x64_memory: Add Unsafe_IgnoreGlobalMonitor optimization

* a32_emit_x64_memory: Remove dead code

* {a32,a64}_emit_x64_memory: Also verify vaddr in Exclusive{Read,Write}MemoryInlineUnsafe

* a64_emit_x64_memory: Full fallback for ExclusiveWriteMemoryInlineUnsafe

* a64_emit_x64_memory: Inline full locking

* a64_emit_x64_memory: Allow inlined locking to be optionally removed

* spin_lock: Use xbyak instead of inline asm

* a64_emit_x64_memory: Recompile on exclusive fastmem failure

* Avoid variable shadowing

* a32_emit_x64_memory: Implement recompilation

* Fix recompilation

* spin_lock: Clang format fix

* fix fallback function calls
2022-02-28 08:13:10 +00:00
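
The spin_lock change in the PR above replaces inline asm with xbyak-emitted code. As a rough conceptual sketch (plain C++ with std::atomic, not the actual emitted x64):

```cpp
#include <atomic>

// Conceptual sketch only: the emitted code spins on an atomic exchange until
// the lock word transitions 0 -> 1, and releases by storing 0 again.
struct SpinLock {
    std::atomic<int> storage{0};

    void Lock() {
        while (storage.exchange(1, std::memory_order_acquire) != 0) {
            // busy-wait; the real emitted loop would typically include `pause`
        }
    }

    void Unlock() {
        storage.store(0, std::memory_order_release);
    }
};
```
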
merry
0a11e79b55 backend/x64: Ensure all HostCalls are appropriately zero-extended 2022-02-27 20:04:44 +00:00
merry
6c4fa780e0 {a32,a64}_emit_x64_memory: Ensure return values of fastmem callbacks are zero-extended 2022-02-27 19:58:23 +00:00
merry
593de127d2 a64_emit_x64: Clear fastmem patch information on ClearCache 2022-02-27 19:50:05 +00:00
Merry
c90173151e backend/x64: Split off memory emitters 2022-02-26 21:25:09 +00:00
Merry
19a423034e block_of_code: Fix inaccurate size reporting in SpaceRemaining
Typo: getCode should be getCurr. Instead of comparing against the current pointer,
we were incorrectly comparing against the start of memory.
2022-02-26 16:09:11 +00:00
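
To illustrate the fix above, a minimal sketch (the function signature is hypothetical; getCode()/getCurr() are xbyak accessors):

```cpp
#include <cstddef>
#include <xbyak/xbyak.h>

// Remaining space must be measured from the current emission pointer
// (getCurr), not from the start of the buffer (getCode), which was the typo.
std::size_t SpaceRemaining(const Xbyak::CodeGenerator& code, std::size_t total_size) {
    const std::size_t used = static_cast<std::size_t>(code.getCurr() - code.getCode());
    return total_size > used ? total_size - used : 0;
}
```
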
Merry
ea08a389b4 emit_x64_floating_point: EmitFPToFixed: No need to round if rounding_mode == TowardsZero
cvttsd2si truncates during operation
2022-02-23 20:44:02 +00:00
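
A tiny sketch of why the pre-round is redundant in the TowardsZero case (illustrative, using an SSE2 intrinsic rather than dynarmic's emitter):

```cpp
#include <immintrin.h>
#include <cstdint>

// cvttsd2si truncates (rounds toward zero) as part of the conversion itself,
// so no separate rounding step is needed when rounding_mode == TowardsZero.
std::int64_t ToFixed64TowardsZero(double value) {
    return _mm_cvttsd_si64(_mm_set_sd(value));
}
```
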
merry
b34214f953 emit_x64_floating_point: Improve EmitFPToFixed codegen 2022-02-23 19:42:15 +00:00
merry
5fe274f510 emit_x64_floating_point: Deinterlace 64-bit FPToFixed signed/unsigned codepaths 2022-02-23 19:14:41 +00:00
merry
b8dd1c7510 emit_x64_floating_point: Correct dead-code warning in MSVC 2019 2022-02-12 22:07:26 +00:00
merry
95a1ebfb97 backend/x64: Bugfix: A32 frontend also uses FPSCR.QC 2022-02-12 21:46:45 +00:00
Fernando Sahmkow
a8cbfd9af4 X86_Backend: set fences correctly for memory barriers and synchronization. 2022-02-01 14:27:54 +00:00
Wunkolo
ad5465d6ce constant_pool: Use tsl::robin_map rather than unordered_map
Finding a much more drastic improvement with `robin_map`.

`map`:
```
[master] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     567.0 ms ±   6.9 ms    [User: 513.1 ms, System: 53.2 ms]
  Range (min … max):   554.4 ms … 588.1 ms    100 runs
```

`unordered_map`:
```
[opt_const_pool] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     561.1 ms ±   4.5 ms    [User: 508.1 ms, System: 52.3 ms]
  Range (min … max):   552.6 ms … 574.2 ms    100 runs
```

`tsl::robin_map`:
```
[opt_const_pool] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     553.5 ms ±   5.6 ms    [User: 500.7 ms, System: 52.1 ms]
  Range (min … max):   545.7 ms … 569.3 ms    100 runs
```
2022-01-01 12:13:13 +00:00
Wunkolo
e57bb0569a constant_pool: Convert hashtype from tuple to pair 2022-01-01 12:13:13 +00:00
Wunkolo
befc22a61e constant_pool: Use unordered_map rather than map
`map` is an ordered structure with O(log n) time searches.
`unordered_map` uses O(1) average-time searches and O(n) in the worst
case, where a bucket holds colliding hashes and lookup has to chain through them.
The unordered version should speed up our general case when looking up
constants.

I've added a trivial order-dependent hash ((0, 1) and (1, 0) will return
different hashes) to combine a 128-bit constant into a 64-bit hash that
generally will not collide, using a bit-rotate to preserve entropy.
2022-01-01 12:13:13 +00:00
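
A minimal sketch of the idea described above (illustrative names, not the actual dynarmic code): combine the two 64-bit halves of a constant order-dependently with a rotate.

```cpp
#include <bit>
#include <cstddef>
#include <cstdint>
#include <utility>

// Rotating one half before the XOR keeps the combination order-dependent,
// so (0, 1) and (1, 0) produce different hashes while preserving entropy.
struct ConstantHash {
    std::size_t operator()(const std::pair<std::uint64_t, std::uint64_t>& constant) const {
        return static_cast<std::size_t>(constant.first ^ std::rotl(constant.second, 1));
    }
};
```
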
Morph
28714ee75a general: Rename files with duplicate names
In MSVC, having files with identical filenames will result in massive slowdowns when compiling.
The approach I have taken to resolve this is renaming the identically named files in frontend/(A32, A64) to (a32, a64)_filename.cpp/h.
2021-12-23 11:38:58 +00:00
Fernando S
e4146ec3a1
x64 Interface: Allow for asynchronous invalidation (#647)
* x64 Interface: Make Invalidation asynchronous.

* Apply suggestions from code review
2021-10-05 15:06:41 +01:00
Wunkolo
5e7d2afe0f IR: Introduce VectorReduceAdd{8,16,32,64} opcode
Adds all elements of a vector and puts the result into the lowest element.
Accelerates the `addv` instruction with a vectorized implementation
rather than a serial one.
2021-09-27 19:54:11 +01:00
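
A scalar sketch of what VectorReduceAdd32 computes on a 128-bit vector (semantics as described above; the upper-lane handling shown here is illustrative):

```cpp
#include <array>
#include <cstdint>

// Sum every element and place the result in the lowest element.
std::array<std::uint32_t, 4> VectorReduceAdd32(const std::array<std::uint32_t, 4>& v) {
    std::uint32_t sum = 0;
    for (const std::uint32_t e : v) {
        sum += e;
    }
    return {sum, 0, 0, 0};  // remaining lanes shown zeroed for illustration
}
```
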
Marshall Mohror
0b8fd755d8 Fix signal_stack_size for glibc 2.34
`SIGSTKSZ` is now defined as `sysconf(_SC_SIGSTKSZ)`, which is not constexpr and returns a long, throwing off the `std::max` template deduction.
2021-09-22 20:38:11 +01:00
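
A hedged sketch of the kind of fix this implies (not necessarily the exact patch): pick the result type explicitly so the non-constant, long-typed SIGSTKSZ no longer breaks std::max's deduction.

```cpp
#include <algorithm>
#include <csignal>
#include <cstddef>
#include <unistd.h>

// With glibc 2.34, SIGSTKSZ may expand to sysconf(_SC_SIGSTKSZ) (a long),
// so force a common type instead of relying on template argument deduction.
std::size_t SignalStackSize() {
    return std::max<std::size_t>(static_cast<std::size_t>(SIGSTKSZ), 2 * 1024 * 1024);
}
```
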
Ben
6ce8bfaf32
Add API function to retrieve disassembly as a vector of strings (#644)
Co-authored-by: ben <Avuxo@users.noreply.github.com>
2021-09-16 16:45:20 -04:00
Merry
615ce8c7c5 IR: Remove A32 IR instructions Get{N,Z,V}Flag 2021-08-12 13:06:15 +01:00
Wunkolo
1e94acff66 ir: Add VectorBroadcastElement{Lower} IR instruction
The lane-splatting variants of `FMUL` and `FMLA` are very
common in instruction streams when implementing things like
matrix multiplication, and where they appear, they appear very densely.

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication

The way this is currently implemented is by grabbing the particular lane
into a general-purpose register and then broadcasting it into a SIMD
register through `VectorGetElement` and `VectorBroadcast`:

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcast(esize, v.ir.VectorGetElement(esize, v.V(idxdsize, Vm), index));
```

What could be done instead is to keep it within
the vector register and use a permute/shuffle to "splat" the particular
lane across all other lanes, removing the GPR round-trip.

This is implemented as the new IR instruction `VectorBroadcastElement`:

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcastElement(esize, v.V(idxdsize, Vm), index);
```
2021-08-07 23:03:57 +01:00
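
For the 32-bit element case, the in-register splat can be a single shuffle; a rough sketch with SSE2 intrinsics (illustrative, not the emitter's actual code):

```cpp
#include <immintrin.h>

// Splat lane `index` across all four 32-bit lanes without a GPR round-trip.
// The shuffle immediate must be a compile-time constant, hence the switch.
__m128i BroadcastElement32(__m128i v, int index) {
    switch (index & 3) {
    case 0:  return _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 0, 0, 0));
    case 1:  return _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 1, 1, 1));
    case 2:  return _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 2, 2, 2));
    default: return _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 3, 3, 3));
    }
}
```
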
Merry
d41bc492fe {a32,a64}_jitstate: Remove unnecessary headers 2021-08-07 19:35:33 +01:00
Merry
07b5734fb0 xbyak: Correct xbyak include directory
xbyak is intended to be installed in /usr/local/include/xbyak.
Since we desire not to install xbyak before using it, we copy the headers
to the appropriate directory structure and use that instead
2021-08-07 15:13:49 +01:00
Merry
59fb568b27 tests: Use Zydis for disassembly 2021-08-06 15:29:43 +01:00
Wunkolo
f33bd69ec2 emit_x64_vector_floating_point: AVX512 implementation of EmitFPVectorToFixed
AVX512 introduces the _unsigned_ variant of float-to-integer conversion
functions via `vcvttp{sd}2u{dq}q`. In the case that a value is not
representable as an unsigned integer, it will result in `0xFFFFF...`,
which can be utilized to get "free" saturation when the floating-point
value exceeds the unsigned range, after masking away negative values.

https://www.felixcloutier.com/x86/vcvttps2udq
https://www.felixcloutier.com/x86/vcvttpd2uqq

This PR also speeds up the _signed_ conversion function for fp64->int64
https://www.felixcloutier.com/x86/vcvttpd2qq
2021-07-17 22:13:11 +01:00
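
A hedged sketch of the idea with AVX512VL intrinsics (illustrative, not dynarmic's emitter): negative inputs are cleared first, and the all-ones result of the truncating unsigned conversion then provides the positive-overflow saturation for free.

```cpp
#include <immintrin.h>

// Requires AVX512F + AVX512VL.
__m128i FPVectorToFixedU32(__m128 x) {
    const __m128 non_negative = _mm_max_ps(x, _mm_setzero_ps());  // clamp negatives (and NaN) to +0.0
    return _mm_cvttps_epu32(non_negative);  // unrepresentable values become 0xFFFFFFFF
}
```
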
SachinVin
048da372e9 block_of_code.cpp: remove redundant align() 2021-07-17 22:12:31 +01:00
Wunkolo
5971361160 IR: Add AndNot{32,64} IR instruction
Also includes BMI1-acceleration for x64, when available
2021-07-02 22:27:29 +01:00
Wunkolo
49d00634f9 IR: Add VectorAndNot IR instruction
And(a, Not(b)) is a common enough operation that this can
be fused into a single `AndNot` operation. On x64 this is also
a single `pandn` instruction rather than two.
2021-07-02 22:27:29 +01:00
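
A sketch of the fused operation for the scalar and vector forms described in the two entries above (illustrative; note that pandn computes ~first & second, so the operands are swapped to express a & ~b):

```cpp
#include <emmintrin.h>
#include <cstdint>

// Scalar form: maps to a single BMI1 `andn` instruction when available.
std::uint32_t AndNot32(std::uint32_t a, std::uint32_t b) {
    return a & ~b;
}

// Vector form: one pandn rather than separate NOT and AND instructions.
__m128i VectorAndNot(__m128i a, __m128i b) {
    return _mm_andnot_si128(b, a);  // ~b & a == a & ~b
}
```
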
Wunkolo
1fc96fd0c2 emit_x64{_vector}_floating_point: Unsafe AVX512 implementation of Emit{RSqrt,Recip}Estimate
This implementation exists within the unsafe optimization paths and
utilizes the 14-bit-precision `vrsqrt14*` and `vrcp14p*`
instructions provided by AVX512F+VL. These are _more_ accurate than
the fallback path and the current `rsqrt`-based unsafe code-path
but still fall in line with what is expected of the
`Unsafe_ReducedErrorFP` optimization flag.

Having AVX512 available will mean these functions have 14 bits of precision.
Not having AVX512 available will mean these functions have 11 bits of precision.
2021-06-27 11:18:58 +01:00
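
A sketch of the corresponding intrinsics (AVX512F+VL), illustrative of the unsafe estimate paths described above:

```cpp
#include <immintrin.h>

// ~14 bits of precision per element, vs the ~11 bits of the older estimates.
__m128 RecipEstimate(__m128 x) {
    return _mm_rcp14_ps(x);
}

__m128 RSqrtEstimate(__m128 x) {
    return _mm_rsqrt14_ps(x);
}
```
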
Wunkolo
c6125082ea emit_x64_floating_point: AVX512 implementation of EmitFPMinMaxNumeric 2021-06-20 10:12:27 +01:00
Wunkolo
776208742b emit_x64_{vector_}floating_point: Centralize implementation of FP{Vector}{Abs,Neg}
Removes dependency on the constants at the top of some files
such as `f16_negative_zero` and `f32_non_sign_mask` in favor
of the `FPInfo` trait-type.

Also removes bypass delays by selecting between instructions
such as `pand`, `andps`, or `andpd` depending on the type
and keeps them in their respective uop domain.

See https://www.agner.org/optimize/instruction_tables.pdf for
more info on bypass delays.
2021-06-10 00:04:57 +01:00
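
A sketch of the domain-aware selection described above (illustrative, not dynarmic's emitter API; in the real code the masks come from the `FPInfo` trait rather than hardcoded constants): the same sign-mask AND is emitted with the instruction that matches the element type.

```cpp
#include <immintrin.h>

// andps keeps single-precision data in the float uop domain...
__m128 AbsF32x4(__m128 x) {
    const __m128 non_sign_mask = _mm_castsi128_ps(_mm_set1_epi32(0x7FFFFFFF));
    return _mm_and_ps(x, non_sign_mask);
}

// ...and andpd keeps double-precision data in the double domain,
// avoiding the bypass delay of mixing integer and FP domains.
__m128d AbsF64x2(__m128d x) {
    const __m128d non_sign_mask = _mm_castsi128_pd(_mm_set1_epi64x(0x7FFFFFFFFFFFFFFFLL));
    return _mm_and_pd(x, non_sign_mask);
}
```
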
SachinVin
ccf27f9c8c ir_emitter: Remove 32-bit-only AddWithCarry 2021-06-09 01:54:03 +01:00
Wunkolo
5385edcc66 emit_x64_vector_floating_point: AVX512 implementation of EmitFPVector{Min,Max}{32,64} 2021-06-08 17:50:28 +01:00
Wunkolo
0c67b913fe backend/x64: Add vcmp constants 2021-06-08 17:50:28 +01:00
Wunkolo
8fde505943 backend/x64: Add vfpclass constants
Bit-wise constants for use with the `vfpclass` instruction.
2021-06-08 17:50:28 +01:00
Wunkolo
c82e29ed82 backend/x64: Add vrange constants
Adds compile-time `FpRangeLUT` for generating the 8-bit
immediate LUT value for the `vrange*` instruction
2021-06-08 17:50:28 +01:00
MerryMage
c1d5a7977e Add Unsafe_IgnoreStandardFPCRValue optimization 2021-06-08 17:26:45 +01:00
Wunkolo
c157dfcc4c emit_x64_vector: Reduce gf2p8affineqb requirement to GFNI
Currently, every usage of `gf2p8affineqb` is guarded by the
`AVX512F + AVX512VL + GFNI` requirement, when really
we only need `GFNI` on its own.

This will allow `GFNI`-only chips to emit GFNI features without
needing to have AVX512 as well.
There _are_ chips in existence currently that strictly ship with GFNI and
have no implementation of AVX1/AVX2/AVX512 (and thus no VEX/EVEX
encoding), such as Tremont (Lakefield) chips.
2021-06-08 14:00:00 +01:00
Wunkolo
e47d0d11c3 emit_x64_vector: AVX512 implementation of EmitVectorNot
Single in-place ternary logic instruction.
2021-06-08 03:11:38 +01:00
Markus Wick
0c12614d1a A64/config.h: Split fastmem and page_table options.
We might want to allocate different sizes for each of them,
e.g. for the unsafe fastmem approach without bounds checking,
or for using the full 48-bit address range (with mirrors) by allocating our real arena as close to 1<<47 as possible.
2021-06-06 17:25:51 +01:00
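
A hedged sketch of what the split implies for the configuration (field names and defaults assumed for illustration, not necessarily dynarmic's exact API): each mechanism gets its own address-space size.

```cpp
#include <cstdint>

// Illustrative only: independent sizes let fastmem cover e.g. the full 48-bit
// range while the page table keeps a smaller, bounds-checked space.
struct A64UserConfig {
    std::uint32_t page_table_address_space_bits = 36;
    std::uint32_t fastmem_address_space_bits = 36;
};
```
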
MerryMage
828959caed IR: Implement FPVector{To,From}Half32
Implement ASIMD VCVT (half) in terms of this instruction.
Correct handling of ASIMDStandardValue.
2021-06-05 03:39:48 +01:00
Wunkolo
9a23c09c3b emit_x64_floating_point: AVX implementation of ZeroIfNaN 2021-05-31 13:41:05 +01:00