- The current limits are being hit in yuzu with some games (e.g. newer updates of BotW and SSBU).
- Increasing this fixes slow-downs in these games due to code being recompiled.
BMI2 instructions such as `pdep` and `pext` have been
known to be incredibly slow on AMD. But on Zen3
and newer, the performance of these instructions
are now much greater, but previous versions of AMD
architectures should still avoid BMI2.
On Zen 2, pdep/pext were 300 cycles. Now on Zen 3 it is 3 cycles.
This is a big enough improvement to allow BMI2 code to
be dispatched if available. The Zen 3 architecture is checked for
by detecting the family of the processor.
* Use pext/pdep where not previously used
* Limit pext/pdep to non-AMD platforms due to slowness on AMD
* Use imul/and as alternatives for AMD and non-BMI2 platforms
Removes unnecessary header dependencies that have accumulated over time
as changes have been made. Lessens the amount of files that need to be
rebuilt when the headers change.