23 Commits

Author SHA1 Message Date
Merry
08b123feb5 IR: Modify VectorSignedSaturatedShiftLeftUnsigned to only accept immediate shift amounts 2022-10-18 15:04:30 +01:00
Merry
9313f5ea88 IR: Remove VectorShuffleHighHalfwords and VectorShuffleLowHalfwords 2022-10-18 15:04:30 +01:00
Merry
a97105c296 IR: Split VectorSignedSaturatedDoublingMultiply into VectorSignedSaturatedDoublingMultiply{High,HighRounding} 2022-10-18 15:04:30 +01:00
Merry
61d509dda2 IR: Add VectorMultiply{Signed,Unsigned}Widen instructions
Polyfill for x86-64 backend
2022-10-18 15:04:30 +01:00
Merry
babfb7d7b8 IR/saturation: Revamp saturated add/sub IR instructions 2022-10-18 15:04:30 +01:00
Merry
cd537dc711 IR: Rename PackedAbsDiffSumS8 to PackedAbsDiffSumU8 2022-10-18 15:04:30 +01:00
Merry
8b41755db0 ir_emitter: Remove unused ResultAndCarryAndOverflow structure 2022-10-18 15:04:30 +01:00
Merry
a2b3199adf Convert NZCV to C flag where able 2022-07-23 11:46:07 +01:00
Merry
72c87d11e4 a32_get_set_elimination_pass: Correct insertion point 2022-07-20 16:53:48 +01:00
Merry
78b4ba10c9 Migrate to mcl 2022-04-19 18:05:04 +01:00
merry
879f211686 ir/value: Add AccType to Value 2022-03-26 15:38:10 +00:00
merry
98cff8dd0d IR: Implement SHA256MessageSchedule{0,1} 2022-03-20 13:59:18 +00:00
merry
f0a4bf1f6a IR: Implement SHA256Hash 2022-03-20 13:59:18 +00:00
Wunkolo
5e7d2afe0f IR: Introduce VectorReduceAdd{8,16,32,64} opcode
Adds all elements of vector and puts the result into the lowest element.
Accelerates the `addv` instruction into a vectorized implementation
rather than a serial one.
2021-09-27 19:54:11 +01:00
Wunkolo
1e94acff66 ir: Add VectorBroadcastElement{Lower} IR instruction
The lane-splatting variant of `FMUL` and `FMLA` is very
common in instruction streams when implementing things like
matrix multiplication. When used, they are used very densely.

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication

The way this is currently implemented is by grabbing the particular lane
into a general purpose register and then broadcasting it into a simd
register through `VectorGetElement` and `VectorBroadcast`.

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcast(esize, v.ir.VectorGetElement(esize, v.V(idxdsize, Vm), index));
```

What could be done instead is to keep it within
the vector-register and use a permute/shuffle to "splat" the particular
lane across all other lanes, removing the GPR-round-trip.

This is implemented as the new IR instruction `VectorBroadcastElement`:

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcastElement(esize, v.V(idxdsize, Vm), index);
```
2021-08-07 23:03:57 +01:00
Wunkolo
5971361160 IR: Add AndNot{32,64} IR instruction
Also includes BMI1-acceleration for x64, when available
2021-07-02 22:27:29 +01:00
Wunkolo
49d00634f9 IR: Add VectorAndNot IR instruction
And(a, Not(b)) is a common enough operation that this can
be fused into a single `AndNot` operation. On x64 this is also
a single `pandn` instruction rather than two.
2021-07-02 22:27:29 +01:00
SachinVin
a626a2ec63 ir_emitter: Remove 32-bit-only SubWithCarry 2021-06-11 17:27:34 +01:00
SachinVin
ccf27f9c8c ir_emitter: Remove 32-bit-only AddWithCarry 2021-06-09 01:54:03 +01:00
MerryMage
828959caed IR: Implement FPVector{To,From}Half32
Implement ASIMD VCVT (half) in terms of this instruction.
Correct handling of ASIMDStandardValue.
2021-06-05 03:39:48 +01:00
MerryMage
17ae7f9ce1 IR: Implement IR instruction CallHostFunction 2021-05-23 15:44:57 +01:00
MerryMage
53493b2024 Add .clang-format file
Using clang-format version 12.0.0
2021-05-22 15:07:02 +01:00
Merry
714216fd0e Consolidate all source files into src/ directory 2021-05-19 17:41:59 +01:00