Wunkolo 1e94acff66 ir: Add VectorBroadcastElement{Lower} IR instruction
The lane-splatting variants of `FMUL` and `FMLA` are very
common in instruction streams that implement things like
matrix multiplication. Where they appear, they appear densely.

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication

This is currently implemented by extracting the selected lane
into a general-purpose register and then broadcasting it into a SIMD
register, through `VectorGetElement` and `VectorBroadcast`.

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcast(esize, v.ir.VectorGetElement(esize, v.V(idxdsize, Vm), index));
```

Instead, the value can stay within the vector register:
a permute/shuffle "splats" the selected lane across all
lanes, removing the GPR round trip.

This is implemented as the new IR instruction `VectorBroadcastElement`:

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcastElement(esize, v.V(idxdsize, Vm), index);
```
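
To illustrate why the two forms are interchangeable, here is a minimal sketch that models the semantics on a toy four-lane vector. It is not the backend code from this change: `Vec128`, `Selector`, `GetElement`, `Broadcast`, `Permute`, `BroadcastElement` and `Equal` are hypothetical names made up for the example, and the 32-bit lane width is an assumption. The point is only that a permute with the same source index in every slot gives the same result as extract-then-broadcast, without the scalar intermediate.

```cpp
#include <cstddef>
#include <cstdint>

// Toy model of a 128-bit vector with four 32-bit lanes (hypothetical type).
struct Vec128 {
    std::uint32_t lane[4];
};

// Per-lane source indices for a permute (hypothetical type).
struct Selector {
    std::size_t idx[4];
};

// Models VectorGetElement: pull one lane out as a scalar.
// In emitted code this is the step that lands in a general-purpose register.
constexpr std::uint32_t GetElement(const Vec128& v, std::size_t index) {
    return v.lane[index];
}

// Models VectorBroadcast: replicate a scalar into every lane.
constexpr Vec128 Broadcast(std::uint32_t value) {
    return Vec128{{value, value, value, value}};
}

// Generic lane permute: result.lane[i] = v.lane[sel.idx[i]].
constexpr Vec128 Permute(const Vec128& v, const Selector& sel) {
    return Vec128{{v.lane[sel.idx[0]], v.lane[sel.idx[1]],
                   v.lane[sel.idx[2]], v.lane[sel.idx[3]]}};
}

// Models VectorBroadcastElement: a permute whose selector names the same
// source lane in every slot, so the data never leaves the vector domain.
constexpr Vec128 BroadcastElement(const Vec128& v, std::size_t index) {
    return Permute(v, Selector{{index, index, index, index}});
}

// Lane-wise equality, used to compare the two approaches.
constexpr bool Equal(const Vec128& a, const Vec128& b) {
    return a.lane[0] == b.lane[0] && a.lane[1] == b.lane[1] &&
           a.lane[2] == b.lane[2] && a.lane[3] == b.lane[3];
}
```

For any `Vec128 v` and any `index` in the range 0 to 3, `Equal(Broadcast(GetElement(v, index)), BroadcastElement(v, index))` holds in this model. On real hardware the permute would typically map to a single shuffle-style instruction (for example, `PSHUFD` on x86 can replicate one 32-bit lane), though the instruction actually chosen depends on the element size and the backend.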
2021-08-07 23:03:57 +01:00