This is a catch-all issue for adding SIMD. I've been doing a lot of work on it, and what's become clear is that adding SIMD requires a lot of profiling, so this may take a while to come to fruition.
Miscellaneous things I've found:
Using explicit assembly (SIMD or not) for anything on a Vec3 or smaller is NOT WORTH IT, because the compiler inlines the plain (non-assembly) versions and a call into assembly can't be inlined. This includes the cross product, but I haven't checked vector Len yet. If you disable all compiler optimizations (-N), the assembly is usually an improvement. I suppose theoretically, if you could convince the compiler to magically inline your SIMD it would work fine, but you can't, so...
(Yes, I know SIMD loads 4 values at a time; you can interleave them and use junk slots for Vec3 and Vec2. I figured it was worth experimenting with.)
However, the improvements gained on a Vec4 are big enough to be worth it. Combined with pointers it's a massive improvement (10 ns/op to just over 1 ns/op for some simple operations like addition).
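For anyone who wants to reproduce this kind of comparison, here is a rough sketch of the two call shapes being measured. It's plain Go, not the assembly itself, and the `Vec4` type, `AddInto`, and the benchmark names are stand-ins I made up for illustration, not the library's actual API. An assembly routine would presumably sit behind the same pointer-based signature as `AddInto`.

```go
package main

import "testing"

// Vec4 is a hypothetical stand-in for a 4-component float32 vector.
type Vec4 [4]float32

// Add takes and returns vectors by value, so operands and result are copied.
func (a Vec4) Add(b Vec4) Vec4 {
	return Vec4{a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]}
}

// AddInto writes a+b into dst through pointers, avoiding the copies.
// An assembly (SIMD) version could be declared with this same signature.
func AddInto(dst, a, b *Vec4) {
	dst[0] = a[0] + b[0]
	dst[1] = a[1] + b[1]
	dst[2] = a[2] + b[2]
	dst[3] = a[3] + b[3]
}

// sink keeps the compiler from discarding the benchmarked work.
var sink Vec4

// BenchmarkAddValue measures the by-value form.
func BenchmarkAddValue(b *testing.B) {
	x, y := Vec4{1, 2, 3, 4}, Vec4{5, 6, 7, 8}
	for i := 0; i < b.N; i++ {
		sink = x.Add(y)
	}
}

// BenchmarkAddInto measures the pointer form.
func BenchmarkAddInto(b *testing.B) {
	x, y := Vec4{1, 2, 3, 4}, Vec4{5, 6, 7, 8}
	for i := 0; i < b.N; i++ {
		AddInto(&sink, &x, &y)
	}
}
```

Dropped into a `_test.go` file, this runs with `go test -bench=.`. The numbers will depend heavily on the machine and on whether the compiler inlines the pure Go versions, so treat it as a harness for profiling rather than a claim about results.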
Matrices are still a work in progress, but I'm fairly confident I can do some magic with 4x4 matrix inversion and possibly determinants. We'll see if it matters for 3x3. 4x4 matrix multiplication can probably be improved simply by adding a SIMD dot product and using that dot product on Row/Col instead of writing out the operation like we're doing now.
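To make the Row/Col idea concrete, here is a minimal sketch of 4x4 multiplication written in terms of a dot product. Everything in it is an assumption for illustration: the `Vec4` and `Mat4` types, the column-major layout, and the `Row`, `Col`, `Dot`, and `Mul` methods are hypothetical, and `Dot` is plain Go standing in for where a SIMD routine would go.

```go
package main

// Vec4 is a hypothetical 4-component vector.
type Vec4 [4]float32

// Mat4 is a hypothetical 4x4 matrix stored column-major (an assumption):
// the element at row r, column c lives at index c*4+r.
type Mat4 [16]float32

// Dot is the one routine a SIMD implementation would replace.
func (a Vec4) Dot(b Vec4) float32 {
	return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]
}

// Row gathers row r, which is strided in column-major storage.
func (m Mat4) Row(r int) Vec4 {
	return Vec4{m[r], m[4+r], m[8+r], m[12+r]}
}

// Col returns column c, which is contiguous in column-major storage.
func (m Mat4) Col(c int) Vec4 {
	return Vec4{m[c*4], m[c*4+1], m[c*4+2], m[c*4+3]}
}

// Mul computes m*n: each output element is one row of m dotted with
// one column of n, instead of sixteen written-out expressions.
func (m Mat4) Mul(n Mat4) Mat4 {
	var out Mat4
	for c := 0; c < 4; c++ {
		col := n.Col(c)
		for r := 0; r < 4; r++ {
			out[c*4+r] = m.Row(r).Dot(col)
		}
	}
	return out
}
```

Whether this beats the written-out version would still need profiling: `Row` has to gather strided elements, and the extra calls only pay off if the dot product itself is fast enough to cover them.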