add.h |
|
55418 |
and.h |
|
41391 |
cmplt.h |
|
26247 |
cnt.h |
SIMDE_ARM_SVE_CNT_H |
2576 |
dup.h |
|
43805 |
ld1.h |
Note: we don't have vector implementations for most of these because
we can't just load everything and mask out the uninteresting bits;
that might cause a fault, for example if the end of the buffer butts
up against a protected page.
One thing we might be able to do would be to check if the predicate
is all ones and, if so, use an unpredicated load instruction. This
would probably be worthwhile for smaller types, though perhaps not
for larger types since it would mean branching for every load plus
the overhead of checking whether all bits are 1. |
14386 |
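The trade-off described in the ld1.h note can be illustrated with a minimal portable sketch. It is not taken from ld1.h: the `sketch_*` types and functions and the fixed 16-lane width are assumptions made for illustration only. The predicated path touches memory only for active lanes, so it cannot fault on inactive ones, while the all-true fast path stands in for an unpredicated full-width load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LANES 16 /* hypothetical fixed lane count (128-bit vector of int8_t) */

/* Hypothetical stand-ins for a predicate and an 8-bit vector type. */
typedef struct { bool lane[SKETCH_LANES]; } sketch_pred;
typedef struct { int8_t lane[SKETCH_LANES]; } sketch_vec_i8;

/* The extra check the note worries about: one pass over the predicate per load. */
static bool sketch_pred_all_true(const sketch_pred *pg) {
  assert(pg != NULL);
  for (size_t i = 0; i < SKETCH_LANES; i++) {
    if (!pg->lane[i]) return false;
  }
  return true;
}

static sketch_vec_i8 sketch_ld1_i8(const sketch_pred *pg, const int8_t *base) {
  sketch_vec_i8 r;

  if (sketch_pred_all_true(pg)) {
    /* Every lane is active, so reading the full width is safe.
     * A real implementation could use an unpredicated vector load here. */
    memcpy(r.lane, base, sizeof r.lane);
    return r;
  }

  /* Otherwise read only the active lanes; inactive lanes become zero
   * and their memory is never touched. */
  for (size_t i = 0; i < SKETCH_LANES; i++) {
    r.lane[i] = pg->lane[i] ? base[i] : 0;
  }
  return r;
}
```

Whether the fast path pays off depends on how often the predicate is all-true and on the element size, which is the concern the note raises about branching on every load.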
ptest.h |
SIMDE_ARM_SVE_PTEST_H |
2399 |
ptrue.h |
SIMDE_ARM_SVE_PTRUE_H |
4711 |
qadd.h |
|
18612 |
reinterpret.h |
|
66380 |
sel.h |
|
28614 |
st1.h |
|
13374 |
sub.h |
|
55418 |
types.h |
TODO: SVE2 is going to be a bit awkward with this setup. We currently
either use SVE vectors or assume that the vector length is known at
compile-time. For CPUs which provide SVE but not SVE2 we're going
to be getting scalable vectors, so we may need to loop through them.
Currently I'm thinking we'll have a separate function for non-SVE
types. We can call that function in a loop from an SVE version,
and we can call it once from a resolver.
Unfortunately this is going to mean a lot of boilerplate for SVE,
which already has several variants of a lot of functions (*_z, *_m,
etc.), plus overloaded functions in C++ and generic selectors in C.
Anyway, all this means that we're going to need to always define
the portable types.
The good news is that at least we don't have to deal with
to/from_private functions, since the no-SVE versions will only be
called with non-SVE params. |
34178 |
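The layering proposed in the types.h TODO (one function on fixed-size portable types, called in a loop from the scalable version and once from a fixed-length caller) might look roughly like the sketch below. All names here are hypothetical and the 4-lane chunk size is an assumption; nothing in this block is taken from types.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CHUNK 4 /* hypothetical: 128 bits of int32_t lanes */

/* Hypothetical portable fixed-size type, always defined. */
typedef struct { int32_t values[SKETCH_CHUNK]; } sketch_i32x4;

/* The "non-SVE" function: operates on portable types only, so no
 * to/from_private conversion is needed. */
static sketch_i32x4 sketch_add_i32x4(sketch_i32x4 a, sketch_i32x4 b) {
  sketch_i32x4 r;
  for (size_t i = 0; i < SKETCH_CHUNK; i++) {
    /* Add as unsigned to get wrapping behavior without signed overflow. */
    r.values[i] = (int32_t)((uint32_t)a.values[i] + (uint32_t)b.values[i]);
  }
  return r;
}

/* Scalable caller: the lane count is known only at run time, so the
 * fixed-size function is invoked once per chunk. */
static void sketch_add_i32_scalable(int32_t *dst, const int32_t *a,
                                    const int32_t *b, size_t lanes) {
  assert(lanes % SKETCH_CHUNK == 0);
  for (size_t i = 0; i < lanes; i += SKETCH_CHUNK) {
    sketch_i32x4 va, vb, vr;
    memcpy(va.values, a + i, sizeof va.values);
    memcpy(vb.values, b + i, sizeof vb.values);
    vr = sketch_add_i32x4(va, vb);
    memcpy(dst + i, vr.values, sizeof vr.values);
  }
}
```

A caller whose vector length is fixed at compile time would call `sketch_add_i32x4` exactly once instead of looping, which is the "call it once" case the TODO mentions.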
whilelt.h |
|
34501 |