-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support partial vector extension instructions #545
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarks
Benchmark suite | Current: c3374ea | Previous: 4ef61b8 | Ratio |
---|---|---|---|
Dhrystone |
1256 Average DMIPS over 10 runs |
1284 Average DMIPS over 10 runs |
1.02 |
Coremark |
946.035 Average iterations/sec over 10 runs |
972.508 Average iterations/sec over 10 runs |
1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
|
||
/* standard uncompressed instruction */ | ||
const uint32_t index = (insn & INSN_6_2) >> 2; | ||
static inline bool op_000000(rv_insn_t *ir, const uint32_t insn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
op_000000
looks misleading. Can you improve its naming scheme?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming scheme is based on the function6
field listed in riscv-v-spec/inst-table.adoc. Since each function6
may include OPI
, OPM
, or OPF
functions, often corresponding to unrelated operations. I chose to name them directly based on the function6
for consistency.
This might seem unclear without additional context. To improve clarity, I could add comments explaining the naming convention for each op_function6
. Would this address your concern?
Could we create an CI like using ROSCOF for this? |
I'm not familiar with ROSCOF, but I'll look into it and give it a try. |
Suggest using |
Thank you all for your feedback and suggestions! I will fix the typos, add a newline at the end of files, and remove any unnecessary elements. I also noticed that the current code does not fully meet the contributing guidelines, so I will make sure to address those issues. In addition, I will add more detailed comments in Since some of the code was misplaced from the beginning, and as @eleanorLYJ mentioned, there are non-compliant comments in an early commit, I’m considering I’d appreciate your guidance. Thank you! |
This comment was marked as resolved.
This comment was marked as resolved.
src/rv32_template.c
Outdated
} \ | ||
} | ||
|
||
#define VMV_LOOP(des, op1, op2, op, SHIFT, MASK, i, j, itr, vm) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may I ask where this is one used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The VMV_LOOP
macro is used in the implementation of vmv_v_i
(at src/rv32_template.c
, line 6366), as the riscv-vector-tests frequently utilize vmv_v_i
to clear bits in vector registers during each test. This serves as a quick implementation for the vmv_v_i
instruction.
Additionally, I noticed that the implementations of vmv_v_*
(representing vmv_v_v
, vmv_v_x
, and vmv_v_i
) can be refactored to reuse existing macros such as VV_LOOP
, VX_LOOP
, VI_LOOP
, and their _LEFT
variants (collectively referred to as V*_LOOP
and V*_LOOP_LEFT
). I will remove the VMV_LOOP
and related _LEFT
macros accordingly. Thank you for pointing this out!
case 5: /* Reserved */ | ||
decode_vxtype(ir, insn); | ||
ir->opcode = rv_insn_vfmin_vf; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code marks case 5 as /* Reserved */
but then proceeds to implement it. Consider either removing the comment or making it a reserved case that returns false
.
Code suggestion
Check the AI-generated fix before applying
case 5: /* Reserved */ | |
decode_vxtype(ir, insn); | |
ir->opcode = rv_insn_vfmin_vf; | |
break; | |
case 5: /* Reserved */ |
Code Review Run #6921bb
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
Code Review Agent Run #0be539Actionable Suggestions - 3
Additional Suggestions - 7
Review Details
|
return op(ir, insn); | ||
static inline bool op_vcfg(rv_insn_t *ir, const uint32_t insn) | ||
{ | ||
switch (insn & 0x80000000) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using a more descriptive constant for the bit mask 0x80000000
. A named constant like VSETVLI_MASK
would improve code readability and maintainability.
Code suggestion
Check the AI-generated fix before applying
switch (insn & 0x80000000) { | |
/* Mask for bit 31 of vector configuration instructions */ | |
#define VSETVLI_MASK 0x80000000 | |
switch (insn & VSETVLI_MASK) { |
Code Review Run #0be539
Is this a valid issue, or was it incorrectly flagged by the Agent?
- it was incorrectly flagged
rv->csr_vtype = 0x80000000; | ||
return true; | ||
} | ||
uint16_t vlmax = (v_lmul < 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe some comment for this part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I'll add some detail to the vlmax
calculation.
src/rv32_template.c
Outdated
|
||
/* clang-format off */ | ||
|
||
#define OPT(des, op1, op2, op, op_type) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem, but on thing I concern is some more complex RVV instructions can't be simply implemented by this macro.
These kinds of instructions include but not limit to vmerge, nclip, etc.
Not sure what is your plan for these instructions.
And maybe rename or add some comment to identify the limitation of OPT()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name OPT
is not precise. It is a matter of sew
variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem, but on thing I concern is some more complex RVV instructions can't be simply implemented by this macro.
These kinds of instructions include but not limit to vmerge, nclip, etc.
Not sure what is your plan for these instructions.
And maybe rename or add some comment to identify the limitation of OPT().
You're right. The current implementation doesn't support widening/narrowing vector instructions, as well as vmerge
and vnclip
. The original plan is to introduce variants, such as WV_LOOP
, to handle these cases. However, this PR focuses on the basic implementation of vector instructions, so I'll leave this for future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe rename or add some comment to identify the limitation of
OPT()
.
The name
OPT
is not precise. It is a matter of sew variant.
I'll rename OPT
to VECTOR_DISPATCH
, but let me know if you have a better suggestion.
In general is an amazing PR. You made some incredible contribution |
This comment was marked as resolved.
This comment was marked as resolved.
Add decode stage for RISC-V "V" Vector extension instructions from version 1.0, excluding VXUNARY0, VRFUNARY0, VWFUNARY0, VFUNARY1, vmv<nr>r, and VFUNARY0. This commit focuses on the decode stage to ensure correct instructions parsing before proceeding to the execution stage. Verification is currently done through hand-written code. Modify Makefile to support VLEN configuration, via make ENABLE_EXT_V=1 VLEN=<value>. The default value for VLEN is set to 128. The current implementation only supports VLEN=128. Enabling ENABLE_EXT_V=1 will also enable ENABLE_EXT_F=1, as vector load/ store instruction shares the same opcode with load_fp and store_fp.
Add support for vset{i}vl{i} instructions following the RISC-V vector extension version 1.0. Simplify avlmax calculation by directly computing avlmax = lmul * vlen / sew instead of converting to floating-point as described in the specification.
Implement vle8_v, vle16_v, vle32_v, vse8_v, vse16_v, vse32_v. Using loop unrolling technique to handle a word at a time. The implementation assumes VLEN = 128. There are two types of illegal instructions: 1. When eew is narrower than csr_vl. Set vill in vtype to 1 and other bits to 0, set csr_vl to 0. 2. When LMUL > 1 and trying to access a vector register that is larger than 31. Use assert to handle this case.
To emulate vector registers of length VLEN using an array of uint32_t, we first handle different SEW values (8, 16, 32) using sew_*b_handler. Inside the handler, the V*_LOOP macro expands to process different VL values and operand types, along with its corresponding V*_LOOP_LEFT. The goal is to maximize code reuse by defining individual operations next to their respective vector instructions, which can be easily applied using the OPT() macro. V*_LOOP execution steps: 1. Copy the operand op1 (op2). 2. Align op1 to the right. 3. Perform the specified operation between op1 and op2. 4. Mask the result according to the corresponding SEW. 5. Shift the result left to align with the corresponding position. 6. Accumulate the result. In vector register groups, registers should follow the pattern v2*n, v2*n+1 when lmul = 2, etc. The current implementation allows using any vector registers except those exceeding v31. For vector masking, if the corresponding mask bit is 0, the value of the destination vector register is preserved. The process is as follows: 1. Copy the destination register. 2. Clear the bits corresponding to VL. 3. Store the computed result in ans. 4. Update the destination register with ans. If ir->vm == 0, vector masking is activated.
The current riscv-arch-test does not include a test suite for vector extension. However, we could explore using riscv-ctg to generate a suitable test suite. |
# Vector extension instructions | ||
ENABLE_EXT_V ?= 0 | ||
$(call set-feature, EXT_V) | ||
VLEN ?= 128 # Default VLEN is 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall this moved into the conditional block of ifeq ($(call has, EXT_V), 1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I'll move VLEN ?= 128
into the conditional block. Thanks for pointing that out.
} \ | ||
} | ||
|
||
#define VI_LOOP(des, op1, op2, op, SHIFT, MASK, i, j, itr, vm) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MASK
parameter has same name with MASK()
function-like macro. I am concern with the naming scheme. Can you pass MASK_BIT
as the parameter and leverage the MASK()
function-like macro?
P.S. The compiler should smart enough to do constant propagation for the MASK()
function-like macro during compile time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll rename MASK
to MASK_BIT
. Thanks for your suggestion and the information!
Can you provide tools to validate RVV compliance? As I know, there are few projects below: |
@jserv I am using RISC-V Vector Tests Generator to validate the implemented rvv instructions.
By using
I ran the tests using |
Add support for the RISC-V "V" Vector Extension. This pull request implements decoding for 585 out of 616 version 1.0 spec vector instructions, with partial interpreter implementation.
The decoding method for vector instructions, including vector configuration and load/store instructions, follows the approach used in
rv32emu
. The newrvv_jumptable
is introduced to handle remaining arithmetic instructions.The interpreter implementation is tested using the riscv-vector-tests repository, with current limitations, as outlined in the repo. Included partial support for vector load/store instructions and single-width arithmetic instructions. The architecture now supports different settings for
sew
,lmul
, and vector masking.Vector instructions passing the tests include:
vle8.v
,vle16.v
,vle32.v
vse8.v
,vse16.v
,vse32.v
vadd.vv
,vadd.vx
,vadd.vi
vsub.vv
,vsub.vx
,vsub.vi
,vand.vv
,vand.vx
,vand.vi
vor.vv
,vor.vx
,vor.vi
vxor.vv
,vxor.vx
,vxor.vi
vsll.vv
,vsll.vx
,vsll.vi
vmul.vv
,vmul.vx
,vmul.vi
Close #504
Summary by Bito
This pull request implements extensive support for the RISC-V Vector Extension, decoding 585 out of 616 vector instructions. It enhances vector load/store operations, arithmetic instructions, and configuration management, while introducing a new jump table for remaining instructions. Many operations are still placeholders, indicating areas for future development.Unit tests added: True
Estimated effort to review (1-5, lower is better): 2