Description
I created a test case to reproduce a hang that occurs after a strided vector load (vlse32.v) when no hardware barrier is inserted.
The test exercises a linear algebra kernel (matrix multiplication with transposed B).
The main.c computes the result and compares it against a golden model.
Input data and expected output are provided in data.h and were generated via a Python script.
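For reference, the check performed by main.c can be sketched in scalar C roughly as follows. This is a minimal sketch, not the code from the attached archive: the function names, the row-major layout, and the interpretation of "transposed B" as C = A * B^T with B stored as an n x k array are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference for C = A * B^T.
 * Assumed layout (not confirmed by the issue): A is m x k, B is n x k,
 * C is m x n, all stored as row-major float arrays. */
void matmul_trans_b_ref(const float *a, const float *b, float *c,
                        size_t m, size_t n, size_t k) {
  assert(a != NULL && b != NULL && c != NULL);
  for (size_t i = 0; i < m; ++i) {
    for (size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      /* Row i of A dotted with row j of B, i.e. column j of B^T. */
      for (size_t p = 0; p < k; ++p) {
        acc += a[i * k + p] * b[j * k + p];
      }
      c[i * n + j] = acc;
    }
  }
}

/* Hypothetical golden-model check: returns how many elements of result
 * differ from golden by more than tol. */
size_t count_mismatches(const float *result, const float *golden,
                        size_t len, float tol) {
  size_t errors = 0;
  for (size_t i = 0; i < len; ++i) {
    float diff = result[i] - golden[i];
    if (diff < 0.0f) {
      diff = -diff;
    }
    if (diff > tol) {
      ++errors;
    }
  }
  return errors;
}
```

A test would report success when count_mismatches returns 0 for the kernel output against the expected array from data.h.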
How to reproduce
- Extract the attached zip archive into sw/spatzBenchmarks
- Add the following lines to sw/spatzBenchmarks/CMakeLists.txt:
  add_library(mat-mul-trans-B mat-mul-trans-B/kernel/mat-mul-trans-B.c)
  add_spatz_test_threeParam(mat-mul-trans-B mat-mul-trans-B/main.c 64 32 64)
- Build and run
Expected Behaviour
The test is expected to run to completion without an explicit barrier. Instead, it hangs after executing a strided vector load (vlse32.v).
Inserting a hardware barrier immediately after the load avoids the hang.
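For readers less familiar with the instruction, the memory access pattern of vlse32.v can be modelled in scalar C as below. This is only an illustrative sketch: vlse32_model is a hypothetical helper, it ignores masking and tail handling, and it says nothing about the timing behaviour that triggers the hang. It relies on the RVV definition that the stride operand is a byte offset between consecutive 32-bit elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of an unmasked vlse32.v: element i of the destination is the
 * 32-bit word at base + i * stride_bytes. vl is the number of elements. */
void vlse32_model(uint32_t *vd, const void *base, ptrdiff_t stride_bytes,
                  size_t vl) {
  assert(vd != NULL && base != NULL);
  const unsigned char *p = (const unsigned char *)base;
  for (size_t i = 0; i < vl; ++i) {
    /* memcpy avoids alignment and aliasing assumptions on the source. */
    memcpy(&vd[i], p + (ptrdiff_t)i * stride_bytes, sizeof(uint32_t));
  }
}
```

Loading one column of a row-major matrix with 4 columns of 32-bit elements would, under this model, use a stride of 4 * sizeof(uint32_t) bytes and a base pointer at the first element of that column.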
Workaround
Uncommenting the hardware barrier after the strided load fixes the issue:
asm volatile ("vlse32.v v16, (%0), %1" :: "r"(col_b1), "r"(stride));
asm volatile ("vlse32.v v24, (%0), %1" :: "r"(col_b1_next), "r"(stride));
snrt_cluster_hw_barrier();
Recompiling with the barrier enabled allows the test to run correctly.
# Loading entry point: 80000000
#
##################################### MATRIX_MUL_TRANS_B TEST ####################################
#
# INFO | Running 'matrix_mul_trans_B' test on Spatz Cluster
# INFO | Test SUCCESS
#
##########################################################################################
#
# ** Info: [SUCCESS] Program finished successfully