Previously, we observed that two schedules with the same hint but different shared memory scopes (shared.dyn and shared) performed differently: shared.dyn was consistently slower than shared. As a result, our design has favored static shared memory. The fix introduced in this commit resolves the issue by eliminating 20% of the redundant sync primitives in shared.dyn, so the two scopes should now perform comparably.
Given this improvement, I suggest we consider switching the shared memory scope to shared.dyn so that we can explore more tile candidates. We should benchmark the change first to confirm that it does not regress performance.
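
To illustrate why shared.dyn could widen the search space, here is a rough sketch rather than code from this repository. It assumes the usual CUDA constraint that statically declared shared memory is capped at 48 KB per block, while dynamic shared memory can be raised by opt-in to an architecture-dependent limit (163 KB is assumed here, as on A100-class GPUs). The tile sizes, the two-stage buffering, the fp16 element size, and the helper names `smem_bytes` and `tile_candidates` are all hypothetical.

```python
from itertools import product

# Assumption: CUDA caps statically declared shared memory at 48 KB per block.
STATIC_LIMIT_BYTES = 48 * 1024
# Assumption: opt-in dynamic limit for an A100-class GPU; varies by architecture.
DYNAMIC_LIMIT_BYTES = 163 * 1024


def smem_bytes(block_m, block_n, block_k, dtype_bytes=2, stages=2):
    """Bytes needed to stage an A tile (block_m x block_k) and a B tile
    (block_k x block_n), multiplied by the number of pipeline stages."""
    return stages * (block_m * block_k + block_k * block_n) * dtype_bytes


def tile_candidates(limit_bytes, sizes=(32, 64, 128, 256), k_sizes=(16, 32, 64)):
    """Enumerate (block_m, block_n, block_k) tiles that fit within limit_bytes."""
    return [
        (m, n, k)
        for m, n, k in product(sizes, sizes, k_sizes)
        if smem_bytes(m, n, k) <= limit_bytes
    ]


static_tiles = tile_candidates(STATIC_LIMIT_BYTES)
dynamic_tiles = tile_candidates(DYNAMIC_LIMIT_BYTES)

# Tiles that only become available once the 48 KB static cap no longer applies.
extra_tiles = sorted(set(dynamic_tiles) - set(static_tiles))
print(f"static: {len(static_tiles)}, dynamic: {len(dynamic_tiles)}")
print("only with dynamic shared memory:", extra_tiles)
```

Under these assumptions, a 256x256x64 tile needs 128 KB and is only reachable with shared.dyn. Whether such tiles are actually faster is exactly what the benchmark should tell us.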