Updates in 2025.3.1
General
Improved the charts in the Compute Workload Analysis section to better distinguish between per_cycle_active and per_cycle_elapsed metrics.
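As a rough illustration of why the two metric families can differ, the sketch below assumes that a per_cycle_active value divides a counter by the cycles in which the unit was active, while a per_cycle_elapsed value divides it by all elapsed cycles. The function name and the counter values are hypothetical and are not Nsight Compute output.

```python
def per_cycle_rates(counter_sum, active_cycles, elapsed_cycles):
    """Return (per_cycle_active, per_cycle_elapsed) for a single counter.

    Hypothetical helper for illustration only; it is not part of any
    Nsight Compute API.
    """
    if active_cycles <= 0 or elapsed_cycles <= 0:
        raise ValueError("cycle counts must be positive")
    if active_cycles > elapsed_cycles:
        raise ValueError("active cycles cannot exceed elapsed cycles")
    # Same numerator, different denominators: a unit that is idle for most
    # of the measured range shows a much lower per-elapsed-cycle rate.
    return counter_sum / active_cycles, counter_sum / elapsed_cycles


# Made-up example: 6000 operations, unit active for 2000 of 8000 cycles.
active_rate, elapsed_rate = per_cycle_rates(6000, 2000, 8000)
print(active_rate, elapsed_rate)  # 3.0 0.75
```

Under this assumption the two values are equal only when the unit is active for the entire measured range.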
Resolved Issues
Fixed an issue where kernels using the compile-time attribute __block_size__ were launched with incorrect grid dimensions.
Fixed an issue where timeline y-axis labels showed unexpected units for small maximum values.
Fixed a crash when stepping applications in interactive profiling mode.
Fixed an issue where roofline charts did not show achieved value data in some cases.
Fixed an issue where duplicate tooltips could be shown for some links in the Memory Chart.
Fixed a potential hang when setting --pm-sampling-buffer-size to very large values.
Fixed several rules so that they no longer show non-actionable warnings for unsupported or missing metrics when profiling on mobile chips.