[Misc][RFC] Add automated profiling sweep and heatmap visualization tools #17933
+640
−0
Summary
This PR introduces a lightweight, automated profiling suite for vLLM that benchmarks kernels across multiple batch sizes and prompt lengths and visualizes the results as heatmaps.
It includes:
- sweep_profiling.py: Automates profiling runs across varied batch-size and prompt-length configurations. In addition to the overall model runner time, it also produces a per-operator breakdown of the profiling results.
- plot_heatmap_from_traces.py: Parses the JSON trace outputs and generates latency heatmaps.
- profiling.py: A self-contained adaptation of examples/offline_inference/profiling.py, modified for sweepability and output compatibility.

For a more detailed description of this RFC and PR, please see #17823.
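The sweep amounts to iterating over a grid of (batch size, prompt length) pairs. Below is a minimal sketch of building such a grid. SweepConfig and build_sweep_grid are hypothetical names. The rule that skips pairs whose batch_size * prompt_len exceeds a token budget is an assumption for illustration, not necessarily how sweep_profiling.py interprets --max-tokens.

```python
from itertools import product
from typing import NamedTuple


class SweepConfig(NamedTuple):
    """One (batch size, prompt length) point in the profiling grid (hypothetical type)."""
    batch_size: int
    prompt_len: int


def build_sweep_grid(
    batch_sizes: list[int],
    prompt_lens: list[int],
    max_tokens: int,
) -> list[SweepConfig]:
    """Return every (batch_size, prompt_len) pair whose total prompt tokens fit the budget.

    ASSUMPTION: the budget caps batch_size * prompt_len. This is an illustrative
    rule, not confirmed to match how sweep_profiling.py uses --max-tokens.
    """
    return [
        SweepConfig(bs, pl)
        # product() iterates batch sizes in the outer loop, prompt lengths in the inner loop
        for bs, pl in product(batch_sizes, prompt_lens)
        if bs * pl <= max_tokens
    ]
```

For example, build_sweep_grid([1, 8, 64], [128, 1024, 4096], max_tokens=80000) keeps 8 of the 9 pairs and drops (64, 4096), whose 262,144 prompt tokens exceed the budget. Each surviving pair would then correspond to one profiling run.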
Usage
Example:
python sweep_profiling.py --model "deepseek-ai/DeepSeek-R1-Distill-Llama-8B" --max-tokens 80000 --tensor-parallel-size 2
This will generate multiple trace files like:
To visualize results:
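As a rough illustration of the aggregation behind a latency heatmap (not the PR's plot_heatmap_from_traces.py), the sketch below pivots per-run summaries into a batch-size by prompt-length grid and renders it with matplotlib. The record schema (batch_size, prompt_len, latency_ms) and all function names are hypothetical. The real trace files produced by profiling.py may be structured differently and would need their own parsing step.

```python
import json
from pathlib import Path
from typing import Optional


def load_latency_records(trace_dir: str) -> list[dict]:
    """Load one summary record per *.json file in trace_dir.

    HYPOTHETICAL schema: {"batch_size": int, "prompt_len": int, "latency_ms": float}.
    """
    return [json.loads(p.read_text()) for p in sorted(Path(trace_dir).glob("*.json"))]


def build_latency_grid(
    records: list[dict],
) -> tuple[list[int], list[int], list[list[Optional[float]]]]:
    """Pivot records into (batch_sizes, prompt_lens, grid); grid[i][j] is None if no run exists."""
    batch_sizes = sorted({r["batch_size"] for r in records})
    prompt_lens = sorted({r["prompt_len"] for r in records})
    row = {bs: i for i, bs in enumerate(batch_sizes)}
    col = {pl: j for j, pl in enumerate(prompt_lens)}
    grid: list[list[Optional[float]]] = [[None] * len(prompt_lens) for _ in batch_sizes]
    for r in records:
        grid[row[r["batch_size"]]][col[r["prompt_len"]]] = r["latency_ms"]
    return batch_sizes, prompt_lens, grid


def plot_latency_heatmap(
    batch_sizes: list[int],
    prompt_lens: list[int],
    grid: list[list[Optional[float]]],
    out_path: str = "latency_heatmap.png",
) -> None:
    """Render the grid as a heatmap PNG (requires matplotlib)."""
    import matplotlib.pyplot as plt  # lazy import: the pivot step needs no plotting library

    # Missing cells become NaN, which imshow draws with the colormap's "bad" color (blank).
    data = [[float("nan") if v is None else v for v in r] for r in grid]
    fig, ax = plt.subplots()
    im = ax.imshow(data, aspect="auto", origin="lower", cmap="viridis")
    ax.set_xticks(range(len(prompt_lens)))
    ax.set_xticklabels([str(p) for p in prompt_lens])
    ax.set_yticks(range(len(batch_sizes)))
    ax.set_yticklabels([str(b) for b in batch_sizes])
    ax.set_xlabel("prompt length (tokens)")
    ax.set_ylabel("batch size")
    fig.colorbar(im, ax=ax, label="latency (ms)")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
```

Keeping the pivot separate from the plotting code lets the grid be checked or exported without a display, and configurations that were skipped in the sweep simply show up as blank cells.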
Files Added
Notes
Related Issue
FIX #17823
CC List
@GindaChen @JJMN22