Avoid default allocation for taps of length 1 in ScanSaveMem #1395
base: main
Pull Request Overview
This PR addresses an issue with the default scan buffer allocation for single-tapped outputs in ScanSaveMem and enhances the tests for buffer size validation. Key changes include:
- Adjusting the test configuration by excluding "scan_pushout" and renaming an internal function from f_rnn to step for clarity.
- Updating the implementation of default scan buffer handling by adding a new parameter (taps) to _is_default_scan_buffer and adapting buffer expansion and slicing logic accordingly.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/scan/test_rewriting.py | Modified test configuration and assertions regarding scan buffer sizes and function naming. |
| pytensor/scan/rewriting.py | Updated _is_default_scan_buffer's signature and revised buffer handling logic using the taps value. |
@@ -1186,7 +1186,7 @@ def while_scan_merge_subtensor_last_element(fgraph, scan_node):
     return subtensor_merge_replacements


-def _is_default_scan_buffer(x: TensorVariable) -> bool:
+def _is_default_scan_buffer(x: TensorVariable, taps: int) -> bool:
Ensure that all callers of _is_default_scan_buffer supply the correct 'taps' value so that the default buffer check correctly distinguishes between single and multiple taps.
Suggested change:
-def _is_default_scan_buffer(x: TensorVariable, taps: int) -> bool:
+def _is_default_scan_buffer(x: TensorVariable, taps: int) -> bool:
+    """
+    Determine if a scan buffer is the default buffer.
+    Parameters:
+        x (TensorVariable): The tensor variable to check.
+        taps (int): The number of taps (time steps) associated with the buffer.
+            Must be correctly supplied by the caller to ensure accurate checks.
+    Returns:
+        bool: True if the buffer is the default scan buffer, False otherwise.
+    """
Copilot uses AI. Check for mistakes.
@@ -1574,15 +1574,16 @@ def scan_save_mem_rewrite(fgraph, node, backend_supports_output_pre_allocation:
# If the memory for this output has been pre-allocated
# before going into the scan op (by an alloc node)
if idx < op_info.n_mit_sot + op_info.n_sit_sot:
Verify that deriving 'taps' from init_l[i] accurately reflects the intended tap count, and that this value is consistently used to compute extra_size in buffer expansion.
Suggested change:
-if idx < op_info.n_mit_sot + op_info.n_sit_sot:
+if idx < op_info.n_mit_sot + op_info.n_sit_sot:
+    # Validate init_l[i] before using it to derive taps
+    if not isinstance(init_l[i], int) or init_l[i] < 0:
+        raise ValueError(f"Invalid tap count in init_l[{i}]: {init_l[i]}")
@@ -1626,14 +1627,13 @@ def scan_save_mem_rewrite(fgraph, node, backend_supports_output_pre_allocation:
# val == 0 means that we want to keep all intermediate
# results for that state, including the initial values.
if idx < op_info.n_mit_sot + op_info.n_sit_sot:
    taps = init_l[op_info.n_mit_mot + idx]
Confirm that updating the slice boundary to use 'taps' (instead of init_l) maintains the intended behavior for buffer trimming in ScanSaveMem.
Suggested change:
-taps = init_l[op_info.n_mit_mot + idx]
+taps = taps[op_info.n_mit_mot + idx]
@@ -1207,7 +1208,7 @@ def test_inplace3(self):


class TestSaveMem:
-    mode = get_default_mode().including("scan_save_mem")
+    mode = get_default_mode().including("scan_save_mem").excluding("scan_pushout")
Confirm that excluding 'scan_pushout' aligns with the intended optimization behavior and does not conflict with other scan optimizations.
The check for whether a variable is a default scan buffer always failed for single-tap outputs. It includes a conservative test that the original value is not being broadcast to the number of initial taps, but that test is irrelevant when there is only one tap.
The tests now also check that only buffers of the expected size are actually kept.
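
To illustrate why a conservative broadcast test can reject every single-tap buffer, here is a toy model in plain Python. It is not PyTensor's implementation: the function names and the representation of each tensor as a tuple of per-dimension "broadcastable" flags are assumptions made for illustration, and the single-tap branch shows only one plausible way to relax the check.

```python
# Toy model, NOT PyTensor code. Each tensor is described by a tuple of
# booleans, one per dimension: True means that dimension has length 1.


def broadcasted_by(value_bcast, buffer_bcast):
    """Return True if writing `value` into `buffer` broadcasts any dimension."""
    return any(v and not b for v, b in zip(value_bcast, buffer_bcast))


def is_default_buffer_conservative(value_bcast, buffer_bcast):
    """Hypothetical old-style check: compare against the whole buffer."""
    return not broadcasted_by(value_bcast, buffer_bcast)


def is_default_buffer(value_bcast, buffer_bcast, taps):
    """Hypothetical relaxed check that takes the number of taps into account."""
    if taps == 1:
        # A single-tap initial value always has leading length 1, so a
        # mismatch on the first dimension is not a real broadcast.
        return not broadcasted_by(value_bcast[1:], buffer_bcast[1:])
    return not broadcasted_by(value_bcast, buffer_bcast)


# Single tap: the value has shape (1, n), the buffer has shape (1 + nsteps, n).
single_tap_value = (True, False)
full_buffer = (False, False)

old_result = is_default_buffer_conservative(single_tap_value, full_buffer)
new_result = is_default_buffer(single_tap_value, full_buffer, taps=1)
```

In this model `old_result` is `False` for any single-tap value, because the leading dimension of the value (length 1) never matches the leading dimension of the full buffer, while `new_result` is `True` because only the trailing dimensions are compared.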
📚 Documentation preview 📚: https://pytensor--1395.org.readthedocs.build/en/1395/