[bug] use_sliding_window doesn't work as expected #38002
Reproduction
Expected behavior
Description
What is expected: use_sliding_window is set to false in the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B config, so we expect sliding-window attention to be disabled. In other words, the results should be the same regardless of the sliding_window value. However, the repro script produces different results for different sliding_window values.
Root cause
The attention mask is modified according to sliding_window without checking use_sliding_window, in transformers/src/transformers/models/qwen2/modeling_qwen2.py:
Lines 708 to 715 in 3c0796a
If we add a few print statements inside this conditional block, we can clearly see that the attention mask is changed even with use_sliding_window=false.
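To make the root cause concrete, here is a minimal pure-Python sketch, not the transformers code itself. It assumes the mask semantics of the referenced block: a key position j is hidden from query position i when j > i (causal) or when j <= i - sliding_window (outside the window). The helpers causal_mask, apply_sliding_window, build_mask_buggy and build_mask_gated are hypothetical names. The "gated" version only illustrates the expected behavior; it is not necessarily how the issue should be fixed upstream.

```python
# Hypothetical, simplified model of the mask logic discussed in this issue.
# Pure Python (no torch). True means "masked out": query i may NOT attend to key j.


def causal_mask(seq_len):
    # Standard causal mask: query i cannot see future keys j > i.
    return [[j > i for j in range(seq_len)] for i in range(seq_len)]


def apply_sliding_window(mask, sliding_window):
    # Additionally hide keys more than sliding_window positions behind the query,
    # i.e. mask j whenever j <= i - sliding_window.
    seq_len = len(mask)
    return [
        [mask[i][j] or j <= i - sliding_window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def build_mask_buggy(seq_len, sliding_window, use_sliding_window):
    # Reported behavior: only sliding_window is consulted; use_sliding_window is ignored.
    mask = causal_mask(seq_len)
    if sliding_window is not None:
        mask = apply_sliding_window(mask, sliding_window)
    return mask


def build_mask_gated(seq_len, sliding_window, use_sliding_window):
    # Expected behavior: the window is applied only when use_sliding_window is True.
    mask = causal_mask(seq_len)
    if use_sliding_window and sliding_window is not None:
        mask = apply_sliding_window(mask, sliding_window)
    return mask


if __name__ == "__main__":
    seq_len = 8
    # With use_sliding_window=False, changing sliding_window still changes the buggy mask...
    print(build_mask_buggy(seq_len, 4, False) == build_mask_buggy(seq_len, 16, False))  # False
    # ...while the gated mask stays a plain causal mask for any window size.
    print(build_mask_gated(seq_len, 4, False) == build_mask_gated(seq_len, 16, False))  # True
```

Note that the window only matters once the sequence is longer than sliding_window (here, 8 tokens versus a window of 4), which is why the discrepancy shows up only for sufficiently long inputs.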