PERF: Slowdowns with .isin() on columns typed as np.uint64 #60098
Labels: isin, Performance, Regression
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
The last line is even worse with the older numpy==1.26.4 (the last release before 2.0): ~200ms.
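The original reproducer is not preserved in this copy of the issue. Below is a minimal sketch of the kind of benchmark described. It assumes a large uint64 Series and lookup values that either match (uint64) or mismatch (int64) the column's signedness, plus a plain Python list. The sizes, the random data and the `bench` helper are hypothetical and are not the author's code. No timings are shown because they depend on the machine and on the pandas/numpy versions installed.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sizes; the original reproducer's sizes are not known.
N_ROWS = 1_000_000
N_VALUES = 1_000

rng = np.random.default_rng(0)
ser = pd.Series(rng.integers(0, 10_000, size=N_ROWS, dtype=np.uint64))

# Lookup values whose signedness matches the column (uint64) or does not (int64).
values_unsigned = np.arange(N_VALUES, dtype=np.uint64)
values_signed = np.arange(N_VALUES, dtype=np.int64)
values_list = list(range(N_VALUES))


def bench(values, repeat=5):
    """Return the best wall-clock time, in ms, of a single ser.isin(values) call."""
    times = timeit.repeat(lambda: ser.isin(values), number=1, repeat=repeat)
    return min(times) * 1000


if __name__ == "__main__":
    print(f"uint64 column, uint64 values: {bench(values_unsigned):.1f} ms")
    print(f"uint64 column, int64 values:  {bench(values_signed):.1f} ms")
    print(f"uint64 column, Python list:   {bench(values_list):.1f} ms")
```

All three calls should return the same boolean mask. Only the time taken is expected to differ.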
Installed Versions
INSTALLED VERSIONS
commit : 0691c5c
python : 3.10.12
python-bits : 64
OS : Linux
OS-release : 6.5.0-27-generic
Version : #28~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 15 10:51:06 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 2.1.2
Prior Performance
With pandas 1.4.4 and numpy 1.26.4, all the benchmarks above take 3-5ms (3ms when the signedness matches, 5ms when it does not). So although upgrading numpy mitigates the worst case (~200ms), this still looks like a roughly 5x performance regression on the pandas side since 1.4.4.
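The signedness split is notable because a uint64 column and int64 lookup values have no common integer dtype, so NumPy's promotion rules fall back to float64. The snippet below is a minimal illustration using only `np.result_type`. It also shows why simply casting both sides to float64 is unsafe: distinct large uint64 values can collapse to the same float. Whether this promotion is what accounts for the extra time in pandas' `isin` is an assumption, not something established in this issue.

```python
import numpy as np

# Common dtype NumPy picks when combining the column dtype with the lookup dtype.
same_sign = np.result_type(np.uint64, np.uint64)
mixed_sign = np.result_type(np.uint64, np.int64)

print(same_sign)   # uint64
print(mixed_sign)  # float64

# float64 has a 53-bit significand, so large uint64 values lose precision after the cast.
big = np.array([2**63, 2**63 + 1], dtype=np.uint64)
as_float = big.astype(mixed_sign)
print(as_float[0] == as_float[1])  # True: two distinct uint64 values now compare equal
```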
I suspect the regression may be related to PR #46693, which landed in the 1.5.0 release.