ENH: implement fast isin() for nullable dtypes #38340
Comments
@jorisvandenbossche Hi, I'm new here and would like to work on this issue. |
@jorisvandenbossche I found function The whole debug path is: But I cannot find this function in algos.pyx or algos.pxd. Could you please give me some advice? Thanks very much. |
We can see Line 4633 in 5cafae7
And if we pass
import pandas as pd
import numpy as np
from pandas.core import algorithms
arr = np.random.randint(0, 10, 1_000_001)
s1 = pd.Series(arr)
s2 = pd.Series(arr, dtype="Int64")
%timeit algorithms.isin(s1._values, [1, 2, 3, 20])
1.87 ms ± 66.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit algorithms.isin(s2._values, [1, 2, 3, 20])
22.7 ms ± 851 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit algorithms.isin(s2._values._data, [1, 2, 3, 20])
1.86 ms ± 39.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So the solution could be:
from pandas.core.arrays.integer import IntegerArray
class Series:
    def isin(self, values) -> "Series":
        if isinstance(self._values, IntegerArray):
            result = algorithms.isin(self._values._data, values)
        else:
            result = algorithms.isin(self._values, values)
        return self._constructor(result, index=self.index).__finalize__(self, method="isin")
Looking forward to your reply. |
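One caveat with passing `_data` directly: a nullable array keeps a placeholder value in `_data` wherever the entry is missing, so a membership test on `_data` alone can report True for a missing slot. The sketch below illustrates this with plain NumPy arrays standing in for `_data` and the mask. The helper name `masked_isin` is hypothetical, and it assumes `values` contains no missing-value marker; this is not necessarily how pandas handles it.

```python
import numpy as np


def masked_isin(data, mask, values):
    """Hypothetical helper: membership test that respects a missing-value mask.

    data   -- underlying integer values (stand-in for `_data`)
    mask   -- boolean array, True where the entry is missing
    values -- candidates to test against (assumed to contain no NA marker)
    """
    result = np.isin(data, values)
    # A masked slot holds an arbitrary placeholder in `data`,
    # so it must never count as a match.
    result[mask] = False
    return result


data = np.array([1, 2, 3, 1])
mask = np.array([False, False, False, True])  # last entry is missing

naive = np.isin(data, [1, 20])          # ignores the mask
safe = masked_isin(data, mask, [1, 20])  # honours the mask

print(naive.tolist())  # [True, False, False, True]
print(safe.tolist())   # [True, False, False, False]
```

The extra masking step is a single boolean assignment, so it should keep most of the speed-up measured above while avoiding false positives on missing entries.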
Here is the link to the PR; please let me know if anything needs to be modified. Thanks |
Currently, you can get quite a slowdown: