BUG: 'Series.to_numpy(dtype=, na_value=)' behaves differently with 'pd.NA' and 'np.nan' #48951
Comments
Is it because The exception is raised in line 535. Is it a problem with numpy? Line 535 in 87cfe4e
Could you post this in #48891?
Not sure this is the same as #48891. I think this can already be considered a bug. The following should work:
I think this is more related to #48864.
This works if you start with a nullable type:
The issue is if you start with Looking into this
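For illustration, the nullable-type case mentioned above might look like the following. This is only a sketch: the 'Int64' dtype and the sample values are assumptions chosen to match the issue description, not the commenter's original snippet.

```python
import pandas as pd

# Assumed example data: a nullable integer Series holding one pd.NA.
ser = pd.Series([1, 2, pd.NA, 4], dtype="Int64")

# With a nullable (masked) dtype, the missing entry is replaced by na_value
# before the data is cast to the requested NumPy dtype.
arr = ser.to_numpy(dtype=int, na_value=0)
print(arr)
```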
Oh yes, I was about to open an issue, too, and name it "to_numpy(): na_value ignored when converting object-type pandas data":
floattypeseries = pd.Series([1, 2, None], dtype='Float64')
objecttypeseries = floattypeseries.astype('object')
floattypeseries.to_numpy(dtype=float, na_value=np.nan)  # → succeeds, 'array([ 1., 2., nan])'
objecttypeseries.to_numpy(dtype=float, na_value=np.nan)  # → fails with 'TypeError: float() argument must be a string or a number, not 'NAType''
Looking for a drop-in replacement as a workaround, my first idea was to use .to_numpy(na_value=np.nan).astype(float) but this fails for integer-type pandas data (not containing any .astype('object').to_numpy(na_value=np.nan).astype(float)
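As a rough sketch, the last chain quoted in the comment above can be wrapped in a small helper. The name `to_float_array` is hypothetical and not part of pandas; the body only strings together the calls already quoted in the comment.

```python
import numpy as np
import pandas as pd


def to_float_array(series: pd.Series) -> np.ndarray:
    """Hypothetical helper: convert a Series to a float64 array, mapping missing values to NaN."""
    # Going through object dtype first means the NA replacement happens on an
    # object array, before any cast to float is attempted.
    as_object = series.astype("object")
    filled = as_object.to_numpy(na_value=np.nan)
    return filled.astype(float)


floattypeseries = pd.Series([1, 2, None], dtype="Float64")
objecttypeseries = floattypeseries.astype("object")
inttypeseries = pd.Series([1, 2, 3])  # plain integer data without missing values

for s in (floattypeseries, objecttypeseries, inttypeseries):
    print(to_float_array(s))
```

The detour through object dtype costs an extra copy, so this is meant as a stopgap rather than a recommended pattern.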
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
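A minimal sketch consistent with the description in this issue, not necessarily the reporter's exact code. The helper name `convert`, the sample values, and the choice of `dtype=int` with `na_value=0` are assumptions. Since the outcome for `pd.NA` may depend on the pandas version, the sketch reports either the converted array or the exception that was raised.

```python
import numpy as np
import pandas as pd


def convert(values):
    """Hypothetical helper: return the converted array, or the exception raised while converting."""
    ser = pd.Series(values)
    try:
        return ser.to_numpy(dtype=int, na_value=0)
    except (TypeError, ValueError) as exc:
        return exc


# One Series per kind of missing-value marker discussed in the issue.
markers = [("None", None), ("np.nan", np.nan), ("pd.NA", pd.NA)]
results = {name: convert([1, 2, missing, 4]) for name, missing in markers}

for name, res in results.items():
    print(name, "->", repr(res))
```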
Issue Description
It appears that a Series that has a missing value that was created using either None or np.nan can be replaced by using Series.to_numpy(dtype=, na_value=), but one created with pd.NA fails with a raised exception (both arguments must be specified to trigger the behavior).
Expected Behavior
Since all three values (None, np.nan, and pd.NA) represent missing values, all three are expected to behave the same. For the above reproducible example, the print statements should all report [1 2 0 4] (or [1. 2. 0. 4.] for the fourth 'float64' case).
Installed Versions