-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Series.argsort() incorrect handling of NaNs #12694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Perhaps what the original implementation was after is
If one does go with this implementation (where the values represent the ordering), perhaps one should rename the method something other than |
@seth-p this is exactly the same as
Not really sure what the issue is about here. If/when we have integer NA then these -1's could be NaN. |
The output of
The output of Put another way, can you explain what the output of The documentation just says "Overrides ndarray.argsort. Argsorts the value, omitting NA/null values, and places the result in the same locations as the non-NA values". My point is that (a) omitting NA/null values the way that it does before calling At the risk of repeating myself a bit, the values produced by
In what way should The presence of I think the only two plausible implementations of
Here |
and I'll ask you back, what is the purpose of doing |
To be honest its almost an internal routine.
|
Of course you don't use the Pandas Seriously:
|
@seth-p 2 you should simply use why go to all this trouble, and if you really really thing you actually need
|
I didn't realize Since But to be clear, the existing To prevent someone else going though the headache I just did in misusing it, I strongly suggest that it (i) simply be deleted; (ii) be redefined to return |
@seth-p I tell you what, why don't you add an example (or 2) to My example is showing idempotency of the tranform (as you showed I'll move this to 0.19.0 (though would take a doc-PR in the meantime). and we can rediscuss then as any changes would not be done until then (as these are all API changes). |
I'll think about what I think should be done for 0.19.0, and perhaps take a stab at it. I can't think of a doc-PR for the existing |
Going through some argsort/sorting issues, and I also stumbled on this. I agree with @seth-p that the current implementation of It is also true that it behaves as documented: "Argsorts the value, omitting NA/null values, and places the result in the same locations as the non-NA values". Small example:
So because the result of I think ideally we should fix this. But, the question is how to do this? |
I too encountered this issue. I wrote code for working with numpy arrays using argsort and then was very surprised to find that it didn't work when I later passed it a Series instead. In particular, I was calling |
@mroeschke df.iloc[df["A"].abs().argsort()] I encountered this issue. Of course nobody uses I noticed that you have a The current implementation is incorrect: Lines 3751 to 3754 in 6169cba
The returned indices of non-nan items are meaningless when it's placed back to the original position. |
It appears that
Series.argsort()
is implemented ass.dropna().argsort().reindex(s.index, fill_value=-1) = np.argsort(s.dropna()).reindex(s.index, fill_value=-1)
.There are two problems with this:
(a) Since the result represents integer indices into the original series
s
, the result should not have the sameindex
ass.index
-- it should either be aSeries
with index[0, 1, ...]
, or more likely simply be a NumPyarray
;(b) The way
NaN
s are effectively removed before callingnp.argsort()
leads to indexes that are no longer appropriate into the originalSeries
, resulting in the nonsensical results shown in[9]
and[10]
below.Note that: "As of NumPy 1.4.0
argsort
works with real/complex arrays containing nan values." So it's not obvious to me that there's much to be gained from mucking around withnp.argsort(s.values)
.The text was updated successfully, but these errors were encountered: