-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Changing way of applying numpy functions on Series #33775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Building documentation returns
which looks like issue #26788 |
Is this going to kill performance for something like the following? s = pd.Series(list(range(100_000))
s.apply(np.sin) |
I think so, but I feel like it's more appropriate use of the |
@@ -3847,9 +3847,6 @@ def f(x): | |||
f = func | |||
|
|||
with np.errstate(all="ignore"): | |||
if isinstance(f, np.ufunc): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would we want to do this?
is going to kill perf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue #33492 describes that. I also added test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I said I am not inclided to accept this w/o proof that this does not kill perf. We likely don't have asv's for this (or even using a np.ufunc in apply).
Now if you add those and either perf is not degraded, or you actually catch the TypeError then I would be ok.
@@ -3847,9 +3847,6 @@ def f(x): | |||
f = func | |||
|
|||
with np.errstate(all="ignore"): | |||
if isinstance(f, np.ufunc): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I said I am not inclided to accept this w/o proof that this does not kill perf. We likely don't have asv's for this (or even using a np.ufunc in apply).
Now if you add those and either perf is not degraded, or you actually catch the TypeError then I would be ok.
I felt like numpy functions could be called differently e.g. np.sin(series). |
If we're going to change this then we'll need to deprecate the current behavior first. I haven't looked closely at the original issue, but at a glance it looks like it's dealing with nested data, which isn't especially well-supported in pandas. It's not clear to me that the gain from changing the implementation here is worth the effect to existing users. |
Maybe better solution would be adding new parameter e.g if parameter force_mapping is set to True, function will be applied on each element separately. |
Hello @mproszewska! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-05-10 21:18:01 UTC |
I don't like just adding the If we want to do this, I think the solution is to actually deprecate the current behavior of ufuncs in |
I agree with @TomAugspurger here, would be ok with showing a |
What do you mean by showing |
we are -1 on adding any options to this function. you cannot simply change the way something like this works w/o showing the perf benefit (or drag in this case). I suspect this is likely going to be bad. So the only way this could possibly go forward is to detect cases where this is likey to be used (e.g. using a ufunc on an object dtype), then dispatch that way. You might need to then show a |
@jreback do you have a preference for The downside is that we can't fix the bug immediately. But IMO not slowing down this code without warning is more important than fixing the bug that (I think) only shows up for nested data. |
I don’t think we should change this at all including deprecation |
closing this as I don't see positions changing here. If you want to raise an issue and get consensus on a change okie dokie. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff