DOC: groupby aligns Series when passed as groupands #15244
Comments
in your original function, you are passing |
I can't seem to reproduce it on my machine. However, you are passing a grouper with a different shape. I actually didn't know that pandas is aligning the grouper. You can also pass the string: |
I think @jorisvandenbossche has it right.
If you don't align then this will be OK, I think (IOW, if you pass in the group-and from the same frame). Certainly no guarantees :< |
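To make the alignment behaviour discussed here concrete, the following is a minimal sketch on made-up data (the frame contents are an assumption; only the `id` and `amount` column names come from this thread). It passes a Series taken from a filtered copy of the frame as the grouper, so the key's index is shorter than the frame's index, and compares the result with filtering first and grouping by column name.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the snippet in this thread.
df = pd.DataFrame(
    {"id": ["a", "b", "a", "b", "a"], "amount": [5, 25, 10, 15, 30]}
)

# The key comes from a filtered copy, so its index (0, 2, 3) is shorter than df's.
key = df[df.amount < 20].id

# groupby aligns the key to df's index; rows 1 and 4 get a missing key
# and are dropped from the result.
implicit = df.groupby(key).amount.count()

# Filtering first and grouping by column name gives the same counts.
explicit = df[df.amount < 20].groupby("id").amount.count()

print(implicit.to_dict())
print(explicit.to_dict())
```

Both calls count only the rows that survive the filter, which is why the unaligned form can go unnoticed in single-threaded code.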
Yeah, I think this is why this didn't show up until now - I didn't know pandas supported unaligned indices passed to groupby. It showed up in a user bug, dask/dask#1876. I can fix this in dask by manually aligning beforehand, but the thread-safety issue still stands. Out of curiosity, why does this fail only on the first call? Is some index structure being built up and then cached on later calls? |
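For reference, one way to align by hand before calling groupby is sketched below. This is an assumption about what "manually aligning" could look like, using `Series.reindex` on made-up data; it is not necessarily what dask does, and nothing in this thread establishes whether it avoids the threading problem.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the snippet in this thread.
df = pd.DataFrame(
    {"id": ["a", "b", "a", "b", "a"], "amount": [5, 25, 10, 15, 30]}
)
filtered = df[df.amount < 20]

# Align the key to the frame's index explicitly. Rows that were filtered
# out receive NaN as their key.
aligned_key = filtered.id.reindex(df.index)

# The key now has the same index as the frame, so groupby has nothing to align.
result = df.groupby(aligned_key).amount.count()

print(result.to_dict())
```

Rows with a NaN key are excluded by groupby's default handling of missing keys, so the counts match the implicit, unaligned call.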
The passed series ( Line 2580 in ba05744
|
Hmmm, well something is getting modified, as it only fails the first time. |
It shouldn't ever modify the original object; only the groupby object itself has state, but that could be the problem. IOW, this is a cached_property, which I suppose could be interrupted, and if the groupby object is shared... |
This fails in the code above though, where only the original frame is shared (neither the groupby or the filtered frame is shared). But |
This seems to fail only if the index is longer than the grouped frame. Swapping the filter onto the index passes every time (not that this is recommended):

    def f(x):
        return x.groupby(x[x.amount < 20].id).amount.count()

|
xref #15338
Calling groupby with an unaligned index on the same frame in multiple threads can result in incorrect results. The following example demonstrates:
On my machine, running `python test.py` results in:

A few notes:

- ... `build_frame`). To me this indicates that the groupby call sets up some state (that's cached) that may not be thread-safe.
- Also, the second call to `pool.map` always returns correct results; only the first calls fail.

Tested with pandas master, as well as pandas 0.19.0 and up.