BUG: GH10355 groupby std() doesn't sqrt grouping cols #11507

Closed
wants to merge 2 commits into from


henrystokeley

New attempt at #10355. Hopefully this addresses the issues raised in #11300.

Previously, grouping columns were square-rooted when as_index=False.
We now test whether the grouping keys are in the columns and,
if so, don't take the square root of those columns.

Note that we suppress the TypeError that occurs when self.keys is not
hashable, in which case we can't check for existence in the columns.
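
The behaviour described above can be sketched in user code, outside pandas internals. This is only an illustration using the public API, not the patch itself: `std_excluding_keys` is a hypothetical helper name, and it assumes the grouping keys are column labels (a single string or a list of them).

```python
import numpy as np
import pandas as pd


def std_excluding_keys(frame, keys, ddof=1):
    """Group-wise std with as_index=False, leaving the key columns untouched.

    Hypothetical helper for illustration; assumes `keys` names columns of `frame`.
    """
    # Normalise a single label to a list so membership tests work uniformly.
    keys = [keys] if isinstance(keys, str) else list(keys)
    result = frame.groupby(keys, as_index=False).var(ddof=ddof)
    # Take the square root of the value columns only, never the grouping keys.
    value_cols = [col for col in result.columns if col not in keys]
    result[value_cols] = np.sqrt(result[value_cols])
    return result


df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})
result = std_excluding_keys(df, "a")
```

With this data the grouping column `a` should come back as 1, 2, 3 rather than their square roots, while `b` holds the per-group standard deviation.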

return np.sqrt(self.var(ddof=ddof))
else:
df = self.var(ddof=ddof)

Contributor

you don't need all of this logic, just use self._selected_obj

@jreback
Contributor

jreback commented Nov 4, 2015

pls add a whatsnew note in 0.17.1 bug-fixes

@henrystokeley
Author

@jreback

The logic in my if test finds the columns used to form the groups, so that their values are not square-rooted.
How can I use self._selected_obj to discover which columns are being used in the grouping?

@jreback
Contributor

jreback commented Nov 7, 2015

In [27]: df = pandas.DataFrame({
               'a' : [1,1,1,2,2,2,3,3,3],
               'b' : [1,2,3,4,5,6,7,8,9],
})

In [21]: g = df.groupby('a',as_index=False)

In [22]: g._set_selection_from_grouper()

In [23]: g._selected_obj
Out[23]: 
   a  b
0  1  1
1  1  2
2  1  3
3  2  4
4  2  5
5  2  6
6  3  7
7  3  8
8  3  9

In [24]: g = df.groupby('a',as_index=True)

In [25]: g._set_selection_from_grouper()

In [26]: g._selected_obj
Out[26]: 
   b
0  1
1  2
2  3
3  4
4  5
5  6
6  7
7  8
8  9

_set_selection_from_grouper() is called when a groupby function is actually run (e.g. when you call .std()), so you can then use ._selected_obj to get the actual data (excluding the groupings).

@henrystokeley
Author

@jreback

Thanks for your quick reply.

As you can see in your output at Out[23], _selected_obj doesn't exclude the grouping column when as_index=False.
It only excludes the grouping column when as_index=True, and that isn't the case we're dealing with.

Sorry if I'm missing something.
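
The difference between the two as_index settings can also be seen from the public API alone. This is a small sketch that avoids the private attributes used in the session above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# as_index=False: the grouping key comes back as an ordinary column,
# so any element-wise operation on the result would also touch it.
var_flat = df.groupby("a", as_index=False).var()

# as_index=True: the grouping key becomes the index, so only the
# value columns remain in the result.
var_indexed = df.groupby("a", as_index=True).var()

flat_columns = list(var_flat.columns)
indexed_columns = list(var_indexed.columns)
index_name = var_indexed.index.name
```

Only the as_index=False result carries `a` among its columns, which is why the square-root problem is specific to that case.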

@jreback
Contributor

jreback commented Nov 7, 2015

Use this, though you may need something more sophisticated. You have to test grouping by levels as well. Everything is there in the grouper objects; you just have to look for it. Don't reinvent the wheel.

In [36]: g.grouper.names 
Out[36]: ['a']
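
One way to read the "levels" caveat: a name reported by the grouper may refer to an index level rather than a column. The sketch below assumes a list of names like the one printed above and filters it down to real columns. `grouping_columns` is a hypothetical helper, not a pandas function, and it does not touch the grouper object itself.

```python
import pandas as pd


def grouping_columns(frame, names):
    """Return the entries of `names` that are actual columns of `frame`.

    Hypothetical helper: names that refer to index levels (or are None)
    are skipped, since there is no column to protect in those cases.
    """
    return [name for name in names if name is not None and name in frame.columns]


df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# 'a' is a regular column here, so it is reported as a grouping column.
by_column = grouping_columns(df, ["a"])

# After set_index, 'a' is an index level and no longer a column.
by_level = grouping_columns(df.set_index("a"), ["a"])
```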

@jreback
Contributor

jreback commented Nov 25, 2015

pls rebase / update according to comments

@jreback
Contributor

jreback commented Dec 6, 2015

can you rebase / update according to comments

@jreback
Contributor

jreback commented Dec 9, 2015

can you update?

@jreback
Contributor

jreback commented Dec 16, 2015

can you update

@jreback
Contributor

jreback commented Jan 6, 2016

closing. but pls reopen if you'd like to update

@xieyuheng

#25315
