Skip to content

Aggregate with pd.Series.nunique in datetime column has weird result #15112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jesrael opened this issue Jan 11, 2017 · 1 comment
Closed

Aggregate with pd.Series.nunique in datetime column has weird result #15112

jesrael opened this issue Jan 11, 2017 · 1 comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request Groupby

Comments

@jesrael
Copy link

jesrael commented Jan 11, 2017

I have DataFrame:

import pandas as pd
import datetime as datetime

a = pd.DataFrame({'id': [1, 2, 3, 2], 
              'my_date': [datetime.datetime(2017, 1, i) for i in range(1, 4)] + [datetime.datetime(2017, 1, 1)],
              'num': [2, 3, 1, 4]
        })

print (a.dtypes)
id                  int64
my_date    datetime64[ns]
num                 int64
dtype: object

I try aggreagate by pd.Series.nunique, but get wrong output for datetimes:

grouped_a = a.groupby('id').agg({'my_date': pd.Series.nunique, 
                                     'num': pd.Series.nunique}).reset_index()
grouped_a.columns = ['id', 'num_unique_num', 'num_unique_my_date']

print (grouped_a)
   id  num_unique_num            num_unique_my_date
0   1               1 1970-01-01 00:00:00.000000001
1   2               2 1970-01-01 00:00:00.000000002
2   3               1 1970-01-01 00:00:00.000000001

My solution is use nunique which works nice.

grouped_a = a.groupby('id').agg({'my_date': 'nunique', 
                                     'num': 'nunique'}).reset_index()
grouped_a.columns = ['id', 'num_unique_num', 'num_unique_my_date']
print (grouped_a)
   id  num_unique_num  num_unique_my_date
0   1               1                   1
1   2               2                   2
2   3               1                   1

But why does not work first solution? Bug?

SO question.

print (pd.show_versions())
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: sk_SK
LOCALE: None.None

pandas: 0.19.2+0.g825876c.dirty
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.1
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: 0.2.1
None
@jesrael jesrael changed the title Aggreagate by pd.Series.nunique in datetime column has weird result Aggregate by pd.Series.nunique in datetime column has weird result Jan 11, 2017
@jesrael jesrael changed the title Aggregate by pd.Series.nunique in datetime column has weird result Aggregate with pd.Series.nunique in datetime column has weird result Jan 11, 2017
@jreback
Copy link
Contributor

jreback commented Jan 11, 2017

duplicate of #14423

the fix actually is pretty simple, pls see the other issue.

@jreback jreback closed this as completed Jan 11, 2017
@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request Groupby labels Jan 11, 2017
@jreback jreback added this to the No action milestone Jan 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Duplicate Report Duplicate issue or pull request Groupby
Projects
None yet
Development

No branches or pull requests

2 participants