PerformanceWarning on _drop_axis() #19799

Closed
celsopneto opened this issue Feb 20, 2018 · 6 comments
Labels
Needs Info Clarification about behavior needed to assess issue

@celsopneto

celsopneto commented Feb 20, 2018

Problem description

I have a script that reads three CSV files and runs two groupbys and two merges on them. The files are uploaded here, and the code is as follows:

import pandas as pd
import numpy as np

# Only needed if you want to see the traceback: this turns the warning into an error
import warnings
warnings.filterwarnings('error', category=pd.errors.PerformanceWarning)

df_weather = pd.read_csv(r'weather.csv', sep=';')
df_total = pd.read_csv(r'df_total.csv', sep=';',parse_dates=['DATA_EVENTO'])
df_agenda = pd.read_csv(r'agenda.csv',parse_dates=['Data'])

df_by_date = df_total.groupby('DATA_EVENTO')\
             .agg({'VALOR': [np.size, np.sum]}).reset_index()

event = df_by_date.merge(df_agenda,
                          left_on='DATA_EVENTO',
                          right_on='Data')


group_weather = df_weather.groupby('Data')\
                .agg({'Precipitacao': 'max',
                      'TempMaxima': ['max', 'mean'],
                      'TempMinima': ['min', 'mean'],
                      'Insolacao': ['max', 'mean']}).reset_index()
    

we_and_arts = event.merge(group_weather, left_on='Data',
                           right_on='Data',suffixes=('a', 'w'))

For both DataFrames, index.nlevels returns 1:

event.index.nlevels
Out[16]: 1

group_weather.index.nlevels
Out[17]: 1

When filterwarnings is set to raise PerformanceWarning as an error (as explained in #3622), this is the traceback:

  File "<ipython-input-3-bdf57e64aed5>", line 1, in <module>
    runfile('C:/Users/Daniel/Documents/wctba/yvy_issue.py', wdir='C:/Users/Daniel/Documents/wctba')

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Daniel/Documents/wctba/yvy_issue.py", line 34, in <module>
    right_on='Data',suffixes=('a', 'w'))

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\frame.py", line 5370, in merge
    copy=copy, indicator=indicator, validate=validate)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 57, in merge
    validate=validate)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 565, in __init__
    self.join_names) = self._get_merge_keys()

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 876, in _get_merge_keys
    self.right = self.right.drop(right_drop, axis=1)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2530, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2562, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py", line 1568, in drop
    stacklevel=3)

PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.
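
For readers who want the same kind of traceback without changing the filter for the whole session, here is a minimal sketch that scopes the filter to one block with `warnings.catch_warnings()`. The function `risky` is a hypothetical stand-in that emits the warning by hand. It is not the merge from this issue.

```python
import warnings

import pandas as pd


def risky():
    # Hypothetical stand-in for a pandas call that emits a PerformanceWarning.
    warnings.warn("demo", pd.errors.PerformanceWarning)


with warnings.catch_warnings():
    # Inside this block the warning is raised as an exception.
    warnings.simplefilter("error", category=pd.errors.PerformanceWarning)
    try:
        risky()
        raised = False
    except pd.errors.PerformanceWarning as exc:
        raised = True
        message = str(exc)

print(raised, message)
```

The previous filter state is restored when the `with` block exits, so the rest of the script keeps the default behavior.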

I don't understand why it says it's a multi-index, and I don't know how I could set the level parameter.
If this is normal behavior, why is that? And how can I work around it?

Sorry in advance for not following the guidelines on new issues thoroughly.

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: pt_BR
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

Can you make a reproducible example? agenda isn't defined.

TomAugspurger added the Needs Info label Feb 20, 2018
@celsopneto
Author

celsopneto commented Feb 21, 2018

I've anonymized the files to remove sensitive information, so I think it's reproducible now.
If there is anything more I can do to help just let me know.

@TomAugspurger
Contributor

TomAugspurger commented Feb 21, 2018 via email

@celsopneto
Author

I've tried to write a small sample, both manually and by reading data from disk (using 1 month of data), but could not reproduce the warning.

@TomAugspurger
Contributor

TomAugspurger commented Feb 21, 2018 via email

@celsopneto
Author

I poked around as you suggested, and it turns out that group_weather had multilevel columns:

group_weather.columns
Out[51]: 
MultiIndex(levels=[['Precipitacao', 'TempMaxima', 'TempMinima', 'Insolacao', 'Temp Comp Media', 'Data'], ['max', 'mean', 'min', '']],
           labels=[[5, 0, 1, 1, 2, 2, 3, 3, 4, 4], [3, 0, 0, 1, 2, 1, 0, 1, 0, 1]])
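
The earlier `index.nlevels` check only describes the row index. The column index has its own level count. A minimal sketch with made-up data (not the files from this issue) shows how a groupby with a list of aggregations can leave a single-level row index alongside two-level columns:

```python
import pandas as pd

# Made-up sample data, only for illustration.
weather = pd.DataFrame({
    "Data": ["2018-01-01", "2018-01-01", "2018-01-02"],
    "TempMaxima": [30.0, 32.0, 28.0],
})

# Passing a list of functions per column yields two-level column labels.
grouped = weather.groupby("Data").agg({"TempMaxima": ["max", "mean"]}).reset_index()

index_levels = grouped.index.nlevels      # levels of the row index
column_levels = grouped.columns.nlevels   # levels of the column index

print(index_levels, column_levels)
print(list(grouped.columns))
```

Here `reset_index()` moves `Data` back into the columns with an empty string as its second-level label, which matches the `''` entry in the levels shown above.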

After "flattening" the df, the warning went away.
What is interesting, though, is that my manually created example had the same MultiIndex levels but didn't raise the warning.
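
The thread does not show how the flattening was done. One possible approach, sketched here with made-up data and hypothetical names (`flat`, `events`), is to join each column tuple into a single string before merging:

```python
import pandas as pd

# Made-up sample data, only for illustration.
weather = pd.DataFrame({
    "Data": ["2018-01-01", "2018-01-01", "2018-01-02"],
    "TempMaxima": [30.0, 32.0, 28.0],
})
events = pd.DataFrame({"Data": ["2018-01-01", "2018-01-02"], "VALOR": [10, 20]})

grouped = weather.groupby("Data").agg({"TempMaxima": ["max", "mean"]}).reset_index()

# Join the non-empty parts of each column tuple, e.g. ('TempMaxima', 'max')
# becomes 'TempMaxima_max' and ('Data', '') becomes 'Data'.
flat = grouped.copy()
flat.columns = ["_".join(part for part in col if part) for col in flat.columns]

# Both frames now have single-level columns, so the merge key is a plain label.
merged = events.merge(flat, on="Data")
print(list(merged.columns))
```

With single-level columns on both sides, the merge no longer has to drop labels from a MultiIndex, which is the code path that emitted the warning in the traceback.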

In the end, I don't think it's a bug, thanks for the support.
