PerformanceWarning on _drop_axis() #19799

celsopneto · 2018-02-20T21:09:38Z

Problem description

I have a script that reads three CSV files, then performs two groupbys and two merges. The files are uploaded here, and the code is as follows:

import pandas as pd
import numpy as np

# Only needed if you want to see the traceback:
# this turns pandas PerformanceWarnings into exceptions.
import warnings
warnings.filterwarnings('error', category=pd.errors.PerformanceWarning)

df_weather = pd.read_csv(r'weather.csv', sep=';')
df_total = pd.read_csv(r'df_total.csv', sep=';',parse_dates=['DATA_EVENTO'])
df_agenda = pd.read_csv(r'agenda.csv',parse_dates=['Data'])

df_by_date = df_total.groupby('DATA_EVENTO')\
             .agg({'VALOR': [np.size, np.sum]}).reset_index()

event = df_by_date.merge(df_agenda,
                          left_on='DATA_EVENTO',
                          right_on='Data')


group_weather = df_weather.groupby('Data')\
                .agg({'Precipitacao': 'max',
                      'TempMaxima': ['max', 'mean'],
                      'TempMinima': ['min', 'mean'],
                      'Insolacao': ['max', 'mean']}).reset_index()
    

we_and_arts = event.merge(group_weather, left_on='Data',
                           right_on='Data',suffixes=('a', 'w'))

Both DataFrames report an index.nlevels of 1:

event.index.nlevels
Out[16]: 1

group_weather.index.nlevels
Out[17]: 1

When filterwarnings is set to raise on PerformanceWarning (as explained in #3622), this is the traceback:

  File "<ipython-input-3-bdf57e64aed5>", line 1, in <module>
    runfile('C:/Users/Daniel/Documents/wctba/yvy_issue.py', wdir='C:/Users/Daniel/Documents/wctba')

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/Daniel/Documents/wctba/yvy_issue.py", line 34, in <module>
    right_on='Data',suffixes=('a', 'w'))

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\frame.py", line 5370, in merge
    copy=copy, indicator=indicator, validate=validate)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 57, in merge
    validate=validate)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 565, in __init__
    self.join_names) = self._get_merge_keys()

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py", line 876, in _get_merge_keys
    self.right = self.right.drop(right_drop, axis=1)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2530, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2562, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)

  File "C:\Users\Daniel\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py", line 1568, in drop
    stacklevel=3)

PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.

I don't understand why it says it's a multi-index, and I don't know how I could set the level parameter.
If it's normal behavior, why so? And how can I work around that?
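The traceback above passes through `self.right.drop(right_drop, axis=1)`, so the MultiIndex being dropped from is on the columns, not the row index. What follows is a minimal sketch of checking whether a column MultiIndex is in sorted order, sorting it, and passing `level` to `drop`. The frame and its values are hypothetical, and sortedness is only one condition mentioned in the warning text, so this is not a confirmed diagnosis of the script above.

```python
import pandas as pd

# Hypothetical two-level columns, deliberately out of sorted order.
columns = pd.MultiIndex.from_tuples(
    [
        ("TempMaxima", "max"),
        ("Data", ""),
        ("Precipitacao", "max"),
        ("TempMaxima", "mean"),
    ]
)
df = pd.DataFrame([[30.0, "2018-01-01", 1.5, 29.0]], columns=columns)

# False here, because the column tuples are not in lexicographic order.
unsorted_flag = df.columns.is_monotonic_increasing

# Sorting along axis=1 reorders the columns lexicographically.
df_sorted = df.sort_index(axis=1)
sorted_flag = df_sorted.columns.is_monotonic_increasing

# level=0 makes drop() match the label against the first column level only,
# so every column under "TempMaxima" is removed.
without_temp = df_sorted.drop("TempMaxima", axis=1, level=0)
print(list(without_temp.columns))
```

Note that `level` can only be supplied when calling `drop` directly; the `drop` call inside `merge` is internal, so sorting or flattening the columns beforehand is the part a caller controls.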

Sorry in advance for not following the guidelines on new issues thoroughly.

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: pt_BR
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


TomAugspurger · 2018-02-20T22:37:22Z

Can you make a reproducible example? agenda isn't defined.

celsopneto · 2018-02-21T14:48:43Z

I've anonymized the files to remove sensitive information, so I think it's reproducible now.
If there is anything more I can do to help just let me know.

TomAugspurger · 2018-02-21T14:54:39Z

Does your example require reading data from disk, or can a small example be constructed manually? If this is a bug, then we need to write a unit test for it, and the unit test should be manually constructed and not read from disk.

On Wed, Feb 21, 2018 at 8:48 AM, Celso Pereira Neto wrote: dfs.zip <https://github.com/pandas-dev/pandas/files/1744390/dfs.zip>

celsopneto · 2018-02-21T14:58:13Z

I've tried to write a small sample both manually and reading data from disk (using 1 month of data) but could not reproduce the warning.

TomAugspurger · 2018-02-21T15:00:21Z

OK, let us know if you have any luck. Since you're raising on warnings, you should be able to start a debugger after the exception and poke around.
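A minimal sketch of that workflow, assuming the warning is turned into an exception only inside a `catch_warnings` block. `merge_step` is a hypothetical stand-in that just emits the same warning category with the message from the traceback, so the sketch behaves deterministically. The two frames are invented as well.

```python
import warnings

import pandas as pd


def merge_step(left, right):
    # Hypothetical stand-in for the real merge call: it only emits a
    # PerformanceWarning so the example does not depend on pandas internals.
    warnings.warn(
        "dropping on a non-lexsorted multi-index without a level "
        "parameter may impact performance.",
        pd.errors.PerformanceWarning,
    )
    return left


left = pd.DataFrame({"Data": [1, 2]})
right = pd.DataFrame(
    [[1, 2.0]],
    columns=pd.MultiIndex.from_tuples([("Data", ""), ("TempMaxima", "max")]),
)

report = {}
with warnings.catch_warnings():
    # Inside this block a PerformanceWarning is raised as an exception.
    warnings.simplefilter("error", category=pd.errors.PerformanceWarning)
    try:
        merge_step(left, right)
    except pd.errors.PerformanceWarning as exc:
        # Inspect the inputs at the point of failure. Interactively,
        # pdb.post_mortem(exc.__traceback__) or IPython's %debug could be
        # used here instead of collecting values by hand.
        report = {
            "message": str(exc),
            "left_column_levels": left.columns.nlevels,
            "right_column_levels": right.columns.nlevels,
        }

print(report)
```

Scoping the filter with `catch_warnings` keeps the rest of the session from raising on unrelated warnings.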


celsopneto · 2018-02-21T19:58:23Z

I poked around as you suggested, and it turns out that group_weather has multi-level columns:

group_weather.columns
Out[51]: 
MultiIndex(levels=[['Precipitacao', 'TempMaxima', 'TempMinima', 'Insolacao', 'Temp Comp Media', 'Data'], ['max', 'mean', 'min', '']],
           labels=[[5, 0, 1, 1, 2, 2, 3, 3, 4, 4], [3, 0, 0, 1, 2, 1, 0, 1, 0, 1]])

After "flattening" the DataFrame's columns, the warning went away.
What is interesting, though, is that my manually created example had the same MultiIndex levels, but it didn't raise the warning.
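The thread does not show how the columns were flattened. One common approach, sketched here under that assumption, joins the two column levels into a single string per column. The small `df_weather` frame is a hypothetical stand-in for the real CSV.

```python
import pandas as pd

# Hypothetical stand-in for the weather data; the real CSV is not shown here.
df_weather = pd.DataFrame(
    {
        "Data": ["2018-01-01", "2018-01-01", "2018-01-02"],
        "Precipitacao": [0.0, 1.5, 3.0],
        "TempMaxima": [30.0, 28.0, 25.0],
    }
)

# Passing a list of functions for any column makes agg() return
# two-level (MultiIndex) columns.
group_weather = (
    df_weather.groupby("Data")
    .agg({"Precipitacao": "max", "TempMaxima": ["max", "mean"]})
    .reset_index()
)

# Join both levels with "_" and strip the trailing "_" left behind when the
# second level is empty (as it is for the "Data" key column).
flat = group_weather.copy()
flat.columns = ["_".join(col).rstrip("_") for col in flat.columns]

print(group_weather.columns.nlevels)
print(list(flat.columns))
```

In pandas releases newer than the 0.22.0 used in this report, named aggregation (`agg(new_name=("column", "func"))`) produces single-level columns directly, which may make the flattening step unnecessary.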

In the end, I don't think it's a bug, thanks for the support.

TomAugspurger added the "Needs Info" label (Clarification about behavior needed to assess issue) on Feb 20, 2018

celsopneto closed this as completed Feb 21, 2018
