Maybe wrong default axis with operators (add, sub, mul, div) between datetime-indexed df and series 1.0.0 #31487

Closed
giuliobeseghi opened this issue Jan 31, 2020 · 3 comments · Fixed by #36797
Labels
Docs Numeric Operations Arithmetic, Comparison, and Logical operations

@giuliobeseghi

Code Sample, a copy-pastable example if possible

import pandas as pd

index = pd.date_range(start='2020', periods=5)
df = pd.DataFrame([[1, 2, 3]] * 5, columns=['a', 'b', 'c'], index=index)
series = pd.Series([10, 20, 30, 40, 50], index=index)

print(df + series)
            2020-01-01 00:00:00  2020-01-02 00:00:00  2020-01-03 00:00:00  2020-01-04 00:00:00  2020-01-05 00:00:00    a    b    c
2020-01-01                  NaN                  NaN                  NaN                  NaN                  NaN  NaN  NaN  NaN
2020-01-02                  NaN                  NaN                  NaN                  NaN                  NaN  NaN  NaN  NaN
2020-01-03                  NaN                  NaN                  NaN                  NaN                  NaN  NaN  NaN  NaN
2020-01-04                  NaN                  NaN                  NaN                  NaN                  NaN  NaN  NaN  NaN
2020-01-05                  NaN                  NaN                  NaN                  NaN                  NaN  NaN  NaN  NaN

Problem description

According to the docs (https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#data-alignment-and-arithmetic):

When doing an operation between DataFrame and Series, the default behavior is to align the Series index on the DataFrame columns, thus broadcasting row-wise

In the special case of working with time series data, if the DataFrame index contains dates, the broadcasting will be column-wise

It seems to me that in both cases now the broadcasting is row-wise.

Is this an expected change for pandas 1.0.0 (I hope not - I never saw any FutureWarnings about it)? If so, the docs (and the examples) must be updated.

The same happens for the operators -, /, *, %
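
A minimal sketch of the reported behaviour, rebuilding the `df` and `series` from the code sample. With the plain operators the Series index is matched against the DataFrame columns. The dates and the labels `a`, `b`, `c` have nothing in common, so the result carries the union of both as columns and every cell is NaN. The `ops` dictionary is only an illustrative helper, and pandas may emit a RuntimeWarning about sort order when it combines dates and strings into one column index.

```python
import operator

import pandas as pd

index = pd.date_range(start="2020", periods=5)
df = pd.DataFrame([[1, 2, 3]] * 5, columns=["a", "b", "c"], index=index)
series = pd.Series([10, 20, 30, 40, 50], index=index)

# Illustrative helper: the operators mentioned in the report.
ops = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "%": operator.mod,
}

shapes = {}
all_nan = {}
for symbol, op in ops.items():
    result = op(df, series)
    # Columns are the union of the 5 dates and the 3 original labels.
    shapes[symbol] = result.shape
    # No label matched, so nothing could be computed.
    all_nan[symbol] = bool(result.isna().all().all())

print(shapes)
print(all_nan)
```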

Expected Output

Not sure if this is the expected output anymore, but it used to be equivalent to:

df.add(series, axis=0)
             a   b   c
2020-01-01  11  12  13
2020-01-02  21  22  23
2020-01-03  31  32  33
2020-01-04  41  42  43
2020-01-05  51  52  53

Although I can't replicate it, I'm pretty sure this was the behaviour until pandas 0.25.3.
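
For reference, a sketch of ways to request the column-wise result explicitly instead of relying on a default. It rebuilds the same `df` and `series`; the variable names are illustrative only. The flexible methods (`add`, `sub`, `mul`, and so on) take an `axis` argument, and transposing around the operator is an alternative.

```python
import pandas as pd

index = pd.date_range(start="2020", periods=5)
df = pd.DataFrame([[1, 2, 3]] * 5, columns=["a", "b", "c"], index=index)
series = pd.Series([10, 20, 30, 40, 50], index=index)

# Align the Series index with the DataFrame index (axis=0, also spelled "index").
by_index = df.add(series, axis=0)
print(by_index)

# The other flexible arithmetic methods accept the same argument.
difference = df.sub(series, axis=0)
product = df.mul(series, axis="index")

# Alternative: transpose so the dates become columns, apply the operator,
# then transpose back.
via_transpose = (df.T + series).T
print(via_transpose)
```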

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200127
Cython : 0.29.14
pytest : 5.3.4
hypothesis : 4.54.2
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.4
pyxlsb : None
s3fs : 0.4.0
scipy : 1.3.2
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.47.0

@jorisvandenbossche
Member

That documentation may have been out of date for a long time. I went back as far as pandas 0.18 with your example above, and it still gives the same result as we have now on 1.0.

Could you try to find a reproducible example and show the result you get? (If you still have an environment with an older version of pandas, you could use that; otherwise you could try to recreate one.)

@jbrockmendel
Member

I think I started refactoring the arithmetic code about 2 years ago, and I don't remember the described behavior existing at the time.

The described behavior does seem analogous to the slicing special-casing; xref #31476.

@giuliobeseghi
Author

giuliobeseghi commented Feb 1, 2020

I couldn't find an example; I'll post one if I run into the error again at some point :( You can close the issue in the meantime if you want. Thanks for letting me know about the documentation (it probably needs updating then?).

By the way, what is the rationale for aligning a Series to the columns of a DataFrame in arithmetic operations? Is it to replicate the behavior of numpy arrays?
I guess that aligning index to index would be more intuitive.
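
The thread does not answer this question, so the following is only a sketch of the NumPy analogy it raises, not a statement of the maintainers' rationale. NumPy matches a 1-D array against the last axis of a 2-D array, which resembles the pandas default of matching a Series against the columns. All names and values below are illustrative.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2, 3]] * 5)        # shape (5, 3)
per_column = np.array([100, 200, 300])    # shape (3,), one value per column
per_row = np.array([10, 20, 30, 40, 50])  # shape (5,), one value per row

# NumPy matches the 1-D array against the last axis, so per_column is
# added to every row.
numpy_default = values + per_column

# Pairing with the rows needs an explicit reshape to (5, 1).
numpy_by_row = values + per_row[:, None]

df = pd.DataFrame(values, columns=["a", "b", "c"])

# pandas default: the Series index is matched against the columns.
pandas_default = df + pd.Series(per_column, index=["a", "b", "c"])

# pandas with axis=0: the Series index is matched against the row index.
pandas_by_index = df.add(pd.Series(per_row), axis=0)

print(pandas_default)
print(pandas_by_index)
```

One difference from NumPy is that pandas matches by label, not by position, which is why mismatched labels produce NaN where NumPy would raise a shape error.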

@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Feb 25, 2020
jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Oct 1, 2020
@jreback jreback added this to the 1.2 milestone Oct 2, 2020
jreback pushed a commit that referenced this issue Oct 2, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this issue Nov 2, 2020