PERF: concat along axis 1 unnecessarily materializes RangeIndex->Int64Index #47501
Comments
Can you show how much this actually matters at scale, e.g. with 100 columns and 100k rows?
This PR fixes the two main issues that crop up in pandas 1.4.3 relative to 1.4.2, both around `pd.concat`:
- Columns are now sorted such that integer values come before string values. That is a behavior change that we mimic.
- When multiple objects with identical RangeIndexes are concatenated along axis 1 and sorting is requested, pandas now creates an integer index instead of a RangeIndex. This is not what we want since it increases memory pressure, so those tests have been modified to stop checking the index type and a [pandas issue has been raised](pandas-dev/pandas#47501).

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Nghia Truong (https://github.com/ttnghia)

URL: #11152
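To put the memory-pressure point above in concrete terms, here is a rough sketch comparing the footprint of a lazy `RangeIndex` with the same values materialized as a 64-bit integer index. The row count of 100,000 is an illustrative assumption, and the exact byte counts depend on the pandas and Python versions, so the script prints them rather than asserting specific numbers.

```python
import numpy as np
import pandas as pd

N_ROWS = 100_000  # illustrative size, not taken from the issue

# A RangeIndex stores only start/stop/step, so its footprint is constant.
range_index = pd.RangeIndex(N_ROWS)

# Building an Index from an int64 array stores every label explicitly
# (Int64Index in pandas 1.x, Index with dtype int64 in later versions).
materialized_index = pd.Index(np.arange(N_ROWS, dtype="int64"))

range_bytes = range_index.memory_usage()
materialized_bytes = materialized_index.memory_usage()

print(f"RangeIndex:         {range_bytes} bytes")
print(f"materialized index: {materialized_bytes} bytes")
```

The materialized index grows linearly with the number of rows (8 bytes per label), whereas the `RangeIndex` stays the same size regardless of length.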
first bad commit: [75a799c] Backport PR #47206 on branch 1.4.x (REGR: concat not sorting columns for mixed column names) (#47251)
@phofl it appears that #47508 does not fix #46675, are they not related?
No, I don't think so. Will take a look there too.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
This is as of pandas 1.4.3
Even though both inputs have identical `RangeIndex` indexes, the output index is an `Int64Index`. This behavior appears to be triggered by the `sort=True` parameter, since removing it gives a `RangeIndex`.
My naive guess is that there is a missing check somewhere that the sort is a no-op on a `RangeIndex`.
This issue definitely seems related to #46675, but it is not identical since it has specifically appeared in 1.4.3, whereas that issue was already present in 1.4.2.
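The behavior described above can be checked with a minimal sketch along the following lines. The frames `df1` and `df2` and their contents are hypothetical stand-ins, not the original reproducer from this issue. The printed index types depend on the installed pandas version, so no expected output is shown.

```python
import pandas as pd

# Two hypothetical frames that share the same default RangeIndex(0, 3).
df1 = pd.DataFrame({"a": [1, 2, 3]})
df2 = pd.DataFrame({"b": [4, 5, 6]})

# Concatenate along axis 1, once without and once with sort=True.
unsorted_result = pd.concat([df1, df2], axis=1)
sorted_result = pd.concat([df1, df2], axis=1, sort=True)

# Compare the index types; on an affected version the second one
# is reported as an integer index rather than a RangeIndex.
print(type(unsorted_result.index).__name__)
print(type(sorted_result.index).__name__)
```

Either way the index labels themselves are the same; only the index type, and therefore its memory footprint, differs.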
Installed Versions
Prior Performance
This is as of pandas 1.4.2