BUG: columns misaligned in repr when having >10 columns with integer index #8300

Open
jorisvandenbossche opened this issue Sep 17, 2014 · 12 comments
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string

Comments

@jorisvandenbossche
Member

On master, with an integer column index and more than 10 columns, the single-character column labels are misaligned:

In [37]: df = pd.DataFrame(np.random.randn(10,20))

In [38]: pd.options.display.max_columns = 4

In [39]: df
Out[39]:
         0         1     ...           18        19
0 -0.112066 -0.315377    ...     1.579197 -0.718384
1 -1.007079  0.234835    ...    -0.485022 -0.317984
2 -0.718872  1.094609    ...    -2.083531  0.559202
3 -0.165058 -0.127335    ...     0.296100  1.071882
4  0.874132  0.880632    ...    -1.648739  0.903484
5 -0.443316  0.673899    ...     0.823925  0.365219
6  0.215746  1.092637    ...     0.645429 -1.164019
7  1.853194 -1.682354    ...     1.071287 -0.366133
8 -0.568050 -1.205619    ...    -1.644478 -2.206907
9  2.136091 -0.253217    ...     1.148933  1.390127

[10 rows x 20 columns]

In [40]: df = pd.DataFrame(np.random.randn(10,10))

In [41]: df
Out[41]:
          0         1    ...            8         9
0  0.337429 -0.627859    ...     0.174635  1.378869
1 -1.679982  1.498135    ...     0.554562  1.804903
2  0.682877 -0.539606    ...    -0.065115  0.871922
3  0.517217 -0.850968    ...     0.998076 -1.141827
4  0.131862  1.702232    ...    -1.048214 -1.424597
5  0.926904 -0.120110    ...     0.852535  1.052194
6 -0.474197 -0.984774    ...     1.206264 -0.710416
7  0.532609 -0.797821    ...    -1.082022 -0.137153
8 -0.075104 -0.524546    ...     0.177842 -0.030325
9  1.568465  1.206357    ...     0.410863  1.575516

[10 rows x 10 columns]
@jorisvandenbossche jorisvandenbossche added Output-Formatting __repr__ of pandas objects, to_string Bug labels Sep 17, 2014
@jorisvandenbossche jorisvandenbossche added this to the 0.15.0 milestone Sep 17, 2014
@jreback
Contributor

jreback commented Sep 17, 2014

wow you have good eyes! @jorisvandenbossche

@jreback
Contributor

jreback commented Sep 17, 2014

hmm, wondering if I caused this here somehow: #8282

@jorisvandenbossche
Member Author

A simpler example:

In [57]: df = pd.DataFrame([[0,1],[2,3]], columns=[0,10])

In [58]: df
Out[58]:
   0   10
0   0   1
1   2   3

Yes, but it is more noticeable if you do it with integer values in the DataFrame :-)
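The output above can be reproduced with a simplified pure-Python model of the text layout. It assumes (this is not pandas' actual code) that every column, header included, is right-justified to its own widest cell and that columns are joined by two spaces; the only bug-specific step is that the column labels are first padded on the right to a common width, so `0` becomes `'0 '`. Both helper names are hypothetical.

```python
def pad_labels_left_justified(labels):
    """Hypothetical helper: pad every label on the right to the widest label's width."""
    strs = [str(label) for label in labels]
    width = max(len(s) for s in strs)
    return [s.ljust(width) for s in strs]


def render(index, header_cells, rows):
    """Hypothetical simplified layout: right-justify each column (header + values)
    to its widest cell and join the columns with two spaces."""
    idx_cells = [str(i) for i in index]
    idx_width = max(len(s) for s in idx_cells)
    # The index column gets a blank header cell of the same width.
    columns = [[" " * idx_width] + [s.rjust(idx_width) for s in idx_cells]]
    for j, header in enumerate(header_cells):
        cells = [header] + [str(row[j]) for row in rows]
        width = max(len(c) for c in cells)
        columns.append([c.rjust(width) for c in cells])
    # zip(*columns) turns the list of columns into a list of output lines.
    return "\n".join("  ".join(line) for line in zip(*columns))


text = render([0, 1], pad_labels_left_justified([0, 10]), [[0, 1], [2, 3]])
print(text)
```

In this model the padded label `'0 '` makes column 0 two characters wide and keeps its `0` at the left edge, while the values are pushed to the right edge, which gives the one-character shift in the header.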

@jreback
Contributor

jreback commented Sep 19, 2014

cc @bjonen

@bjonen
Contributor

bjonen commented Sep 19, 2014

Already present in v0.13.0 (central truncation was only introduced in 0.14):

In [2]: pd.DataFrame([[0,1],[2,3]], columns=[1000000,100])
Out[2]:
   1000000  100
0        0        1
1        2        3

[2 rows x 2 columns]

@jreback
Contributor

jreback commented Oct 2, 2014

@bjonen can you have a look?

@bjonen
Contributor

bjonen commented Oct 3, 2014

The column labels are formatted with Index.format, which left-justifies by default (see here):

In [22]: pd.DataFrame([[0,1],[2,3]], columns=[0,10]).columns.format()
Out[22]: ['0 ', '10']

Perhaps replacing

fmt_columns = columns.format()

with

from pandas.core.format import format_array
fmt_columns = format_array(columns, None, justify='right')

will work.
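The difference between the current and the proposed behaviour comes down to which side the padding goes on. A minimal sketch with plain string methods; `pad_labels` and its `justify` argument are hypothetical, only loosely mirroring the `justify='right'` argument above:

```python
def pad_labels(labels, justify="left"):
    """Hypothetical helper: pad all labels to the widest label's width.

    justify="left" pads on the right (like the Index.format output above);
    justify="right" pads on the left, matching right-aligned numbers.
    """
    strs = [str(label) for label in labels]
    width = max(len(s) for s in strs)
    if justify == "left":
        return [s.ljust(width) for s in strs]
    if justify == "right":
        return [s.rjust(width) for s in strs]
    raise ValueError(f"unknown justify: {justify!r}")


print(pad_labels([0, 10]))                   # ['0 ', '10']
print(pad_labels([0, 10], justify="right"))  # [' 0', '10']
```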

@jreback
Contributor

jreback commented Oct 3, 2014

Maybe indexes of floating/integer type should be right-justified?
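A sketch of that rule, assuming "numeric" means every label is an int or float (bool excluded, since it is an int subclass). `choose_justify` is a hypothetical helper, not a pandas function:

```python
from numbers import Real


def choose_justify(labels):
    """Hypothetical helper: 'right' if every label is a real number
    (bools excluded), otherwise 'left'."""
    numeric = all(
        isinstance(label, Real) and not isinstance(label, bool) for label in labels
    )
    return "right" if numeric else "left"


print(choose_justify([0, 10]))      # right
print(choose_justify([0.0, 2.5]))   # right
print(choose_justify(["0", "10"]))  # left
print(choose_justify([0, "a"]))     # left
```

A mixed index such as `[0, "a"]` falls back to left-justification in this sketch; whether that is the right choice for object indexes is an open design question.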

@jreback
Contributor

jreback commented Oct 4, 2014

@jorisvandenbossche ?

@jreback
Contributor

jreback commented Oct 4, 2014

This is also true in 0.14.1. Let's defer for now.

@jreback jreback modified the milestones: 0.15.1, 0.15.0 Oct 4, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jschendel
Member

Similar issue with mixed float/NaN indexes, except that the padding lands on opposite sides (the numbers are right-justified, NaN is left-justified):

In [2]: pd.Float64Index([0.0, np.nan, 2.0]).format()
Out[2]: [' 0.0', 'NaN ', ' 2.0']

In [3]: pd.Index([0.0, np.nan, 2.0], dtype=object).format()
Out[3]: [' 0.0', 'NaN ', ' 2.0']

This doesn't happen if a non-float value is included:

In [4]: pd.Index([0.0, np.nan, 2], dtype=object).format()
Out[4]: ['0.0', 'NaN', '2']
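For the float/NaN case, a sketch of formatting every entry first and then padding all of them on the same side, so that NaN lines up with the numbers. It assumes, as the `' 0.0'` output suggests, that non-negative floats get a leading space reserved for the sign; `format_float_labels` is a hypothetical helper:

```python
import math


def format_float_labels(values, precision=1):
    """Hypothetical helper: format floats (NaN as 'NaN'), then right-justify
    all entries together so the padding is always on the left."""
    strs = [
        # The ' ' flag in the format spec reserves a slot for the sign.
        "NaN" if math.isnan(v) else f"{v: .{precision}f}"
        for v in values
    ]
    width = max(len(s) for s in strs)
    return [s.rjust(width) for s in strs]


print(format_float_labels([0.0, float("nan"), 2.0]))  # [' 0.0', ' NaN', ' 2.0']
```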

@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@mroeschke mroeschke removed this from the Someday milestone Oct 13, 2022
@shteken
Contributor

shteken commented Jun 12, 2023

I took a look at this issue and here are my findings:

  1. A DataFrame with string column labels is printed without any misalignment.
    I find this curious, and it might point to a solution.
df = DataFrame([[0,1],[2,3]], columns=["0","10"])
df
   0  10
0  0   1
1  2   3
  2. Even after changing the code in _format_with_header to fmt_columns = format_array(columns, None, justify='right'), some misalignment remains (3 spaces instead of 2 between the index and the first column):
df = pd.DataFrame([[0,1],[2,3]], columns=[0,10])
df
    0  10
0   0   1
1   2   3
  3. I traced the original problem to the justify function, which is called from _get_strcols_without_index. When printing, spaces are added to the column values so that they match the width of the widest entry.
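A simplified model may explain the extra space in finding 2. It assumes (not pandas' actual code path) that each column is sized by its widest cell, header included. Padding the labels to the widest label's width then makes column 0 two characters wide even though its values need only one, whereas the string labels in finding 1 are not padded at all. `column_widths` is a hypothetical helper:

```python
def column_widths(labels, rows, pad_labels):
    """Hypothetical helper: width of each column = widest of its header and values.

    If pad_labels is True, every header is first padded to the widest
    label's width; otherwise each header keeps its own length.
    """
    headers = [str(label) for label in labels]
    if pad_labels:
        label_width = max(len(h) for h in headers)
        headers = [h.rjust(label_width) for h in headers]
    widths = []
    for j, header in enumerate(headers):
        cells = [header] + [str(row[j]) for row in rows]
        widths.append(max(len(c) for c in cells))
    return widths


rows = [[0, 1], [2, 3]]
print(column_widths([0, 10], rows, pad_labels=False))  # [1, 2]
print(column_widths([0, 10], rows, pad_labels=True))   # [2, 2]
```

With a two-space column separator, the extra character in column 0 turns the 2-space gap after the index into the 3-space gap reported above.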

I am not sure how to solve it, since many of the functions involved are also used by other processes. I hope these findings help somebody else solve this issue.


9 participants