BUG: Pandas crashes when asked to display a dataframe containing a column with specific content #49195
Comments
Thanks for reporting this @perfectly-preserved-pie! To investigate this further it would be helpful to narrow down where things are going wrong. First, can you reproduce this in a more minimal example? (without It's possible that the issue is with printing whatever type is held in that column (does just displaying |
It seems to be caused by the Dash object's behavior. The below dash object's
ipdb> print(seq)
Div([A(children=Img(id='mls_photo_div', referrerPolicy='noreferrer', src='https://ik.imagekit.io/theoldesthouse/22208359.jpg?tr=h-300%2Cw-400&ik-sdk-version=python-2.2.8', style={'display': 'block', 'width': '100%', 'margin-left': 'auto', 'margin-right': 'auto'}), href='https://www.bhhscalifornia.com/listing-detail/225-w-kelso-street-7-inglewood-ca-90301_5416097', referrerPolicy='noreferrer', target='_blank')])
ipdb> len(seq)
2
I am not sure if there is a better workaround, but casting the column to a string before printing works:
In [16]: df.popup_html = df.popup_html.astype('str')
In [17]: df
Out[17]:
mls_number subtype street_number street_name City ... Full Bathrooms Half Bathrooms Three Quarter Bathrooms popup_html date_generated
992 TR22223368 RMRT/D 1735 Redgate CIR Diamond Bar ... 0 0 0 [Div([A(children=Img(id='mls_photo_div', refer... NaN
993 22208909 APT 6373 Yucca ST #18 Los Angeles ... 0 0 0 [Div([A(children=Img(id='mls_photo_div', refer... NaN
994 SB22222749 CONDO/A 7765 W 91st ST #A2126 Playa del Rey ... 0 0 0 [Div([A(children=Img(id='mls_photo_div', refer... NaN
995 SR22220740 RMRT/A 23340 Schoolcraft ST West Hills ... 0 0 0 [Div([A(children=Img(id='mls_photo_div', refer... NaN
996 22208359 CONDO 225 W KELSO ST #7 Inglewood ... 0 0 0 [Div([A(children=Img(id='mls_photo_div', refer... NaN
... ... ... ... ... ... ... ... ... ... ... ...
2265 22204627 SFR 1718 N Occidental BLVD Los Angeles ... 0 0 0 [Div([A(children=Img(id='mls_photo_div', refer... False
2266 WS22210260 SFR/D 125 Robinson ST Los Angeles ... 0 0 0 [Div([A(children=Img(id='mls_photo_div', refer... False
2267 SR22212601 SFR/A 18047 Erwin ST Encino ... 0 0 0 [Div([A(children=Img(id='mls_photo_div', refer... False
2268 22204091 CONDO 4310 CAHUENGA BLVD #303 Toluca Lake ... 0 1 0 [Div([A(children=Img(id='mls_photo_div', refer... False
2269 22203913 SFR 2646 Tilden AVE Los Angeles ... 0 1 0 [Div([A(children=Img(id='mls_photo_div', refer... False
[1278 rows x 35 columns] |
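A variant of the workaround above that leaves the original column untouched, as a minimal sketch. The helper name `printable` and the sample frame are hypothetical and only for illustration; the sketch assumes the problematic values live in object-dtype columns and that their `str()` form is good enough for debugging output.

```python
import pandas as pd


def printable(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with object-dtype columns cast to str (for display only)."""
    out = df.copy()
    for col in out.columns:
        # Only touch object columns; numeric and datetime columns print fine as-is.
        if out[col].dtype == object:
            out[col] = out[col].astype(str)
    return out


# Hypothetical stand-in for the real data: a column holding nested lists and dicts.
df = pd.DataFrame({"id": [1, 2], "popup_html": [[{"a": 1}], [{"b": 2}]]})
safe = printable(df)
print(safe)
```

Because the cast happens on a copy, the app can keep reading the nested objects from `df` while `printable(df)` is used only when inspecting the frame in a terminal.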
this needs to be fixed in dash I guess. |
That 'popup_html' code is generated from a CSV spreadsheet like this one using this function. If I go line by line in dataframe.py (just basically importing that spreadsheet and executing code from line 1 to line 507) the error is still the same when I print the df. I can't give an example as I'm about to hit my Google Maps API quota for the month :(
I don't believe so. It's just a list containing other lists and dicts. It's an object dtype.
Yes.
Also yes. I've spot checked single rows and used random .iloc ranges. With all that being said, this might be more of a Dash issue than Pandas like @vamsi-verma-s said. If that's the case I'm happy to close out this issue since I don't think it's something you can fix, and @vamsi-verma-s provided a nice workaround by casting the column as a string. Which totally works for me; I just need to print the df when I'm debugging something. My webapp reads the whole dataframe just fine. |
Hi, sorry for reopening an old issue, but I don't believe this was fixed. I actually have a minimal reproducible example of the same bug that doesn't involve any other library:
import pandas as pd
print(pd.__version__)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df_outer = pd.DataFrame({"a": [{"x": df}]})
print(df_outer)
This is the stack trace I'm seeing, using pandas version
Stack Trace
My interpretation is that this happens because pandas treats the nested DataFrame as a normal sequence and tries to iterate on it, but for DataFrames
EDIT: Seems like I can't reopen the issue, so I'll wait for someone with permissions to respond. Please also let me know if I should open a new issue instead. |
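A small check related to the interpretation above. The sentence is cut off in the thread, so what follows is an assumption about what it refers to: for a DataFrame, `len()` counts rows while iteration yields column labels, so code that treats it as an ordinary sequence can see the two disagree. Whether this mismatch is the actual cause of the crash is not confirmed here.

```python
import pandas as pd

# Same inner frame as in the example above: 3 rows, 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_len = len(df)      # len() of a DataFrame is the number of rows
labels = list(df)    # iterating a DataFrame yields its column labels

print(n_len)         # 3
print(labels)        # ['a', 'b']
```

Here `len(df)` is 3 but iteration produces only 2 items, which is the kind of inconsistency a generic sequence printer would not expect.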
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
I have a column that contains Dash by Plotly HTML code for a Dash Leaflet app. This column's rows each contain a nested list of lists and dictionaries. Here's an example:
When I try to view the dataframe, I get this:
When I drop the column in question, the dataframe is returned successfully:
Expected Behavior
The dataframe should be printed out to the terminal successfully without any errors.
I'm 95% sure it has something to do with the way the content in the 'popup_html' column is structured, but that content is critical to my app. Anyone have ideas on what I can do here?
Installed Versions
commit : 91111fd
python : 3.10.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-128-generic
Version : #144-Ubuntu SMP Tue Sep 20 11:00:04 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.1
numpy : 1.23.2
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 65.3.0
pip : 22.2.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.4.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : 2022.8.2
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.8.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : 3.7.0
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None