Skip to content

Left join without passing right_on raises TypeError instead of intended ValueError #26824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ajpen opened this issue Jun 13, 2019 · 9 comments
Closed
Labels
Error Reporting Incorrect or improved errors from pandas

Comments

@ajpen
Copy link

ajpen commented Jun 13, 2019

Code

import pandas as pd

df1 = pd.DataFrame([{'boro': 'B',
  'from_st': 'EAST FORDHAM ROAD',
  'main_st': 'DECATUR AVENUE',
  'order_no': 'P-00097673',
  'sos': 'W',
  'to_st': 'EAST  193 STREET'}])

df2 = pd.DataFrame([{'SRP_Seq': 3,
  'SR_Arrow': 'W  ',
  'SR_Distx': 138,
  'SR_Mutcd_Code': 'PS-101BA  ',
  'Sign_description': 'NO PARKING (SANITATION BROOM SYMBOL) WEDNESDAY 8AM-9AM --> (SUPERSEDES R7-88-9RA)',
  'Unnamed: 7': '',
  'Unnamed: 8': '',
  'boro': 'B',
  'order_no': 'P-01287481'}])

pd.merge(df2, df1, how='left', left_on='A')

Problem description

Unexpected exception is raised when doing a left join without passing arguments for right_on.

Expected Output

ValueError: len(right_on) must equal len(left_on)

Actual Output

TypeError: object of type 'NoneType' has no len()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.26-03278-g71dc68f9c9d0
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: None
pip: 19.1.1
setuptools: 41.0.1
Cython: None
numpy: 1.16.4
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
@WillAyd
Copy link
Member

WillAyd commented Jun 13, 2019

This is by design. Not entirely sure what you want to have happen (your example doesn't have an "A" label) but you might just be looking for how='left' and on='A'

Worth checking out the user guide for merging, joining and concatenation to get a feel for this:

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#

@ajpen
Copy link
Author

ajpen commented Jun 13, 2019

@WillAyd Sorry, I should have added the stack trace. I'm not raising the issue on usage.

In the stacktrace I noticed the following:

-> 1057         if len(self.right_on) != len(self.left_on):
   1058             raise ValueError("len(right_on) must equal len(left_on)")
   1059 

TypeError: object of type 'NoneType' has no len()

You can see that the intention was to raise the ValueError, but TypeError was understandably raised instead. I imagine parameter sanitation for right_on was missed.

@WillAyd
Copy link
Member

WillAyd commented Jun 13, 2019

Oh ok I see your point. That probably would make more sense. If you’d like to try a PR would certainly welcome that!

@WillAyd WillAyd reopened this Jun 13, 2019
@harshit-py
Copy link

Can I take this one only if @ajpen is not taking this up? I'm completely new to submitting to OSS and would appreciate some help. Here's what I think should help, a couple checks inside the checks for right_on and left_on

elif self.left_on is not None:
    n = len(self.left_on)
    if self.right_index:
        if len(self.left_on) != self.right.index.nlevels:
            raise ValueError('len(left_on) must equal the number '
                             'of levels in the index of "right"')
        self.right_on = [None] * n
    if not self.right_on:
        raise ValueError('both left_on and right_on should '
                         'be passed')
elif self.right_on is not None:
    n = len(self.right_on)
    if self.left_index:
        if len(self.right_on) != self.left.index.nlevels:
            raise ValueError('len(right_on) must equal the number '
                             'of levels in the index of "left"')
        self.left_on = [None] * n
    if not self.left_on:
        raise ValueError('both left_on and right_on should '
                         'be passed')
if len(self.right_on) != len(self.left_on):
    raise ValueError("len(right_on) must equal len(left_on)")

@ajpen
Copy link
Author

ajpen commented Jun 13, 2019

@harshit-py Be my guest :)

@simonjayhawkins
Copy link
Member

You can see that the intention was to raise the ValueError, but TypeError was understandably raised instead. I imagine parameter sanitation for right_on was missed.

if this is raising an exception as expected and only the message is unclear or the exception type is inappropriate, then handling the exception with try/except and re-raising may be more appropriate than additional parameter validation.

@simonjayhawkins simonjayhawkins added Error Reporting Incorrect or improved errors from pandas and removed Usage Question labels Jun 14, 2019
@harshit-py
Copy link

@simonjayhawkins thanks for the comment, do you mean something like this?

        try:
            if len(self.right_on) != len(self.left_on):
                raise ValueError("len(right_on) must equal len(left_on)")
        except TypeError:
            raise ValueError('both left_on and right_on should '
                             'be passed')

Also, how to make sense of the Codecov report in the last unsuccessful PR? Pardon me for asking these basic questions, this would be my first submission.

@jreback jreback added this to the 0.25.0 milestone Jun 15, 2019
@simonjayhawkins
Copy link
Member

do you mean something like this?

@harshit-py my initial concern was adding more validation to _validate_specification after seeing the comment

    def _validate_specification(self):
        # Hm, any way to make this logic less complicated??

the try/except may not be appropriate in this case after all.

@jreback jreback modified the milestones: 0.25.0, Contributions Welcome Jun 28, 2019
@mroeschke
Copy link
Member

Some of the validation must have changed in the meantime since with OP's example I am getting

In [2]: import pandas as pd
   ...:
   ...: df1 = pd.DataFrame([{'boro': 'B',
   ...:   'from_st': 'EAST FORDHAM ROAD',
   ...:   'main_st': 'DECATUR AVENUE',
   ...:   'order_no': 'P-00097673',
   ...:   'sos': 'W',
   ...:   'to_st': 'EAST  193 STREET'}])
   ...:
   ...: df2 = pd.DataFrame([{'SRP_Seq': 3,
   ...:   'SR_Arrow': 'W  ',
   ...:   'SR_Distx': 138,
   ...:   'SR_Mutcd_Code': 'PS-101BA  ',
   ...:   'Sign_description': 'NO PARKING (SANITATION BROOM SYMBOL) WEDNESDAY 8AM-9AM --> (SUPERSEDES R7-88-9RA)',
   ...:   'Unnamed: 7': '',
   ...:   'Unnamed: 8': '',
   ...:   'boro': 'B',
   ...:   'order_no': 'P-01287481'}])
   ...:
   ...: pd.merge(df2, df1, how='left', left_on='A')

MergeError: Must pass "right_on" OR "right_index".

Which is a fairly clear error message. If anyone can reproduce the original error message with a new example we can reopen and consider how to provide a clearer error message. Closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants