Can't handle duplicate column names in sql read #3487

darindillon · 2013-04-29T18:43:51Z

Possibly related to #3468

Using pandas 0.10.1.
It's legal for sql (at least Sql Server) to have queries that return two columns with the same name. And it's legal for pandas to have dataframes that have two columns with the same name. But pandas gives a very misleading error message when you try it with read_frame:

Query that has 2 columns with the same name:

sql = "select 1 as whatever, 2 as name, 3 as name where 1=0"
pos = pandas.io.sql.read_frame(sql, dbconn)

That gives very misleading error message

jreback · 2013-05-02T19:15:40Z

@tavistmorph we just pushed updates for dup_columns. Can you try this wil latest master (I am not sure this is the issue, but let's see)

ghost · 2013-05-10T11:55:42Z

Can't repro with 0.10.1 or 0.11.0 or master

In [1]: import sqlite3
   ...: 
   ...: import pandas.io.sql as psql
   ...: con = sqlite3.connect("a2.db") 
   ...: c = con.cursor()
   ...: try:
   ...:     c.execute('''CREATE TABLE tab (a,b,c);''')
   ...: except Exception as e:
   ...:     pass
   ...: try:
   ...:     c.execute("INSERT INTO tab VALUES (1,2,3);")
   ...:     c.execute("INSERT INTO tab VALUES (4,5,6);")
   ...:     c.commit()
   ...: except Exception as e:
   ...:     pass
   ...: #    print e
   ...: sql = "select a,b as a from tab where a=1"
   ...: #c.execute(sql).fetchall()
   ...: pandas.io.sql.read_frame(sql, con)
Out[1]: 
   a  a
0  1  2

In [3]: print pd.__version__
0.10.1

darindillon · 2013-05-10T13:19:22Z

Your code above works fine for me too. But I was running a different case than you. Try it this way and it gives an error for me on 0.11. (BTW -- not sure if it matters, but my database is sql server)

import pandas
import pyodbc
import pandas.io.sql
conn = pyodbc.connect("your dsn here")
sql = "select 1 as whatever, 2 as name, 3 as name where 1 = 0"
pos = pandas.io.sql.read_frame(sql, conn)

Result:
Traceback (most recent call last):
File "D:\workspaces\PythonOneOffs\client.py", line 7, in
pos = pandas.io.sql.read_frame(sql, conn)
File "C:\Python27\lib\site-packages\pandas\io\sql.py", line 162, in read_frame
coerce_float=coerce_float)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1115, in from_records
columns)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 5338, in _arrays_to_mgr
return create_block_manager_from_arrays(arrays, arr_names, axes)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 1821, in create_block_manager_from_arrays
mgr = BlockManager(blocks, axes)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 896, in init
self._verify_integrity()
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 990, in _verify_integrity
raise AssertionError('Number of manager items must equal union of '
AssertionError: Number of manager items must equal union of block items

ghost · 2013-05-10T13:47:17Z

I can reproduce that error, but only when the select query returns 0 rows.

import sqlite3

import pandas.io.sql as psql
con = sqlite3.connect("a9.db") 
c = con.cursor()
try:
    c.execute('''CREATE TABLE tab (a,b,c)''')
except:
    pass
try:
    c.execute("INSERT INTO tab VALUES (1,2,3)")
    c.execute("INSERT INTO tab VALUES (4,5,6)")
    c.commit()
except:
    pass

sql = 'select a as a2  , b , c as b from tab where a="meow"'
pandas.io.sql.read_frame(sql, con)

/home/user1/src/pandas/pandas/core/internals.py in _verify_integrity(self)
   1126 
   1127         if len(self.items) != tot_items:
-> 1128             raise AssertionError('Number of manager items must equal union of '
   1129                                  'block items')
   1130 

AssertionError: Number of manager items must equal union of block items

ghost · 2013-05-10T13:56:13Z

Works with pyodbc as well, unless the select query returns no rows.

In [45]: import sqlite3
    ...: import pyodbc
    ...: 
    ...: import pandas.io.sql as psql
    ...: con = sqlite3.connect("a.db") 
    ...: c = con.cursor()
    ...: try:
    ...:     c.execute('''CREATE TABLE tab (a,b,c)''')
    ...: except:
    ...:     pass
    ...: try:
    ...:     c.execute("INSERT INTO tab VALUES (1,2,3)")
    ...:     c.execute("INSERT INTO tab VALUES (4,5,6)")
    ...:     c.commit()
    ...: except:
    ...:     pass
    ...: con.close()
    ...: 
    ...: con = pyodbc.connect('DRIVER={SQLite3};DATABASE=a.db;')
    ...: sql = 'select a as a2  , b , c as b from tab'
    ...: pandas.io.sql.read_frame(sql, con)
Out[45]: 
   a2  b  b
0   1  2  3
1   4  5  6

jreback · 2013-05-10T14:11:29Z

@y-p what is actually passed in the 0-case (to arrays_to_mgr)?

ghost · 2013-05-10T14:29:40Z

Interesting.
This works:

DataFrame.from_records([], columns=['a','b','c'])

Empty DataFrame
Columns: [a, b, c]
Index: []

But this fails with the obscure message:

DataFrame.from_records([], columns=['a','b','b'])

AssertionError: Number of manager items must equal union of block items

The main ctor doesn't let you construct an empty df with non-zero dims at all:

DataFrame([], columns=['a','b','c'])

ValueError: Shape of passed values is (0, 0), indices imply (3, 0)

but allows it if there are no dims (I remember opening that issue, actually):

DataFrame([])

Empty DataFrame
Columns: []
Index: []

In any case, the original issue probably interpreted the behaviour on an empty queryset
with a bug in the duplicate column handling.

ghost · 2013-05-10T14:33:03Z

I think we can close this in deference to #3562.

ghost · 2013-05-10T16:38:13Z

ok, #3564 closed #3562 and this as well.

ghost mentioned this issue May 10, 2013

df.from_records should be consistent in handling empty data arg #3562

Closed

ghost closed this as completed May 10, 2013

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Can't handle duplicate column names in sql read #3487

Can't handle duplicate column names in sql read #3487

darindillon commented Apr 29, 2013

jreback commented May 2, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

darindillon commented May 10, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

jreback commented May 10, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

Uh oh!

Can't handle duplicate column names in sql read #3487

Can't handle duplicate column names in sql read #3487

Comments

darindillon commented Apr 29, 2013

Query that has 2 columns with the same name:

That gives very misleading error message

jreback commented May 2, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

darindillon commented May 10, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

jreback commented May 10, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!

ghost commented May 10, 2013

Uh oh!