Skip to content

read_sql should accept a sql_params parameter #10899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
s-celles opened this issue Aug 24, 2015 · 13 comments
Closed

read_sql should accept a sql_params parameter #10899

s-celles opened this issue Aug 24, 2015 · 13 comments
Assignees
Labels
Docs good first issue IO SQL to_sql, read_sql, read_sql_query

Comments

@s-celles
Copy link
Contributor

Hello,

I wonder if current use of read_sql couldn't lead to SQL injection.

I read in https://docs.python.org/2/library/sqlite3.html

# Never do this -- insecure!
symbol = 'RHAT'
c.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)

# Do this instead
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
print c.fetchone()

# Larger example that inserts many records at a time
purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
            ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
            ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
           ]
c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)

Most of people will use

"SELECT * FROM stocks WHERE symbol = '%s'" % symbol

(or .format(...))

with read_sql

if symbol is an unsafe input it could lead some problems

http://xkcd.com/327/

Is it safe to do it here ?

Kind regards

@jorisvandenbossche
Copy link
Member

read_sql has a params keyword: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html#pandas.read_sql

Is it that what you are looking for?

@s-celles
Copy link
Contributor Author

That's probably the parameter to use

but I think we should warm more in doc

import pandas as pd
from sqlalchemy import create_engine
db_uri = 'mysql+mysqlconnector://root:root@localhost:3306/meteo'
engine = create_engine(db_uri)
#query = "SELECT * FROM data where Ta>35" # no problem
# you need to create a 'test_table' table which will be dropped
temp = "35; DROP TABLE test_table;" 
query = "SELECT * FROM data where Ta>%s" % temp
pd.read_sql(query, engine)

raises

InterfaceError: (mysql.connector.errors.InterfaceError) Use multi=True when executing multiple statements [SQL: 'SELECT * FROM data where Ta>35; DROP TABLE test_table;']

but test_table is dropped

pd.read_sql(query, engine)

again raises

ProgrammingError: (mysql.connector.errors.ProgrammingError) 1051 (42S02): Unknown table 'meteo.test'

@s-celles
Copy link
Contributor Author

I think doc should (at least) provide a SQL query with a SELECT example and parameters.

Maybe after http://pandas-docs.github.io/pandas-docs-travis/io.html#querying

Of course, you can specify a more “complex” query.

In [425]: pd.read_sql_query("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", engine)
Out[425]: 
   id Col_1  Col_2
0  42     Y  -12.5

and before chunks

I wonder if there isn't a way to disable multiple statements.

@s-celles
Copy link
Contributor Author

s-celles commented Sep 2, 2015

I try this:

import sqlalchemy
import pandas as pd
import numpy as np
db_uri = 'sqlite:///test.db'
engine = sqlalchemy.create_engine(db_uri)
df = pd.DataFrame(np.random.randn(4,3), columns=['a','b','c'])
df.to_sql("df", engine)
df.to_sql("test_table", engine)
#pd.read_sql("SELECT * from df where a>0;", engine)
pd.read_sql("SELECT * from df where a>0; DROP TABLE test_table", engine)

it raises

Warning: You can only execute one statement at a time.

and test_table is fortunately not dropped.

So I think the problem is on (mysqlconnector) driver side.
http://bugs.mysql.com/bug.php?id=78308

@jorisvandenbossche
Copy link
Member

@scls19fr Improvement to the docs are certainly welcome!

The warning you see above is actually a warning (feature) from sqlite3 itself (the have executescript to execute multiple statements).

It is always possible to misuse read_sql, just as you can misuse a plain conn.execute. This is a general issue with sql querying, so I don't think pandas should directly do anything about that. But of course, warning for that in the docs is easy to do!

@s-celles
Copy link
Contributor Author

s-celles commented Sep 3, 2015

In fact the problem is on driver side because Pandas seems not to allow by default several statements.
mysql-connector-python seems to execute several statements even if multi is not set to True

I think that doc should be improve with the use of sqlalchemy.text and bind variables

see #10846

@s-celles
Copy link
Contributor Author

This PR #10983 shows that
it is possible to do

name_text = sqlalchemy.text('select * from iris where name=:name')
iris_df = sql.read_sql(name_text, self.conn, params={'name': 'Iris-versicolor'})

maybe doc http://pandas-docs.github.io/pandas-docs-travis/io.html#id4 should be improved accordingly

After

"
Of course, you can specify a more “complex” query.

In [437]: pd.read_sql_query("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", engine)
Out[437]: 
   id Col_1  Col_2
0  42     Y  -12.5

"

maybe a query with parameters should be shown

from sqlalchemy import text
my_id = 42
query = text("SELECT id, Col_1, Col_2 FROM data WHERE id = :my_id;")
pd.read_sql_query(query, engine, params={'my_id': my_id})

@s-celles
Copy link
Contributor Author

But what is odd, is that I can't do

my_id = 42
table = 'data'
query = text("SELECT id, Col_1, Col_2 FROM :table WHERE id = :my_id;")
pd.read_sql_query(query, engine, params={'my_id': my_id, 'table': table})

table name can't be a parameter. Why ?

Maybe @stephenpascoe can help

@stephenpascoe
Copy link

AFAIK this would depend on the SQL backend. SQLAlchemy passes the unsubstituted SQL expression and the parameter dictionary to the underlying DB API to interpret. It would be reasonable for a DB API not to support substitution of the table parameter as it could be an SQL injection vulnerability. Also it wouldn't necessarily be supported by the DB's stored procedure system.

Personally, I've never tried this but a quick test with the raw sqlite3 db api shows a "?" in the table position gives a syntax error.

@jorisvandenbossche
Copy link
Member

Parameter substitution is not possible for the table name AFAIK.

The thing is, in sql there is often a difference between string quoting, and variable quoting (see eg https://sqlite.org/lang_keywords.html the difference in quoting between string and identifier). So you are filling in a string, which is for sql something else as a variable name (in this case a table name).

@scls19fr if you want to add that example to the docs, always welcome!

@erichxchen
Copy link
Contributor

Hello, I find this issue is quite interesting. I've created a PR to add some examples (including the bad examples and good examples) to the docs. Comments are appreciated.

@erichxchen
Copy link
Contributor

take

@tdy
Copy link
Contributor

tdy commented Jun 26, 2024

Is this issue fully resolved by #56546? Or are there any remaining tasks?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs good first issue IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

7 participants