Why Is Escaping Data Considered So Magical? [Python]

Prev: GDAL-1.7.1 : vcvarsall.bat missing
Next: improving python performance by extension module (64bit)

From: Cameron Simpson on 25 Jun 2010 01:54

On 25Jun2010 15:38, Lawrence D'Oliveiro <ldo(a)geek-central.gen.new_zealand> wrote:
| In message <2010062422432660794-angrybaldguy(a)gmailcom>, Owen Jacobson wrote:
| > Why would I write this when SQLAlchemy, even without using its ORM
| > features, can do it for me?
|
| SQLAlchemy doesn't seem very flexible. Looking at the code examples
| <http://www.sqlalchemy.org/docs/examples.html>, they're very procedural:
| build object, then do a string of separate method calls to add data to it. I
| prefer the functional approach, as in my table-update example.

He said "without using its ORM". I do what you suggest (make SQL
statements at need) using SQLalchemy all the time. It is simple and easy
and _robust_ against odd data. The number of times I've had to
fix/remove insert-values-into-SQL-text code ...
--
Cameron Simpson <cs(a)zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Plague, Famine, Pestilence, and C++ stalk the land. We're doomed! Doomed!
- Simon E Spero

From: Cameron Simpson on 25 Jun 2010 02:11

On 25Jun2010 15:54, I wrote:
| The number of times I've had to
| fix/remove insert-values-into-SQL-text code ...

My point here is that with insert-escaped-values-into-sql-text,
you only need to forget to do it once (or do it wrong).
By using a parameterised form like that required by SQLalchemy
the library does it and never forgets.

I would also point out that if you use a library to _construct_ the SQL
statements themselves eg via SQLA's .select() methods etc then you will never
introduce a syntax error into the SQL either. I expect I could construct SQL
syntax errors that cause havoc when inserted with correctly escaped parameter
values if I tried, probably using quotes in the SQL typo part.

Cheers,
--
Cameron Simpson <cs(a)zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

George, discussing a patent and prior art:
"Look, this publication has a date, the patent has a priority date,
can't you just compare them?"
Paul Sutcliffe:
"Not unless you're a lawyer."

From: Nobody on 25 Jun 2010 02:47

On Fri, 25 Jun 2010 12:25:56 +1200, Lawrence D'Oliveiro wrote:

> Just been reading this article
> ...
> which says that a lot of security holes are arising these days because
> everybody is concentrating on unit testing of their own particular
> components, with less attention being devoted to overall integration
> testing.
>
> Fair enough. But it's disconcerting to see some of the advice being
> offered in the reader comments, like “force everyone to use stored
> procedures”, or “force everyone to use prepared/parametrized
> statements”, “never construct ad-hoc SQL queries” and the like.
>
> I construct ad-hoc queries all the time. It really isn't that hard to
> do safely.

Wrong.

Even if you get the quoting absolutely correct (which is a very big "if"),
you have to remember to perform it every time, without exception. And you
need to perform it exactly once. As the program gets more complex,
ensuring that it's done in the correct place, and only there, gets harder.

More generally, as a program gets more complex, "this will work so long as
we do X every time without fail" approaches "this won't work".

> All you have to do is read the documentation—for example,
> <http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html>—and then
> write a routine that takes arbitrary data and turns it into a valid
> string literal, like this
> <http://www.codecodex.com/wiki/Useful_MySQL_Routines#Quoting>.

That's okay. Provided the documentation is accurate. And provided that you
update the escaping algorithm whenever the SQL dialect gets extended, or
you switch to a different back-end, or modify the program. IOW, it's not
even remotely okay.

"Unparsing" data so that you get the correct answer out of a subsequent
parsing step is objectively and obviously the wrong approach. The
correct approach is to skip both the unparsing and parsing steps
entirely.

Formal grammars are a useful way to represent graph-like data structures
in a human-readable and human-editable form. But for creation,
modification and use by a computer, it is invariably preferable to operate
upon the graph directly. Textual formats inherit all of the "issues" which
apply to the underlying data structure, then add a few of their own for
good measure.

> I've done this sort of thing for MySQL, for HTML and JavaScript (in both
> Python and JavaScript itself), and for Bash.

And, of course, you're convinced that you got it right every time. That
attitude alone should set alarm bells ringing for anyone who's worked in
this industry for more than five minutes.

From: Paul Rubin on 25 Jun 2010 03:09

Nobody <nobody(a)nowhere.com> writes:
> More generally, as a program gets more complex, "this will work so long as
> we do X every time without fail" approaches "this won't work".

QOTW

From: Jorgen Grahn on 25 Jun 2010 08:15

On Fri, 2010-06-25, Lawrence D'Oliveiro wrote:
> Just been reading this article
> <http://www.theregister.co.uk/2010/06/23/xxs_sql_injection_attacks_testing_remedy/>
> which says that a lot of security holes are arising these days because
> everybody is concentrating on unit testing of their own particular
> components, with less attention being devoted to overall integration
> testing.

I don't do SQL and I don't even understand the terminology properly
.... but the discussion around it bothers me.

Do those people really do this?
- accept untrusted user data
- try to sanitize the data (escaping certain characters etc)
- turn this data into executable code (SQL)
- executing it

Like the example in the article

SELECT * FROM hotels WHERE city = '<untrusted>';

If so, its isomorphic with doing os.popen('zcat -f %s' % untrusted)
in Python (at least on Unix, where 'zcat ...' is executed as a shell
script).

I thought it was well-known that the solution is *not* to try to
sanitize the input -- it's to switch to an interface which doesn't
involve generating an intermediate executable. In the Python example,
that would be something like os.popen2(['zcat', '-f', '--', untrusted]).

Am I missing something? If not, I can go back to sleep -- and keep
avoiding SQL and web programming like the plague until that community
has entered the 21st century.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10 11 12
Prev: GDAL-1.7.1 : vcvarsall.bat missing
Next: improving python performance by extension module (64bit)