From: Roy Smith on 1 Jul 2010 08:11
Stephen Hansen <me+list/python(a)ixokai.io> wrote:

> The quote does not deny the power of regular expressions; it challenges
> widely held assumption and belief that comes from *somewhere* that they
> are the best way to approach any problem that is text related.

Well, that assumption comes from historical unix usage where traditional
tools like awk, sed, ed, and grep, made heavy use of regex, and
therefore people learned to become proficient at them and use them all
the time. Somewhat later, the next generation of tools such as vi and perl
continued that tradition. Given the tools that were available at the time,
regex was indeed likely to be the best tool available for most text-related
problems.

Keep in mind that in the early days, people were working on hard-copy
terminals [[http://en.wikipedia.org/wiki/ASR-33]] so economy of expression
was a significant selling point for regexes.

Not trying to further this somewhat silly debate, just adding a bit of
historical viewpoint to answer the implicit question you ask as to where
the assumption came from.
From: Stephen Hansen on 1 Jul 2010 10:18
On 7/1/10 3:03 AM, Jean-Michel Pichavant wrote:
> Re is part of the python standard library, for some purpose I guess.

No, *really*?

So all those people who have been advocating it's useless and shouldn't be
are already too late?

Damn.

Well, there goes *that* whole crusade we were all out on. Since we can't
destroy re, maybe we can go club baby seals.

--

   ... Stephen Hansen
   ... Also: Ixokai
   ... Mail: me+list/python (AT) ixokai (DOT) io
   ... Blog: http://meh.ixokai.io/
From: Stephen Hansen on 1 Jul 2010 10:27
On 7/1/10 5:11 AM, Roy Smith wrote:
> Stephen Hansen<me+list/python(a)ixokai.io> wrote:
>
>> The quote does not deny the power of regular expressions; it challenges
>> widely held assumption and belief that comes from *somewhere* that they
>> are the best way to approach any problem that is text related.
>
> Well, that assumption comes from historical unix usage where traditional
> tools like awk, sed, ed, and grep, made heavy use of regex, and
> therefore people learned to become proficient at them and use them all
> the time.

Oh, I'm fully aware of the history of re's -- but it's not those old hats
and even their students and the unix geeks I'm talking about.

It's the newbies and people wandering into the language with absolutely
no idea about the history of unix, shell scripting and such, who so
often arrive with the idea firmly planted in their head, that I wonder at.

Sure, there's going to be a certain amount of cross-pollination from
unix-geeks to students-of-students-of-students-of unix geeks to spread
the idea, but it seems more pervasive than that. I just picture a
re-vangelist camping out in high schools and colleges selling the party
line or something :)

--

   ... Stephen Hansen
   ... Also: Ixokai
   ... Mail: me+list/python (AT) ixokai (DOT) io
   ... Blog: http://meh.ixokai.io/

P.S. And no, unix geeks is not a pejorative term.
From: Lawrence D'Oliveiro on 3 Jul 2010 22:33
In message <pan.2010.06.29.09.35.18.594000(a)nowhere.com>, Nobody wrote:

> On Tue, 29 Jun 2010 12:30:36 +1200, Lawrence D'Oliveiro wrote:
>
>>> Seriously, almost every other kind of library uses a binary API. What
>>> makes databases so special that they need a string-command based API?
>>
>> HTML is also effectively a string-based API.
>
> HTML is a data format. The sane way to construct or manipulate HTML is via
> the DOM, not string operations.

What is this “DOM” of which you speak? I looked here
<http://docs.python.org/library/>, but can find nothing that sounds like
that, that is relevant to HTML.

>> And what about regular expressions?
>
> What about them? As the saying goes:
>
> Some people, when confronted with a problem, think
> "I know, I'll use regular expressions."
> Now they have two problems.
>
> They have some uses, e.g. defining tokens[1]. Using them to match more
> complex constructs is error-prone ...

What if they're NOT more complex, but they can simply contain user-entered
data?

>> And all the functionality available through the subprocess
>> module and its predecessors?
>
> The main reason why everyone recommends subprocess over its predecessors
> is that it allows you to bypass the shell, which is one of the most
> common sources of the type of error being discussed in this thread.

How would you deal with this, then: I wrote a script called ExtractMac, to
convert various old Macintosh-format documents accumulated over the years
(stored in AppleDouble form by uploading to a Netatalk server) to more
cross-platform formats. This has a table of conversion commands to use.
For example, the entries for PICT and TEXT Macintosh file types look like
this:

    "PICT" :
        {
            "type" : "image",
            "ext" : ".png",
            "act" : "convert %(src)s %(dst)s",
        },
    "TEXT" :
        {
            "type" : "text",
            "ext" : ".txt",
            "act" : "LineEndings unix <%(src)s >%(dst)s",
        },

The conversion code that uses this table looks like

    Cmd = \
        (
            Act.get("act", "cp -p %(src)s %(dst)s")
        %
            {
                "src" : ShellEscape(Src),
                "dst" : ShellEscape(DstFileName),
            }
        )
    sys.stderr.write("Doing: %s\n" % Cmd)
    Status = os.system(Cmd)

How much simpler would your alternative be? I don't think it would be
simpler at all.
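
For comparison, here is a minimal sketch of how a table like this might look if it bypassed the shell through the subprocess module. It is an illustration under stated assumptions, not the ExtractMac code: it is written for Python 3, the "argv", "stdin" and "stdout" keys and the build_argv and run_conversion helpers are hypothetical names, and convert and LineEndings are assumed to be found on the PATH. The shell's "<" and ">" redirections are replaced by files opened in Python.

```python
import subprocess
import sys
from contextlib import ExitStack

# Hypothetical shell-free table: each action is an argument list, and the
# "stdin"/"stdout" flags stand in for the shell's "<src" and ">dst".
CONVERSIONS = {
    "PICT": {"type": "image", "ext": ".png",
             "argv": ["convert", "{src}", "{dst}"]},
    "TEXT": {"type": "text", "ext": ".txt",
             "argv": ["LineEndings", "unix"],
             "stdin": True, "stdout": True},
}

# Fallback action, mirroring the "cp -p" default in the original snippet.
DEFAULT_ARGV = ["cp", "-p", "{src}", "{dst}"]


def build_argv(entry, src, dst):
    """Replace the {src}/{dst} placeholders; no quoting or escaping needed."""
    names = {"{src}": src, "{dst}": dst}
    return [names.get(arg, arg) for arg in entry.get("argv", DEFAULT_ARGV)]


def run_conversion(entry, src, dst):
    """Run one conversion without a shell and return its exit status."""
    argv = build_argv(entry, src, dst)
    sys.stderr.write("Doing: %r\n" % (argv,))
    with ExitStack() as stack:
        # Open the redirect targets only for entries that ask for them.
        stdin = stack.enter_context(open(src, "rb")) if entry.get("stdin") else None
        stdout = stack.enter_context(open(dst, "wb")) if entry.get("stdout") else None
        return subprocess.call(argv, stdin=stdin, stdout=stdout)
```

Because every file name travels as a single list element, a name containing spaces or shell metacharacters reaches the program unchanged, so no ShellEscape step is involved. The cost is that redirection has to be spelled out per entry, which is roughly the extra complexity the question above is asking about.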
From: Rami Chowdhury on 3 Jul 2010 22:43
On Saturday 03 July 2010 19:33:44 Lawrence D'Oliveiro wrote:
> In message <pan.2010.06.29.09.35.18.594000(a)nowhere.com>, Nobody wrote:
> > On Tue, 29 Jun 2010 12:30:36 +1200, Lawrence D'Oliveiro wrote:
> >>> Seriously, almost every other kind of library uses a binary API. What
> >>> makes databases so special that they need a string-command based API?
> >>
> >> HTML is also effectively a string-based API.
> >
> > HTML is a data format. The sane way to construct or manipulate HTML is
> > via the DOM, not string operations.
>
> What is this “DOM” of which you speak? I looked here
> <http://docs.python.org/library/>, but can find nothing that sounds like
> that, that is relevant to HTML.
>

The Document Object Model - I don't think the standard library has an HTML DOM
module but there's certainly one for XML (and XHTML):

http://docs.python.org/library/xml.dom.html

----
Rami Chowdhury
"Any sufficiently advanced incompetence is indistinguishable from malice." -- Grey's Law
+1-408-597-7068 / +44-7875-841-046 / +88-01819-245544
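
As a rough illustration of building markup through a DOM rather than by string operations, the sketch below uses xml.dom.minidom from the standard library (Python 3 assumed). The make_paragraph helper and the "p" element are hypothetical choices for the example; the point is only that text placed in a text node is escaped on serialization, so user-entered data cannot inject markup.

```python
from xml.dom.minidom import getDOMImplementation


def make_paragraph(user_text):
    """Build a single <p> element whose content is arbitrary user text."""
    impl = getDOMImplementation()
    # createDocument(namespaceURI, qualifiedName, doctype)
    doc = impl.createDocument(None, "p", None)
    # The text node holds the raw string; escaping happens on output.
    doc.documentElement.appendChild(doc.createTextNode(user_text))
    return doc.documentElement.toxml()


print(make_paragraph("a < b & c"))  # prints <p>a &lt; b &amp; c</p>
```

The same text pasted into a string template would have to be escaped by hand at every call site.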