From: Steven D'Aprano on 29 Jan 2010 05:04

On Fri, 29 Jan 2010 11:23:54 +0200, Johann Spies wrote:

> On Thu, Jan 28, 2010 at 07:07:04AM -0800, evilweasel wrote:
>> Hi folks,
>>
>> I am a newbie to python, and I would be grateful if someone could point
>> out the mistake in my program. Basically, I have a huge text file
>> similar to the format below:
>>
>> AAAAAGACTCGAGTGCGCGGA 0
>> AAAAAGATAAGCTAATTAAGCTACTGG 0
>> AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
>> AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
>> AAAAAGGTCGCCTGACGGCTGC 0
>
> I know this is a python list but if you really want to get the job done
> quickly this is one method without writing python code:
>
> $ cat /tmp/y
> AAAAAGACTCGAGTGCGCGGA 0
> AAAAAGATAAGCTAATTAAGCTACTGG 0
> AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
> AAAAAGGTCGCCTGACGGCTGC 0
> $ grep -v 0 /tmp/y > tmp/z
> $ cat /tmp/z
> AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> AAAAAGGGGGCTCACAGGGGAGGGGTAT 1

That will do the wrong thing for lines like:

AAAAAGATAAGCTAATTAAGCTACTGGGTT 10

--
Steven
From: Johann Spies on 29 Jan 2010 08:21

On Fri, Jan 29, 2010 at 10:04:33AM +0000, Steven D'Aprano wrote:

> > I know this is a python list but if you really want to get the job done
> > quickly this is one method without writing python code:
> >
> > $ cat /tmp/y
> > AAAAAGACTCGAGTGCGCGGA 0
> > AAAAAGATAAGCTAATTAAGCTACTGG 0
> > AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> > AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
> > AAAAAGGTCGCCTGACGGCTGC 0
> > $ grep -v 0 /tmp/y > tmp/z
> > $ cat /tmp/z
> > AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> > AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
>
> That will do the wrong thing for lines like:
>
> AAAAAGATAAGCTAATTAAGCTACTGGGTT 10

In that case change the grep pattern to ' 0$'; then only the lines with a
single digit '0' at the end of the line will be excluded.

One can do the same using regular expressions in Python, and it will
probably be a lot slower on large files.

Regards
Johann
--
Johann Spies          Telefoon: 021-808 4599
Informasietegnologie, Universiteit van Stellenbosch

"My son, if sinners entice thee, consent thou not."
                                        Proverbs 1:10
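
For readers following the grep discussion, here is a minimal Python sketch of the difference between the two patterns. It is only an illustration: the sample list and the variable names are assumptions, not code from the thread, and it makes no claim about relative speed.

```python
import re

# Sample lines in the format from the thread, plus the "10" case
# Steven raised. The list itself is an illustrative assumption.
lines = [
    "AAAAAGACTCGAGTGCGCGGA 0",
    "AAAAAGATAAGCTAATTAAGCTACTGGGTT 1",
    "AAAAAGATAAGCTAATTAAGCTACTGGGTT 10",
]

# Rough equivalent of `grep -v 0`: drops every line that contains
# a "0" anywhere, so the line ending in 10 is lost as well.
naive = [ln for ln in lines if "0" not in ln]

# Rough equivalent of `grep -v ' 0$'`: drops only lines that end
# in a space followed by a lone 0.
zero_at_end = re.compile(r" 0$")
anchored = [ln for ln in lines if not zero_at_end.search(ln)]

print(naive)
print(anchored)
```

The anchored version keeps the line ending in 10, which the plain substring test throws away.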
From: D'Arcy J.M. Cain on 29 Jan 2010 09:32

On Fri, 29 Jan 2010 11:23:54 +0200
Johann Spies <jspies(a)sun.ac.za> wrote:

> I know this is a python list but if you really want to get the job
> done quickly this is one method without writing python code:
> [...]
> $ grep -v 0 /tmp/y > tmp/z

There's plenty of ways to do it without writing Python. C, C++, Perl,
Forth, Awk, BASIC, Intercal, etc. So what? Besides, your solution
doesn't work. You want "grep -vw 0 /tmp/y > tmp/z" and even then it
doesn't meet the requirements. It extracts the lines the OP wants but
doesn't reformat them. It also assumes a Unix system or at least
something with grep installed so it isn't portable.

If you want to see how the same task can be done in many different
languages see http://www.roesler-ac.de/wolfram/hello.htm.

--
D'Arcy J.M. Cain <darcy(a)druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
From: nn on 29 Jan 2010 09:41

Johann Spies wrote:

> On Thu, Jan 28, 2010 at 07:07:04AM -0800, evilweasel wrote:
> > Hi folks,
> >
> > I am a newbie to python, and I would be grateful if someone could
> > point out the mistake in my program. Basically, I have a huge text
> > file similar to the format below:
> >
> > AAAAAGACTCGAGTGCGCGGA 0
> > AAAAAGATAAGCTAATTAAGCTACTGG 0
> > AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> > AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
> > AAAAAGGTCGCCTGACGGCTGC 0
>
> I know this is a python list but if you really want to get the job
> done quickly this is one method without writing python code:
>
> $ cat /tmp/y
> AAAAAGACTCGAGTGCGCGGA 0
> AAAAAGATAAGCTAATTAAGCTACTGG 0
> AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
> AAAAAGGTCGCCTGACGGCTGC 0
> $ grep -v 0 /tmp/y > tmp/z
> $ cat /tmp/z
> AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
> AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
>
> Regards
> Johann
> --
> Johann Spies          Telefoon: 021-808 4599
> Informasietegnologie, Universiteit van Stellenbosch
>
> "My son, if sinners entice thee, consent thou not."
>                                         Proverbs 1:10

I would rather use awk for this:

awk 'NF==2 && $2!~/^0$/ {printf("seq%s\n%s\n",NR,$1)}' dnain.dat

but I think that is getting a bit off topic...
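
Since this is a Python list, the awk one-liner above could be approximated in Python roughly as follows. This is a sketch under assumptions: the function name `nonzero_records` and the in-memory sample are hypothetical, and the `seq<N>` label simply mirrors the awk `printf` format rather than any format the OP confirmed.

```python
def nonzero_records(lines):
    """Yield 'seq<N>' and the sequence for each two-field line
    whose second field is not exactly '0' (N is the line number)."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        # Mirrors awk's NF==2 && $2!~/^0$/
        if len(fields) == 2 and fields[1] != "0":
            yield f"seq{lineno}"
            yield fields[0]


# Illustrative input: the five sample lines quoted in the thread.
sample = [
    "AAAAAGACTCGAGTGCGCGGA 0",
    "AAAAAGATAAGCTAATTAAGCTACTGG 0",
    "AAAAAGATAAGCTAATTAAGCTACTGGGTT 1",
    "AAAAAGGGGGCTCACAGGGGAGGGGTAT 1",
    "AAAAAGGTCGCCTGACGGCTGC 0",
]

for out_line in nonzero_records(sample):
    print(out_line)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the list, so a huge file would be processed one line at a time.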
From: Aahz on 2 Feb 2010 18:07

In article <mailman.1551.1264701475.28905.python-list(a)python.org>,
D'Arcy J.M. Cain <darcy(a)druid.net> wrote:
>
>If you have a problem and you think that regular expressions are the
>solution then now you have two problems. Regex is really overkill for
>the OP's problem and it certainly doesn't improve readability.

If you're going to use a quote, it works better if you use the exact
quote and attribute it:

'Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.' --Jamie Zawinski

--
Aahz (aahz(a)pythoncraft.com)           <*>         http://www.pythoncraft.com/

import antigravity