From: Lee Sander on 3 Aug 2010 13:34

Hi,

Suppose I have a string such as this:

'aabccccccefggggghiiijkr'

I would like to print out all the positions that are flanked by a run of symbols. For example, I would like the output for the above input to be as follows:

2 b 1 aa
2 b -1 cccccc
10 e -1 cccccc
11 f 1 ggggg
17 h 1 iii
17 h -1 ggggg

where the first column is the position of interest, the second column is the entry at that position, and the third column is 1 if the following column refers to a run that comes after and -1 if the run comes before.

I can do this easily in the forward direction (shown below), but it is not clear to me how to do it backwards. I would really appreciate it if someone could help with this problem. I feel like a regex solution would be possible, but I am not too good with regex.

The code for forward is as follows:

def homopolymericSites(Seq):
    Seq=Seq.upper()
    i=0
    len_seq=len(Seq)-1 # hack to prevent boundary condition
    while i < len_seq:
        bi=Seq[i]
        k=1
        # go to the start of a homopolymer
        while 1:
            if i+k >= len_seq: break # no more sequence left
            if bi==Seq[i+k]:
                k+=1
            else:
                break
        if k>1: # homopolymer length
            i=i+k
            id_of_chr_which_proceeds_homopolymer=Seq[i] # note not i+1
            pos_of_chr_which_proceeds_homopolymer=i+1 # +1 to convert it to 1-index notation
            id_of_homopolymer=Seq[i-1]
            length_of_homopolymer=k
            print "%s\t%s/%s\t%s" %(pos_of_chr_which_proceeds_homopolymer,
                                    id_of_chr_which_proceeds_homopolymer,
                                    id_of_homopolymer,
                                    length_of_homopolymer)
        else:
            i+=1

From: Peter Otten on 3 Aug 2010 14:31

Lee Sander wrote:

> Hi,
> Suppose I have a string such as this
> 'aabccccccefggggghiiijkr'
>
> I would like to print out all the positions that are flanked by a run
> of symbols.
> So for example, I would like to the output for the above input as
> follows:
>
> 2 b 1 aa
> 2 b -1 cccccc
> 10 e -1 cccccc
> 11 f 1 ggggg
> 17 h 1 iii
> 17 h -1 ggggg
>
> where the first column is the position of interest, the next column is
> the entry at that position,
> 1 if the following column refers to a runs that come after and -1 if
> the runs come before

Trying to follow your spec I came up with

from itertools import groupby
from collections import namedtuple

Item = namedtuple("Item", "pos key size")

def compact(seq):
    pos = 0
    for key, group in groupby(seq):
        size = len(list(group))
        yield Item(pos, key, size)
        pos += size

def window(items):
    items = iter(items)
    prev = None
    cur = next(items)
    for nxt in items:
        yield prev, cur, nxt
        prev = cur
        cur = nxt
    yield prev, cur, None

items = compact("aabccccccefggggghiiijkr")
for prev, cur, nxt in window(items):
    if cur.size == 1:
        if prev is not None:
            if prev.size > 1:
                print cur.pos, cur.key, -1, prev.key*prev.size
        if nxt is not None:
            if nxt.size > 1:
                print cur.pos, cur.key, 1, nxt.key*nxt.size

However, this gives a slightly different output:

$ python homopolymers.py
2 b -1 aa
2 b 1 cccccc
9 e -1 cccccc
10 f 1 ggggg
16 h -1 ggggg
16 h 1 iii
20 j -1 iii

Peter
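
The original post asks whether a regex could do this. The following is only a minimal sketch of that idea, not code from either poster. It assumes 0-based positions and the sign convention of the reply above (-1 when the run comes before the character, 1 when it comes after), and it only reports characters that are not themselves part of a run. The names `RUN` and `flanked_positions` are made up for illustration.

```python
import re

# Matches a maximal run of two or more identical characters.
RUN = re.compile(r"(.)\1+", re.DOTALL)

def flanked_positions(seq):
    """Return sorted (pos, char, direction, run) tuples with 0-based positions.

    direction is -1 when the run comes before the character and 1 when it
    comes after (an assumed convention, matching the reply above).
    """
    hits = []
    for m in RUN.finditer(seq):
        run, start, end = m.group(), m.start(), m.end()
        before = start - 1
        # Character just before the run, provided it is not itself part of a run.
        if before >= 0 and (before == 0 or seq[before - 1] != seq[before]):
            hits.append((before, seq[before], 1, run))
        # Character just after the run, with the same single-character check.
        if end < len(seq) and (end + 1 == len(seq) or seq[end + 1] != seq[end]):
            hits.append((end, seq[end], -1, run))
    return sorted(hits)

for pos, char, direction, run in flanked_positions("aabccccccefggggghiiijkr"):
    print("%d %s %d %s" % (pos, char, direction, run))
```

Because each match already knows where its run starts and ends, the "backwards" case needs no separate scan: the character before the run is at `m.start() - 1` and the character after it is at `m.end()`. For 1-based positions, add 1 to `pos` when printing.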