From: Ashish Uthama on 2 Apr 2010 14:50

On Fri, 02 Apr 2010 13:10:31 -0300, ambrosia nightwish <mess_imen(a)yahoo.fr> wrote:

> If I have the sequence AACCGTTAACGT and I want to find the frequencies
> of all the dinucleotides (words of 2 letters, for example): at
> position 1 we read AA, so the frequency of appearance is freq=1/12; at
> the second position we read AC and freq=1/12; at the eighth position AA
> appears for the second time, so freq=2/12. The calculation stops at
> position N-len+1 (N: length of the sequence, len: length of the word).

s = 'AACCGTTAACGT';
wLen = 2;
% Associative array (hash, lookup table); please see the help
countMap = containers.Map();
for indx = 1:length(s)-wLen+1
    curWord = s(indx:indx+wLen-1);
    if isKey(countMap, curWord)
        % We have seen this word: increment its count
        countMap(curWord) = countMap(curWord) + 1;
    else
        countMap(curWord) = 1;
    end
end
words = countMap.keys;
frequency = countMap.values;
% Convert to an array
frequency = [frequency{:}];
prob = frequency ./ sum(frequency)

From: ambrosia nightwish on 3 Apr 2010 17:36

The problem still exists. The first solution shows the number of counted words and gives only a final result. What I want is the number of appearances of each word at every step I walk (incrementing the position by 1 and reading a word of length wl). Let us take the same example, s='AACCGTTAACGT', for the words:

AAC: n=1
ACC: n=1
CCG: n=1
CGT: n=1
GTT: n=1
TTA: n=1
TAA: n=1
AAC: n=2
ACG: n=1
CGT: n=2

As for the second solution, the containers.Map function does not exist in the MATLAB version that I have.
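
From: Bruno Luong on 4 Apr 2010 06:33

Something like this?

s = 'AACCGTTAACGT';
k = 3;
d = double(s);
% Each row of A is one k-letter word, stored as character codes
A = hankel(d(1:end-k+1), d(end-k+1:end));
% j(n) is the index of the distinct word found at position n
[u i j] = unique(A, 'rows');
b = zeros(length(i), 1); % total count so far for each distinct word
c = zeros(size(j));      % running count at each position
for n = 1:length(j)
    jn = j(n);
    b(jn) = b(jn) + 1;
    c(n) = b(jn);
end
S = char(A)
c

% Bruno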
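
The hankel call is the least obvious step in the code above: it builds a sliding-window matrix whose row r holds the character codes d(r), d(r+1), ..., d(r+k-1). The sketch below is not part of the original post; it rebuilds the same matrix with explicit indexing so the windowing is visible. The names N, idx and A2 are introduced here for illustration only.

```matlab
s = 'AACCGTTAACGT';
k = 3;
d = double(s);
N = length(d);

% Construction used in the post above
A = hankel(d(1:end-k+1), d(end-k+1:end));

% Explicit indexing: row r of idx is [r, r+1, ..., r+k-1]
idx = bsxfun(@plus, (1:N-k+1)', 0:k-1);  % (N-k+1)-by-k matrix of positions
A2 = reshape(d(idx), size(idx));         % character codes at those positions

sameMatrix = isequal(A, A2)  % the two constructions should agree
firstWords = char(A2(1:3,:)) % first three k-letter words
```

Either form yields one row per word, which is what lets unique(A,'rows') identify repeated words.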

From: ambrosia nightwish on 4 Apr 2010 07:11

That's working, Bruno. Thank you all.

From: Bruno Luong on 4 Apr 2010 09:38

% Here is a vectorized version (not necessarily faster)
% http://www.mathworks.com/matlabcentral/fileexchange/24255
s = 'AACCGTTAACGT';
k = 3;
d = double(s);
A = hankel(d(1:end-k+1), d(end-k+1:end));
[u i j] = unique(A, 'rows');
[js is] = sort(j);
clear c
c(is) = cell2mat(SplitVec(js, [], @(x) (1:length(x))')) % SplitVec on FEX

% Bruno
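
For readers who do not have the File Exchange SplitVec function, the same running counts can probably be obtained with built-in functions only. The sketch below is an illustrative variant, not Bruno's code: it relies on MATLAB's sort being stable (equal values keep their original order) and numbers the elements inside each group of equal words. The names starts, firstIdx and grp are introduced here.

```matlab
s = 'AACCGTTAACGT';
k = 3;
d = double(s);
A = hankel(d(1:end-k+1), d(end-k+1:end));
[u, i, j] = unique(A, 'rows');
j = j(:);                        % force a column vector

[js, is] = sort(j);              % stable sort: equal words stay in order of appearance
n = numel(js);
starts = [true; diff(js) ~= 0];  % true at the first element of each group
firstIdx = find(starts);         % sorted position where each group begins
grp = cumsum(starts);            % group number of each sorted element

c = zeros(n, 1);
c(is) = (1:n)' - firstIdx(grp) + 1;  % rank within group, mapped back to original order
c
```

For the example sequence, c is 2 at positions 8 and 10 (the second AAC and the second CGT) and 1 everywhere else, matching the loop version earlier in the thread.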