From: us on 2 Apr 2010 11:54
"ambrosia nightwish" <mess_imen(a)yahoo.fr> wrote in message <hp53f7$97n$1(a)fred.mathworks.com>...
> no not yet

....and the example(?)...

us
From: ambrosia nightwish on 2 Apr 2010 12:00
The FCGR toolbox of Jesús Mena-Chalco has an example.
From: ambrosia nightwish on 2 Apr 2010 12:10
Suppose I have the sequence AACCGTTAACGT and I want to find the frequencies of all the dinucleotides (words of 2 letters, for example). At position 1 we read AA, so the frequency of appearance is freq=1/12. At the second position we read AC, and freq=1/12. At the eighth position AA appears for the second time, so freq=2/12. The calculation stops at position N-len+1 (N: length of the sequence, len: length of the word).
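The running frequency described in this post can be sketched as below. This is only one reading of the description, not code from the FCGR toolbox: the variable names (`runFreq`, `seen`) are made up for illustration, and the denominator N = 12 is taken from the post even though only N-len+1 = 11 words are read.

```matlab
% Running frequency of each word at the position where it is read.
% Assumption: freq = (occurrences of the word so far) / N, as in the post.
s   = 'AACCGTTAACGT';
len = 2;                          % word length
N   = numel(s);
nWin = N - len + 1;               % last position at which a full word fits

seen    = containers.Map('KeyType', 'char', 'ValueType', 'double');
runFreq = zeros(nWin, 1);
for p = 1:nWin
    w = s(p:p+len-1);             % word starting at position p
    if isKey(seen, w)
        seen(w) = seen(w) + 1;    % word seen before: increment its count
    else
        seen(w) = 1;              % first occurrence of this word
    end
    runFreq(p) = seen(w) / N;
end
disp(runFreq.');
```

With this reading, `runFreq(1)` and `runFreq(2)` are 1/12 and `runFreq(8)` is 2/12, matching the values quoted in the post.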
From: Roger Stafford on 2 Apr 2010 14:06
"ambrosia nightwish" <mess_imen(a)yahoo.fr> wrote in message <hp54tn$2ii$1(a)fred.mathworks.com>...
> if i have the sequence: AACCGTTAACGT, and i want to find the frequencies of all the dinucleotides (word of 2 letters for example), at the position 1 we read AA so the frequency of appearence is freq=1/12 , at the second position w read AC and freq=1/12, at the eighth position AA appeared fo the second time so freq=2/12 ,the calcul is stopped at the position N-len+1 (N:length of the sequence, len: length of word)
--------------
Here's an outline of how you might go about it. Let v be a vector of length N with the nucleotide sequence - I am assuming they are represented by four numbers in this discussion - and k be the desired word length.

1) Create an (N-k+1)-by-k matrix, M, containing the successive length-k words. You can use the 'hankel' function for this purpose.

2) Apply [B,m,n] = unique(M,'rows') to M. B will be a table of all the words appearing in the sequence, in sorted order.

3) Apply 'histc' to the vector n to obtain the counts of the B words in the sequence.

4) From the counts you can obtain the frequencies.

Can you take it from there?

Roger Stafford
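The four steps of the outline above can be sketched roughly as follows. The details are assumptions rather than Roger's own code: the mapping A,C,G,T to 1..4 via `ismember`, the use of `hankel` to build window indices instead of the words directly, and the choice to divide the counts by the number of words, N-k+1.

```matlab
% Sketch of the hankel / unique / histc outline (illustrative names only).
s = 'AACCGTTAACGT';
k = 2;                                   % word length
letters = 'ACGT';
[tf, v] = ismember(s, letters);          % assumed encoding: A=1, C=2, G=3, T=4
N = numel(v);

% 1) (N-k+1)-by-k matrix of successive words; row i holds v(i:i+k-1).
idx = hankel(1:N-k+1, N-k+1:N);          % index matrix, row i = i:i+k-1
M   = reshape(v(idx), [], k);

% 2) Distinct words in sorted order; n maps each row of M to a row of B.
[B, m, n] = unique(M, 'rows');

% 3) Count how often each distinct word occurs.
counts = histc(n(:), 1:size(B, 1));

% 4) Convert the counts to frequencies over the N-k+1 words read.
freq  = counts / (N - k + 1);
words = letters(B);                      % back to characters for display
for i = 1:size(B, 1)
    fprintf('%s  %d  %.4f\n', words(i, :), counts(i), freq(i));
end
```

In newer MATLAB releases `histc` is discouraged; `accumarray(n(:), 1)` should give the same counts here.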
From: us on 2 Apr 2010 14:25
"ambrosia nightwish" <mess_imen(a)yahoo.fr> wrote in message <hp54tn$2ii$1(a)fred.mathworks.com>...
> if i have the sequence: AACCGTTAACGT, and i want to find the frequencies of all the dinucleotides (word of 2 letters for example), at the position 1 we read AA so the frequency of appearence is freq=1/12 , at the second position w read AC and freq=1/12, at the eighth position AA appeared fo the second time so freq=2/12 ,the calcul is stopped at the position N-len+1 (N:length of the sequence, len: length of word)

one of the many solutions

% the data
s='AACCGTTAACGT';
wl=2;                                   % word length
% the engine
rpat=sprintf('\\S{%d,%d}',wl,wl);       % pattern: exactly wl non-space chars
t=cell(wl,1);
for i=1:wl
    % non-overlapping words starting at offset i; the wl offsets
    % together cover every overlapping word in s
    t{i,1}=regexp(s(i:end),rpat,'match').';
end
t=cat(1,t{:});                          % all words as one column cell array
[tu,~,ix]=unique(t);                    % tu: sorted distinct words, ix: word index
n=histc(ix,1:max(ix));                  % occurrences of each distinct word
r=[tu,num2cell(n)];
% the result
disp(s);
disp(r);
%{
% S = AACCGTTAACGT
% R =
    'AA'    [2]
    'AC'    [2]
    'CC'    [1]
    'CG'    [2]
    'GT'    [2]
    'TA'    [1]
    'TT'    [1]
%}

us