From: Ashish Uthama on
On Fri, 02 Apr 2010 13:10:31 -0300, ambrosia nightwish
<mess_imen(a)yahoo.fr> wrote:

> If I have the sequence AACCGTTAACGT and I want to find the frequencies
> of all the dinucleotides (words of 2 letters, for example): at
> position 1 we read AA, so the frequency of appearance is freq=1/12; at
> the second position we read AC and freq=1/12; at the eighth position AA
> appears for the second time, so freq=2/12. The calculation stops at
> position N-len+1 (N: length of the sequence, len: length of the word).


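s = 'AACCGTTAACGT';
wLen = 2;

% Associative array (hash, lookup table); see "help containers.Map"
countMap = containers.Map();

% Slide a window of length wLen along the sequence, one position at a time
for indx = 1:length(s)-wLen+1
    curWord = s(indx:indx+wLen-1);
    if isKey(countMap, curWord)
        % We have seen this word before: increment its count
        countMap(curWord) = countMap(curWord) + 1;
    else
        % First occurrence of this word
        countMap(curWord) = 1;
    end
end

words = countMap.keys;

frequency = countMap.values;
% Convert the cell array of counts to a numeric array
frequency = [frequency{:}];

% Relative frequency of each word, in the same order as "words"
prob = frequency./sum(frequency)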

From: ambrosia nightwish on
The problem still exists. The first solution shows the number of counted words and gives only a final result. What I want is the number of appearances of each word at every step I walk (advancing by 1 and reading a word of length wl). Let us take the same example, s='AACCGTTAACGT', with wl=3,
for the words:
AAC: n=1
ACC: n=1
CCG: n=1
CGT: n=1
GTT: n=1
TTA: n=1
TAA: n=1
AAC: n=2
ACG: n=1
CGT: n=2
As for the second solution, the containers.Map function doesn't exist in the MATLAB version that I have.
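For a MATLAB release without containers.Map, the final dinucleotide counts from the first post could probably be obtained with unique and accumarray instead. What follows is only a minimal sketch, not code from any of the replies: the names idx, W, dummy, counts and prob are assumptions chosen for illustration, and it gives the final totals rather than the step-by-step counts asked for above.

```matlab
s = 'AACCGTTAACGT';
wLen = 2;

% Number of windows of length wLen that fit in s
n = length(s) - wLen + 1;

% n-by-wLen index matrix: row r holds the positions r, r+1, ..., r+wLen-1
idx = repmat((1:n)', 1, wLen) + repmat(0:wLen-1, n, 1);

% Each row of W is one word read from s
W = s(idx);

% words: sorted unique rows; j: for each window, its row number in words
[words, dummy, j] = unique(W, 'rows');

% Count how many windows map to each unique word
counts = accumarray(j(:), 1);

% Relative frequency over the n windows (assumed normalisation)
prob = counts / sum(counts);
```

Note that prob is divided by the number of windows, N-len+1 = 11, and not by the sequence length N = 12 used in the original question.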
From: Bruno Luong on
Something like this?

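s = 'AACCGTTAACGT';
k = 3;

d = double(s);
% Each row of A is one length-k window of s
A = hankel(d(1:end-k+1),d(end-k+1:end));
% j(n) is the index of the n-th window among the unique rows u
[u i j] = unique(A,'rows');
b = zeros(length(i),1); % running count per unique word
c = zeros(size(j));     % count of the n-th word seen so far
for n=1:length(j)
    jn = j(n);
    b(jn) = b(jn)+1;
    c(n) = b(jn);
end

S = char(A)
c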

% Bruno
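The hankel call is what builds the sliding windows: given the first column and the last row, it fills a matrix whose anti-diagonals are constant, so row n holds elements n to n+k-1. A small sketch on a made-up numeric vector (the vector 1:6 is an assumption for illustration, not data from the thread) shows the layout:

```matlab
d = 1:6;   % hypothetical input, stands in for double(s)
k = 3;     % window length

% First column: start of every window; last row: the final window
A = hankel(d(1:end-k+1), d(end-k+1:end));

% Expected layout: each row is the previous one shifted by one position
expected = [1 2 3; 2 3 4; 3 4 5; 4 5 6];
disp(isequal(A, expected))
```

The last element of the first argument and the first element of the second are the same entry of d, which is what hankel expects.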
From: ambrosia nightwish on
That's working Bruno, thank you all
From: Bruno Luong on
% Here is a vectorized version (not necessarily faster)
% http://www.mathworks.com/matlabcentral/fileexchange/24255

s = 'AACCGTTAACGT';
k = 3;

d = double(s);
A = hankel(d(1:end-k+1),d(end-k+1:end));
[u i j] = unique(A,'rows');
[js is]=sort(j);
clear c
c(is) = cell2mat(SplitVec(js,[],@(x) (1:length(x))')) % SplitVec on FEX

% Bruno
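If SplitVec from the File Exchange is not at hand, the same running count can probably be obtained with sort, diff and cumsum alone. This is a minimal sketch and not Bruno's code: it relies on sort keeping equal elements in their original order, and the names dummy, starts, pos, grpStart and firstIdx are made up for illustration.

```matlab
s = 'AACCGTTAACGT';
k = 3;

d = double(s);
A = hankel(d(1:end-k+1), d(end-k+1:end));   % rows are the length-k windows
[u, dummy, j] = unique(A, 'rows');          % j: group number of each window
j = j(:);

% Sort by group; equal groups stay in order of appearance
[js, is] = sort(j);

% Mark the first element of each group in the sorted list
starts = [true; diff(js) ~= 0];
pos = (1:numel(js))';
grpStart = pos(starts);                 % sorted position where each group begins
firstIdx = grpStart(cumsum(starts));    % group start for every sorted element

% Rank within the group, written back to the original window order
c = zeros(size(j));
c(is) = pos - firstIdx + 1;
```

For this sequence, c should be 1 everywhere except at windows 8 (AAC) and 10 (CGT), where it is 2.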