From: Misha Koshelev on
Dear Sirs or Madams:

I am trying to find and/or implement the Quality Threshold clustering algorithm in MATLAB:
http://en.wikipedia.org/wiki/Cluster_analysis#QT_clustering_algorithm

Actually, any clustering algorithm that results in the same clustering given initial inputs is good for me.

Something easy & fast would be best.

Any tips much appreciated.

Thank you!

Sincerely yours
Misha Koshelev
From: misha680 on
Here is what I have found so far:
R has a version
http://rss.acs.unt.edu/Rdoc/library/flexclust/html/qtclust.html

Original paper:
http://genome.cshlp.org/content/9/11/1106.full
with this step-by-step diagram
http://genome.cshlp.org/content/9/11/1106/F5.large.jpg

Perl implementation:
http://stackoverflow.com/questions/1973109/how-can-i-implement-qt-quality-threshold-clustering-in-perl

At the moment I am a little confused on how to compute a cluster
diameter...?

Thank you
Sincerely yours
Misha Koshelev

On Jan 19, 5:23 pm, "Misha Koshelev" <mk144...(a)bcm.edu> wrote:
> Dear Sirs or Madams:
>
> I am trying to find and/or implement the Quality Threshold clustering algorithm in MATLAB:http://en.wikipedia.org/wiki/Cluster_analysis#QT_clustering_algorithm
>
> Actually, any clustering algorithm that results in the same clustering given initial inputs is good for me.
>
> Something easy & fast would be best.
>
> Any tips much appreciated.
>
> Thank you!
>
> Sincerely yours
> Misha Koshelev

From: Misha Koshelev on
misha680 <misha680(a)gmail.com> wrote in message <6b2d2873-b332-4160-87a3-af2bc7cbd471(a)34g2000yqp.googlegroups.com>...
> Here is what I have found so far:
> R has a version
> http://rss.acs.unt.edu/Rdoc/library/flexclust/html/qtclust.html
>
> Original paper:
> http://genome.cshlp.org/content/9/11/1106.full
> with this step-by-step diagram
> http://genome.cshlp.org/content/9/11/1106/F5.large.jpg
>
> Perl implementation:
> http://stackoverflow.com/questions/1973109/how-can-i-implement-qt-quality-threshold-clustering-in-perl
>
> At the moment I am a little confused on how to compute a cluster
> diameter...?
>
> Thank you
> Sincerely yours
> Misha Koshelev
>
> On Jan 19, 5:23 pm, "Misha Koshelev" <mk144...(a)bcm.edu> wrote:
> > Dear Sirs or Madams:
> >
> > I am trying to find and/or implement the Quality Threshold clustering algorithm in MATLAB:http://en.wikipedia.org/wiki/Cluster_analysis#QT_clustering_algorithm
> >
> > Actually, any clustering algorithm that results in the same clustering given initial inputs is good for me.
> >
> > Something easy & fast would be best.
> >
> > Any tips much appreciated.
> >
> > Thank you!
> >
> > Sincerely yours
> > Misha Koshelev

Ok here's my implementation using Euclidean distance:
function idx=qtclusteuclid(G,d,D)
% QT clustering algorithm as described in:
%
% Heyer, L. J., Kruglyak, S., Yooseph, S. (1999). Exploring expression
% data: Identification and analysis of coexpressed genes. Genome Research
% 9, 1106&#8211;1115.
%
% http://genome.cshlp.org/content/9/11/1106.full
% http://genome.cshlp.org/content/9/11/1106/F5.large.jpg
%
% if two sets A{i} have same cardinality, we pick one with smallest diameter
% our distance measure is Euclidean distance
%
% input:
% G-nxp data to cluster
% d-diameter threshold
% D-Euclidean distance for all 0<i<j<=n
%
% output:
% idx-nx1 vector containing cluster indices
%
% Misha Koshelev
% January 20th, 2009
% Montague Laboratory

n=size(G,1);
if n<=1
idx=ones(n,1);
return;
end
if nargin<3
D=Inf*ones(n,n);
for i=1:n
D(i,i)=0;
for j=i+1:n
D(i,j)=sqrt(sum((G(i,:)-G(j,:)).^2));D(j,i)=D(i,j);
end
end
end

C=[];Ccard=0;Cdiam=0;
for i=1:n
flag=true;
A=[i];Acard=1;Adiam=0;
while flag&&length(A)<n
pts=1:n;pts(A)=[];
jdiam=zeros(length(pts),1);
for pidx=1:length(pts)
% We only need to compute maximal distance from new point j to all
% existing points in cluster
jdiam(pidx)=max(D(pts(pidx),A));
end
[minjdiam,pidx]=min(jdiam);j=pts(pidx);
if sum(jdiam==minjdiam)>1
dbstack;keyboard;
end

if max(Adiam,minjdiam)>d
flag=false;
else
A=[A,j];
Acard=Acard+1;
Adiam=max(Adiam,minjdiam);
end
end

A=sort(A);
if Acard>Ccard
C=A;
Ccard=Acard;
Cdiam=Adiam;
end
end

idx=ones(n,1);
GmC=1:n;GmC(C)=[];
idx(GmC)=qtclusteuclid(G(GmC,:),d,D(GmC,GmC))+1;

function d=diam(G,clust)
% http://thesaurus.maths.org/mmkb/entry.html?action=entryByConcept&id=3279
% The largest distance between any two points in a set is called the set's diameter.
%
% input:
% G-nxp data for our cluster
% clust-vector of row indices into G
%

dsqr=0;
for k=1:length(clust)
for k2=k:length(clust)
dsqr=max(dsqr,sum((G(clust(k),:)-G(clust(k2),:)).^2));
end
end
d=sqrt(dsqr);

From: Misha Koshelev on
Oops should be this sorry :(

function idx=qtclusteuclid(G,d,D)
% QT clustering algorithm as described in:
%
% Heyer, L. J., Kruglyak, S., Yooseph, S. (1999). Exploring expression
% data: Identification and analysis of coexpressed genes. Genome Research
% 9, 1106&#8211;1115.
%
% http://genome.cshlp.org/content/9/11/1106.full
% http://genome.cshlp.org/content/9/11/1106/F5.large.jpg
%
% if two sets A{i} have same cardinality, we pick first one
% our distance measure is Euclidean distance
%
% input:
% G-nxp data to cluster
% d-diameter threshold
% D-Euclidean distance for all 0<i<j<=n
%
% output:
% idx-nx1 vector containing cluster indices
%
% Misha Koshelev
% January 20th, 2009
% Montague Laboratory

n=size(G,1);
if n<=1
idx=ones(n,1);
return;
end
if nargin<3
D=Inf*ones(n,n);
for i=1:n
D(i,i)=0;
for j=i+1:n
D(i,j)=sqrt(sum((G(i,:)-G(j,:)).^2));D(j,i)=D(i,j);
end
end
end

C=[];Ccard=0;Cdiam=0;
for i=1:n
flag=true;
A=[i];Acard=1;Adiam=0;
while flag&&length(A)<n
pts=1:n;pts(A)=[];
jdiam=zeros(length(pts),1);
for pidx=1:length(pts)
% We only need to compute maximal distance from new point j to all
% existing points in cluster
jdiam(pidx)=max(D(pts(pidx),A));
end
[minjdiam,pidx]=min(jdiam);j=pts(pidx);
if sum(jdiam==minjdiam)>1
dbstack;keyboard;
end

if max(Adiam,minjdiam)>d
flag=false;
else
A=[A,j];
Acard=Acard+1;
Adiam=max(Adiam,minjdiam);
end
end

A=sort(A);
if Acard>Ccard
C=A;
Ccard=Acard;
Cdiam=Adiam;
end
end

idx=ones(n,1);
GmC=1:n;GmC(C)=[];
idx(GmC)=qtclusteuclid(G(GmC,:),d,D(GmC,GmC))+1;

function d=diam(G,clust)
% http://thesaurus.maths.org/mmkb/entry.html?action=entryByConcept&id=3279
% The largest distance between any two points in a set is called the set's diameter.
%
% input:
% G-nxp data for our cluster
% clust-vector of row indices into G
%

dsqr=0;
for k=1:length(clust)
for k2=k:length(clust)
dsqr=max(dsqr,sum((G(clust(k),:)-G(clust(k2),:)).^2));
end
end
d=sqrt(dsqr);

From: Misha Koshelev on
Actually here is three different versions:
http://people.hnl.bcm.edu/misha/tmp/qtclust.zip

I posted to File Exchange. Will take above link down at some point.

QT clustering algorithm as described in:

Heyer, L. J., Kruglyak, S., Yooseph, S. (1999). Exploring expression data: Identification and analysis of coexpressed genes. Genome Research 9, 1106&#8211;1115.

http://genome.cshlp.org/content/9/11/1106.full
http://genome.cshlp.org/content/9/11/1106/F5.large.jpg

qtclusteuclid.m - use Euclidean distance as measure
qtclustjacknife.m - use Jackknife Correlation (see article above)
qtclustmod.m - use Euclidean distance, modified to always return 2 clusters.

Misha