From: Soumya on
Hi all,

Can anybody tell me how to generate the banana-shaped dataset used for clustering purposes? I'm working on a combination of clustering algorithms. If possible, please provide the source code or the mathematical function with which I can generate this dataset. Thanks.
From: Soumya on
%GENDATB Generation of banana shaped classes
%
% A = GENDATB(N,S)
%
% INPUT
% N number of generated samples, or vector with
% number of samples per class
% S standard deviation of the normal distribution (opt, def: s=1)
%
% OUTPUT
% A generated dataset
%
% DESCRIPTION
% Generation of a 2-dimensional 2-class dataset A of N objects with a
% banana shaped distribution. The data is uniformly distributed along the
% bananas and is superimposed with a normal distribution with standard
% deviation S in all directions. Class priors are P(1) = P(2) = 0.5.
% Defaults: N = [50,50], S = 1.
%
% SEE ALSO
% DATASETS, PRDATASETS

% Copyright: A. Hoekstra, R.P.W. Duin, duin(a)ph.tn.tudelft.nl
% Faculty of Applied Sciences, Delft University of Technology
% P.O. Box 5046, 2600 GA Delft, The Netherlands

% $Id: gendatb.m,v 1.3 2003/07/21 09:27:21 davidt Exp $

function a = gendatb(N,s)

prtrace(mfilename);

if nargin < 1, N = [50,50]; end
if nargin < 2, s = 1; end

% Default size of the banana:
r = 5;
% Default class prior probabilities:
p = [0.5 0.5];
N = genclass(N,p);

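% Class 1: angles drawn uniformly from [0.125*pi, 1.375*pi] and mapped onto
% an arc of radius r, plus Gaussian noise with standard deviation s.
domaina = 0.125*pi + rand(1,N(1))*1.25*pi;
a = [r*sin(domaina') r*cos(domaina')] + randn(N(1),2)*s;

% Class 2: angles drawn uniformly from [-0.875*pi, 0.375*pi], same radius
% and noise, with the arc shifted by (-0.75*r, -0.75*r).
domainb = 0.375*pi - rand(1,N(2))*1.25*pi;
a = [a; [r*sin(domainb') r*cos(domainb')] + randn(N(2),2)*s + ...
ones(N(2),1)*[-0.75*r -0.75*r]];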
lab = genlab(N);

a = dataset(a,lab,'name','Banana Set','lablist',genlab([1;1]),'prior',p);

return


I have got this code. I'm new to MATLAB. Can anyone tell me how to run this in MATLAB?
From: David Young on
Can you say a little more exactly what you mean by a banana-shaped dataset? Lots of functions could be thought of as producing something roughly banana-shaped, but if this is a standard term in clustering studies maybe you could post a link to a page that describes the idea in a little more detail, or perhaps you could just give some more information yourself. As it is, a short enough section of *any* non-linear polynomial could be regarded as "banana shaped" - so you could just use y = x*x for example.
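To make the y = x*x suggestion concrete, here is a minimal sketch of a noisy parabola segment. The point count n, the noise level s and the variable name pts are arbitrary choices for illustration, and this is not necessarily the distribution meant in the clustering literature:

```matlab
% Hypothetical sketch: a noisy parabola segment as a crude "banana".
n = 200;                          % number of points (arbitrary choice)
s = 0.1;                          % noise standard deviation (arbitrary choice)
x = 2*rand(n,1) - 1;              % x drawn uniformly from [-1, 1]
pts = [x, x.^2] + s*randn(n,2);   % points on y = x^2 plus Gaussian noise
plot(pts(:,1), pts(:,2), '.');
axis equal
```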

Taking your question over-literally, I can't resist pointing out that the shape of bananas is a hot potato in the UK, making front-page news in newspapers that like to attack European legislation such as this:

http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31994R2257:EN:HTML

Contrary to what the newspapers seemed to think, the EU legislation did not define "banana shaped", so it does not provide a solution to our problem.
From: Oleg Komarov on
"Soumya " <soumya.nsec(a)gmail.com> wrote in message <husnvd$nih$1(a)fred.mathworks.com>...
> I have got this code. I'm new to MATLAB. Can anyone tell me how to run this in MATLAB?

Save the code as gendatb.m, then type at the command prompt or in a new m-file:
A = gendatb();
or
A = gendatb(10,7.8);

where you can choose N and S according to your needs.
The help is pretty clear, though.
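Note that gendatb comes from the PRTools toolbox and calls other PRTools functions (prtrace, genclass, genlab, dataset), so PRTools has to be on the MATLAB path for the calls above to work. If you only need the raw points, a minimal sketch of the same geometry in plain MATLAB might look like the following. The variable names n1, n2, X1, X2, X and labels are my own and not part of PRTools, and the result is a plain matrix rather than a PRTools dataset object:

```matlab
% Sketch: banana-shaped two-class data without PRTools.
% All variable names here are chosen for this example only.
n1 = 50; n2 = 50;   % samples per class (the gendatb defaults)
s  = 1;             % standard deviation of the Gaussian noise
r  = 5;             % radius of the arcs, as in gendatb

% Class 1: arc centred on the origin.
t1 = 0.125*pi + rand(n1,1)*1.25*pi;
X1 = [r*sin(t1), r*cos(t1)] + s*randn(n1,2);

% Class 2: arc shifted by (-0.75*r, -0.75*r).
t2 = 0.375*pi - rand(n2,1)*1.25*pi;
X2 = [r*sin(t2), r*cos(t2)] + s*randn(n2,2) ...
     + repmat([-0.75*r, -0.75*r], n2, 1);

X = [X1; X2];                            % (n1+n2)-by-2 data matrix
labels = [ones(n1,1); 2*ones(n2,1)];     % class labels 1 and 2

% Quick visual check of the two classes.
plot(X1(:,1), X1(:,2), 'b.', X2(:,1), X2(:,2), 'r.');
axis equal
```

Smaller values of s give thinner, more clearly separated bananas; s = 1 makes the two classes overlap noticeably.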

Oleg
From: Soumya on
"David Young" <d.s.young.notthisbit(a)sussex.ac.uk> wrote in message <huso1c$rdb$1(a)fred.mathworks.com>...
> Can you say a little more exactly what you mean by a banana-shaped dataset? Lots of functions could be thought of as producing something roughly banana-shaped, but if this is a standard term in clustering studies maybe you could post a link to a page that describes the idea in a little more detail, or perhaps you could just give some more information yourself. As it is, a short enough section of *any* non-linear polynomial could be regarded as "banana shaped" - so you could just use y = x*x for example.


http://nichol.as/papers/labreport3.pdf

Thanks for helping out. See the PDF: on the 3rd page there is a picture of the banana distribution. I think this will clarify the issue. I need a dataset with the patterns shown in that distribution.