From: Mohammad A. Mezher on
Hi,

I am not sure how to use the perfcurve function in MATLAB. It takes several input parameters:
[X,Y] = perfcurve(labels,scores,posclass);

What are the labels?
What are the scores?

I have read the documentation but cannot work out what data I should pass as labels and as scores.


I would appreciate any clarification.

Thanks
From: Sadik on
Hi Mohammad,

The following example from the documentation is very illustrative. I will annotate it so that it is easier to follow:

1. load fisheriris
% A dataset that ships with MATLAB: Fisher's iris data. There are three species of iris flower: setosa, versicolor and virginica [these names are in the variable species], with 50 samples per species. The first fifty samples are setosa, the second fifty are versicolor and the last fifty are virginica.
2. x = meas(51:end,1:2);
% If you load the data, you will see that meas is a 150x4 matrix: 150 samples with 4 features per sample. x = meas(51:end,1:2) selects the samples belonging to versicolor and virginica and keeps only the first 2 of the 4 features.
3. y = (1:100)'>50;
% versicolor=0, virginica=1
% 50 zeros followed by 50 ones. This means versicolor is represented by zeros and virginica by ones in the GLM.
4. b = glmfit(x,y,'binomial');
% Fit the logistic regression model and obtain its coefficients.
5. p = glmval(b,x,'logit');
% Using these coefficients, compute the output of the classifier: the estimated probability that each sample is virginica. This is what goes into "scores" in perfcurve.
6. [X,Y,T,AUC] = perfcurve(species(51:end,:),p,'virginica');
% Now it is easy to see that "labels" is nothing but the list of true labels in your data set. Since the reduced dataset has 50 versicolor and 50 virginica samples, "labels" is a cell array whose first 50 elements are the string 'versicolor' and whose last 50 are 'virginica'.
% The last input to perfcurve, "posclass", is the label of the positive class. In line 3. above we assigned 1 to the second fifty samples, which are virginica. Therefore, the label of the positive class is 'virginica'.
plot(X,Y)
xlabel('False positive rate'); ylabel('True positive rate')
title('ROC for classification by logistic regression')

Best.
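
To make the meaning of "labels" and "scores" concrete, here is a minimal sketch on made-up data. The six labels and scores below are invented purely for illustration and do not come from the documentation; the only requirement is one true label and one classifier score per sample.

```matlab
% Toy data, invented purely for illustration (not from the documentation).
labels   = [0; 0; 1; 0; 1; 1];                    % true class of each sample
scores   = [0.10; 0.40; 0.35; 0.80; 0.70; 0.90];  % classifier output, higher = more likely class 1
posclass = 1;                                     % which label counts as "positive"

% X = false positive rate, Y = true positive rate, T = score thresholds
[X, Y, T, AUC] = perfcurve(labels, scores, posclass);

plot(X, Y)
xlabel('False positive rate'); ylabel('True positive rate')
```

In this toy set, 6 of the 9 possible (positive, negative) pairs have the positive sample scored higher than the negative one, so AUC comes out as 6/9.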
From: Mohammad A. Mezher on
"Sadik " <sadik.hava(a)gmail.com> wrote in message <hqvsqd$i5d$1(a)fred.mathworks.com>...

Thank you, Sadik, for your reply.

I think I am missing something here: your example does not split the data into training and testing sets. For example:

load ionodata % ionosphere dataset: A holds the data and groups holds the labels
indices = crossvalind('Kfold',groups,3);
i = 1; % fold held out for testing (1, 2 or 3)
test = (indices == i); train = ~test;
svmStruct = svmtrain(A(train,:),groups(train)); % index rows, keep all feature columns
classes = svmclassify(svmStruct,A(test,:));

This is where I am stuck: what data should I pass to the perfcurve function?
pos = 0; % label of the positive class
[X,Y,T,AUC] = perfcurve(labels,scores,pos);

Should the labels and scores (confidences) come from the training data, from the test data, or from the whole dataset?

Thank you in advance.
From: Sadik on
You should use the true labels of the testing data and the scores that the classifier produces on the testing data.

Best.
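
As a rough sketch of that advice, the logistic regression example from earlier in the thread can be fitted on a training subset and scored on a held-out subset. The 70/30 hold-out split, the random seed and the variable names trainIdx and testIdx are assumptions made for illustration; cvpartition and rng are assumed to be available in your release.

```matlab
load fisheriris
x      = meas(51:end,1:2);                 % versicolor and virginica, 2 features
labels = species(51:end);                  % true labels as a cell array of strings
y      = double(strcmp(labels,'virginica')); % versicolor=0, virginica=1

rng(1);                                    % fixed seed so the split is repeatable
c        = cvpartition(labels,'HoldOut',0.3); % assumed 70/30 split, stratified by class
trainIdx = training(c);
testIdx  = test(c);

% Fit on the training samples only
b = glmfit(x(trainIdx,:), y(trainIdx), 'binomial');

% Score the held-out samples only
pTest = glmval(b, x(testIdx,:), 'logit');

% Labels and scores both come from the test set
[X, Y, T, AUC] = perfcurve(labels(testIdx), pTest, 'virginica');
```

Passing training-set labels and scores instead would measure how well the model fits data it has already seen, which usually gives an optimistic ROC curve.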
From: Mohammad A. Mezher on
"Sadik " <sadik.hava(a)gmail.com> wrote in message <hr17mt$ad4$1(a)fred.mathworks.com>...


Thank you for your reply. Do you know of any function that computes the score (the confidence) for the testing data? I have the following equation:
d(x) = sum K(xi,x)*(alphai.*yi) - 0.5*sum K(xi,sv)*(alphai.*yi)

where
xi = training data
yi = labels of the training data
x = testing data
K() = kernel function
sv = support vector
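
One possible way to get such scores without coding the equation by hand is sketched below. It assumes a newer Statistics and Machine Learning Toolbox release in which fitcsvm and predict are available (they are not the svmtrain/svmclassify functions used earlier in the thread), and it uses the ionosphere sample data shipped with that toolbox (variables X and Y) rather than the ionodata file mentioned above. The RBF kernel, the seed and the choice of 'g' as the positive class are illustrative assumptions.

```matlab
load ionosphere                      % X: features, Y: cell array of 'b' / 'g' labels
rng(1);                              % fixed seed so the folds are repeatable
c        = cvpartition(Y,'KFold',3);
trainIdx = training(c,1);            % use fold 1 as the test fold
testIdx  = test(c,1);

% Train an SVM on the training rows only (RBF kernel chosen for illustration)
mdl = fitcsvm(X(trainIdx,:), Y(trainIdx), ...
    'KernelFunction','rbf', 'KernelScale','auto');

% The second output of predict holds one score column per class,
% in the order given by mdl.ClassNames
[~, score] = predict(mdl, X(testIdx,:));
posCol = strcmp(mdl.ClassNames, 'g');

% Test-set labels and test-set scores for the positive class
[fpr, tpr, thr, AUC] = perfcurve(Y(testIdx), score(:,posCol), 'g');
```

The score column for the positive class plays the role of d(x) here: larger values mean the sample lies further on the positive side of the decision boundary.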