From: Mohammad on
Hi,

Please look at the following AUROC function. The idea is to use it with a
three-fold cross-validation procedure.

The AUROC function is:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function auc = AUROC
for i = 1:3   % 3-fold cross-validation
    test = (indices == i); train = ~test;
    svmStruct = svmtrain(A(train,:),groups(train), ...
        'Kernel_Function',Kernel_FunctionValue,'Method',MethodValue);
    ytest = d(test);
    [classes,ypred] = svmclassify(svmStruct,A(test,:));
    classperf(cp,classes,test);
    [AUC(:,i),tpr(:,i),fpr(:,i)] = rocplot(ypred,ytest);
end

%%%% To compute the area under the ROC curve
[auc,indx] = max(AUC);
TP(:,1) = tpr(:,indx);
FP(:,1) = fpr(:,indx);

end % end AUROC
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Is this three-fold cross-validation AUROC function right or wrong? I am
returning the maximum AUC value.

Thank you in advance.
From: Ilya Narsky on

"Mohammad" <mohabedalgani(a)yahoo.com> wrote in message
news:73565061.477272.1270035717113.JavaMail.root(a)gallium.mathforum.org...
> [quoted text snipped]

Normally, the goal of a cross-validation analysis is to obtain an estimate
averaged over the cross-validation folds and, optionally, a statistical error
for that estimate. I don't know what you are trying to accomplish by taking the
maximal value across the folds instead of computing the average. I don't know
how you obtained "indices". I don't know what "rocplot" is. I don't know which
svmclassify you are using, since svmclassify in Matlab does not return a second
output.
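
If an average is what you need, a minimal sketch in your own variables
(assuming rocplot returns one scalar AUC per fold, so AUC ends up 1-by-3)
would be:

meanAUC = mean(AUC);                  % average AUC over the 3 folds
seAUC = std(AUC)/sqrt(numel(AUC));    % rough standard error of that average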

If you have Statistics Toolbox, perhaps this is what you want (substitute
whatever you like for classregtree training and prediction):

load ionosphere;
N = size(X,1);
cv = cvpartition(Y,'kfold',3);
score = zeros(N,1);

% Loop over folds
for k=1:3
    % Get training and test indices
    istrain = training(cv,k);
    istest = test(cv,k);

    % Train
    t = classregtree(X(istrain,:),Y(istrain));

    % Get predicted class probability
    [~,node] = eval(t,X(istest,:));
    probs = classprob(t,node);

    % Extract probability for the right class
    [~,pos] = ismember('g',t.classname);
    score(istest) = probs(:,pos);
end

[fpr,tpr,~,auc] = perfcurve(Y,score,'g');
plot(fpr,tpr);


If you have Matlab 10a, you can estimate confidence bounds for the ROC curve
using perfcurve:

load ionosphere;
cv = cvpartition(Y,'kfold',3);
YPerFold = cell(3,1);
scorePerFold = cell(3,1);
for k=1:3
    istrain = training(cv,k);
    istest = test(cv,k);
    t = classregtree(X(istrain,:),Y(istrain));
    [~,node] = eval(t,X(istest,:));
    probs = classprob(t,node);
    [~,pos] = ismember('g',t.classname);
    YPerFold{k} = Y(istest);
    scorePerFold{k} = probs(:,pos);
end
[fpr,tpr,~,auc] = perfcurve(YPerFold,scorePerFold,'g','xvals','all');
errorbar(fpr,tpr(:,1),tpr(:,2)-tpr(:,1),tpr(:,3)-tpr(:,1)); % plot with errors

I would also recommend using more than 3 folds. Usually people choose 5 or
10.
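
For example, to use 10 folds in the sketches above, only the partition and the
hard-coded fold count change:

cv = cvpartition(Y,'kfold',10);   % 10 folds instead of 3
nfold = cv.NumTestSets;           % loop for k=1:nfold, size the cells with nfold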

-Ilya


From: Mohammad A. Mezher on
"Ilya Narsky" <inarsky(a)mathworks.com> wrote in message <hp29kh$1nf$1(a)fred.mathworks.com>...
>
> "Mohammad" <mohabedalgani(a)yahoo.com> wrote in message
> news:73565061.477272.1270035717113.JavaMail.root(a)gallium.mathforum.org...
> > hi,
> >
> > please look at the AUROC following function. The idea here is to use the
> > following function with a three crossvalidation procedure.
> >
> > The AUROC is:
> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> > function auc=AUROC
> > for i = 1:3 % 3 cross validation
> > test = (indices == i); train = ~test;
> > svmStruct = svmtrain(A(train,: ),groups(train),'Kernel_Function',
> > Kernel_FunctionValue,'Method', MethodValue);
> > ytest=d(test);
> > [classes,ypred] = svmclassify(svmStruct,A(test,: ));
> > classperf(cp,classes,test);
> > [AUC(:,i),tpr(:,i),fpr(:,i)]= rocplot(ypred,ytest);
> > end
> >
> > %%%% To Compute the Area Under Roc Curve
> > [auc,indx]=max(AUC);
> > TP(:,1)=tpr(:,indx);
> > FP(:,1)=fpr(:,indx);
> >
> >
> > end % end AUROC
> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >
> > Is the AUROC 3 croos-validation function here right or wrong??? i am
> > returning the maximum auc value.
> >
> > thank you in advance...
> >
>
> Normally, the goal of cross-validation analysis is to obtain an estimate
> averaged over cross-validation folds and optionally statistical error for
> this estimate. I don't know what you are trying to accomplish by taking the
> maximal value across the folds instead of computing the average. I don't
> know how you obtained "indices". I don't know what is "rocplot". I don't
> know what svmclassify you are using since svmclassify in Matlab does not
> have a 2nd output.
>
> If you have Statistics Toolbox, perhaps this is what you want (substitute
> whatever you like for classregtree training and prediction):
>
> load ionosphere;
> N = size(X,1);
> cv = cvpartition(Y,'kfold',3);
> score = zeros(N,1);
>
> % Loop over folds
> for k=1:3
> % Get training and test indices
> istrain = training(cv,k);
> istest = test(cv,k);
>
> % Train
> t = classregtree(X(istrain,:),Y(istrain));
>
> % Get predicted class probability
> [~,node] = eval(t,X(istest,:));
> probs = classprob(t,node);
>
> % Extract probability for the right class
> [~,pos] = ismember('g',t.classname);
> score(istest) = probs(:,pos);
> end
>
> [fpr,tpr,~,auc] = perfcurve(Y,score,'g');
> plot(fpr,tpr);
>
>
> If you have Matlab 10a, you can estimate confidence bounds for the ROC curve
> using perfcurve:
>
> load ionosphere;
> cv = cvpartition(Y,'kfold',3);
> YPerFold = cell(3,1);
> scorePerFold = cell(3,1);
> for k=1:3
> istrain = training(cv,k);
> istest = test(cv,k);
> t = classregtree(X(istrain,:),Y(istrain));
> [~,node] = eval(t,X(istest,:));
> probs = classprob(t,node);
> [~,pos] = ismember('g',t.classname);
> YPerFold{k} = Y(istest);
> scorePerFold{k} = probs(:,pos);
> end
> [fpr,tpr,~,auc] = perfcurve(YPerFold,scorePerFold,'g','xvals','all');
> errorbar(fpr,tpr(:,1),tpr(:,2)-tpr(:,1),tpr(:,3)-tpr(:,1)); % plot with
> errors
>
> I would also recommend using more than 3 folds. Usually people choose 5 or
> 10.
>
> -Ilya
>



Thank you so much for your kind help.

But I think you are using a newer version than I do.

I am using R2007b. What version do you use, so I can test your function and
try your idea?

Thank you again for your help.
From: Ilya Narsky on
Mohammad A. Mezher wrote:
> "Ilya Narsky" <inarsky(a)mathworks.com> wrote in message
> <hp29kh$1nf$1(a)fred.mathworks.com>...
>>
>> "Mohammad" <mohabedalgani(a)yahoo.com> wrote in message
>> news:73565061.477272.1270035717113.JavaMail.root(a)gallium.mathforum.org...
>> > hi,
>> >
>> > please look at the AUROC following function. The idea here is to use
>> the > following function with a three crossvalidation procedure.
>> >
>> > The AUROC is:
>> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> > function auc=AUROC
>> > for i = 1:3 % 3 cross validation
>> > test = (indices == i); train = ~test;
>> > svmStruct = svmtrain(A(train,:
>> ),groups(train),'Kernel_Function',
>> > Kernel_FunctionValue,'Method', MethodValue);
>> > ytest=d(test);
>> > [classes,ypred] = svmclassify(svmStruct,A(test,: ));
>> > classperf(cp,classes,test);
>> > [AUC(:,i),tpr(:,i),fpr(:,i)]= rocplot(ypred,ytest);
>> > end
>> >
>> > %%%% To Compute the Area Under Roc Curve
>> > [auc,indx]=max(AUC);
>> > TP(:,1)=tpr(:,indx);
>> > FP(:,1)=fpr(:,indx);
>> >
>> >
>> > end % end AUROC
>> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> >
>> > Is the AUROC 3 croos-validation function here right or wrong??? i am
>> > returning the maximum auc value.
>> >
>> > thank you in advance...
>> >
>>
>> Normally, the goal of cross-validation analysis is to obtain an
>> estimate averaged over cross-validation folds and optionally
>> statistical error for this estimate. I don't know what you are trying
>> to accomplish by taking the maximal value across the folds instead of
>> computing the average. I don't know how you obtained "indices". I
>> don't know what is "rocplot". I don't know what svmclassify you are
>> using since svmclassify in Matlab does not have a 2nd output.
>>
>> If you have Statistics Toolbox, perhaps this is what you want
>> (substitute whatever you like for classregtree training and prediction):
>>
>> load ionosphere;
>> N = size(X,1);
>> cv = cvpartition(Y,'kfold',3);
>> score = zeros(N,1);
>>
>> % Loop over folds
>> for k=1:3
>> % Get training and test indices
>> istrain = training(cv,k);
>> istest = test(cv,k);
>>
>> % Train
>> t = classregtree(X(istrain,:),Y(istrain));
>>
>> % Get predicted class probability
>> [~,node] = eval(t,X(istest,:));
>> probs = classprob(t,node);
>>
>> % Extract probability for the right class
>> [~,pos] = ismember('g',t.classname);
>> score(istest) = probs(:,pos);
>> end
>>
>> [fpr,tpr,~,auc] = perfcurve(Y,score,'g');
>> plot(fpr,tpr);
>>
>>
>> If you have Matlab 10a, you can estimate confidence bounds for the ROC
>> curve using perfcurve:
>>
>> load ionosphere;
>> cv = cvpartition(Y,'kfold',3);
>> YPerFold = cell(3,1);
>> scorePerFold = cell(3,1);
>> for k=1:3
>> istrain = training(cv,k);
>> istest = test(cv,k);
>> t = classregtree(X(istrain,:),Y(istrain));
>> [~,node] = eval(t,X(istest,:));
>> probs = classprob(t,node);
>> [~,pos] = ismember('g',t.classname);
>> YPerFold{k} = Y(istest);
>> scorePerFold{k} = probs(:,pos);
>> end
>> [fpr,tpr,~,auc] = perfcurve(YPerFold,scorePerFold,'g','xvals','all');
>> errorbar(fpr,tpr(:,1),tpr(:,2)-tpr(:,1),tpr(:,3)-tpr(:,1)); % plot
>> with errors
>>
>> I would also recommend using more than 3 folds. Usually people choose
>> 5 or 10.
>>
>> -Ilya
>
>
>
> Thank you so much for your kindly help...
>
> but i think you are using newer version than i do.
>
> i am using R2007b what version do you use?? so i can test your function
> and try your idea then.
>
> thank you again for your appreciated help..

perfcurve was introduced in 9a (R2009a), so it is not available in R2007b.
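
If you are stuck on R2007b, here is a rough sketch of how you could compute the
ROC curve and its area by hand from the cross-validated scores (my own quick
sketch, not a toolbox function; it assumes "labels" is a logical column vector
marking the positive class and "scores" holds the classifier scores, with
higher meaning more positive; ties in the scores are ignored):

labels = labels(:); scores = scores(:);
[sortedScores,order] = sort(scores,'descend'); % avoid the ~ output on R2007b
labels = labels(order);                        % sweep thresholds high to low
tp = cumsum(labels);                           % true positives at each threshold
fp = cumsum(~labels);                          % false positives at each threshold
tpr = tp/sum(labels);                          % true positive rate
fpr = fp/sum(~labels);                         % false positive rate
auc = trapz([0; fpr],[0; tpr]);                % area under the ROC curve
plot(fpr,tpr);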
From: Mohammad A. Mezher on
Ilya Narsky <inarsky(a)mathworks.com> wrote in message <4BB93C39.8040200(a)mathworks.com>...
> [earlier quoted text snipped]
>
> perfcurve was introduced in 9a (R2009a), so it is not available in R2007b.


The code has an error:
??? Error using ==> classregtree.classprob at 18
The CLASSPROB method is not available for regression trees.

because the call

t = classregtree(X(istrain,:),Y(istrain));

returns a regression tree.

Kindly, is there any other way to compute the confidence value (the score) for
the perfcurve function?
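
Also, would converting my labels to strings be a reasonable workaround? My
guess (not tested) is that classregtree builds a regression tree because my
response vector is numeric, and classprob only exists for classification trees.
A rough sketch of what I mean, using my own data matrix A and numeric label
vector d together with the istrain/istest indices from your loop:

Ystr = cellstr(num2str(d(:)));                 % numeric labels -> strings
t = classregtree(A(istrain,:),Ystr(istrain));  % should now be a classification tree
[yfit,node] = eval(t,A(istest,:));             % avoid the ~ output on R2007b
probs = classprob(t,node);                     % per-class probabilities
[tf,pos] = ismember('1',t.classname);          % assuming my positive class is labeled 1
score(istest) = probs(:,pos);                  % score to feed the ROC computation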