From: Samoline1 Linke on 26 Oct 2009 05:06

"Wayne King" <wmkingty(a)gmail.com> wrote in message <hbpr3j$p0r$1(a)fred.mathworks.com>...
> "Samoline1 Linke" <fixed-term.jehandad.khan(a)de.bosch.com> wrote in message <hbpo3u$4qb$1(a)fred.mathworks.com>...
> > "Wayne King" <wmkingty(a)gmail.com> wrote in message <hbpk0q$9pj$1(a)fred.mathworks.com>...
> > > "Samoline1 Linke" <fixed-term.jehandad.khan(a)de.bosch.com> wrote in message <hbovqb$5s6$1(a)fred.mathworks.com>...
> > > > "Wayne King" <wmkingty(a)gmail.com> wrote in message <hbnbmv$gft$1(a)fred.mathworks.com>...
> > > > > "Samoline1 Linke" <fixed-term.jehandad.khan(a)de.bosch.com> wrote in message <hbn8e0$31a$1(a)fred.mathworks.com>...
> > > > > > Hi,
> > > > > >
> > > > > > This is about the example (load count.dat) in the help topic 'Removing Outliers'.
> > > > > >
> > > > > > The script written for the count.dat example removes the values that lie more than 3 standard deviations from the mean of any column.
> > > > > >
> > > > > > But after removing the outliers, if you do boxplot(count) you can still see outliers in the second box.
> > > > > >
> > > > > > Why is that so?
> > > > >
> > > > > Hi, outliers on a boxplot are defined to be greater than 1.5*IQR above the upper quartile, or less than 1.5*IQR below the lower quartile.
> > > > >
> > > > >     load count.dat
> > > > >     % just focusing on outliers above the mean
> > > > >     UpperLimit = mean(count)+3*std(count);
> > > > >     % returns 108.1109 170.7588 269.6676
> > > > >     % but
> > > > >     quantile(count,.75)+1.5*iqr(count) % requires stats toolbox
> > > > >     % returns 87.5000 134.2500 205.7500
> > > > >
> > > > > So it's possible to have "outliers" on a boxplot that are not more than 3 standard deviations above (or below) the mean.
> > > > >
> > > > > Hope that helps,
> > > > > wayne
> > > >
> > > > -------------------------
> > > >
> > > > What you are saying is correct, but I am asking something else. I am saying:
> > > >
> > > >     mean(count)
> > > >     % 32.0000 46.5417 65.5833
> > > >
> > > >     std(count)
> > > >     % 25.3703 41.4057 68.0281
> > > >
> > > > So technically speaking, any point in the first column should be removed if it lies above 32 + 3*25.3703 or below 32 - 3*25.3703, that is, outside (-44.1, 108.1).
> > > >
> > > > But after removing the outliers you can still see in the boxplot that the first column has an outlier at row 20, and its value is 114.
> > > >
> > > > My question is: why was this point not removed as an outlier, given that it is > 108.1?
> > >
> > > Hi, I'm not sure how you "removed" the outlier, but I don't see that it shows up as an outlier in the boxplot. To take your example:
> > >
> > >     load count.dat
> > >     countCol1 = count(:,1); % just get 1st column to follow your example
> > >     boxplot(countCol1) % you see the "outlier"
> > >     indices = find(countCol1>108.1); % 108.1 is the mean plus 3 std
> > >     countCol1(indices) = [];
> > >     boxplot(countCol1) % "outlier" is gone
> > >
> > > Perhaps you didn't actually remove it?
> > >
> > > Hope that helps,
> > > wayne
> >
> > -----------------------------------------------
> >
> > Thanks for explaining in such detail, but let me tell you that I followed the same method as given in the MATLAB help.
> >
> > Maybe I should type it here:
> >
> >     mu = mean(count)
> >     sigma = std(count)
> >     [n,p] = size(count)
> >     Meanmat = repmat(mu, n, 1)
> >     Sigmamat = repmat(sigma, n, 1)
> >     outliers = abs(count - Meanmat) > 3*Sigmamat
> >     nout = sum(outliers) % shows how many outliers each column has
> >     count(any(outliers, 2), :) = [] % removes the entire row, which is good
> >
> > You can find this by typing 'Removing Outliers' in the help section.
> >
> > Thanks for taking an interest. I hope we find the root cause.
>
> Hi, the answer lies in my first response. If you execute the code:
>
>     mu = mean(count);
>     sigma = std(count);
>     [n,p] = size(count);
>     Meanmat = repmat(mu, n, 1);
>     Sigmamat = repmat(sigma, n, 1);
>     outliers = abs(count - Meanmat) > 3*Sigmamat;
>     count(any(outliers, 2), :) = [];
>     boxplot(count)
>
> The only "outliers" that appear in the boxplot are in the 2nd column of count, BUT these are NOT outliers if the definition of outlier is greater than 3 standard deviations above the mean. They ARE outliers if your definition of outlier is greater than upper quartile + 1.5*IQR. Remember that for the 2nd column of data, the mean plus 3 standard deviations is 170.7588.
>
> If you look at the 2nd column of count, there are no values greater than 170.7588, so if you try to remove data values that exceed 3 standard deviations, you remove nothing. However, the upper quartile + 1.5*IQR for column two is 134.2500, so two values exceed that. The boxplot shows these as outliers.
>
> wayne

------------

Thanks Wayne. It seems you are correct. I don't know why I assumed the boxplot marks points greater than mean + 3*std, but with your definition it makes sense.

Another question: which definition of outliers is correct? The MATLAB help example removes points beyond mean +/- 3*std, but you say the boxplot uses the interquartile range. I think it is safer to use the first one, because otherwise you might remove many points as outliers which actually are not. Right?
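
To make the difference between the two definitions concrete, here is a minimal sketch that flags outliers in count.dat under both rules side by side. It only rearranges the calls already shown in the thread; the variable names sigmaOut, fenceOut, and whisker are made up for illustration, and quantile needs the Statistics Toolbox, as Wayne noted. The fences computed here follow Wayne's formula and may differ slightly from what boxplot draws, depending on how it computes quartiles.

```matlab
% Sketch only: compare the 3-sigma rule with the boxplot (1.5*IQR) rule.
load count.dat                          % sample data set used in the thread

mu    = mean(count);                    % column means
sigma = std(count);                     % column standard deviations
n     = size(count, 1);                 % number of rows

% Rule 1: more than 3 standard deviations away from the column mean
sigmaOut = abs(count - repmat(mu, n, 1)) > 3 * repmat(sigma, n, 1);

% Rule 2: more than 1.5*IQR beyond the lower or upper quartile
q1 = quantile(count, 0.25);             % requires Statistics Toolbox
q3 = quantile(count, 0.75);
whisker  = 1.5 * (q3 - q1);             % hypothetical name, 1.5*IQR per column
fenceOut = count > repmat(q3 + whisker, n, 1) | ...
           count < repmat(q1 - whisker, n, 1);

% Number of flagged points per column under each rule
disp(sum(sigmaOut))
disp(sum(fenceOut))
```

Comparing the two count vectors shows, column by column, which points the boxplot marks that the 3-sigma script leaves in place.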