Prev: Multiple file generation
Next: Finding Permutations
From: Moussa Barhoum on 27 Jun 2010 17:09 Hey all, i have been searching for a solution to fit my data to an F-distribution. I would love to do the same i could do using the Distribution Fitting Tool display my data as a PDF and then fit it to an F-distribution (obtaining to degrees of freedom). To make things more complicated, i would love to force the fit to keep the 2 degree of freedom parameters the same. I tried MLE fitting, but it would give me this error message: ??? Error using ==> mlecustom>llf_pdfcdf at 431 The PDF function returned NaN or infinite values. Error in ==> fminsearch at 326 x(:) = xe; fxe = funfcn(x,varargin{:}); Error in ==> mlecustom at 176 [phat,nll,err,output] = ... Error in ==> mle at 235 phat = mlecustom(data,varargin{:}); my data: b = 3.0063 5.3973 1.0552 1.6775 3.5941 3.6745 1.5974 5.4428 1.5753 1.0681 2.9515 3.3556 3.9873 1.9885 4.9261 2.5978 1.3383 1.4387 1.7563 3.9649 10.0926 8.5962 2.8598 1.1136 3.3386 9.0662 2.3405 1.1929 1.2016 10.2392 4.5267 3.4411 1.0820 2.3101 1.1701 1.5227 2.2722 1.5916 2.3553 1.0906 1.7549 5.5435 1.4495 3.6016 2.2507 1.8196 1.0098 4.2446 1.3948 5.8286 2.9181 2.0230 1.4193 2.0842 4.3030 15.1182 4.1982 2.5606 1.5692 1.5524 2.0650 4.8211 1.0712 2.0283 3.5567 1.8303 1.1555 10.2633 1.6645 5.0301 1.6294 1.3119 1.1638 1.1842 1.6109 7.2934 1.1922 1.5066 12.7246 1.5799 10.9940 2.2119 2.4862 11.4594 3.6319 8.3397 1.9224 1.0496 1.2648 1.1199 4.3906 13.3621 1.0618 1.0076 1.3702 2.8932 1.4346 1.0468 3.3334 1.3164 1.1577 1.2256 7.6636 1.4186 1.2067 6.6799 1.1489 3.6273 1.0514 5.1709 1.0619 3.3850 1.6954 3.8940 2.2907 1.5253 1.0088 4.7113 2.0798 1.0519 1.1758 2.9757 11.0018 1.3998 2.4047 9.5845 1.0758 1.4117 1.3969 1.5155 1.9979 4.0829 2.3710 3.4140 1.0172 1.9700 4.1774 1.2387 1.6544 3.4779 3.1943 4.9337 1.5218 1.0444 1.9940 5.1685 1.4844 4.4642 1.0124 3.7596 1.8809 1.2140 1.5717 2.4207 1.9130 3.1338 1.5042 1.6834 1.8892 1.5878 2.6965 9.9786 1.8745 3.2760 1.0092 1.9928 9.7287 1.1087 1.2216 8.2545 3.8794 5.3150 1.4164 2.3002 10.1046 1.4769 2.2194 8.7748 6.4692 1.5215 3.2907 2.7151 5.5717 1.7217 1.3388 1.9332 1.3439 3.3561 2.1242 2.0960 1.7833 1.5619 1.0352 1.9530 1.0113 1.0267 3.9088 2.8136 1.7519 1.5455 3.7940 1.5761 2.7004 2.6236 1.1345 1.3922 1.1705 2.8878 1.3313 3.0600 1.7059 1.6310 5.2760 1.6421 4.9022 1.4867 2.2828 2.3537 1.4587 6.5448 5.8294 3.6644 5.3389 2.1525 4.5845 2.0876 3.0509 1.0992 3.4672 2.6940 1.0265 1.2197 1.4134 1.6266 1.9842 8.2104 3.8623 3.7944 4.2568 1.7242 1.5089 2.0093 2.5877 2.1008 4.8051 5.1881 1.2154 2.6426 1.7280 1.7053 1.1350 2.1602 1.7455 4.2522 1.7289 2.2418 1.6276 1.5583 2.9968 1.1985 1.5961 1.5977 1.1536 1.5424 1.0916 1.3234 1.5873 2.6404 1.0956 2.5411 3.6787 1.7564 4.5035 3.3190 2.8934 2.4553 2.8414 4.6595 1.4793 2.0721 1.7101 1.6586 2.5097 1.4140 2.6310 1.4766 14.2639 1.5617 1.9093 2.6169 1.8041 1.8104 1.0404 1.1985 1.4212 20.4357 3.8799 1.3687 1.6414 1.2741 1.1354 1.2808 2.1849 2.1852 1.1872 2.2219 3.0806 2.0246 4.7895 2.7151 1.2931 4.9612 2.3288 1.9256 2.7507 1.3335 14.0433 1.0122 13.2992 2.2247 5.7857 2.3574 1.0075 6.1835 4.8773 5.8183 2.1737 1.2555 1.0813 4.1982 1.0162 3.0334 2.8303 1.1711 2.2886 5.7479 1.0376 1.4643 1.0089 6.4582 2.6487 3.5618 1.9456 1.1254 3.7649 1.4431 1.6899 2.3837 1.9032 1.1898 1.2777 2.1223 1.1621 1.7742 2.2746 1.3284 1.0682 2.6872 1.4305 4.0843 1.4669 3.0386 2.0488 2.5051 1.3231 3.4060 1.1859 2.7741 1.2419 1.2040 6.2655 1.2749 3.4853 3.5561 1.5741 1.6568 4.4462 3.4552 8.9330 2.0193 5.9975 4.0043 4.5683 1.6891 1.4732 2.8826 1.2949 4.5704 4.2024 2.3082 2.4348 1.3761 1.2833 2.1940 4.2712 7.5892 2.0461 2.4730 1.4636 1.2354 5.3211 1.0773 1.3996 4.2355 1.8159 1.0875 1.0056 2.5858 1.7105 2.1468 2.8348 1.2811 3.1811 1.1485 1.0836 2.5605 1.3760 2.9112 3.3243 9.3282 3.5485 1.5231 1.0794 1.2554 1.0971 3.1015 2.1030 1.3034 12.1383 1.4679 2.9059 4.7639 11.1725 2.8186 3.3342 2.6096 5.2603 1.0498 1.2840 1.1448 7.1427 1.6490 2.4519 1.4078 1.7434 1.6411 10.3904 3.1037 1.5482 6.5284 5.6473 1.3858 16.2491 1.8034 1.2660 1.1393 1.1687 1.0976 1.0667 1.8605 6.8957 1.0341 3.0723 1.2599 5.0623 1.5956 1.4736 4.2538 2.1615 1.4789 4.2866 2.5333 3.5132 1.6001 1.0609 1.3795 1.5086 1.1066 1.1065 1.0153 2.5859 7.2004 8.7445 2.0378 6.1006 14.5152 1.1064 1.5117 2.3023 3.2963 7.6223 1.2412 1.0378 6.6257 4.4081 3.9592 3.1699 4.5355 2.6683 2.1485 2.7626 1.9989 2.3426 8.8858 1.3414 1.7459 1.0035 2.3986 7.8997 2.2138 1.1478 2.1922 2.7406 1.6945 1.1209 2.5263 1.2362 2.1322 5.2344 2.3331 18.9530 5.7575 4.8780 1.3266 1.5635 1.2728 1.1054 5.9531 1.3115 2.8523 1.2610 1.1859 2.3006 1.2582 1.1734 1.4598 2.0073 2.1957 2.8566 1.3706 2.6807 3.5602 4.0656 2.3522 2.2266 5.3302 3.8552 1.0296 2.2315 2.2658 1.3523 1.9031 1.0939 1.1283 3.9879 3.0232 2.4028 1.1135 1.7232 3.7223 5.9015 1.9480 1.1424 1.3817 1.9370 1.1703 1.4074 2.5975 10.1198 1.0325 7.9041 1.3804 1.1038 2.9940 8.1914 1.3489 3.4340 3.4361 3.9514 2.3127 1.0409 1.1079 1.4042 3.4578 7.2064 3.1389 2.3010 1.9244 1.5123 2.7814 1.0780 3.4317 2.4400 1.5116 1.1092 1.0664 4.1464 1.2272 5.6880 1.2880 1.0882 2.1342 2.0373 3.6857 1.5847 1.7116 2.5311 3.3255 1.1743 4.2912 2.8836 1.1946 2.5169 2.5507 1.6544 1.4124 2.2226 2.7670 1.0934 2.7485 1.6115 2.4143 1.1746 6.6386 9.6568 5.1013 2.3712 1.8713 1.1373 1.9534 3.3031 1.2405 1.4846 1.7456 1.6261 4.7740 1.8277 2.0302 2.4660 3.2929 1.6315 3.5554 1.1019 2.6635 2.2756 1.1551 1.2871 1.1161 3.7482 1.3100 1.1183 4.0004 1.6425 5.6705 4.7382 1.2645 4.9773 8.0379 4.0052 1.7477 2.4140 1.7698 2.9134 5.5229 2.4251 1.6696 2.3997 3.9916 1.8297 1.0820 2.5358 6.7377 12.6485 1.6655 1.5915 2.3683 3.2911 2.6525 3.4353 4.1513 1.1120 15.1050 1.4915 2.2912 5.0731 1.7680 4.4162 3.2061 2.3737 4.6353 1.8969 2.7488 15.1133 3.9849 6.1440 1.8334 1.0125 5.2969 8.0987 3.5448 10.1917 1.2723 5.6972 4.4218 3.0078 1.7220 8.7575 1.0101 10.3482 4.7143 2.4380 3.0903 6.7566 2.1465 1.3778 6.7021 6.8426 1.1269 2.3488 2.3145 1.6658 1.3058 1.6265 3.8186 2.0958 1.0115 5.6438 1.9468 1.2762 2.7924 1.3359 1.5006 5.2435 2.8491 1.0588 2.4957 1.1120 1.0281 5.1564 1.2091 5.1210 11.2011 3.3104 1.2493 1.1203 5.5755 5.2399 1.1991 1.0558 3.4265 3.8597 3.0213 3.4419 5.7921 7.6505 1.5331 1.8074 2.0403 1.1053 1.4399 4.6942 1.4618 4.5182 1.6347 8.0611 2.0467 1.5694 1.4403 1.7907 1.4265 1.0726 8.8064 3.9824 1.1409 1.0033 6.1409 1.9513 1.6137 2.3469 6.1116 1.4222 1.3983 2.9303 1.4578 4.5749 1.1750 4.4529 1.7172 10.3955 2.7795 2.0754 3.1227 1.8519 1.7010 5.0306 1.8115 1.3788 1.0552 1.1605 1.4873 1.0999 1.6806 2.6331 7.8586 1.2049 2.2204 2.3788 2.2085 1.0957 1.4800 1.0095 1.2017 2.0780 11.1652 1.9712 1.5707 1.5820 1.4035 1.2840 1.3367 3.9186 2.7160 3.3808 1.2095 3.3440 1.2694 1.7087 1.7701 1.6164 7.6547 14.4568 2.3038 2.8932 1.8561 1.1182 3.3905 1.3650 1.5586 2.1001 7.1835 2.1817 2.1016 1.3963 1.2017 21.4800 4.2406 2.3908 2.5159 1.2335 7.3399 1.5557 1.5868 1.3030 2.2733 3.3151 1.8101 1.6787 2.3988 1.7583 1.0905 2.8649 1.7582 1.1262 7.0251 1.4533 7.8599 3.4036 2.4349 5.9407 1.3959 2.4596 1.7366 1.3921 1.0978 1.0791 2.5105 2.6294 1.3208 1.0661 1.3313 6.2383 1.7393 1.3919 3.7865 2.4333 1.0756 1.5104 3.0947 8.4641 2.7220 1.4601 2.1853 3.8645 2.3319 4.3185 2.7337 1.7242 17.5223 2.3959 3.5398 2.2890 2.2793 5.9532 4.7399 3.2730 1.5550 3.9095 3.8731 5.5985 1.0608 1.1588 1.5390 4.5483 1.6925 3.0990 8.6467 5.0669 2.4063 4.2160 5.9967 1.6885 7.5544 2.0140 6.0339 3.3936 1.0568 2.6664 2.0301 2.3880 1.2746 3.5016 1.4748 4.4574 1.2662 3.2034 2.2129 2.0785 1.6571 2.6722 4.1859 1.2460 1.6050 1.5703 2.4942 2.0201 4.4738 1.5804 1.7623 2.3612 1.3554 1.2003 1.6590 2.5830 4.4552 2.4651 1.6763 1.6722 2.7190 1.7907 1.5158 2.4801 2.7694 2.7829 1.1356 2.4733 2.3095 4.2510 1.4044 1.1556 1.2448 3.7516 4.3626 1.8070 8.2951 1.7903 3.0618 3.2557 1.4558 1.2598 2.6786 1.4386 1.7213 5.5031 2.1036 6.1684 2.8314 2.1675 3.5016 1.3793 4.1379 1.3393 1.6275 2.2235 1.3345 1.8641 2.8598 1.6595 3.0378 12.1554 mean(b)=3.0154 var(b)=6.8362 Now, if i divide my data randomly by let say 2.76 and then use mle according to mle(b/2.76,'pdf',@fpdf,'start',1:2) it would give me ans = 10.1411 9.9884 which is good. Except that i don't know how to fit my data directly or what the meaning of this factor 2.76 is. Any help or hint is appreciated. Thank you very much in advance! Moussa
From: Tom Lane on 28 Jun 2010 15:44 > i have been searching for a solution to fit my data to an F-distribution. > I would love to do the same i could do using the Distribution Fitting Tool > display my data as a PDF and then fit it to an F-distribution (obtaining > to degrees of freedom). To make things more complicated, i would love to > force the fit to keep the 2 degree of freedom parameters the same. > I tried MLE fitting, but it would give me this error message: > ??? Error using ==> mlecustom>llf_pdfcdf at 431 > The PDF function returned NaN or infinite values. .... > mean(b)=3.0154 > var(b)=6.8362 > > Now, if i divide my data randomly by let say 2.76 and then use mle > according to mle(b/2.76,'pdf',@fpdf,'start',1:2) > it would give me > ans = > > 10.1411 9.9884 > > which is good. Except that i don't know how to fit my data directly or > what the meaning of this factor 2.76 is. Moussa, the F distribution is not very flexible. There's no reason to expect it will fit any set of data. Is there a reason to expect it would fit yours? The scaled F distribution is somewhat more flexible. You've found that if you scale the data by 2.76, you can get a fit, though you report only that the fitting process completes, not that it produces something that is a good match to the data. I would judge it not to be a good fit, based on this: p = mle(b/2.76, 'pdf', @fpdf, 'start',[1 10]) ecdf(b/2.76) xx = linspace(0,10); line(xx,fcdf(xx,p(1),p(2)),'color','r') Your question about how to have the two d.f. parameters be the same is easy: p = mle(b/2.76, 'pdf', @(x,p) fpdf(x,p,p), 'start', 1) But the other question is hard. For some distribution shapes that do not look like any F distributions, the optimizer is going to encounter trouble as it tries to match the data by moving the parameters to the edge of their valid region or beyond. -- Tom
From: Moussa Barhoum on 30 Jun 2010 20:55 "Tom Lane" <tlaneATmathworksDOTcom(a)nospam.com> wrote in message <i0au2r$p7q$1(a)fred.mathworks.com>... > > i have been searching for a solution to fit my data to an F-distribution. > > I would love to do the same i could do using the Distribution Fitting Tool > > display my data as a PDF and then fit it to an F-distribution (obtaining > > to degrees of freedom). To make things more complicated, i would love to > > force the fit to keep the 2 degree of freedom parameters the same. > > I tried MLE fitting, but it would give me this error message: > > ??? Error using ==> mlecustom>llf_pdfcdf at 431 > > The PDF function returned NaN or infinite values. > ... > > mean(b)=3.0154 > > var(b)=6.8362 > > > > Now, if i divide my data randomly by let say 2.76 and then use mle > > according to mle(b/2.76,'pdf',@fpdf,'start',1:2) > > it would give me > > ans = > > > > 10.1411 9.9884 > > > > which is good. Except that i don't know how to fit my data directly or > > what the meaning of this factor 2.76 is. > > Moussa, the F distribution is not very flexible. There's no reason to expect > it will fit any set of data. Is there a reason to expect it would fit yours? > > The scaled F distribution is somewhat more flexible. You've found that if > you scale the data by 2.76, you can get a fit, though you report only that > the fitting process completes, not that it produces something that is a good > match to the data. I would judge it not to be a good fit, based on this: > > p = mle(b/2.76, 'pdf', @fpdf, 'start',[1 10]) > ecdf(b/2.76) > xx = linspace(0,10); line(xx,fcdf(xx,p(1),p(2)),'color','r') > > Your question about how to have the two d.f. parameters be the same is easy: > > p = mle(b/2.76, 'pdf', @(x,p) fpdf(x,p,p), 'start', 1) > > But the other question is hard. For some distribution shapes that do not > look like any F distributions, the optimizer is going to encounter trouble > as it tries to match the data by moving the parameters to the edge of their > valid region or beyond. > > -- Tom > Tom, thank you very much for the helpful tips. After contemplating about your answers and trying some things out i realized that there is a fundamental problem between fitting my data to an F-distribution and the F-distribution itself! The F distribution itself starts at 0 at least at the positive x-Axes! However, the F-distribution we usually use is defined by convention as the quotient: LARGERVariance/SMALLERVariance! Which is by definition always greater or equal to 1(for randomly generated numbers this means that the area under the curve from x=0 to 1 is always close to 0.5!!) Hence, my problem with fitting my data to an F-distribution without knowing this fact. The right way to do this is to fit the F distribution to my TRUNCATED data. So now the question still raises of how to fit an F-distribution in which one only considerss the "1 till infinity" part of the F distribution. Very helpful is this link: http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/customdist1demo.html So what i did with the same data above for b: defined a PF for my model according to: pf_f = @(x,V1,V2) fpdf(x,V1,V2) ./ (1-fcdf(1,V1,V2)); then fitted the data to p1, p2 with p1=p2 according to Tom's formula above p = mle(b, 'pdf', @(x,p) pf_f(x,p,p), 'start', 1) and out pops 4.2866 However, i don't know how good of a fit it is since a comparison of PDFs (or CDFs) is not very meaningful if i try to compare my data (which starts at1) with my fit (that still starts at 0). xx = linspace(0,10); line(xx,pf_f(xx,p(1),p(1)),'color','r') Note that the Amplitude goes up by factor of 2 since we renormalized our PDF and since .5 is below or equal to 1. Question 1: How can i tell how good of a fit it is using this approach? Question 2 would be: Is there a way to fit my data to an F-distribution, in which i can specify that i would love to fit the F-distribution function starting from x=1. In other words, is there another way to fit my data , knowing the restrictions i have regarding the F-Values(>=1) without renormalizing my truncated data but by modifying the fit routine itself? Any hints or answers are greatly appreciated! Thanks, Moussa
From: Tom Lane on 1 Jul 2010 15:32 > The F distribution itself starts at 0 at least at the positive x-Axes! > However, the F-distribution we usually use is defined by convention as the > quotient: LARGERVariance/SMALLERVariance! Which is by definition always > greater or equal to 1(for randomly generated numbers this means that the > area under the curve from x=0 to 1 is always close to 0.5!!) Hence, my > problem with fitting my data to an F-distribution without knowing this > fact. The right way to do this is to fit the F distribution to my > TRUNCATED data. Moussa, it is possible that I don't understand, but I believe you don't want the truncated distribution. Suppose you have A as one variance and B as the other. The truncated distribution is what you'd want if you took A/B when A>B, but nothing when A<B. If I understand things, you instead have FF below, where: F = A/B FF = max(A,B) / min(A,B) = max(F, 1/F) Then FF>=1 always, and for f>1, FF<f if F<f and F>1/f You can work this out directly. What follows is a script that attempts to illustrate the issues. -- Tom %% Simulate a regular F value, fit it v1 = 5; v2 = 8; c1 = chi2rnd(v1,1000,1)/v1; c2 = chi2rnd(v2,1000,1)/v2; f = c1./c2; p = mle(f,'pdf',@fpdf,'start',[2 2]) subplot(2,2,1) ecdf(f) x = linspace(0,25); line(x,fcdf(x,p(1),p(2)),'color','r') title(sprintf('Regular with F(%g,%g)',p)) %% Compute the max-over-min version cmax = max(c1,c2); cmin = min(c1,c2); ff = max(c1,c2)./min(c1,c2); %% Compare this version to its theoretical cdf x = linspace(1.0001,25,1000); Fx = @(x,v1,v2) fcdf(x,v1,v2) - (1-fcdf(x,v2,v1)); subplot(2,2,2); ecdf(ff) line(x,Fx(x,v1,v2),'color','r') title(sprintf('Modified with F(%g,%g)',v1,v2)) %% Find the pdf of this version fx = @(x,v1,v2) fpdf(x,v1,v2) + fpdf(x,v2,v1); fks = ksdensity(ff,x,'support',[1 1000]); subplot(2,2,3); plot(x,fks) line(x,fx(x,v1,v2),'color','r') title(sprintf('Modified with F(%g,%g)',v1,v2)) xlabel('x'); ylabel('f(x)') %% Try fitting that - symmetric in v1 and v2 so they may be swapped p = mle(ff,'pdf',fx,'start',[2 2]) subplot(2,2,4); ecdf(ff) line(x,Fx(x,p(1),p(2)),'color','r') title(sprintf('Modified with F(%g,%g)',p)) set(findobj(gcf,'type','axes'),'xlim',[0 10])
|
Pages: 1 Prev: Multiple file generation Next: Finding Permutations |