From: Moussa Barhoum on
Hey all,

i have been searching for a solution to fit my data to an F-distribution.
I would love to do the same i could do using the Distribution Fitting Tool display my data as a PDF and then fit it to an F-distribution (obtaining to degrees of freedom). To make things more complicated, i would love to force the fit to keep the 2 degree of freedom parameters the same.

I tried MLE fitting, but it would give me this error message:

??? Error using ==> mlecustom>llf_pdfcdf at 431
The PDF function returned NaN or infinite values.

Error in ==> fminsearch at 326
x(:) = xe; fxe = funfcn(x,varargin{:});

Error in ==> mlecustom at 176
[phat,nll,err,output] = ...

Error in ==> mle at 235
phat = mlecustom(data,varargin{:});

my data:

b =

3.0063
5.3973
1.0552
1.6775
3.5941
3.6745
1.5974
5.4428
1.5753
1.0681
2.9515
3.3556
3.9873
1.9885
4.9261
2.5978
1.3383
1.4387
1.7563
3.9649
10.0926
8.5962
2.8598
1.1136
3.3386
9.0662
2.3405
1.1929
1.2016
10.2392
4.5267
3.4411
1.0820
2.3101
1.1701
1.5227
2.2722
1.5916
2.3553
1.0906
1.7549
5.5435
1.4495
3.6016
2.2507
1.8196
1.0098
4.2446
1.3948
5.8286
2.9181
2.0230
1.4193
2.0842
4.3030
15.1182
4.1982
2.5606
1.5692
1.5524
2.0650
4.8211
1.0712
2.0283
3.5567
1.8303
1.1555
10.2633
1.6645
5.0301
1.6294
1.3119
1.1638
1.1842
1.6109
7.2934
1.1922
1.5066
12.7246
1.5799
10.9940
2.2119
2.4862
11.4594
3.6319
8.3397
1.9224
1.0496
1.2648
1.1199
4.3906
13.3621
1.0618
1.0076
1.3702
2.8932
1.4346
1.0468
3.3334
1.3164
1.1577
1.2256
7.6636
1.4186
1.2067
6.6799
1.1489
3.6273
1.0514
5.1709
1.0619
3.3850
1.6954
3.8940
2.2907
1.5253
1.0088
4.7113
2.0798
1.0519
1.1758
2.9757
11.0018
1.3998
2.4047
9.5845
1.0758
1.4117
1.3969
1.5155
1.9979
4.0829
2.3710
3.4140
1.0172
1.9700
4.1774
1.2387
1.6544
3.4779
3.1943
4.9337
1.5218
1.0444
1.9940
5.1685
1.4844
4.4642
1.0124
3.7596
1.8809
1.2140
1.5717
2.4207
1.9130
3.1338
1.5042
1.6834
1.8892
1.5878
2.6965
9.9786
1.8745
3.2760
1.0092
1.9928
9.7287
1.1087
1.2216
8.2545
3.8794
5.3150
1.4164
2.3002
10.1046
1.4769
2.2194
8.7748
6.4692
1.5215
3.2907
2.7151
5.5717
1.7217
1.3388
1.9332
1.3439
3.3561
2.1242
2.0960
1.7833
1.5619
1.0352
1.9530
1.0113
1.0267
3.9088
2.8136
1.7519
1.5455
3.7940
1.5761
2.7004
2.6236
1.1345
1.3922
1.1705
2.8878
1.3313
3.0600
1.7059
1.6310
5.2760
1.6421
4.9022
1.4867
2.2828
2.3537
1.4587
6.5448
5.8294
3.6644
5.3389
2.1525
4.5845
2.0876
3.0509
1.0992
3.4672
2.6940
1.0265
1.2197
1.4134
1.6266
1.9842
8.2104
3.8623
3.7944
4.2568
1.7242
1.5089
2.0093
2.5877
2.1008
4.8051
5.1881
1.2154
2.6426
1.7280
1.7053
1.1350
2.1602
1.7455
4.2522
1.7289
2.2418
1.6276
1.5583
2.9968
1.1985
1.5961
1.5977
1.1536
1.5424
1.0916
1.3234
1.5873
2.6404
1.0956
2.5411
3.6787
1.7564
4.5035
3.3190
2.8934
2.4553
2.8414
4.6595
1.4793
2.0721
1.7101
1.6586
2.5097
1.4140
2.6310
1.4766
14.2639
1.5617
1.9093
2.6169
1.8041
1.8104
1.0404
1.1985
1.4212
20.4357
3.8799
1.3687
1.6414
1.2741
1.1354
1.2808
2.1849
2.1852
1.1872
2.2219
3.0806
2.0246
4.7895
2.7151
1.2931
4.9612
2.3288
1.9256
2.7507
1.3335
14.0433
1.0122
13.2992
2.2247
5.7857
2.3574
1.0075
6.1835
4.8773
5.8183
2.1737
1.2555
1.0813
4.1982
1.0162
3.0334
2.8303
1.1711
2.2886
5.7479
1.0376
1.4643
1.0089
6.4582
2.6487
3.5618
1.9456
1.1254
3.7649
1.4431
1.6899
2.3837
1.9032
1.1898
1.2777
2.1223
1.1621
1.7742
2.2746
1.3284
1.0682
2.6872
1.4305
4.0843
1.4669
3.0386
2.0488
2.5051
1.3231
3.4060
1.1859
2.7741
1.2419
1.2040
6.2655
1.2749
3.4853
3.5561
1.5741
1.6568
4.4462
3.4552
8.9330
2.0193
5.9975
4.0043
4.5683
1.6891
1.4732
2.8826
1.2949
4.5704
4.2024
2.3082
2.4348
1.3761
1.2833
2.1940
4.2712
7.5892
2.0461
2.4730
1.4636
1.2354
5.3211
1.0773
1.3996
4.2355
1.8159
1.0875
1.0056
2.5858
1.7105
2.1468
2.8348
1.2811
3.1811
1.1485
1.0836
2.5605
1.3760
2.9112
3.3243
9.3282
3.5485
1.5231
1.0794
1.2554
1.0971
3.1015
2.1030
1.3034
12.1383
1.4679
2.9059
4.7639
11.1725
2.8186
3.3342
2.6096
5.2603
1.0498
1.2840
1.1448
7.1427
1.6490
2.4519
1.4078
1.7434
1.6411
10.3904
3.1037
1.5482
6.5284
5.6473
1.3858
16.2491
1.8034
1.2660
1.1393
1.1687
1.0976
1.0667
1.8605
6.8957
1.0341
3.0723
1.2599
5.0623
1.5956
1.4736
4.2538
2.1615
1.4789
4.2866
2.5333
3.5132
1.6001
1.0609
1.3795
1.5086
1.1066
1.1065
1.0153
2.5859
7.2004
8.7445
2.0378
6.1006
14.5152
1.1064
1.5117
2.3023
3.2963
7.6223
1.2412
1.0378
6.6257
4.4081
3.9592
3.1699
4.5355
2.6683
2.1485
2.7626
1.9989
2.3426
8.8858
1.3414
1.7459
1.0035
2.3986
7.8997
2.2138
1.1478
2.1922
2.7406
1.6945
1.1209
2.5263
1.2362
2.1322
5.2344
2.3331
18.9530
5.7575
4.8780
1.3266
1.5635
1.2728
1.1054
5.9531
1.3115
2.8523
1.2610
1.1859
2.3006
1.2582
1.1734
1.4598
2.0073
2.1957
2.8566
1.3706
2.6807
3.5602
4.0656
2.3522
2.2266
5.3302
3.8552
1.0296
2.2315
2.2658
1.3523
1.9031
1.0939
1.1283
3.9879
3.0232
2.4028
1.1135
1.7232
3.7223
5.9015
1.9480
1.1424
1.3817
1.9370
1.1703
1.4074
2.5975
10.1198
1.0325
7.9041
1.3804
1.1038
2.9940
8.1914
1.3489
3.4340
3.4361
3.9514
2.3127
1.0409
1.1079
1.4042
3.4578
7.2064
3.1389
2.3010
1.9244
1.5123
2.7814
1.0780
3.4317
2.4400
1.5116
1.1092
1.0664
4.1464
1.2272
5.6880
1.2880
1.0882
2.1342
2.0373
3.6857
1.5847
1.7116
2.5311
3.3255
1.1743
4.2912
2.8836
1.1946
2.5169
2.5507
1.6544
1.4124
2.2226
2.7670
1.0934
2.7485
1.6115
2.4143
1.1746
6.6386
9.6568
5.1013
2.3712
1.8713
1.1373
1.9534
3.3031
1.2405
1.4846
1.7456
1.6261
4.7740
1.8277
2.0302
2.4660
3.2929
1.6315
3.5554
1.1019
2.6635
2.2756
1.1551
1.2871
1.1161
3.7482
1.3100
1.1183
4.0004
1.6425
5.6705
4.7382
1.2645
4.9773
8.0379
4.0052
1.7477
2.4140
1.7698
2.9134
5.5229
2.4251
1.6696
2.3997
3.9916
1.8297
1.0820
2.5358
6.7377
12.6485
1.6655
1.5915
2.3683
3.2911
2.6525
3.4353
4.1513
1.1120
15.1050
1.4915
2.2912
5.0731
1.7680
4.4162
3.2061
2.3737
4.6353
1.8969
2.7488
15.1133
3.9849
6.1440
1.8334
1.0125
5.2969
8.0987
3.5448
10.1917
1.2723
5.6972
4.4218
3.0078
1.7220
8.7575
1.0101
10.3482
4.7143
2.4380
3.0903
6.7566
2.1465
1.3778
6.7021
6.8426
1.1269
2.3488
2.3145
1.6658
1.3058
1.6265
3.8186
2.0958
1.0115
5.6438
1.9468
1.2762
2.7924
1.3359
1.5006
5.2435
2.8491
1.0588
2.4957
1.1120
1.0281
5.1564
1.2091
5.1210
11.2011
3.3104
1.2493
1.1203
5.5755
5.2399
1.1991
1.0558
3.4265
3.8597
3.0213
3.4419
5.7921
7.6505
1.5331
1.8074
2.0403
1.1053
1.4399
4.6942
1.4618
4.5182
1.6347
8.0611
2.0467
1.5694
1.4403
1.7907
1.4265
1.0726
8.8064
3.9824
1.1409
1.0033
6.1409
1.9513
1.6137
2.3469
6.1116
1.4222
1.3983
2.9303
1.4578
4.5749
1.1750
4.4529
1.7172
10.3955
2.7795
2.0754
3.1227
1.8519
1.7010
5.0306
1.8115
1.3788
1.0552
1.1605
1.4873
1.0999
1.6806
2.6331
7.8586
1.2049
2.2204
2.3788
2.2085
1.0957
1.4800
1.0095
1.2017
2.0780
11.1652
1.9712
1.5707
1.5820
1.4035
1.2840
1.3367
3.9186
2.7160
3.3808
1.2095
3.3440
1.2694
1.7087
1.7701
1.6164
7.6547
14.4568
2.3038
2.8932
1.8561
1.1182
3.3905
1.3650
1.5586
2.1001
7.1835
2.1817
2.1016
1.3963
1.2017
21.4800
4.2406
2.3908
2.5159
1.2335
7.3399
1.5557
1.5868
1.3030
2.2733
3.3151
1.8101
1.6787
2.3988
1.7583
1.0905
2.8649
1.7582
1.1262
7.0251
1.4533
7.8599
3.4036
2.4349
5.9407
1.3959
2.4596
1.7366
1.3921
1.0978
1.0791
2.5105
2.6294
1.3208
1.0661
1.3313
6.2383
1.7393
1.3919
3.7865
2.4333
1.0756
1.5104
3.0947
8.4641
2.7220
1.4601
2.1853
3.8645
2.3319
4.3185
2.7337
1.7242
17.5223
2.3959
3.5398
2.2890
2.2793
5.9532
4.7399
3.2730
1.5550
3.9095
3.8731
5.5985
1.0608
1.1588
1.5390
4.5483
1.6925
3.0990
8.6467
5.0669
2.4063
4.2160
5.9967
1.6885
7.5544
2.0140
6.0339
3.3936
1.0568
2.6664
2.0301
2.3880
1.2746
3.5016
1.4748
4.4574
1.2662
3.2034
2.2129
2.0785
1.6571
2.6722
4.1859
1.2460
1.6050
1.5703
2.4942
2.0201
4.4738
1.5804
1.7623
2.3612
1.3554
1.2003
1.6590
2.5830
4.4552
2.4651
1.6763
1.6722
2.7190
1.7907
1.5158
2.4801
2.7694
2.7829
1.1356
2.4733
2.3095
4.2510
1.4044
1.1556
1.2448
3.7516
4.3626
1.8070
8.2951
1.7903
3.0618
3.2557
1.4558
1.2598
2.6786
1.4386
1.7213
5.5031
2.1036
6.1684
2.8314
2.1675
3.5016
1.3793
4.1379
1.3393
1.6275
2.2235
1.3345
1.8641
2.8598
1.6595
3.0378
12.1554

mean(b)=3.0154
var(b)=6.8362

Now, if i divide my data randomly by let say 2.76 and then use mle according to
mle(b/2.76,'pdf',@fpdf,'start',1:2)
it would give me

ans =

10.1411 9.9884

which is good. Except that i don't know how to fit my data directly or what the meaning of this factor 2.76 is.
Any help or hint is appreciated. Thank you very much in advance!

Moussa
From: Tom Lane on
> i have been searching for a solution to fit my data to an F-distribution.
> I would love to do the same i could do using the Distribution Fitting Tool
> display my data as a PDF and then fit it to an F-distribution (obtaining
> to degrees of freedom). To make things more complicated, i would love to
> force the fit to keep the 2 degree of freedom parameters the same.
> I tried MLE fitting, but it would give me this error message:
> ??? Error using ==> mlecustom>llf_pdfcdf at 431
> The PDF function returned NaN or infinite values.
....
> mean(b)=3.0154
> var(b)=6.8362
>
> Now, if i divide my data randomly by let say 2.76 and then use mle
> according to mle(b/2.76,'pdf',@fpdf,'start',1:2)
> it would give me
> ans =
>
> 10.1411 9.9884
>
> which is good. Except that i don't know how to fit my data directly or
> what the meaning of this factor 2.76 is.

Moussa, the F distribution is not very flexible. There's no reason to expect
it will fit any set of data. Is there a reason to expect it would fit yours?

The scaled F distribution is somewhat more flexible. You've found that if
you scale the data by 2.76, you can get a fit, though you report only that
the fitting process completes, not that it produces something that is a good
match to the data. I would judge it not to be a good fit, based on this:

p = mle(b/2.76, 'pdf', @fpdf, 'start',[1 10])
ecdf(b/2.76)
xx = linspace(0,10); line(xx,fcdf(xx,p(1),p(2)),'color','r')

Your question about how to have the two d.f. parameters be the same is easy:

p = mle(b/2.76, 'pdf', @(x,p) fpdf(x,p,p), 'start', 1)

But the other question is hard. For some distribution shapes that do not
look like any F distributions, the optimizer is going to encounter trouble
as it tries to match the data by moving the parameters to the edge of their
valid region or beyond.

-- Tom


From: Moussa Barhoum on
"Tom Lane" <tlaneATmathworksDOTcom(a)nospam.com> wrote in message <i0au2r$p7q$1(a)fred.mathworks.com>...
> > i have been searching for a solution to fit my data to an F-distribution.
> > I would love to do the same i could do using the Distribution Fitting Tool
> > display my data as a PDF and then fit it to an F-distribution (obtaining
> > to degrees of freedom). To make things more complicated, i would love to
> > force the fit to keep the 2 degree of freedom parameters the same.
> > I tried MLE fitting, but it would give me this error message:
> > ??? Error using ==> mlecustom>llf_pdfcdf at 431
> > The PDF function returned NaN or infinite values.
> ...
> > mean(b)=3.0154
> > var(b)=6.8362
> >
> > Now, if i divide my data randomly by let say 2.76 and then use mle
> > according to mle(b/2.76,'pdf',@fpdf,'start',1:2)
> > it would give me
> > ans =
> >
> > 10.1411 9.9884
> >
> > which is good. Except that i don't know how to fit my data directly or
> > what the meaning of this factor 2.76 is.
>
> Moussa, the F distribution is not very flexible. There's no reason to expect
> it will fit any set of data. Is there a reason to expect it would fit yours?
>
> The scaled F distribution is somewhat more flexible. You've found that if
> you scale the data by 2.76, you can get a fit, though you report only that
> the fitting process completes, not that it produces something that is a good
> match to the data. I would judge it not to be a good fit, based on this:
>
> p = mle(b/2.76, 'pdf', @fpdf, 'start',[1 10])
> ecdf(b/2.76)
> xx = linspace(0,10); line(xx,fcdf(xx,p(1),p(2)),'color','r')
>
> Your question about how to have the two d.f. parameters be the same is easy:
>
> p = mle(b/2.76, 'pdf', @(x,p) fpdf(x,p,p), 'start', 1)
>
> But the other question is hard. For some distribution shapes that do not
> look like any F distributions, the optimizer is going to encounter trouble
> as it tries to match the data by moving the parameters to the edge of their
> valid region or beyond.
>
> -- Tom
>
Tom,

thank you very much for the helpful tips. After contemplating about your answers and trying some things out i realized that there is a fundamental problem between fitting my data to an F-distribution and the F-distribution itself!
The F distribution itself starts at 0 at least at the positive x-Axes! However, the F-distribution we usually use is defined by convention as the quotient: LARGERVariance/SMALLERVariance! Which is by definition always greater or equal to 1(for randomly generated numbers this means that the area under the curve from x=0 to 1 is always close to 0.5!!)
Hence, my problem with fitting my data to an F-distribution without knowing this fact.
The right way to do this is to fit the F distribution to my TRUNCATED data.
So now the question still raises of how to fit an F-distribution in which one only considerss the "1 till infinity" part of the F distribution. Very helpful is this link:

http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/customdist1demo.html

So what i did with the same data above for b:

defined a PF for my model according to:
pf_f = @(x,V1,V2) fpdf(x,V1,V2) ./ (1-fcdf(1,V1,V2));
then fitted the data to p1, p2 with p1=p2 according to Tom's formula above

p = mle(b, 'pdf', @(x,p) pf_f(x,p,p), 'start', 1)

and out pops

4.2866

However, i don't know how good of a fit it is since a comparison of PDFs (or CDFs) is not very meaningful if i try to compare my data (which starts at1) with my fit (that still starts at 0).

xx = linspace(0,10); line(xx,pf_f(xx,p(1),p(1)),'color','r')

Note that the Amplitude goes up by factor of 2 since we renormalized our PDF and since .5 is below or equal to 1.

Question 1: How can i tell how good of a fit it is using this approach?

Question 2 would be:

Is there a way to fit my data to an F-distribution, in which i can specify that i would love to fit the F-distribution function starting from x=1. In other words, is there another way to fit my data , knowing the restrictions i have regarding the F-Values(>=1) without renormalizing my truncated data but by modifying the fit routine itself?
Any hints or answers are greatly appreciated!

Thanks,

Moussa
From: Tom Lane on
> The F distribution itself starts at 0 at least at the positive x-Axes!
> However, the F-distribution we usually use is defined by convention as the
> quotient: LARGERVariance/SMALLERVariance! Which is by definition always
> greater or equal to 1(for randomly generated numbers this means that the
> area under the curve from x=0 to 1 is always close to 0.5!!) Hence, my
> problem with fitting my data to an F-distribution without knowing this
> fact. The right way to do this is to fit the F distribution to my
> TRUNCATED data.

Moussa, it is possible that I don't understand, but I believe you don't want
the truncated distribution. Suppose you have A as one variance and B as the
other. The truncated distribution is what you'd want if you took A/B when
A>B, but nothing when A<B. If I understand things, you instead have FF
below, where:

F = A/B
FF = max(A,B) / min(A,B) = max(F, 1/F)

Then FF>=1 always, and for f>1,

FF<f if F<f and F>1/f

You can work this out directly. What follows is a script that attempts to
illustrate the issues.

-- Tom

%% Simulate a regular F value, fit it
v1 = 5;
v2 = 8;
c1 = chi2rnd(v1,1000,1)/v1;
c2 = chi2rnd(v2,1000,1)/v2;
f = c1./c2;
p = mle(f,'pdf',@fpdf,'start',[2 2])

subplot(2,2,1)
ecdf(f)
x = linspace(0,25);
line(x,fcdf(x,p(1),p(2)),'color','r')
title(sprintf('Regular with F(%g,%g)',p))

%% Compute the max-over-min version
cmax = max(c1,c2);
cmin = min(c1,c2);
ff = max(c1,c2)./min(c1,c2);

%% Compare this version to its theoretical cdf
x = linspace(1.0001,25,1000);
Fx = @(x,v1,v2) fcdf(x,v1,v2) - (1-fcdf(x,v2,v1));

subplot(2,2,2);
ecdf(ff)
line(x,Fx(x,v1,v2),'color','r')
title(sprintf('Modified with F(%g,%g)',v1,v2))

%% Find the pdf of this version
fx = @(x,v1,v2) fpdf(x,v1,v2) + fpdf(x,v2,v1);
fks = ksdensity(ff,x,'support',[1 1000]);

subplot(2,2,3);
plot(x,fks)
line(x,fx(x,v1,v2),'color','r')
title(sprintf('Modified with F(%g,%g)',v1,v2))
xlabel('x'); ylabel('f(x)')

%% Try fitting that - symmetric in v1 and v2 so they may be swapped
p = mle(ff,'pdf',fx,'start',[2 2])

subplot(2,2,4);
ecdf(ff)
line(x,Fx(x,p(1),p(2)),'color','r')
title(sprintf('Modified with F(%g,%g)',p))

set(findobj(gcf,'type','axes'),'xlim',[0 10])