Normality test [Mathematica]


From: michael partensky on 13 Feb 2010 05:25

Hi, Ray.
I have applied both the original and the modified functions (see below) to
the data set
dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};

The plots are somewhat different. Could you please comment on these
differences? In particular, why is changing the AspectRatio important?

Which *quantitative* measure of normality do you prefer?

Thanks.
Michael.

On Fri, Feb 5, 2010 at 3:24 AM, Ray Koopman <koopman(a)sfu.ca> wrote:

> Here, prompted by off-line conversations, is an improved version of
> qqnorm:
>
> qqnorm2[data_] := Block[{n, y,y1,y2,y3, x,x1,x2,x3, b,a},
> n = Length@data; y = Sort@data; {y1,y2,y3} = Quartiles@y;
> x = InverseErf[Range[1-n,n-1,2]/(n+.33(n-1.25)^-.1)]*Sqrt[2.];
> {x1,x2,x3} = Quartiles@x; b = (y3-y1)/(x3-x1); a = y1 - b*x1;
> ListPlot[Transpose@{x,y}, PlotRange->All, Frame->True, Axes->None,
> AspectRatio->((Last@y-First@y)/(y3-y1))/((Last@x-First@x)/(x3-x1)),
> Prolog->Line[{#,#*b+a}&/@{First@x,Last@x}],
> FrameLabel->{"Standard Normal","Observed Data"}]]
>
> The most notable changes are that the reference line is now drawn so
> that it passes through the joint first and third quartile points, and
> the aspect ratio now varies so that the visual slope of the reference
> line is always approximately 1. Also, the normal scores are now a
> better approximation of the expected order statistics.
>
> On Feb 2, 3:48 am, Ray Koopman <koop...(a)sfu.ca> wrote:
> > On Feb 2, 12:28 am, michael partensky <parten...(a)gmail.com> wrote:
> >> Hi.
> >> I wonder if anybody knows a function similar to qqnorm(data) from
> >> *R*, producing a normal scores plot, or some related tools in M.
> >> for testing normality of data?
> >>
> >> Thanks
> >> Michael Partenskii
> >
> > qqnorm[y_] := Block[
> > {n = Length@y, m = Mean@y, s = StandardDeviation@y, x},
> > x = InverseErf[Range[1-n,n-1,2]/n]*Sqrt[2.];
> > ListPlot[Transpose@{x,Sort@y},
> > PlotRange->All, Frame->True, Axes->None, AspectRatio->1,
> > Prolog->Line[{{x[[1]],x[[1]]*s+m},{x[[-1]],x[[-1]]*s+m}}],
> > FrameLabel->{"Theoretical Standard Normal Quantiles",
> > "Observed Quantiles"}]]
>
>

From: Ray Koopman on 15 Feb 2010 05:46

----- michael partensky <partensky(a)gmail.com> wrote:
> Hi, Ray.
> I have applied both the original and the modified functions (see
> below) to the data set
> dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
>
> The plots are somewhat different. Could you please comment on these
> differences? In particular, why is changing the AspectRatio important?

I think what you're noticing is mostly the effects of scaling and the
way the line is drawn. (The changes to the normal scores are small
and were a self-indulgence: I wanted to see if I could find a better
approximation of the expected order statistics without complicating
things unduly.)
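To gauge how small the change to the normal scores is, the two formulas can be evaluated side by side. This is a minimal sketch; the names oldScores, newScores and hazenScores are hypothetical. The third helper uses InverseCDF to confirm that the original scores are the standard normal quantiles at (i - 1/2)/n, since Sqrt[2] InverseErf[z] is the standard normal quantile at (1 + z)/2.

```mathematica
(* Scores used by the original qqnorm *)
oldScores[n_Integer] := InverseErf[Range[1 - n, n - 1, 2]/n]*Sqrt[2.]

(* Scores used by qqnorm2: the divisor n becomes n + .33 (n - 1.25)^-.1 *)
newScores[n_Integer] :=
 InverseErf[Range[1 - n, n - 1, 2]/(n + .33 (n - 1.25)^-.1)]*Sqrt[2.]

(* Standard normal quantiles at the plotting positions (i - 1/2)/n *)
hazenScores[n_Integer] :=
 N[InverseCDF[NormalDistribution[0, 1], #] & /@ ((Range[n] - 1/2)/n)]

(* Columns: old score, new score, old minus new, for n = 10 *)
TableForm[Transpose[{oldScores[10], newScores[10], oldScores[10] - newScores[10]}]]
```

Because the new divisor is slightly larger than n, every new score is a little closer to 0 than the corresponding old one.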

The old version made the plot square and drew the line through the
joint mean with a slope (numeric, not visual) equal to the standard
deviation of the observed data. Because the normal scores have a mean
of 0 and a standard deviation close to 1, that slope is approximately
the ratio of the two standard deviations. The line is close to the
best-fit line that orthogonal regression would produce, where the
errors are measured perpendicular to the line; this will be apparent
in the plot to the extent that the ratio of the ranges of the two
variables is close to the ratio of their standard deviations.
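As a quick numeric check of the slope argument, here is a sketch on Michael's dt (the variable names are illustrative only). It shows that the original qqnorm scores have mean 0 and a standard deviation close to 1, so the numeric slope StandardDeviation@dt differs from the ratio of the two standard deviations only by that factor.

```mathematica
dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
n = Length@dt;
(* normal scores from the original qqnorm *)
x = InverseErf[Range[1 - n, n - 1, 2]/n]*Sqrt[2.];

(* mean of the scores is 0 up to rounding; their SD is close to 1 *)
{Mean@x, StandardDeviation@x}

(* qqnorm's numeric slope vs. the ratio of the two standard deviations *)
{StandardDeviation@dt, StandardDeviation@dt/StandardDeviation@x}
```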

The new version takes the same basic approach but applies it to the
middle half of each data set, using the first and third quartiles to
both draw the line and equate the visual plot units. Then the aspect
ratio of the whole plot becomes a function of the data and conveys
information about the lengths of the distributions' tails, even if
the line is not drawn.
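To make the new construction concrete, the following sketch recomputes, for dt, the slope, intercept and aspect ratio that qqnorm2 uses. The formulas are the ones in qqnorm2; only the intermediate names tailY and tailX are new and hypothetical. For each variable, the range divided by the interquartile range says how far the extremes reach relative to the middle half, and the aspect ratio is the data's value divided by the normal scores' value.

```mathematica
dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
y = Sort@dt; n = Length@y;
(* qqnorm2 normal scores *)
x = InverseErf[Range[1 - n, n - 1, 2]/(n + .33 (n - 1.25)^-.1)]*Sqrt[2.];

{y1, y2, y3} = Quartiles@y; {x1, x2, x3} = Quartiles@x;
b = (y3 - y1)/(x3 - x1);   (* slope of the line through the joint quartile points *)
a = y1 - b*x1;             (* its intercept *)

(* range divided by interquartile range, for each variable *)
tailY = (Last@y - First@y)/(y3 - y1);
tailX = (Last@x - First@x)/(x3 - x1);

(* qqnorm2's AspectRatio is tailY/tailX; values above 1 give a plot taller than wide *)
{b, a, tailY/tailX}
```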

Drawing the line through the first and third quartiles seems to be
the de facto standard, but it will not always be the best choice.
In your case it uses points 3 and 8, when it is clear from the plot
that points 4...9 collectively would be more appropriate. In this
case the line misleads the eye.
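The "points 3 and 8" remark can be checked directly. According to the Mathematica documentation, Quartiles uses the Quantile parameters {{1/2, 0}, {0, 1}}, which for n = 10 put the first and third quartiles exactly on the 3rd and 8th order statistics. The sketch below also overlays a least-squares line fitted with Fit to points 4 through 9 only. That second line is an illustrative alternative, not Ray's recommendation, and the range 4 ;; 9 is specific to this data set; quartileLine and lsLine are hypothetical names.

```mathematica
dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
y = Sort@dt; n = Length@y;
x = InverseErf[Range[1 - n, n - 1, 2]/(n + .33 (n - 1.25)^-.1)]*Sqrt[2.];
pts = Transpose[{x, y}];

(* for n = 10 the first and third quartiles are exactly points 3 and 8 *)
{Quartiles[y][[{1, 3}]], y[[{3, 8}]]}

(* qqnorm2's reference line, through points 3 and 8 *)
quartileLine[t_] := y[[3]] + (y[[8]] - y[[3]])/(x[[8]] - x[[3]]) (t - x[[3]])

(* illustrative alternative: least-squares line through points 4..9 only *)
lsLine[t_] = Fit[pts[[4 ;; 9]], {1, t}, t];

Show[ListPlot[pts, Frame -> True, Axes -> None, PlotRange -> All],
 Plot[{quartileLine[t], lsLine[t]}, {t, First@x, Last@x},
  PlotStyle -> {Gray, Dashed}]]
```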

Here's another approach that you might find interesting: plot the
observed data against multiple samples from a standard normal
distribution.

Show[Graphics@Table[Line@Transpose@{Sort@RandomReal[
NormalDistribution[0,1], Length@data], Sort@data}, {50}],
PlotRange->All, Frame->True]

Then compare that plot to what you get when you replace the data by
a single fixed sample of the same size from a normal distribution.
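One way to set up that comparison is sketched below. envelopePlot is a hypothetical wrapper around the expression above, with the data sorted so the lines are correct even when the input is not already in order. Matching the fixed reference sample to dt's mean and standard deviation is an assumption made only so the two panels have similar vertical scales; SeedRandom just makes that sample reproducible.

```mathematica
(* hypothetical helper: k sorted standard-normal samples,
   each drawn as a line against the sorted data *)
envelopePlot[data_List, k_Integer: 50] :=
 Show[Graphics@Table[
    Line@Transpose@{Sort@RandomReal[NormalDistribution[0, 1], Length@data],
       Sort@data}, {k}],
  PlotRange -> All, Frame -> True]

dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
SeedRandom[1];  (* only so the fixed reference sample is reproducible *)
(* one fixed normal sample of the same size, matched to dt's mean and SD *)
ref = RandomReal[NormalDistribution[Mean@dt, StandardDeviation@dt], Length@dt];

GraphicsRow[{envelopePlot[dt], envelopePlot[ref]}]
```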

>
> Which *quantitative* measure of the normality do you prefer?

I've never had need of such a measure, so I haven't thought about it.
My top-of-the-head response is that there probably is no measure that
will be best for all purposes, that it will depend on the particular
aspect of non-normality that is most important in the situation at
hand.
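To illustrate that point with two of the simplest single-aspect summaries (chosen here for illustration, not measures Ray proposed): sample skewness responds to asymmetry and excess kurtosis to tail weight, and both are 0 for a normal distribution. Mathematica's Kurtosis returns the raw kurtosis, which is 3 for a normal, so 3 is subtracted. With only 10 observations both statistics are quite variable, so this is a sketch rather than a recommendation.

```mathematica
dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};

(* asymmetry: 0 when the data are symmetric about their mean *)
skew = Skewness@dt;

(* tail weight: Kurtosis gives 3 for a normal distribution, so subtract 3 *)
excessKurt = Kurtosis@dt - 3;

{skew, excessKurt}
```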


From: michael partensky on 15 Feb 2010 05:46

Thanks, Ray.

> Here's another approach that you might find interesting: plot the
> observed data against multiple samples from a standard normal
> distribution.
>
> Show[Graphics@Table[Line@Transpose@{Sort@RandomReal[
> NormalDistribution[0,1], Length@data], data}, {50}],
> PlotRange->All, Frame->True]
>
I was thinking of something like this, but could not implement it so
nicely. Directly comparing the data set with averaged same-size
(ordered) samples from the normal distribution can also be very helpful
for teaching. After such a discussion, the quartile-based approach can
be introduced more naturally. Does that make sense?
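Averaging many sorted normal samples of the same size gives a Monte Carlo estimate of the expected normal order statistics, which are what qqnorm2's normal scores approximate. A minimal sketch, assuming 10^4 replications is enough for illustration; mcScores and koopmanScores are hypothetical names, the latter simply repeating qqnorm2's formula.

```mathematica
(* Monte Carlo estimate of the expected order statistics of n standard normals *)
mcScores[n_Integer, reps_Integer: 10000] :=
 Mean[Table[Sort@RandomReal[NormalDistribution[0, 1], n], {reps}]]

(* qqnorm2's closed-form approximation, for comparison *)
koopmanScores[n_Integer] :=
 InverseErf[Range[1 - n, n - 1, 2]/(n + .33 (n - 1.25)^-.1)]*Sqrt[2.]

dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
SeedRandom[2010];
m = mcScores[Length@dt];

(* columns: averaged samples, qqnorm2 scores, difference *)
TableForm[Transpose[{m, koopmanScores[Length@dt], m - koopmanScores[Length@dt]}]]

ListPlot[Transpose@{m, Sort@dt}, Frame -> True, Axes -> None,
 FrameLabel -> {"Averaged normal order statistics", "Observed Data"}]
```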

> Then compare that plot to what you get when you replace the data by
> a single fixed sample of the same size from a normal distribution.
>
> >
> > Which *quantitative* measure of the normality do you prefer?
>
> I've never had need of such a measure, so I haven't thought about it.
> My top-of-the-head response is that there probably is no measure that
> will be best for all purposes, that it will depend on the particular
> aspect of non-normality that is most important in the situation at
> hand.
>
I see. I just wanted to understand where, say in hypothesis testing,
we could use quantitative measures of non-normality derived from the
approaches discussed above.
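One measure that comes straight out of the normal scores plot, offered here as an example rather than something proposed in this thread, is the correlation between the normal scores and the sorted data, often called the probability plot correlation coefficient. It equals 1 when the points fall exactly on a straight line and drops as they bend away from one. Using it as a formal test requires critical values for the given n, which this sketch (with the hypothetical name qqCorrelation) does not supply.

```mathematica
(* correlation between qqnorm2-style normal scores and the sorted data *)
qqCorrelation[data_List] := Module[{n = Length@data, x},
  x = InverseErf[Range[1 - n, n - 1, 2]/(n + .33 (n - 1.25)^-.1)]*Sqrt[2.];
  Correlation[x, Sort@data]]

dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
qqCorrelation[dt]
```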

