From: michael partensky on 13 Feb 2010 05:25

Hi. Ron.
I have applied both the original and the modified functions (see below) to the data set

dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};

The plots are somewhat different. Could you please comment on these differences? In particular, why is changing the AspectRatio important?

Which *quantitative* measure of normality do you prefer?

Thanks.
Michael.

On Fri, Feb 5, 2010 at 3:24 AM, Ray Koopman <koopman(a)sfu.ca> wrote:

> Here, prompted by off-line conversations, is an improved version of
> qqnorm:
>
> qqnorm2[data_] := Block[{n, y,y1,y2,y3, x,x1,x2,x3, b,a},
>  n = Length@data; y = Sort@data; {y1,y2,y3} = Quartiles@y;
>  x = InverseErf[Range[1-n,n-1,2]/(n+.33(n-1.25)^-.1)]*Sqrt[2.];
>  {x1,x2,x3} = Quartiles@x; b = (y3-y1)/(x3-x1); a = y1 - b*x1;
>  ListPlot[Transpose@{x,y}, PlotRange->All, Frame->True, Axes->None,
>   AspectRatio->((Last@y-First@y)/(y3-y1))/((Last@x-First@x)/(x3-x1)),
>   Prolog->Line[{#,#*b+a}&/@{First@x,Last@x}],
>   FrameLabel->{"Standard Normal","Observed Data"}]]
>
> The most notable changes are that the reference line is now drawn so
> that it passes through the joint first and third quartile points, and
> the aspect ratio now varies so that the visual slope of the reference
> line is always approximately 1. Also, the normal scores are now a
> better approximation of the expected order statistics.
>
> On Feb 2, 3:48 am, Ray Koopman <koop...(a)sfu.ca> wrote:
> > On Feb 2, 12:28 am, michael partensky <parten...(a)gmail.com> wrote:
> >> Hi.
> >> I wonder if anybody knows a function similar to qqnorm(data) from
> >> *R*, producing a normal scores plot, or some related tools in M.
> >> for testing normality of data?
> >>
> >> Thanks
> >> Michael Partenskii
> >
> > qqnorm[y_] := Block[
> >  {n = Length@y, m = Mean@y, s = StandardDeviation@y, x},
> >  x = InverseErf[Range[1-n,n-1,2]/n]*Sqrt[2.];
> >  ListPlot[Transpose@{x,Sort@y},
> >   PlotRange->All, Frame->True, Axes->None, AspectRatio->1,
> >   Prolog->Line[{{x[[1]],x[[1]]*s+m},{x[[-1]],x[[-1]]*s+m}}],
> >   FrameLabel->{"Theoretical Standard Normal Quantiles",
> >    "Observed Quantiles"}]]

From: Ray Koopman on 15 Feb 2010 05:46

----- michael partensky <partensky(a)gmail.com> wrote:
> Hi. Ron.
> I have applied both the original and the modified functions (see
> below) to the data set
> dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
>
> The plots are somewhat different. Could you please comment on these
> differences. Especially, why changing the AspectRatio is important?

I think what you're noticing is mostly the effects of scaling and the way the line is drawn. (The changes to the normal scores are small and were a self-indulgence: I wanted to see if I could find a better approximation of the expected order statistics without complicating things unduly.)

The old version made the plot square, and drew the line through the joint mean with slope (numeric, not visual) equal to the standard deviation of the observed data, thus making it approximately equal to the ratio of the two standard deviations. The line is close to the best-fit line that would be produced by orthogonal regression, where the errors are measured perpendicular to the line; and this will be apparent in the plot to the extent that the ratio of the ranges of the two variables is close to the ratio of their standard deviations.

The new version takes the same basic approach but applies it to the middle half of each data set, using the first and third quartiles to both draw the line and equate the visual plot units. Then the aspect ratio of the whole plot becomes a function of the data and conveys information about the lengths of the distributions' tails, even if the line is not drawn.

Drawing the line through the first and third quartiles seems to be the de facto standard, but it will not always be the best choice. In your case it uses points 3 and 8, when it is clear from the plot that points 4...9 collectively would be more appropriate. In this case the line misleads the eye.

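As a rough illustration of the two reference lines described above, the following sketch computes both for the dt data and overlays them on one plot. It is not code from the thread: the names oldSlope, oldIntercept, newSlope and newIntercept are hypothetical, and for simplicity both lines are drawn against the qqnorm2 normal scores.

```mathematica
dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
n = Length@dt; y = Sort@dt;

(* normal scores, computed as in qqnorm2 *)
x = InverseErf[Range[1 - n, n - 1, 2]/(n + 0.33 (n - 1.25)^(-0.1))]*Sqrt[2.];

(* old-style line: through the joint mean, numeric slope = SD of the data *)
(* the normal scores are symmetric about 0, so the intercept is the data mean *)
oldSlope = StandardDeviation@y;
oldIntercept = Mean@y;

(* new-style line: through the joint first and third quartile points *)
{y1, y2, y3} = Quartiles@y;
{x1, x2, x3} = Quartiles@x;
newSlope = (y3 - y1)/(x3 - x1);
newIntercept = y1 - newSlope*x1;

(* compare the two {intercept, slope} pairs numerically *)
{{oldIntercept, oldSlope}, {newIntercept, newSlope}}

(* dashed = mean/SD line, solid = quartile line *)
Show[
 ListPlot[Transpose@{x, y}, PlotRange -> All, Frame -> True, Axes -> False],
 Graphics[{Dashed,
   Line[{#, oldIntercept + oldSlope #} & /@ {First@x, Last@x}]}],
 Graphics[Line[{#, newIntercept + newSlope #} & /@ {First@x, Last@x}]]]
```

Seeing both lines on the same axes makes it easier to judge how much of the difference between the two plots comes from the line itself rather than from the aspect ratio.
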
Here's another approach that you might find interesting: plot the observed data against multiple samples from a standard normal distribution.

Show[Graphics@Table[Line@Transpose@{Sort@RandomReal[
  NormalDistribution[0,1], Length@data], data}, {50}],
  PlotRange->All, Frame->True]

Then compare that plot to what you get when you replace the data by a single fixed sample of the same size from a normal distribution.

> Which *quantitative* measure of the normality do you prefer?

I've never had need of such a measure, so I haven't thought about it. My top-of-the-head response is that there probably is no measure that will be best for all purposes, that it will depend on the particular aspect of non-normality that is most important in the situation at hand.

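The comparison suggested above, the data versus a single fixed normal sample, might be sketched as follows. This is only an illustration under stated assumptions: the helper name spaghetti is hypothetical, the seed is arbitrary, and the fixed sample is drawn with the data's own mean and standard deviation (the post only says "a normal distribution") so that the two panels have comparable vertical scales. The sketch also sorts its argument explicitly, since the lines are monotone only when both coordinates are ordered.

```mathematica
SeedRandom[1];  (* arbitrary seed, used only to make the picture repeatable *)

dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};

(* hypothetical helper: 50 sorted standard normal samples plotted against the sorted input *)
spaghetti[sample_] := Graphics[
  Table[
   Line@Transpose@{
      Sort@RandomReal[NormalDistribution[0, 1], Length@sample],
      Sort@sample},
   {50}],
  PlotRange -> All, Frame -> True];

(* one fixed sample of the same size; mean and SD matched to dt by assumption *)
fixed = RandomReal[
   NormalDistribution[Mean@dt, StandardDeviation@dt], Length@dt];

(* left: observed data, right: fixed normal sample *)
GraphicsRow[{spaghetti[dt], spaghetti[fixed]}]
```

If the bundle of lines for the data looks much like the bundle for the fixed normal sample, the data give little visual evidence against normality at this sample size.
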
From: michael partensky on 15 Feb 2010 05:46

Thanks, Ray.

> Here's another approach that you might find interesting: plot the
> observed data against multiple samples from a standard normal
> distribution.
>
> Show[Graphics@Table[Line@Transpose@{Sort@RandomReal[
>   NormalDistribution[0,1], Length@data], data}, {50}],
>   PlotRange->All, Frame->True]

I was thinking of something like this, but could not implement it so nicely. Directly comparing the data set with averaged, ordered samples of the same size from the normal distribution can also be very helpful for teaching. After such a discussion, a quartiles-based approach can be introduced more naturally. Does that make sense?

> Then compare that plot to what you get when you replace the data by
> a single fixed sample of the same size from a normal distribution.
>
> > Which *quantitative* measure of the normality do you prefer?
>
> I've never had need of such a measure, so I haven't thought about it.
> My top-of-the-head response is that there probably is no measure that
> will be best for all purposes, that it will depend on the particular
> aspect of non-normality that is most important in the situation at
> hand.

I see. I just wanted to grasp where, say, in hypothesis testing we can use some quantitative measures of non-normality derived from the aforementioned tests.

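The "averaged ordered samples" idea can be tried directly. The sketch below is an assumption-laden illustration rather than anything proposed in the thread: it estimates the expected normal order statistics by averaging many sorted samples (5000 is an arbitrary choice, as is the seed), compares them with the closed-form scores used in qqnorm2, and then computes the correlation between the averaged scores and the sorted data as one simple numeric summary, in the spirit of a probability-plot correlation. The names avgScores and approxScores are hypothetical.

```mathematica
SeedRandom[2];  (* arbitrary seed, for repeatability only *)

dt = {1.2, 1.4, 1.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.4, 6.1};
n = Length@dt;

(* Monte Carlo estimate of the expected order statistics: *)
(* Mean of a list of sorted samples averages position by position *)
avgScores = Mean@Table[
    Sort@RandomReal[NormalDistribution[0, 1], n], {5000}];

(* closed-form normal scores, as in qqnorm2 *)
approxScores =
  InverseErf[Range[1 - n, n - 1, 2]/(n + 0.33 (n - 1.25)^(-0.1))]*Sqrt[2.];

(* largest disagreement between simulation and the approximation *)
Max@Abs[avgScores - approxScores]

(* normal scores plot built from the averaged samples *)
ListPlot[Transpose@{avgScores, Sort@dt}, PlotRange -> All,
 Frame -> True, Axes -> False,
 FrameLabel -> {"Averaged Ordered Normal Samples", "Observed Data"}]

(* one possible numeric summary: values near 1 mean the points lie close to a line *)
Correlation[avgScores, Sort@dt]
```

A single correlation like this mainly reflects overall straightness of the plot, so it is at best one of several possible measures, in line with the remark that no measure is best for all purposes.
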