graph: fill area between series lines [SAS]

From: Mike Zdeb on 19 Feb 2010 15:02

hi ... here's another idea ... it's similar to Ya Huang's idea

so, first, if the lines do not cross ... it's sort of easy if you understand
how GPLOT uses the AREAS option ...

here's an example from the on-line help (with a few observations eliminated)
where the two lines do not cross ...

****************************************;
data stocks;
input year high low @@;
datalines;
1960 685.47 568.05 1961 734.91 610.25 1962 726.01 535.76 1963 767.21 646.79
1964 891.71 768.08 1965 969.26 840.59 1966 995.15 744.32 1967 943.08 786.41
1968 985.21 825.13 1969 968.85 769.93 1970 842.00 631.16 1971 950.82 797.97
1972 1036.27 889.15 1973 1051.70 788.31 1974 891.66 577.60 1975 881.81 632.04
1976 1014.79 858.71 1977 999.75 800.85 1978 907.74 742.12 1979 897.61 796.67
1980 1000.17 759.13 1981 1024.05 824.01 1982 1070.55 776.92 1983 1287.20 1027.04
1984 1286.64 1086.57 1985 1553.10 1184.96 1986 1955.57 1502.29 1987 2722.42 1738.74
1988 2183.50 1879.14 1989 2791.41 2144.64 1990 2999.75 2365.10
;
run;

goptions reset=all;

axis1 order=(1960 to 1990 by 5) offset=(2,2)
label=none
major=(height=2)
minor=(height=1);

axis2 order=(0 to 4000 by 1000) offset=(0,0)
label=none
major=(height=2)
minor=(height=1);

pattern1 v=s c=white;
pattern2 v=s c=graydd;

symbol1 i=join c=black;
symbol2 i=join c=black;
symbol3 i=join c=red w=2;
symbol4 i=join c=red w=2;

proc gplot data=stocks;
plot (low high low high) * year / overlay haxis=axis1 hminor=4 vaxis=axis2 vminor=1 caxis=black areas=2;
run;
quit;
****************************************;

the PLOT statement uses both LOW and HIGH twice, plus the AREAS=2 option
GPLOT uses the 2nd PATTERN first, since it draws the gray area (pattern color GRAYDD) for the y-variable HIGH first
then it uses the 1st PATTERN, which is WHITE, for the y-variable LOW ... that white covers the gray
area up to the level of the y-variable LOW

as Ya pointed out, you can get the lines added by using the variables a second time (SYMBOLS 3 and 4 are used)

for your data, the lines cross and you want the colors to change

so, you can INVENT a new variable that will cover over shaded areas with WHITE (as done above), but its value
has to be that of the lower value at each age value ...

****************************************;
data foo;
* use $char to preserve the leading space for " < 20" ... so it shows up on the LEFT;
input age $char5. y1 y2;
* new variable ... set y3 to the lower of the two values;
y3 = y1*(y1 le y2) + y2*(y2 lt y1);
;
datalines;
  <20 .00 .00
20-24 .01 .00
25-29 .03 .01
30-34 .04 .01
35-39 .05 .02
40-44 .06 .04
45-49 .08 .06
50-54 .08 .09
55-59 .09 .16
60-64 .15 .29
65-69 .10 .15
70-74 .08 .08
75-79 .09 .06
80-84 .07 .03
85+   .06 .02
;
run;

goptions reset=all gunit=pct h=2;

symbol1 i=j c=black;
symbol2 i=j c=black;
symbol3 i=j c=black;
symbol4 i=j f=marker v='V' c=black;
symbol5 i=j f=marker v='W' c=black;

pattern1 v=s c=white;
pattern2 v=s c=grayee;
pattern3 v=s c=gray88;

axis1 label=(a=90 'Y') order=0 to 0.3 by 0.05 minor=(n=4);
axis2 label=('AGE GROUP') offset=(0,0)pct;

* add some border white space;
title1 ls=1;
title2 a=90 ls=1;
title3 a=-90 ls=1;
footnote1 ls=1;

proc gplot data=foo;
plot (y3 y2 y1 y2 y1) * age /overlay areas=3 vaxis=axis1 haxis=axis2;
run;
quit;
****************************************;

this time there are FIVE variables plotted ... there are 3 areas and remember that

the 3rd variable will be shaded first (that's Y1 and the area up to that line is dark gray ... gray88)
the 2nd variable is shaded next (that's Y2 and the area up to that line is light gray ... grayee,
and it covers part of the area shaded with dark gray)
the 1st variable is shaded last (that's Y3, the lowest value at each age level and the area is WHITE,
so the white covers all the area up to that lower line)

then Y2 and Y1 use the 4th and 5th symbol and add the actual lines for Y2 and Y1 (if you don't
need the lines then just plot the first three variables in the list)

that all works, but SORT OF since the point at which the lines cross on the left side of the plot
occurs BETWEEN age groups 45-49 and 50-54 ... when they cross on the right side, it's at a single
point, at age group 70-74

if you look at the results you'll see a small area of light gray near the left crossing point that
should not be there ... best you can do with this method

or, you can 'adjust' your data and make Y1 = Y2 at either 45-49 or 50-54 and it'll look 'perfect'
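
a side note on the Y3 expression in the data step above ... each comparison, such as (y1 le y2), evaluates to 1 or 0 in SAS, so the sum keeps whichever value is lower. The following is only a minimal sketch (the data set name CHECK and the variables Y3_BOOL and Y3_MIN are made up for illustration) that compares it with the MIN function, which may be easier to read. One difference to keep in mind: MIN ignores missing values, while the arithmetic version returns missing if either input is missing.

```sas
/* sketch only: CHECK, Y3_BOOL and Y3_MIN are hypothetical names */
data check;
  input y1 y2;
  /* comparison results are 1 (true) or 0 (false), so this keeps the lower value */
  y3_bool = y1*(y1 le y2) + y2*(y2 lt y1);
  /* same result for non-missing data, using the MIN function */
  y3_min  = min(y1, y2);
datalines;
.08 .06
.08 .09
.08 .08
;
run;

proc print data=check;
run;
```

the data used in this thread have no missing values, so either version could feed the white 'cover' area.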

--
Mike Zdeb
U(a)Albany School of Public Health
One University Place
Rensselaer, New York 12144-3456
P/518-402-6479 F/630-604-1475

From: Bill McKirgan on 19 Feb 2010 16:45

Mike,

Thank you for another fine idea that builds on Ya Huang's suggestion.
The information about where to find this in the documentation is just
as helpful to me as the solution you crafted for this particular graph
problem.

Thanks again to all of you for your help, and please forgive my many
spelling errors above.

Bill McKirgan

From: Mike Zdeb on 19 Feb 2010 16:21

hi ... OK, on further thought (and more coffee) ... in the previous post I said ...

"if you look at the results you'll see a small area of light gray near the left crossing point that
should not be there ... best you can do with this method or, you can 'adjust' your data and make Y1 = Y2
at either 45-49 or 50-54 and it'll look 'perfect'"

here's another idea ... not a general solution, but with these data you can figure out where the lines
cross between age groups by creating a numeric x-variable and then use a format to show the age groups
on the plot ... this will look "perfect" without changing the original data

proc format;
value age
1 = '<20' 2 = '20-24' 3 = '25-29' 4 = '30-34'
5 = '35-39' 6 = '40-44' 7 = '45-49' 8 = '50-54'
9 = '55-59' 10= '60-64' 11= '65-69' 12= '70-74'
13= '75-79' 14= '80-84' 15= '85+'
;
run;

data foo;
input y1 y2 @@;
* set y3 to the lower of the two values;
y3 = y1*(y1 le y2) + y2*(y2 lt y1);
x + 1;
;
datalines;
.00 .00 .01 .00 .03 .01 .04 .01
.05 .02 .06 .04 .08 .06 .08 .09
.09 .16 .15 .29 .10 .15 .08 .08
.09 .06 .07 .03 .06 .02
;
run;

* where the lines cross between X=7 and X=8;
data foo;
if last then do;
x = 7.655; y3 = 0.08; call missing (y1,y2); output;
end;
set foo end=last;
output;
run;

proc sort data=foo;
by x;
run;

goptions reset=all gunit=pct htext=2;

symbol1 i=j c=black;
symbol2 i=j c=black;
symbol3 i=j c=black;
symbol4 i=j f=marker v='V' c=black;
symbol5 i=j f=marker v='W' c=black;

pattern1 v=s c=white;
pattern2 v=s c=grayee;
pattern3 v=s c=gray88;

axis1 label=(a=90 'Y') order=0 to 0.3 by 0.05 minor=(n=4);
axis2 label=('AGE GROUP') offset=(0,0)pct minor=none;

title1 ls=1;
title2 a=90 ls=1;
title3 a=-90 ls=1;
footnote1 ls=1;

proc gplot data=foo;
plot (y3 y2 y1 y2 y1) * x /overlay areas=3 vaxis=axis1 haxis=axis2;
format x age.;
run;
quit;
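
the hard-coded crossing point above (x = 7.655, y3 = 0.08) can also be estimated with linear interpolation between neighboring points ... the following is only a sketch, not part of the solution above. The data set names PTS and CROSS and the helper variables (X1, PX, PY1, PY2, D0, D1, T) are assumptions made for illustration. It looks for a strict sign change in Y1 - Y2 between consecutive points and interpolates both x and y at the crossing.

```sas
/* sketch only: PTS and CROSS are hypothetical data set names */
data pts;
  input y1 y2 @@;
  x + 1;
datalines;
.00 .00 .01 .00 .03 .01 .04 .01
.05 .02 .06 .04 .08 .06 .08 .09
.09 .16 .15 .29 .10 .15 .08 .08
.09 .06 .07 .03 .06 .02
;
run;

data cross (keep=x y3);
  set pts (rename=(x=x1));
  /* values at the previous point (LAG is called on every observation) */
  px  = lag(x1);
  py1 = lag(y1);
  py2 = lag(y2);
  d0 = py1 - py2;   /* difference at the previous point */
  d1 = y1 - y2;     /* difference at the current point  */
  /* a strict sign change means the lines cross between the two points */
  if _n_ > 1 and d0*d1 < 0 then do;
    t  = d0 / (d0 - d1);        /* fraction of the way to the crossing */
    x  = px + t*(x1 - px);
    y3 = py1 + t*(y1 - py1);
    output;
  end;
run;

proc print data=cross;
run;
```

with these data the only strict sign change is between x=7 and x=8, and the interpolated point comes out at about x = 7.67, y3 = 0.08, close to the hard-coded value. Points where the lines merely touch (x=1 and x=12) are skipped, since they are already in the data.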

--
Mike Zdeb

From: Bill McKirgan on 25 Feb 2010 11:43

I just wanted to give you all an update on my graph problem and
solution. Below is the code I used to make the kind of graph my
colleague 'TK' was asking for last week. He was using Excel, and was
asked to fill in the area between two lines and found that while this
sounded easy enough to do, it was in fact impossible with Excel 2007.
The closest we came was by overlaying area plots and adding borders to
the areas and making them transparent. I took a stab at the problem
with SGPLOT and with my limited SAS graph knowledge I ran into the same
problem (i.e., there's no simple solution), and that's when I posted.

Thanks to all of you who expressed interest and offered ideas,
examples and spot-on solutions both on and off list. I am in your
debt and will look through these ideas later on as a way to learn more
about things like annotation datasets, and hash objects for making
more data-driven custom solutions.

In the end I used a solution provided by Mike Zdeb that is similar to
what Ya Huang posted. Another excellent candidate solution was Data
_NULL_'s which used hash objects to automate the part where fill areas
between the intersecting lines are trimmed. _NULL_, that stuff was
way over my head, but I will keep studying it because I want to learn
more techniques for making dynamic programs. I use list processing
and macro solutions in my regular data management work, but do not yet
understand the hash, and appreciate your example as it will probably
be what helps light the hash-object bulb in my head (so to speak).

I'm pretty sure my colleague will be very happy with the result, and
may be astounded to learn that it was accomplished by the generous
support of the SAS-L community.

Thanks!!

Bill McKirgan

Here's that code...

/* _graph_help__mike_zdeb_.sas

Use this to make a graph for TK.
This began as an excel file/graph with a request
to fill the areas between the two plotted lines.

Not easy/possible to do in Excel. Bill McKirgan
tried to do it in SAS SGPLOT and then asked for help
on the SAS listserv (SAS-L):

http://groups.google.com/group/comp.soft-sys.sas/browse_thread/thread/c08750954d3fea12?hl=en&pli=1

Many good ideas were suggested, and Mike Zdeb's solution
was the best/closest to meeting the goal TK described.

Mike's code was downloaded from SAS-L and into this
program which has some minor modifications to improve
colors, add titles and graph legend information.

Bill McKirgan
2/25/2010
*/
/* point to directory to save output */
%let whatpath=mypath;

/* Mike uses formatted variable instead of character data */
proc format;
value age
1 = '<20'
2 = '20-24'
3 = '25-29'
4 = '30-34'
5 = '35-39'
6 = '40-44'
7 = '45-49'
8 = '50-54'
9 = '55-59'
10= '60-64'
11= '65-69'
12= '70-74'
13= '75-79'
14= '80-84'
15= '85+'
;
run;

/* Admin data from TK's chart (deidentified prior to posting to SAS-L) */
data foo;
input y1 y2 @@;
* set y3 to the lower of the two values;
y3 = y1*(y1 le y2) + y2*(y2 lt y1);
x + 1;
datalines;
.00 .00
.01 .00
.03 .01
.04 .01
.05 .02
.06 .04
.08 .06
.08 .09
.09 .16
.15 .29
.10 .15
.08 .08
.09 .06
.07 .03
.06 .02
;
run;

* Mike's HARDCODE where the lines cross between X=7 and X=8;
/* This was brilliant, and another similar post from Data _Null_
involved a dynamic solution to filling the areas that used
many lines of code and HASH objects.
That too was very close to the desired final appearance;
however, the hashes were way beyond my comprehension
but I will study the example because I like
data-driven solutions
*/
data foo;
if last then do;
x = 7.655;
y3 = 0.08;
call missing (y1,y2);
output;
end;
set foo end=last;
output;
run;
proc sort data=foo;
by x;
run;

goptions reset=all gunit=pct htext=2;
/* McKirgan edits to fine-tune graph output (suppress lines for clean legend).
Add color to lines and fill patterns */
symbol1 i=j c=white; /* keep lines white to suppress in legend */
symbol2 i=j c=white; /* keep lines white to suppress in legend */
symbol3 i=j c=white; /* keep lines white to suppress in legend */
symbol4 i=j f=marker v='V' c=red; /* line color */
symbol5 i=j f=marker v='W' c=blue;

pattern1 v=s c=white;
pattern2 v=s c=lightblue ;
pattern3 v=s c=lightgreen ;

axis1 label=(a=90 'PERCENTAGE IN COHORT')
order=0 to 0.3 by 0.05
minor=(n=4)
;
axis2 label=('AGE GROUP')
offset=(0,0)pct
minor=none
;
legend1 label=none
value=('' '' '' 'MyHVs' 'Total Vs')
down=5
mode=protect
position=inside
;

title1 ls=1 'MHV vs Total V Age Distribution' ;
*title2 a=90 ls=1 'title 2' ;
*title3 a=-90 ls=1 'title 3' ;
*footnote1 ls=1 'footnote 1' ;

/* pdf is okay, but simply right-clicking the graph in SAS
and exporting as JPG is faster and gives a better
result (an image that can be copy/pasted into
an MS Office Word/PowerPoint file) */

/* suppressing output to pdf after initial test */
*ods pdf file="&WHATPATH.\MHv_graph.pdf";

/* JUST EXPORT THE GRAPH FROM WITHIN SAS SESSION */
proc gplot data=foo;
plot (y3 y2 y1 y2 y1) * x
/overlay
areas=3
vaxis=axis1
haxis=axis2
legend = legend1 ;
format x age.;
run;
quit;

*ods pdf close;
run;
quit;
