Prev: Do Until Loop (total sas beginner question)
Next: Simple Hack To Get $2000 To Your PayPal Account
From: Steve_Bates on 12 Jul 2010 19:24 Hi All SASers, I have a problem creating relationship periods for couples on our database that I simply cannot solve. Due to the vagaries of our social welfare database we have real problems ascertaining accurate relationship periods between two clients. The front end software only records a partner if the partner is present when a benefit period is canceled. There are also other issues that come into play but these dont affect the problem I'm dealing with. Bottom line is I need to get both clients and partners relationship history in order to work out the correct relationship period(s) What I want to be able to do is extract accurate relationship periods using SAS code. I've prepared some test code to illustrate below: data relationships ; length swn pnswn client_start client_end 8 client partner $12 ; infile datalines delimiter=','; input swn client $ pnswn partner $ client_start client_end ; informat client_start client_end date9. ; format client_start client_end date9. ; datalines; 111,John Doe,222,Jane Citizen,20APR2004,11NOV2005 111,John Doe,222,Jane Citizen,24AUG2006,14JUL2008 222,Jane Citizen,111,John Doe,19JUL2004,11NOV2005 222,Jane Citizen,111,John Doe,24AUG2006,14JUL2008 222,Jane Citizen,111,John Doe,01JAN2010,20FEB2010 222,Jane Citizen,111,John Doe,05MAR2010,20MAR2010 ; SWN is the social welfare number of the client and PNSWN is the social welfare number of the partner at cancellation. From the example above I would want to extract the following relationship periods: 19JUL2004 - 11NOV2005 24AUG2006 - 14JUL2008 01JAN2010 - 20FEB2010 05MAR2010 - 20MAR2010 In other words if there is overlap between the two periods I want to extract the overlap period only but if there is no overlap then the whole period needs to be extracted (as seen in the last two obs). The situation in the last two obs occurs when one client has a partner included and excluded within one benefit type ie is on single rate Sickness Benefit, has partner included and then excluded while still on Sickness Benefit. When the benefit cancels there is no partner ergo the pnswn is not recorded. The above data is extracted from benefit data. Clients can (and do) have multiple periods in and out of relationships and these differ greatly so any solution needs to account for this. I've tried proc transposes, arrays you name it but have been unable to crack it. Any help much appreciated as its driving me crazy! warm regards Steve
From: Patrick on 13 Jul 2010 06:47 Hi Steve I believe the following code does what you're looking for (but some more test cases wouldn't hurt...). data have ; length id swn pnswn client_start client_end 8 client partner $12 ; infile datalines delimiter=','; input swn client $ pnswn partner $ client_start client_end ; informat client_start client_end date9. ; format client_start client_end date9. ; id=_n_; datalines; 111,John Doe,222,Jane Citizen,20APR2004,11NOV2005 111,John Doe,222,Jane Citizen,24AUG2006,14JUL2008 222,Jane Citizen,111,John Doe,19JUL2004,11NOV2005 222,Jane Citizen,111,John Doe,24AUG2006,14JUL2008 222,Jane Citizen,111,John Doe,01JAN2010,20FEB2010 222,Jane Citizen,111,John Doe,05MAR2010,20MAR2010 ; run; proc sql; create view V_OverlappingObs as select m.id, max(p.client_start,m.client_start) as Period_Start format=date9., min(p.client_end,m.client_end) as Period_End format=date9. from have as M,have as P where m.swn=p.pnswn and (P.client_start <= M.client_end and P.client_end >= M.client_start) ; create table want as select l.*, coalesce(r.Period_Start,l.client_start) as Period_Start format=date9., coalesce(r.Period_End,l.client_end) as Period_End format=date9. from have as l left join V_OverlappingObs as r on l.id=r.id ; quit; proc print data=want ; run; HTH Patrick
From: Steve_Bates on 13 Jul 2010 15:58 Hi Patrick, Thank you very much for this, really appreciated. I'll throw a heap of real clients at it my end this morning and let you know the results. Initial test looks great. I thought the answer lay in SQL as it's used for a lot of similar problems but I dont have the expertise to get my particular problem solved. Will post later today, thanks again. cheers Steve
From: Steve_Bates on 14 Jul 2010 00:50 Hi, Tested the code and it worked perfect thanks although I got the logoc slightly wrong. Since it picks up the partner at cancellation the end date actually had to be the latest (and not strictly the overlap period). I modified your code and it seems to work. I'm out of the office till Monday now but I will post the code then. I'm pretty sure you could do it more elegantly but all the tests I threw at it appeared to work. Thanks again for your help in this, I had never heard of the coalesce function in SQL so will do some research into this. cheers Steve
From: Steve_Bates on 22 Jul 2010 19:07 Hi Patrick, Sorry for the late response but I have been on sick leave this week. I modified your code as follows to pick up the latest date, it seems to work with the tests I've thrown against it so far and I've had other analysts throw examples at it as well - do you agree? I've used an example which gives the correct dates with my adaptation. Again this was due to me getting the logic incorrect initially! data have ; length id swn pnswn client_start client_end 8 client partner $12 ; infile datalines delimiter=','; input swn client $ pnswn partner $ client_start client_end ; informat client_start client_end date9. ; format client_start client_end date9. ; id=_n_; datalines; 111,John Doe,222,Jane Citizen,11Feb1991,12Dec1998 111,John Doe,222,Jane Citizen,14Dec1998,24Sep2001 111,John Doe,222,Jane Citizen,05Nov2001,07Jul2002 111,John Doe,222,Jane Citizen,06Dec2002,02Jul2003 222,Jane Citizen,111,John Doe,11Feb1991,12Dec1998 222,Jane Citizen,111,John Doe,14Dec1998,24Sep2001 222,Jane Citizen,111,John Doe,05Nov2001,21Aug2004 ; run ; proc sql; create table V_OverlappingObs as select m.id, max(p.client_start,m.client_start) as Period_Start format=ddmmyy10., max(p.client_end,m.client_end) as Period_End format=ddmmyy10. from have as M,have as P where m.swn=p.pnswn and (P.client_start <= M.client_end and P.client_end >= M.client_start) order by period_end, period_start ; quit ; data V_OverlappingObs ; set V_OverlappingObs ; by period_end period_start ; if first.period_end ; run ; proc sql ; create table my_version as select l.*, coalesce(r.Period_Start,l.client_start) as Period_Start format=ddmmyy10., coalesce(r.Period_End,l.client_end) as Period_End format=ddmmyy10. from have as l inner join V_OverlappingObs as r on l.id=r.id order by period_end, period_start ; quit; cheers Steve
|
Pages: 1 Prev: Do Until Loop (total sas beginner question) Next: Simple Hack To Get $2000 To Your PayPal Account |