From: Claudia Champagne on 22 Feb 2010 23:31 Hi everyone, I'm trying to use a subquery to count past observations common to variables. However, I must be doing something wrong because I run out of disk space even though I have over 60 Go available! I have 3 variables in table1 (and > 1 million lines): DEAL_ID, DATE and CIE_ID. Since the same deal can involve more than 1 cie, the same DEAL_ID can be repeated on more than 1 line with a different CIE_ID: Deal_ID Date CIE_ID 1 01/01/1995 77 1 01/01/1995 22 1 01/01/1995 33 1 01/01/1995 44 1 01/01/1995 55 2 01/01/1998 77 2 01/01/1998 22 2 01/01/1998 17 2 01/01/1998 3 3 01/01/1999 77 3 01/01/1999 22 3 01/01/1999 33 3 01/01/1999 4 For a deal on a given date, I want to count the number of past deals between the cies involved in the deal. I want past deals in the previous 5-year period. For example, for DEAL_ID = 3, I need to obtain the following: Deal_ID CIE_ID CIE_ID2 Count 3 77 22 2 3 77 33 1 3 77 4 0 3 22 33 1 3 22 4 0 .... At the time of DEAL_ID = 3, Cies 77 & 22 were in 2 previous deals together over the past 5 years, #77 & 33 were in 1 previous deal together and 77 & 4 were in 0 past deal. I tried using the following sql command but it uses too much disk space! Proc sql; Create table count_past as select distinct a.PackageID, a.CIE_ID, b.CIE_ID as CIE_ID2, (select (count(distinct b.PackageID)) from table1 b where a.CIE_ID <> CIE_ID and b.PackageID in (select b.PackageID from table1 b where b.CIE_ID = a.CIE_ID) and 0 < intck('DAY', b.Date, a.Date)<= 1825 ) from table1 a, table1 b where a.CIE_ID <> b.CIE_ID group by a.PackageID, a.CIE_ID, b.CIE_ID; quit; Any ideas on how to make it more efficient? Thanks!
|
Pages: 1 Prev: STATEMENT AND SYNTAX for robust standard errors in Zero Next: Multicollinearity problem |