From: Claudia Champagne on
Hi everyone,

I'm trying to use a subquery to count past observations common to variables.
However, I must be doing something wrong because I run out of disk space
even though I have over 60 Go available!

I have 3 variables in table1 (and > 1 million lines): DEAL_ID, DATE and
CIE_ID. Since the same deal can involve more than 1 cie, the same DEAL_ID
can be repeated on more than 1 line with a different CIE_ID:

Deal_ID Date CIE_ID
1 01/01/1995 77
1 01/01/1995 22
1 01/01/1995 33
1 01/01/1995 44
1 01/01/1995 55
2 01/01/1998 77
2 01/01/1998 22
2 01/01/1998 17
2 01/01/1998 3
3 01/01/1999 77
3 01/01/1999 22
3 01/01/1999 33
3 01/01/1999 4


For a deal on a given date, I want to count the number of past deals between
the cies involved in the deal. I want past deals in the previous 5-year
period. For example, for DEAL_ID = 3, I need to obtain the following:

Deal_ID CIE_ID CIE_ID2 Count
3 77 22 2
3 77 33 1
3 77 4 0
3 22 33 1
3 22 4 0
....

At the time of DEAL_ID = 3, Cies 77 & 22 were in 2 previous deals together
over the past 5 years, #77 & 33 were in 1 previous deal together and 77 & 4
were in 0 past deal.

I tried using the following sql command but it uses too much disk space!

Proc sql;
Create table count_past as
select distinct a.PackageID, a.CIE_ID, b.CIE_ID as CIE_ID2,
(select (count(distinct b.PackageID)) from table1 b where
a.CIE_ID <> CIE_ID and b.PackageID in (select b.PackageID from
table1 b where b.CIE_ID = a.CIE_ID) and
0 < intck('DAY', b.Date, a.Date)<= 1825 )
from table1 a, table1 b
where a.CIE_ID <> b.CIE_ID
group by a.PackageID, a.CIE_ID, b.CIE_ID;
quit;


Any ideas on how to make it more efficient?

Thanks!