Opposite to INNER JOIN [Programming SQL]

Prev: CTE with multiple anchors
Next: Q;re Descriptive stats

From: Just D. on 5 Mar 2010 00:47

There is a simple task - to return the records from table1 where the key
table1.key1 doesn't exist in the table2. The first idea was to write it as:

SELECT * FROM table1 WHERE NOT table1.key1 IN (SELECT table2.key2 FROM
table2)

Simple, but it's way too inefficient for sure. Is there some trick that I'm
missing, something like an opposite to INNER JOIN? I doubt it exists, but
who knows...

There were two other ideas - to use a status column and change the status
when the record was sent to the table2, using this column in the query, but
for some reason my boss doesn'tlike this idea, explaining that we'll need to
restructure the whole table, change the related logic, table is huge, any
database structure corrections are painful, since the database is around 300
GBytes now and keeps growing, etc. I tried to explain that scannign the same
table again and again is not the best way, but... Adding another table where
we could track this status was also aworking idea, but it was put on hold
for a while.

Any other ideas?

Just D.

From: Erland Sommarskog on 5 Mar 2010 03:15

Just D. (no(a)spam.please) writes:
> There is a simple task - to return the records from table1 where the key
> table1.key1 doesn't exist in the table2. The first idea was to write it
> as:
>
> SELECT * FROM table1 WHERE NOT table1.key1 IN (SELECT table2.key2 FROM
> table2)
>
> Simple, but it's way too inefficient for sure.

It may be, but it could also be very efficient, all depending on keys,
indexes and cardinalities.

Some people like to use a left join:

SELECT ...
FROM table1
LEFT JOIN table2 ON table1.key1 = table2.key2
WHERE table.key2 IS NULL

Sometimes this may give a better plan. It may also give exactly the same
plan.

Personally I prefer:

SELECFT * FROM table1
WHERE NOT EXISTS (SELECT * FROM table2 WHERE table1.key = table2.key)

This usually gives the same plan as NOT IN, but there is a trap with NULL
values if you use NOT IN, that you cannot walk into with NOT EXISTS.

> There were two other ideas - to use a status column and change the
> status when the record was sent to the table2, using this column in the
> query, but for some reason my boss doesn'tlike this idea, explaining
> that we'll need to restructure the whole table, change the related
> logic, table is huge, any database structure corrections are painful,
> since the database is around 300 GBytes now and keeps growing, etc. I
> tried to explain that scannign the same table again and again is not the
> best way, but... Adding another table where we could track this status
> was also aworking idea, but it was put on hold for a while.

Now you are venting thoughts about the underlying business problem, and
maybe it offers some better ideas, but you would need to give more
information than just glimpses, if you want some input from this angle.

--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

From: Tom on 5 Mar 2010 08:37

select *
from table1
left join table2 on table1.key on table2.key
where table2.key is null

Just D. wrote:
> There is a simple task - to return the records from table1 where the key
> table1.key1 doesn't exist in the table2. The first idea was to write it as:
>
> SELECT * FROM table1 WHERE NOT table1.key1 IN (SELECT table2.key2 FROM
> table2)
>
> Simple, but it's way too inefficient for sure. Is there some trick that I'm
> missing, something like an opposite to INNER JOIN? I doubt it exists, but
> who knows...
>
> There were two other ideas - to use a status column and change the status
> when the record was sent to the table2, using this column in the query, but
> for some reason my boss doesn'tlike this idea, explaining that we'll need to
> restructure the whole table, change the related logic, table is huge, any
> database structure corrections are painful, since the database is around 300
> GBytes now and keeps growing, etc. I tried to explain that scannign the same
> table again and again is not the best way, but... Adding another table where
> we could track this status was also aworking idea, but it was put on hold
> for a while.
>
> Any other ideas?
>
> Just D.

From: Tom on 5 Mar 2010 08:46

Should be

select *
from table1
left join table2 on table1.key = table2.key
where table2.key is null

Tom wrote:
> select *
> from table1
> left join table2 on table1.key on table2.key
> where table2.key is null
>
> Just D. wrote:
> > There is a simple task - to return the records from table1 where the key
> > table1.key1 doesn't exist in the table2. The first idea was to write it as:
> >
> > SELECT * FROM table1 WHERE NOT table1.key1 IN (SELECT table2.key2 FROM
> > table2)
> >
> > Simple, but it's way too inefficient for sure. Is there some trick that I'm
> > missing, something like an opposite to INNER JOIN? I doubt it exists, but
> > who knows...
> >
> > There were two other ideas - to use a status column and change the status
> > when the record was sent to the table2, using this column in the query, but
> > for some reason my boss doesn'tlike this idea, explaining that we'll need to
> > restructure the whole table, change the related logic, table is huge, any
> > database structure corrections are painful, since the database is around 300
> > GBytes now and keeps growing, etc. I tried to explain that scannign the same
> > table again and again is not the best way, but... Adding another table where
> > we could track this status was also aworking idea, but it was put on hold
> > for a while.
> >
> > Any other ideas?
> >
> > Just D.

From: Just D. on 5 Mar 2010 16:04

Erland,

Thanks so much! That was very helpful!

Just D.

"Erland Sommarskog"
>> There is a simple task - to return the records from table1 where the key
>> table1.key1 doesn't exist in the table2. The first idea was to write it
>> as:
>>
>> SELECT * FROM table1 WHERE NOT table1.key1 IN (SELECT table2.key2 FROM
>> table2)
>>
>> Simple, but it's way too inefficient for sure.
>
> It may be, but it could also be very efficient, all depending on keys,
> indexes and cardinalities.
>
> Some people like to use a left join:
>
> SELECT ...
> FROM table1
> LEFT JOIN table2 ON table1.key1 = table2.key2
> WHERE table.key2 IS NULL
>
> Sometimes this may give a better plan. It may also give exactly the same
> plan.
>
> Personally I prefer:
>
> SELECFT * FROM table1
> WHERE NOT EXISTS (SELECT * FROM table2 WHERE table1.key = table2.key)
>
> This usually gives the same plan as NOT IN, but there is a trap with NULL
> values if you use NOT IN, that you cannot walk into with NOT EXISTS.
>
>
>> There were two other ideas - to use a status column and change the
>> status when the record was sent to the table2, using this column in the
>> query, but for some reason my boss doesn'tlike this idea, explaining
>> that we'll need to restructure the whole table, change the related
>> logic, table is huge, any database structure corrections are painful,
>> since the database is around 300 GBytes now and keeps growing, etc. I
>> tried to explain that scannign the same table again and again is not the
>> best way, but... Adding another table where we could track this status
>> was also aworking idea, but it was put on hold for a while.
>
> Now you are venting thoughts about the underlying business problem, and
> maybe it offers some better ideas, but you would need to give more
> information than just glimpses, if you want some input from this angle.
>
> --
> Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

| Next | Last
Pages: 1 2
Prev: CTE with multiple anchors
Next: Q;re Descriptive stats