From: tshad on 7 Apr 2010 18:50

"Gert-Jan Strik" <sorrytoomuchspamalready(a)xs4all.nl> wrote in message
news:4BBCCF6A.2C108395(a)xs4all.nl...
> tshad wrote:
>>
>> We have some Select statements that have TOP 10000 in them, and they
>> used to have ORDER BY clauses in them, but we took those out.
>>
>> We are also taking the TOP 10000 out as well.
>>
>> I assume this doesn't really affect performance (with or without the
>> clause)? It will never hit 10000.
>>
>> Thanks,
>>
>> Tom
>
> Tom,
>
> Removing the ORDER BY can improve performance. Obviously, the meaning
> changes dramatically if there are more rows than the TOP clause
> specifies.
>
> Removing the TOP 10000 clause does not affect performance if there are
> fewer than 10000 qualifying rows and the optimizer successfully
> estimates (based on the table statistics) that there are fewer than
> 10000 rows.
>
> When I look at your post called "inner join on itself", I am not
> entirely sure that the optimizer will be able to estimate the number of
> resulting rows accurately. You would have to check the row estimates in
> the query plan to verify this.
>

That may not be a problem, as the TOP is going to be taken out and the old
code with the extra inner join is going to be changed to one of the
following:

UPDATE TableA
SET TableA.Description = PagedResults.Description
FROM (
    SELECT B.Description,
           ROW_NUMBER() OVER (ORDER BY B.ID) AS ResultSetRowNumber
    FROM ##TableGT B
    INNER JOIN vwCustomer vw
        ON vw.ExternalID = B.ID
       AND vw.CatID = B.ID
    WHERE B.session = @session
      AND B.batchID = @batchTimeStamp
) AS PagedResults
WHERE ResultSetRowNumber BETWEEN @StartRow AND @EndRow

or

UPDATE TableA
SET TableA.Description = B.Description
FROM ##TableGT B
INNER JOIN vwCustomer vw
    ON vw.ExternalID = B.ID
   AND vw.CatID = B.ID
WHERE B.session = @session
  AND B.batchID = @batchTimeStamp
  AND B.ID BETWEEN @StartRow AND @EndRow

(Note: the join originally read "vwAView.ExternalID", but the only alias
declared is "vw", so I've written it as vw.ExternalID here.)

The global table has an identity column (B.ID), but it is not set as a
primary key.
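Since B.ID is not a primary key (and presumably has no index at all), the
range filter on B.ID in the second form may scan the whole global temp
table on every 30-row batch. A possible fix, just a sketch assuming the
##TableGT column names shown above (the index name is made up), would be to
index the filter columns once after loading the table:

-- Sketch only: assumes the ##TableGT columns shown above.
-- With an index covering the filter columns, each batch can seek
-- directly to its ID range instead of scanning the whole table.
CREATE NONCLUSTERED INDEX IX_TableGT_Session_Batch_ID
    ON ##TableGT ([session], batchID, ID);

The one-time cost of building the index is usually small compared to
repeating a full scan for every batch.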
The program runs this in batches of 30. Apparently this was done because
the program would sometimes time out before finishing.

I thought the second one might be faster, since in both cases I am getting
the same data, and I assume setting up the ROW_NUMBER() OVER () would have
some overhead.

I also realized that the ID could start at 1000 if another session had been
using the table, so in that case the query would be executed about 33 times
with no results before it found any rows. Each time it executes, it would
do all the joins before it got to the BETWEEN test. So even though it found
nothing 33 times, it would still do all the processing and be very slow.

Thanks,

Tom

> --
> Gert-Jan
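One way to avoid those 33 empty batches would be to anchor the ID window to
the session's lowest ID instead of starting at 1. A sketch, assuming the
table and parameter names from the post (@FirstID is a new helper variable,
not in the original code):

-- Sketch only: offset the batch window by the session's lowest ID,
-- so batching starts at the first row that actually exists.
DECLARE @FirstID int;  -- assumed helper variable, not in the original code

SELECT @FirstID = MIN(ID)
FROM ##TableGT
WHERE [session] = @session
  AND batchID = @batchTimeStamp;

UPDATE TableA
SET TableA.Description = B.Description
FROM ##TableGT B
INNER JOIN vwCustomer vw
    ON vw.ExternalID = B.ID
   AND vw.CatID = B.ID
WHERE B.session = @session
  AND B.batchID = @batchTimeStamp
  AND B.ID BETWEEN @FirstID + @StartRow - 1
               AND @FirstID + @EndRow - 1;

With the window offset this way, the first batch (@StartRow = 1) lands on
the first qualifying row regardless of where the identity values happen to
start for this session.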