Manipulation of date ranges [Programming SQL]

Prev: Parameter Sniffing
Next: Best Practice Error Checking

From: Tom Cooper on 27 Apr 2010 21:20

For very large amounts of data, I thinkk the cursor solution is likely to be
the fastest in this case. For smaller amounts on data, there is a set based
solution that is as fast or faster and is simpler and easier to understand
(at least to me) than the cursor. The set based solution I would use is:

Select Case When a.period_start < b.period_start Then b.period_start Else
a.period_start End As period_start,
Case When a.period_end < b.period_end Then a.period_end Else b.period_end
End As period_end
From @A a
Cross Join @B b
Where (a.period_start <= b.period_start And a.period_end >= b.period_start)
Or (b.period_start <= a.period_start And b.period_end >= a.period_start);

My testing indicated the following (note that the performance tests were
very unsophisticated and not carefully designed)

# rows in @A # rows in @B milliseconds for cursor milliseconds for set
based
3 2 4
< 1
30 20 6
< 1
300 200 20 20
3,000 2,000 280 1,697
30,000 20,000 2,083 170,877

So the increase in the time the cursor takes is close to linear as the
number of rows increases, but the set based solution increases close to the
square of the number of rows.

However, I like the simplicity (and therefore maintainability) of the set
based solution and would use it if I expected the number of rows to be very
small and/or this process is done rarely and the number of rows was small
enough to make the response time acceptable - for example if it runs once a
week in the middle of the night, I may not care that it takes 3 minutes but
might well care if it took 6 hours. But if I expected a large number of
rows and needed the performance, then I would use the cursor.

Tom

"Fred" <foleide(a)free.fr.invalid> wrote in message
news:%231KNZZg5KHA.5952(a)TK2MSFTNGP04.phx.gbl...
>
>
> "Mark" <markc600(a)hotmail.com> a �crit dans le message de groupe de
> discussion :
> 19b5cc0a-309c-4004-92f2-4636338eb961(a)q15g2000yqj.googlegroups.com...
>> Try this
>>
>> WITH Starts AS (
>> SELECT a.period_start
>> FROM @A a
>> WHERE EXISTS (SELECT * FROM @B b WHERE a.period_start BETWEEN
>> b.period_start AND b.period_end)
>>
>> UNION ALL
>>
>> SELECT b.period_start
>> FROM @B b
>> WHERE EXISTS (SELECT * FROM @A a WHERE b.period_start BETWEEN
>> a.period_start AND a.period_end)
>> ),
>> Ends AS (
>> SELECT a.period_end
>> FROM @A a
>> WHERE EXISTS (SELECT * FROM @B b WHERE a.period_end BETWEEN
>> b.period_start AND b.period_end)
>>
>> UNION ALL
>>
>> SELECT b.period_end
>> FROM @B b
>> WHERE EXISTS (SELECT * FROM @A a WHERE b.period_end BETWEEN
>> a.period_start AND a.period_end)
>> )
>> SELECT s.period_start,
>> MIN(e.period_end) AS period_end
>> FROM Starts s
>> INNER JOIN Ends e ON e.period_end>=s.period_start
>> GROUP BY s.period_start;
>
> Thanks Mark,
>
> I didn't think to use CTE for this.
> It seems I can't use it inside an inline table function (?)
> So I used a multi-statement function as before.
> I tried your code on production data but it is much slower than the
> version with a cursor.
> (3600 ms vs 150 ms for more than 2000 periods in each set).
> Do you think I can reduce the time with appropriate declaration of the
> table type ?
> Right now, I have just declared the two columns as primary key so I guess
> the joins involving the second columns are not so easy to optimize for the
> query analyzer ?
>
> --
> Fred
> foleide(a)free.fr

From: Fred on 28 Apr 2010 01:34

"Tom Cooper" <tomcooper(a)comcast.net> a �crit dans le message de groupe
de discussion : OcH4GEn5KHA.620(a)TK2MSFTNGP02.phx.gbl...
> For very large amounts of data, I thinkk the cursor solution is likely
> to be the fastest in this case. For smaller amounts on data, there is
> a set based solution that is as fast or faster and is simpler and
> easier to understand (at least to me) than the cursor.

Yes, your set based solution is easy to understand (as was also Mark's
one).
I agree that the cursor based solution is not so easy to understand (and
didn't translate the comments).
An advantage of the cursor based solution is that it is very easy to
transform to consider more than two periods set (no one thought to ask
me this for the moment !).
And also the function for union of periods set is nearly the same.

> However, I like the simplicity (and therefore maintainability) of the
> set based solution and would use it if I expected the number of rows
> to be very small and/or this process is done rarely and the number of
> rows was small enough to make the response time acceptable - for
> example if it runs once a week in the middle of the night, I may not
> care that it takes 3 minutes but might well care if it took 6 hours.
> But if I expected a large number of rows and needed the performance,
> then I would use the cursor.

Well, the numbers of rows will be always at least about a thousand.
And when the function is called, it is to prepare data that are sent
immediatly to the user. So I can't, in this case, schedule some night
jobs to perform the calculation.
I always try to find a good set based solution, but I am afraid I will
have to keep my ugly �developer� solution !

Thanks for the time you spent doing some tests Tom.

--
Fred
foleide(a)free.fr

First | Prev |
Pages: 1 2 3
Prev: Parameter Sniffing
Next: Best Practice Error Checking