Prev: Parameter Sniffing
Next: Best Practice Error Checking
From: Tom Cooper on 27 Apr 2010 21:20 For very large amounts of data, I thinkk the cursor solution is likely to be the fastest in this case. For smaller amounts on data, there is a set based solution that is as fast or faster and is simpler and easier to understand (at least to me) than the cursor. The set based solution I would use is: Select Case When a.period_start < b.period_start Then b.period_start Else a.period_start End As period_start, Case When a.period_end < b.period_end Then a.period_end Else b.period_end End As period_end From @A a Cross Join @B b Where (a.period_start <= b.period_start And a.period_end >= b.period_start) Or (b.period_start <= a.period_start And b.period_end >= a.period_start); My testing indicated the following (note that the performance tests were very unsophisticated and not carefully designed) # rows in @A # rows in @B milliseconds for cursor milliseconds for set based 3 2 4 < 1 30 20 6 < 1 300 200 20 20 3,000 2,000 280 1,697 30,000 20,000 2,083 170,877 So the increase in the time the cursor takes is close to linear as the number of rows increases, but the set based solution increases close to the square of the number of rows. However, I like the simplicity (and therefore maintainability) of the set based solution and would use it if I expected the number of rows to be very small and/or this process is done rarely and the number of rows was small enough to make the response time acceptable - for example if it runs once a week in the middle of the night, I may not care that it takes 3 minutes but might well care if it took 6 hours. But if I expected a large number of rows and needed the performance, then I would use the cursor. Tom "Fred" <foleide(a)free.fr.invalid> wrote in message news:%231KNZZg5KHA.5952(a)TK2MSFTNGP04.phx.gbl... > > > "Mark" <markc600(a)hotmail.com> a �crit dans le message de groupe de > discussion : > 19b5cc0a-309c-4004-92f2-4636338eb961(a)q15g2000yqj.googlegroups.com... >> Try this >> >> WITH Starts AS ( >> SELECT a.period_start >> FROM @A a >> WHERE EXISTS (SELECT * FROM @B b WHERE a.period_start BETWEEN >> b.period_start AND b.period_end) >> >> UNION ALL >> >> SELECT b.period_start >> FROM @B b >> WHERE EXISTS (SELECT * FROM @A a WHERE b.period_start BETWEEN >> a.period_start AND a.period_end) >> ), >> Ends AS ( >> SELECT a.period_end >> FROM @A a >> WHERE EXISTS (SELECT * FROM @B b WHERE a.period_end BETWEEN >> b.period_start AND b.period_end) >> >> UNION ALL >> >> SELECT b.period_end >> FROM @B b >> WHERE EXISTS (SELECT * FROM @A a WHERE b.period_end BETWEEN >> a.period_start AND a.period_end) >> ) >> SELECT s.period_start, >> MIN(e.period_end) AS period_end >> FROM Starts s >> INNER JOIN Ends e ON e.period_end>=s.period_start >> GROUP BY s.period_start; > > Thanks Mark, > > I didn't think to use CTE for this. > It seems I can't use it inside an inline table function (?) > So I used a multi-statement function as before. > I tried your code on production data but it is much slower than the > version with a cursor. > (3600 ms vs 150 ms for more than 2000 periods in each set). > Do you think I can reduce the time with appropriate declaration of the > table type ? > Right now, I have just declared the two columns as primary key so I guess > the joins involving the second columns are not so easy to optimize for the > query analyzer ? > > -- > Fred > foleide(a)free.fr
From: Fred on 28 Apr 2010 01:34
"Tom Cooper" <tomcooper(a)comcast.net> a �crit dans le message de groupe de discussion : OcH4GEn5KHA.620(a)TK2MSFTNGP02.phx.gbl... > For very large amounts of data, I thinkk the cursor solution is likely > to be the fastest in this case. For smaller amounts on data, there is > a set based solution that is as fast or faster and is simpler and > easier to understand (at least to me) than the cursor. Yes, your set based solution is easy to understand (as was also Mark's one). I agree that the cursor based solution is not so easy to understand (and didn't translate the comments). An advantage of the cursor based solution is that it is very easy to transform to consider more than two periods set (no one thought to ask me this for the moment !). And also the function for union of periods set is nearly the same. > However, I like the simplicity (and therefore maintainability) of the > set based solution and would use it if I expected the number of rows > to be very small and/or this process is done rarely and the number of > rows was small enough to make the response time acceptable - for > example if it runs once a week in the middle of the night, I may not > care that it takes 3 minutes but might well care if it took 6 hours. > But if I expected a large number of rows and needed the performance, > then I would use the cursor. Well, the numbers of rows will be always at least about a thousand. And when the function is called, it is to prepare data that are sent immediatly to the user. So I can't, in this case, schedule some night jobs to perform the calculation. I always try to find a good set based solution, but I am afraid I will have to keep my ugly �developer� solution ! Thanks for the time you spent doing some tests Tom. -- Fred foleide(a)free.fr |