From: John Bell on 6 Apr 2010 02:22 On Mon, 5 Apr 2010 13:41:25 -0700, "tshad" <tfs(a)dslextreme.com> wrote: > >"John Bell" <jbellnewsposts(a)hotmail.com> wrote in message >news:umdkr5dpvf50ft73sq09frfodtgldv7b78(a)4ax.com... >> On Mon, 5 Apr 2010 11:32:29 -0700, "tshad" <tfs(a)dslextreme.com> wrote: >> >>>I am looking at changing about 10 procedures that use a Permanent table >>>for >>>nothing more than temporary storage to load and store various records from >>>a >>>select statement and run a Join on this table to update other records. >>> >>>It then deletes all the records it put into the table. The table is used >>>by >>>multiple procedures at the same time keyed by a Guid it always gets at the >>>start of the routines. >>> >>>There are about 4 indexes on this table. >>> >>>There can be any from 1 to 1000 records involved in this. >>> >>>I was thinking about dropping the table and just using a temp table. When >>>the table is created I would then have to create 3 indexes - would drop >>>one >>>index (the Guid). >>> >>>Then the table would go away when I exit the procedure. I don't think >>>there >>>is any necessity to drop the table (and indexes) as it would go away >>>anyway. >>> >>>But now I don't have to delete the records at the end or create the Guid >>>at >>>the beginning. >>> >>>Anything I am missing here? >>> >>>Thanks, >>> >>>Tom >>> >> Hi Tom >> >> Creating many temp tables can create a bottleneck on tempdb See >> http://support.microsoft.com/kb/328551 >> >> Have you looked at whether you need a table to hold this? >> > >As opposed to what? A derived table as Celko suggests? > >At the moment, it is a permanent table that 10 or 12 functions use for about >20 seconds then it deletes all the records it created. > >Not sure if I need a table or not. Looking at the options. > >One problem we had is that in some cases the rows didn't get deleted as they >should have so the deletes take longer than needed. 
> >I could just leave the table as is, delete all the current rows and fix the >procedures that are not deleting the records (which I would do anyway) and >leave the code as is. > >I was looking at a temporary table because there are 4 indexes on the table, >but that may not be necessary if the current records are deleted. Not sure >on that because many people could access the table at the same time and put >a large amount of rows in the table at once in which case it may be helpful. >But then again, the overhead of the inserts and deletes because of the >indexes may outweigh the indexes. > >The reason I was even looking at a temp table was that it would alleviate >some of these issues. > >Thanks, > >Tom >> John > Any actual solution would be a guess as we haven't seen the code, but it sounds to me like this is probably one of those common cases where the solution is not set based. If you are looking at 1000 rows then adding an index/statistics on a table may be more expensive than not having the index. Because the permanent table is volatile, statistics could be stale. But without code that is all speculation. John
From: tshad on 6 Apr 2010 14:43 "John Bell" <jbellnewsposts(a)hotmail.com> wrote in message news:4iklr59p3nshs8tn2ure0dble3o1kl3kep(a)4ax.com... > On Mon, 5 Apr 2010 13:41:25 -0700, "tshad" <tfs(a)dslextreme.com> wrote: > >> >>"John Bell" <jbellnewsposts(a)hotmail.com> wrote in message >>news:umdkr5dpvf50ft73sq09frfodtgldv7b78(a)4ax.com... >>> On Mon, 5 Apr 2010 11:32:29 -0700, "tshad" <tfs(a)dslextreme.com> wrote: >>> >>>>I am looking at changing about 10 procedures that use a Permanent table >>>>for >>>>nothing more than temporary storage to load and store various records >>>>from >>>>a >>>>select statement and run a Join on this table to update other records. >>>> >>>>It then deletes all the records it put into the table. The table is >>>>used >>>>by >>>>multiple procedures at the same time keyed by a Guid it always gets at >>>>the >>>>start of the routines. >>>> >>>>There are about 4 indexes on this table. >>>> >>>>There can be any from 1 to 1000 records involved in this. >>>> >>>>I was thinking about dropping the table and just using a temp table. >>>>When >>>>the table is created I would then have to create 3 indexes - would drop >>>>one >>>>index (the Guid). >>>> >>>>Then the table would go away when I exit the procedure. I don't think >>>>there >>>>is any necessity to drop the table (and indexes) as it would go away >>>>anyway. >>>> >>>>But now I don't have to delete the records at the end or create the Guid >>>>at >>>>the beginning. >>>> >>>>Anything I am missing here? >>>> >>>>Thanks, >>>> >>>>Tom >>>> >>> Hi Tom >>> >>> Creating many temp tables can create a bottleneck on tempdb See >>> http://support.microsoft.com/kb/328551 >>> >>> Have you looked at whether you need a table to hold this? >>> >> >>As opposed to what? A derived table as Celko suggests? >> >>At the moment, it is a permanent table that 10 or 12 functions use for >>about >>20 seconds then it deletes all the records it created. >> >>Not sure if I need a table or not. Looking at the options. 
>> >>One problem we had is that in some cases the rows didn't get deleted as >>they >>should have so the deletes take longer than needed. >> >>I could just leave the table as is, delete all the current rows and fix >>the >>procedures that are not deleting the records (which I would do anyway) and >>leave the code as is. >> >>I was looking at a temporary table because there are 4 indexes on the >>table, >>but that may not be necessary if the current records are deleted. Not >>sure >>on that because many people could access the table at the same time and >>put >>a large amount of rows in the table at once in which case it may be >>helpful. >>But then again, the overhead of the inserts and deletes because of the >>indexes may outweigh the indexes. >> >>The reason I was even looking at a temp table was that it would alleviate >>some of these issues. >> >>Thanks, >> >>Tom >>> John >> > > Any actual solution would be a guess as we haven't seen the code, but > it sounds to me like this is probably one of those common cases where > the solution is not set based. > It is all set based here. 1) Insert a set of rows that is a subset of another table (or two) into the table. 2) Use the table, usually to update another table or insert records into it using this table in a join. 3) Delete the records that were just added into the table (if you were the only one using the table at the moment - it would then be empty). > If you are looking at 1000 rows then adding an index/statistics on a > table may be more expensive than not having the index. Because the > permanent table is volatile, statistics could be stale. > So then the Temp table would be better - based on what Tony said that statistics are created on the fly as rows are added. > But without code that is all speculation. > True. But the code is different in each case. The table has 3 columns (GUID, ParentID, ChildID). One Guid per session.
If I were to use the table and add 100 rows, I would get a new GUID and all the 100 rows would use that Guid. When done I do a delete using that GUID. DELETE table WHERE GUID = @GUID Tom > John
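The pattern Tom describes (a shared permanent table keyed by a per-session GUID) might look roughly like this. This is only a sketch: the source and target table names, columns other than (GUID, ParentID, ChildID), and the filter are all hypothetical, since the actual code was never posted.

```sql
-- Hypothetical names throughout; only (GUID, ParentID, ChildID) comes from the thread.
DECLARE @GUID uniqueidentifier;
SET @GUID = NEWID();  -- one GUID per session, obtained at the start of the routine

-- 1) Stage a subset of rows from other tables, tagged with this session's GUID
INSERT INTO ScratchTable (GUID, ParentID, ChildID)
SELECT @GUID, p.ParentID, c.ChildID
FROM Parent AS p
JOIN Child AS c ON c.ParentID = p.ParentID
WHERE p.SomeFilter = 1;

-- 2) Use the staged rows in a join to update another table
UPDATE t
SET t.SomeColumn = s.ChildID
FROM TargetTable AS t
JOIN ScratchTable AS s ON s.ParentID = t.ParentID
WHERE s.GUID = @GUID;

-- 3) Remove only this session's rows; other sessions' GUIDs are untouched
DELETE ScratchTable WHERE GUID = @GUID;
```

If step 3 is skipped (as Tom reports happened in some procedures), orphaned rows accumulate and every later delete and join pays for them.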
From: tshad on 6 Apr 2010 14:45 "Tony Rogerson" <tonyrogerson(a)torver.net> wrote in message news:5E21021A-E28E-480F-87D1-6EC16FF7FFA3(a)microsoft.com... >> In this case Tony, I think that this is an unnecessarily harsh response: > > Given --celko--'s attitude over the past decade on this forum any harsh > response is reasonable - in fact, justified. > >> and records, his advice to the OP is sound: for a tiny table (up to 1000 >> rows), a temp table or a table variable will not only perform as well or >> better, but will also be much easier to maintain than a 'queue' table. >> It should perform even better if it is possible to >> find these rows from a CTE rather than realizing them at all, but the OP >> hasn't given enough information to decide if that is possible. >> > > That's not in my experience; for the table expression - derived table or > CTE the optimiser hasn't a real idea in most cases (given real world SQL) > how many rows will be returned on the intermediary steps, so you get a > general plan. > > Table expressions are literally inline macros, the SQL is expanded into > the main query - there are no encapsulation benefits in using table > expressions because of this expansion. > > # tables have statistics - it is my experience - and I've a hell of a lot > of it in different industries, situations, scale and schema scenarios that > a # table is 80% of the time the right choice because of those statistics. > CPU is cheap so a plan compile costs milliseconds - probably not even > measurable, however getting the plan wrong can cost significant amounts of > CPU and disk IO because of the additional amounts of data having to be > processed. > >> In my experience, the best performance improvements come from changing >> applications not to _need_ databases to do things that they aren't good >> at; and the second best from fixing schema.
The worst 'improvements' >> come from doing things like avoiding the table creation overhead by using >> a persistent table for my temp storage - something surprising to many >> programmers who are used to CPU bound tasks that _always_ run faster if I >> do less work here ;) > > Using a permanent table to act as a temporary table just causes blocking, > fragmentation and additional logging because people usually locate it in > an application database rather than tempdb. > In another response, John Bell mentioned that the problem with temp tables, which is what I was thinking of doing, was that it would cause a bottleneck in TempDB. If that were the case, why use it? In my case, not everyone would be using the table at once, typically. But there could be multiple users hitting the table at once. But you would have the same issue with the static table, where everyone hits the table at once. Tom > In my experience the best performance improvements come from teaching > people to think in Sets; often people think the database isn't the right > tool for the job because they don't understand Relational Theory so just > poo poo the relational database. > > Tony. > > > "m" <m(a)b.c> wrote in message news:uTLCuyR1KHA.224(a)TK2MSFTNGP06.phx.gbl... >> In this case Tony, I think that this is an unnecessarily harsh response: >> While Celko is reminding us all _again_ about the difference between rows >> and records, his advice to the OP is sound: for a tiny table (up to 1000 >> rows), a temp table or a table variable will not only perform as well or >> better, but will also be much easier to maintain than a 'queue' table. >> It should perform even better if it is possible to >> find these rows from a CTE rather than realizing them at all, but the OP >> hasn't given enough information to decide if that is possible.
>> >> In my experience, the best performance improvements come from changing >> applications not to _need_ databases to do things that they aren't good >> at; and the second best from fixing schema. The worst 'improvements' >> come from doing things like avoiding the table creation overhead by using >> a persistent table for my temp storage - something surprising to many >> programmers who are used to CPU bound tasks that _always_ run faster if I >> do less work here ;) >> >> >> "Tony Rogerson" <tonyrogerson(a)torver.net> wrote in message >> news:#Jsd7rP1KHA.264(a)TK2MSFTNGP05.phx.gbl... >>>> For no more than 1000 rows (not records), why not re-write the code to >>>> use a derived table in each procedure? The procedures would become >>>> independent, loosely coupled modules. You will probably have to do >>>> some more work than just that, but it is a start. >>> >>> 1000 rows? From which planet did you pluck that figure from? >>> >>> No statistics are held on table expressions like a derived table, >>> performance can be horrendous. >>> >>> Temporary tables (akin to a relvar) are widely used and rightly too in >>> SQL Server, statistics are held on a temporary table so you get the best >>> plan available for what you are doing. >>> >>> Temporary tables are in fact real tables and conform to Codd's rules. >>> >>> What's your problem other than the syntax we use in SQL Server doesn't >>> explicitly follow ISO SQL? >>> >>> --ROGGIE-- >>> >>> >>> "--CELKO--" <jcelko212(a)earthlink.net> wrote in message >>> news:506ec56a-e829-447c-9896-ee1613b35a6e(a)g30g2000yqc.googlegroups.com... >>>> In the 1960's we would write records (not rows) to a scratch tape and >>>> share that tape among several procedures. It saved us the trouble of >>>> dismounting the tape. One of the fundamental principles of software >>>> engineering is that loosely coupled modules are better than what you >>>> have. 
Look up "data coupling", "functional coupling" and "procedural >>>> coupling" in any book on software engineering. You seem to have one or >>>> more of these design problems. >>>> >>>> For no more than 1000 rows (not records), why not re-write the code to >>>> use a derived table in each procedure? The procedures would become >>>> independent, loosely coupled modules. You will probably have to do >>>> some more work than just that, but it is a start. >>>
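Celko's derived-table suggestion, applied to the pattern in this thread, would inline the staging step into each procedure so no shared table (and no GUID keying) is needed at all. A hedged sketch, with all table and column names hypothetical:

```sql
-- Hypothetical schema; shows the shape of the derived-table rewrite,
-- not the OP's actual query. The subset that used to be staged in the
-- shared table becomes an inline derived table d.
UPDATE t
SET t.SomeColumn = d.ChildID
FROM TargetTable AS t
JOIN (SELECT p.ParentID, c.ChildID
      FROM Parent AS p
      JOIN Child AS c ON c.ParentID = p.ParentID
      WHERE p.SomeFilter = 1) AS d
  ON d.ParentID = t.ParentID;
```

As Tony Rogerson notes later in the thread, the trade-off is that a derived table is expanded inline and carries no statistics of its own, so the optimizer has to estimate its row counts rather than sampling a materialized table.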
From: tshad on 6 Apr 2010 14:48 "Tibor Karaszi" <tibor_please.no.email_karaszi(a)hotmail.nomail.com> wrote in message news:OVZvliU1KHA.4204(a)TK2MSFTNGP04.phx.gbl... > You can know for sure by executing the proc while catching recompile > events in Profiler. Then you will know if *your* proc causes re-compiles. > Note that as of 2005, there are both recompile events at SQL: level and > RPC: level. > It's just confusing when in one case it says it will recompile and in another it says (or seems to say) that it won't. Tony says it will recompile the statement each time it sees the #Temp. Thanks, Tom > -- > Tibor Karaszi, SQL Server MVP > http://www.karaszi.com/sqlserver/default.asp > http://sqlblog.com/blogs/tibor_karaszi > > > > "tshad" <tfs(a)dslextreme.com> wrote in message > news:#1w32SQ1KHA.348(a)TK2MSFTNGP02.phx.gbl... >> >> "tshad" <tfs(a)dslextreme.com> wrote in message >> news:OiSCsHQ1KHA.224(a)TK2MSFTNGP06.phx.gbl... >>> >>> "Tony Rogerson" <tonyrogerson(a)torver.net> wrote in message >>> news:%231qj$tP1KHA.5828(a)TK2MSFTNGP02.phx.gbl... >>>> Hi Tom, >>>> >>>> You'll no doubt be getting really bad blocking from the deletes; you'll >>>> be >>>> getting poor performance because of all the logging that needs to >>>> happen >>>> because of the way you are doing it. >>>> >>> Good point. Hadn't thought about the logging issue. >>> >>>> The problem with # tables is that they cause statement recompiles, but >>>> that isn't necessarily a bad thing - how frequently are the stored >>>> procedures called but more importantly are they called by many >>>> concurrent >>>> users (at the exact same time), if so then using # tables will likely >>>> cause RECOMPILE locks. >>> >>> Not sure about that as there are so many procedures that use this table. >>>> >> >> Also, I didn't think Temp Tables caused recompiles. >> >> I was looking at this article and it seems to say yes and no.
>> >> It seems to say if your creates are at the top of the stored procedure, >> the temp tables will be >> **************************************************************************** >> Stored procedures will recompile if the developer has interleaved >> Data Definition Language operations with Data Manipulation Language >> operations. This is usually caused when temporary objects are created and >> referenced throughout the code. The reason this happens is that the >> temporary objects do not exist when the initial compilation of the code >> takes place, so SQL Server has to recompile the stored procedure during >> execution. This recompilation takes place after the temporary object is >> referenced for the first time. >> ***************************************************************************** >> >> By referenced, does he mean at creation time? (I assume this is the case.) >> >> **************************************************************************************** >> By placing all of your temporary table creation statements together, SQL >> Server can create plans for those temporary tables when one of them is >> referenced for the first time. This recompile will still take place >> during the execution of the stored procedure, but you have cut down the >> recompiles from n to two (one for the stored procedure and one when the >> first reference to a temporary table is made). >> **************************************************************************************** >> >> Here it seems to imply that the recompiles will happen each time the >> Stored procedure is run - when the table is referenced. >> >> ************************************************************************************************* >> SQL Server will also be able to reuse the execution plan for the stored >> procedure the next time the procedure is called and your recompiles will >> go to zero.
Remember, like permanent objects, if you change the schema of >> a temporary table, the change will cause the stored procedure to >> recompile as well. Make all schema changes (such as index creation) right >> after your create table statements and before you reference any of the >> temporary tables. If you take the stored procedure created during the >> section on using Profiler and modify it as written below, you will stop >> the unnecessary recompiles. >> ************************************************************************************************ >> >> Here it says it won't recompile. But I assume that is as long as the >> CREATE happens before any DML (SELECT, INSERT, etc.) statement. What about a >> DECLARE statement? I assume it wouldn't since DECLARE is not a DML >> statement. >> >> Thanks, >> >> tom >> >> >> >>>> You might want to consider using table variables but only if you have >>>> multiple concurrent connections executing the same stored procedures. >>>> >>> >>> Maybe, but since there can be around 1000 records in this table, I would >>> be >>> nervous about using a table variable for this. >>> >>> I just refactored a procedure that had a table variable that was taking >>> about 50 seconds to process when I was adding 2400 rows to it. >>> >>> Just by changing it to a temporary table, the time went down to 2 seconds. >>> >>> Thanks, >>> >>> Tom >>> >>>> Hope that helps. >>>> >>>> Tony. >>>> >>>> "tshad" <tfs(a)dslextreme.com> wrote in message >>>> news:eNdIY5O1KHA.6104(a)TK2MSFTNGP06.phx.gbl... >>>>> I am looking at changing about 10 procedures that use a Permanent >>>>> table >>>>> for nothing more than temporary storage to load and store various >>>>> records >>>>> from a select statement and run a Join on this table to update other >>>>> records. >>>>> >>>>> It then deletes all the records it put into the table.
The table is >>>>> used >>>>> by multiple procedures at the same time keyed by a Guid it always gets >>>>> at >>>>> the start of the routines. >>>>> >>>>> There are about 4 indexes on this table. >>>>> >>>>> There can be any from 1 to 1000 records involved in this. >>>>> >>>>> I was thinking about dropping the table and just using a temp table. >>>>> When the table is created I would then have to create 3 indexes - >>>>> would >>>>> drop one index (the Guid). >>>>> >>>>> Then the table would go away when I exit the procedure. I don't think >>>>> there is any necessity to drop the table (and indexes) as it would go >>>>> away anyway. >>>>> >>>>> But now I don't have to delete the records at the end or create the >>>>> Guid >>>>> at the beginning. >>>>> >>>>> Anything I am missing here? >>>>> >>>>> Thanks, >>>>> >>>>> Tom >>>>> >>>>> >>> >>>
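Putting the advice from the quoted article together with the temp-table idea discussed above, a rewritten procedure might look like this. All names are hypothetical; the point is the structure: all DDL (the CREATE TABLE and its indexes) grouped at the top, before any DML touches the temp table, which the article says cuts the recompiles down to a minimum.

```sql
CREATE PROCEDURE dbo.UpdateFromSubset  -- hypothetical name
AS
BEGIN
    -- All DDL up front: the table and its indexes, before any DML
    -- references #Work. Interleaving DDL and DML later in the body
    -- is what forces extra recompiles.
    CREATE TABLE #Work (ParentID int NOT NULL, ChildID int NOT NULL);
    CREATE INDEX IX_Work_Parent ON #Work (ParentID);

    -- Stage the subset (same step 1 as the permanent-table version,
    -- but no GUID column is needed: #Work is private to this session)
    INSERT INTO #Work (ParentID, ChildID)
    SELECT p.ParentID, c.ChildID
    FROM Parent AS p
    JOIN Child AS c ON c.ParentID = p.ParentID
    WHERE p.SomeFilter = 1;

    -- Use the staged rows in a join to update the target table
    UPDATE t
    SET t.SomeColumn = w.ChildID
    FROM TargetTable AS t
    JOIN #Work AS w ON w.ParentID = t.ParentID;

    -- No DELETE and no explicit DROP needed: #Work is dropped
    -- automatically when the procedure's scope ends.
END
```

This also addresses Tom's original question: the GUID column and the end-of-procedure DELETE both disappear, because each session gets its own private copy of the table.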
From: tshad on 6 Apr 2010 14:50
"John Bell" <jbellnewsposts(a)hotmail.com> wrote in message news:umdkr5dpvf50ft73sq09frfodtgldv7b78(a)4ax.com... > On Mon, 5 Apr 2010 11:32:29 -0700, "tshad" <tfs(a)dslextreme.com> wrote: > >>I am looking at changing about 10 procedures that use a Permanent table >>for >>nothing more than temporary storage to load and store various records from >>a >>select statement and run a Join on this table to update other records. >> >>It then deletes all the records it put into the table. The table is used >>by >>multiple procedures at the same time keyed by a Guid it always gets at the >>start of the routines. >> >>There are about 4 indexes on this table. >> >>There can be any from 1 to 1000 records involved in this. >> >>I was thinking about dropping the table and just using a temp table. When >>the table is created I would then have to create 3 indexes - would drop >>one >>index (the Guid). >> >>Then the table would go away when I exit the procedure. I don't think >>there >>is any necessity to drop the table (and indexes) as it would go away >>anyway. >> >>But now I don't have to delete the records at the end or create the Guid >>at >>the beginning. >> >>Anything I am missing here? >> >>Thanks, >> >>Tom >> > Hi Tom > > Creating many temp tables can create a bottleneck on tempdb See > http://support.microsoft.com/kb/328551 > Also, I noticed that this mainly seems to be talking about SQL 7 and a HotFix to that database. Is that still the case for 2000, 2005 and 2008? Thanks, Tom > Have you looked at whether you need a table to hold this? > > John |