SQL Server 2000: Tempdb growth is causing application failure. [Server Configuration]

Prev: Transaction Promotion/Hang?
Next: 100% cpu usage by sql ghost cleanup process

From: MrPCMan on 31 Mar 2010 11:49

Jay,

Finally, I reviewed (again) the latest errorlog from the production server
and I do see something that I discounted previously but maybe you can make
better sense of yet.

Since 12/21/09 when the database instance was restarted, we have had approx.
50 times of the following message:
"SQL Server has encountered 3728 occurrence(s) of IO requests taking longer
than 15 seconds to complete on file [f:\MSSQL\MSSQL\data\tempdb.mdf] in
database [tempdb] (2). The OS file handle is 0x000004DC. The offset of the
latest long IO is: 0x0000000170c000"

I discounted this because there are no consistent corelations between the
application sp failure and this error by date and time. The encountered
occurance(s) varies between 1 and 4412.

I hope this helps and again thanks much for your help!

MrPCMan

"Jay Konigsberg" wrote:

> Ask you manager to verify the recovery model of the tempdb to make sure it's
> "Simple" and not "Full", or "Bulk Logged".
>
> --
> Jay Konigsberg
> SQL Server DBA in Sacramento, CA
> http://www.linkedin.com/in/jaykonigsberg
>
> Live in Sacramento, CA?
> Join the Sacramento SQL Server User Group on LinkedIn
> http://www.linkedin.com/groups?home=&gid=2825448&trk=anet_ug_hm&goback=%2Emyg
>
>
>
> "MrPCMan" <MrPCMan(a)discussions.microsoft.com> wrote in message
> news:094FAFB0-3F7A-4221-8B41-D3564DCFD1EC(a)microsoft.com...
> > Jay, Thanks so much for help!
> >
> > Well, my manager doesn't believe the tempdb database is the cause! So, I
> > have to find other root causes of the failure. I am only afriad that this
> > is
> > the root cause but I just can't prove it. Oh well!
> >
> > It is unfortunate also I don't have another technet telephone support
> > ticket, so I am out!
> >
> > Oh, you had ask about the server logs and event log; no problems with
> > database logs or server event logs for the days the sp failed. There were
> > some I/O errors previously but not during the execution of the failed
> > stored
> > procedure.
> >
> > Thanks Jay!
> > MrPCMan
> >
> >
> > "Jay Konigsberg" wrote:
> >
> >> Again I ask, what is in the SQL Server and/or Windows error logs.
> >>
> >> As to tempdb and its log space, tempdb defaults to Simple Recovery, which
> >> means that log space is released as soon as the transaction completes.
> >>
> >> I just had a horrible thought: did someone change your tempdb Recovery
> >> Model
> >> to full, not setup a backup plan and somehow turn off the autogrow
> >> (usually
> >> 10%)?
> >>
> >> It's been too long since I worked on 2000, so I can't tell you how to
> >> check.
> >>
> >>
> >> --
> >> Jay Konigsberg
> >> SQL Server DBA in Sacramento, CA
> >> http://www.linkedin.com/in/jaykonigsberg
> >>
> >> Live in Sacramento, CA?
> >> Join the Sacramento SQL Server User Group on LinkedIn
> >> http://www.linkedin.com/groups?home=&gid=2825448&trk=anet_ug_hm&goback=%2Emyg
> >>
> >>
> >>
> >> "MrPCMan" <MrPCMan(a)discussions.microsoft.com> wrote in message
> >> news:9AF1EB41-CFFF-45BB-A8AB-EE7C0F1AA6FE(a)microsoft.com...
> >> > 1st thank you for replying!
> >> >
> >> > Ok, let say you're right a simple aggregriate summation shouldn't cause
> >> > this
> >> > type of failure. I doubt that. Here is more information on why I
> >> > believe
> >> > something is going on while this temporary table is being created
> >> > and/or
> >> > data
> >> > is being inserted - I issued an sqlperf(logspace) command everyday for
> >> > last 9
> >> > days to view the status of the tempdb log health. The "Log Space Used"
> >> > started at 20% on the first day increased about 5% per day until the
> >> > day
> >> > we
> >> > had failure on day #8 where it was 62% then it dropped back to 4%. Why
> >> > is
> >> > this?
> >> >
> >> > Could you please tell how SQL does shrinking of the "Log Space Used"?
> >> > In
> >> > SQL
> >> > 2K I did not know that log files could auto-shrink or autogrow? if not,
> >> > could
> >> > you please tell me why this number would be shrinked.
> >> >
> >> > This is the only correlation I have! I'm sorry to sound dumb however
> >> > this
> >> > does not make sense to me!
> >> >
> >> > Thanks,
> >> > MrPCMan
> >> >
> >> > "Erland Sommarskog" wrote:
> >> >
> >> >> MrPCMan (MrPCMan(a)discussions.microsoft.com) writes:
> >> >> > I cannot prove this however everything that I have look at provides
> >> >> > me
> >> >> > cause to look at the tempdb as being the problem. Here is the
> >> >> > scenerio;
> >> >> > my sp produces a very large bcp output to text file with a header
> >> >> > record
> >> >> > having the total number of records inside the file. That number is
> >> >> > generated from a import of over 1m records into a temporary table,
> >> >> > then
> >> >> > several aggregiate functions occur on that same table to provide
> >> >> > this
> >> >> > header number then the file is bcp out to disk. Now the header
> >> >> > record
> >> >> > number values are sometimes zero and when that happen the large file
> >> >> > is
> >> >> > NOT produced. There are over 1m records there (always) to insert
> >> >> > into
> >> >> > the temporary table however after the import of records into the
> >> >> > temporary table and aggregiate functions occurs the value of the
> >> >> > header
> >> >> > record variable is zero. This failure is not everyday, just every so
> >> >> > often - sorry there is no pattern.
> >> >>
> >> >> So the theory is that something happens to the tempdb logs that causes
> >> >> your procedure to fail. Sorry, I think it is the other way round. Your
> >> >> procedures probably takes some toll on tempdb. The days when your job
> >> >> fails, the toll on tempdb is lower precisely because the job fails.
> >> >>
> >> >> It is very unlikely that your problem has anything to do with some
> >> >> issues in tempdb. Most likely there is a problem in your code, and
> >> >> you should review it, not the least how you handle errors.
> >> >>
> >> >> --
> >> >> Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se
> >> >>
> >> >> Links for SQL Server Books Online:
> >> >> SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
> >> >> SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
> >> >> SQL 2000:
> >> >> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
> >> >>
> >> >> .
> >> >>
> >>
> >>
> >> .
> >>
>
>
> .
>

From: Jay Konigsberg on 31 Mar 2010 22:18

Is the [f:\MSSQL\MSSQL\data\tempdb.mdf] file fragmented in the filesystem?
Still, I would not think this would do anything but slow things down. I
would also check for hardware errors on the physical drive.

What is the recovery model of the tempdb?

--
Jay Konigsberg
SQL Server DBA in Sacramento, CA
http://www.linkedin.com/in/jaykonigsberg

Live in Sacramento, CA?
Join the Sacramento SQL Server User Group on LinkedIn
http://www.linkedin.com/groups?home=&gid=2825448&trk=anet_ug_hm&goback=%2Emyg

"MrPCMan" <MrPCMan(a)discussions.microsoft.com> wrote in message
news:0D8FC652-FAFF-4D8A-AF5F-EB58B4CE221E(a)microsoft.com...
> Jay,
>
> Finally, I reviewed (again) the latest errorlog from the production server
> and I do see something that I discounted previously but maybe you can make
> better sense of yet.
>
> Since 12/21/09 when the database instance was restarted, we have had
> approx.
> 50 times of the following message:
> "SQL Server has encountered 3728 occurrence(s) of IO requests taking
> longer
> than 15 seconds to complete on file [f:\MSSQL\MSSQL\data\tempdb.mdf] in
> database [tempdb] (2). The OS file handle is 0x000004DC. The offset of
> the
> latest long IO is: 0x0000000170c000"
>
> I discounted this because there are no consistent corelations between the
> application sp failure and this error by date and time. The encountered
> occurance(s) varies between 1 and 4412.
>
> I hope this helps and again thanks much for your help!
>
> MrPCMan
>
> "Jay Konigsberg" wrote:
>
>> Ask you manager to verify the recovery model of the tempdb to make sure
>> it's
>> "Simple" and not "Full", or "Bulk Logged".
>>
>> --
>> Jay Konigsberg
>> SQL Server DBA in Sacramento, CA
>> http://www.linkedin.com/in/jaykonigsberg
>>
>> Live in Sacramento, CA?
>> Join the Sacramento SQL Server User Group on LinkedIn
>> http://www.linkedin.com/groups?home=&gid=2825448&trk=anet_ug_hm&goback=%2Emyg
>>
>>
>>
>> "MrPCMan" <MrPCMan(a)discussions.microsoft.com> wrote in message
>> news:094FAFB0-3F7A-4221-8B41-D3564DCFD1EC(a)microsoft.com...
>> > Jay, Thanks so much for help!
>> >
>> > Well, my manager doesn't believe the tempdb database is the cause! So,
>> > I
>> > have to find other root causes of the failure. I am only afriad that
>> > this
>> > is
>> > the root cause but I just can't prove it. Oh well!
>> >
>> > It is unfortunate also I don't have another technet telephone support
>> > ticket, so I am out!
>> >
>> > Oh, you had ask about the server logs and event log; no problems with
>> > database logs or server event logs for the days the sp failed. There
>> > were
>> > some I/O errors previously but not during the execution of the failed
>> > stored
>> > procedure.
>> >
>> > Thanks Jay!
>> > MrPCMan
>> >
>> >
>> > "Jay Konigsberg" wrote:
>> >
>> >> Again I ask, what is in the SQL Server and/or Windows error logs.
>> >>
>> >> As to tempdb and its log space, tempdb defaults to Simple Recovery,
>> >> which
>> >> means that log space is released as soon as the transaction completes.
>> >>
>> >> I just had a horrible thought: did someone change your tempdb Recovery
>> >> Model
>> >> to full, not setup a backup plan and somehow turn off the autogrow
>> >> (usually
>> >> 10%)?
>> >>
>> >> It's been too long since I worked on 2000, so I can't tell you how to
>> >> check.
>> >>
>> >>
>> >> --
>> >> Jay Konigsberg
>> >> SQL Server DBA in Sacramento, CA
>> >> http://www.linkedin.com/in/jaykonigsberg
>> >>
>> >> Live in Sacramento, CA?
>> >> Join the Sacramento SQL Server User Group on LinkedIn
>> >> http://www.linkedin.com/groups?home=&gid=2825448&trk=anet_ug_hm&goback=%2Emyg
>> >>
>> >>
>> >>
>> >> "MrPCMan" <MrPCMan(a)discussions.microsoft.com> wrote in message
>> >> news:9AF1EB41-CFFF-45BB-A8AB-EE7C0F1AA6FE(a)microsoft.com...
>> >> > 1st thank you for replying!
>> >> >
>> >> > Ok, let say you're right a simple aggregriate summation shouldn't
>> >> > cause
>> >> > this
>> >> > type of failure. I doubt that. Here is more information on why I
>> >> > believe
>> >> > something is going on while this temporary table is being created
>> >> > and/or
>> >> > data
>> >> > is being inserted - I issued an sqlperf(logspace) command everyday
>> >> > for
>> >> > last 9
>> >> > days to view the status of the tempdb log health. The "Log Space
>> >> > Used"
>> >> > started at 20% on the first day increased about 5% per day until the
>> >> > day
>> >> > we
>> >> > had failure on day #8 where it was 62% then it dropped back to 4%.
>> >> > Why
>> >> > is
>> >> > this?
>> >> >
>> >> > Could you please tell how SQL does shrinking of the "Log Space
>> >> > Used"?
>> >> > In
>> >> > SQL
>> >> > 2K I did not know that log files could auto-shrink or autogrow? if
>> >> > not,
>> >> > could
>> >> > you please tell me why this number would be shrinked.
>> >> >
>> >> > This is the only correlation I have! I'm sorry to sound dumb however
>> >> > this
>> >> > does not make sense to me!
>> >> >
>> >> > Thanks,
>> >> > MrPCMan
>> >> >
>> >> > "Erland Sommarskog" wrote:
>> >> >
>> >> >> MrPCMan (MrPCMan(a)discussions.microsoft.com) writes:
>> >> >> > I cannot prove this however everything that I have look at
>> >> >> > provides
>> >> >> > me
>> >> >> > cause to look at the tempdb as being the problem. Here is the
>> >> >> > scenerio;
>> >> >> > my sp produces a very large bcp output to text file with a header
>> >> >> > record
>> >> >> > having the total number of records inside the file. That number
>> >> >> > is
>> >> >> > generated from a import of over 1m records into a temporary
>> >> >> > table,
>> >> >> > then
>> >> >> > several aggregiate functions occur on that same table to provide
>> >> >> > this
>> >> >> > header number then the file is bcp out to disk. Now the header
>> >> >> > record
>> >> >> > number values are sometimes zero and when that happen the large
>> >> >> > file
>> >> >> > is
>> >> >> > NOT produced. There are over 1m records there (always) to insert
>> >> >> > into
>> >> >> > the temporary table however after the import of records into the
>> >> >> > temporary table and aggregiate functions occurs the value of the
>> >> >> > header
>> >> >> > record variable is zero. This failure is not everyday, just every
>> >> >> > so
>> >> >> > often - sorry there is no pattern.
>> >> >>
>> >> >> So the theory is that something happens to the tempdb logs that
>> >> >> causes
>> >> >> your procedure to fail. Sorry, I think it is the other way round.
>> >> >> Your
>> >> >> procedures probably takes some toll on tempdb. The days when your
>> >> >> job
>> >> >> fails, the toll on tempdb is lower precisely because the job fails.
>> >> >>
>> >> >> It is very unlikely that your problem has anything to do with some
>> >> >> issues in tempdb. Most likely there is a problem in your code, and
>> >> >> you should review it, not the least how you handle errors.
>> >> >>
>> >> >> --
>> >> >> Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se
>> >> >>
>> >> >> Links for SQL Server Books Online:
>> >> >> SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
>> >> >> SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
>> >> >> SQL 2000:
>> >> >> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
>> >> >>
>> >> >> .
>> >> >>
>> >>
>> >>
>> >> .
>> >>
>>
>>
>> .
>>

From: Erland Sommarskog on 18 Apr 2010 18:01

MrPCMan (MrPCMan(a)discussions.microsoft.com) writes:
> Finally, I reviewed (again) the latest errorlog from the production server
> and I do see something that I discounted previously but maybe you can make
> better sense of yet.
>
> Since 12/21/09 when the database instance was restarted, we have had
> approx. 50 times of the following message:
> "SQL Server has encountered 3728 occurrence(s) of IO requests taking
> longer than 15 seconds to complete on file
> [f:\MSSQL\MSSQL\data\tempdb.mdf] in database [tempdb] (2). The OS file
> handle is 0x000004DC. The offset of the latest long IO is:
> 0x0000000170c000"
>
> I discounted this because there are no consistent corelations between the
> application sp failure and this error by date and time. The encountered
> occurance(s) varies between 1 and 4412.

That message indicates that your IO subsystem is not up to speed. IO
requests should preferrably complete subsecond - even 100 ms is a long
time, and 15 seconds is an eternity. It could be an explanation for
the problems you are seeing, if these I/O stalls causes some application
to time out.

(This reply is coming late; I've been on vacation.)

--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

First | Prev |
Pages: 1 2
Prev: Transaction Promotion/Hang?
Next: 100% cpu usage by sql ghost cleanup process