From: MrPCMan on 31 Mar 2010 11:49 Jay, Finally, I reviewed (again) the latest errorlog from the production server and I do see something that I discounted previously but maybe you can make better sense of yet. Since 12/21/09 when the database instance was restarted, we have had approx. 50 times of the following message: "SQL Server has encountered 3728 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [f:\MSSQL\MSSQL\data\tempdb.mdf] in database [tempdb] (2). The OS file handle is 0x000004DC. The offset of the latest long IO is: 0x0000000170c000" I discounted this because there are no consistent corelations between the application sp failure and this error by date and time. The encountered occurance(s) varies between 1 and 4412. I hope this helps and again thanks much for your help! MrPCMan "Jay Konigsberg" wrote: > Ask you manager to verify the recovery model of the tempdb to make sure it's > "Simple" and not "Full", or "Bulk Logged". > > -- > Jay Konigsberg > SQL Server DBA in Sacramento, CA > http://www.linkedin.com/in/jaykonigsberg > > Live in Sacramento, CA? > Join the Sacramento SQL Server User Group on LinkedIn > http://www.linkedin.com/groups?home=&gid=2825448&trk=anet_ug_hm&goback=%2Emyg > > > > "MrPCMan" <MrPCMan(a)discussions.microsoft.com> wrote in message > news:094FAFB0-3F7A-4221-8B41-D3564DCFD1EC(a)microsoft.com... > > Jay, Thanks so much for help! > > > > Well, my manager doesn't believe the tempdb database is the cause! So, I > > have to find other root causes of the failure. I am only afriad that this > > is > > the root cause but I just can't prove it. Oh well! > > > > It is unfortunate also I don't have another technet telephone support > > ticket, so I am out! > > > > Oh, you had ask about the server logs and event log; no problems with > > database logs or server event logs for the days the sp failed. There were > > some I/O errors previously but not during the execution of the failed > > stored > > procedure. > > > > Thanks Jay! > > MrPCMan > > > > > > "Jay Konigsberg" wrote: > > > >> Again I ask, what is in the SQL Server and/or Windows error logs. > >> > >> As to tempdb and its log space, tempdb defaults to Simple Recovery, which > >> means that log space is released as soon as the transaction completes. > >> > >> I just had a horrible thought: did someone change your tempdb Recovery > >> Model > >> to full, not setup a backup plan and somehow turn off the autogrow > >> (usually > >> 10%)? > >> > >> It's been too long since I worked on 2000, so I can't tell you how to > >> check. > >> > >> > >> -- > >> Jay Konigsberg > >> SQL Server DBA in Sacramento, CA > >> http://www.linkedin.com/in/jaykonigsberg > >> > >> Live in Sacramento, CA? > >> Join the Sacramento SQL Server User Group on LinkedIn > >> http://www.linkedin.com/groups?home=&gid=2825448&trk=anet_ug_hm&goback=%2Emyg > >> > >> > >> > >> "MrPCMan" <MrPCMan(a)discussions.microsoft.com> wrote in message > >> news:9AF1EB41-CFFF-45BB-A8AB-EE7C0F1AA6FE(a)microsoft.com... > >> > 1st thank you for replying! > >> > > >> > Ok, let say you're right a simple aggregriate summation shouldn't cause > >> > this > >> > type of failure. I doubt that. Here is more information on why I > >> > believe > >> > something is going on while this temporary table is being created > >> > and/or > >> > data > >> > is being inserted - I issued an sqlperf(logspace) command everyday for > >> > last 9 > >> > days to view the status of the tempdb log health. The "Log Space Used" > >> > started at 20% on the first day increased about 5% per day until the > >> > day > >> > we > >> > had failure on day #8 where it was 62% then it dropped back to 4%. Why > >> > is > >> > this? > >> > > >> > Could you please tell how SQL does shrinking of the "Log Space Used"? > >> > In > >> > SQL > >> > 2K I did not know that log files could auto-shrink or autogrow? if not, > >> > could > >> > you please tell me why this number would be shrinked. > >> > > >> > This is the only correlation I have! I'm sorry to sound dumb however > >> > this > >> > does not make sense to me! > >> > > >> > Thanks, > >> > MrPCMan > >> > > >> > "Erland Sommarskog" wrote: > >> > > >> >> MrPCMan (MrPCMan(a)discussions.microsoft.com) writes: > >> >> > I cannot prove this however everything that I have look at provides > >> >> > me > >> >> > cause to look at the tempdb as being the problem. Here is the > >> >> > scenerio; > >> >> > my sp produces a very large bcp output to text file with a header > >> >> > record > >> >> > having the total number of records inside the file. That number is > >> >> > generated from a import of over 1m records into a temporary table, > >> >> > then > >> >> > several aggregiate functions occur on that same table to provide > >> >> > this > >> >> > header number then the file is bcp out to disk. Now the header > >> >> > record > >> >> > number values are sometimes zero and when that happen the large file > >> >> > is > >> >> > NOT produced. There are over 1m records there (always) to insert > >> >> > into > >> >> > the temporary table however after the import of records into the > >> >> > temporary table and aggregiate functions occurs the value of the > >> >> > header > >> >> > record variable is zero. This failure is not everyday, just every so > >> >> > often - sorry there is no pattern. > >> >> > >> >> So the theory is that something happens to the tempdb logs that causes > >> >> your procedure to fail. Sorry, I think it is the other way round. Your > >> >> procedures probably takes some toll on tempdb. The days when your job > >> >> fails, the toll on tempdb is lower precisely because the job fails. > >> >> > >> >> It is very unlikely that your problem has anything to do with some > >> >> issues in tempdb. Most likely there is a problem in your code, and > >> >> you should review it, not the least how you handle errors. > >> >> > >> >> -- > >> >> Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se > >> >> > >> >> Links for SQL Server Books Online: > >> >> SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx > >> >> SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx > >> >> SQL 2000: > >> >> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > >> >> > >> >> . > >> >> > >> > >> > >> . > >> > > > . >
From: Jay Konigsberg on 31 Mar 2010 22:18 Is the [f:\MSSQL\MSSQL\data\tempdb.mdf] file fragmented in the filesystem? Still, I would not think this would do anything but slow things down. I would also check for hardware errors on the physical drive. What is the recovery model of the tempdb? -- Jay Konigsberg SQL Server DBA in Sacramento, CA http://www.linkedin.com/in/jaykonigsberg Live in Sacramento, CA? Join the Sacramento SQL Server User Group on LinkedIn http://www.linkedin.com/groups?home=&gid=2825448&trk=anet_ug_hm&goback=%2Emyg "MrPCMan" <MrPCMan(a)discussions.microsoft.com> wrote in message news:0D8FC652-FAFF-4D8A-AF5F-EB58B4CE221E(a)microsoft.com... > Jay, > > Finally, I reviewed (again) the latest errorlog from the production server > and I do see something that I discounted previously but maybe you can make > better sense of yet. > > Since 12/21/09 when the database instance was restarted, we have had > approx. > 50 times of the following message: > "SQL Server has encountered 3728 occurrence(s) of IO requests taking > longer > than 15 seconds to complete on file [f:\MSSQL\MSSQL\data\tempdb.mdf] in > database [tempdb] (2). The OS file handle is 0x000004DC. The offset of > the > latest long IO is: 0x0000000170c000" > > I discounted this because there are no consistent corelations between the > application sp failure and this error by date and time. The encountered > occurance(s) varies between 1 and 4412. > > I hope this helps and again thanks much for your help! > > MrPCMan > > "Jay Konigsberg" wrote: > >> Ask you manager to verify the recovery model of the tempdb to make sure >> it's >> "Simple" and not "Full", or "Bulk Logged". >> >> -- >> Jay Konigsberg >> SQL Server DBA in Sacramento, CA >> http://www.linkedin.com/in/jaykonigsberg >> >> Live in Sacramento, CA? >> Join the Sacramento SQL Server User Group on LinkedIn >> http://www.linkedin.com/groups?home=&gid=2825448&trk=anet_ug_hm&goback=%2Emyg >> >> >> >> "MrPCMan" <MrPCMan(a)discussions.microsoft.com> wrote in message >> news:094FAFB0-3F7A-4221-8B41-D3564DCFD1EC(a)microsoft.com... >> > Jay, Thanks so much for help! >> > >> > Well, my manager doesn't believe the tempdb database is the cause! So, >> > I >> > have to find other root causes of the failure. I am only afriad that >> > this >> > is >> > the root cause but I just can't prove it. Oh well! >> > >> > It is unfortunate also I don't have another technet telephone support >> > ticket, so I am out! >> > >> > Oh, you had ask about the server logs and event log; no problems with >> > database logs or server event logs for the days the sp failed. There >> > were >> > some I/O errors previously but not during the execution of the failed >> > stored >> > procedure. >> > >> > Thanks Jay! >> > MrPCMan >> > >> > >> > "Jay Konigsberg" wrote: >> > >> >> Again I ask, what is in the SQL Server and/or Windows error logs. >> >> >> >> As to tempdb and its log space, tempdb defaults to Simple Recovery, >> >> which >> >> means that log space is released as soon as the transaction completes. >> >> >> >> I just had a horrible thought: did someone change your tempdb Recovery >> >> Model >> >> to full, not setup a backup plan and somehow turn off the autogrow >> >> (usually >> >> 10%)? >> >> >> >> It's been too long since I worked on 2000, so I can't tell you how to >> >> check. >> >> >> >> >> >> -- >> >> Jay Konigsberg >> >> SQL Server DBA in Sacramento, CA >> >> http://www.linkedin.com/in/jaykonigsberg >> >> >> >> Live in Sacramento, CA? >> >> Join the Sacramento SQL Server User Group on LinkedIn >> >> http://www.linkedin.com/groups?home=&gid=2825448&trk=anet_ug_hm&goback=%2Emyg >> >> >> >> >> >> >> >> "MrPCMan" <MrPCMan(a)discussions.microsoft.com> wrote in message >> >> news:9AF1EB41-CFFF-45BB-A8AB-EE7C0F1AA6FE(a)microsoft.com... >> >> > 1st thank you for replying! >> >> > >> >> > Ok, let say you're right a simple aggregriate summation shouldn't >> >> > cause >> >> > this >> >> > type of failure. I doubt that. Here is more information on why I >> >> > believe >> >> > something is going on while this temporary table is being created >> >> > and/or >> >> > data >> >> > is being inserted - I issued an sqlperf(logspace) command everyday >> >> > for >> >> > last 9 >> >> > days to view the status of the tempdb log health. The "Log Space >> >> > Used" >> >> > started at 20% on the first day increased about 5% per day until the >> >> > day >> >> > we >> >> > had failure on day #8 where it was 62% then it dropped back to 4%. >> >> > Why >> >> > is >> >> > this? >> >> > >> >> > Could you please tell how SQL does shrinking of the "Log Space >> >> > Used"? >> >> > In >> >> > SQL >> >> > 2K I did not know that log files could auto-shrink or autogrow? if >> >> > not, >> >> > could >> >> > you please tell me why this number would be shrinked. >> >> > >> >> > This is the only correlation I have! I'm sorry to sound dumb however >> >> > this >> >> > does not make sense to me! >> >> > >> >> > Thanks, >> >> > MrPCMan >> >> > >> >> > "Erland Sommarskog" wrote: >> >> > >> >> >> MrPCMan (MrPCMan(a)discussions.microsoft.com) writes: >> >> >> > I cannot prove this however everything that I have look at >> >> >> > provides >> >> >> > me >> >> >> > cause to look at the tempdb as being the problem. Here is the >> >> >> > scenerio; >> >> >> > my sp produces a very large bcp output to text file with a header >> >> >> > record >> >> >> > having the total number of records inside the file. That number >> >> >> > is >> >> >> > generated from a import of over 1m records into a temporary >> >> >> > table, >> >> >> > then >> >> >> > several aggregiate functions occur on that same table to provide >> >> >> > this >> >> >> > header number then the file is bcp out to disk. Now the header >> >> >> > record >> >> >> > number values are sometimes zero and when that happen the large >> >> >> > file >> >> >> > is >> >> >> > NOT produced. There are over 1m records there (always) to insert >> >> >> > into >> >> >> > the temporary table however after the import of records into the >> >> >> > temporary table and aggregiate functions occurs the value of the >> >> >> > header >> >> >> > record variable is zero. This failure is not everyday, just every >> >> >> > so >> >> >> > often - sorry there is no pattern. >> >> >> >> >> >> So the theory is that something happens to the tempdb logs that >> >> >> causes >> >> >> your procedure to fail. Sorry, I think it is the other way round. >> >> >> Your >> >> >> procedures probably takes some toll on tempdb. The days when your >> >> >> job >> >> >> fails, the toll on tempdb is lower precisely because the job fails. >> >> >> >> >> >> It is very unlikely that your problem has anything to do with some >> >> >> issues in tempdb. Most likely there is a problem in your code, and >> >> >> you should review it, not the least how you handle errors. >> >> >> >> >> >> -- >> >> >> Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se >> >> >> >> >> >> Links for SQL Server Books Online: >> >> >> SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx >> >> >> SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx >> >> >> SQL 2000: >> >> >> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx >> >> >> >> >> >> . >> >> >> >> >> >> >> >> >> . >> >> >> >> >> . >>
From: Erland Sommarskog on 18 Apr 2010 18:01 MrPCMan (MrPCMan(a)discussions.microsoft.com) writes: > Finally, I reviewed (again) the latest errorlog from the production server > and I do see something that I discounted previously but maybe you can make > better sense of yet. > > Since 12/21/09 when the database instance was restarted, we have had > approx. 50 times of the following message: > "SQL Server has encountered 3728 occurrence(s) of IO requests taking > longer than 15 seconds to complete on file > [f:\MSSQL\MSSQL\data\tempdb.mdf] in database [tempdb] (2). The OS file > handle is 0x000004DC. The offset of the latest long IO is: > 0x0000000170c000" > > I discounted this because there are no consistent corelations between the > application sp failure and this error by date and time. The encountered > occurance(s) varies between 1 and 4412. That message indicates that your IO subsystem is not up to speed. IO requests should preferrably complete subsecond - even 100 ms is a long time, and 15 seconds is an eternity. It could be an explanation for the problems you are seeing, if these I/O stalls causes some application to time out. (This reply is coming late; I've been on vacation.) -- Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se Links for SQL Server Books Online: SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
First
|
Prev
|
Pages: 1 2 Prev: Transaction Promotion/Hang? Next: 100% cpu usage by sql ghost cleanup process |