From: Robert Nichols on 20 Feb 2010 01:19

In article <7u83bnF6pnU2(a)mid.individual.net>,
Rod Speed <rod.speed.aaa(a)gmail.com> wrote:
:
:In fact the Everest SMART report shows that it actually got to 87C and that is utterly obscene.

I don't get that from the report.

    Attribute Description  Threshold  Value  Worst       Data
    ---------------------  ---------  -----  -----   -------------
  C2   Temperature             0        89     87         63

That "87" is the _normalized_ parameter -- a value that drops with
increasing temperature and indicates "fail" status when it falls to the
threshold value of 0. Note that the current value is "89" while the
value in the "Worst" column is "87". That makes no sense if those
values are actual temperatures. No, it appears that the drive was
never much hotter than at the time that measurement was taken.

It would be really interesting to see what those numbers are when the
drive is first switched on after an extended power off cooldown, when
the drive is still near the ambient temperature.

--
Bob Nichols         AT comcast.net I am "RNichols42"
From: sobriquet on 20 Feb 2010 02:38

On 20 feb, 04:00, "Rod Speed" <rod.speed....(a)gmail.com> wrote:
> sobriquet wrote:
> > On 19 feb, 19:10, "Rod Speed" <rod.speed....(a)gmail.com> wrote:
> >> Rod Speed wrote
>
> >>> sobriquet wrote
> >>>> sobriquet <dohduh...(a)yahoo.com> wrote
> >>>>> Rod Speed <rod.speed....(a)gmail.com> wrote
> >>>>>> sobriquet wrote
> >>>>>>> Rod Speed <rod.speed....(a)gmail.com> wrote
> >>>>>>>> sobriquet wrote
> >>>>>>>>> I've lost some data on a 2 tb WD mybook usb drive. When
> >>>>>>>>> I did a full scan, it found something like 3 mb in bad
> >>>>>>>>> sectors.
> >>>>>>>>> However, when I reformatted the drive, somehow all bad sectors
> >>>>>>>>> were recovered. Apparently, there is some redundancy in
> >>>>>>>>> diskspace, so it can allocate some of that extra space to
> >>>>>>>>> substitute for the bad sectors on disk when it's just a small
> >>>>>>>>> section of bad sectors.
> >>>>>>>> Yes, all modern hard drives have spare sectors
> >>>>>>>> that can be used as substitutes for bad sectors.
> >>>>>>>>> The disk is also able to pass the short drive test (in winDLG
> >>>>>>>>> under xp), that it used to fail, before I reformatted the
> >>>>>>>>> drive.
> >>>>>>>>> Now I wonder if the fact that previously bad sectors have
> >>>>>>>>> occurred and I've lost data, is that increasing the
> >>>>>>>>> likelyhood that this
> >>>>>>>>> might happen again?
> >>>>>>>> Yes, that many bad sectors
> >>> It isnt in fact all that many now that we can see the SMART data.
> >>>>>>>> does indicate a problem with the drive or
> >>>>>>>> that the drive is running much too hot etc.
> >>>>>>>>> Is the drive less reliable in any way once a small
> >>>>>>>>> number of bad sectors have been identified
> >>>>>>>> Yes, and 3MB is not a small number of bad sectors.
> >>> Turns out to only be 3 bad sectors.
>
> >> And 3 more pending.
>
> >>>>>>>>> (even though the bad sectors are no longer visible after the
> >>>>>>>>> drive has been formatted again and other drivespace is
> >>>>>>>>> substituted for the bad sectors)?
> >>>>>>>> Yes, it either indicates that the drive is dying, or that its
> >>>>>>>> running stinking hot etc.
> >>>>>>>>> Below is the original log from chdsk when the bad sectors
> >>>>>>>>> were found:
> >>>>>>>> chkdsk isnt a very useful indication of the health of the
> >>>>>>>> drive.
> >>>>>>>> You really need a proper SMART report on the drive.
> >>>>>>>> That isnt necessarily that easy to get for free with an
> >>>>>>>> external drive.
> >>>>>>> Well, with winDLG, it does say the SMART status is OK for the
> >>>>>>> device,
> >>>>>> That never means much, its the detailled values that matter.
> >>>>>>> and I can get more detailed SMART info.
> >>>>>>> Here is a screenshot of the SMART info:
> >>>>>>>http://img11.imageshack.us/img11/74/wdmybook.jpg
> >>>>>> It isnt at all clear what that actually means, particularly what
> >>>>>> the warranty field means. And the reallocated sector entry and
> >>>>>> the temperature entry make no sense either.
> >>>>>> The Everest SMART report is much more readable,
> >>>>>> but doesnt work with external drives in the free version.
> >>>>>> smartclt from a linux bootable cd might, and HDSentinal might,
> >>>>>> but it isnt free.
> >>>>> The version I've tried from HDSentinel wasn't up to date, but
> >>>>> perhaps the version (5.30) of Everest on demonoid will provide
> >>>>> more detailed
> >>>>> SMART info on the drive. I'm busy with the drive now, but I'll
> >>>>> soon
> >>>>> follow up on this with a screenshot of the Everest SMART info of
> >>>>> the drive.
> >>>> Screenshot of Everest SMART info of the same drive:
> >>>>http://img713.imageshack.us/img713/5343/everestje.jpg
> >>> Thats much better. That shows 3 reallocated sectors which
> >>> isnt too bad given the utterly obscene temperature of 63C.
>
> >> And its actually been to 87, thats completely and utterly obscene.
>
> >>> The temperature is certainly the problem and the
> >>> drive will be fine if you can stop it getting that hot.
> >>> Not easy to stop it getting that hot tho, particularly in the
> >>> summer without air conditioning etc with those external drives.
>
> >> I'd be returning it if it was mine, but that wouldnt be a warranty
> >> claim and how
> >> easy it would be to do that depends on your country and its consumer
> >> laws.
>
> >> The technical term is unfit for purpose in countrys with a legal
> >> system derived from the british system.
>
> >> I cant remember the detail with Dutch law.
>
> >http://img62.imageshack.us/img62/5758/everest1q.jpg
>
> > So that means one of my internal hitachi drives reached a temperature of 150C?!
>
> Nope, 40C

But that 40 for the hitachi drive is in the same column as the 63 for
the WD drive, and I don't understand the relationship between the raw
values and the value/worst numbers. Or does that differ between
brands and models of HDs?
From: sobriquet on 20 Feb 2010 02:46

On 20 feb, 07:19, Robert Nichols
<SEE_SIGNAT...(a)localhost.localdomain.invalid> wrote:
> In article <7u83bnF6p...(a)mid.individual.net>,Rod Speed <rod.speed....(a)gmail.com> wrote:
>
> :
> :In fact the Everest SMART report shows that it actually got to 87C and that is utterly obscene.
>
> I don't get that from the report.
>
>     Attribute Description  Threshold  Value  Worst       Data
>     ---------------------  ---------  -----  -----   -------------
>   C2   Temperature             0        89     87         63
>
> That "87" is the _normalized_ parameter -- a value that drops with
> increasing temperature and indicates "fail" status when it falls to the
> threshold value of 0. Note that the current value is "89" while the
> value in the "Worst" column is "87". That makes no sense if those
> values are actual temperatures. No, it appears that the drive was
> never much hotter than at the time that measurement was taken.
>
> It would be really interesting to see what those numbers are when the
> drive is first switched on after an extended power off cooldown, when
> the drive is still near the ambient temperature.
>
> --
> Bob Nichols         AT comcast.net I am "RNichols42"

Here is a screenshot of the SMART info from Everest when the drive has
just been turned on after the power has been off for a while.
http://img27.imageshack.us/img27/3280/everest2t.jpg
From: Arno on 20 Feb 2010 10:04

Franc Zabkar <fzabkar(a)iinternode.on.net> wrote:
> On 19 Feb 2010 14:37:00 GMT, Arno <me(a)privacy.net> put finger to
> keyboard and composed:
>>Franc Zabkar <fzabkar(a)iinternode.on.net> wrote:
>>> On Thu, 18 Feb 2010 12:43:25 -0800 (PST), sobriquet
>>> <dohduhdah(a)yahoo.com> put finger to keyboard and composed:
>>
>>>>Here is a screenshot of the SMART info:
>>>>http://img11.imageshack.us/img11/74/wdmybook.jpg
>>
>>> IIUC, WD's temperature attribute assigns a normalised value of 100 to
>>> a temperature of 50C. A value of 89 would then suggest that the
>>> temperature is 61C.
>>
>>> I could be wrong, though ...
>>
>>With the WDs I have the raw attribute seems to be C directly.
>>
>>Arno
> I was referring to the normalised attribute value.
> Wikipedia is unclear, but it does mention something along those lines
> for attribute BE:
> http://en.wikipedia.org/wiki/S.M.A.R.T.

Well, some more datapoints from 4 of my WD disks
(cooked value, then raw value, which is the temperature in C):

  Cooked   Raw C
   106      41
   115      32
   110      40
   128      22

The linear regression tool at http://www.xuru.org/rt/LR.asp
gives me

  y = -0.907188353 x + 137.8498635

and an error of up to 1.95C. That would give 47.1C for 100.
Seems there is a rather large rounding error or the like
in here. Anyway, the OP's 89 would be 57C from my
datapoints, or 59C if I add the 100/50C point.

Interestingly, if I use only the lowest datapoint (128/22C)
and the theoretical 100/50C, I get

  y = -1 x + 150

I suspect there is some damping or averaging or the
like going on with the cooked value, and I also suspect

  temp = -1 x cooked + 150 [C]

is what we want. With that, the OP's 89 and worst 87 become
61C and 63C, which is definitely far too high for comfort.
The 63C is high enough that it could have caused enough
(temporary) degradation for the 6000 reallocated sectors.
In fact, borderline failure due to significant overheating
seems to be the most likely cause to me now.

Arno
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno(a)wagner.name
GnuPG: ID: 1E25338F FP: 0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans
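
For anyone who wants to check Arno's numbers without the web tool, here is a minimal Python sketch of the same ordinary least-squares fit. It assumes the first number in each of his datapoints is the normalised ("cooked") value and the second is the raw temperature in C. The names `POINTS`, `fit_line` and `estimate` are made up for this illustration and do not come from any SMART tool, and the resulting lines are only guesses at how WD drives map temperature to the normalised value.

```python
# Arno's four WD datapoints as (normalised value, raw temperature in C).
# Assumption: the first column is the normalised ("cooked") value.
POINTS = [(106, 41), (115, 32), (110, 40), (128, 22)]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate(fit, normalised):
    """Estimated temperature in C for a normalised value under a given fit."""
    slope, intercept = fit
    return slope * normalised + intercept


# The three variants discussed in the post.
four_disks = fit_line(POINTS)                   # the four measured disks only
with_nominal = fit_line(POINTS + [(100, 50)])   # plus the assumed 100 -> 50C point
two_point = fit_line([(128, 22), (100, 50)])    # coolest disk plus 100 -> 50C

for name, fit in [("four disks", four_disks),
                  ("with 100/50C", with_nominal),
                  ("two point", two_point)]:
    print(f"{name}: y = {fit[0]:.6f} x + {fit[1]:.6f}, "
          f"89 -> {estimate(fit, 89):.1f}C, 87 -> {estimate(fit, 87):.1f}C")
```

The two-point variant reduces to temp = 150 - cooked, which is the rule of thumb proposed in the post. With only four or five datapoints, none of these fits should be read as more than a rough estimate.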
From: Arno on 20 Feb 2010 10:11

sobriquet <dohduhdah(a)yahoo.com> wrote:
> On 20 feb, 07:19, Robert Nichols
> <SEE_SIGNAT...(a)localhost.localdomain.invalid> wrote:
>> In article <7u83bnF6p...(a)mid.individual.net>,Rod Speed <rod.speed....(a)gmail.com> wrote:
>>
>> :
>> :In fact the Everest SMART report shows that it actually got to 87C and that is utterly obscene.
>>
>> I don't get that from the report.
>>
>>     Attribute Description  Threshold  Value  Worst       Data
>>     ---------------------  ---------  -----  -----   -------------
>>   C2   Temperature             0        89     87         63
>>
>> That "87" is the _normalized_ parameter -- a value that drops with
>> increasing temperature and indicates "fail" status when it falls to the
>> threshold value of 0. Note that the current value is "89" while the
>> value in the "Worst" column is "87". That makes no sense if those
>> values are actual temperatures. No, it appears that the drive was
>> never much hotter than at the time that measurement was taken.
>>
>> It would be really interesting to see what those numbers are when the
>> drive is first switched on after an extended power off cooldown, when
>> the drive is still near the ambient temperature.
>>
>> --
>> Bob Nichols         AT comcast.net I am "RNichols42"
> Here is a screenshot of the SMART info from Everest when the drive has
> just been
> turned on after the power has been off for a while.
> http://img27.imageshack.us/img27/3280/everest2t.jpg

Fits. With the linear regression from my other posting,
it looks like your disk went up to something like 63C,
and that could be enough to degrade its mechanics and
electronics enough to have caused a large number of errors.

To sum up: It looks like you nearly cooked your disk to death,
and the 6000 reallocated sectors happened when it was close
to failing completely.

Note that there are 3 stages to heat death (the temperature
ranges are my personal estimates and also depend on the drive):

1. Starts to produce errors [60-70C]: you were there
2. Fails, but works again after cooldown [65-75C]
3. Fails permanently or suffers permanent damage [?]

In all stages the disk ages very rapidly and may fail soon.
I would also not really trust a disk anymore that has
reached stage 2.

Arno
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno(a)wagner.name
GnuPG: ID: 1E25338F FP: 0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans