smartctl when disks are in standby (Linux) [Storage]

From: David Brown on 25 Sep 2009 03:27

I've been playing around a little with standby for disks on a Linux
system - mostly it works well, with disks waking up automatically as needed.

If I try a smartctl test ("smartctl -t short /dev/sda") on a standby
drive, however, it fails - it will wake up the drive, but smartctl gives
up waiting and returns an error message before the drive is ready. Once
the drive is up to speed, the test runs fine.

Has anyone come across this before? Is it just that my disks (Samsung
1TB drives) are particularly slow to start up? Or is there a way to
make smartctl wait a little longer before giving up?

Regards,

David

From: Arno on 25 Sep 2009 13:22

David Brown <david(a)westcontrol.removethisbit.com> wrote:
> I've been playing around a little with standby for disks on a Linux
> system - mostly it works well, with disks waking up automatically as needed.

> If I try a smartctl test ("smartctl -t short /dev/sda") on a standby
> drive, however, it fails - it will wake up the drive, but smartctl gives
> up waiting and returns an error message before the drive is ready. Once
> the drive is up to speed, the test runs fine.

> Has anyone come across this before? Is it just that my disks (Samsung
> 1TB drives) are particularly slow to start up? Or is there a way to
> make smartctl wait a little longer before giving up?

Can you post the error message from the syslog? It is possible
that not smartctl but the kernel gives up.

In any case, a simple solution could be to wrap smartctl into
something like

head -c 512 device > /dev/null; smartctl command device

If the wakeup by reading the first 512 bytes fails, you
could also add a "sleep 60" or the like.

For something more sophisticated, you can check
the power mode with "hdparm -C <device>".

Arno
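
The wrapper idea above could be sketched as a small shell function. This is only an illustration under stated assumptions: the function name smart_after_wake and the WAKE_DELAY variable are made up here and are not part of smartmontools, and the read only wakes the drive if the first block is not already in the page cache.

```shell
#!/bin/sh
# Hypothetical wrapper: wake the disk with a small read, then run smartctl.
# smart_after_wake and WAKE_DELAY are illustrative names, not smartmontools features.

WAKE_DELAY="${WAKE_DELAY:-0}"   # optional extra seconds to wait after the read

smart_after_wake() {
    dev="$1"
    shift
    # A read that actually reaches the drive blocks until it has spun up.
    head -c 512 "$dev" > /dev/null || return 1
    if [ "$WAKE_DELAY" -gt 0 ]; then
        sleep "$WAKE_DELAY"
    fi
    # Remaining arguments are passed to smartctl unchanged.
    smartctl "$@" "$dev"
}
```

For example, "smart_after_wake /dev/sda -t short" would read the first 512 bytes of /dev/sda and then run "smartctl -t short /dev/sda".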

From: David Brown on 26 Sep 2009 08:44

Arno wrote:
> David Brown <david(a)westcontrol.removethisbit.com> wrote:
>> I've been playing around a little with standby for disks on a Linux
>> system - mostly it works well, with disks waking up automatically as needed.
>
>> If I try a smartctl test ("smartctl -t short /dev/sda") on a standby
>> drive, however, it fails - it will wake up the drive, but smartctl gives
>> up waiting and returns an error message before the drive is ready. Once
>> the drive is up to speed, the test runs fine.
>
>> Has anyone come across this before? Is it just that my disks (Samsung
>> 1TB drives) are particularly slow to start up? Or is there a way to
>> make smartctl wait a little longer before giving up?
>
> Can you post the error message from the syslog? It is possible
> that not smartctl but the kernel gives up.
>

I'll have a look when I next have the machine switched on - I didn't
think about checking the syslog. Anything else that I've tried that
needs the disk (such as "ls" if the relevant data is not in the cache)
simply blocks until the disk is up to speed, so I've assumed the issue
is specific to smartctl.

> In any case, a simple solution could be to wrap smartctl into
> something like
>
> head -c 512 device > /dev/null; smartctl command device
>

Yes, that's my thought. It might involve slightly more work if I make
use of smartd, such as replacing the original smartctl binary with a
script doing something like your suggestion.

> If the wakeup by reading the first 512 bytes fails, you
> could also add a "sleep 60" or the like.
>
> For something more sophisticated, you can check
> the power mode with "hdparm -C <device>".
>

I know about "hdparm -C" - but in this case, I don't want to check if
the disk is awake, I want to awaken it!

> Arno
>

From: Arno on 26 Sep 2009 11:58

David Brown <david(a)westcontrol.removethisbit.com> wrote:
> Arno wrote:
>> David Brown <david(a)westcontrol.removethisbit.com> wrote:
>>> I've been playing around a little with standby for disks on a Linux
>>> system - mostly it works well, with disks waking up automatically as needed.
>>
>>> If I try a smartctl test ("smartctl -t short /dev/sda") on a standby
>>> drive, however, it fails - it will wake up the drive, but smartctl gives
>>> up waiting and returns an error message before the drive is ready. Once
>>> the drive is up to speed, the test runs fine.
>>
>>> Has anyone come across this before? Is it just that my disks (Samsung
>>> 1TB drives) are particularly slow to start up? Or is there a way to
>>> make smartctl wait a little longer before giving up?
>>
>> Can you post the error message from the syslog? It is possible
>> that not smartctl but the kernel gives up.
>>

> I'll have a look when I next have the machine switched on - I didn't
> think about checking the syslog. Anything else that I've tried that
> needs the disk (such as "ls" if the relevant data is not in the cache)
> simply blocks until the disk is up to speed, so I've assumed the issue
> is specific to smartctl.

It may be specific to smartctl or to sending disk commands.
If there is nothing in the syslog, then it is smartctl, otherwise
not necessarily. Because I did not find a commandline setting
for a timeout in smartctl, I think it may well be the kernel.
If it is indeed a timeout in smartctl, that parameter would be
something to propose to the smartctl maintainer.

>> In any case, a simple solution could be to wrap smartctl into
>> something like
>>
>> head -c 512 device > /dev/null; smartctl command device
>>

> Yes, that's my thought. It might involve slightly more work if I make
> use of smartd, such as replacing the original smartctl binary with a
> script doing something like your suggestion.

>> If the wakeup by reading the first 512 bytes fails, you
>> could also add a "sleep 60" or the like.
>>
>> For something more sophisticated, you can check
>> the power mode with "hdparm -C <device>".
>>

> I know about "hdparm -C" - but in this case, I don't want to check if
> the disk is awake, I want to awaken it!

First know - then act! ;-)

This would allow you to skip a wakeup-step and waiting if the HDD
is already up.

Arno
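
The "first know, then act" approach might look like the sketch below. It assumes that "hdparm -C" prints a "drive state is:" line as in typical hdparm output; the helper names parse_drive_state and disk_is_spun_down are hypothetical.

```shell
# Extract the state word from `hdparm -C` output read on stdin.
# Assumes a line of the form "drive state is:  <state>".
parse_drive_state() {
    sed -n 's/^[[:space:]]*drive state is:[[:space:]]*//p'
}

# Succeeds (exit status 0) if the drive reports standby or sleeping.
disk_is_spun_down() {
    state=$(hdparm -C "$1" 2>/dev/null | parse_drive_state)
    case "$state" in
        standby|sleeping) return 0 ;;
        *) return 1 ;;
    esac
}
```

A caller could then skip the wake-up read and any waiting when the drive is already up, e.g. "disk_is_spun_down /dev/sda && head -c 512 /dev/sda > /dev/null".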

From: David Brown on 26 Sep 2009 15:04

Arno wrote:
> David Brown <david(a)westcontrol.removethisbit.com> wrote:
>> Arno wrote:
>>> David Brown <david(a)westcontrol.removethisbit.com> wrote:
>>>> I've been playing around a little with standby for disks on a Linux
>>>> system - mostly it works well, with disks waking up automatically as needed.
>>>> If I try a smartctl test ("smartctl -t short /dev/sda") on a standby
>>>> drive, however, it fails - it will wake up the drive, but smartctl gives
>>>> up waiting and returns an error message before the drive is ready. Once
>>>> the drive is up to speed, the test runs fine.
>>>> Has anyone come across this before? Is it just that my disks (Samsung
>>>> 1TB drives) are particularly slow to start up? Or is there a way to
>>>> make smartctl wait a little longer before giving up?
>>> Can you post the error message from the syslog? It is possible
>>> that not smartctl but the kernel gives up.
>>>
>
>> I'll have a look when I next have the machine switched on - I didn't
>> think about checking the syslog. Anything else that I've tried that
>> needs the disk (such as "ls" if the relevant data is not in the cache)
>> simply blocks until the disk is up to speed, so I've assumed the issue
>> is specific to smartctl.
>
> It may be specific to smartctl or to sending disk commands.
> If there is nothing in the syslog, then it is smartctl, otherwise
> not necessarily. Because I did not find a commandline setting
> for a timeout in smartctl, I think it may well be the kernel.
> If it is indeed a timeout in smartctl, that parameter would be
> something to propose to the smartctl maintainer.
>

Here's a transcript:

host:~# hdparm -y /dev/sda

/dev/sda:
issuing standby command
host:~# hdparm -C /dev/sda

/dev/sda:
drive state is: standby

host:~# smartctl -t short /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Command "Execute SMART Short self-test routine immediately in off-line mode" failed

host:~# smartctl -c /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 41) The self-test routine was interrupted by the host with a hard or soft reset.

syslog gives the following:

Sep 26 20:50:23 host kernel: [ 9865.466082] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 26 20:50:23 host kernel: [ 9865.466110] ata1.00: cmd b0/d4:00:01:4f:c2/00:00:00:00:00/00 tag 0
Sep 26 20:50:23 host kernel: [ 9865.466111] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 26 20:50:23 host kernel: [ 9865.466154] ata1.00: status: { DRDY }
Sep 26 20:50:25 host kernel: [ 9868.758069] ata1: soft resetting link
Sep 26 20:50:26 host kernel: [ 9869.402326] ata1.00: configured for UDMA/133
Sep 26 20:50:26 host kernel: [ 9869.410326] ata1.01: configured for UDMA/133
Sep 26 20:50:26 host kernel: [ 9869.410349] ata1: EH complete
Sep 26 20:50:26 host kernel: [ 9869.410531] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
Sep 26 20:50:26 host kernel: [ 9869.410531] sd 0:0:0:0: [sda] Write Protect is off
Sep 26 20:50:26 host kernel: [ 9869.410531] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 26 20:50:26 host kernel: [ 9869.410531] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 26 20:50:26 host kernel: [ 9869.418556] sd 0:0:1:0: [sdb] 1953525168 512-byte hardware sectors (10
Sep 26 20:50:26 offlinebackup kernel: [ 9869.418692] sd 0:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
Sep 26 20:50:26 host kernel: [ 9869.418726] sd 0:0:0:0: [sda] Write Protect is off
Sep 26 20:50:26 host kernel: [ 9869.418744] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Sep 26 20:50:26 host kernel: [ 9869.418759] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 26 20:50:26 host kernel: [ 9869.418803] sd 0:0:1:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
Sep 26 20:50:26 host kernel: [ 9869.418837] sd 0:0:1:0: [sdb] Write Protect is off
Sep 26 20:50:26 host kernel: [ 9869.418854] sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Sep 26 20:50:26 host kernel: [ 9869.418869] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

It seems that both disks (sda and sdb) have been woken up and reset.
Note that I don't get any syslog messages for a normal wakeup.

Excerpt from "smartctl -a /dev/sda" after this failed test:

Self-test execution status: ( 41) The self-test routine was interrupted by the host with a hard or soft reset.
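
The value 41 in that status line can be unpacked by hand. As a sketch, assuming the usual ATA layout in which the upper four bits of the status byte hold the result code and the lower four bits hold the remaining test time in tens of percent (the function name decode_selftest_status is hypothetical):

```shell
# Split the decimal self-test execution status byte into its two nibbles.
# Assumes: high nibble = result code, low nibble = remaining time in units of 10%.
decode_selftest_status() {
    value="$1"
    code=$(( value >> 4 ))
    remaining=$(( (value & 15) * 10 ))
    echo "code=$code remaining=${remaining}%"
}
```

Under that assumption, "decode_selftest_status 41" gives code 2 with 90% remaining. That is consistent with the "interrupted by the host with a hard or soft reset" message, and suggests the test was aborted almost immediately after it started.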

>>> In any case, a simple solution could be to wrap smartctl into
>>> something like
>>>
>>> head -c 512 device > /dev/null; smartctl command device
>>>
>

Of course, the "head" command will only work if the first block is not
already in the cache - otherwise it will not wake up the disk.

I haven't yet found any equivalent to "hdparm -y" to force a wakeup -
that would be useful.
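
One possible way around the cache problem is a read that bypasses the page cache. The sketch below assumes GNU dd, whose iflag=direct option opens the input with O_DIRECT; the helper name wake_disk is hypothetical, and it has not been checked against the setup described in this thread.

```shell
# Read one block from the device with O_DIRECT so the page cache cannot
# satisfy the request. bs=4096 is a multiple of both 512-byte and 4096-byte
# sector sizes, which keeps the direct read aligned.
# If the direct read is not supported, fall back to a plain read
# (which may be served from cache and then will not wake the disk).
wake_disk() {
    dd if="$1" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null ||
        dd if="$1" of=/dev/null bs=4096 count=1 2>/dev/null
}
```

Something like "wake_disk /dev/sda && smartctl -t short /dev/sda" would then block on the read until the drive has spun up before the self-test command is sent.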

>> Yes, that's my thought. It might involve slightly more work if I make
>> use of smartd, such as replacing the original smartctl binary with a
>> script doing something like your suggestion.
>
>>> If the wakeup by reading the first 512 bytes fails, you
>>> could also add a "sleep 60" or the like.
>>>
>>> For something more sophisticated, you can check
>>> the power mode with "hdparm -C <device>".
>>>
>
>> I know about "hdparm -C" - but in this case, I don't want to check if
>> the disk is awake, I want to awaken it!
>
> First know - then act! ;-)
>
> This would allow you to skip a wakeup-step and waiting if the HDD
> is already up.
>

A command that forces a wakeup will do no harm, and take little time,
if the disk is already awake, so there is no need to check the state first.

> Arno
>
