Configuring smartmontools

 First, install smartmontools.

# apt-get update
# apt-get install smartmontools
 

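 After installation the package tries to start smartd automatically, but the default configuration leaves it disabled, so it does not start at this point.
 Uncommenting the following lines in /etc/default/smartmontools allows it to start. The --interval option sets the time between disk checks in seconds (1800 seconds = 30 minutes).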

start_smartd=yes
smartd_opts="--interval=1800"
 

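 Also edit /etc/smartd.conf so that mail is sent when a problem is detected. Every drive gets the same settings, so use DEVICESCAN. The line also schedules regular self-tests: a short test (S) every day at 03:00 and an extended test (L) every Thursday at 04:00. -a enables the default set of checks, -m root sends warning mail to root, and -d sat accesses the drives through SCSI-to-ATA Translation.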

DEVICESCAN -s (S/../.././03|L/../../4/04) -a -m root -d sat
 

 With the configuration done, start smartmontools.

# /etc/init.d/smartmontools start

Failure state

An HDD that failed badly enough to be kicked out of the MD (Linux software RAID) array:

# smartctl -a /dev/sdb
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST3500320NS
Serial Number:    9QMCGMCY
Firmware Version: SN06
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Nov  8 09:18:00 2010 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 634) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 113) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103d) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   042   042   044    Pre-fail  Always   FAILING_NOW 137492794
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       6
  5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2046
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       45791826
  9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6570
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       6
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       752
188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   074   071   045    Old_age   Always       -       26 (Lifetime Min/Max 26/28)
194 Temperature_Celsius     0x0022   026   040   000    Old_age   Always       -       26 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a   050   050   000    Old_age   Always       -       137492794
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 547 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 547 occurred at disk power-on lifetime: 6561 hours (273 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 9d 00 32 e0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 00 00 00 00 a0 02  23d+00:58:38.089  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 02  23d+00:58:38.081  IDENTIFY DEVICE
  ff 00 00 00 00 00 00 02  23d+00:58:37.925  [VENDOR SPECIFIC]
  00 00 00 00 00 00 00 06  23d+00:58:37.925  NOP [Abort queued commands]
  a1 00 00 00 00 00 a0 02  23d+00:58:32.897  IDENTIFY PACKET DEVICE

Error 546 occurred at disk power-on lifetime: 6561 hours (273 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 9d 00 32 e0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 02  23d+00:58:38.081  IDENTIFY DEVICE
  ff 00 00 00 00 00 00 02  23d+00:58:37.925  [VENDOR SPECIFIC]
  00 00 00 00 00 00 00 06  23d+00:58:37.925  NOP [Abort queued commands]
  a1 00 00 00 00 00 a0 02  23d+00:58:32.897  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 02  23d+00:58:32.889  IDENTIFY DEVICE

Error 545 occurred at disk power-on lifetime: 6561 hours (273 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 9d 00 32 e0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 00 00 00 00 a0 02  23d+00:58:32.897  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 02  23d+00:58:32.889  IDENTIFY DEVICE
  ff 00 00 00 00 00 00 02  23d+00:58:32.733  [VENDOR SPECIFIC]
  00 00 00 00 00 00 00 06  23d+00:58:32.733  NOP [Abort queued commands]
  a1 00 00 00 00 00 a0 02  23d+00:58:27.705  IDENTIFY PACKET DEVICE

Error 544 occurred at disk power-on lifetime: 6561 hours (273 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 9d 00 32 e0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ec 00 00 00 00 00 a0 02  23d+00:58:32.889  IDENTIFY DEVICE
  ff 00 00 00 00 00 00 02  23d+00:58:32.733  [VENDOR SPECIFIC]
  00 00 00 00 00 00 00 06  23d+00:58:32.733  NOP [Abort queued commands]
  a1 00 00 00 00 00 a0 02  23d+00:58:27.705  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 02  23d+00:58:27.681  IDENTIFY DEVICE

Error 543 occurred at disk power-on lifetime: 6561 hours (273 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 71 04 9d 00 32 e0  Device Fault; Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  a1 00 00 00 00 00 a0 02  23d+00:58:27.705  IDENTIFY PACKET DEVICE
  ec 00 00 00 00 00 a0 02  23d+00:58:27.681  IDENTIFY DEVICE
  25 00 00 ff ff ff ef 00  23d+00:58:24.309  READ DMA EXT
  25 00 00 ff ff ff ef 00  23d+00:58:23.426  READ DMA EXT
  25 00 80 ff ff ff ef 00  23d+00:58:23.371  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: unknown failure    90%      6492         -
# 2  Extended offline    Completed: unknown failure    90%      6324         -
# 3  Extended offline    Completed: unknown failure    90%      6156         -
# 4  Extended offline    Completed: unknown failure    90%      5988         -
# 5  Extended offline    Completed: unknown failure    90%      5820         -
# 6  Extended offline    Completed without error       00%      5654         -
# 7  Extended offline    Completed without error       00%      5486         -
# 8  Extended offline    Completed without error       00%      5318         -
# 9  Extended offline    Completed without error       00%      5150         -
#10  Extended offline    Completed without error       00%      4982         -
#11  Extended offline    Completed without error       00%      4814         -
#12  Extended offline    Completed without error       00%      4646         -
#13  Extended offline    Completed without error       00%      4478         -
#14  Extended offline    Completed without error       00%      4310         -
#15  Extended offline    Completed without error       00%      4142         -
#16  Extended offline    Completed without error       00%      3974         -
#17  Extended offline    Completed without error       00%      3806         -
#18  Extended offline    Completed without error       00%      3638         -
#19  Extended offline    Completed without error       00%      3470         -
#20  Extended offline    Completed without error       00%      3302         -
#21  Extended offline    Completed without error       00%      3134         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Last-modified: 2010-11-08 (Mon) 09:22:25