**Configuring smartmontools on Debian (sarge) [#eea98060]
#contents

**Configuring smartmontools [#eea98060]

 First, install smartmontools.
 # apt-get update
 # apt-get install smartmontools
#br
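 The package tries to start smartd right after installation, but the default configuration leaves it disabled, so nothing is started yet.~
 To enable it, uncomment (remove the leading #) the following lines in /etc/default/smartmontools.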
 start_smartd=yes
 smartd_opts="--interval=1800"
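The uncommenting step above could be scripted roughly as follows. This is a minimal sketch that assumes the two settings are present in the file as lines starting with `#`; the function name `uncomment_smartd_defaults` is made up for illustration. It only prints the result, so the output can be reviewed before anything under /etc is replaced.

```shell
# Hypothetical helper: print the given defaults file with the
# start_smartd and smartd_opts lines uncommented.
# The file itself is not modified.
uncomment_smartd_defaults() {
    sed -e 's/^#[[:space:]]*\(start_smartd=\)/\1/' \
        -e 's/^#[[:space:]]*\(smartd_opts=\)/\1/' "$1"
}

# Example use (review the output before copying it into place):
#   uncomment_smartd_defaults /etc/default/smartmontools > /tmp/smartmontools.new
```

Ordinary comment lines are left untouched, because the patterns only match a `#` that is directly followed by one of the two setting names.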
#br
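 Also edit /etc/smartd.conf so that mail is sent when a problem is detected. To monitor several drives, each one apparently has to be listed on its own line. For SATA drives, adding "-d ata" made the check work.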
 /dev/hda -H -m root
 /dev/sda -d ata -H -m root
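 Alternatively, when every drive uses the same settings, /etc/smartd.conf can use DEVICESCAN instead of listing each drive. The -s option also schedules periodic self-tests: a short test every day at 03:00 and a long test every Thursday at 04:00.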
 DEVICESCAN -s (S/../.././03|L/../../4/04) -a -m root -d sat
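To see what the `-s` expression selects, it helps to remember that smartd matches it against a string of the form `T/MM/DD/d/HH` (test type, month, day of month, day of week with 1=Monday through 7=Sunday, hour). The sketch below imitates that check with `grep -E`; the function name `tests_due` is made up for illustration, and matching the whole string with `-x` is an assumption about how smartd applies the expression.

```shell
# Regular expression copied from the DEVICESCAN line above.
SCHEDULE='(S/../.././03|L/../../4/04)'

# Hypothetical helper: print the test types (S=short, L=long) that the
# schedule selects for a given month, day of month, day of week
# (1=Mon..7=Sun) and hour.
tests_due() {
    for t in S L; do
        stamp="$t/$1/$2/$3/$4"
        # Assumption: the expression has to match the whole string (-x).
        if printf '%s\n' "$stamp" | grep -Eqx "$SCHEDULE"; then
            printf '%s\n' "$t"
        fi
    done
}

tests_due 11 08 1 03   # Monday 8 Nov, 03:00   -> S
tests_due 11 11 4 04   # Thursday 11 Nov, 04:00 -> L
```

Any hour other than 03 (or 04 on a Thursday) produces no output, which means no self-test is started at that time.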
#br
 With the configuration done, start smartmontools.
 # /etc/init.d/smartmontools start

**Failure status [#e4ae9982]
***HDD that failed badly enough to be dropped from the MD array [#y188d88c]
 # smartctl -a /dev/sdb
 smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
 Home page is http://smartmontools.sourceforge.net/
 
 === START OF INFORMATION SECTION ===
 Device Model:     ST3500320NS
 Serial Number:    9QMCGMCY
 Firmware Version: SN06
 User Capacity:    500,107,862,016 bytes
 Device is:        Not in smartctl database [for details use: -P showall]
 ATA Version is:   8
 ATA Standard is:  ATA-8-ACS revision 4
 Local Time is:    Mon Nov  8 09:18:00 2010 JST
 SMART support is: Available - device has SMART capability.
 SMART support is: Enabled
 
 === START OF READ SMART DATA SECTION ===
 SMART overall-health self-assessment test result: FAILED!
 Drive failure expected in less than 24 hours. SAVE ALL DATA.
 See vendor-specific Attribute list for failed Attributes.
 
 General SMART Values:
 Offline data collection status:  (0x82) Offline data collection activity
                                         was completed without error.
                                         Auto Offline Data Collection: Enabled.
 Self-test execution status:      (   0) The previous self-test routine completed
                                         without error or no self-test has ever
                                         been run.
 Total time to complete Offline
 data collection:                 ( 634) seconds.
 Offline data collection
 capabilities:                    (0x7b) SMART execute Offline immediate.
                                         Auto Offline data collection on/off support.
                                         Suspend Offline collection upon new
                                         command.
                                         Offline surface scan supported.
                                         Self-test supported.
                                         Conveyance Self-test supported.
                                         Selective Self-test supported.
 SMART capabilities:            (0x0003) Saves SMART data before entering
                                         power-saving mode.
                                         Supports SMART auto save timer.
 Error logging capability:        (0x01) Error logging supported.
                                         General Purpose Logging supported.
 Short self-test routine
 recommended polling time:        (   1) minutes.
 Extended self-test routine
 recommended polling time:        ( 113) minutes.
 Conveyance self-test routine
 recommended polling time:        (   2) minutes.
 SCT capabilities:              (0x103d) SCT Status supported.
                                         SCT Feature Control supported.
                                         SCT Data Table supported.
 
 SMART Attributes Data Structure revision number: 10
 Vendor Specific SMART Attributes with Thresholds:
 ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
   1 Raw_Read_Error_Rate     0x000f   042   042   044    Pre-fail  Always   FAILING_NOW 137492794
   3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       6
   5 Reallocated_Sector_Ct   0x0033   001   001   036    Pre-fail  Always   FAILING_NOW 2046
   7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       45791826
   9 Power_On_Hours          0x0032   093   093   000    Old_age   Always       -       6570
  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
  12 Power_Cycle_Count       0x0032   100   037   020    Old_age   Always       -       6
 184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always       -       0
 187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       752
 188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
 190 Airflow_Temperature_Cel 0x0022   074   071   045    Old_age   Always       -       26 (Lifetime Min/Max 26/28)
 194 Temperature_Celsius     0x0022   026   040   000    Old_age   Always       -       26 (0 20 0 0)
 195 Hardware_ECC_Recovered  0x001a   050   050   000    Old_age   Always       -       137492794
 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
 
 SMART Error Log Version: 1
 ATA Error Count: 547 (device log contains only the most recent five errors)
         CR = Command Register [HEX]
         FR = Features Register [HEX]
         SC = Sector Count Register [HEX]
         SN = Sector Number Register [HEX]
         CL = Cylinder Low Register [HEX]
         CH = Cylinder High Register [HEX]
         DH = Device/Head Register [HEX]
         DC = Device Command Register [HEX]
         ER = Error register [HEX]
         ST = Status register [HEX]
 Powered_Up_Time is measured from power on, and printed as
 DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
 SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
 Error 547 occurred at disk power-on lifetime: 6561 hours (273 days + 9 hours)
   When the command that caused the error occurred, the device was active or idle.
 
   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   04 71 04 9d 00 32 e0  Device Fault; Error: ABRT
 
   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   a1 00 00 00 00 00 a0 02  23d+00:58:38.089  IDENTIFY PACKET DEVICE
   ec 00 00 00 00 00 a0 02  23d+00:58:38.081  IDENTIFY DEVICE
   ff 00 00 00 00 00 00 02  23d+00:58:37.925  [VENDOR SPECIFIC]
   00 00 00 00 00 00 00 06  23d+00:58:37.925  NOP [Abort queued commands]
   a1 00 00 00 00 00 a0 02  23d+00:58:32.897  IDENTIFY PACKET DEVICE
 
 Error 546 occurred at disk power-on lifetime: 6561 hours (273 days + 9 hours)
   When the command that caused the error occurred, the device was active or idle.
 
   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   04 71 04 9d 00 32 e0
 
   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   ec 00 00 00 00 00 a0 02  23d+00:58:38.081  IDENTIFY DEVICE
   ff 00 00 00 00 00 00 02  23d+00:58:37.925  [VENDOR SPECIFIC]
   00 00 00 00 00 00 00 06  23d+00:58:37.925  NOP [Abort queued commands]
   a1 00 00 00 00 00 a0 02  23d+00:58:32.897  IDENTIFY PACKET DEVICE
   ec 00 00 00 00 00 a0 02  23d+00:58:32.889  IDENTIFY DEVICE
 
 Error 545 occurred at disk power-on lifetime: 6561 hours (273 days + 9 hours)
   When the command that caused the error occurred, the device was active or idle.
 
   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   04 71 04 9d 00 32 e0  Device Fault; Error: ABRT
 
   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   a1 00 00 00 00 00 a0 02  23d+00:58:32.897  IDENTIFY PACKET DEVICE
   ec 00 00 00 00 00 a0 02  23d+00:58:32.889  IDENTIFY DEVICE
   ff 00 00 00 00 00 00 02  23d+00:58:32.733  [VENDOR SPECIFIC]
   00 00 00 00 00 00 00 06  23d+00:58:32.733  NOP [Abort queued commands]
   a1 00 00 00 00 00 a0 02  23d+00:58:27.705  IDENTIFY PACKET DEVICE
 
 Error 544 occurred at disk power-on lifetime: 6561 hours (273 days + 9 hours)
   When the command that caused the error occurred, the device was active or idle.
 
   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   04 71 04 9d 00 32 e0
 
   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   ec 00 00 00 00 00 a0 02  23d+00:58:32.889  IDENTIFY DEVICE
   ff 00 00 00 00 00 00 02  23d+00:58:32.733  [VENDOR SPECIFIC]
   00 00 00 00 00 00 00 06  23d+00:58:32.733  NOP [Abort queued commands]
   a1 00 00 00 00 00 a0 02  23d+00:58:27.705  IDENTIFY PACKET DEVICE
   ec 00 00 00 00 00 a0 02  23d+00:58:27.681  IDENTIFY DEVICE
 
 Error 543 occurred at disk power-on lifetime: 6561 hours (273 days + 9 hours)
   When the command that caused the error occurred, the device was active or idle.
 
   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   04 71 04 9d 00 32 e0  Device Fault; Error: ABRT
 
   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   a1 00 00 00 00 00 a0 02  23d+00:58:27.705  IDENTIFY PACKET DEVICE
   ec 00 00 00 00 00 a0 02  23d+00:58:27.681  IDENTIFY DEVICE
   25 00 00 ff ff ff ef 00  23d+00:58:24.309  READ DMA EXT
   25 00 00 ff ff ff ef 00  23d+00:58:23.426  READ DMA EXT
   25 00 80 ff ff ff ef 00  23d+00:58:23.371  READ DMA EXT
 
 SMART Self-test log structure revision number 1
 Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
 # 1  Extended offline    Completed: unknown failure    90%      6492         -
 # 2  Extended offline    Completed: unknown failure    90%      6324         -
 # 3  Extended offline    Completed: unknown failure    90%      6156         -
 # 4  Extended offline    Completed: unknown failure    90%      5988         -
 # 5  Extended offline    Completed: unknown failure    90%      5820         -
 # 6  Extended offline    Completed without error       00%      5654         -
 # 7  Extended offline    Completed without error       00%      5486         -
 # 8  Extended offline    Completed without error       00%      5318         -
 # 9  Extended offline    Completed without error       00%      5150         -
 #10  Extended offline    Completed without error       00%      4982         -
 #11  Extended offline    Completed without error       00%      4814         -
 #12  Extended offline    Completed without error       00%      4646         -
 #13  Extended offline    Completed without error       00%      4478         -
 #14  Extended offline    Completed without error       00%      4310         -
 #15  Extended offline    Completed without error       00%      4142         -
 #16  Extended offline    Completed without error       00%      3974         -
 #17  Extended offline    Completed without error       00%      3806         -
 #18  Extended offline    Completed without error       00%      3638         -
 #19  Extended offline    Completed without error       00%      3470         -
 #20  Extended offline    Completed without error       00%      3302         -
 #21  Extended offline    Completed without error       00%      3134         -
 
 SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
 Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
 If Selective self-test is pending on power-up, resume after 0 minute delay.
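In a report as long as the one above, the failing attributes are easy to miss. The following sketch filters the attribute table for rows whose normalized VALUE is at or below THRESH, which is the condition smartctl marks as FAILING_NOW. The function name `failing_attributes` is made up for illustration, and the column positions assume the table layout shown above.

```shell
# Hypothetical helper: read smartctl -A / -a output on stdin and print
# the attributes whose VALUE (column 4) is at or below THRESH (column 6).
# Requiring a 0x... flag in column 3 keeps error-log rows out.
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x[0-9a-fA-F]+$/ &&
         $4 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ && $4 + 0 <= $6 + 0 {
        printf "%s %s value=%d thresh=%d raw=%s\n", $1, $2, $4, $6, $10
    }'
}

# Example use:
#   smartctl -A /dev/sdb | failing_attributes
```

Applied to the output above, this would report attribute 1 (Raw_Read_Error_Rate, 42 against a threshold of 44) and attribute 5 (Reallocated_Sector_Ct, 1 against a threshold of 36).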

