How do I monitor my Hard Disks for errors?

From CLUG Wiki

WARNING: This is a new page.

This is a new page, and might contain technically incorrect information. Please use at your own risk. If you are able to correct any errors or expand this document, please do so.


Warning: This is a Debian-centric page

This page is written by a Debian user with Debian in mind. Thus, he liberally uses apt-get, and assumes that everything will work exactly the same for you.

If you don't run Debian (or something based on it, like Ubuntu), some steps will differ, so please find the differences and add them to this page.


SMART

SMART (Self-Monitoring, Analysis and Reporting Technology) is built into all modern hard drives (IDE, SCSI, and SATA). It helps you tell when a drive is going to fail by reporting the number of reallocated sectors, error rates, temperature, age, and logged errors.

smartmontools

smartmontools (the successor to smartsuite) is the package you have to install on Linux to get this functionality.

apt-get install smartmontools

You will have to edit /etc/default/smartmontools to enable it to monitor your disks, and you may want to edit /etc/smartd.conf to customize the monitoring.
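As a rough sketch of the first edit: on older Debian packages, /etc/default/smartmontools is a POSIX shell fragment read by the init script, and enabling the daemon means uncommenting a variable. The variable names below are the ones those packages shipped with; treat them as an assumption and check the comments in your own copy of the file, since newer releases may start smartd differently.

```shell
# /etc/default/smartmontools (a shell fragment read by the init script)
# Assumption: variable names as found in older Debian packages.

# Start smartd at boot.
start_smartd=yes

# Optional: extra options for smartd, here a 30 minute polling interval.
smartd_opts="--interval=1800"
```

After changing the file, restart the smartmontools service so that smartd picks up the new settings.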

By default smartd writes its warnings to the system logs, so if you don't run logcheck on your system, you might want to have it email you instead. Also, you probably don't want to be told every time the temperature changes by 1 degree, so you may want to tell it to ignore Attribute 194.
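A minimal sketch of both changes, drafted in a scratch file rather than in /etc/smartd.conf itself. The -m directive asks smartd to mail warnings to the given address; "root" is only an example recipient, and mail delivery assumes a working local mail command.

```shell
#!/bin/sh
# Draft a smartd.conf line in a temporary file so nothing in /etc is touched.
conf=$(mktemp)

cat > "$conf" <<'EOF'
# -m root : mail warnings to root (example address, use your own)
# -t      : track changes in all Attributes
# -I 194  : but ignore changes in Attribute 194 (temperature)
DEVICESCAN -H -l error -l selftest -t -I 194 -m root
EOF

# Show the draft; copy the line into /etc/smartd.conf once you are happy.
cat "$conf"
```

Adding -M test to the line makes smartd send a single test email when it starts, which is a quick way to confirm that mail actually arrives.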

I often use this line in /etc/smartd.conf:

DEVICESCAN -H -l error -l selftest -t -I 1 -I 194 -I 195 -s (S/../.././02|L/../../6/03)

Monitor:

  • -H Health status
  • -l error changes in the error log
  • -l selftest changes in the self-test log
  • -t track changes in all Attributes (this does not run self-tests)
  • -I ignore these Attributes when tracking changes:
    • 1 Raw Read Error Rate
    • 194 Temperature
    • 195 Hardware ECC Recovered
  • -s schedule self-tests:
    • (S/../.././02) short self-test every morning at 2am
    • (L/../../6/03) long self-test every Saturday morning at 3am.
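The -s schedule in the smartd.conf line above is a regular expression that smartd compares against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 as Monday, hour). The sketch below approximates that comparison with grep so you can try a schedule before using it; the would_run helper and the sample date strings are made up for illustration.

```shell
#!/bin/sh
# Schedule regexp copied from the smartd.conf line above.
schedule='(S/../.././02|L/../../6/03)'

# would_run T/MM/DD/d/HH
# Succeeds if the whole string matches the schedule regexp.
would_run() {
    printf '%s\n' "$1" | grep -Eq "^${schedule}\$"
}

# Try a few sample times: type/month/day/weekday/hour.
for when in S/06/01/4/02 S/06/01/4/03 L/06/03/6/03 L/06/04/7/03; do
    if would_run "$when"; then
        echo "$when: test runs"
    else
        echo "$when: no test"
    fi
done
```

The first and third samples match (a short test at 02 on any day, a long test at 03 on weekday 6), while the other two do not.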

SMART and SATA

If you are using the libata driver (i.e. your SATA disks show up as SCSI disks), older kernels need a libata patch before they can handle SMART commands. Find the patch for your kernel at the site listed below.

Then whenever you use smartctl, append -d ata to its command line. The entries in smartd.conf will need the same -d ata directive.
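To avoid forgetting the option, you could wrap smartctl in a small shell function. This is only a sketch: smart_ata is a hypothetical helper name, and /dev/sda is a placeholder for whatever device name your disk actually has.

```shell
#!/bin/sh
# smart_ata: hypothetical convenience wrapper that always passes -d ata.
smart_ata() {
    smartctl -d ata "$@"
}

# Example calls (run as root; /dev/sda is a placeholder device):
#   smart_ata -H /dev/sda    # overall health verdict
#   smart_ata -a /dev/sda    # full SMART report
```

In smartd.conf the equivalent is to put -d ata among the directives on each device line.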

Signs of impending doom

Unless you get smartd to email you, you will need to run logcheck (see "How do I keep an eye on my computer's logs without reading them?") if you want to find out when anything bad happens.

Dying but still mostly OK

Jun  1 22:12:52 imago smartd[8521]: Device: /dev/hdc, 5 Currently unreadable (pending) sectors

Pending sectors are sectors that have gone bad, but as nothing has tried to write over them yet, they haven't been reallocated. When you start getting bad sectors, it is a sign that the drive is on its way out, but it's still usable.
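If you would rather pull these warnings out of the logs yourself, something like the following can work. It is a sketch that assumes the log lines look like the sample above; pending_sectors is a made-up helper name, and the message format may differ between smartd versions.

```shell
#!/bin/sh
# pending_sectors: read syslog lines on stdin and print "device count"
# for every smartd pending-sector warning found.
pending_sectors() {
    sed -n 's/.*smartd\[[0-9]*]: Device: \([^,]*\), \([0-9][0-9]*\) Currently unreadable (pending) sectors.*/\1 \2/p'
}

# Try it on the sample line from this page.
line='Jun  1 22:12:52 imago smartd[8521]: Device: /dev/hdc, 5 Currently unreadable (pending) sectors'
printf '%s\n' "$line" | pending_sectors
```

On a real system you would feed it your log file instead, for example pending_sectors < /var/log/syslog (the path depends on your syslog setup). Running smartctl -A on the device shows the same count as Attribute 197.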

If it is still under warranty, you might be able to get an exchange. Run your vendor's (probably DOS-based) diagnostic tool on it and see what it says.

Links

smartmontools Home page

libata kernel patches

SATA on Linux