How do I monitor my Hard Disks for errors?
From CLUG Wiki
This is a new page, and might contain technically incorrect information. Please use at your own risk. If you are able to correct any errors or expand this document, please do so.
Warning: This is a Debian-centric page
This page is written by a Debian user with Debian in mind. Thus, he liberally uses apt-get, and assumes that
everything will work exactly the same for you.
If you don't run Debian (or something based on it like Ubuntu), it won't, so please find the differences and add them to this page.
SMART
SMART (Self-Monitoring, Analysis and Reporting Technology) is built into all modern hard drives (IDE, SCSI, and SATA). It helps you tell when a drive is going to fail by reporting the number of reallocated sectors, error rates, temperature, age, and logged errors.
smartmontools
smartmontools (the successor to smartsuite) is the package you have to install on Linux to get this functionality.
apt-get install smartmontools
You will have to edit /etc/default/smartmontools to enable it to monitor your disks, and you may want to
edit /etc/smartd.conf to customize the monitoring.
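As a rough sketch of the first edit: on Debian releases from around the time this page was written, /etc/default/smartmontools is a small shell fragment whose settings ship commented out. The variable names below are recalled from that file, not taken from this page, so treat them as assumptions and check them against the comments in your own copy.

```
# /etc/default/smartmontools
# Start the smartd daemon at boot (shipped commented out).
start_smartd=yes
# Optional extra arguments for smartd. The interval (in seconds) is only an example value.
smartd_opts="--interval=1800"
```

After saving the file, restart the daemon (on Debian this is normally /etc/init.d/smartmontools restart) so the change takes effect.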
By default smartd will output warnings to the system logs, so if you don't logcheck your system,
you might want to change this to email you. Also, you probably don't want to know when the temperature changes 1 degree,
so you must tell it to ignore Attribute 194.
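A minimal sketch of those two changes in /etc/smartd.conf is shown below. The address root is an assumption, so substitute whoever should receive the mail. Note that -m only sends mail for problems detected by directives such as -H or -l, so at least one of them has to be present.

```
# /etc/smartd.conf
# Mail warnings to root (assumed address), track attribute changes,
# but do not report changes in attribute 194 (temperature).
DEVICESCAN -H -m root -t -I 194
```

Adding -M test after -m root makes smartd send a test message when it starts, which is a cheap way to confirm that mail delivery works.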
I often use this line in /etc/smartd.conf:
DEVICESCAN -H -l error -l selftest -t -I 1 -I 194 -I 195 -s (S/../.././02|L/../../6/03)
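Monitor:
  -H            overall health status
  -l error      new entries in the SMART error log
  -l selftest   new failures in the self-test log
  -t            track changes in attribute values (this does not run self-tests; -s does)
  -I            ignore changes in these attributes:
                  1    Raw Read Error Rate
                  194  Temperature
                  195  Hardware ECC Recovered
  -s            schedule self-tests:
                  (S/../.././02)  short self-test every morning at 2am
                  (L/../../6/03)  long self-test every Saturday morning at 3am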
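To see how the -s argument in the line above picks its times: smartd compares the regular expression against a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour). The sketch below only imitates that comparison with grep -E. matches_schedule is a hypothetical helper, not part of smartmontools, and the sample dates are made up.

```shell
#!/bin/sh
# The schedule regex from the smartd.conf line above.
SCHEDULE='(S/../.././02|L/../../6/03)'

# Hypothetical helper: succeeds if a T/MM/DD/d/HH string matches the schedule.
matches_schedule() {
    printf '%s\n' "$1" | grep -Eq "^${SCHEDULE}\$"
}

# Try a few made-up times: type/month/day/weekday/hour.
for when in S/06/01/3/02 L/06/04/6/03 L/06/05/7/03 S/06/01/3/14; do
    if matches_schedule "$when"; then
        echo "$when: test would start"
    else
        echo "$when: no test"
    fi
done
```

The first two strings match (a short test at 02:00 on any day, a long test at 03:00 on a Saturday). The last two do not, because one falls on a Sunday and the other is at 14:00.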
SMART and SATA
If you are using the libata driver (i.e. your SATA disks show up as SCSI disks), you need to patch libata so that
it can pass SMART commands through to the drive. Find the patch for your kernel at the site listed below.
Then, whenever you use smartctl, append -d ata to its command line. smartd.conf will also
require updating.
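For example, a smartd.conf entry for a libata disk might look like the sketch below. The device name /dev/sda is an assumption. The directives are the ones from the DEVICESCAN line earlier on this page, with -d ata added. The matching smartctl call would be along the lines of smartctl -a -d ata /dev/sda.

```
# /etc/smartd.conf
# /dev/sda is an assumed device name; list each libata disk explicitly.
/dev/sda -d ata -H -l error -l selftest -t -I 1 -I 194 -I 195 -s (S/../.././02|L/../../6/03)
```

As far as I know, smartd ignores everything after a DEVICESCAN line, so explicit device entries like this one need to come before it or replace it.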
Signs of impending doom
Well, unless you get smartd to email you, you will need to run logcheck
(How do I keep an eye on my computer's logs without reading them?) if you want to find out when anything bad happens...
Dying but still mostly OK
Jun 1 22:12:52 imago smartd[8521]: Device: /dev/hdc, 5 Currently unreadable (pending) sectors
Pending sectors are sectors that have gone bad but, because nothing has tried to write over them, haven't been reallocated yet. When you start getting bad sectors, it is a sign that the drive is on its way out, but it's still usable.
If it is still under warranty, you might be able to get an exchange. Run your vendor's (probably DOS-based) tool on it and see what it says.
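If you want a quick look at pending-sector warnings without setting up logcheck, a small filter over the system log is enough. The sketch below assumes smartd messages have exactly the shape of the log line shown above. pending_sectors is a hypothetical helper name, and the log path mentioned in the comment is the usual Debian one, which may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print "device count"
# for every smartd "Currently unreadable (pending) sectors" message.
pending_sectors() {
    sed -n 's/.*smartd\[[0-9]*]: Device: \([^,]*\), \([0-9][0-9]*\) Currently unreadable (pending) sectors.*/\1 \2/p'
}

# Example using the log line from this page. On a real system you would
# feed it the log instead, e.g.  pending_sectors < /var/log/syslog
echo 'Jun 1 22:12:52 imago smartd[8521]: Device: /dev/hdc, 5 Currently unreadable (pending) sectors' | pending_sectors
# prints: /dev/hdc 5
```

A count that keeps growing from one day to the next is a stronger sign of trouble than a single stable number.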
