Michael Myers wrote:
> http://groups.google.co.uk/group/com...owse_frm/threa
> d/5800e131547381a/eef20cf9806fcb25?rnum=21&hl=en&q=ultra+scsi+temperature+di
> sk+solaris+work&_done=%2Fgroup%2Fcomp.sys.sun.hardware%2Fbrowse_frm%2Fthread
> %2F5800e131547381a%2Fcda131318d7ff786%3Flnk%3Dst%26q%3Dultra+scsi+temperatur
> e+disk+solaris+work%26rnum%3D1%26hl%3Den%26#doc_703f8f97188185f9
>
>
> wow Dave, it's very very interesting !
> Thanks ;-)
> Mike
>
>
If you find
http://groups.google.co.uk/group/com...3f8f97188185f9
interesting, you might like to take a look at this, which is a shell
script I wrote to check disk temperature. I call it from cron every 15
minutes. If a disk is within 4 degrees C of its maximum temperature, it logs a
warning to syslog. If within 2 degrees C, it shuts the machine down. It
finds all the disks, as given in /dev/rdsk. You can exclude any, such as
a CD-ROM drive.
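
The threshold logic described above can be sketched in isolation like this. It is only an illustration, not the script itself: the helper name classify is made up, it uses expr where the script uses bc, and it works on a line of hdtemp output passed in as text rather than reading a real disk.

```shell
# Sketch of the threshold decision. 'classify' is a hypothetical helper
# name; it does not appear in the real script.
warningsafetymargin=4
criticalsafetymargin=2

# classify "<one line of hdtemp output>" prints ok, warning or critical.
classify() {
    # Field 4 is the current temperature, field 9 the trip temperature.
    temp=`echo "$1" | awk '{print $4}'`
    maxtemp=`echo "$1" | awk '{print $9}'`
    margin=`expr $maxtemp - $temp`
    if [ $margin -lt $criticalsafetymargin ] ; then
        echo critical
    elif [ $margin -lt $warningsafetymargin ] ; then
        echo warning
    else
        echo ok
    fi
}

classify "current temp = 38 C, trip temp = 65 C"
classify "current temp = 62 C, trip temp = 65 C"
classify "current temp = 64 C, trip temp = 65 C"
```

To run the real script every 15 minutes, a crontab entry along the lines of
0,15,30,45 * * * * /usr/local/bin/checkdisktemp
should do, where the script name and location are only an assumption.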
It calls a binary executable, the C source of which I have put in the
shell script as a comment. Save the C to a file hdtemp.c (removing the
leading # from each line), then compile it with
gcc hdtemp.c -o hdtemp
There is a line in the script which says
"THERE IS NO NEED TO CHANGE ANYTHING BELOW THIS LINE"
but I just realised the location of the binary is hardcoded as
/usr/local/bin, so that statement is not necessarily true.
You might also want to change the syslog facility, which is currently local3.
Choose whatever you want, as long as it is a valid syslog facility
(RFC 3164).
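
One way to avoid editing several lines when changing these would be to put the binary location and the facility in variables near the top. This is a sketch only; the variable names hdtemp and facility and the helper priority are assumptions and are not in the script as posted.

```shell
# Hypothetical settings; these names are not in the original script.
hdtemp=/usr/local/bin/hdtemp     # where the compiled binary lives
facility=local3                  # syslog facility used for all messages

# Build a logger priority such as local3.warn from the facility and a level.
priority() {
    echo "$facility.$1"
}

# The script body would then use them like this:
#   temp=`$hdtemp $disk | awk '{print $4}'`
#   echo "some message" | logger -p `priority warn`
```

With that in place, moving the binary or switching to, say, local5 is a one-line change.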
As I said earlier, there are more complex utilities which are supposed
to predict when hard drives will fail, but from what I have read, these
don't actually work too well in practice, so I settled on temperature only.
On an old SS20, which tends to run very warm, it is not possible to use
the ultrascsi interface. So I used a brute-force approach, with a
bi-metallic strip.
http://www.g8wrb.org/useful-stuff/Su...20/index.shtml
Good luck,
dave k
#!/bin/sh
# This shell script calls a *C* program 'hdtemp'
# which outputs temperature given the raw disk
# Here is a typical output from 'hdtemp'
# /usr/local/bin/hdtemp /dev/rdsk/c1t1d0s2
# current temp = 38 C, trip temp = 65 C
#
PATH=/usr/bin:/usr/sbin
# This script will log to syslog disks that are too warm. If the
# disks are very warm, the system will be shut down.
# The C source code of 'hdtemp' is at the bottom of this
# script. You will need to save that in a file, after
# removing the hashes, which are used as comments.
# You *must* set any disks you want to exclude from testing. The CD-ROM
# is an obvious example of one to exclude. Also, exclude any that
# do not support the ability to measure temperature.
excludedisks="c0t6d0s2"
# Warnings are issued if there is less than
# 'warningsafetymargin' deg C between the actual temperature
# and the trip temperature.
warningsafetymargin=4
# The system is shut down if there are less than
# 'criticalsafetymargin' deg C between the actual temperature
# and the trip temperature.
criticalsafetymargin=2
#########################################################
#########################################################
## THERE IS NO NEED TO CHANGE ANYTHING BELOW THIS LINE ##
#########################################################
#########################################################
# Check all disks on the system, except any that are
# excluded.
for disk in `ls /dev/rdsk/*s2 | grep -v $excludedisks`
do
# Read the disk temperature.
temp=`/usr/local/bin/hdtemp $disk | awk '{print $4}'`
# Read the maximum permitted temperature.
maxtemp=`/usr/local/bin/hdtemp $disk | awk '{print $9}'`
# Compute the difference between the actual temperature and the
# maximum temperature.
margin=`echo $maxtemp - $temp | bc`
# If the disk is really warm, then log it as critical and shut the
# system down.
if [ $margin -lt $criticalsafetymargin ] ; then
echo "Shutting down due to temperature problem on disk $disk." \
"Temperature is $temp deg C. Max permissible temperature is $maxtemp deg C." \
"System shut down since the safety margin is less than" \
"$criticalsafetymargin deg C." | logger -p local3.crit
shutdown -y -g 120 -i 5
exit 1
fi
# Use syslog to log a warning if the disk is too warm.
if [ $margin -lt $warningsafetymargin ] ; then
echo "Temperature problem on disk $disk. Temperature is $temp deg C." \
"Max permissible temperature is $maxtemp deg C. Warnings are issued when" \
"the safety margin is less than $warningsafetymargin deg C." | logger -p local3.warn
fi
done
exit 0
# To put in one place, here is the C source of 'hdtemp'
# Compile with gcc hdtemp.c -o hdtemp
##include <stdlib.h>
##include <stdio.h>
##include <sys/types.h>
##include <fcntl.h>
##include <unistd.h>
##include <errno.h>
##include <string.h>
##include <sys/scsi/scsi.h>
#
##define LOG_SENSE 0x4d
##define TEMPERATURE_PAGE 0x0d
#
#int
#scsi_log_sense(int fd, int pagenum, uint8_t *pbuf, size_t buflen,
# size_t known_resp_len)
#{
# struct uscsi_cmd ucmd;
# struct scsi_extended_sense sense;
# uint8_t cdb[10];
# int status;
#
# memset(&ucmd, 0, sizeof (ucmd));
# memset(cdb, 0, sizeof (cdb));
#
# cdb[0] = LOG_SENSE;
# cdb[2] = 0x40 | (pagenum & 0x3f);
# cdb[7] = known_resp_len >> 8;
# cdb[8] = known_resp_len;
#
# ucmd.uscsi_cdb = (caddr_t)cdb;
# ucmd.uscsi_cdblen = sizeof (cdb);
# ucmd.uscsi_bufaddr = (caddr_t)pbuf;
# ucmd.uscsi_buflen = known_resp_len;
# ucmd.uscsi_rqbuf = (caddr_t)&sense;
# ucmd.uscsi_rqlen = sizeof (sense);
# ucmd.uscsi_timeout = 15;
# ucmd.uscsi_flags = USCSI_READ;
#
# status = ioctl(fd, USCSICMD, &ucmd);
#
# return (status);
#}
#int main(int argc, char *argv[])
#{
# char *device;
# int fd;
# int err;
# uint8_t tbuf[16];
#
# if (argc != 2) {
# fprintf(stderr, "usage: %s <device>\n", argv[0]);
# exit (1);
# }
# device = argv[1];
#
# fd = open(device, O_RDONLY | O_NONBLOCK);
# if (fd < 0) {
# perror(device);
# exit(1);
# }
#
# memset(tbuf, 0, sizeof (tbuf));
# err = scsi_log_sense(fd, TEMPERATURE_PAGE, tbuf, sizeof (tbuf),
# sizeof (tbuf));
# if (err != 0) {
# perror("scsi_log_sense failed - disk might not exist, or be powered off");
# tbuf[9]=1;
# }
# if (tbuf[9] == 255 )
# tbuf[9]=0;
# printf("current temp = %d C, trip temp = %d C\n", tbuf[9], tbuf[15]);
#
# return (0);
#}
--
Dave K
http://www.southminster-branch-line.org.uk/
Please note my email address changes periodically to avoid spam.
It is always of the form: month-year@domain. Hitting reply will work
for a couple of months only. Later set it manually. The month is
always written in 3 letters (e.g. Jan, not January etc)