From: Roy Badami
Date: Fri, 13 Feb 2009 15:24:28 +0000
To: freebsd-hardware@freebsd.org
Subject: SIIG Cyber Serial 4S and system hang
Message-ID: <18837.37036.643000.795911@erbium.cam.harlequin.co.uk>
List-Id: General discussion of FreeBSD hardware

Can't find anyone else having the same problem as me, so I'm hoping
this is the right place to post...

The short version: I have a FreeBSD 7.1-RELEASE-p2 machine with a SIIG
CyberSerial 4S card (one of the ones with the 10x clock) and I can
fairly consistently make the machine hang by accessing the serial
port.

The long version:

The machine has an Intel D815EEA ('Easton') motherboard with an 866 MHz
Pentium III processor, and is running FreeBSD 7.1-RELEASE-p2. The
kernel configuration is identical to GENERIC except that device puc
has been uncommented.

The machine has one serial port on the back (not sure if there's a
second on the motherboard which isn't brought out) and I've installed
a SIIG CyberSerial 4S card which identifies itself as:

puc0: port 0xcf20-0xcf3f,0xcf00-0xcf1f mem 0xfc9fc000-0xfc9fcfff,0xfc9fb000-0xfc9fbfff irq 11 at device 11.0 on pci1

I believe this is one of the cards with the 10x clock, the patches for
which finally made it to prime time in 7.1 (this card didn't work at
all for me in 7.0).

Initially I had smstools-3.1.3_1 running on cuad0 (the motherboard
serial port) talking to an external GSM terminal for sending SMS
messages, and this worked fine.

I then attached a serial-attached thermometer (that we built
ourselves) to cuau0 (the first port on the CyberSerial) and started
the temperature monitoring daemon, thermd (a perl script we wrote
ourselves).

When I started the temperature monitoring daemon from the command
line, the system pretty consistently locked up immediately - both my
ssh session and the physical console stopped responding - the Caps
Lock light wouldn't even toggle.

The really odd thing, though, is that now that I have swapped the two
applications round, so that the thermometer is on cuad0 and the GSM
terminal is on cuau0, everything is working fine.

I've attached the thermd script in case the way it accesses the tty
sheds any light on what was going on, but unfortunately it's not
possible to reproduce the hang without the (custom built) thermometer
attached to the serial port.

Although things seem to be working, I'd like to get to the bottom of
this so that I can be confident that the machine won't freeze again in
the future - and I have more serial applications I need to run on this
server.

Any suggestions as to how to proceed - short of changing to a
different model of serial card?

Thanks,

-roy

--------------------
#!/usr/local/bin/perl5

$ALERT_TEMP = 26;
#$ALERT_TEMP = 32;
$ALERT_INTERVAL = 60*20;
$ALERT_THRESHOLD_FALLING = 2;
$ALERT_THRESHOLD_RISING = 2;
$ALERT = 'alert_hall';

$tempfile = '/usr/local/etc/temperature';
$pager = '/usr/local/bin/pager';

$debug = 0;

use POSIX;
use Fcntl;

die "No tty specified" unless $ARGV[0];

$tty=$ARGV[0];
$tty = "/dev/$tty" unless $tty =~ m'/';

&open_tty(\*TTY, $tty);

open TEMP, "+> $tempfile" or die "Can't open $tempfile";

&daemon unless $debug;

for (;;) {
    until (&check_thermometer(\*TTY)) {
        print STDERR "Error communicating with thermometer" unless $error;
        $error = 1;
        #No need to wait, since there's a wait in check_thermometer
    }
    print STDERR "Communication established" if $error;
    $error = 0;
    for (;;) {
        $temp = &read_thermometer(\*TTY);
        if (!defined($temp)) {
            print STDERR "Error reading thermometer";
            $error = 1;
            last;
        }
        &process_temp($temp);
        sleep 10;
    }
}

exit;

sub open_tty {
    my ($FILE, $tty) = @_;
    my ($termios,$iflag,$oflag,$cflag,$lflag);

    open $FILE, "+> $tty" or die "Can't open $tty";

    $termios = POSIX::Termios->new;
    $termios->getattr(fileno $FILE);

    #iflag
    $iflag = $termios->getiflag;
    $iflag &= ~(BRKINT|PARMRK|ISTRIP|INLCR|IGNCR|ICRNL|IXON|IXOFF|IXANY|IMAXBEL);
    $iflag |= (IGNBRK|IGNPAR|INPCK);
    $termios->setiflag($iflag);

    #oflag
    $oflag = $termios->getoflag;
    $oflag &= ~OPOST;
    $termios->setoflag($oflag);

    #cflag
    $cflag = $termios->getcflag;
    $cflag = ($cflag & ~CSIZE) | CS8;
    $cflag &= ~(CSTOPB|PARENB|HUPCL|CCTR_OFLOW|CRTS_IFLOW|MDMBUF);
    $cflag |= (CREAD|CLOCAL);
    $termios->setcflag($cflag);

    #lflag
    $lflag = $termios->getlflag;
    $lflag &= ~(ECHO|ECHOCTL|ISIG|ICANON|IEXTEN);
    $termios->setlflag($lflag);

    #reads time out after 2 seconds
    $termios->setcc(VMIN, 0);
    $termios->setcc(VTIME, 20);

    #baud rate
    $termios->setispeed(B9600);
    $termios->setospeed(B9600);

    #doit!
    $termios->setattr(fileno $FILE, TCSANOW);
}

sub check_thermometer {
    my ($TTY) = @_;
    my ($fl, $buffer, $result);

    $fl = fcntl $TTY, &F_GETFL, 0;
    $fl |= &O_NONBLOCK;
    fcntl $TTY, &F_SETFL, $fl;

    sysread $TTY, $buffer, 8192;

    $fl &= ~&O_NONBLOCK;
    fcntl $TTY, &F_SETFL, $fl;

    defined(syswrite $TTY, 'S', 1) or die "Error writing to tty";
    defined(sysread $TTY, $buffer, 1) or die "Error reading from tty";

    #Allow thermometer to recover
    sleep 1;

    $result = ($buffer eq 'S');
    if ($debug) {
        print 'check_thermometer: ', ($result?'OK':'Failed'), "\n";
    }
    $result;
}

sub read_thermometer {
    my ($TTY) = @_;
    my ($buffer, $res_string);

    defined(syswrite $TTY, 'T', 1) or die "Error writing to tty";
    defined(sysread $TTY, $buffer, 1) or die "Error reading from tty";

    sleep 1;

    $res_string = ($buffer?sprintf("%03d",ord($buffer)):'ERR');

    lseek (fileno TEMP, 0, &SEEK_SET);
    syswrite (TEMP, $res_string, 3);

    if ($debug) {
        print "read_thermometer: $res_string\n"
    }

    $buffer ? ord($buffer) : undef;
}

sub process_temp {
    my ($temp) = @_;
    my ($time);

    $time = time;

    if ($temp >= $ALERT_TEMP &&
        (!$alert ||
         ($temp >= $last_alert_temp + $ALERT_THRESHOLD_RISING &&
          $time-$last_alert_time > $ALERT_INTERVAL))) {
        if ($alert) {
            &alert("Temperature rising, now ${temp}C");
        } else {
            &alert("Over temperature alarm: ${temp}C");
        }
        $alert = 1;
        $last_alert_temp = $temp;
        $last_alert_time = $time;
    } elsif ($alert && $temp <= ($last_alert_temp - $ALERT_THRESHOLD_FALLING)) {
        if ($temp < $ALERT_TEMP) {
            &alert("Temperature now normal at ${temp}C");
            $alert = 0;
        } elsif ($time-$last_alert_time > $ALERT_INTERVAL) {
            &alert("Temperature dropping, now ${temp}C");
            $last_alert_temp = $temp;
            $last_alert_time = $time;
        }
    }
}

sub alert {
    my ($msg) = @_;

    print STDERR "thermometer: $msg\n";
    system ("$pager -p $ALERT $msg");
}

sub daemon {
    my ($pid);

    if ($daemon) { return }
    $daemon = 1;

    open NULL, "/dev/null" or die "Can't open /dev/null";
    open CONSOLE, "/dev/console" or die "Can't open /dev/console";

    $pid = fork;
    die "Can't fork" unless defined $pid;
    exit if $pid;       # Parent

    setsid();

    close STDIN;
    close STDOUT;
    close STDERR;
    open STDIN, "<&NULL";
    open STDOUT, ">&NULL";
    open STDERR, ">&CONSOLE";
    close NULL;
    close CONSOLE;
}
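
--------------------
For anyone with one of these cards but no thermometer, a cut-down probe along the following lines might help narrow down which call triggers the hang. This is only a sketch and has not been run against the failing hardware. It repeats the open, termios setup, one-byte write and one-byte read that thermd performs, and prints a marker to STDERR before each step. The sub names raw_flags and probe are made up for illustration, and the flag masks are limited to constants exported by Perl's POSIX module, so this is not an exact copy of open_tty.

```perl
#!/usr/local/bin/perl5
# Sketch only: a cut-down probe, not part of thermd.
use strict;
use warnings;
use POSIX;

# Hypothetical helper: take the four termios flag words and return them
# adjusted for raw 8-bit, no-parity, one-stop-bit I/O with the receiver
# enabled and modem control lines ignored (CLOCAL).
sub raw_flags {
    my ($iflag, $oflag, $cflag, $lflag) = @_;
    $iflag &= ~(BRKINT|PARMRK|ISTRIP|INLCR|IGNCR|ICRNL|IXON|IXOFF);
    $iflag |= (IGNBRK|IGNPAR|INPCK);
    $oflag &= ~OPOST;
    $cflag = ($cflag & ~CSIZE) | CS8;
    $cflag &= ~(CSTOPB|PARENB|HUPCL);
    $cflag |= (CREAD|CLOCAL);
    $lflag &= ~(ECHO|ISIG|ICANON|IEXTEN);
    return ($iflag, $oflag, $cflag, $lflag);
}

# Hypothetical helper: open the tty, configure it, write one byte and wait
# up to 2 seconds for one byte back, announcing each step beforehand.
sub probe {
    my ($tty, $byte) = @_;

    print STDERR "opening $tty\n";
    open(my $fh, '+>', $tty) or die "Can't open $tty: $!\n";  # same mode as thermd
    my $fd = fileno $fh;

    print STDERR "reading current settings\n";
    my $termios = POSIX::Termios->new;
    defined $termios->getattr($fd) or die "getattr failed: $!\n";

    my ($i, $o, $c, $l) = raw_flags($termios->getiflag, $termios->getoflag,
                                    $termios->getcflag, $termios->getlflag);
    $termios->setiflag($i);
    $termios->setoflag($o);
    $termios->setcflag($c);
    $termios->setlflag($l);
    $termios->setcc(VMIN, 0);
    $termios->setcc(VTIME, 20);         # reads time out after 2 seconds
    $termios->setispeed(B9600);
    $termios->setospeed(B9600);

    print STDERR "applying settings\n";
    defined $termios->setattr($fd, TCSANOW) or die "setattr failed: $!\n";

    print STDERR "writing one byte\n";
    defined(syswrite $fh, $byte, 1) or die "Error writing to tty: $!\n";

    print STDERR "reading one byte\n";
    my $n = sysread $fh, my $buf, 1;
    defined $n or die "Error reading from tty: $!\n";
    my $msg = $n ? sprintf("got 0x%02x\n", ord $buf) : "read timed out\n";
    print STDERR $msg;

    close $fh;
}

# Only touch hardware when a device path is given on the command line.
probe($ARGV[0], 'S') if @ARGV;
```

Saved under some name such as probe.pl (the name is arbitrary), it could be run as "perl probe.pl /dev/cuau0", ideally from the physical console so that the last marker printed before a lock-up stays visible. With no argument it only defines the subs and exits.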