Date:      Tue, 28 Feb 2006 16:38:03 -0600
From:      Mike Holloway <mikhollo@cisco.com>
To:        freebsd-proliant@freebsd.org
Subject:   hpasmcli locks up a DL380G3
Message-ID:  <A26FFDF4-5C23-4C09-8934-73914CD0F331@cisco.com>

 >> Hi!
 >>
 >> Sorry for being late on this one, found this browsing around.
 >>
 >> Yes, I have had ONE machine lock up on me once.
 >> An older HP Proliant DL380G1 UP. Just as you describe, it had been
 >> working great for
 >> a couple of weeks, then suddenly when starting hpasmcli it froze.
 >> Couldn't even ping the machine.
 >>
 >> This particular machine really is not doing anything, and I believe
 >> it is still running (I moved/changed jobs), so I could probably
 >> recreate the lockup.
 >> I still have access to this machine, so if anyone wants me to try
 >> something, I can do it.
 >> The machine is 600km away from me now, so if a lockup occurs it can
 >> take some time to get it
 >> power-cycled though.
 >>
 >> Oh, 5.3 or 5.4 as I recall.
 >>
 >> Have you seen any other lockups, Greg?
 >
 >I haven't tempted fate that way yet. I always restart the hpasmd
 >before using the client on a machine. This seems to avoid the problem.
 >
 >Thanks for responding to my mail, you're the third person to confirm
 >the problem, which given that it locks the machine up hard, is a very
 >serious one.
 >
 >best.
 >greg.


Besides the hpasmcli tool hanging just after the banner message, I've
also experienced reboots caused by hpasmd, and have had to remove it
completely from my test lab servers. I was able to find a scenario
that would invariably cause the servers to reboot. I had hpasmd
running on approximately 20 HP DL380 G4 servers, all running the same
customized FreeBSD 6.0 release kernel on x86 (Intel Xeon).

All machines were configured to run hpasmcli -s "show temps;" every 5
minutes from a Perl wrapper around hpasmcli (included below). The
wrapper sets a 45-second alarm and, if hpasmcli hasn't exited by then,
sends a HUP signal to its own process group (and so to hpasmcli).
Within a few hours, a few machines would show the hpasmcli tool
hanging and only displaying the banner message. Cron continued to run
and kill the hung hpasmcli tool every 5 minutes for some hours before
I would notice. After commenting out the cron job and verifying that
no hpasmcli processes existed, I could then stop hpasmd via the init
script, which sends a TERM signal to the process followed by a KILL
signal a couple of seconds later. Without exception those servers
would spontaneously reboot a few minutes (2-5) later. On servers
where the hpasmcli tool hadn't yet hung, I could stop hpasmd with no
ill effects to the system.
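
The five-minute schedule described above could be expressed as a cron
entry along these lines. This is only a sketch: the wrapper path
/usr/local/sbin/hpasm_temps.pl is a hypothetical name (the actual
install location isn't given), and it assumes a per-user crontab for
root rather than /etc/crontab, which would need an extra user field.

```
# Hypothetical root crontab entry: run the Perl wrapper every 5 minutes.
# The script path is an assumption, not taken from the original setup.
*/5 * * * * /usr/local/sbin/hpasm_temps.pl
```

Prefixing that entry with # is one way to comment out the job before
stopping hpasmd.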


John, are you still working on this very useful tool?  I can provide  
access to a DL380 G4 if you need a platform to test on.


-mike


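#!/usr/bin/perl
# Run hpasmcli with a 45-second timeout; on timeout, send HUP to the
# wrapper's process group so that a hung hpasmcli is killed as well.
use strict;
use warnings;

eval {
    local $SIG{ALRM} = sub {
        # Ignore HUP in the wrapper itself so it survives the group kill.
        local $SIG{HUP} = 'IGNORE';
        # Signal 1 is HUP; a negative PID targets the process group
        # with that ID.
        kill 1, -$$;
    };
    alarm 45;
    system('/usr/sbin/hpasmcli -s "show temps;"');
    alarm 0;    # hpasmcli returned in time, so cancel the timer
};

$SIG{HUP} = 'DEFAULT';

exit 0;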


