FreeBSD Mail Archives

Date:      Tue, 13 Mar 2007 14:08:48 +0000
From:      Adrian Wontroba <aw1@stade.co.uk>
To:        freebsd-stable@freebsd.org
Subject:   6.2-STABLE deadlock?
Message-ID:  <20070313140848.GA89182@steerpike.hanley.stade.co.uk>

At work, amongst my stable of old computers running FreeBSD, I have a
Fujitsu M800 - a 4-processor Xeon SMP machine with 4 GB of memory. This
primarily runs Nagios and a small and lightly used MySQL database, along
with a few inbound FTP transfers per minute. It has a Mylex card based
disc subsystem, ruling out crash dumps.

At some point during 5.5-STABLE this machine started to occasionally hang while
performing its daily "application" housekeeping - closing and restarting
Apache and Nagios, and dumping the database. Upgrading to 6.2-STABLE
appeared to solve the problem, with no problems visible while running
1,000 cycles of the sequence which seemed to provoke the problem.

cvsup for this version of the kernel and userland was run at 01:20 GMT
on 06 March.

However, shortly after 15:15 last Sunday afternoon the machine hung
again "out of the blue". kdb diagnostics were taken some 12 hours later,
and look somewhat odd. Maybe it was left to fester for too long.

The ps etc. output is at http://www.stade.co.uk/crash/console, which
contains boot-to-boot serial console output, including some output from
test cycles. I'd be grateful for any expert comments on the ps etc. output.

Supporting stuff.

[root@beastie ~/crash]# df
Filesystem    1K-blocks     Used    Avail Capacity  Mounted on
/dev/mlxd0s1a    507630    70074   396946    15%    /
devfs                 1        1        0   100%    /dev
/dev/mlxd0s1f  63541498 44355014 14103166    76%    /home
/dev/mlxd0s1e  16244334  6784900  8159888    45%    /usr
/dev/mlxd0s1d   1012974   117456   814482    13%    /var
/dev/md0           1646       32     1484     2%    /home/topftp/instances
/dev/md1         253678      132   233252     0%    /tmp

[root@beastie ~]# find /var -inum 23 -ls
23 4 -rw-r--r-- 1 daemon daemon 60 Mar 12 20:22 /var/rwho/whod.xjamesfriis

The problem stopped HTTP and FTP logging soon after 15:14 on Sunday 11;
diagnostics were taken and the machine rebooted around 04:30 on Monday 12.

172.19.112.92 - - [11/Mar/2007:15:14:53 +0000] "GET / HTTP/1.0" 200 688 "-" "check_http/1.89 (nagios-plugins 1.4.3)"
<time passes>
172.19.112.92 - - [12/Mar/2007:04:44:14 +0000] "GET / HTTP/1.0" 200 688 "-" "check_http/1.89 (nagios-plugins 1.4.3)"
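For anyone wanting to locate such a silence in a longer access log, a minimal sketch follows. It assumes Apache Common Log Format timestamps in square brackets, as in the two lines above; the function name largest_gap and the sample list are illustrative only and not part of any tool on the machine.

```python
from datetime import datetime
import re

# Matches the bracketed timestamp of a Common Log Format line,
# e.g. [11/Mar/2007:15:14:53 +0000]
STAMP = re.compile(r"\[([^\]]+)\]")


def largest_gap(lines):
    """Return (gap, before, after) for the longest silence between
    consecutive log entries, or None if fewer than two timestamps parse."""
    times = []
    for line in lines:
        match = STAMP.search(line)
        if match:
            times.append(
                datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
            )
    best = None
    for before, after in zip(times, times[1:]):
        gap = after - before
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


# The two access-log lines quoted above, used as sample input.
sample = [
    '172.19.112.92 - - [11/Mar/2007:15:14:53 +0000] "GET / HTTP/1.0" 200 688 '
    '"-" "check_http/1.89 (nagios-plugins 1.4.3)"',
    '172.19.112.92 - - [12/Mar/2007:04:44:14 +0000] "GET / HTTP/1.0" 200 688 '
    '"-" "check_http/1.89 (nagios-plugins 1.4.3)"',
]

result = largest_gap(sample)
if result is not None:
    gap, before, after = result
    print(f"longest silence: {gap} (from {before} to {after})")
```

Fed a whole log file (for example, the lines read from an open file object), the same function would report the widest interval with no requests, which here brackets the hang.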

Mar 11 15:15:35 beastie ftpd[91652]: connection from appsupcen (10.208.1.134)
Mar 11 15:15:35 beastie ftpd[91652]: FTP LOGIN FROM appsupcen as topftp
Mar 11 15:15:35 beastie ftpd[91652]: session root changed to /home/topftp/instances
Mar 11 15:15:35 beastie ftpd[91652]: put in.env_status.html.gz = 592 bytes (wd: /topftp/appsupcen; chrooted)
<time passes>
Mar 11 15:15:35 beastie ftpd[91652]: rename in.env_status.html.gz env_status.html.gz (wd: /topftp/appsupcen; chrooted)
Mar 12 04:44:31 beastie ftpd[1161]: connection from appsupcen (10.208.1.134)
Mar 12 04:44:31 beastie ftpd[1161]: FTP LOGIN FROM appsupcen as topftp
Mar 12 04:44:31 beastie ftpd[1161]: session root changed to /home/topftp/instances
Mar 12 04:44:31 beastie ftpd[1161]: mkdir topftp/appsupcen (wd: /; chrooted)

Support diary:

15:20
Beastie seems like it's crashed and down;

16:54
Beastie is no longer pingable by rjmon1;

04:30 - 04:43
(support person quoting from the documentation I'd provided about what
to do after a hang)
Type "return tilde hash" (CR~#) which will make cu send a break signal to beastie, and should cause beastie to drop into the ddb kernel debugger.
In the following, you may see "more" prompts. Type space at each for the next page.
Type these debugger commands
ps
show pcpu
show allpcpu
show locks
show alllocks
show lockedvnods
trace
alltrace
04:43 - beastie now back up and working, rebooted by typing call cpu_reset()
after the above commands.

AW: preserved and inspected diagnostic output. It looks very unlike
that for previous crashes (without a serial console) where a noticeable
feature was many ftpd processes in a UFS state. Possibly "things
happened" in the 12 hour period between the onset of the problem on
Sunday afternoon and the diagnostics being taken on Monday morning.

--
Adrian Wontroba
Adrian's Birthday Celebration: Crewe Limelight, Saturday 17 March. David
Hughes and Tiny Tin Lady. Free but ticketed - email me your postal
address if you want to come. No under 18s.
