Date:      Mon, 22 Oct 2001 19:41:13 +1000
From:      Stanley Hopcroft <Stanley.Hopcroft@IPAustralia.Gov.AU>
To:        Questions@FreeBSD.ORG
Subject:   4.3-RELEASE kaput after running Perl applications that forks (and pings).
Message-ID:  <20011022194111.B375@IPAustralia.Gov.AU>

Dear Ladies and Gentlemen,

I am writing to invite your comment on something I feel is more likely
to occur on an MS operating system.

An unloaded FreeBSD 4.3-RELEASE reboots part way through a Perl script
that forks a few hundred processes that exec 'ping'.

There is nothing in dmesg (apart from boot messages) or in
/var/log/messages.

Should I enable a debugging kernel ?

Why wasn't the user (unprivileged) process simply terminated ?

When the application runs, the system

. shows a load average of no more than 30
. has total CPU utilisation bottoming at 0%, with <= 40% system utilisation
. is running no more than 500 user processes
. has no exceptional swap activity (can't remember what it was)

If there was a panic, I don't know what the message was; the only
indication I had of the machine failing was the ssh connection closing.

Here is the application. It fails when it tries to keep 255 processes
running at one time (replace 100 with 255 in the Parallel::ForkManager
constructor; this is a CPAN class that, while new, seems OK).

#!/usr/bin/perl -w

use strict ;
use Parallel::ForkManager ;

use constant TIMEOUT    => 5 ;
use constant COUNT      => 5 ;
use constant PING_CMD   => 'ping -q -n -c ' . COUNT . ' -t '. TIMEOUT ;
use constant DEBUG      => 0 ;

close STDOUT unless DEBUG ;
close STDERR unless DEBUG ;

my ($pm, $start, $host, $address) ;

$pm = Parallel::ForkManager->new(100) ;

foreach $start (3..7, 10, 96, 98, 100) {
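  foreach $host (0..255) {
    $address = "10.0.$start.$host" ;
    # Trace the command; nothing is shown unless DEBUG left STDERR open.
    print STDERR "${ \PING_CMD } $address\n" ;
    # Parent: start() returns the child's pid, so move on to the next host.
    $pm->start and next ;
    # Child: replace this process with ping.
    exec "${ \PING_CMD } $address" ;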
  }
}
$pm->wait_all_children ;
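
One way to take Parallel::ForkManager out of the picture would be to
repeat the test using only Perl's built-in fork, wait and exec. The
following is a minimal sketch under that assumption, not the script that
caused the reboot: run_limited and MAX are names made up for
illustration, and the addresses to ping are taken from the command line
rather than generated.

```perl
#!/usr/bin/perl -w

use strict ;

use constant TIMEOUT => 5 ;
use constant COUNT   => 5 ;
use constant MAX     => 100 ;   # hypothetical cap on simultaneous children

# Run each command (an array ref: [program, args...]) in its own child,
# never keeping more than $max children alive at once.
# Returns how many children exited with status 0.
sub run_limited {
  my ($max, @commands) = @_ ;
  die "max must be at least 1\n" if $max < 1 ;
  my ($running, $ok) = (0, 0) ;

  # Block until one child exits, then update the counters.
  my $reap = sub {
    my $pid = wait ;
    if ($pid == -1) { $running = 0 ; return }   # no children left
    $running-- ;
    $ok++ if $? == 0 ;
  } ;

  foreach my $cmd (@commands) {
    # Throttle: wait for a free slot before forking again.
    $reap->() while $running >= $max ;
    my $pid = fork ;
    # A failed fork is reported here instead of being silently retried.
    die "fork failed: $!\n" unless defined $pid ;
    if ($pid == 0) {
      # Child: replace this process with the command, bypassing the shell.
      exec { $cmd->[0] } @$cmd or exit 127 ;
    }
    $running++ ;
  }
  $reap->() while $running > 0 ;
  return $ok ;
}

# Addresses come from the command line, so running with none forks nothing.
my @commands = map { [ 'ping', '-q', '-n', '-c', COUNT, '-t', TIMEOUT, $_ ] } @ARGV ;
my $ok = run_limited(MAX, @commands) ;
print "$ok of ", scalar(@commands), " pings exited with status 0\n" ;
```

Saved under a name such as pingtest.pl (hypothetical), it could be run
as "perl pingtest.pl 10.0.3.1 10.0.3.2". If the reboot still happened
with MAX raised to 255, the CPAN class would seem an unlikely cause.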

The reboot is repeatable.

There is nothing extraordinary running when the application that causes
the reboot runs.

Thank you,

Yours sincerely.



-- 
------------------------------------------------------------------------
Stanley Hopcroft	IP Australia
Network Specialist
+61 2 6283 3189	+61 2 6281 1353 (FAX)	Stanley.Hopcroft@IPAustralia.Gov.AU
------------------------------------------------------------------------
cursor address, n:
	"Hello, cursor!"
		-- Stan Kelly-Bootle, "The Devil's DP Dictionary"


Notes

1 The culprit

wins> uname -a
FreeBSD wins.aipo.gov.au 4.3-RELEASE FreeBSD 4.3-RELEASE #2: Wed Jul  4 19:09:37 EST 2001     root@wins.aipo.gov.au:/usr/src/sys/compile/WINS  i386
wins> 

2 Its usual slothful activity

last pid: 94364;  load averages:  0.01,  0.03,  0.00
up 3+08:04:31  19:38:32
29 processes:  1 running, 28 sleeping
CPU states:     % user,     % nice,     % system,     % interrupt,     % idle
Mem: 12M Active, 38M Inact, 23M Wired, 12K Cache, 35M Buf, 176M Free
Swap: 256M Total, 256M Free

  PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  132 bind       2   0  4248K  3904K select   2:52  0.00%  0.00% named
  248 root       2   0  2760K  2224K select   0:56  0.00%  0.00% nmbd
 5009 root       2 -12  1256K   920K select   0:10  0.00%  0.00% ntpd
  259 root       2   0  1500K  1112K select   0:06  0.00%  0.00% httpd
  169 root      10   0   980K   740K nanslp   0:03  0.00%  0.00% cron
  128 root       2   0   928K   636K select   0:03  0.00%  0.00% syslogd
  175 root       2   0  2148K  1516K select   0:02  0.00%  0.00% sshd
78845 root       2   0  3336K  3056K select   0:00  0.00%  0.00% dhcpd
  250 root      -6   0  1916K  1276K piperd   0:00  0.00%  0.00% nmbd
  151 root       2   0  1120K   812K select   0:00  0.00%  0.00% amd
94220 root       2   0  2232K  1816K select   0:00  0.00%  0.00% sshd
94221 anwsmh    18   0  1272K   928K pause    0:00  0.00%  0.00% csh
94364 anwsmh    28   0  1884K  1096K RUN      0:00  0.00%  0.00% top
  265 nobody     2   0  1548K  1196K accept   0:00  0.00%  0.00% httpd
  266 nobody     2   0  1548K  1196K accept   0:00  0.00%  0.00% httpd
  136 daemon     2   0   940K   636K select   0:00  0.00%  0.00% portmap
  272 root       3   0   936K   652K ttyin    0:00  0.00%  0.00% getty
  271 root       3   0   936K   652K ttyin    0:00  0.00%  0.00% getty
  273 root       3   0   936K   652K ttyin    0:00  0.00%  0.00% getty
  172 root       2   0   928K   636K select   0:00  0.00%  0.00% lpd
  263 nobody     2   0  1500K  1124K accept   0:00  0.00%  0.00% httpd
  264 nobody     2   0  1500K  1124K accept   0:00  0.00%  0.00% httpd
 5612 nobody     2   0  1508K  1144K accept   0:00  0.00%  0.00% httpd
  267 nobody     2   0  1500K  1124K accept   0:00  0.00%  0.00% httpd
   28 root      18   0   208K    92K pause    0:00  0.00%  0.00% adjkerntz
  143 root      10   0   208K    80K nfsidl   0:00  0.00%  0.00% nfsiod
  145 root      10   0   208K    80K nfsidl   0:00  0.00%  0.00% nfsiod
  146 root      10   0   208K    80K nfsidl   0:00  0.00%  0.00% nfsiod
  144 root      10   0   208K    80K nfsidl   0:00  0.00%  0.00% nfsiod










