From owner-freebsd-cluster@FreeBSD.ORG Sun Apr 11 12:20:13 2004
From: "Roland Wells" <freebsd@thebeatbox.org>
To: "'Jeffrey Racine'", freebsd-amd64@freebsd.org, freebsd-cluster@freebsd.org
Date: Sun, 11 Apr 2004 14:20:01 -0500
Subject: RE: LAM MPI on dual processor opteron box sees only one cpu...

Jeffrey,

I am not familiar with the LAM MPI issue, but on a dual-proc box you
should also get an additional line towards the bottom of your dmesg,
similar to:

SMP: AP CPU #1 Launched!

-Roland

-----Original Message-----
From: Jeffrey Racine
Sent: Saturday, April 10, 2004 5:22 PM
To: freebsd-amd64@freebsd.org; freebsd-cluster@freebsd.org
Subject: LAM MPI on dual processor opteron box sees only one cpu...

Hi.

I am converging on getting a new dual Opteron box running. Now I am
setting up and testing LAM MPI; however, the OS is not farming out the
job as expected, and only uses one processor.

This runs fine on RH 7.3 and RH 9.0, both on a cluster and on a
dual-processor PIV desktop. I am running 5-CURRENT. Basically, mpirun
-np 1 binaryfile has the same runtime as mpirun -np 2 binaryfile, while
on the dual PIV box it runs in half the time. When I check top with
mpirun -np 2, both processes run on CPU 0... here is the relevant
portion from top with -np 2:

 9306 jracine   4 0 7188K 2448K sbwait 0 0:03 19.53% 19.53% n_lam
29307 jracine 119 0 7148K 2372K CPU0   0 0:03 19.53% 19.53% n_lam

I include output from laminfo, dmesg (CPU-relevant info), and lamboot -d
bhost.lam... any suggestions most appreciated, and thanks in advance!

-- laminfo

           LAM/MPI: 7.0.4
            Prefix: /usr/local
      Architecture: amd64-unknown-freebsd5.2
     Configured by: root
     Configured on: Sat Apr 10 11:22:02 EDT 2004
    Configure host: jracine.maxwell.syr.edu
        C bindings: yes
      C++ bindings: yes
  Fortran bindings: yes
       C profiling: yes
     C++ profiling: yes
 Fortran profiling: yes
     ROMIO support: yes
      IMPI support: no
     Debug support: no
      Purify clean: no
          SSI boot: globus (Module v0.5)
          SSI boot: rsh (Module v1.0)
          SSI coll: lam_basic (Module v7.0)
          SSI coll: smp (Module v1.0)
           SSI rpi: crtcp (Module v1.0.1)
           SSI rpi: lamd (Module v7.0)
           SSI rpi: sysv (Module v7.0)
           SSI rpi: tcp (Module v7.0)
           SSI rpi: usysv (Module v7.0)

-- dmesg sees two cpus...
CPU: AMD Opteron(tm) Processor 248 (2205.02-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0xf58  Stepping = 8
  Features=0x78bfbff<...,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800
real memory  = 3623813120 (3455 MB)
avail memory = 3494363136 (3332 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1

-- bhost has the requisite information

128.230.130.10 cpu=2 user=jracine

-- Here are the results from lamboot -d bhost.lam

-bash-2.05b$ lamboot -d ~/bhost.lam
n0<29283> ssi:boot: Opening
n0<29283> ssi:boot: opening module globus
n0<29283> ssi:boot: initializing module globus
n0<29283> ssi:boot:globus: globus-job-run not found, globus boot will not run
n0<29283> ssi:boot: module not available: globus
n0<29283> ssi:boot: opening module rsh
n0<29283> ssi:boot: initializing module rsh
n0<29283> ssi:boot:rsh: module initializing
n0<29283> ssi:boot:rsh:agent: rsh
n0<29283> ssi:boot:rsh:username:
n0<29283> ssi:boot:rsh:verbose: 1000
n0<29283> ssi:boot:rsh:algorithm: linear
n0<29283> ssi:boot:rsh:priority: 10
n0<29283> ssi:boot: module available: rsh, priority: 10
n0<29283> ssi:boot: finalizing module globus
n0<29283> ssi:boot:globus: finalizing
n0<29283> ssi:boot: closing module globus
n0<29283> ssi:boot: Selected boot module rsh

LAM 7.0.4/MPI 2 C++/ROMIO - Indiana University

n0<29283> ssi:boot:base: looking for boot schema in following directories:
n0<29283> ssi:boot:base:   $TROLLIUSHOME/etc
n0<29283> ssi:boot:base:   $LAMHOME/etc
n0<29283> ssi:boot:base:   /usr/local/etc
n0<29283> ssi:boot:base: looking for boot schema file:
n0<29283> ssi:boot:base:   /home/jracine/bhost.lam
n0<29283> ssi:boot:base: found boot schema: /home/jracine/bhost.lam
n0<29283> ssi:boot:rsh: found the following hosts:
n0<29283> ssi:boot:rsh:   n0 jracine.maxwell.syr.edu (cpu=2)
n0<29283> ssi:boot:rsh: resolved hosts:
n0<29283> ssi:boot:rsh:   n0 jracine.maxwell.syr.edu --> 128.230.130.10 (origin)
n0<29283> ssi:boot:rsh: starting RTE procs
n0<29283> ssi:boot:base:linear: starting
n0<29283> ssi:boot:base:server: opening server TCP socket
n0<29283> ssi:boot:base:server: opened port 49832
n0<29283> ssi:boot:base:linear: booting n0 (jracine.maxwell.syr.edu)
n0<29283> ssi:boot:rsh: starting lamd on (jracine.maxwell.syr.edu)
n0<29283> ssi:boot:rsh: starting on n0 (jracine.maxwell.syr.edu): hboot -t -c lam-conf.lamd -d -I -H 128.230.130.10 -P 49832 -n 0 -o 0
n0<29283> ssi:boot:rsh: launching locally
hboot: performing tkill
hboot: tkill -d
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back: /tmp/lam-jracine@jracine.maxwell.syr.edu/lam-killfile
tkill: removing socket file ...
tkill: socket file: /tmp/lam-jracine@jracine.maxwell.syr.edu/lam-kernel-socketd
tkill: removing IO daemon socket file ...
tkill: IO daemon socket file: /tmp/lam-jracine@jracine.maxwell.syr.edu/lam-io-socket
tkill: f_kill = "/tmp/lam-jracine@jracine.maxwell.syr.edu/lam-killfile"
tkill: nothing to kill: "/tmp/lam-jracine@jracine.maxwell.syr.edu/lam-killfile"
hboot: booting...
hboot: fork /usr/local/bin/lamd
[1] 29286 lamd -H 128.230.130.10 -P 49832 -n 0 -o 0 -d
n0<29283> ssi:boot:rsh: successfully launched on n0 (jracine.maxwell.syr.edu)
n0<29283> ssi:boot:base:server: expecting connection from finite list
hboot: attempting to execute
n-1<29286> ssi:boot: Opening
n-1<29286> ssi:boot: opening module globus
n-1<29286> ssi:boot: initializing module globus
n-1<29286> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<29286> ssi:boot: module not available: globus
n-1<29286> ssi:boot: opening module rsh
n-1<29286> ssi:boot: initializing module rsh
n-1<29286> ssi:boot:rsh: module initializing
n-1<29286> ssi:boot:rsh:agent: rsh
n-1<29286> ssi:boot:rsh:username:
n-1<29286> ssi:boot:rsh:verbose: 1000
n-1<29286> ssi:boot:rsh:algorithm: linear
n-1<29286> ssi:boot:rsh:priority: 10
n-1<29286> ssi:boot: module available: rsh, priority: 10
n-1<29286> ssi:boot: finalizing module globus
n-1<29286> ssi:boot:globus: finalizing
n-1<29286> ssi:boot: closing module globus
n-1<29286> ssi:boot: Selected boot module rsh
n0<29283> ssi:boot:base:server: got connection from 128.230.130.10
n0<29283> ssi:boot:base:server: this connection is expected (n0)
n0<29283> ssi:boot:base:server: remote lamd is at 128.230.130.10:50206
n0<29283> ssi:boot:base:server: closing server socket
n0<29283> ssi:boot:base:server: connecting to lamd at 128.230.130.10:49833
n0<29283> ssi:boot:base:server: connected
n0<29283> ssi:boot:base:server: sending number of links (1)
n0<29283> ssi:boot:base:server: sending info: n0 (jracine.maxwell.syr.edu)
n0<29283> ssi:boot:base:server: finished sending
n0<29283> ssi:boot:base:server: disconnected from 128.230.130.10:49833
n0<29283> ssi:boot:base:linear: finished
n0<29283> ssi:boot:rsh: all RTE procs started
n0<29283> ssi:boot:rsh: finalizing
n0<29283> ssi:boot: Closing
n-1<29286> ssi:boot:rsh: finalizing
n-1<29286> ssi:boot: Closing
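[Editor's note: the symptom described above is easy to quantify from the
shell with time(1). A minimal sketch, assuming a LAM session booted from
the bhost file shown and a CPU-bound MPI binary named ./n_lam (the name
is taken from the top output in this thread; substitute your own):

    $ lamboot ~/bhost.lam
    $ time mpirun -np 1 ./n_lam
    $ time mpirun -np 2 ./n_lam
    $ lamhalt

On a box where SMP scheduling behaves, the -np 2 run should take roughly
half the real time of the -np 1 run; identical real times reproduce the
problem. With cpu=2 in the boot schema, "mpirun C ./n_lam" should be
equivalent to -np 2, since LAM's C designator starts one process per
declared CPU.]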
From owner-freebsd-cluster@FreeBSD.ORG Mon Apr 12 06:04:31 2004
From: Jeffrey Racine <jracine@maxwell.syr.edu>
To: Roland Wells
Cc: freebsd-cluster@freebsd.org, freebsd-amd64@freebsd.org
Date: Mon, 12 Apr 2004 09:04:24 -0400
Subject: RE: LAM MPI on dual processor opteron box sees only one cpu...

Hi Roland.

I do get CPU #1 launched. This is not the problem.

The problem appears to be with the way that -CURRENT is scheduling.

With mpirun -np 2 I get the job running on CPU 0 (two instances on one
proc). However, it turns out that with -np 4 I get the job running on
CPUs 0 and 1, though with 4 instances (and the associated overhead).
Here is top for -np 4... notice in the C column that it is using both
procs:

  PID USERNAME PRI NICE  SIZE   RES STATE C   TIME   WCPU    CPU COMMAND
96090 jracine  131    0 7148K 2172K CPU1  1   0:19 44.53% 44.53% n_lam
96088 jracine  125    0 7148K 2172K RUN   0   0:18 43.75% 43.75% n_lam
96089 jracine  136    0 7148K 2172K RUN   1   0:19 42.19% 42.19% n_lam
96087 jracine  135    0 7188K 2248K RUN   0   0:19 41.41% 41.41% n_lam

One run (once, when I rebooted LAM) did allocate the job correctly with
-np 2, but this is not in general the case. The other systems I use,
however, correctly farm out -np 2 to CPUs 0 and 1...

Thanks, and any suggestions welcome.

-- Jeff

On Sun, 2004-04-11 at 14:20 -0500, Roland Wells wrote:
> Jeffrey,
> I am not familiar with the LAM MPI issue, but in a dual proc box, you
> should also get an additional line towards the bottom in your dmesg,
> similar to:
>
> SMP: AP CPU #1 Launched!
>
> [remainder of quoted message trimmed]
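[Editor's note: the C column read above comes from top(1) on an SMP
kernel and reports which CPU each process last ran on. A repeatable,
greppable snapshot can be taken in batch mode, something like the
following (the -b flag and trailing display count are as understood by
FreeBSD's top; n_lam is the job binary from this thread):

    $ top -b 100 | grep n_lam

Sampling this a few times while the job runs shows whether both
processes stay stuck on CPU 0 or migrate between CPUs.]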
From owner-freebsd-cluster@FreeBSD.ORG Mon Apr 12 06:13:18 2004
From: Bob Willcox <bob@immure.com>
To: Jeffrey Racine
Cc: freebsd-amd64@freebsd.org, freebsd-cluster@freebsd.org
Date: Mon, 12 Apr 2004 08:13:13 -0500
Subject: Re: LAM MPI on dual processor opteron box sees only one cpu...

On Mon, Apr 12, 2004 at 09:04:24AM -0400, Jeffrey Racine wrote:
> The problem appears to be with the way that -CURRENT is scheduling.
>
> With mpirun -np 2 I get the job running on CPU 0 (two instances on one
> proc). However, it turns out that with -np 4 I get the job running on
> CPUs 0 and 1, though with 4 instances (and the associated overhead).
>
> [remainder of quoted message trimmed]

What scheduler are you using? I've seen this behavior on my 5-CURRENT
Athlon MP system when running two instances of setiathome under the
default SCHED_ULE scheduler. Sometimes it would run both setiathome
processes on the same CPU for hours (even days), leaving one CPU
essentially idle. When I switched to the SCHED_4BSD scheduler, it ran
setiathome on both CPUs.

Bob

--
Bob Willcox              What's done to children, they will do to society.
bob@immure.com
Austin, TX
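[Editor's note: on FreeBSD 5.x the scheduler is a compile-time kernel
option, so acting on Bob's suggestion means building a kernel with
SCHED_4BSD in place of SCHED_ULE. A minimal sketch of the usual
procedure, assuming an amd64 machine with sources in /usr/src (the
config name OPTERON is illustrative):

    # cd /usr/src/sys/amd64/conf
    # cp GENERIC OPTERON
    # vi OPTERON     (replace "options SCHED_ULE" with "options SCHED_4BSD")
    # cd /usr/src
    # make buildkernel KERNCONF=OPTERON
    # make installkernel KERNCONF=OPTERON
    # shutdown -r now

Exactly one of the SCHED_* options must be compiled into the kernel at
a time.]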
From owner-freebsd-cluster@FreeBSD.ORG Mon Apr 12 06:37:22 2004
From: "Jeff Racine" <jracine@maxwell.syr.edu>
To: "Bob Willcox"
Cc: freebsd-amd64@freebsd.org, freebsd-cluster@freebsd.org
Date: Mon, 12 Apr 2004 09:36:18 -0400
Subject: RE: LAM MPI on dual processor opteron box sees only one cpu...

Hi Bob.

Good to hear someone else has seen this behavior.

Re: what scheduler am I using... ULE.

Thanks!

-- Jeff

-----Original Message-----
From: Bob Willcox [mailto:bob@immure.com]
Sent: Mon 4/12/2004 9:13 AM
Subject: Re: LAM MPI on dual processor opteron box sees only one cpu...

> What scheduler are you using? I've seen this behavior on my 5-CURRENT
> Athlon MP system when running two instances of setiathome under the
> default SCHED_ULE scheduler. [...] When I switched to the SCHED_4BSD
> scheduler, it ran setiathome on both CPUs.
>
> [remainder of quoted thread trimmed]
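[Editor's note: after rebooting into a rebuilt kernel, the active
scheduler can be confirmed on kernels that export the kern.sched.name
sysctl (it may not be present on every 5.x kernel of this vintage):

    $ sysctl kern.sched.name
    kern.sched.name: 4BSD

If the sysctl is missing, the scheduler chosen at compile time can be
checked against the kernel config used for the build.]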