Date:      Sun, 11 Apr 2004 14:20:01 -0500
From:      "Roland Wells" <freebsd@thebeatbox.org>
To:        "'Jeffrey Racine'" <jracine@maxwell.syr.edu>, <freebsd-amd64@freebsd.org>, <freebsd-cluster@freebsd.org>
Subject:   RE: LAM MPI on dual processor opteron box sees only one cpu...
Message-ID:  <024f01c41ffa$029327e0$0c03a8c0@internal.thebeatbox.org>
In-Reply-To: <1081635706.67575.26.camel@x1-6-00-b0-d0-c2-67-0e.twcny.rr.com>

Jeffrey,
I am not familiar with the LAM MPI issue, but in a dual proc box, you
should also get an additional line towards the bottom in your dmesg,
similar to:

SMP: AP CPU #1 Launched!
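A quick way to check for it, e.g. (just a sketch; the grep pattern is my own, matching the messages quoted above):

```shell
# Look for the SMP detection and AP-launch messages in the boot log.
dmesg | grep -E 'SMP|Launched'
```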

-Roland
-----Original Message-----
From: owner-freebsd-cluster@freebsd.org
[mailto:owner-freebsd-cluster@freebsd.org] On Behalf Of Jeffrey Racine
Sent: Saturday, April 10, 2004 5:22 PM
To: freebsd-amd64@freebsd.org; freebsd-cluster@freebsd.org
Subject: LAM MPI on dual processor opteron box sees only one cpu...


Hi.

I am converging on getting a new dual opteron box running. Now I am
setting up and testing LAM MPI; however, the OS is not farming out
the job as expected and only sees one processor.

This runs fine on RH 7.3 and RH 9.0 both on a cluster and on a dual
processor PIV desktop. I am running 5-current. Basically, mpirun -np 1
binaryfile has the same runtime as mpirun -np 2 binaryfile, while on the
dual PIV box it runs in half the time. When I check top, both processes
from mpirun -np 2 run on CPU 0... here is the relevant portion from top
with -np 2:

 9306 jracine    4    0  7188K  2448K sbwait 0   0:03 19.53% 19.53% n_lam
29307 jracine  119    0  7148K  2372K CPU0   0   0:03 19.53% 19.53% n_lam
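One way to tally this from top's batch-mode output (a sketch; assumes top supports -b and that the CPU number is the 8th whitespace-separated column, as in the paste above):

```shell
# Count n_lam processes per CPU column reported by top.
top -b | awk '/n_lam/ { cpu[$8]++ } END { for (c in cpu) print "CPU " c ": " cpu[c] " procs" }'
```

With the two processes above this prints "CPU 0: 2 procs", i.e. both ranks on the same CPU.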

I include output from laminfo, dmesg (CPU-relevant info), and lamboot -d
bhost.lam... any suggestions most appreciated, and thanks in advance!

-- laminfo

           LAM/MPI: 7.0.4
            Prefix: /usr/local
      Architecture: amd64-unknown-freebsd5.2
     Configured by: root
     Configured on: Sat Apr 10 11:22:02 EDT 2004
    Configure host: jracine.maxwell.syr.edu
        C bindings: yes
      C++ bindings: yes
  Fortran bindings: yes
       C profiling: yes
     C++ profiling: yes
 Fortran profiling: yes
     ROMIO support: yes
      IMPI support: no
     Debug support: no
      Purify clean: no
          SSI boot: globus (Module v0.5)
          SSI boot: rsh (Module v1.0)
          SSI coll: lam_basic (Module v7.0)
          SSI coll: smp (Module v1.0)
           SSI rpi: crtcp (Module v1.0.1)
           SSI rpi: lamd (Module v7.0)
           SSI rpi: sysv (Module v7.0)
           SSI rpi: tcp (Module v7.0)
           SSI rpi: usysv (Module v7.0)

-- dmesg sees two cpus...

CPU: AMD Opteron(tm) Processor 248 (2205.02-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0xf58  Stepping = 8
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow!+,3DNow!>
real memory  = 3623813120 (3455 MB)
avail memory = 3494363136 (3332 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1

-- bhost has the requisite information

128.230.130.10 cpu=2 user=jracine
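(For reference, each line of a LAM boot schema names a host, its CPU count, and the user to rsh as; the cpu=2 entry is what tells LAM how many ranks to schedule on the node when you use the C notation, e.g. mpirun C binaryfile. Generic shape of a line:)

```
<hostname or IP> cpu=<N> user=<username>
```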

-- Here are the results from lamboot -d bhost.lam

-bash-2.05b$ lamboot -d ~/bhost.lam
n0<29283> ssi:boot: Opening
n0<29283> ssi:boot: opening module globus
n0<29283> ssi:boot: initializing module globus
n0<29283> ssi:boot:globus: globus-job-run not found, globus boot will not run
n0<29283> ssi:boot: module not available: globus
n0<29283> ssi:boot: opening module rsh
n0<29283> ssi:boot: initializing module rsh
n0<29283> ssi:boot:rsh: module initializing
n0<29283> ssi:boot:rsh:agent: rsh
n0<29283> ssi:boot:rsh:username: <same>
n0<29283> ssi:boot:rsh:verbose: 1000
n0<29283> ssi:boot:rsh:algorithm: linear
n0<29283> ssi:boot:rsh:priority: 10
n0<29283> ssi:boot: module available: rsh, priority: 10
n0<29283> ssi:boot: finalizing module globus
n0<29283> ssi:boot:globus: finalizing
n0<29283> ssi:boot: closing module globus
n0<29283> ssi:boot: Selected boot module rsh

LAM 7.0.4/MPI 2 C++/ROMIO - Indiana University

n0<29283> ssi:boot:base: looking for boot schema in following
directories:
n0<29283> ssi:boot:base:   <current directory>
n0<29283> ssi:boot:base:   $TROLLIUSHOME/etc
n0<29283> ssi:boot:base:   $LAMHOME/etc
n0<29283> ssi:boot:base:   /usr/local/etc
n0<29283> ssi:boot:base: looking for boot schema file:
n0<29283> ssi:boot:base:   /home/jracine/bhost.lam
n0<29283> ssi:boot:base: found boot schema: /home/jracine/bhost.lam
n0<29283> ssi:boot:rsh: found the following hosts:
n0<29283> ssi:boot:rsh:   n0 jracine.maxwell.syr.edu (cpu=2)
n0<29283> ssi:boot:rsh: resolved hosts:
n0<29283> ssi:boot:rsh:   n0 jracine.maxwell.syr.edu --> 128.230.130.10 (origin)
n0<29283> ssi:boot:rsh: starting RTE procs
n0<29283> ssi:boot:base:linear: starting
n0<29283> ssi:boot:base:server: opening server TCP socket
n0<29283> ssi:boot:base:server: opened port 49832
n0<29283> ssi:boot:base:linear: booting n0 (jracine.maxwell.syr.edu)
n0<29283> ssi:boot:rsh: starting lamd on (jracine.maxwell.syr.edu)
n0<29283> ssi:boot:rsh: starting on n0 (jracine.maxwell.syr.edu): hboot -t -c lam-conf.lamd -d -I -H 128.230.130.10 -P 49832 -n 0 -o 0
n0<29283> ssi:boot:rsh: launching locally
hboot: performing tkill
hboot: tkill -d
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back: /tmp/lam-jracine@jracine.maxwell.syr.edu/lam-killfile
tkill: removing socket file ...
tkill: socket file: /tmp/lam-jracine@jracine.maxwell.syr.edu/lam-kernel-socketd
tkill: removing IO daemon socket file ...
tkill: IO daemon socket file: /tmp/lam-jracine@jracine.maxwell.syr.edu/lam-io-socket
tkill: f_kill = "/tmp/lam-jracine@jracine.maxwell.syr.edu/lam-killfile"
tkill: nothing to kill: "/tmp/lam-jracine@jracine.maxwell.syr.edu/lam-killfile"
hboot: booting...
hboot: fork /usr/local/bin/lamd
[1]  29286 lamd -H 128.230.130.10 -P 49832 -n 0 -o 0 -d
n0<29283> ssi:boot:rsh: successfully launched on n0 (jracine.maxwell.syr.edu)
n0<29283> ssi:boot:base:server: expecting connection from finite list
hboot: attempting to execute
n-1<29286> ssi:boot: Opening
n-1<29286> ssi:boot: opening module globus
n-1<29286> ssi:boot: initializing module globus
n-1<29286> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<29286> ssi:boot: module not available: globus
n-1<29286> ssi:boot: opening module rsh
n-1<29286> ssi:boot: initializing module rsh
n-1<29286> ssi:boot:rsh: module initializing
n-1<29286> ssi:boot:rsh:agent: rsh
n-1<29286> ssi:boot:rsh:username: <same>
n-1<29286> ssi:boot:rsh:verbose: 1000
n-1<29286> ssi:boot:rsh:algorithm: linear
n-1<29286> ssi:boot:rsh:priority: 10
n-1<29286> ssi:boot: module available: rsh, priority: 10
n-1<29286> ssi:boot: finalizing module globus
n-1<29286> ssi:boot:globus: finalizing
n-1<29286> ssi:boot: closing module globus
n-1<29286> ssi:boot: Selected boot module rsh
n0<29283> ssi:boot:base:server: got connection from 128.230.130.10
n0<29283> ssi:boot:base:server: this connection is expected (n0)
n0<29283> ssi:boot:base:server: remote lamd is at 128.230.130.10:50206
n0<29283> ssi:boot:base:server: closing server socket
n0<29283> ssi:boot:base:server: connecting to lamd at 128.230.130.10:49833
n0<29283> ssi:boot:base:server: connected
n0<29283> ssi:boot:base:server: sending number of links (1)
n0<29283> ssi:boot:base:server: sending info: n0 (jracine.maxwell.syr.edu)
n0<29283> ssi:boot:base:server: finished sending
n0<29283> ssi:boot:base:server: disconnected from 128.230.130.10:49833
n0<29283> ssi:boot:base:linear: finished
n0<29283> ssi:boot:rsh: all RTE procs started
n0<29283> ssi:boot:rsh: finalizing
n0<29283> ssi:boot: Closing
n-1<29286> ssi:boot:rsh: finalizing
n-1<29286> ssi:boot: Closing





