Date:      Tue, 22 Jan 2002 13:24:19 -0800
From:      Peter Wemm <peter@wemm.org>
To:        Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        alpha@FreeBSD.ORG
Subject:   Re: Is anybody actually able to netboot at the moment? 
Message-ID:  <20020122212419.B6CB039F1@overcee.wemm.org>
In-Reply-To: <15437.31085.698208.990497@grasshopper.cs.duke.edu> 

Andrew Gallatin wrote:
> 
> Peter Wemm writes:
>  > By netboot, I mean having something like ewa0_protocols = BOOTP and
>  > 'boot ewa0' (or ewb0 in some of my cases).. ?
>  > 
>  > And if so, how are you doing it?  I've been fighting with a group of
>  > cranky PWS 500au's (MIATAs) on a (fairly high powered) switch.
>  > 
>  > If I run a tcpdump on the machine running dhcpd, I see about (maybe) one in
>  > 50 broadcast bootp (or dhcp discover) packets actually arriving. However,
>  > when net_open() switches to RARP, I see every single one of those arrive.
>  > Sometimes even SRM fails to have its bootp broadcasts seen and has to
>  > retry.  Most of the times when the server actually sees the query and
>  > replies, the reply isn't seen by the client.  However, the tftp downloads
>  > and rarp/arp broadcasts seem 100% reliable.
>  > 
>  > Eventually, if I am lucky, the client will actually get a response to the
>  > packets it sends and will magically snap into life, and fire up the NFS
>  > root mount etc.
>  > 
>  > The only holdup seems to be the dhcp query.. :-(
> 
> <..>
> 
> I seem to remember finding a problem in libstand quite some time ago,
> but never having time to track it down.  I think that it had to do
> with checksum calculations for recv'ed packets.  Try turning off UDP
> checksums on the dhcp server & see if that improves matters.

I think I will.  I'll also try putting them on a small 10-mbit dumb hub
with a freebsd box and no switch in between and see what *really* makes it
to the wire.  Maybe even use a crossover cable instead.  I suspect the
switch is being too smart and is "protecting" us for some reason.
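
For context on why turning off UDP checksums on the server is a useful test: in UDP over IPv4, a checksum field of zero means the sender did not compute one, so a receiver is expected to skip verification entirely. If the receive-side check is buggy, zero-checksum replies would get through while checksummed ones are dropped. Below is a minimal sketch of that receive-side check, assuming a raw UDP header plus payload in a byte buffer. The function names are hypothetical and this is not the libstand code.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte buffer to a running ones'-complement sum, 16 bits at a time. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;     /* pad odd trailing byte with zero */
    return sum;
}

/* Fold the carries back in until the sum fits in 16 bits. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Hypothetical helper: returns 1 if the datagram should be accepted,
 * 0 if it should be dropped.  src and dst are the IPv4 addresses in
 * network byte order; udp points at the UDP header and udp_len covers
 * header plus payload.
 */
int udp_checksum_ok(const uint8_t src[4], const uint8_t dst[4],
                    const uint8_t *udp, size_t udp_len)
{
    uint32_t sum = 0;

    if (udp_len < 8)
        return 0;                       /* shorter than a UDP header */
    if (udp[6] == 0 && udp[7] == 0)
        return 1;                       /* sender did not checksum */

    /* Pseudo-header: source, destination, protocol 17, UDP length. */
    sum = sum16(src, 4, sum);
    sum = sum16(dst, 4, sum);
    sum += 17;
    sum += (uint32_t)udp_len;

    /* UDP header (including the checksum field) and payload. */
    sum = sum16(udp, udp_len, sum);

    /* A valid datagram sums to all ones. */
    return fold(sum) == 0xffff;
}
```

A receiver that gets the pseudo-header, the odd-byte padding, or the zero-checksum case wrong would discard valid replies, which would look a lot like the server answering and the client never seeing it.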

>  > Anyway.. the final straw is that when it finally does get up to a loader
>  > 'ok' prompt, doing a "load kernel" causes a 'kernel stack not valid'
>  > trap back to SRM. (doh!)
> 
> That's a new one!  Does it actually start loading the kernel?  (as
> verified by tcpdump)

It gets as far as opening the file and getting a valid file handle.
It does not seem to actually read any data..

04:22:51.203865 0:0:f8:75:92:b 0:0:f8:75:67:16 ip 102: axp0.FreeBSD.org.1005 > axp1.FreeBSD.org.1012:  udp 60
04:22:51.281716 0:0:f8:75:67:16 0:0:f8:75:92:b ip 146: axp1.FreeBSD.org.26 > axp0.FreeBSD.org.nfs: 104 lookup fh 963,937832/1919234 "modules"
04:22:51.281828 0:0:f8:75:92:b 0:0:f8:75:67:16 ip 70: axp0.FreeBSD.org.nfs > axp1.FreeBSD.org.26: reply ok 28 lookup ERROR: No such file or directory
04:22:51.359840 0:0:f8:75:67:16 0:0:f8:75:92:b ip 126: axp1.FreeBSD.org.1011 > axp0.FreeBSD.org.1005:  udp 84
04:22:51.360192 0:0:f8:75:92:b 0:0:f8:75:67:16 ip 102: axp0.FreeBSD.org.1005 > axp1.FreeBSD.org.1011:  udp 60
04:22:51.437973 0:0:f8:75:67:16 0:0:f8:75:92:b ip 146: axp1.FreeBSD.org.28 > axp0.FreeBSD.org.nfs: 104 lookup fh 963,937832/1919234 "kernel"
04:22:51.438203 0:0:f8:75:92:b 0:0:f8:75:67:16 ip 170: axp0.FreeBSD.org.nfs > axp1.FreeBSD.org.28: reply ok 128 lookup fh 963,937832/1919237

The kernel that it did a getfh on is this:
-r-xr-xr-x  1 root  wheel  3419433 Jan 22 02:00 kernel
There is nothing odd there.

The console log (I still have some debugging printfs in there; this is
mercifully smaller than the dump-entire-send-and-receive-frame debugging :-):

Hit [Enter] to boot immediately, or any other key for command prompt.^M
^M^@Booting [kernel] in 9 seconds... ^M^@Booting [kernel]...               ^M
SEND^M
prom_write: len=0x7e, pkt=0x2003449a, hate@0x20033b30^M
SEND^M
prom_write: len=0x96, pkt=0x2003443a, hate@0x20033ad0^M
SEND^M
prom_write: len=0x7e, pkt=0x200343aa, hate@0x20033a40^M
SEND^M
prom_write: len=0x96, pkt=0x2003434a, hate@0x200339e0^M
SEND^M
prom_write: len=0x7e, pkt=0x2003449a, hate@0x20033b30^M
SEND^M
prom_write: len=0x8e, pkt=0x2003443a, hate@0x20033ad0^M
SEND^M
prom_write: len=0x7e, pkt=0x200343aa, hate@0x20033a40^M
SEND^M
prom_write: len=0x8e, pkt=0x2003434a, hate@0x200339e0^M
SEND^M
prom_write: len=0x7e, pkt=0x2003449a, hate@0x20033b30^M
SEND^M
prom_write: len=0x92, pkt=0x2003443a, hate@0x20033ad0^M
SEND^M
prom_write: len=0x7e, pkt=0x200343aa, hate@0x20033a40^M
SEND^M
prom_write: len=0x92, pkt=0x2003434a, hate@0x200339e0^M
SEND^M
prom_write: len=0x7e, pkt=0x2003449a, hate@0x20033b30^M
SEND^M
prom_write: len=0x92, pkt=0x2003443a, hate@0x20033ad0^M
[this corresponds to the last packet above, followed by 5 second pause]
^M
halted CPU 0^M
^M
halt code = 2^M
kernel stack not valid halt^M
PC = 200000000       ^M

> Doug sent me a patch which helps to debug loader crashes last year.
> I posted it to the list & it's archived here:
> 
> http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=18451+0+archive/2001/freebsd-alpha/20010603.freebsd-alpha

I'll try it, thanks.

>  > Can anybody please sanity check this for me?  On several different
>  > combinations of hardware if possible.
> 
> Unfortunately, I'm no longer in a position to play with this...  I
> wish you'd been interested a year ago :-(

Heh. :-(  I've never been able to get my PC164SX to boot either.

And, ironically, I've never been able to get the IA64 box to netboot either..
It ignores all the replies to its bootp requests.  There must be something
else going on with libstand.  That was with a switch too.. as was my home
network... Hmm...

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message