From owner-freebsd-alpha Tue Jan 22 13:24:27 2002
Delivered-To: freebsd-alpha@freebsd.org
Received: from rwcrmhc52.attbi.com (rwcrmhc52.attbi.com [216.148.227.88])
    by hub.freebsd.org (Postfix) with ESMTP id 820A937B402
    for ; Tue, 22 Jan 2002 13:24:20 -0800 (PST)
Received: from peter3.wemm.org ([12.232.27.13])
    by rwcrmhc52.attbi.com
    (InterMail vM.4.01.03.27 201-229-121-127-20010626) with ESMTP
    id <20020122212420.ZZGV3578.rwcrmhc52.attbi.com@peter3.wemm.org>
    for ; Tue, 22 Jan 2002 21:24:20 +0000
Received: from overcee.wemm.org (overcee.wemm.org [10.0.0.3])
    by peter3.wemm.org (8.11.0/8.11.0) with ESMTP id g0MLOJs16259
    for ; Tue, 22 Jan 2002 13:24:19 -0800 (PST)
    (envelope-from peter@wemm.org)
Received: from wemm.org (localhost [127.0.0.1])
    by overcee.wemm.org (Postfix) with ESMTP id B6CB039F1;
    Tue, 22 Jan 2002 13:24:19 -0800 (PST)
    (envelope-from peter@wemm.org)
X-Mailer: exmh version 2.5 07/13/2001 with nmh-1.0.4
To: Andrew Gallatin
Cc: alpha@FreeBSD.ORG
Subject: Re: Is anybody actually able to netboot at the moment?
In-Reply-To: <15437.31085.698208.990497@grasshopper.cs.duke.edu>
Date: Tue, 22 Jan 2002 13:24:19 -0800
From: Peter Wemm
Message-Id: <20020122212419.B6CB039F1@overcee.wemm.org>
Sender: owner-freebsd-alpha@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

Andrew Gallatin wrote:
>
> Peter Wemm writes:
> > By netboot, I mean having something like ewa0_protocols = BOOTP and
> > 'boot ewa0' (or ewb0 in some of my cases).. ?
> >
> > And if so, how are you doing it?  I've been fighting with a group of
> > cranky PWS 500au's (MIATAs) on a (fairly high powered) switch.
> >
> > If I run a tcpdump on the machine running dhcpd, I see about (maybe) one in
> > 50 broadcast bootp (or dhcp discover) packets actually arriving.  However,
> > when net_open() switches to RARP, I see every single one of those arrive.
> > Sometimes even SRM fails to have its bootp broadcasts seen and has to
> > retry.  Most of the times when the server actually sees the query and
> > replies, the reply isn't seen by the client.  However, the tftp downloads
> > and rarp/arp broadcasts seem 100% reliable.
> >
> > Eventually, if I am lucky, the client will actually get a response to the
> > packets it sends and will magically snap into life, and fire up the NFS
> > root mount etc.
> >
> > The only holdup seems to be the dhcp query.. :-(
>
> <..>
>
> I seem to remember finding a problem in libstand quite some time ago,
> but never having time to track it down.  I think that it had to do
> with checksum calculations for recv'ed packets.  Try turning off UDP
> checksums on the dhcp server & see if that improves matters.

I think I will.  I'll also try putting them on a small 10-mbit dumb hub
with a freebsd box and no switch in between and see what *really* makes
it to the wire.  Maybe even use a crossover cable instead.  I suspect the
switch is being too smart and is "protecting" us for some reason.

> > Anyway.. the final straw is that when it finally does get up to a loader
> > 'ok' prompt, doing a "load kernel" causes a 'kernel stack not valid'
> > trap back to SRM.  (doh!)
>
> That's a new one!  Does it actually start loading the kernel?  (as
> verified by tcpdump)

It gets as far as opening the file and getting a valid file handle.  It
does not seem to actually read any data..

04:22:51.203865 0:0:f8:75:92:b 0:0:f8:75:67:16 ip 102: axp0.FreeBSD.org.1005 > axp1.FreeBSD.org.1012: udp 60
04:22:51.281716 0:0:f8:75:67:16 0:0:f8:75:92:b ip 146: axp1.FreeBSD.org.26 > axp0.FreeBSD.org.nfs: 104 lookup fh 963,937832/1919234 "modules"
04:22:51.281828 0:0:f8:75:92:b 0:0:f8:75:67:16 ip 70: axp0.FreeBSD.org.nfs > axp1.FreeBSD.org.26: reply ok 28 lookup ERROR: No such file or directory
04:22:51.359840 0:0:f8:75:67:16 0:0:f8:75:92:b ip 126: axp1.FreeBSD.org.1011 > axp0.FreeBSD.org.1005: udp 84
04:22:51.360192 0:0:f8:75:92:b 0:0:f8:75:67:16 ip 102: axp0.FreeBSD.org.1005 > axp1.FreeBSD.org.1011: udp 60
04:22:51.437973 0:0:f8:75:67:16 0:0:f8:75:92:b ip 146: axp1.FreeBSD.org.28 > axp0.FreeBSD.org.nfs: 104 lookup fh 963,937832/1919234 "kernel"
04:22:51.438203 0:0:f8:75:92:b 0:0:f8:75:67:16 ip 170: axp0.FreeBSD.org.nfs > axp1.FreeBSD.org.28: reply ok 128 lookup fh 963,937832/1919237

The kernel that it did a getfh on is this:

-r-xr-xr-x  1 root  wheel  3419433 Jan 22 02:00 kernel

There is nothing odd there.

The console log:  (I still have some debugging printfs in there; this is
mercifully smaller than the dump-entire-send-and-receive-frame debugging :-).

Hit [Enter] to boot immediately, or any other key for command prompt.^M
^M^@Booting [kernel] in 9 seconds...
^M^@Booting [kernel]...
^M
SEND^M
prom_write: len=0x7e, pkt=0x2003449a, hate@0x20033b30^M
SEND^M
prom_write: len=0x96, pkt=0x2003443a, hate@0x20033ad0^M
SEND^M
prom_write: len=0x7e, pkt=0x200343aa, hate@0x20033a40^M
SEND^M
prom_write: len=0x96, pkt=0x2003434a, hate@0x200339e0^M
SEND^M
prom_write: len=0x7e, pkt=0x2003449a, hate@0x20033b30^M
SEND^M
prom_write: len=0x8e, pkt=0x2003443a, hate@0x20033ad0^M
SEND^M
prom_write: len=0x7e, pkt=0x200343aa, hate@0x20033a40^M
SEND^M
prom_write: len=0x8e, pkt=0x2003434a, hate@0x200339e0^M
SEND^M
prom_write: len=0x7e, pkt=0x2003449a, hate@0x20033b30^M
SEND^M
prom_write: len=0x92, pkt=0x2003443a, hate@0x20033ad0^M
SEND^M
prom_write: len=0x7e, pkt=0x200343aa, hate@0x20033a40^M
SEND^M
prom_write: len=0x92, pkt=0x2003434a, hate@0x200339e0^M
SEND^M
prom_write: len=0x7e, pkt=0x2003449a, hate@0x20033b30^M
SEND^M
prom_write: len=0x92, pkt=0x2003443a, hate@0x20033ad0^M
[this corresponds to the last packet above, followed by 5 second pause]
^M
halted CPU 0^M
^M
halt code = 2^M
kernel stack not valid halt^M
PC = 200000000 ^M

> Doug sent me a patch which helps to debug loader crashes last year.
> I posted it to the list & its archived here:
>
> http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=18451+0+archive/2001/freebsd-alpha/20010603.freebsd-alpha

I'll try it, thanks.

> > Can anybody please sanity check this for me?  On several different
> > combinations of hardware if possible.
>
> Unfortunately, I'm no longer in a position to play with this...  I
> wish you'd been interested a year ago :-(

Heh. :-(  I've never been able to get my PC164SX to boot either.  And,
ironically, I've never been able to get the IA64 box to netboot either..
It ignores all the replies to its bootp requests.  There must be
something else going on with libstand.  That was with a switch too.. as
was my home network... Hmm...
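
Since the thread suspects checksum handling for received packets in libstand, here is a minimal sketch of how a receiver typically verifies a UDP checksum over the IPv4 pseudo-header (RFC 768, using the one's-complement sum from RFC 1071). It is written in Python for brevity and is not libstand's code. The function names (ones_complement_sum, udp_checksum, build_udp, udp_checksum_ok) and the sample addresses and ports are made up for illustration.

```python
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total

def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_len: int) -> bytes:
    """IPv4 pseudo-header: src, dst, zero byte, protocol 17, UDP length."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, 17, udp_len)

def udp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum for a UDP segment whose checksum field is currently 0."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    csum = ~ones_complement_sum(data) & 0xFFFF
    return csum or 0xFFFF                    # 0 on the wire means "none"

def build_udp(src_ip, dst_ip, sport, dport, payload: bytes) -> bytes:
    """Hypothetical helper: assemble a UDP segment with a valid checksum."""
    length = 8 + len(payload)
    seg = struct.pack("!HHHH", sport, dport, length, 0) + payload
    csum = udp_checksum(src_ip, dst_ip, seg)
    return seg[:6] + struct.pack("!H", csum) + seg[8:]

def udp_checksum_ok(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """Receiver-side check, including the 'checksum disabled' case."""
    (wire,) = struct.unpack_from("!H", segment, 6)
    if wire == 0:
        return True                          # sender did not checksum
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF

if __name__ == "__main__":
    server = bytes([10, 0, 0, 1])            # made-up DHCP server address
    bcast = bytes([255, 255, 255, 255])
    reply = build_udp(server, bcast, 67, 68, b"bootp")
    print(udp_checksum_ok(server, bcast, reply))
```

Note the special case in udp_checksum_ok: a checksum field of zero means the sender did not compute one, so a correct receiver skips verification. That is why turning off UDP checksums on the dhcp server is a useful test here: if replies start being accepted, the receive-side verification is the likely culprit.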

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message