To: Terry Lambert
Cc: Andrew Gallatin, alpha@FreeBSD.ORG
Subject: Re: Is anybody actually able to netboot at the moment?
In-Reply-To: <3C4DFC23.F5391D2D@mindspring.com>
Date: Tue, 22 Jan 2002 17:54:30 -0800
From: Peter Wemm
Message-Id: <20020123015430.0DECD39F1@overcee.wemm.org>

Terry Lambert wrote:
> Peter Wemm wrote:
> > > Actually, there's a bug in the one's complement case on the
> > > FreeBSD checksum calculation, sometimes. I was able to see
> > > incorrect checksums on a number of packets. I think it's in
> > > the incremental update code, but since it doesn't seem to
> > > stop things from working, I never tracked down the source of
> > > the Ethereal traces where I saw this.
> >
> > Terry, what crack are you smoking this time? We don't do incremental
> > checksums in the libstand code. That stuff is as simple and as unoptimized
> > as it gets.
>
> The bug is on transmit, not on receive, Peter. 8-). Working
> validation on the receive with packets with bad checksums would
> stop the load.

You said "I think it's in the incremental update code", which is what I
was responding to. There is no "incremental update code". And we do not
calculate Ethernet frame CRCs either; that is done by the chip and/or SRM
itself.

> To see if this is the problem, it would be wise to do a dump
> of a failed boot attempt with Ethereal, which flags checksum
> errors on packets on the wire.

Sure, but there appear to be no packets on the wire (as I already said).
That is the problem. tcpdump in promiscuous mode is not seeing a damn
thing. So either there is a frame CRC error (outside our jurisdiction)
which the switch is killing, or SRM is not transmitting it. The prom write
call is reporting that it succeeded though.

Here is the actual packet contents being sent. The first one did not make
it to the wire:

bootpsend: d=20036d28 called.
bootpsend: calling sendudp
sendudp: d=20036d28 called.
saddr: 0.0.0.0:68 daddr: 255.255.255.255:67
sendudp: dest ethernet addr = ff:ff:ff:ff:ff:ff
sendether: called, len 328
0000: ff ff ff ff ff ff 00 00 f8 75 67 16 08 00 45 00
0010: 01 48 00 00 00 00 04 11 b5 a6 00 00 00 00 ff ff
0020: ff ff 00 44 00 43 01 34 00 00 01 01 06 00 00 00
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0040: 00 00 00 00 00 00 00 00 f8 75 67 16 00 00 00 00
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0110: 00 00 00 00 00 00 63 82 53 63 ff 00 00 00 00 00
0120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0150: 00 00 00 00 00 00
prom0: netif_put
prom_write: len=0x156, pkt=0x20036336, hate@0x20035a10
ret.bits = 0x0000000000000156
ret.u.retval = 0x156
ret.u.unit = 0x0
ret.u.mbz = 0x0
ret.u.error = 0x0
ret.u.status = 0x0
prom0: netif_put returning 342
sendudp: sendether returned 328
readudp: called
readether: called, len 328, tleft 1
prom0: netif_get
ret.bits = 0x8000000000000000
ret.u.retval = 0x0
ret.u.unit = 0x0
ret.u.mbz = 0x0
ret.u.error = 0x0
ret.u.status = 0x4
cc = 0
prom0: netif_get returning 0

ie: prom_write returned success, but we time out waiting for a reply.
tcpdump on two other boxes confirms that the broadcast never made it out.

This second write did work, and tcpdump on both other boxes confirms the
packet, and we happened to catch the reply. Both sent packets are identical
except for the bootp bp_secs field (seconds counter).
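
The checksum question can be checked directly against the logged bytes. The following is a minimal Python sketch, not libstand code: the names `FRAME`, `ip_header_checksum` and `summarize` are made up for illustration, and the offsets assume a plain 14-byte Ethernet header followed by IPv4, UDP and BOOTP as in the dump of the first (lost) packet. It recomputes the IPv4 header checksum and pulls out the UDP checksum and bp_secs fields.

```python
import struct

# First 0x34 bytes of the transmitted frame, copied from the dump of the
# packet that never reached the wire.
FRAME = bytes.fromhex(
    "ff ff ff ff ff ff 00 00 f8 75 67 16 08 00 45 00 "
    "01 48 00 00 00 00 04 11 b5 a6 00 00 00 00 ff ff "
    "ff ff 00 44 00 43 01 34 00 00 01 01 06 00 00 00 "
    "00 00 00 00"
)

ETHER_LEN = 14  # assumed: untagged Ethernet II header


def ip_header_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the 16-bit words,
    computed with the checksum field itself treated as zero."""
    words = list(struct.unpack("!%dH" % (len(header) // 2), header))
    words[5] = 0  # the checksum is the sixth 16-bit word of the header
    total = sum(words)
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def summarize(frame: bytes) -> dict:
    """Extract the fields discussed in the thread from a raw frame."""
    ihl = (frame[ETHER_LEN] & 0x0F) * 4
    ip = frame[ETHER_LEN:ETHER_LEN + ihl]
    udp_off = ETHER_LEN + ihl
    sport, dport, ulen, ucsum = struct.unpack("!4H", frame[udp_off:udp_off + 8])
    bootp = frame[udp_off + 8:]
    return {
        "ip_csum_logged": struct.unpack("!H", ip[10:12])[0],
        "ip_csum_computed": ip_header_checksum(ip),
        "udp_sport": sport,
        "udp_dport": dport,
        "udp_len": ulen,
        "udp_csum": ucsum,
        # bp_secs sits 8 bytes into the BOOTP message (op, htype, hlen,
        # hops, 4-byte xid, then secs).
        "bp_secs": struct.unpack("!H", bootp[8:10])[0],
    }


if __name__ == "__main__":
    for name, value in summarize(FRAME).items():
        print("%-17s 0x%04x" % (name, value))
```

Under those assumptions the recomputed IP header checksum equals the logged b5 a6, and the UDP checksum field is zero, which in UDP over IPv4 means the sender generated no checksum. So, at least for this frame, there is no sign of a bad checksum in what was handed to prom_write.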
Here is the second one which worked:

bootpsend: d=20036d28 called.
bootpsend: calling sendudp
sendudp: d=20036d28 called.
saddr: 0.0.0.0:68 daddr: 255.255.255.255:67
sendudp: dest ethernet addr = ff:ff:ff:ff:ff:ff
sendether: called, len 328
0000: ff ff ff ff ff ff 00 00 f8 75 67 16 08 00 45 00
0010: 01 48 00 00 00 00 04 11 b5 a6 00 00 00 00 ff ff
0020: ff ff 00 44 00 43 01 34 00 00 01 01 06 00 00 00
0030: 00 00 00 4e 00 00 00 00 00 00 00 00 00 00 00 00
0040: 00 00 00 00 00 00 00 00 f8 75 67 16 00 00 00 00
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0110: 00 00 00 00 00 00 63 82 53 63 ff 00 00 00 00 00
0120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0150: 00 00 00 00 00 00
prom0: netif_put
prom_write: len=0x156, pkt=0x20036336, hate@0x20035a10
ret.bits = 0x0000000000000156
ret.u.retval = 0x156
ret.u.unit = 0x0
ret.u.mbz = 0x0
ret.u.error = 0x0
ret.u.status = 0x0
prom0: netif_put returning 342
sendudp: sendether returned 328
readudp: called
readether: called, len 328, tleft 3
prom0: netif_get
ret.bits = 0x000000000000015a
ret.u.retval = 0x15a
ret.u.unit = 0x0
ret.u.mbz = 0x0
ret.u.error = 0x0
ret.u.status = 0x0
cc = 346
prom0: netif_get returning 346
readether: got len 346
0000: 00 00 f8 75 67 16 00 00 f8 75 92 0b 08 00 45 10
0010: 01 48 00 00 00 00 10 11 60 02 d8 88 cc 40 d8 88
0020: cc 41 00 43 00 44 01 34 9b f9 02 01 06 00 00 00
0030: 00 00 00 4e 00 00 00 00 00 00 d8 88 cc 41 d8 88
0040: cc 40 00 00 00 00 00 00 f8 75 67 16 00 00 00 00
0050: 00 00 00 00 00 00 68 30 68 30 20 6d 61 67 69 63
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0090: 00 00 00 00 00 00 6e 65 74 62 6f 6f 74 00 00 00
00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0110: 00 00 00 00 00 00 63 82 53 63 01 04 ff ff ff 80
0120: 03 04 d8 88 cc 01 06 04 d8 88 cc 12 11 21 32 31
0130: 36 2e 31 33 36 2e 32 30 34 2e 36 34 3a 2f 61 2f
0140: 6e 66 73 72 6f 6f 74 73 2f 34 2e 64 69 72 34 ff
0150: 00 00 00 00 00 00
bootprecv: checked. bp = 0x200364c0, n = 304
bootprecv: got one!
vend_rfc1048 bootp info. len=64
'native netmask' is 255.255.255.0
mask: 255.255.255.128
net_open: server addr: 216.136.204.64
net_open: server path: /a/nfsroots/4.dir4
sendrecv: called
SEND
sendudp: d=20036d28 called.
saddr: 216.136.204.65:1023 daddr: 216.136.204.64:111
....

> > I have experimented with alignment in the ethernet frame send code.. it
> > seems that we are trying to send with 2-byte alignment for the bootp case.
> > Fixing it doesn't seem to make much difference. However, I wonder if SRM
> > is doing some length rounding or something because the lengths are not 4 or
> > 8 byte multiples for the bootp queries but are for the working rarp
> > queries.
> > However, even that doesn't make sense because it sometimes works.
> > I'm more suspicious of interactions between the tulip cards when being
> > driven by SRM and the switch at the moment.
>
> OK, another shot in the dark. The first 16 bit NE1000 cards
> had an interesting problem, in that, unless you sent an even
> number of bus transfer units, it would always do an even
> transfer anyway, and the last two bytes would be byte-swapped
> when you went to checksum them, and you'd sum some garbage
> byte instead of the right byte.
>
> The fix for this was to always send an even number of bytes,
> even if the payload was an odd length, to get around the
> problem.

Well, we're sending even byte counts in this case. (as I already said)

> Maybe this is a byte-order problem?

Doesn't explain why it is not making it to the wire. And it doesn't explain
why it works *sometimes*. (about 1 in 50, as I already said).

> If it is, the place to fix it is on the server (again), by
> making it pad packets out to a 2 (or 4 or 8?) byte boundary
> so that the received packets are transferred as a unit, but
> only the payload portion is checked.
>
> This "fix" would only apply if the packets sent on the wire
> were good in both directions (i.e. it's still time for the
> Ethereal trace by an otherwise uninvolved third party machine).
>
> Hope this helps... I'm waving my hands as fast as I can... ;^)

I'd love to see an explanation for why prom_write() doesn't seem to work
for bootp requests.

> -- Terry
>

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5