Date: Tue, 9 Jan 2007 12:50:51 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Sven Willenberger <sven@dmv.com>
Cc: stable@FreeBSD.org, freebsd-amd64@FreeBSD.org
Subject: Re: Panic in 6.2-PRERELEASE with bge on amd64
Message-ID: <20070109124826.M79616@delplex.bde.org>
In-Reply-To: <1168271935.23549.10.camel@lanshark.dmv.com>
References: <1168211205.22629.6.camel@lanshark.dmv.com> <20070108154433.C75042@delplex.bde.org> <1168271935.23549.10.camel@lanshark.dmv.com>

On Mon, 8 Jan 2007, Sven Willenberger wrote:

> On Mon, 2007-01-08 at 16:06 +1100, Bruce Evans wrote:
>> On Sun, 7 Jan 2007, Sven Willenberger wrote:
>>> The short and dirty of the dump:
>>> ...
>>> --- trap 0xc, rip = 0xffffffff801d5f17, rsp = 0xffffffffb371ab50, rbp = 0xffffffffb371aba0 ---
>>> bge_rxeof() at bge_rxeof+0x3b7
>>
>> What is the instruction here?
>
> I will do my best to ferret out the information you need. For the
> bge_rxeof() at bge_rxeof+0x3b7 line, the instruction is:
>
> 0xffffffff801d5f17 <bge_rxeof+951>: mov %r15,0x28(%r14)
> ...
>> Looks like a null pointer panic anyway. I guess the instruction is
>> movl to/from 0x28(%reg) where %reg is a null pointer.
>>
>
> from the above lines, apparently %r14 is null then.

Yes. It's a bit surprising that the access is a write.

>>> ...
>>> #8 0xffffffff801db818 in bge_intr (xsc=0x0) at /usr/src/sys/dev/bge/if_bge.c:2707
>>
>> What is the statement here? It presumably follows a null pointer and only
>> the expression for the pointer is interesting. xsc is already null but
>> that is probably a bug in gdb, or the result of excessive optimization.
>> Compiling kernels with -O2 has little effect except to break debugging.
>
> the block of code from if_bge.c:
>
> 2705		if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
> 2706			/* Check RX return ring producer/consumer. */
> 2707			bge_rxeof(sc);
> 2708
> 2709			/* Check TX ring producer/consumer. */
> 2710			bge_txeof(sc);
> 2711		}

Oops. I should have asked for the statement in bge_rxeof().

> By default -O2 is passed to CC (I don't use any custom make flags other
> than and only define CPUTYPE in my /etc/make.conf).

-O2 is unfortunately the default for COPTFLAGS for most arches in
sys/conf/kern.pre.mk. All of my machines and most FreeBSD cluster
machines override this default in /etc/make.conf. With the override
overridden for RELENG_6 amd64, gcc inlines bge_rxeof(), so your
environment must be a little different to get even the above info.

I think gdb can show the correct line numbers but not the call frames
(since there is no call). ddb and the kernel stack trace can only show
the call frames for actual calls. With -O1, I couldn't find any
instruction similar to the mov to the null pointer + 28. 28 is a popular
offset in mbufs.

> The short of it is that this interface sees pretty much non-stop traffic
> as this is a mailserver (final destination) and is constantly being
> delivered to (direct disk access) and mail being retrieved (remote
> machine(s) with nfs mounted mail spools). If a momentary down of the
> interface is enough to completely panic the driver and then the kernel,
> this hardly seems "robust" if, in fact, this is what is happening. So
> the question arises as to what would be causing the down/up of the
> interface; I could start looking at the cable, the switch it's connected
> to and ... any other ideas? (I don't have watchdog enabled or anything
> like that, for example).

I don't think down/up can occur in normal operation, since it takes
ioctls or a watchdog timeout to do it. Maybe some ioctls other than a
full down/up can cause problems... bge_init() is called for the
following ioctls:
- mtu changes
- some near down/up (possibly only these)

Suspend/resume and of course detach/attach do much the same things as
down/up.

BTW, I added some sysctls and found it annoying to have to do down/up to
make the sysctls take effect. Sysctls in several other NIC drivers
require the same, since doing a full reinitialization is easiest. Since
I am tuning using sysctls, I got used to doing down/up too much.

Similarly for the mtu ioctl. I think a full reinitialization is used for
mtu changes mainly in cases where the change switches support for jumbo
buffers on or off. Then there is a lot of buffer reallocation to be
done, and interfaces have to be stopped to ensure that the buffers being
deallocated are not in use, etc.

Bruce