Date:      Fri, 4 Sep 2009 15:31:23 -0700
From:      Chris Cowart <ccowart@rescomp.berkeley.edu>
To:        freebsd-net@freebsd.org
Subject:   Crash in ether_input
Message-ID:  <20090904223123.GD16213@hal.rescomp.berkeley.edu>


Hello,

Starting about a week ago, our primary webserver (then running FreeBSD
7.0) began crashing several times a day, typically during our
higher-load times of day. We have since upgraded to 7.1p7, but continued
to see the frequent crashes.

We are running an apache22 webserver with a lot of perl, logging via
syslog-ng, and using IPSec in transport mode between the webserver and
both the fileserver and logserver. Everything is IPv4.

From uname:

| FreeBSD mug.rescomp.berkeley.edu 7.1-RELEASE-p7 FreeBSD 7.1-RELEASE-p7
| #0: Wed Sep  2 17:56:59 PDT 2009
| root@mug.rescomp.berkeley.edu:/usr/obj/usr/src/sys/GENERIC  amd64

Some information that appears typical across many crashes:

| Unread portion of the kernel message buffer:
|
| Fatal trap 27: stack fault while in kernel mode
| cpuid = 0; apic id = 00
| instruction pointer     = 0x8:0xffffffff80559fb4
| stack pointer           = 0x10:0xffffffffae39faf0
| frame pointer           = 0x10:0xf85ecc37f9239402
| code segment            = base 0x0, limit 0xfffff, type 0x1b
|                         = DPL 0, pres 1, long 1, def32 0, gran 1
| processor eflags        = interrupt enabled, resume, IOPL = 0
| current process         = 27 (em0 taskq)
| trap number             = 27
| panic: stack fault
| cpuid = 0
| Uptime: 43m44s
| Physical memory: 4082 MB
| Dumping 361 MB: 346 330 314 298 282 266 250 234 218 202em0: watchdog timeout -- resetting

| (kgdb) bt
| #0  doadump () at pcpu.h:195
| #1  0x0000000000000004 in ?? ()
| #2  0xffffffff804bd9b9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
| #3  0xffffffff804bddc2 in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:574
| #4  0xffffffff807b9f23 in trap_fatal (frame=0xffffff00012d66e0, eva=Variable "eva" is not available.
| ) at /usr/src/sys/amd64/amd64/trap.c:764
| #5  0xffffffff807baa75 in trap (frame=0xffffffffae39fa40) at /usr/src/sys/amd64/amd64/trap.c:565
| #6  0xffffffff807a042e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:209
| #7  0xffffffff80559fb4 in ether_input (ifp=0xffffff00012bf000, m=0xffffff0003576000) at /usr/src/sys/net/if_ethersubr.c:545
| #8  0xffffffff802bd645 in em_rxeof (adapter=0xffffffff80e4c000, count=99) at /usr/src/sys/dev/e1000/if_em.c:4539
| #9  0xffffffff802be55e in em_handle_rxtx (context=Variable "context" is not available.
| ) at /usr/src/sys/dev/e1000/if_em.c:1702
| #10 0xffffffff804f2afd in taskqueue_run (queue=0xffffff00012c8c80) at /usr/src/sys/kern/subr_taskqueue.c:282
| #11 0xffffffff804f2da6 in taskqueue_thread_loop (arg=Variable "arg" is not available.
| ) at /usr/src/sys/kern/subr_taskqueue.c:401
| #12 0xffffffff8049b2f3 in fork_exit (callout=0xffffffff804f2d40 <taskqueue_thread_loop>, arg=0xffffffff80e50588, frame=0xffffffffae39fc80) at /usr/src/sys/kern/kern_fork.c:804
| #13 0xffffffff807a07fe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:455
| #14 0x0000000000000000 in ?? ()
| #15 0x0000000000000000 in ?? ()
| #16 0x0000000000000001 in ?? ()
[...]

| (kgdb) source debug/gdb6
| (kgdb) frame 7
| #7  0xffffffff80559fb4 in ether_input (ifp=0xffffff00012bf000, m=0xffffff0003576000) at /usr/src/sys/net/if_ethersubr.c:545
| 545             eh = mtod(m, struct ether_header *);
| (kgdb) info locals
| eh = (struct ether_header *) 0xf85ecc37f9239402
| (kgdb) info args
| ifp = (struct ifnet *) 0xffffff00012bf000
| m = (struct mbuf *) 0xffffff0003576000
| (kgdb) mbuf m
| 0xffffff0003576000: 125 bytes ext 0xaf29dcb45d53e701 packet: 125 bytes received via em0
| 0xbb763383e10eda22Cannot access memory at address 0xbb763383e10eda3a
| (kgdb)
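
For anyone not familiar with line 545, here is a minimal userland sketch of
what mtod() does. It assumes a heavily simplified mbuf layout; the toy_*
structs and is_canonical_amd64() are hypothetical names for illustration,
not kernel definitions. mtod() only casts the mbuf's data pointer, so the
odd eh value above suggests the pointer stored in the mbuf was already bad.
The helper assumes 48-bit virtual addresses; by that test the eh value is
not a canonical amd64 address, while m itself is.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_ether_header {
    uint8_t  ether_dhost[6];
    uint8_t  ether_shost[6];
    uint16_t ether_type;
};

struct toy_mbuf {
    char *m_data;   /* start of the packet data */
    int   m_len;    /* bytes of data in this mbuf */
};

/* Same shape as the kernel macro: a cast of m_data, with no validation. */
#define mtod(m, t) ((t)((m)->m_data))

/*
 * Assumes 48-bit virtual addresses: an address is canonical only when
 * bits 63..47 are all zero or all one.
 */
static int
is_canonical_amd64(uint64_t addr)
{
    uint64_t top = addr >> 47;  /* the 17 uppermost bits */

    return top == 0 || top == 0x1ffff;
}
```

Feeding the addresses from the kgdb session into is_canonical_amd64() is a
quick way to tell plausible kernel pointers from overwritten ones.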

If anyone can provide some pointers on other things I can try in order
to get useful data out of these core dumps, I'm open to it.

We did decide to stop mounting NFS, upgrade to syslog-ng3 (which
supports TLS), and revert the webserver back to a GENERIC kernel. Since
booting the GENERIC kernel, the system has been up for nearly 2 days.

Right now, we're logging via TLS to a temporary/testing logserver. That
logserver is one of our default builds with IPSec. It is configured to
forward logs over udp/syslog (via IPSec in transport mode) to our
primary logserver.

Within hours of beginning to pass the production webserver's logs
through this temporary logserver (and thus having its syslog-ng forward
to the primary logserver), the temporary logserver began exhibiting the
same behavior that the webserver was previously showing.

We're totally grasping at straws here, but it's looking like some kind
of bug related to IPSec. Maybe related to long messages? High volume of
messages?

We would love to get this hammered out, so please let me know if there's
any debugging we can perform or patches we can try.

Thanks,

-- 
Chris Cowart
Network Technical Lead
Network & Infrastructure Services, RSSP-IT
UC Berkeley
