Message-ID: <20822.19207.597798.216479@hergotha.csail.mit.edu>
Date: Fri, 29 Mar 2013 22:16:39 -0400
From: Garrett Wollman
To: Rick Macklem
Cc: freebsd-net@freebsd.org
Subject: IPsec crash in TCP, also NFS DRC patches (was: Re: Limits on jumbo mbuf cluster allocation)
In-Reply-To: <75232221.3844453.1363146480616.JavaMail.root@erie.cs.uoguelph.ca>
References: <20798.44871.601547.24628@hergotha.csail.mit.edu> <75232221.3844453.1363146480616.JavaMail.root@erie.cs.uoguelph.ca>
List-Id: Networking and TCP/IP with FreeBSD

Rick Macklem said:

> The patch includes a lot of drc2.patch and drc3.patch, so don't try
> and apply it to a patched kernel. Hopefully it will apply cleanly to
> vanilla sources.

> The patch has been minimally tested.

Well, it's taken a long time, but I was finally able to get some
testing done.  The user whose OpenStack cluster jobs had eaten previous
file servers killed this one, too, but not in a way that's attributable
to the NFS code.  He was able to put on a fairly heavy load from about
630 virtual machines in our cluster without the server even getting
particularly busy.

Another cluster job, however, repeatedly panicked the server.
Thankfully, there's a backtrace:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8074ee11
stack pointer           = 0x28:0xffffff9a469ee6d0
frame pointer           = 0x28:0xffffff9a469ee710
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq260: ix0:que 3)
trap number             = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1ce
trap_fatal() at trap_fatal+0x290
trap_pfault() at trap_pfault+0x21c
trap() at trap+0x365
calltrap() at calltrap+0x8
--- trap 0xc, rip = 0xffffffff8074ee11, rsp = 0xffffff9a469ee6d0, rbp = 0xffffff9a469ee710 ---
ipsec_getpolicybysock() at ipsec_getpolicybysock+0x31
ipsec46_in_reject() at ipsec46_in_reject+0x24
ipsec4_in_reject() at ipsec4_in_reject+0x9
tcp_input() at tcp_input+0x498
ip_input() at ip_input+0x1de
netisr_dispatch_src() at netisr_dispatch_src+0x20b
ether_demux() at ether_demux+0x14d
ether_nh_input() at ether_nh_input+0x1f4
netisr_dispatch_src() at netisr_dispatch_src+0x20b
ixgbe_rxeof() at ixgbe_rxeof+0x1cb
ixgbe_msix_que() at ixgbe_msix_que+0xa8
intr_event_execute_handlers() at intr_event_execute_handlers+0x104
ithread_loop() at ithread_loop+0xa6
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff9a469eecf0, rbp = 0 ---

ipsec_setspidx_inpcb() is inlined here; the fault is on the line:

        error = ipsec_setspidx(m, &inp->inp_sp->sp_in->spidx, 1);

where inp->inp_sp is being dereferenced:

0xffffffff8074ee02:     mov    0xf0(%rdx),%rax
0xffffffff8074ee09:     mov    $0x1,%edx
0xffffffff8074ee0e:     mov    %rcx,%r15
0xffffffff8074ee11:     mov    (%rax),%rsi      <-- FAULT!
0xffffffff8074ee14:     add    $0x34,%rsi
0xffffffff8074ee18:     callq  0xffffffff8074e6f0

(inp is in %rdx here).  The fault virtual address is 0x0 and the
faulting instruction reads through %rax, so inp->inp_sp itself was
NULL when it was loaded.

The crash occurs when the clients are making about 200 connections per
second.  (We're not sure if this is by design or if it's a broken NAT
implementation on the OpenStack nodes.  My money is on a broken NAT,
because we were also seeing lots of data being sent on
apparently-closed connections.  The kernel was also logging many
[ECONNABORTED] errors when nfsd tried to accept() new client
connections.  A capture is available if anyone wants to look at this
in more detail, although obviously not from the traffic that actually
caused the crash.)

inp_sp is declared thus:

        struct  inpcbpolicy *inp_sp;    /* (s) for IPSEC */

The locking code "(s)" is supposed to indicate "protected by another
subsystem's locks", but there is no locking at all in
ipsec_getpolicybysock(), so that seems to be a misstatement at best.

I quickly installed a new kernel with IPSEC disabled (we have no need
for it), and it has survived further abuse so far.  The user's test
jobs were winding down at about the same time, though, so I can't say
for certain that it wouldn't have crashed somewhere else.

-GAWollman
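
For anyone who wants to follow the pointer chain in the faulting
expression without kernel sources at hand, here is a minimal userland
sketch.  The struct definitions are hypothetical, cut-down stand-ins
(only the field names and the fact that sp_in sits at offset 0 of
struct inpcbpolicy are taken from the report above), and
inbound_spidx_or_null() and demo() are made-up names.  It only shows
which link in &inp->inp_sp->sp_in->spidx was NULL; a NULL check like
this is not a proposed fix, since it would not address the missing
locking.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, cut-down stand-ins for the kernel structures named in
 * the report.  The real definitions have many more fields and a
 * different layout (for example, spidx is at offset 0x34 there).
 */
struct secpolicyindex { int dir; };
struct secpolicy { struct secpolicyindex spidx; };
struct inpcbpolicy {
	struct secpolicy *sp_in;   /* offset 0, hence "mov (%rax),%rsi" */
	struct secpolicy *sp_out;
};
struct inpcb { struct inpcbpolicy *inp_sp; };

/*
 * Evaluates the same chain as &inp->inp_sp->sp_in->spidx, but returns
 * NULL instead of faulting when any pointer along the way is NULL.
 */
static struct secpolicyindex *
inbound_spidx_or_null(const struct inpcb *inp)
{
	if (inp == NULL || inp->inp_sp == NULL || inp->inp_sp->sp_in == NULL)
		return (NULL);
	return (&inp->inp_sp->sp_in->spidx);
}

/* Usage example: a pcb in the state seen in the crash, and a healthy one. */
static void
demo(void)
{
	struct secpolicy sp = { { 1 } };
	struct inpcbpolicy pol = { &sp, NULL };
	struct inpcb healthy = { &pol };
	struct inpcb crashed = { NULL };   /* inp_sp == NULL, as in the trap */

	assert(inbound_spidx_or_null(&healthy) == &sp.spidx);
	assert(inbound_spidx_or_null(&crashed) == NULL);
}
```

Calling demo() from a test harness exercises both cases.  In the kernel,
the unguarded form of the second case is the read of address 0x0 shown
in the trap output.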