From: Dag-Erling Smørgrav
To: net@freebsd.org
Subject: Locking issues in CARP in 10.2
Date: Wed, 08 Jun 2016 13:42:27 +0200
Message-ID: <86y46frjoc.fsf@desk.des.no>
List-Id: Networking and TCP/IP with FreeBSD

I have two routers which have been unstable ever since I upgraded them from 10.1 to
10.2. The symptoms were mostly livelocks, where the machine doesn't
freeze completely but is unusable: the network is down, the console
doesn't refresh, and although it seems to react to keyboard input, it
tries but fails to shut down when I press Ctrl+Alt+Del.

I suspected that it was related to CARP, because it never happened if
the router was taken out of the group (i.e. not just in BACKUP state,
but with no CARP addresses configured at all). Although I could not
confirm this 100%, it seemed to be triggered by adding an address to,
or removing one from, an interface. VLAN interfaces on these routers
are created, destroyed and modified dynamically based on data from the
provisioning system, so it's difficult to pinpoint.

However, earlier today, one of the routers panicked right as I was
taking it offline. I assume that it was triggered by one of two things
which happened almost simultaneously: first, the CARP address had been
configured on another router, possibly triggering a state change, and
second, I manually deleted the CARP address from the router that
crashed.

Crash dumps were not enabled, but it looks like the instruction pointer
was in __mtx_lock_sleep(), around line 438 in sys/kern/kern_mutex.c:

    435         v = m->mtx_lock;
    436         if (v != MTX_UNOWNED) {
    437                 owner = (struct thread *)(v & ~MTX_FLAGMASK);
    438                 if (TD_IS_RUNNING(owner)) {
    439                         if (LOCK_LOG_TEST(&m->lock_object, 0))
    440                                 CTR3(KTR_LOCK,
    441                                     "%s: spinning on %p held by %p",

Is this a known issue? If not, has anyone else had similar problems, or
does anyone know of locking issues in the CARP code which might trigger
a livelock or panic when a CARP address is added or removed?

DES
-- 
Dag-Erling Smørgrav - des@des.no