From owner-freebsd-bugs@FreeBSD.ORG Thu Mar 8 23:00:27 2012 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A5C8F1065673 for ; Thu, 8 Mar 2012 23:00:27 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 777938FC0C for ; Thu, 8 Mar 2012 23:00:27 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q28N0Ris097189 for ; Thu, 8 Mar 2012 23:00:27 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q28N0R12097188; Thu, 8 Mar 2012 23:00:27 GMT (envelope-from gnats) Resent-Date: Thu, 8 Mar 2012 23:00:27 GMT Resent-Message-Id: <201203082300.q28N0R12097188@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Eric van Gyzen Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 37133106566C for ; Thu, 8 Mar 2012 22:52:14 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from red.freebsd.org (red.freebsd.org [IPv6:2001:4f8:fff6::22]) by mx1.freebsd.org (Postfix) with ESMTP id 0C7258FC0C for ; Thu, 8 Mar 2012 22:52:14 +0000 (UTC) Received: from red.freebsd.org (localhost [127.0.0.1]) by red.freebsd.org (8.14.4/8.14.4) with ESMTP id q28MqDYw073866 for ; Thu, 8 Mar 2012 22:52:13 GMT (envelope-from nobody@red.freebsd.org) Received: (from nobody@localhost) by red.freebsd.org (8.14.4/8.14.4/Submit) id q28MqDlN073865; Thu, 8 Mar 2012 22:52:13 GMT (envelope-from nobody) Message-Id: <201203082252.q28MqDlN073865@red.freebsd.org> Date: Thu, 8 Mar 2012 22:52:13 GMT From: Eric van Gyzen To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-3.1 Cc: Subject: kern/165863: [panic] [patch] in_lltable_prefix_free() races with lla_lookup() and arptimer() X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 23:00:27 -0000 >Number: 165863 >Category: kern >Synopsis: [panic] [patch] in_lltable_prefix_free() races with lla_lookup() and arptimer() >Confidential: no >Severity: critical >Priority: medium >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Thu Mar 08 23:00:27 UTC 2012 >Closed-Date: >Last-Modified: >Originator: Eric van Gyzen >Release: 8.2-RELEASE >Organization: Dell Compellent >Environment: FreeBSD 8.2-RELEASE amd64 >Description: in_lltable_prefix_free() does not acquire the if_afdata_lock, so it can free the llentry while another thread is in lla_lookup(), such as during packet processing. While working on a fix, it quickly became apparent that in_lltable_prefix_free() does not safely drain the callout, so it races with arptimer(). If arptimer() has already tested callout_active() before in_lltable_prefix_free() calls callout_drain(), arptimer() will have freed the llentry before callout_drain() returns. I can reliably reproduce the first problem (lla_lookup) on 8.1-RELEASE and 8.2-RELEASE. I cannot reproduce it on 9.0-RELEASE or head, but since the relevant code has not changed, I suspect it still exists. I have not earnestly tried to reproduce the second problem (arptimer). The same problem exists in the IPv6 code, but that code is never called. >How-To-Repeat: In a second process, repeatedly and rapidly call SIOCSIFADDR and SIOCDIFADDR on the /only/ interface address by which that neighbor is reached. This drives in_lltable_prefix_free(). In one process, flood ping an IPv4 neighbor. This drives lla_lookup() via the processing of ARP replies, packets queued on la_hold, and echo request packets. This reliably panics in 10-20 seconds. >Fix: With the attached patch, my systems survive the test for over 30 minutes. Patch attached with submission follows: --- sys/netinet/in.c.orig 2012-03-08 12:57:29.000000000 -0600 +++ sys/netinet/in.c 2012-03-08 12:14:26.000000000 -0600 @@ -1351,21 +1351,39 @@ struct llentry *lle, *next; register int i; +restart: + IF_AFDATA_WLOCK(llt->llt_ifp); for (i=0; i < LLTBL_HASHTBL_SIZE; i++) { LIST_FOREACH_SAFE(lle, &llt->lle_head[i], lle_next, next) { - if (IN_ARE_MASKED_ADDR_EQUAL((struct sockaddr_in *)L3_ADDR(lle), + if (IN_ARE_MASKED_ADDR_EQUAL(satosin(L3_ADDR(lle)), pfx, msk)) { - int canceled; + int canceled = 0; - canceled = callout_drain(&lle->la_timer); LLE_WLOCK(lle); - if (canceled) + if (!callout_active(&lle->la_timer) || + (canceled = callout_stop(&lle->la_timer))) { + if (canceled) + LLE_REMREF(lle); + llentry_free(lle); + } else { + /* + * The callout is in progress. + * Since we deactivated it, it won't + * do anything. Drop its reference. + * Drop our locks to drain it, which + * requires restarting this function. + */ LLE_REMREF(lle); - llentry_free(lle); + LLE_WUNLOCK(lle); + IF_AFDATA_WUNLOCK(llt->llt_ifp); + (void) callout_drain(&lle->la_timer); + goto restart; + } } } } + IF_AFDATA_WUNLOCK(llt->llt_ifp); } >Release-Note: >Audit-Trail: >Unformatted: