Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 8 Mar 2012 22:52:13 GMT
From:      Eric van Gyzen <eric_van_gyzen@dell.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/165863: [panic] [patch] in_lltable_prefix_free() races with lla_lookup() and arptimer()
Message-ID:  <201203082252.q28MqDlN073865@red.freebsd.org>
Resent-Message-ID: <201203082300.q28N0R12097188@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         165863
>Category:       kern
>Synopsis:       [panic] [patch] in_lltable_prefix_free() races with lla_lookup() and arptimer()
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Mar 08 23:00:27 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Eric van Gyzen
>Release:        8.2-RELEASE
>Organization:
Dell Compellent
>Environment:
FreeBSD 8.2-RELEASE amd64
>Description:
in_lltable_prefix_free() does not acquire the if_afdata_lock, so it can free the llentry while another thread is in lla_lookup(), such as during packet processing.

While working on a fix, it quickly became apparent that in_lltable_prefix_free() does not safely drain the callout, so it races with arptimer().  If arptimer() has already tested callout_active() before in_lltable_prefix_free() calls callout_drain(), arptimer() will have freed the llentry before callout_drain() returns.

I can reliably reproduce the first problem (lla_lookup) on 8.1-RELEASE and 8.2-RELEASE.  I cannot reproduce it on 9.0-RELEASE or head, but since the relevant code has not changed, I suspect it still exists.

I have not earnestly tried to reproduce the second problem (arptimer).

The same problem exists in the IPv6 code, but that code is never called.
>How-To-Repeat:
In a second process, repeatedly and rapidly call SIOCSIFADDR and SIOCDIFADDR on the /only/ interface address by which that neighbor is reached.  This drives in_lltable_prefix_free().

In one process, flood ping an IPv4 neighbor.  This drives lla_lookup()
via the processing of ARP replies, packets queued on la_hold, and
echo request packets.

This reliably panics in 10-20 seconds.
>Fix:
With the attached patch, my systems survive the test for over 30 minutes.

Patch attached with submission follows:

--- sys/netinet/in.c.orig	2012-03-08 12:57:29.000000000 -0600
+++ sys/netinet/in.c	2012-03-08 12:14:26.000000000 -0600
@@ -1351,21 +1351,39 @@
 	struct llentry *lle, *next;
 	register int i;
 
+restart:
+	IF_AFDATA_WLOCK(llt->llt_ifp);
 	for (i=0; i < LLTBL_HASHTBL_SIZE; i++) {
 		LIST_FOREACH_SAFE(lle, &llt->lle_head[i], lle_next, next) {
 
-			if (IN_ARE_MASKED_ADDR_EQUAL((struct sockaddr_in *)L3_ADDR(lle), 
+			if (IN_ARE_MASKED_ADDR_EQUAL(satosin(L3_ADDR(lle)),
 						     pfx, msk)) {
-				int canceled;
+				int canceled = 0;
 
-				canceled = callout_drain(&lle->la_timer);
 				LLE_WLOCK(lle);
-				if (canceled)
+				if (!callout_active(&lle->la_timer) ||
+				    (canceled = callout_stop(&lle->la_timer))) {
+					if (canceled)
+						LLE_REMREF(lle);
+					llentry_free(lle);
+				} else {
+					/*
+					 * The callout is in progress.
+					 * Since we deactivated it, it won't
+					 * do anything.  Drop its reference.
+					 * Drop our locks to drain it, which
+					 * requires restarting this function.
+					 */
 					LLE_REMREF(lle);
-				llentry_free(lle);
+					LLE_WUNLOCK(lle);
+					IF_AFDATA_WUNLOCK(llt->llt_ifp);
+					(void) callout_drain(&lle->la_timer);
+					goto restart;
+				}
 			}
 		}
 	}
+	IF_AFDATA_WUNLOCK(llt->llt_ifp);
 }
 
 


>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201203082252.q28MqDlN073865>