Date:      Thu, 24 Jul 2008 07:59:25 GMT
From:      Damien Deville <da.deville@gmail.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   bin/125922: Deadlock in arp
Message-ID:  <200807240759.m6O7xPf9039063@www.freebsd.org>
Resent-Message-ID: <200807240800.m6O80FlL009920@freefall.freebsd.org>


>Number:         125922
>Category:       bin
>Synopsis:       Deadlock in arp
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jul 24 08:00:15 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Damien Deville
>Release:        RELENG_6_3
>Organization:
NETASQ
>Environment:
FreeBSD barbar.netasq.com 6.3-RELEASE FreeBSD 6.3-RELEASE #3: Thu Jan 24 16:38:07 CET 2008     pb@barbar.netasq.com:/usr/obj/usr/src/sys/SMP  i386
>Description:
When two arp processes run concurrently, one doing 'arp -a -d' and the other doing 'arp -a -d' or 'arp -S', one of the two blocks in the rtmsg function on a read syscall.

Both processes compete for access to the ARP table. One process successfully nukes all entries of the ARP table; the other one eventually blocks in the rtmsg function on the read while executing an RTM_GET or RTM_DELETE command. By instrumenting arp we noticed that this happens when both processes access the same entry.

It seems that the blocked process reads all messages available on the PF_ROUTE socket, does not find the one it is looking for, and ends up blocked on the PF_ROUTE socket because no more messages are available after reading an entry with rtm->rtm_pid == 0 and rtm-
>How-To-Repeat:
Here is a way to reproduce it:
- Add a large number of entries to the ARP table (around 255 works best).
- Launch two 'arp -a -d' processes in parallel ('arp -a -d & arp -a -d &').

It can also be reproduced by running 'arp -a -d' in parallel with 'arp -S', but that is harder to trigger.
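
The two-process race described above could also be driven from a small C harness instead of the shell. This is only a sketch: the names spawn_cmd, child_failed and run_pair are hypothetical, and it assumes arp is in PATH and that the caller has the privileges needed to delete ARP entries.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork and exec argv[0]; returns the child pid or -1. */
static pid_t
spawn_cmd(char *const argv[])
{
	pid_t pid = fork();

	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);	/* only reached if exec failed */
	}
	return (pid);
}

/* Wait for one child; returns 1 unless it exited with status 0. */
static int
child_failed(pid_t pid)
{
	int status;

	if (pid < 0 || waitpid(pid, &status, 0) < 0)
		return (1);
	return (!(WIFEXITED(status) && WEXITSTATUS(status) == 0));
}

/*
 * Start both commands before waiting on either, so that they run
 * concurrently. Returns how many of the two did not exit cleanly.
 * If one child deadlocks, waitpid() never returns and the harness
 * hangs as well, which is the symptom being looked for.
 */
int
run_pair(char *const first[], char *const second[])
{
	pid_t a = spawn_cmd(first);
	pid_t b = spawn_cmd(second);

	return (child_failed(a) + child_failed(b));
}
```

On an affected system, calling run_pair() twice over with the argument vector { "arp", "-a", "-d", NULL } corresponds to the shell one-liner above; a call that never returns suggests one arp is stuck in rtmsg.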

>Fix:
A patch for arp.c against FreeBSD 6.3 is provided. It prevents the deadlock but might not be the right solution to the issue.


Patch attached with submission follows:

--- arp.c.orig	2006-10-21 07:43:29.000000000 +0200
+++ arp.c	2008-07-23 10:41:44.000000000 +0200
@@ -706,17 +706,28 @@
 	l = rtm->rtm_msglen;
 	rtm->rtm_seq = ++seq;
 	rtm->rtm_type = cmd;
 	if ((rlen = write(s, (char *)&m_rtmsg, l)) < 0) {
 		if (errno != ESRCH || cmd != RTM_DELETE) {
 			warn("writing to routing socket");
 			return (NULL);
 		}
 	}
 	do {
 		l = read(s, (char *)&m_rtmsg, sizeof(m_rtmsg));
+		if ( l > 0 && rtm->rtm_seq == 0 && rtm->rtm_pid == 0 )
+			return (NULL); /* something weird happened */
 	} while (l > 0 && (rtm->rtm_seq != seq || rtm->rtm_pid != pid));
 	if (l < 0)
 		warn("read from routing socket");
 	return (rtm);
 }
 

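For illustration, the decision the patched loop takes for each message read from the routing socket can be isolated as a pure function. This is only a sketch: the enum and the name classify_reply are hypothetical and do not appear in arp.c; the conditions are meant to mirror those in the patch above.

```c
#include <sys/types.h>

/* Hypothetical names; arp.c has no such enum or function. */
enum reply_action {
	REPLY_STOP,	/* read() returned <= 0: leave the loop */
	REPLY_ABORT,	/* seq == 0 and pid == 0: give up, as the patch does */
	REPLY_MATCH,	/* reply to our own request */
	REPLY_SKIP	/* some other message: read again */
};

/*
 * l is the return value of read(); msg_seq and msg_pid are the rtm_seq
 * and rtm_pid of the message just read; seq and pid identify our request.
 */
enum reply_action
classify_reply(ssize_t l, int msg_seq, pid_t msg_pid, int seq, pid_t pid)
{
	if (l <= 0)
		return (REPLY_STOP);
	if (msg_seq == 0 && msg_pid == 0)
		return (REPLY_ABORT);
	if (msg_seq == seq && msg_pid == pid)
		return (REPLY_MATCH);
	return (REPLY_SKIP);
}
```

Without the REPLY_ABORT case, a message with both fields zero falls into REPLY_SKIP, and the next read() blocks if nothing else is queued on the socket.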

>Release-Note:
>Audit-Trail:
>Unformatted:
 >rtm_seq == 0.
 
 Here is a backtrace of the blocked arp on FreeBSD 7.0
 
 (gdb) bt
 #0  0x28158f81 in read () from /lib/libc.so.7
 #1  0x08049091 in rtmsg ()
 #2  0x08049b44 in delete ()
 #3  0x0804a1fd in nuke_entry ()
 #4  0x08049a77 in search ()
 #5  0x08049e75 in main ()
 
 I can reproduce this on FreeBSD 4.11, 6.2, 6.3, and 7.0.
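
Since the backtrace shows the process parked in read(), one way to keep it from hanging indefinitely, whatever the root cause turns out to be, might be to bound the wait with poll(). This is a different technique from the submitted patch and has not been tried against arp; read_with_timeout is a hypothetical helper, sketched here for any file descriptor.

```c
#include <sys/types.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then read from it. Returns the read() result, or -1 with errno set to
 * ETIMEDOUT if nothing arrived in time.
 */
ssize_t
read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd;
	int rc;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	/* Retry if a signal interrupts the wait. */
	do {
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc < 0)
		return (-1);
	if (rc == 0) {
		errno = ETIMEDOUT;
		return (-1);
	}
	return (read(fd, buf, len));
}
```

A caller in the style of rtmsg could treat the ETIMEDOUT case like any other read failure and return NULL. Note that the retry after EINTR restarts the full timeout, so the bound is approximate.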
 


