From owner-freebsd-bugs@FreeBSD.ORG Fri May 3 13:00:02 2013
Return-Path:
Delivered-To: freebsd-bugs@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
    by hub.freebsd.org (Postfix) with ESMTP id 29E57730
    for ; Fri, 3 May 2013 13:00:02 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87])
    by mx1.freebsd.org (Postfix) with ESMTP id 1100D1F3A
    for ; Fri, 3 May 2013 13:00:02 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
    by freefall.freebsd.org (8.14.6/8.14.6) with ESMTP id r43D01U9097972
    for ; Fri, 3 May 2013 13:00:01 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.14.6/8.14.6/Submit) id r43D01Es097971;
    Fri, 3 May 2013 13:00:01 GMT
    (envelope-from gnats)
Resent-Date: Fri, 3 May 2013 13:00:01 GMT
Resent-Message-Id: <201305031300.r43D01Es097971@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Luiz Otavio O Souza
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
    by hub.freebsd.org (Postfix) with ESMTP id 78B316D1
    for ; Fri, 3 May 2013 12:58:20 +0000 (UTC)
    (envelope-from nobody@FreeBSD.org)
Received: from red.freebsd.org (red.freebsd.org [69.147.83.34])
    by mx1.freebsd.org (Postfix) with ESMTP id 6B2901F07
    for ; Fri, 3 May 2013 12:58:20 +0000 (UTC)
Received: from red.freebsd.org (localhost [127.0.0.1])
    by red.freebsd.org (8.14.5/8.14.5) with ESMTP id r43CwKKN023534
    for ; Fri, 3 May 2013 12:58:20 GMT
    (envelope-from nobody@red.freebsd.org)
Received: (from nobody@localhost)
    by red.freebsd.org (8.14.5/8.14.5/Submit) id r43CwK4x023533;
    Fri, 3 May 2013 12:58:20 GMT
    (envelope-from nobody)
Message-Id: <201305031258.r43CwK4x023533@red.freebsd.org>
Date: Fri, 3 May 2013 12:58:20 GMT
From: Luiz Otavio O Souza
To:
    freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Subject: kern/178318: [patch] [arge] if_arge/bootp race under some circunstances
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 03 May 2013 13:00:02 -0000

>Number: 178318
>Category: kern
>Synopsis: [patch] [arge] if_arge/bootp race under some circunstances
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Fri May 03 13:00:01 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator: Luiz Otavio O Souza
>Release: -head r250121
>Organization:
>Environment:
FreeBSD rb433 10.0-CURRENT FreeBSD 10.0-CURRENT #61 r250121M: Fri May 3 09:45:51 BRT 2013 root@devel:/data/rb/rb433/obj/mips.mips/data/rb/rb433/src/sys/RSPRO mips
>Description:
I discovered (the hard way :) that adding some debug output to
arge_init_locked() (like the example below) causes bootp to fail.

Index: mips/atheros/if_arge.c
===================================================================
--- mips/atheros/if_arge.c	(revision 250121)
+++ mips/atheros/if_arge.c	(working copy)
@@ -1006,6 +1006,7 @@

 	ARGE_LOCK_ASSERT(sc);

+printf("%s: called\n", __func__);
 	arge_stop(sc);

 	/* Init circular RX list. */

Bootp will loop for a while with the timeout message until the kernel
panics:

arge0: link state changed to UP
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
arge_init_locked: called
panic: EFBIG
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at kdb_enter+0x4c: lui at,0x8059
db>

After confirming that it really was the printf() that caused the
problem, I started to look at why arge_init() was being called twice
between the timeouts, and why that was making bootp time out and fail
to boot.

A few things contribute to this race. First, arge_init() forces a full
stop->start cycle every time it is called, so with the following debug
output we can understand what happens:

bootpc_call: set netmask 0.0.0.0
arge_init_locked: called
bootpc_call: sosend()
bootpc_call: set netmask 255.0.0.0
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: set netmask 0.0.0.0
arge_init_locked: called
bootpc_call: sosend()
bootpc_call: set netmask 255.0.0.0
arge_init_locked: called
DHCP/BOOTP timeout for server 255.255.255.255
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: soreceive()
bootpc_call: set netmask 0.0.0.0

If arge_init() isn't fast enough while resetting the driver on the
second netmask change, it will miss the bootp response packet.
>How-To-Repeat:
Add something like this to arge_init_locked():

Index: mips/atheros/if_arge.c
===================================================================
--- mips/atheros/if_arge.c	(revision 250121)
+++ mips/atheros/if_arge.c	(working copy)
@@ -1006,6 +1006,7 @@

 	ARGE_LOCK_ASSERT(sc);

+printf("%s: called\n", __func__);
 	arge_stop(sc);

 	/* Init circular RX list. */

Add the following to the RSPRO kernel configuration:

Index: sys/mips/conf/RSPRO
===================================================================
--- sys/mips/conf/RSPRO	(revision 250121)
+++ sys/mips/conf/RSPRO	(working copy)
@@ -28,3 +28,12 @@

 # Boot off of flash
 options 	ROOTDEVNAME=\"ufs:redboot/rootfs.uzip\"
+options 	NFSCL
+options 	NFS_ROOT
+options 	BOOTP
+options 	BOOTP_NFSROOT
+options 	BOOTP_NFSV3
+options 	BOOTP_WIRED_TO=arge0
+options 	BOOTP_COMPAT
+
+

Then try to boot from bootp.
>Fix:
The fix simply refuses to proceed with the driver restart if the
driver is already 'up' and 'running'. There is no need to restart the
driver each time we change or add an IP address or netmask.
Then, if we only proceed when the driver is stopped, we no longer need
to force the stop->start cycle.

The leak that leads to the panic will be fixed in a subsequent PR.

Patch attached with submission follows:

Index: sys/mips/atheros/if_arge.c
===================================================================
--- sys/mips/atheros/if_arge.c	(revision 250121)
+++ sys/mips/atheros/if_arge.c	(working copy)
@@ -1006,7 +1006,8 @@

 	ARGE_LOCK_ASSERT(sc);

-	arge_stop(sc);
+	if ((ifp->if_flags & IFF_UP) && (ifp->if_drv_flags & IFF_DRV_RUNNING))
+		return;

 	/* Init circular RX list. */
 	if (arge_rx_ring_init(sc) != 0) {

>Release-Note:
>Audit-Trail:
>Unformatted:
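
For readers who want to see the effect of the guard in isolation, here
is a minimal userland sketch, not the real if_arge code. Everything
prefixed with sim_ or SIM_ is hypothetical and exists only for this
illustration, and the flag values are arbitrary. It assumes each
address or netmask change ends up calling the init routine once, as the
debug output in the description suggests, and it counts how many
hardware resets each variant performs.

```c
#include <assert.h>

/* Hypothetical stand-ins for IFF_UP / IFF_DRV_RUNNING; values are arbitrary. */
#define SIM_IFF_UP          0x1
#define SIM_IFF_DRV_RUNNING 0x2

/* Hypothetical, heavily reduced model of an interface. */
struct sim_ifnet {
	int if_flags;      /* administrative state */
	int if_drv_flags;  /* driver state */
	int resets;        /* how many times the "hardware" was re-initialised */
};

/* Behaviour before the patch: every call does a full stop->start cycle. */
void
sim_init_always_restart(struct sim_ifnet *ifp)
{
	ifp->if_drv_flags &= ~SIM_IFF_DRV_RUNNING;  /* stands in for arge_stop() */
	ifp->resets++;                              /* ring setup, MAC reset, ... */
	ifp->if_drv_flags |= SIM_IFF_DRV_RUNNING;
}

/* Behaviour after the patch: do nothing if already up and running. */
void
sim_init_guarded(struct sim_ifnet *ifp)
{
	if ((ifp->if_flags & SIM_IFF_UP) &&
	    (ifp->if_drv_flags & SIM_IFF_DRV_RUNNING))
		return;
	ifp->resets++;
	ifp->if_drv_flags |= SIM_IFF_DRV_RUNNING;
}

/*
 * Simulate `calls` consecutive address/netmask changes on an interface
 * that is administratively up but not yet running, and report how many
 * resets the given init routine performed.
 */
int
sim_count_resets(void (*init)(struct sim_ifnet *), int calls)
{
	struct sim_ifnet ifp = { SIM_IFF_UP, 0, 0 };

	for (int i = 0; i < calls; i++)
		init(&ifp);
	return (ifp.resets);
}
```

Calling sim_count_resets() with the unguarded routine yields one reset
per call, while the guarded routine resets only on the first call. In
the real driver each avoided reset is a window in which the bootp reply
can no longer be lost.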