Date: Fri, 25 Apr 2014 22:33:57 GMT
From: Alan Somers <asomers@freebsd.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/189003: Page fault in lacp_req() while the lagg is being destroyed
Message-ID: <201404252233.s3PMXvVM083834@cgiserv.freebsd.org>
Resent-Message-ID: <201404252240.s3PMe1Gh034344@freefall.freebsd.org>

>Number:         189003
>Category:       kern
>Synopsis:       Page fault in lacp_req() while the lagg is being destroyed
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 25 22:40:01 UTC 2014
>Closed-Date:
>Last-Modified:
>Originator:     Alan Somers
>Release:        11.0 CURRENT
>Organization:
Spectra Logic
>Environment:
FreeBSD alans-fbsd-head 11.0-CURRENT FreeBSD 11.0-CURRENT #53 r264920M: Fri Apr 25 13:52:21 MDT 2014 alans@ns1.eng.sldomain.com:/vmpool/obj/usr/home/alans/freebsd/head/sys/GENERIC amd64
>Description:
If you do an "ifconfig -am" in one thread while doing an "ifconfig lagg0
destroy" in another thread, at least two panics may result. One is in
lacp_req(), caused by NULL == lsc.

What happens is that the "ifconfig lagg0 destroy" thread does this:
1) lagg_clone_destroy() acquires LAGG_WLOCK(sc)
2) lagg_clone_destroy() calls lagg_lacp_detach, which calls lacp_detach,
   which sets sc->sc_psc = NULL
3) lagg_clone_destroy() calls LAGG_WUNLOCK(sc)

then the "ifconfig status" thread does this:
1) calls lagg_ioctl(SIOCGLAGG)
2) lagg_ioctl() acquires LAGG_RLOCK(sc, &tracker)
3) lagg_ioctl() calls sc->sc_req, which dereferences to lacp_req
4) lacp_req does *lsc = LACP_SOFTC(sc), which returns NULL
5) lacp_req dereferences lsc, and panics

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe009781d380
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe009781d430
witness_warn() at witness_warn+0x4b5/frame 0xfffffe009781d4f0
trap_pfault() at trap_pfault+0x59/frame 0xfffffe009781d590
trap() at trap+0x4d5/frame 0xfffffe009781d7a0
calltrap() at calltrap+0x8/frame 0xfffffe009781d7a0
--- trap 0xc, rip = 0xffffffff81eb9b44, rsp = 0xfffffe009781d860, rbp = 0xfffffe009781d890 ---
lacp_req() at lacp_req+0x14/frame 0xfffffe009781d890
lagg_ioctl() at lagg_ioctl+0x270/frame 0xfffffe009781d970
ifioctl() at ifioctl+0xbf7/frame 0xfffffe009781da30
kern_ioctl() at kern_ioctl+0x22b/frame 0xfffffe009781da90
sys_ioctl() at sys_ioctl+0x13c/frame 0xfffffe009781dae0
amd64_syscall() at amd64_syscall+0x25a/frame 0xfffffe009781dbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe009781dbf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fa045a, rsp = 0x7fffffffd808, rbp = 0x7fffffffe290 ---
>How-To-Repeat:
First, back out change 253687. That will increase the likelihood of
hitting this panic. Run this script:

#! /usr/local/bin/bash

ifconfig tap0 create
sleep .2
ifconfig tap1 create
sleep .2
ifconfig tap2 create
sleep .2
ifconfig tap0 up
sleep .2
ifconfig tap1 up
sleep .2
ifconfig tap2 up
sleep .2

while true; do
	echo "About to create"
	ifconfig lagg0 create
	#sleep 0.2
	echo "About to up"
	ifconfig lagg0 up laggproto lacp laggport tap0 laggport tap1 laggport tap2 192.0.0.2/24
	sleep 0.2
	echo "About to destroy"
	ifconfig lagg0 destroy
	sleep 0.2
done &

while true; do
	ifconfig -am > /dev/null
done
>Fix:
The purpose of lacp_req is to return LACP property information to
userland when you do "ifconfig lagg0". So I think that it would be OK if
it returned a block full of zeros. This would only happen while the
interface is being destroyed, and userland should be able to deal with
that. So my proposed fix (attached) is to simply check for NULL == lsc
and return early.

Patch attached with submission follows:

Index: sys/net/ieee8023ad_lacp.c
===================================================================
--- sys/net/ieee8023ad_lacp.c	(revision 264920)
+++ sys/net/ieee8023ad_lacp.c	(working copy)
@@ -590,10 +590,20 @@
 {
 	struct lacp_opreq *req = (struct lacp_opreq *)data;
 	struct lacp_softc *lsc = LACP_SOFTC(sc);
-	struct lacp_aggregator *la = lsc->lsc_active_aggregator;
+	struct lacp_aggregator *la;

+	bzero(req, sizeof(struct lacp_opreq));
+
+	/*
+	 * If the LACP softc is NULL, return with the opreq structure full of
+	 * zeros. It is normal for the softc to be NULL while the lagg is
+	 * being destroyed.
+	 */
+	if (NULL == lsc)
+		return;
+
+	la = lsc->lsc_active_aggregator;
 	LACP_LOCK(lsc);
-	bzero(req, sizeof(struct lacp_opreq));
 	if (la != NULL) {
 		req->actor_prio = ntohs(la->la_actor.lip_systemid.lsi_prio);
 		memcpy(&req->actor_mac, &la->la_actor.lip_systemid.lsi_mac,

>Release-Note:
>Audit-Trail:
>Unformatted:
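
For readers who want to try the idea outside the kernel, the guard proposed
in the fix can be sketched as a small user-space C fragment. This is only an
illustration of the "zero the reply first, then return early on NULL"
pattern. The names proto_state, parent_softc, op_request and fill_request
are hypothetical stand-ins for struct lacp_softc, struct lagg_softc, struct
lacp_opreq and lacp_req(); they are not taken from the FreeBSD tree, and no
locking is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct lacp_softc. */
struct proto_state {
	uint16_t active_prio;
};

/* Hypothetical stand-in for struct lagg_softc; psc plays the role of sc_psc. */
struct parent_softc {
	struct proto_state *psc;
};

/* Hypothetical stand-in for struct lacp_opreq. */
struct op_request {
	uint16_t actor_prio;
};

/*
 * Fill *req from the protocol state, or leave it zeroed when that state
 * has already been detached (psc == NULL), as happens during teardown.
 */
void
fill_request(const struct parent_softc *sc, struct op_request *req)
{
	const struct proto_state *ps;

	assert(sc != NULL && req != NULL);

	/* Zero first, so every early return hands back a defined result. */
	memset(req, 0, sizeof(*req));

	ps = sc->psc;
	if (ps == NULL)
		return;

	req->actor_prio = ps->active_prio;
}
```

Calling fill_request() with psc set copies the priority into the request.
Calling it again after setting psc to NULL leaves the request all zeros
instead of faulting, which is the behaviour the proposed patch aims for
while the interface is being destroyed.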