From owner-freebsd-bugs@FreeBSD.ORG Sun Sep 10 16:40:22 2006
Message-Id: 
<200609101633.k8AGXj9o055573@freefall.freebsd.org>
Date: Sun, 10 Sep 2006 16:33:45 GMT
From: Martin Blapp
To: FreeBSD-gnats-submit@FreeBSD.org
X-Send-Pr-Version: 3.113
Subject: kern/103101: Locking race in tty.c causes frequent panics on SMP

>Number:         103101
>Category:       kern
>Synopsis:       Locking race in tty.c causes frequent panics on SMP
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Sep 10 16:40:20 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Martin Blapp
>Release:        FreeBSD 6.1-STABLE i386
>Organization:
ImproWare AG
>Environment:
>Description:
Normally a shared lock on proctree_lock is used to protect
tp->t_session. But this lock isn't used consistently everywhere,
which leaves races like this one open. In ttymodem(), proctree_lock
is acquired too late. The patch below fixes this problem.

(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc066355e in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc06638b5 in panic (fmt=0xc0891732 "%s")
    at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc085c6b6 in trap_fatal (frame=0xed6e4ab8, eva=4)
    at /usr/src/sys/i386/i386/trap.c:836
#4  0xc085c3bf in trap_pfault (frame=0xed6e4ab8, usermode=0, eva=4)
    at /usr/src/sys/i386/i386/trap.c:744
#5  0xc085bfb5 in trap (frame=
      {tf_fs = 8, tf_es = 40, tf_ds = -1063714776, tf_edi = -1064042304, tf_esi = 0, tf_ebp = -311538944, tf_isp = -311538972, tf_ebx = -967615488, tf_edx = -1063651212, tf_ecx = -941099136, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1066845359, tf_cs = 32, tf_eflags = 66194, tf_esp = -967615488, tf_ss = 0})
    at /usr/src/sys/i386/i386/trap.c:434
#6  0xc0848bea in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0693b51 in ttymodem (tp=0xc6535c00, flag=-1063651212)
    at /usr/src/sys/kern/tty.c:1659
#8  0xc0698362 in ptcclose (dev=0x0, flags=3, fmt=8192, td=0xc7e7f780)
    at linedisc.h:136
#9  0xc0638a6f in giant_close (dev=0xcb3c1100, fflag=3, devtype=8192, td=0xc7e7f780)
    at /usr/src/sys/kern/kern_conf.c:266
#10 0xc06162bf in devfs_close (ap=0xed6e4b7c)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:287
#11 0xc086dc1c in VOP_CLOSE_APV (vop=0x0, a=0xc099f874) at vnode_if.c:426
#12 0xc06c87e2 in vn_close (vp=0xc9cdf660, flags=3, file_cred=0x0, td=0xc7e7f780)
    at vnode_if.h:227
#13 0xc06c974a in vn_closefile (fp=0xc6fc5438, td=0xc7e7f780)
    at /usr/src/sys/kern/vfs_vnops.c:865
#14 0xc06162e7 in devfs_close_f (fp=0xc6fc5438, td=0xc7e7f780)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:297
#15 0xc0642cdc in fdrop_locked (fp=0xc6fc5438, td=0xc7e7f780) at file.h:295
#16 0xc0642c29 in fdrop (fp=0xc6fc5438, td=0xc7e7f780)
    at /usr/src/sys/kern/kern_descrip.c:2122
#17 0xc06411c7 in closef (fp=0xc6fc5438, td=0xc7e7f780)
    at /usr/src/sys/kern/kern_descrip.c:1942
#18 0xc063e329 in close (td=0xc7e7f780, uap=0x0)
    at /usr/src/sys/kern/kern_descrip.c:1007

In ttymodem() the current
code correctly checks that tp->t_session isn't NULL, but takes the
necessary lock (proctree_lock) only afterwards. By the time it accesses
a member of tp->t_session, the pointer may already have become NULL
-> panic().

The only way to solve this for now is to protect t_session with
exclusive locks at the places where we modify it, in tty_close() and
ttioctl(), and with shared locks at the places where we first test it
and then access a member of it.

I know that this isn't a perfect solution; the tty subsystem definitely
needs proper locking and someone has to do it. But in the meantime, we
need a stable SMP FreeBSD.

I've made a more complete patch available at:

http://antispam.imp.ch/patches/patch-tty.t_pgrp.diff
>How-To-Repeat:
I haven't found a way to quickly reproduce this bug. We have seen these
panics on all SMP servers we run with FreeBSD 5/6, mostly under load
after 2-3 days of uptime. An active serial console seems to trigger the
bug more often, but does not seem to be necessary.
>Fix:

--- sys/kern/tty.c	Sun Nov  6 17:09:32 2005
+++ sys/kern/tty.c	Sat Jul  8 08:29:07 2006
@@ -1654,8 +1668,8 @@
 		    !ISSET(tp->t_cflag, CLOCAL)) {
 			SET(tp->t_state, TS_ZOMBIE);
 			CLR(tp->t_state, TS_CONNECTED);
+			sx_slock(&proctree_lock);	/* XXX: protect t_session */
 			if (tp->t_session) {
-				sx_slock(&proctree_lock);
 				if (tp->t_session->s_leader) {
 					struct proc *p;
 
@@ -1664,8 +1678,8 @@
 					psignal(p, SIGHUP);
 					PROC_UNLOCK(p);
 				}
-				sx_sunlock(&proctree_lock);
 			}
+			sx_sunlock(&proctree_lock);
 			ttyflush(tp, FREAD | FWRITE);
 			return (0);
 		}
>Release-Note:
>Audit-Trail:
>Unformatted:
 >System: SMP kernel on SMP systems. The bug is present in RELENG_5,
 RELENG_6 and in HEAD. During my tests, I've seen the panic on
 FreeBSD 5 as well as on FreeBSD 6.