From owner-freebsd-bugs@FreeBSD.ORG Thu Jan 10 13:50:02 2008
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E4FEE16A420
	for ; Thu, 10 Jan 2008 13:50:01 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 9F9BA13C46E
	for ; Thu, 10 Jan 2008 13:50:01 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m0ADo106081712
	for ; Thu, 10 Jan 2008 13:50:01 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m0ADo1Gb081711;
	Thu, 10 Jan 2008 13:50:01 GMT
	(envelope-from gnats)
Resent-Date: Thu, 10 Jan 2008 13:50:01 GMT
Resent-Message-Id: <200801101350.m0ADo1Gb081711@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Sebastien Petit
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 010D516A420
	for ; Thu, 10 Jan 2008 13:46:37 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
	by mx1.freebsd.org (Postfix) with ESMTP id BF45413C448
	for ; Thu, 10 Jan 2008 13:46:36 +0000 (UTC)
	(envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
	by www.freebsd.org (8.14.2/8.14.2) with ESMTP id m0ADjThO094210
	for ; Thu, 10 Jan 2008 13:45:29 GMT
	(envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
	by www.freebsd.org (8.14.2/8.14.1/Submit) id m0ADjTCt094209;
	Thu, 10 Jan 2008 13:45:29 GMT
	(envelope-from nobody)
Message-Id: <200801101345.m0ADjTCt094209@www.freebsd.org>
Date: Thu, 10 Jan 2008 13:45:29 GMT
From: Sebastien Petit
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: kern/119530: Kqueue/Kevent causes fatal trap 12
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
X-List-Received-Date: Thu, 10 Jan 2008 13:50:02 -0000

>Number:         119530
>Category:       kern
>Synopsis:       Kqueue/Kevent causes fatal trap 12
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jan 10 13:50:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Sebastien Petit
>Release:        FreeBSD-6.2-RELEASE, FreeBSD-6.2-STABLE, FreeBSD-6.3-PRERELEASE
>Organization:
Kewego
>Environment:
FreeBSD proxy0.XXXXXXXXXXXX 6.3-PRERELEASE FreeBSD 6.3-PRERELEASE #0: Thu Jan 10 00:13:27 CET 2008 root@build0.XXXXXXXXXXXX:/usr/src-6.2-STABLE/sys/i386/compile/PE2950-i386 i386
>Description:
There is probably a race condition in kqueue when an EVFILT_TIMER event set with the EV_ONESHOT flag expires. In some cases the kernel crashes with a supervisor read fault in callout_reset(), apparently because its first argument (the struct callout *) is NULL when it should not be.

The application that triggers this bug creates many EVFILT_TIMER events (about 300-400), each with a 300-second timeout. When an EVFILT_TIMER event expires, a new one is created with a 300-second timeout.
This application causes the crash detailed below:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc051b4bc
stack pointer           = 0x28:0xe6ea5c68
frame pointer           = 0x28:0xe6ea5c78
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 14 (swi4: clock)
[thread pid 14 tid 100002 ]
Stopped at      callout_reset+0xc4:     testb   $0x4,0x18(%esi)
db> where
Tracing pid 14 tid 100002 td 0xc8326c00
callout_reset(0,1,c04ed97c,c85cf4c8) at callout_reset+0xc4
filt_timerexpire(c85cf4c8) at filt_timerexpire+0xa8
softclock(0) at softclock+0x2eb
ithread_execute_handlers(c8325430,c8375b80) at ithread_execute_handlers+0x125
ithread_loop(c83068c0,e6ea5d38) at ithread_loop+0x55
fork_exit(c04f6cb8,c83068c0,e6ea5d38) at fork_exit+0x71
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe6ea5d6c, ebp = 0 ---
db>

An ugly patch is attached: it tests for NULL pointers before calling callout_reset() and prints a kernel message if a NULL is detected. It should avoid the crash, but a proper patch is still needed to keep the NULL pointer from occurring in the first place.
>How-To-Repeat:
Run an application that registers many EVFILT_TIMER events with the EV_ONESHOT flag and reads the same kqueue from multiple threads. libthr is used. The crash seems to appear on SMP servers only.

Is Kqueue/Kevent not thread-safe?
>Fix:
Patch for /usr/src/sys/kern/kern_event.c (filt_timerexpire function) to show what is happening and to avoid calling callout_reset() with a NULL struct callout *, which causes the fatal trap.

 static void
 filt_timerexpire(void *knx)
 {
 	struct knote *kn = knx;
 	struct callout *calloutp;

+	if (! knx) {
+		printf("knx is NULL. cannot expire the timer\n");
+		return;
+	}
 	kn->kn_data++;
 	KNOTE_ACTIVATE(kn, 0);	/* XXX - handle locking */

 	if ((kn->kn_flags & EV_ONESHOT) != EV_ONESHOT) {
 		calloutp = (struct callout *)kn->kn_hook;
+		if (calloutp)
+			callout_reset(calloutp, timertoticks(kn->kn_sdata),
+			    filt_timerexpire, kn);
+		else
+			printf("warning: calloutp is already freed, aborting\n");
 	}
 }

I don't know whether this patch corrects the problem completely. I have patched my systems and will see whether the race condition happens again.
>Release-Note:
>Audit-Trail:
>Unformatted:
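
The control flow of the guarded filt_timerexpire() in the patch can be exercised outside the kernel with a small userland model. This is only a sketch under stated assumptions: every name prefixed with model_ or MODEL_ is hypothetical and is not a kernel definition, the re-arm is represented by a counter instead of a real callout_reset() call, and the model says nothing about locking, so it does not demonstrate or fix the suspected race itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; NOT the real layouts. */
#define MODEL_EV_ONESHOT 0x1   /* arbitrary bit chosen for this model only */

struct model_callout {
	int rearm_count;       /* incremented where the kernel would re-arm */
};

struct model_knote {
	int   kn_flags;
	long  kn_data;
	void *kn_hook;         /* a model_callout, or NULL once it is gone */
};

enum model_result {
	MODEL_NO_KNOTE,        /* knx was NULL: nothing touched */
	MODEL_ONESHOT,         /* one-shot timer: counted, not re-armed */
	MODEL_REARMED,         /* periodic timer: counted and re-armed */
	MODEL_HOOK_GONE        /* periodic timer whose callout is NULL */
};

/* Mirrors the order of checks in the patched function. */
enum model_result
model_timer_expire(struct model_knote *kn)
{
	struct model_callout *calloutp;

	if (kn == NULL)
		return (MODEL_NO_KNOTE);

	kn->kn_data++;         /* the expiry is always counted */

	if ((kn->kn_flags & MODEL_EV_ONESHOT) == MODEL_EV_ONESHOT)
		return (MODEL_ONESHOT);

	calloutp = kn->kn_hook;
	if (calloutp == NULL)
		return (MODEL_HOOK_GONE);  /* the case that trapped unpatched */

	calloutp->rearm_count++;
	return (MODEL_REARMED);
}

/* Small walk through the four outcomes. */
void
model_demo(void)
{
	struct model_callout c = { 0 };
	struct model_knote kn = { 0, 0, &c };

	assert(model_timer_expire(&kn) == MODEL_REARMED);
	kn.kn_hook = NULL;
	assert(model_timer_expire(&kn) == MODEL_HOOK_GONE);
	kn.kn_flags = MODEL_EV_ONESHOT;
	assert(model_timer_expire(&kn) == MODEL_ONESHOT);
	assert(model_timer_expire(NULL) == MODEL_NO_KNOTE);
}
```

In this model a timer carrying the one-shot flag never reaches the re-arm branch, so the trace above, which shows callout_reset() being entered, implies that the flags seen at expiry time did not have EV_ONESHOT set.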