From: Dylan Cochran
To: Jaime Bozza
Cc: "freebsd-stable@freebsd.org"
Date: Fri, 23 Oct 2009 16:28:59 -0400
Subject: Re: Possible scheduler (SCHED_ULE) bug?

On 10/23/09, Jaime Bozza wrote:
> I believe I found a problem with the ULE scheduler - At least the fact that
> there is a problem, but I'm not sure where to go from here.  The system
> locks all processes, but doesn't panic, so I have no output to give.
>
> I was able to duplicate this on three different machines and solved it by
> switching to the scheduler to 4BSD.
>
> Here's the environment:
>
> FreeBSD 7.2 i386, installed from bootonly ISO, Custom install, minimal, no
> other changes other than setting timezone, changing root password, and
> turning on sshd (allowing root and password connection).
>
> Running portsnap (fetch, then extract) to get latest ports tree.
>
> From ports, make installs of lang/php5 and www/lighttpd, using defaults for
> all ports installed.
>
> Modified lighttpd.conf for PHP (attached diff), created a short script
> called uploadfile.php (attached).  File was installed at
> /usr/local/www/data/uploadfile.php
>
> Start lighttpd (lighttpd_enable="YES" in rc.conf,
> /usr/local/etc/rc.d/lighttpd start), connect and run script.
>
> As long as I upload a file less than 64K, everything works fine.  If I try
> to upload something larger than 64K, system no longer responds.  Console
> prompt at login will allow me to enter username/password, but nothing
> happens after that.  Console prompt logged in will allow me to type a single
> line, but if I press enter, nothing after that.
>
> No errors get written anywhere - console, logs, etc.
>
> I'm at a loss of what to do next.  Can anyone give me ideas of what else I
> can do?

Superficially, this seems identical to a deadlock I reported for 7.1-RC1.

Would you mind compiling a kernel with these options:

options DDB
options KDB
options SW_WATCHDOG
options DEBUG_VFS_LOCKS

then add the following to /etc/rc.conf:

watchdogd_enable="YES"
watchdogd_flags="-e 'ls -al /etc'"

This should force a panic when the lockup happens again, which will
drop to a debugger. Please check the backtrace, and tell me if the
call stack is the same as this one (between the --- interrupt and
--- syscall sections):

KDB: stack backtrace:
db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
Xtimerint() at Xtimerint+0x1f
--- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
syscall(e66e0d38) at syscall+0x335
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp = 0xbfbfc7cc, ebp = 0xbfbfe848 ---
KDB: enter: watchdog timeout

You can type 'reboot' to reboot the machine (in my case, panic would
not work, so a useful dump wasn't in the cards).
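
If it is easier to compare the traces mechanically than by eye, here is a
rough sketch in Python. It assumes the debugger output has been copied into
a text string (for example from a serial console log). The helper names
frames_between_markers and same_call_stack are made up for illustration and
are not part of any FreeBSD tool. It only compares function names, not
offsets or arguments, since those will differ between kernels.

```python
import re

# Matches the function name at the start of a DDB frame line such as
# "kern_sendfile(c41a7690,...) at kern_sendfile+0x90d".
FRAME_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\(")

# Function names from the trace in this message, between the
# "--- interrupt" and "--- syscall" marker lines.
REFERENCE = [
    "kern_sendfile",
    "do_sendfile",
    "sendfile",
    "syscall",
    "Xint0x80_syscall",
]


def frames_between_markers(trace_text):
    """Return function names found after '--- interrupt' and before '--- syscall'."""
    names = []
    inside = False
    for raw in trace_text.splitlines():
        line = raw.strip()
        if line.startswith("--- interrupt"):
            inside = True
            continue
        if line.startswith("--- syscall"):
            break
        if inside:
            match = FRAME_RE.match(line)
            if match:
                names.append(match.group(1))
    return names


def same_call_stack(trace_text):
    """True if the frames between the two markers match the reference list."""
    return frames_between_markers(trace_text) == REFERENCE
```

Calling same_call_stack() on the saved text of your own backtrace gives a
quick yes/no answer; frames_between_markers() shows which functions were
actually found if the answer is no.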