From owner-freebsd-current@FreeBSD.ORG Thu Feb 8 21:16:24 2007
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id D9D7016A406;
    Thu, 8 Feb 2007 21:16:24 +0000 (UTC)
    (envelope-from frode@nordahl.net)
Received: from smtp1.powertech.no (smtp1.powertech.no [195.159.0.145])
    by mx1.freebsd.org (Postfix) with ESMTP id 72D1A13C461;
    Thu, 8 Feb 2007 21:16:24 +0000 (UTC)
    (envelope-from frode@nordahl.net)
Received: from [195.159.148.126] (dhcp7.xu.nordahl.net [195.159.148.126])
    by smtp1.powertech.no (Postfix) with ESMTP id 44FB48B20;
    Thu, 8 Feb 2007 21:49:13 +0100 (CET)
In-Reply-To: <20061127092146.GA69556@deviant.kiev.zoral.com.ua>
References: <456950AF.3090308@sh.cvut.cz>
    <20061127092146.GA69556@deviant.kiev.zoral.com.ua>
Mime-Version: 1.0 (Apple Message framework v752.3)
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-Id: <84F23118-A6C3-44F8-B3FD-AE21C50D0EF9@nordahl.net>
Content-Transfer-Encoding: 7bit
From: Frode Nordahl
Date: Thu, 8 Feb 2007 21:49:23 +0100
To: Kostik Belousov
X-Mailer: Apple Mail (2.752.3)
Cc: freebsd-stable@freebsd.org, Václav Haisman, tegge@freebsd.org,
    bde@freebsd.org, freebsd-current@freebsd.org
Subject: Re: kqueue LOR
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Thu, 08 Feb 2007 21:16:24 -0000

On 27. nov. 2006, at 10.21, Kostik Belousov wrote:

> On Sun, Nov 26, 2006 at 09:30:39AM +0100, Václav Haisman wrote:
>> Hi,
>> the attached lor.txt contains LOR I got this yesterday. It is FreeBSD 6.1
>> with relatively recent kernel, from last week or so.
>>
>> --
>> VH
>
>> +lock order reversal:
>> + 1st 0xc537f300 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1547
>> + 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:138
>> +KDB: stack backtrace:
>> +kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at kdb_backtrace+0x2f
>> +witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at witness_checkorder+0x5fe
>> +_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at _mtx_lock_flags+0x32
>> +ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at ufs_itimes+0x6c
>> +ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at ufs_getattr+0x20
>> +VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at VOP_GETATTR_APV+0x3a
>> +filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75
>> +knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
>> +VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at VOP_WRITE_APV+0x148
>> +vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at vn_write+0x201
>> +dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,ffffffff,...) at dofilewrite+0x84
>> +kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65
>> +write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
>> +syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
>> +Xint0x80_syscall() at Xint0x80_syscall+0x1f
>> +--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = 0xbfbfea1c, ebp = 0xbfbfea48 ---
>
> Thank you for the report. The LOR is caused by my commit into
> sys/ufs/ufs/ufs_vnops.c, rev. 1.280.

While debugging a problem I have with 6.2-RELEASE on one of my servers,
I saw this LOR. After being up for a short while the server freezes and
stops responding to the serial console, network or keyboard. I can't
even get to DDB by sending BREAK on the serial console.

Enabling INVARIANTS, INVARIANT_SUPPORT, WITNESS and WITNESS_SKIPSPIN did
not give any more information about the freeze, other than printing the
LOR now and then.

The LOR I am getting is exactly the same, except that the calls are made
to writev instead of write.

> What application you run that triggers the LOR ? Patch below is one
> possible approach to fixing it.

I am seeing this on a front-end MX server. I can trigger it by running
"tail -f /var/log/maillog"; the LOR is printed before tail prints any
output. After triggering it once, it will not trigger regularly again
until I have waited for some time. Waiting 180 seconds seems to be
enough to make it happen every time, but it can be triggered earlier.
My maillog grows by about 976K during that time.

Could this LOR have something to do with the system freeze I am
experiencing? Should I try the patch in your mail from November 27 or
the one from December 13? Or has some other fix emerged since then?

--
Frode Nordahl