From owner-freebsd-current@FreeBSD.ORG Tue Nov 4 11:20:19 2003
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 587E216A4CE
	for ; Tue, 4 Nov 2003 11:20:19 -0800 (PST)
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 009C443F75
	for ; Tue, 4 Nov 2003 11:20:18 -0800 (PST)
	(envelope-from robert@fledge.watson.org)
Received: from fledge.watson.org (localhost [127.0.0.1])
	by fledge.watson.org (8.12.9p2/8.12.9) with ESMTP id hA4JInMg064767;
	Tue, 4 Nov 2003 14:18:49 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Received: from localhost (robert@localhost)hA4JImUL064764;
	Tue, 4 Nov 2003 14:18:49 -0500 (EST)
	(envelope-from robert@fledge.watson.org)
Date: Tue, 4 Nov 2003 14:18:48 -0500 (EST)
From: Robert Watson
X-Sender: robert@fledge.watson.org
To: Nils Andreas Hakansson
In-Reply-To: <20031104184810.E16804@gandalf.midgard.liu.se>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: Magnus Dahlstedt
cc: freebsd-current@freebsd.org
Subject: Re: Page fault
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Tue, 04 Nov 2003 19:20:19 -0000

On Tue, 4 Nov 2003, Nils Andreas Hakansson wrote:

> I've disabled softupdates because of
> a panic("softdep_move_dependencies: need merge code");

Can't comment on this bit.  Might want to send e-mail to Kirk directly.

> Could someone take a look at this?
>
> pst: timeout mfa=0x0032d5d0 cmd=0x02
> pst: timeout mfa=0x00336390 cmd=0x02
> pst: timeout mfa=0x0034cdd0 cmd=0x02
>
> pst: timeout mfa=0x003b7ab0 cmd=0x02
> pst: timeout mfa=0x00396db0 cmd=0x02
> pst: timeout mfa=0x003a3530 cmd=0x02
> pst: timeout mfa=0x00376890 cmd=0x02

This is your storage device getting unhappy, but I'm not really informed
enough on pst to say how or why.  I don't know if it is because the
requests are bad, or because the controller/chain/device is unable to
service the request.

> ufs_access(): Error retrieving ACL on object (5).
>
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).

This is the UFS ACL code failing closed: it's unable to read the ACLs from
disk due to EIO (error 5, an I/O failure), so it denies access.  This is a
correct response to that scenario.

> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; lapic.id = 00000000
> fault virtual address   = 0xae18c0de
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xc066a566
> stack pointer           = 0x10:0xea3a78cc
> frame pointer           = 0x10:0xea3a7900
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 76932 (smbd)
> kernel: type 12 trap, code=0
> Stopped at      generic_bcopy+0x1a:     repe movsl      (%esi),%es:(%edi)
> db> trace
> generic_bcopy(cf6b0000,1a8,2,c06bd12c,0) at generic_bcopy+0x1a
> ffs_getextattr(ea3a7960,ea3a795c,c05159ad,d0346200,184) at ffs_getextattr+0xe0

This appears to be a bug in UFS2's handling of corrupted EA (extended
attribute) data on disk.  We have some changes in the TrustedBSD
development trees to improve resilience to on-disk corruption, but haven't
merged them yet.
Just to confirm, could you use "gdb -k" on a copy of your kernel with
debugging symbols to see where *ffs_getextattr+0xe0 is?  For me, it turns
up in ffs_vnops.c:1616, which is a variable assignment.  There's a bcopy
not far above there, which seems the likely candidate.

> vn_extattr_get(cb1a8c8c,8,2,c06bd12c,ea3a79d0) at vn_extattr_get+0xaa
> ufs_getacl(ea3a7a14,ea3a7a40,c061560b,ea3a7a14,c06df280) at ufs_getacl+0x99
> ufs_vnoperate(ea3a7a14,c06df280,2,a6,c853cd10) at ufs_vnoperate+0x18
> ufs_access(ea3a7a6c,ea3a7b28,c057dcc9,ea3a7a6c,c0716cc8) at ufs_access+0xca
> ufs_vnoperate(ea3a7a6c,c0716cc8,c0716cc8,c853cd10,cb1a8c8c) at ufs_vnoperate+0x18
> vn_open_cred(ea3a7bdc,ea3a7cdc,1a4,d0bb7800,22) at vn_open_cred+0x359
> vn_open(ea3a7bdc,ea3a7cdc,1a4,22,c3ee0fb4) at vn_open+0x30
> kern_open(c853cd10,bfbff130,0,1,1a4) at kern_open+0x143
> open(c853cd10,ea3a7d14,c06c44d0,3ed,3) at open+0x30
> syscall(bfbf002f,82b002f,bfbf002f,bfbffd70,82b3724) at syscall+0x28f
> Xint0x80_syscall() at Xint0x80_syscall+0x1d
> --- syscall (5, FreeBSD ELF32, open), eip = 0x662b5233, esp = 0xbfbff07c, ebp = 0xbfbff098 ---
> db> show locks
> exclusive sleep mutex Giant r = 0 (0xc07115c0) locked @ /usr/src/sys/vm/vm_fault.c:223

Holding Giant here is good.

So to summarize: This could be the result of a disk read failure.  The UFS
code appears to be intolerant of said failure.  The ACL code failed closed
properly, although perhaps not so usefully.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
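
Illustration only: the two behaviours discussed above (checking an on-disk
length before copying, and denying access when the ACL cannot be read) can
be modelled in a few lines of userland Python.  This is a rough sketch and
not the FreeBSD code.  The record layout (a 4-byte little-endian length
followed by the payload), the names read_attr and access_allowed, and the
idea of an ACL as a list of one-byte user ids are all assumptions made for
the example.

```python
import errno
import struct

# Hypothetical record layout for this sketch: 4-byte little-endian payload
# length, followed by the payload. Real UFS2 extended attributes differ.
HEADER = struct.Struct("<I")


def read_attr(area: bytes, offset: int) -> bytes:
    """Return the payload of the record at `offset`.

    Raises OSError(EIO) instead of copying when the stored length would
    run past the end of the attribute area (e.g. after a bad disk read).
    """
    if offset < 0 or offset + HEADER.size > len(area):
        raise OSError(errno.EIO, "record header outside attribute area")
    (length,) = HEADER.unpack_from(area, offset)
    start = offset + HEADER.size
    if length > len(area) - start:
        raise OSError(errno.EIO, "record length exceeds attribute area")
    return area[start:start + length]


def access_allowed(area: bytes, offset: int, uid: int) -> bool:
    """Fail closed: any error while reading the ACL means access is denied."""
    try:
        acl = read_attr(area, offset)
    except OSError:
        return False
    # In this toy model the ACL payload is just a list of allowed uids.
    return uid in set(acl)
```

With a well-formed record, access_allowed() answers from the ACL contents.
With a record whose length field is garbage, read_attr() raises EIO and
access_allowed() returns False rather than reading out of bounds.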