From owner-freebsd-hackers Mon Nov 4 17:27:50 1996 Return-Path: owner-hackers Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id RAA19237 for hackers-outgoing; Mon, 4 Nov 1996 17:27:50 -0800 (PST) Received: from mail.crl.com (mail.crl.com [165.113.1.22]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id RAA19231 for ; Mon, 4 Nov 1996 17:27:47 -0800 (PST) Received: from parkplace.cet.co.jp by mail.crl.com with SMTP id AA19395 (5.65c/IDA-1.5 for ); Mon, 4 Nov 1996 18:28:54 -0700 Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.8.2/CET-v2.1) with SMTP id BAA00924; Tue, 5 Nov 1996 01:21:12 GMT Date: Tue, 5 Nov 1996 10:21:12 +0900 (JST) From: Michael Hancock To: Thomas David Rivers Cc: FreeBSD Hackers Subject: Re: More info on the daily panics... In-Reply-To: <199611040418.XAA01799@lakes.water.net> Message-Id: Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-hackers@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk I think you're on the right track. Continue looking at the initialization code of the inode allocation code and also start looking at deallocation. /sys/ufs/ufs/ufs_inode.c:ufs_reclaim() and /sys/kern/vfs_subr.c:vclean(). It could also be a nasty race condition. Also, please provide some stats on your system including the size of RAM, Size of Disk, %Free Disk Space, %Free Inodes, the number of fictitious cylinders, etc. Regards, Mike Hancock btw, I am by no means an fs expert. I am interested in file systems though. On Sun, 3 Nov 1996, Thomas David Rivers wrote: > I've applied Bruce's fix; but it really was for a different problem. > > I got another panic today: > > ffs_valloc: dup alloc > > that's the one I usually get when doing the newfs on the install > (for 2.1.0 and 2.1.5) if I don't alter the newfs parms. Just to > be clear; this is a 2.1.5-Stable kernel with Bruce's ufs_vnops.c > fix. > > Here's the gdb -k traceback. This isn't a debugable kernel (I should > probably fix that), so there's not too much information: > > > Script started on Sun Nov 3 22:39:49 1996 > [ponds.water.net]$ gdb -k kernel.6 vmcore.6 > GDB is free software and you are welcome to distribute copies of it > under certain conditions; type "show copying" to see the conditions. > There is absolutely no warranty for GDB; type "show warranty" for details. > GDB 4.13 (i386-unknown-freebsd), > Copyright 1994 Free Software Foundation, Inc...(no debugging symbols found)... > IdlePTD 1e4000 > current pcb at 1d5484 > panic: ffs_valloc: dup alloc > #0 0xf0193c7b in boot () > (kgdb) where > #0 0xf0193c7b in boot () > #1 0xf0112b83 in panic () > #2 0xf0175183 in ffs_valloc () > #3 0xf01813c6 in ufs_makeinode () > #4 0xf017ed85 in ufs_create () > #5 0xf012cb97 in vn_open () > #6 0xf012a3cf in open () > #7 0xf019bff6 in syscall () > #8 0xf01914bb in Xsyscall () > #9 0x33c0 in ?? () > #10 0x32cd in ?? () > #11 0x327d in ?? () > #12 0x31d7 in ?? () > #13 0x2f7b in ?? () > #14 0x2e76 in ?? () > #15 0x3b87 in ?? () > #16 0x4854 in ?? () > #17 0x474e in ?? () > #18 0x467a in ?? () > #19 0x1ffe in ?? () > #20 0x1e6b in ?? () > #21 0x1d21 in ?? () > #22 0x16a7 in ?? () > #23 0x10d3 in ?? () > (kgdb) quit > [ponds.water.net]$ > Script done on Sun Nov 3 22:40:09 1996 > > As you can see from the ls -l on /var/crash - after I enabled > savedump; I get a crash almost every day (although I went for > almost a week and ran just fine): > > [ponds.water.net]$ pwd > /var/crash > [ponds.water.net]$ ls -l > total 131684 > -rw-r--r-- 1 root wheel 2 Nov 3 13:21 bounds > -rw-r--r-- 1 root wheel 956177 Oct 23 01:29 kernel.0 > -rw-r--r-- 1 root wheel 956177 Oct 24 03:07 kernel.1 > -rw-r--r-- 1 root wheel 965207 Oct 30 07:54 kernel.2 > -rw-r--r-- 1 root wheel 965207 Oct 31 03:20 kernel.3 > -rw-r--r-- 1 root wheel 965207 Nov 1 03:21 kernel.4 > -rw-r--r-- 1 root wheel 965207 Nov 2 03:24 kernel.5 > -rw-r--r-- 1 root wheel 965207 Nov 3 13:22 kernel.6 > -rw-rw-r-- 1 root wheel 5 Jul 16 22:37 minfree > -rw-r--r-- 1 root wheel 8650752 Oct 23 01:29 vmcore.0 > -rw-r--r-- 1 root wheel 8650752 Oct 24 03:07 vmcore.1 > -rw-r--r-- 1 root wheel 8650752 Oct 30 07:54 vmcore.2 > -rw-r--r-- 1 root wheel 8650752 Oct 31 03:20 vmcore.3 > -rw-r--r-- 1 root wheel 8650752 Nov 1 03:21 vmcore.4 > -rw-r--r-- 1 root wheel 8650752 Nov 2 03:23 vmcore.5 > -rw-r--r-- 1 root wheel 8650752 Nov 3 13:22 vmcore.6 > > > (By the way, the panic on Nov 2nd - vmcore.5, was > panic: ifree: freeing free inode > the most frequent one I see...) > > You'll note from the times, these crashes tend to occur at > roughly the same time. I've been scanning the cron logs; the > only thing I see starting up about that time is the 03:15:00 > /usr/libexec/atrun and a 03:15:00 /usr/lib/newsbin/input/newsrun. > Both of these run every hour (well, atrun runs more often than that, > of course) - so I'm not yet sure exactly what command(s) is/are > causing the panic. > > Almost all the entries in the cron logs look like: > > Oct 23 01:15:01 ponds CRON[3494]: (news) CMD (/usr/lib/newsbin/input/newsrun) > Oct 23 01:20:00 ponds CRON[3610]: (root) CMD (/usr/libexec/atrun) > Oct 23 01:30:01 ponds cron[138]: (CRON) STARTUP (fork ok) > > although a few have something besides 'newsrun' starting up... it all > seems pretty harmless. > > Let me add that there are no 'at' entries in the queue, for anyone > on the system... > > Perhaps my next step is to build a debuggable kernel and start > debugging... > > Let me add that a duplicate alloc, and freeing already free'd inodes, > etc.... tend to point to a problem in the same area. Something > fishy the the inode management in ffs_alloc.c. > > Of course, a reliable reproduction of the problem would help toward > locating the fix... > > The 'freeing free inode' occurs if cg_inosused for the inode being > free'd is already clear. The 'dup alloc' panic occurs if the i_mode > of the inode which was returned from the allocator was already set > (indicating the inode was already allocated.) The allocator here > is ffs_nodealloccg() - which can return a wrong result if the cg_inosused > field is clear... > > isclr() (from /usr/include/sys/param.h) is: > > #define isclr(a,i) (((a)[(i)/NBBY] & (1<<((i)%NBBY))) == 0) > > [NBBY is '8' from types.h.] This seems completely reasonable for > accessing a bit from a character array... > > The cg_inosused() macro to access the used inodes table would be > incorrect if CG_MAGIC wasn't set for some reason - then you would > wind up pointing to trash for the inode table... > > This is what makes me think that allocating more inodes/blocks > in a file system tickles this problem... > > Ideas on possible investigation avenues (i.e. hints on where to > look) would be appreciated :-) But, I'm starting to become > familiar with ffs_valloc.c & the cg structure :-) > > - Dave Rivers -