Date: Sun, 3 Nov 1996 23:18:23 -0500 (EST)
From: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
To: dyson@freebsd.org
Cc: ponds!freefall.cdrom.com!freebsd-hackers, ponds!lakes.water.net!rivers
Subject: More info on the daily panics...
Message-ID: <199611040418.XAA01799@lakes.water.net>
> >
> > Everyone -
> >
> >  As an example of something that crashes 2.1.5 (fairly quickly, I
> > may add) and will not likely be determined by 'crashme'.  I offer
> > the following shell script.
> >
>
> If you can digest the problem with your system down to this level, I am sure
> that a fix will be quickly forthcoming.  Try the fix that Bruce just
> posted, and see if your problem goes away.
>
> John
>

Well - I don't really have any ideas... I was trying to suggest that
running 'crashme' for weeks on end probably isn't going to root out
too many, if any, bugs...

The problems I'm seeing appear to all be file-system related.

I've applied Bruce's fix; but it really was for a different problem.

I got another panic today:

	ffs_valloc: dup alloc

that's the one I usually get when doing the newfs on the install (for
2.1.0 and 2.1.5) if I don't alter the newfs parms.

Just to be clear; this is a 2.1.5-Stable kernel with Bruce's ufs_vnops.c
fix.

Here's the gdb -k traceback. This isn't a debuggable kernel (I should
probably fix that), so there's not too much information:

Script started on Sun Nov 3 22:39:49 1996
[ponds.water.net]$ gdb -k kernel.6 vmcore.6
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.13 (i386-unknown-freebsd), Copyright 1994 Free Software Foundation, Inc...(no debugging symbols found)...
IdlePTD 1e4000
current pcb at 1d5484
panic: ffs_valloc: dup alloc
#0  0xf0193c7b in boot ()
(kgdb) where
#0  0xf0193c7b in boot ()
#1  0xf0112b83 in panic ()
#2  0xf0175183 in ffs_valloc ()
#3  0xf01813c6 in ufs_makeinode ()
#4  0xf017ed85 in ufs_create ()
#5  0xf012cb97 in vn_open ()
#6  0xf012a3cf in open ()
#7  0xf019bff6 in syscall ()
#8  0xf01914bb in Xsyscall ()
#9  0x33c0 in ?? ()
#10 0x32cd in ?? ()
#11 0x327d in ?? ()
#12 0x31d7 in ?? ()
#13 0x2f7b in ?? ()
#14 0x2e76 in ?? ()
#15 0x3b87 in ?? ()
#16 0x4854 in ?? ()
#17 0x474e in ?? ()
#18 0x467a in ?? ()
#19 0x1ffe in ?? ()
#20 0x1e6b in ?? ()
#21 0x1d21 in ?? ()
#22 0x16a7 in ?? ()
#23 0x10d3 in ?? ()
(kgdb) quit
[ponds.water.net]$
Script done on Sun Nov 3 22:40:09 1996

As you can see from the ls -l on /var/crash - after I enabled savedump;
I get a crash almost every day (although I went for almost a week and
ran just fine):

[ponds.water.net]$ pwd
/var/crash
[ponds.water.net]$ ls -l
total 131684
-rw-r--r--  1 root  wheel        2 Nov  3 13:21 bounds
-rw-r--r--  1 root  wheel   956177 Oct 23 01:29 kernel.0
-rw-r--r--  1 root  wheel   956177 Oct 24 03:07 kernel.1
-rw-r--r--  1 root  wheel   965207 Oct 30 07:54 kernel.2
-rw-r--r--  1 root  wheel   965207 Oct 31 03:20 kernel.3
-rw-r--r--  1 root  wheel   965207 Nov  1 03:21 kernel.4
-rw-r--r--  1 root  wheel   965207 Nov  2 03:24 kernel.5
-rw-r--r--  1 root  wheel   965207 Nov  3 13:22 kernel.6
-rw-rw-r--  1 root  wheel        5 Jul 16 22:37 minfree
-rw-r--r--  1 root  wheel  8650752 Oct 23 01:29 vmcore.0
-rw-r--r--  1 root  wheel  8650752 Oct 24 03:07 vmcore.1
-rw-r--r--  1 root  wheel  8650752 Oct 30 07:54 vmcore.2
-rw-r--r--  1 root  wheel  8650752 Oct 31 03:20 vmcore.3
-rw-r--r--  1 root  wheel  8650752 Nov  1 03:21 vmcore.4
-rw-r--r--  1 root  wheel  8650752 Nov  2 03:23 vmcore.5
-rw-r--r--  1 root  wheel  8650752 Nov  3 13:22 vmcore.6

(By the way, the panic on Nov 2nd - vmcore.5, was

	panic: ifree: freeing free inode

the most frequent one I see...)

You'll note from the times, these crashes tend to occur at roughly
the same time. I've been scanning the cron logs; the only thing I see
starting up about that time is the 03:15:00 /usr/libexec/atrun and
a 03:15:00 /usr/lib/newsbin/input/newsrun. Both of these run every hour
(well, atrun runs more often than that, of course) - so I'm not yet
sure exactly what command(s) is/are causing the panic.
Almost all the entries in the cron logs look like:

Oct 23 01:15:01 ponds CRON[3494]: (news) CMD (/usr/lib/newsbin/input/newsrun)
Oct 23 01:20:00 ponds CRON[3610]: (root) CMD (/usr/libexec/atrun)
Oct 23 01:30:01 ponds cron[138]: (CRON) STARTUP (fork ok)

although a few have something besides 'newsrun' starting up... it all
seems pretty harmless. Let me add that there are no 'at' entries in
the queue, for anyone on the system...

Perhaps my next step is to build a debuggable kernel and start
debugging...

Let me add that a duplicate alloc, and freeing already free'd inodes,
etc.... tend to point to a problem in the same area. Something fishy
in the inode management in ffs_alloc.c. Of course, a reliable
reproduction of the problem would help toward locating the fix...

The 'freeing free inode' occurs if cg_inosused for the inode being
free'd is already clear. The 'dup alloc' panic occurs if the i_mode
of the inode which was returned from the allocator was already set
(indicating the inode was already allocated.)

The allocator here is ffs_nodealloccg() - which can return a wrong
result if the cg_inosused field is clear...

isclr() (from /usr/include/sys/param.h) is:

	#define isclr(a,i)      (((a)[(i)/NBBY] & (1<<((i)%NBBY))) == 0)

[NBBY is '8' from types.h.]

This seems completely reasonable for accessing a bit from a character
array...

The cg_inosused() macro to access the used inodes table would be
incorrect if CG_MAGIC wasn't set for some reason - then you would
wind up pointing to trash for the inode table...

This is what makes me think that allocating more inodes/blocks in
a file system tickles this problem...

Ideas on possible investigation avenues (i.e. hints on where to look)
would be appreciated :-) But, I'm starting to become familiar with
ffs_valloc.c & the cg structure :-)

	- Dave Rivers -
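
To make the two checks described above concrete, here is a minimal user-space sketch of a per-cylinder-group inode bitmap. It is not the kernel code: the `toy_` names, the `TOY_NINODES` size and the `mode[]` array (standing in for each inode's i_mode) are all hypothetical, and the `TOY_` bit macros merely follow the same pattern as the isclr() definition quoted in the message. It shows how a bit that is wrongly clear in the used-inode map could lead first to a "dup alloc" condition and how clearing an already clear bit corresponds to "freeing free inode".

```c
#include <assert.h>
#include <string.h>

#define TOY_NBBY 8      /* bits per byte, matching NBBY in the message */

/* Bit helpers written in the same style as isclr() in <sys/param.h>. */
#define TOY_SETBIT(a, i) ((a)[(i) / TOY_NBBY] |= 1 << ((i) % TOY_NBBY))
#define TOY_CLRBIT(a, i) ((a)[(i) / TOY_NBBY] &= ~(1 << ((i) % TOY_NBBY)))
#define TOY_ISCLR(a, i)  (((a)[(i) / TOY_NBBY] & (1 << ((i) % TOY_NBBY))) == 0)

#define TOY_NINODES 64  /* hypothetical number of inodes in the toy group */

struct toy_cg {
    unsigned char inosused[TOY_NINODES / TOY_NBBY]; /* one bit per inode */
    unsigned char mode[TOY_NINODES];  /* nonzero means the inode is in use */
};

enum toy_result {
    TOY_OK = 0,
    TOY_DUP_ALLOC = -1,     /* bitmap said free, inode says in use */
    TOY_FREEING_FREE = -2,  /* bitmap bit was already clear */
    TOY_FULL = -3
};

/* Take the first inode whose bit is clear; cross-check it against mode[]. */
static int toy_alloc(struct toy_cg *cg, int *inop)
{
    for (int i = 0; i < TOY_NINODES; i++) {
        if (TOY_ISCLR(cg->inosused, i)) {
            if (cg->mode[i] != 0)
                return TOY_DUP_ALLOC;
            TOY_SETBIT(cg->inosused, i);
            cg->mode[i] = 1;
            *inop = i;
            return TOY_OK;
        }
    }
    return TOY_FULL;
}

/* Release an inode; refuse if the bitmap already shows it as free. */
static int toy_free(struct toy_cg *cg, int ino)
{
    if (TOY_ISCLR(cg->inosused, ino))
        return TOY_FREEING_FREE;
    TOY_CLRBIT(cg->inosused, ino);
    cg->mode[ino] = 0;
    return TOY_OK;
}
```

In this sketch, zeroing a `struct toy_cg`, allocating two inodes and then clearing the bit of one of them by hand (simulating a map that is trash or out of date) makes a later `toy_alloc()` report `TOY_DUP_ALLOC`, while calling `toy_free()` twice on the same inode reports `TOY_FREEING_FREE`. Both symptoms come from the same place, a bitmap that disagrees with the inodes, which is consistent with the suspicion above that the two panics share a cause.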