From owner-freebsd-current@FreeBSD.ORG Sun Jan 16 09:27:44 2005 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2A52F16A4CE for ; Sun, 16 Jan 2005 09:27:44 +0000 (GMT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id A9A3C43D3F for ; Sun, 16 Jan 2005 09:27:43 +0000 (GMT) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.1/8.13.1) with ESMTP id j0G9Raci037056; Sun, 16 Jan 2005 01:27:40 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200501160927.j0G9Raci037056@gw.catspoiler.org> Date: Sun, 16 Jan 2005 01:27:36 -0800 (PST) From: Don Lewis To: phk@phk.freebsd.dk In-Reply-To: <200501152329.j0FNTqrf035987@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8BIT cc: current@FreeBSD.org Subject: Re: reproduced: ffs_blkfree: freeing free block X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Jan 2005 09:27:44 -0000 On 15 Jan, Don Lewis wrote: > On 15 Jan, Poul-Henning Kamp wrote: >> >> Quite by accident my test-machine here can now reliably reproduce >> the dreaded "panic: ffs_blkfree: freeing free block" in a few minutes >> of time. >> >> It is very interesting that the location of the actual error is a >> very narrow stripe of the filesystem: >> >> dev = ad8, block = 13456368, fs = /hex >> dev = ad8, block = 13455888, fs = /hex >> dev = ad8, block = 13454688, fs = /hex >> dev = ad8, block = 13455040, fs = /hex >> dev = ad8, block = 13455200, fs = /hex >> dev = ad8, block = 13455880, fs = /hex >> >> The application I'm running at the time adds/modifies records in a >> db(3) hash file and nothing much besides. >> >> Before I start implementing complete I/O traces and spend days >> groveling over UFS/FFS on-disk bits, are there anybody who has >> suggestions for things I should try to enable/disable to narrow >> this down ? > > I'm reasonably certain that the on-disk bits are ok. I'm seeing it in > the case where a freshly written file is being re-written. If I unmount > the file system after the file is initially created, the file system > fsck's clean, and if I then remount the file system, I am unable to > reproduce the problem. Fsdb lists the block causing the panic as one > that was initially allocated to the file. It looks like the in-core > block bitmap is getting corrupted. > > In my openoffice build example, the following final step is sufficient > to trigger the panic: > jot -b x 174113 > \ > /mnt/usr/ports/editors/openoffice-1.1/work/config_office/configure > Merely truncating the file with "cat /dev/null" does not appear to be > sufficient to trigger the panic. If I run the openoffice-1.1 make targets from extract through patch-autotools, then truncate and rewrite teh configure script, I can reproduce the panic: fsync /mnt/usr/ports/editors/openoffice-1.1/work/config_office/configure ls -li /mnt/usr/ports/editors/openoffice-1.1/work/config_office/configure 1274809 -rwxr-xr-x 1 root wheel 348277 Aug 27 03:03 /mnt/usr/ports/editors/openoffice-1.1/work/config_office/configure fsdb -r /dev/da0s2a ** /dev/da0s2a (NO WRITE) Examining file system `/dev/da0s2a' Last Mounted on /mnt current inode: directory I=2 MODE=40755 SIZE=512 MTIME=Feb 14 11:55:43 2004 [0 nsec] CTIME=Feb 14 11:55:43 2004 [0 nsec] ATIME=Jan 14 18:24:05 2005 [0 nsec] OWNER=root GRP=wheel LINKCNT=7 FLAGS=0 BLKCNT=4 GEN=53557245 fsdb (inum: 2)> inode 1274809 current inode: regular file I=1274809 MODE=100755 SIZE=348277 MTIME=Aug 27 03:03:28 2004 [0 nsec] CTIME=Jan 16 00:47:00 2005 [0 nsec] ATIME=Dec 31 16:00:00 1969 [0 nsec] OWNER=root GRP=wheel LINKCNT=1 FLAGS=0 BLKCNT=2e0 GEN=791e742c fsdb (inum: 1274809)> blocks Blocks for inode 1274809: Direct blocks: 5089312, 5089320, 5089328, 5089336, 5089344, 5089352, 5089360, 5089368, 5089424, 5089432, 5089440, 5089448 Indirect blocks: 5089456, 5089464, 5089472, 5089480, 5089488, 5089496, 5089504, 5089512, 5089520, 5089528, fsdb (inum: 1274809)> q cat /dev/null > /mnt/usr/ports/editors/openoffice-1.1/work/config_office/configure sleep 120 jot -b x 174113 > /mnt/usr/ports/editors/openoffice-1.1/work/config_office/configure sleep 120 The system panics during this final sleep: dev = da0s2a, block = 5089472, fs = /mnt panic: ffs_blkfree: freeing free block cpuid = 0 KDB: enter: panic The block number in question is one that was initially allocated to the configure file. Because of the sleep command, I would think that the free block bitmap should have been updated to mark this block free sometime between when the configure file was truncated by "cat /dev/null" and when the configure file was re-written by jot, so it looks like block 5089472 is being freed twice. Without the final jot command to rewrite the file, I can wait an arbitrarily long time and the system will not panic. GNU gdb 6.1.1 [FreeBSD] Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-marcel-freebsd". doadump () at pcpu.h:159 (kgdb) where #0 doadump () at pcpu.h:159 #1 0xc04661e6 in db_fncall (dummy1=-1065580187, dummy2=0, dummy3=-455755064, dummy4=0xe4d5ba60 "\224؛صن") at /usr/src/sys/ddb/db_command.c:531 #2 0xc046657c in db_command_loop () at /usr/src/sys/ddb/db_command.c:349 #3 0xc0467fa8 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221 #4 0xc063baee in kdb_trap (type=3, code=0, tf=0xe4d5bb88) at /usr/src/sys/kern/subr_kdb.c:421 #5 0xc07dba6c in trap (frame= {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = 0, tf_esi = 1, tf_ebp = -455754808, tf_isp = -455754828, tf_ebx = -455754764, tf_edx = -1065055680, tf_ecx = -1064242016, tf_eax = -1065046973, tf_trapno = 3, tf_err = 0, tf_eip = -1067206872, tf_cs = 8, tf_eflags = 646, tf_esp = -455754776, tf_ss = -1067304789}) at /usr/src/sys/i386/i386/trap.c:573 #6 0xc07ca3da in calltrap () at /usr/src/sys/i386/i386/exception.s:139 #7 0x00000018 in ?? () #8 0x00000010 in ?? () #9 0x00000010 in ?? () #10 0x00000000 in ?? () #11 0x00000001 in ?? () #12 0xe4d5bbc8 in ?? () #13 0xe4d5bbb4 in ?? () #14 0xe4d5bbf4 in ?? () #15 0xc0848a40 in ?? () #16 0xc090f4a0 in shutdown_howto () #17 0xc084ac43 in ?? () #18 0x00000003 in ?? () #19 0x00000000 in ?? () #20 0xc063b728 in kdb_enter (msg=0x0) at cpufunc.h:56 #21 0xc06238ab in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:538 #22 0xc0753f58 in ffs_blkfree (fs=0xc2861800, devvp=0xc298e114, bno=5089472, size=16384, inum=1274809) at /usr/src/sys/ufs/ffs/ffs_alloc.c:1791 #23 0xc0762dae in handle_workitem_freefrag (freefrag=0xc63ae540) at /usr/src/sys/ufs/ffs/ffs_softdep.c:1588 #24 0xc0765a38 in process_worklist_item (matchmnt=0x0, flags=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:768 #25 0xc076a3aa in softdep_process_worklist (matchmnt=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:623 #26 0xc0677f12 in sched_sync () at /usr/src/sys/kern/vfs_subr.c:1595 #27 0xc0610ba2 in fork_exit (callout=0xc0677b60 , arg=0x0, frame=0xe4d5bd48) at /usr/src/sys/kern/kern_fork.c:787 #28 0xc07ca43c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208 Note that the inum parameter to ffs_blkfree() matches the inode number of the configure file. I would not expect softupdates to be freeing blocks for this inode at the time when the machine paniced. At the time of the panic, there is one more item on the worklist, another struct freefrag that points to block #5089464. (kgdb) frame 25 #25 0xc076a3aa in softdep_process_worklist (matchmnt=0x0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:623 623 if ((cnt = process_worklist_item(matchmnt, 0)) == -1) (kgdb) print num_on_worklist $2 = 2 (kgdb) frame 24 #24 0xc0765a38 in process_worklist_item (matchmnt=0x0, flags=0) at /usr/src/sys/ufs/ffs/ffs_softdep.c:768 768 handle_workitem_freefrag(WK_FREEFRAG(wk)); print *(struct freefrag *)wk->wk_list->le_next $8 = {ff_list = {wk_list = {le_next = 0xc57cd9e0, le_prev = 0xc096a018}, wk_type = 7, wk_state = 32768}, ff_mnt = 0xc26a9800, ff_blkno = 5089464, ff_fragsize = 16384, ff_inum = 1274809}