From owner-cvs-src Mon Mar 17 11:09:42 2003
Delivered-To: cvs-src@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id B7BF237B40B; Mon, 17 Mar 2003 11:09:35 -0800 (PST)
Received: from beastie.mckusick.com (beastie.mckusick.com [209.31.233.184])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id E1AF043F75; Mon, 17 Mar 2003 11:09:34 -0800 (PST)
	(envelope-from mckusick@beastie.mckusick.com)
Received: from beastie.mckusick.com (localhost [127.0.0.1])
	by beastie.mckusick.com (8.12.3/8.12.3) with ESMTP id h2HJ9PFL013291;
	Mon, 17 Mar 2003 11:09:25 -0800 (PST)
	(envelope-from mckusick@beastie.mckusick.com)
Message-Id: <200303171909.h2HJ9PFL013291@beastie.mckusick.com>
To: dwmalone@freebsd.org, El Vampiro
Subject: kern/42277
Cc: "Evgueni V. Gavrilov", Mike Makonnen, src-committers@freebsd.org,
	cvs-src@freebsd.org, cvs-all@freebsd.org
X-URL: http://WWW.McKusick.COM/
Reply-To: Kirk McKusick
Date: Mon, 17 Mar 2003 11:09:25 -0800
From: Kirk McKusick
Sender: owner-cvs-src@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

Date: Mon, 24 Feb 2003 03:04:47 -0500
From: Mike Makonnen
To: Kirk McKusick
Cc: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/ufs/ffs ffs_softdep.c

On Sun, 23 Feb 2003 23:28:41 -0800 (PST)
Kirk McKusick wrote:

> mckusick    2003/02/23 23:28:41 PST
>
>   Modified files:
>     sys/ufs/ffs          ffs_softdep.c
>   Log:
>   When removing the last item from a non-empty worklist, the worklist
>   tail pointer must be updated.
>

This looks like it might solve kern/42277. Is that correct?

--
Mike Makonnen  | GPG-KEY: http://www.identd.net/~mtm/mtm.asc
mtm@identd.net | Fingerprint: D228 1A6F C64E 120A A1C9 A3AA DAE1 E2AF DBCC 68B9

I had hoped that the above patch would solve the problem, but alas
it did not (though that fix should be MFC'ed into -stable as it is
a valid fix).
Further investigation with the help of Evgueni V. Gavrilov has
determined that there is a memory corruption problem in the 128-byte
bucket space. Applying the patch below makes the soft updates panics
stop. It does so by creating an "unused" field in the inodedep
structure at the 64-byte offset. The printf showing that the "unused"
field has changed does occur, but at least soft updates continues to
work. Evgueni V. Gavrilov and I have been unable to track down the
memory-corrupting culprit, but perhaps someone else on this list can
help. I am not inclined to check in this patch to -stable as it fixes
a symptom rather than a bug. I am hoping that the true cause of the
corruption can be found and fixed instead.

	Kirk McKusick

=-=-=-=-=

To: "Evgueni V. Gavrilov"
Subject: Re: kern/42277: crash #4
In-Reply-To: Your message of "Sat, 01 Mar 2003 13:25:20 +0600."
	<20030301072520.GA52366@rusunix.org>
Date: Sun, 02 Mar 2003 15:14:15 -0800
From: Kirk McKusick

Date: Sat, 1 Mar 2003 13:25:20 +0600
From: "Evgueni V. Gavrilov"
To: Kirk McKusick
Subject: Re: kern/42277: crash #4
X-ASK-Info: Whitelist match

ehlo

I got one more panic.
I started upload of vmcore.4.bz2
The kernel and the sources are the same.

--
http://aquatique.rusunix.org
http://rusunix.org

Thanks for your help and patience. I would like to say that I found
the bug, but alas I have not. But I have determined that all four
of your crashes are caused by the same bug. In each case the short
at a 64-byte offset from the beginning of an inodedep is being
decremented. As this offset is the top half of a pointer, the next
time the pointer (now with a value of 0xffff0000) is dereferenced,
the kernel panics. The different crashes show the corruption
happening at different times in the life of the inodedep, usually
after it has been in existence for several seconds, but occasionally
sooner. This sort of trashing most commonly occurs when a previous
user of dynamic memory continues using something that they have
freed.
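
[Editorial illustration, not part of the original message.] The
arithmetic of the corruption described above can be checked with a
small user-space sketch. It assumes a 32-bit word and assumes the
pointer was NULL before the stray decrement, which would account for
the 0xffff0000 value. The helper names are hypothetical and the code
models only the arithmetic, not the kernel's memory layout.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decrement only the upper 16 bits of a 32-bit word, leaving the lower
 * 16 bits untouched. This models a stray 16-bit decrement landing on
 * the top half of a 32-bit field. Hypothetical helper for illustration.
 */
static uint32_t
decrement_upper_half(uint32_t word)
{
	uint16_t upper = (uint16_t)(word >> 16);
	uint16_t lower = (uint16_t)(word & 0xffffu);

	upper = (uint16_t)(upper - 1);	/* wraps from 0x0000 to 0xffff */
	return (((uint32_t)upper << 16) | lower);
}

/* Check the two values that appear in the message. */
static void
check_quoted_values(void)
{
	/* A NULL 32-bit pointer (assumed starting value) becomes 0xffff0000. */
	assert(decrement_upper_half(0x00000000u) == 0xffff0000u);
	/* The 0x12345678 marker set by the patch becomes 0x12335678. */
	assert(decrement_upper_half(0x12345678u) == 0x12335678u);
}
```

Under these assumptions the same decrement that turns a NULL pointer
into 0xffff0000 turns the 0x12345678 marker into 0x12335678, which is
the value the message expects to see on the console.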
So, I would like to test out this theory on your system. Could you
please apply the patch below? It creates an unused field in the
inodedep structure at the location that is getting trashed, sets
it to a known value and then verifies that value has not changed
when it is done with the inodedep (printing out a warning if it
has changed). If my theory is correct, then the panics will stop
and you will get the console message "free_inodedep: trashed memory
0x12335678". If it is a soft updates code problem, then the same
panics will persist. Either way, we will have narrowed the scope
of possible problems.

	Kirk McKusick

=-=-=-=-=

*** softdep.h	Thu Jun 22 12:27:42 2000
--- softdep.h.new	Sun Mar  2 14:35:26 2003
***************
*** 243,248 ****
--- 243,249 ----
  	off_t	id_savedsize;		/* file size saved during rollback */
  	struct	workhead id_pendinghd;	/* entries awaiting directory write */
  	struct	workhead id_bufwait;	/* operations after inode written */
+ 	int	id_unused;
  	struct	workhead id_inowait;	/* operations waiting inode update */
  	struct	allocdirectlst id_inoupdt; /* updates before inode written */
  	struct	allocdirectlst id_newinoupdt; /* updates when inode written */
*** ffs_softdep.c	Sun Mar  2 14:34:33 2003
--- ffs_softdep.c.new	Sun Mar  2 14:56:41 2003
***************
*** 1012,1017 ****
--- 1012,1018 ----
  		num_inodedep += 1;
  	MALLOC(inodedep, struct inodedep *, sizeof(struct inodedep),
  		M_INODEDEP, M_SOFTDEP_FLAGS);
+ 	inodedep->id_unused = 0x12345678;
  	inodedep->id_list.wk_type = D_INODEDEP;
  	inodedep->id_fs = fs;
  	inodedep->id_ino = inum;
***************
*** 2097,2102 ****
--- 2098,2106 ----
  	    inodedep->id_nlinkdelta != 0 || inodedep->id_savedino != NULL)
  		return (0);
  	LIST_REMOVE(inodedep, id_hash);
+ 	if (inodedep->id_unused != 0x12345678)
+ 		printf("free_inodedep: trashed memory 0x%x\n",
+ 		    inodedep->id_unused);
  	WORKITEM_FREE(inodedep, D_INODEDEP);
  	num_inodedep -= 1;
  	return (1);
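
[Editorial illustration, not part of the original message.] The guard
field technique that the patch applies to the inodedep structure can
be sketched in user space as follows. This is a minimal sketch under
stated assumptions: struct record, record_alloc() and record_free()
are hypothetical stand-ins rather than soft updates code, and calloc()
and free() take the place of the kernel's MALLOC and WORKITEM_FREE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define GUARD_VALUE 0x12345678u	/* same marker value as the patch */

/* Hypothetical stand-in for a kernel structure; not the real inodedep. */
struct record {
	void	*before;	/* fields ahead of the suspect offset */
	uint32_t guard;		/* unused field placed where the damage lands */
	void	*after;		/* the pointer the guard now shields */
};

/* Allocate a record and stamp the guard with the known value. */
static struct record *
record_alloc(void)
{
	struct record *r = calloc(1, sizeof(*r));

	if (r != NULL)
		r->guard = GUARD_VALUE;
	return (r);
}

/*
 * Free a record. Returns 1 if the guard was intact and 0 if it was
 * trashed. The value found in the guard is stored through `seen` so
 * that the caller can log it.
 */
static int
record_free(struct record *r, uint32_t *seen)
{
	int intact;

	assert(r != NULL && seen != NULL);
	*seen = r->guard;
	intact = (r->guard == GUARD_VALUE);
	free(r);
	return (intact);
}
```

A caller would log the value returned through `seen` whenever
record_free() returns 0, much as the patch prints the trashed value
from free_inodedep. As the message notes, a guard of this kind only
absorbs and reports the damage; it does not identify the code doing
the stray write.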