From: Jeremy Chadwick
To: Attilio Rao
Cc: freebsd-current@freebsd.org, "O. Hartmann"
Date: Wed, 12 Nov 2008 13:27:50 -0800
Subject: Re: fsck_ufs after every reboot
Message-ID: <20081112212750.GA2129@icarus.home.lan>
In-Reply-To: <3bbf2fe10811121121q29b60f19va9be4808b962259a@mail.gmail.com>
List-Id: Discussions about the use of FreeBSD-current

On Wed, Nov 12, 2008 at 08:21:15PM +0100, Attilio Rao wrote:
> 2008/11/12, Jeremy Chadwick:
> > On Wed, Nov 12, 2008 at 05:20:56PM +0100, Attilio Rao wrote:
> > > 2008/11/12, Jeremy Chadwick:
> > > > On Wed, Nov 12, 2008 at 04:52:59PM +0100, Attilio Rao wrote:
> > > > > 2008/11/12, Jeremy Chadwick:
> > > > > > On Wed, Nov 12, 2008 at 04:44:52PM +0100, Attilio Rao wrote:
> > > > > > > 2008/11/12, Jeremy Chadwick:
> > > > > > > > On Wed, Nov 12, 2008 at 02:44:05PM +0000, O. Hartmann wrote:
> > > > > > > > > I run FreeBSD 8.0/AMD64 on two boxes (one is a UP older AMD64 Athlon64
> > > > > > > > > 3500, other an 8-Core Dell Poweredge 1950).
> > > > > > > > >
> > > > > > > > > After nearly every reboot the box does fsck on all UFS2 filesystems. In
> > > > > > > > > most cases, while shutting down, the box reports processes that will not
> > > > > > > > > die, and after a reboot the filesystems are unclean.
> > > > > > > > >
> > > > > > > > > Is this a common problem at the moment or special?
> > > > > > > > >
> > > > > > > >
> > > > > > > > I've seen this happen on my CURRENT box at home when using "shutdown -p
> > > > > > > > now". Instead of the box powering off, it would lock up near the very
> > > > > > > > end of the shutdown process (before marking the filesystems clean).
> > > > > > > >
> > > > > > > > Oddly, this works fine in RELENG_7, so I'm guessing there's some ACPI
> > > > > > > > development going on (I can't complain, it *is* CURRENT).
> > > > > > >
> > > > > > > This could come from my VFS work.
> > > > > > > Could you spend some time on this?
> > > > > > > I will tell you what to look at.
> > > > > > >
> > > > > >
> > > > > > Sure thing!
> > > > > >
> > > > > > Let me know what I need to do to help, what information you need, or if
> > > > > > I should revert some commits to see if the behaviour changes. Build
> > > > > > date of the box (src-all csup'd about 45 minutes prior to the build
> > > > > > date):
> > > > > >
> > > > > > FreeBSD icarus.home.lan 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Fri Nov 7 14:19:03 PST 2008 root@icarus.home.lan:/usr/obj/usr/src/sys/X7SBA_CURRENT_amd64 amd64
> > > > >
> > > > > Is this reproducible?
> > > > >
> > > >
> > > > I don't have an answer at this time. I've only performed "shutdown -p
> > > > now" on this box twice since running CURRENT, and both times the problem
> > > > described occurred.
> > > >
> > > > >
> > > > > I need you to build a kernel with the following options:
> > > > > INVARIANT_SUPPORT
> > > > > INVARIANTS
> > > > > DEBUG_VFS_LOCKS
> > > > > WITNESS
> > > > > and without WITNESS_SKIPSPIN
> > > > >
> > > >
> > > > Will do. Relevant options I use:
> > > >
> > > > makeoptions DEBUG=-g            # Build kernel with gdb(1) debug symbols
> > > > options     SCHED_ULE           # ULE scheduler
> > > > options     PREEMPTION          # Enable kernel thread preemption
> > > > options     BREAK_TO_DEBUGGER   # Sending a serial BREAK drops to DDB
> > > > options     KDB                 # Enable kernel debugger support
> > > > options     KDB_TRACE           # Print stack trace automatically on panic
> > > > options     DDB                 # Support DDB
> > > > options     GDB                 # Support remote GDB
> > > > options     INVARIANTS          # Enable calls of extra sanity checking
> > > > options     INVARIANT_SUPPORT   # Extra sanity checks of internal structures, required by INVARIANTS
> > > > options     WITNESS             # Enable checks to detect deadlocks and cycles
> > > > options     DEBUG_VFS_LOCKS     # vfs lock debugging
> > > >
> > > > I have physical access to the console of this machine on a regular
> > > > basis.
> > >
> > > It's fine, great.
> >
> > And as luck would have it, I can't reproduce the problem any more. I've
> > shutdown -p now'd literally 6 times in a row without any sort of lock
> > up, and this is running on the old kernel. The same behaviour is now
> > seen with the new kernel.
> >
> > So, the 2-3 times I've seen "shutdown -p now" not fully power off the
> > machine were either flukes, or who knows what/why.
> >
> > I simply can't reproduce the problem any longer. I'm sorry.
>
> Can you recompile your kernel with the old option (read: not use the
> old kernel, but recompile it with the old options) and see if it
> hangs?

Here's the behaviour and details:

Old kernel, built 2008/11/07, csup'd 11/07, kernel config without WITNESS:
shutdown -p now failed 2-3 times, but appears to work now. Not sure
what/where the fluke was.

New kernel, built 2008/11/12, csup'd 11/12, kernel config with WITNESS:
shutdown -p now works.

I'll try rebuilding the 2008/11/12 kernel (with the same csup sources)
but without WITNESS and see if a couple shutdown -p now's work OK. I'm
not sure when I'll get to this (see below) though.

I'm not sure how much longer I'll be able to test CURRENT, because I
keep encountering seriously broken shit (pardon my language) that I do
not have the tolerance to deal with (I REALLY need to just build a 2nd
FreeBSD box for my home to run CURRENT and test for folks).

This is not the thread to put this in, but I do not see the point in
starting a new thread about this because I guarantee people will go
"looks like a local problem, sounds hardware related", especially since
some others cannot reproduce it themselves, and there are some high
temperatures on my hardware (for unknown reasons).

I went with CURRENT because I kept encountering a deadlocked kernel on
RELENG_7 whenever attempting to use USB umass/da. CURRENT has the same
problem as RELENG_7 in this regard, even with USB4BSD.
However, the latest USB4BSD busdma patch fixes that issue, but there are
times during file copies (USB write operation) where the copy literally
will sit for 20-30 full seconds doing nothing, yet dd speed/bandwidth
tests show no sign of such. Possibly this "doing nothing for 20-30
seconds" is a symptom of the next thing I'm seeing.

I've done a write-up on this completely bizarre problem where processes
on CURRENT are getting wedged for random amounts of time, chewing up
very large amounts of processor time (between 60-100%) on my dual-core
system; load average sky-rockets (above 7.xx), and coretemp(4) on the
cores shows a tremendous increase in temperature (indicating the
processors *really are* getting hammered by something). Yet ktrace and
truss on the processes show nothing happening, and they flip between "-"
and "wait" state in ps.

I don't know what to do about it though, and after reading my own
write-up, I realise the visual symptoms are so bizarre that it can't be
taken seriously.

But there's more: even when the system is sitting idle (no processes in
that weird state), I believe the overall temperature of my cores is
8-10C higher than that of RELENG_7 (I was seeing core temperatures of
32-34C when idling on RELENG_7, and in CURRENT I'm seeing 40-42C; and
without powerd in CURRENT, I see temps of 50-51C while idling).

But I should be fair with regards to this paragraph: I need to do a
*full reinstall* of RELENG_7 and gather statistics/evidence before
stating "yeah CURRENT is churning CPU and increasing CPU temps". For
all I know there could be something evil going on with my hardware that
has nothing to do with FreeBSD, at least with regards to this issue.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
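
For the "gather statistics/evidence" step mentioned above, here is a
rough sketch of how idle core temperatures could be sampled the same way
on RELENG_7 and on CURRENT, so the numbers can be compared. It assumes
coretemp(4) is loaded and exposes dev.cpu.N.temperature through
sysctl(8), and that the output looks like "dev.cpu.0.temperature: 41.0C".
The function names (parse_temps, read_temps, collect, summarize) are made
up for illustration and are not part of any FreeBSD tool.

```python
import re
import statistics
import subprocess
import time

# Assumed line format of `sysctl dev.cpu.N.temperature`, for example
# "dev.cpu.0.temperature: 41.0C". The decimals and trailing "C" are optional.
TEMP_RE = re.compile(r"^dev\.cpu\.(\d+)\.temperature:\s*(-?\d+(?:\.\d+)?)C?$")


def parse_temps(text):
    """Return {core_index: degrees_C} for every matching line in text."""
    temps = {}
    for line in text.splitlines():
        match = TEMP_RE.match(line.strip())
        if match:
            temps[int(match.group(1))] = float(match.group(2))
    return temps


def read_temps(cores=(0, 1)):
    """Query sysctl once for the given cores (dual-core box assumed)."""
    names = [f"dev.cpu.{n}.temperature" for n in cores]
    result = subprocess.run(
        ["sysctl", *names], capture_output=True, text=True, check=True
    )
    return parse_temps(result.stdout)


def collect(count, interval_s, reader=read_temps):
    """Take `count` samples, sleeping `interval_s` seconds between them."""
    samples = []
    for i in range(count):
        samples.append(reader())
        if i + 1 < count:
            time.sleep(interval_s)
    return samples


def summarize(samples):
    """Return {core_index: (mean_C, max_C)} over all samples."""
    by_core = {}
    for sample in samples:
        for core, temp in sample.items():
            by_core.setdefault(core, []).append(temp)
    return {c: (statistics.mean(v), max(v)) for c, v in by_core.items()}
```

Running summarize(collect(60, 5)) on an idle box under each branch, with
and without powerd, would give a mean and peak per core to put side by
side, which is weaker than a proper benchmark but better than eyeballing
sysctl output.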