Message-ID: <20091218094148.201530v5i6vmlrgg@webmail.leidinger.net>
Date: Fri, 18 Dec 2009 09:41:48 +0100
From: Alexander Leidinger
To: Boris Samorodov
Cc: stable@freebsd.org, ivoras@freebsd.org
Subject: Re: Stability problems with 7-stable (after 7.1 -> 7.2 -> 7-stable)
References: <20091215153543.2686145v583um280@webmail.leidinger.net> <55734559@ipt.ru>
In-Reply-To: <55734559@ipt.ru>

Quoting Boris Samorodov (from Thu, 17 Dec 2009 20:55:44 +0300):

> Ivan Voras writes:
>> Alexander Leidinger wrote:
>>> Hi,
>>>
>>> please CC me on replies.
>
> Seems you were not CCed...

I'm now subscribed to stable@, thanks for forwarding this.

>>> I have a system which was at 7.1-pX. After the update to 7.2-p5 it
>>> started to exhibit deadlocks after some minutes of uptime.
>>>
>>> With 7.1 (generic kernel) it was running fine, with 7.2 generic the
>>> problems started directly.
>>>
>>> The system is now at 7-stable with a custom kernel
>>> (http://www.Leidinger.net/test/ALCATRAZ), basically generic without
>>> unneeded drivers plus witness/invariants/sw-watchdog.
>>>
>>> The system is an AMD Dual Core with NVidia MCP61 chipset
>>> (http://www.Leidinger.net/test/dmesg.alcatraz), 2 GB RAM, 2
>>> harddisks and FreeBSD 32bit install.
>>
>> Some generic things to try:
>> - did you monitor the system with something (top or systat
>> -vm) to see if there is something unusual, like interrupt storms?

When I had the initial problems, I asked for a KVM switch to be
connected to the system (not a free service). In single-user mode I
didn't see any problem. When starting the system but not the jails, I
didn't see any problem (cvsup/buildworld/...). When I started the
jails, I started to see the problems.

>> - no physical access is a problem; If you do manage it, I'd
>> say try running single user for some time with systat -vm just to see
>> what happens.

This is not an option now.

>> I would not trust ZFS in 7-stable since it lags a bit behind patches
>> done to 8 but 7.2 should be fine - at least I don't have any such
>> problems with it (though no AMD boxes to test them with it).

Ivan, the system started out without ZFS; I switched to ZFS only after
I started to see deadlocks. This _improved_ the situation. Now the
system survives between 3h and about 11h without a deadlock. If I run
a script every 5 minutes which logs 4 text lines to the root file
system (UFS) and runs 3x sync + sleep 5 + 3x sync, the frequency of
deadlocks increases.

>> If you haven't updated your ZFS pools, I'd suggest reverting back to
>> 7.1, then building or downloading an 8.0 kernel and try it with 7.1
>> userland (reboot -k ...) simply to see if it helps.

IIRC there were KBI changes (ifconfig?) which prevent me from going
back to 7.1 without access to the console. As this is a production
machine (it hosts not only my blog/website/mails, but stuff from other
people too), the goal is to stabilize this system now.

Kib analyzed 2 crashdumps I had (watchdog triggered) and he thinks
they are caused by ZFS deadlocks.
So the cause of the initial problem (without ZFS) is not known yet,
but this info will hopefully allow us to stabilize the system further
(see also my mail about at least 57 unmerged ZFS patches).

Bye,
Alexander.

--
Universities are places of knowledge. The freshman each bring a little
in with them, and the seniors take none away, so knowledge accumulates.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137