From: Andriy Gapon
Date: Mon, 27 Jun 2011 13:08:14 +0300
To: Warner Losh
Cc: freebsd-current@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: kern.sync_on_panic
Message-ID: <4E08568E.4060309@FreeBSD.org>
References: <4E05F582.2010500@FreeBSD.org> <6C42CE07-9298-444A-8094-9C60384CA4F1@bsdimp.com>
In-Reply-To: <6C42CE07-9298-444A-8094-9C60384CA4F1@bsdimp.com>

on 26/06/2011 08:51 Warner Losh said the following:
>
> On Jun 25, 2011, at 8:49 AM, Andriy Gapon wrote:
>> Does anybody actually use kern.sync_on_panic tunable/sysctl? If yes, then
>> in what circumstances do you need it?
>> That is, why doesn't any other alternative work for you? Like:
>> 1. remounting filesystems R/O before panic if you knowingly provoke it
>>    for testing
>> 2. using netboot for your test system
>> 3. using su+j, gjournal or a different filesystem altogether
>> 4. using fsck after reboot
>>
>> It seems to me that syncing filesystems in panic context is an adventure.
>> And it may become even more of an adventure if we introduce code that
>> completely stops the scheduler in and after panic.
>
> I've used it in the past when I was developing a device driver that was in
> the late stages of maturing. Since all the panics in the system were when
> the driver dereferenced NULL in that driver, sync was safe because all the
> data structures were sane except the aforementioned driver.
>
> (1) It was a production system, and everything that could be was already
> mounted r/o. However, some small, but very critical, amount of data was
> still r/w and it was very important to not lose this data. Production here
> likely should be in quotes, because it was in the late stages of
> testing/validation. The problem was without this sometimes the saved state
> of the GPS receiver and other hardware would wind up being zero, which meant
> that we'd have to do a cold start which cost us a few hours of time. At the
> time I was doing this, we saw zero files a couple times a day without this
> turned on.
> (2) netbooting wasn't an option since we were qualifying a
> non-netbooting system.
> (3) these weren't available at the time, but the goal
> was to prevent data loss, not to necessarily have to avoid fsck on boot.
> (4) Data loss without it.
>
> Now, I'll be the first to admit this has been a few years, and I haven't done
> a fresh evaluation to see if things are still safe. I'll also be the first
> to admit that this was a useful debugging setting late in development, and
> not in production. I'm also the first to admit this isn't what I'd call a
> very wide-spread case.
> But it did come in very handy when chasing a few bugs
> to be able to do 10 panic/reboot cycles an hour rather than 2 a day.

A fine enough use-case for me. I guess the problem ultimately boiled down to
peculiarities of UFS behavior, but still...
However, please be aware that sync_on_panic might get broken when/if we start
stopping the scheduler in panic.

-- 
Andriy Gapon