Date: Mon, 27 Jun 2011 12:19:47 -0700
From: mdf@FreeBSD.org
To: Andriy Gapon <avg@freebsd.org>
Cc: freebsd-current@freebsd.org, freebsd-stable@freebsd.org, Warner Losh <imp@bsdimp.com>
Subject: Re: kern.sync_on_panic
Message-ID: <BANLkTi=UcmXqZLqmU3E4HqByHX1QewHuQQ@mail.gmail.com>
In-Reply-To: <4E08568E.4060309@FreeBSD.org>
References: <4E05F582.2010500@FreeBSD.org> <6C42CE07-9298-444A-8094-9C60384CA4F1@bsdimp.com> <4E08568E.4060309@FreeBSD.org>
On Mon, Jun 27, 2011 at 3:08 AM, Andriy Gapon <avg@freebsd.org> wrote:
> on 26/06/2011 08:51 Warner Losh said the following:
>>
>> On Jun 25, 2011, at 8:49 AM, Andriy Gapon wrote:
>>> Does anybody actually use kern.sync_on_panic tunable/sysctl? If yes, then
>>> in what circumstances do you need it? That is, why any other alternative
>>> doesn't work for you? Like:
>>> 1. remounting filesystems R/O before panic if you knowingly provoke it
>>>    for testing
>>> 2. using netboot for your test system
>>> 3. using su+j, gjournal or a different filesystem altogether
>>> 4. using fsck after reboot
>>>
>>> It seems to me that syncing filesystems in panic context is an adventure.
>>> And it may become even more of an adventure if we introduce code that
>>> completely stops scheduler in and after panic.
>>
>> I've used it in the past when I was developing a device driver that was in
>> the late stages of maturing. Since all the panics in the system were when
>> the driver dereferenced NULL in that driver, sync was safe because all the
>> data structures were sane except the aforementioned driver.
>>
>> (1) It was a production system, and everything that could be was already
>> mounted r/o. However, some small, but very critical, amount of data was
>> still r/w and it was very important to not lose this data. Production here
>> likely should be in quotes, because it was in the late stages of
>> testing/validation. The problem was without this sometimes the saved state
>> of the GPS receiver and other hardware would wind up being zero, which meant
>> that we'd have to do a cold start which cost us a few hours of time. At the
>> time I was doing this, we saw zero files a couple times a day without this
>> turned on.
>> (2) netbooting wasn't an option since we were qualifying a non-netbooting
>> system.
>> (3) these weren't available at the time, but the goal was to prevent data
>> loss, not to necessarily have to avoid fsck on boot.
>> (4) Data loss without it.
>>
>> Now, I'll be the first to admit this has been a few years, and I haven't done
>> a fresh evaluation to see if things are still safe. I'll also be the first
>> to admit that this was a useful debugging setting late in development, and
>> not in production. I'm also the first to admit this isn't what I'd call a
>> very wide-spread case. But it did come in very handy when chasing a few bugs
>> to be able to do 10 panic/reboot cycles an hour rather than 2 a day.
>
> A fine enough use-case for me. I guess the problem ultimately boiled down to
> peculiarities of UFS behavior, but still...
> However, please be aware that sync_on_panic might get broken when/if we start
> stopping scheduler in panic.

The entirety of the sync code should be a subroutine in vfs_bio.c so the
'buf' variable is static to the file. At that point it would be reasonable
to explicitly call it at the beginning of panic(9) for the sync-on-panic
case, either before IPIing the other CPUs, or at least before entering the
critical section that prevents the scheduler from running.

Cheers,
matthew
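The ordering suggested in the reply above can be illustrated with a small model. This is a hypothetical user-space sketch in Python, not kernel code. PanicModel, sync_buffers(), stop_other_cpus() and enter_critical_section() are invented names, and the rule "syncing needs the scheduler to still be running" is a simplifying assumption standing in for the concern raised in the thread. The model only records the order of steps and covers both placements mentioned: syncing before the other CPUs are IPI'd, or at least before the critical section is entered.

```python
"""Hypothetical user-space model of the proposed panic() ordering.

None of these names are real FreeBSD kernel functions; the model only
records the order of steps so the ordering constraint can be checked.
"""


class PanicModel:
    def __init__(self, sync_on_panic: bool) -> None:
        # Stand-in for the kern.sync_on_panic sysctl/tunable.
        self.sync_on_panic = sync_on_panic
        # Assumption: once the critical section is entered, the
        # scheduler no longer runs other threads.
        self.scheduler_running = True
        self.steps: list[str] = []

    def sync_buffers(self) -> None:
        # Hypothetical single routine holding the whole sync loop (the
        # reply suggests a subroutine in vfs_bio.c). The model treats
        # flushing dirty buffers as requiring a running scheduler.
        if not self.scheduler_running:
            raise RuntimeError("sync attempted after scheduler stopped")
        self.steps.append("sync")

    def stop_other_cpus(self) -> None:
        # Models IPIing the other CPUs to halt them.
        self.steps.append("ipi_stop")

    def enter_critical_section(self) -> None:
        # From here on the scheduler is considered stopped.
        self.scheduler_running = False
        self.steps.append("critical_enter")

    def panic(self, sync_before_ipi: bool = True) -> list[str]:
        # Preferred placement: sync before the other CPUs are stopped.
        if self.sync_on_panic and sync_before_ipi:
            self.sync_buffers()
        self.stop_other_cpus()
        # Fallback placement: other CPUs are stopped, but the critical
        # section has not been entered yet.
        if self.sync_on_panic and not sync_before_ipi:
            self.sync_buffers()
        self.enter_critical_section()
        self.steps.append("dump_and_reboot")
        return self.steps


if __name__ == "__main__":
    print(PanicModel(sync_on_panic=True).panic())
```

Calling sync_buffers() after enter_critical_section() raises RuntimeError in this model, which mirrors the worry that sync_on_panic may stop working if the sync is left where it is once the scheduler is stopped in panic.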
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BANLkTi=UcmXqZLqmU3E4HqByHX1QewHuQQ>
