FreeBSD Mail Archives

Date:      Mon, 27 Jun 2011 12:19:47 -0700
From:      mdf@FreeBSD.org
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-current@freebsd.org, freebsd-stable@freebsd.org, Warner Losh <imp@bsdimp.com>
Subject:   Re: kern.sync_on_panic
Message-ID:  <BANLkTi=UcmXqZLqmU3E4HqByHX1QewHuQQ@mail.gmail.com>
In-Reply-To: <4E08568E.4060309@FreeBSD.org>
References:  <4E05F582.2010500@FreeBSD.org> <6C42CE07-9298-444A-8094-9C60384CA4F1@bsdimp.com> <4E08568E.4060309@FreeBSD.org>

On Mon, Jun 27, 2011 at 3:08 AM, Andriy Gapon <avg@freebsd.org> wrote:
> on 26/06/2011 08:51 Warner Losh said the following:
>>
>> On Jun 25, 2011, at 8:49 AM, Andriy Gapon wrote:
>>> Does anybody actually use kern.sync_on_panic tunable/sysctl? If yes, then
>>> in what circumstances do you need it? That is, why any other alternative
>>> doesn't work for you? Like: 1. remounting filesystems R/O before panic if
>>> you knowingly provoke it for testing 2. using netboot for your test system
>>> 3. using su+j, gjournal or a different filesystem altogether 4. using fsck
>>> after reboot
>>>
>>> It seems to me that syncing filesystems in panic context is an adventure.
>>> And it may become even more of an adventure if we introduce code that
>>> completely stops scheduler in and after panic.
>>
>> I've used it in the past when I was developing a device driver that was in
>> the late stages of maturing.  Since all the panics in the system were when
>> the driver dereferenced NULL in that driver, sync was safe because all the
>> data structures were sane except the aforementioned driver.
>>
>> (1) It was a production system, and everything that could be was already
>> mounted r/o.  However, some small, but very critical, amount of data was
>> still r/w and it was very important to not lose this data.  Production here
>> likely should be in quotes, because it was in the late stages of
>> testing/validation.  The problem was without this sometimes the saved state
>> of the GPS receiver and other hardware would wind up being zero, which meant
>> that we'd have to do a cold start which cost us a few hours of time.  At the
>> time I was doing this, we saw zero files a couple times a day without this
>> turned on. (2) netbooting wasn't an option since we were qualifying a
>> non-netbooting system. (3) these weren't available at the time, but the goal
>> was to prevent data loss, not to necessarily have to avoid fsck on boot. (4)
>> Data loss without it.
>>
>> Now, I'll be the first to admit this has been a few years, and I haven't done
>> a fresh evaluation to see if things are still safe.  I'll also be the first
>> to admit that this was a useful debugging setting late in development, and
>> not in production.  I'm also the first to admit this isn't what I'd call a
>> very wide-spread case.  But it did come in very handy when chasing a few bugs
>> to be able to do 10 panic/reboot cycles an hour rather than 2 a day.
>
> A fine enough use-case for me.  I guess the problem ultimately boiled down to
> peculiarities of UFS behavior, but still...
> However, please be aware that sync_on_panic might get broken when/if we start
> stopping scheduler in panic.

The entirety of the sync code should be a subroutine in vfs_bio.c so
the 'buf' variable is static to the file.  At that point it would be
reasonable to explicitly call it at the beginning of panic(9) for the
sync-on-panic case, either before IPIing the other CPUs, or at least
before entering the critical section that prevents the scheduler from
running.

Cheers,
matthew
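
As an illustration of the ordering proposed above, here is a minimal
user-space sketch in C.  It is not kernel code and not taken from the
FreeBSD tree: every name in it (sketch_panic, sketch_sync_buffers,
struct panic_trace, the step constants) is hypothetical, and the real
panic(9) path does considerably more.  It only records the order in
which the steps would run, assuming the sync routine is called first
when sync-on-panic is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step labels; these are not real FreeBSD kernel symbols. */
enum panic_step { STEP_SYNC, STEP_STOP_CPUS, STEP_STOP_SCHED, STEP_HALT };

#define MAX_STEPS 4

/* Records the order in which the panic steps were taken. */
struct panic_trace {
    enum panic_step steps[MAX_STEPS];
    size_t count;
};

static void record(struct panic_trace *t, enum panic_step s)
{
    assert(t->count < MAX_STEPS);
    t->steps[t->count++] = s;
}

/*
 * Stand-in for a sync subroutine that would live in the same file as
 * the buffer cache, so the buffer array can stay file-local.
 */
static void sketch_sync_buffers(struct panic_trace *t)
{
    record(t, STEP_SYNC);
}

/*
 * Assumed ordering: sync first (if enabled), then stop the other CPUs,
 * then enter the section that keeps the scheduler from running.
 */
void sketch_panic(struct panic_trace *t, bool sync_on_panic)
{
    t->count = 0;
    if (sync_on_panic)
        sketch_sync_buffers(t);
    record(t, STEP_STOP_CPUS);
    record(t, STEP_STOP_SCHED);
    record(t, STEP_HALT);
}
```

Calling sketch_panic(&t, true) records the sync step ahead of the
CPU-stop and scheduler-stop steps; with false the sync step is skipped
and the remaining order is unchanged.  The point of the ordering is that
the sync still runs while the scheduler and the other CPUs are live.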

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BANLkTi=UcmXqZLqmU3E4HqByHX1QewHuQQ>