From owner-svn-src-head@FreeBSD.ORG  Sun Dec 16 04:07:35 2012
Return-Path: <owner-svn-src-head@FreeBSD.ORG>
Delivered-To: svn-src-head@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 648C7A29;
 Sun, 16 Dec 2012 04:07:35 +0000 (UTC)
 (envelope-from peter@rulingia.com)
Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au
 [122.100.2.194])
 by mx1.freebsd.org (Postfix) with ESMTP id AD5EE8FC0A;
 Sun, 16 Dec 2012 04:07:33 +0000 (UTC)
Received: from server.rulingia.com
 (c220-239-237-241.belrs5.nsw.optusnet.com.au [220.239.237.241])
 by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id qBG47QqQ028923
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Sun, 16 Dec 2012 15:07:26 +1100 (EST)
 (envelope-from peter@rulingia.com)
X-Bogosity: Ham, spamicity=0.000000
Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1])
 by server.rulingia.com (8.14.5/8.14.5) with ESMTP id qBG47H0H030144
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Sun, 16 Dec 2012 15:07:18 +1100 (EST)
 (envelope-from peter@server.rulingia.com)
Received: (from peter@localhost)
 by server.rulingia.com (8.14.5/8.14.5/Submit) id qBG47HiU030143;
 Sun, 16 Dec 2012 15:07:17 +1100 (EST) (envelope-from peter)
Date: Sun, 16 Dec 2012 15:07:17 +1100
From: Peter Jeremy <peter@rulingia.com>
To: src-committers@FreeBSD.org, svn-src-all@FreeBSD.org,
 svn-src-head@FreeBSD.org
Subject: Re: svn commit: r244112 - head/sys/kern
Message-ID: <20121216040717.GG35245@server.rulingia.com>
References: <201212110708.qBB78EWx025288@svn.freebsd.org>
 <201212121046.43706.jhb@freebsd.org>
 <CAJ-Vmo=U04GX+ZyKuzXLwV+PpzU6_dm5BCmL=DWfsmhTVAR+sA@mail.gmail.com>
 <201212121658.49048.jhb@freebsd.org> <50C90567.8080406@FreeBSD.org>
 <50C909BD.9090709@mu.org> <50C91B32.4080904@FreeBSD.org>
 <20121215205202.GF1411@garage.freebsd.pl>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="JgQwtEuHJzHdouWu"
Content-Disposition: inline
In-Reply-To: <20121215205202.GF1411@garage.freebsd.pl>
X-PGP-Key: http://www.rulingia.com/keys/peter.pgp
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
 <svn-src-head.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Dec 2012 04:07:35 -0000


--JgQwtEuHJzHdouWu
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

[pruning the CC list]
On 2012-Dec-15 21:52:03 +0100, Pawel Jakub Dawidek <pjd@FreeBSD.org> wrote:
>On Wed, Dec 12, 2012 at 04:02:58PM -0800, Navdeep Parhar wrote:
>> A KASSERT() really is for a condition that should never happen.

and can't be practically recovered from.

>I have sort of mixed feelings about this change, but in reality we have
>three cases:
>
>1. Fatal conditions that shouldn't happen, but may happen for some
>   reason and we definiately want to stop running (corrupted file system
>   metadata that can mess up the file system more badly). For those we have
>   direct panic(9) calls.
>
>2. Fatal conditions that cannot happen and for those we have KASSERT(9).

I don't see the difference between these two cases.  In both of them,
the system has entered a state that the designer/programmer didn't
envisage and can't recover from.  The best option is to abort as
quickly to minimise further corruption.

>3. Non-fatal conditions that cannot happen, which we have no way to
>   report more gracefully and we do it through KASSERT(9).

"Cannot" needs to be quoted here because you can't hit them unless they
actually do happen.

And:
4. Unexpected conditions that should be non-fatal but no-one has
   written the code to recover so someone has included a KASSERT(9) as
   a place-holder.

I suspect a lot of the heat in this discussion is associated with
points 3 & 4 - unexpected conditions that the code can't cope with.

>It is annoying to run INVARIANTS kernel and trigger 3. I had this
>problem few times, for example in TCP/IP stack. It turned out to be
>non-fatal and KASSERT(9) was there to understand the code better.
>I'd much prefer to see it logged than to panic my box. Of course it is
>also sometimes hard to judge if it is fatal or not.
>
>I use plenty of assertions in my code to catch bugs early, but
>assertions like any other part of the code might have bugs and during
>rapid development they help a lot when the code around is changing a lot
>and some earlier assumptions are no longer valid. Many of those
>assertions are non-fatal.

IMHO, having lots of assertions (ala design-by-contract) can make it
easier to understand a function's interfaces.  Of course, if there
are errors in the interface conditions, you can wind up with spurious
panic()s.

>For me this problem is pretty complex, because:
>
>1. It would be nice to have a macro to test non-fatal conditions,

This should be fairly trivial to implement - test the specified
condition and if it's false print out a rate-limited message and
backtrace.

> but it
>   is hard to tell sometimes if it is fatal or not.

As a rule of thumb, if the code can't handle the condition not being
met then it's fatal - this includes my point 4 above.  If the code can
recover (which may require it to take drastic action like aborting a
process or closing a socket) then it's not fatal.  OTOH, IMHO, not
holding an expected lock _is_ fatal, even if there's no immediately
obvious downside to just continuing.

>3. Logging non-fatal "assertions" might make them go unnoticed, but we
>   currently don't enable kernel dumps by default, so when panic occurs
>   user has no idea what has happend, especially if he is in X. Logging
>   would give better chance to actually notice the problem currently.

Rebooting does mean that we restore the system to a sane state, even
if we don't record the previous state.

>In my opinion we should do the following:
>
>1. Leave the option to make KASSERTs non-fatal, but log big fat warning
>   that this is development feature and should not be used in production.

I disagree with this but not strongly enough to say it should be
backed out.  IMHO, an active KASSERT() should be fatal.

>2. Introduce handy macro that would log the problem, but won't panic the
>   box for non-fatal conditions, just like we do with LORs.

Agreed.

>3. Enable kernel dumps by default. The main obstacle to do that was lack
>   of a way to limit number of kernel dumps, which could lead to filling
>   /var/ after hitting some panic/reboot cycle.

I agree that it's reasonable to enable crashdumps by default.  Note that
/var/crash/minfree is supposed to stop /var filling up.

>My contribution to solve this is implementation of 3:
>	http://people.freebsd.org/~pjd/patches/savecore.patch

Adding a "maxdumps" parameter seems a good idea.  The behaviour when
it hits the limit is less clear - throwing away the latest crashdump
is probably as easy to justify as throwing away the oldest one.

A further downside to enabling crashdumps is the time overhead -
initially writing the dump, copying it to /var/crash and running
crashinfo.  This all directly increases the length of the outage.

--=20
Peter Jeremy

--JgQwtEuHJzHdouWu
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlDNSPQACgkQ/opHv/APuIdZvwCgtapsSSdtUfzuzuHhUI/O+j5O
5FQAn2Ua9j5BtgC8F90i5a2WewS+jHg5
=aOF4
-----END PGP SIGNATURE-----

--JgQwtEuHJzHdouWu--