From owner-svn-src-head@FreeBSD.ORG Sun Dec 16 04:07:35 2012 Return-Path: Delivered-To: svn-src-head@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 648C7A29; Sun, 16 Dec 2012 04:07:35 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id AD5EE8FC0A; Sun, 16 Dec 2012 04:07:33 +0000 (UTC) Received: from server.rulingia.com (c220-239-237-241.belrs5.nsw.optusnet.com.au [220.239.237.241]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id qBG47QqQ028923 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sun, 16 Dec 2012 15:07:26 +1100 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id qBG47H0H030144 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 16 Dec 2012 15:07:18 +1100 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id qBG47HiU030143; Sun, 16 Dec 2012 15:07:17 +1100 (EST) (envelope-from peter) Date: Sun, 16 Dec 2012 15:07:17 +1100 From: Peter Jeremy To: src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org Subject: Re: svn commit: r244112 - head/sys/kern Message-ID: <20121216040717.GG35245@server.rulingia.com> References: <201212110708.qBB78EWx025288@svn.freebsd.org> <201212121046.43706.jhb@freebsd.org> <201212121658.49048.jhb@freebsd.org> <50C90567.8080406@FreeBSD.org> <50C909BD.9090709@mu.org> <50C91B32.4080904@FreeBSD.org> <20121215205202.GF1411@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="JgQwtEuHJzHdouWu" Content-Disposition: inline In-Reply-To: <20121215205202.GF1411@garage.freebsd.pl> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Dec 2012 04:07:35 -0000 --JgQwtEuHJzHdouWu Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable [pruning the CC list] On 2012-Dec-15 21:52:03 +0100, Pawel Jakub Dawidek wrote: >On Wed, Dec 12, 2012 at 04:02:58PM -0800, Navdeep Parhar wrote: >> A KASSERT() really is for a condition that should never happen. and can't be practically recovered from. >I have sort of mixed feelings about this change, but in reality we have >three cases: > >1. Fatal conditions that shouldn't happen, but may happen for some > reason and we definiately want to stop running (corrupted file system > metadata that can mess up the file system more badly). For those we have > direct panic(9) calls. > >2. Fatal conditions that cannot happen and for those we have KASSERT(9). I don't see the difference between these two cases. In both of them, the system has entered a state that the designer/programmer didn't envisage and can't recover from. The best option is to abort as quickly to minimise further corruption. >3. Non-fatal conditions that cannot happen, which we have no way to > report more gracefully and we do it through KASSERT(9). "Cannot" needs to be quoted here because you can't hit them unless they actually do happen. And: 4. Unexpected conditions that should be non-fatal but no-one has written the code to recover so someone has included a KASSERT(9) as a place-holder. I suspect a lot of the heat in this discussion is associated with points 3 & 4 - unexpected conditions that the code can't cope with. >It is annoying to run INVARIANTS kernel and trigger 3. I had this >problem few times, for example in TCP/IP stack. It turned out to be >non-fatal and KASSERT(9) was there to understand the code better. >I'd much prefer to see it logged than to panic my box. Of course it is >also sometimes hard to judge if it is fatal or not. > >I use plenty of assertions in my code to catch bugs early, but >assertions like any other part of the code might have bugs and during >rapid development they help a lot when the code around is changing a lot >and some earlier assumptions are no longer valid. Many of those >assertions are non-fatal. IMHO, having lots of assertions (ala design-by-contract) can make it easier to understand a function's interfaces. Of course, if there are errors in the interface conditions, you can wind up with spurious panic()s. >For me this problem is pretty complex, because: > >1. It would be nice to have a macro to test non-fatal conditions, This should be fairly trivial to implement - test the specified condition and if it's false print out a rate-limited message and backtrace. > but it > is hard to tell sometimes if it is fatal or not. As a rule of thumb, if the code can't handle the condition not being met then it's fatal - this includes my point 4 above. If the code can recover (which may require it to take drastic action like aborting a process or closing a socket) then it's not fatal. OTOH, IMHO, not holding an expected lock _is_ fatal, even if there's no immediately obvious downside to just continuing. >3. Logging non-fatal "assertions" might make them go unnoticed, but we > currently don't enable kernel dumps by default, so when panic occurs > user has no idea what has happend, especially if he is in X. Logging > would give better chance to actually notice the problem currently. Rebooting does mean that we restore the system to a sane state, even if we don't record the previous state. >In my opinion we should do the following: > >1. Leave the option to make KASSERTs non-fatal, but log big fat warning > that this is development feature and should not be used in production. I disagree with this but not strongly enough to say it should be backed out. IMHO, an active KASSERT() should be fatal. >2. Introduce handy macro that would log the problem, but won't panic the > box for non-fatal conditions, just like we do with LORs. Agreed. >3. Enable kernel dumps by default. The main obstacle to do that was lack > of a way to limit number of kernel dumps, which could lead to filling > /var/ after hitting some panic/reboot cycle. I agree that it's reasonable to enable crashdumps by default. Note that /var/crash/minfree is supposed to stop /var filling up. >My contribution to solve this is implementation of 3: > http://people.freebsd.org/~pjd/patches/savecore.patch Adding a "maxdumps" parameter seems a good idea. The behaviour when it hits the limit is less clear - throwing away the latest crashdump is probably as easy to justify as throwing away the oldest one. A further downside to enabling crashdumps is the time overhead - initially writing the dump, copying it to /var/crash and running crashinfo. This all directly increases the length of the outage. --=20 Peter Jeremy --JgQwtEuHJzHdouWu Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlDNSPQACgkQ/opHv/APuIdZvwCgtapsSSdtUfzuzuHhUI/O+j5O 5FQAn2Ua9j5BtgC8F90i5a2WewS+jHg5 =aOF4 -----END PGP SIGNATURE----- --JgQwtEuHJzHdouWu--