Date: Sun, 16 Dec 2012 15:07:17 +1100 From: Peter Jeremy <peter@rulingia.com> To: src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, svn-src-head@FreeBSD.org Subject: Re: svn commit: r244112 - head/sys/kern Message-ID: <20121216040717.GG35245@server.rulingia.com> In-Reply-To: <20121215205202.GF1411@garage.freebsd.pl> References: <201212110708.qBB78EWx025288@svn.freebsd.org> <201212121046.43706.jhb@freebsd.org> <CAJ-Vmo=U04GX%2BZyKuzXLwV%2BPpzU6_dm5BCmL=DWfsmhTVAR%2BsA@mail.gmail.com> <201212121658.49048.jhb@freebsd.org> <50C90567.8080406@FreeBSD.org> <50C909BD.9090709@mu.org> <50C91B32.4080904@FreeBSD.org> <20121215205202.GF1411@garage.freebsd.pl>
next in thread | previous in thread | raw e-mail | index | archive | help
--JgQwtEuHJzHdouWu Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable [pruning the CC list] On 2012-Dec-15 21:52:03 +0100, Pawel Jakub Dawidek <pjd@FreeBSD.org> wrote: >On Wed, Dec 12, 2012 at 04:02:58PM -0800, Navdeep Parhar wrote: >> A KASSERT() really is for a condition that should never happen. and can't be practically recovered from. >I have sort of mixed feelings about this change, but in reality we have >three cases: > >1. Fatal conditions that shouldn't happen, but may happen for some > reason and we definiately want to stop running (corrupted file system > metadata that can mess up the file system more badly). For those we have > direct panic(9) calls. > >2. Fatal conditions that cannot happen and for those we have KASSERT(9). I don't see the difference between these two cases. In both of them, the system has entered a state that the designer/programmer didn't envisage and can't recover from. The best option is to abort as quickly to minimise further corruption. >3. Non-fatal conditions that cannot happen, which we have no way to > report more gracefully and we do it through KASSERT(9). "Cannot" needs to be quoted here because you can't hit them unless they actually do happen. And: 4. Unexpected conditions that should be non-fatal but no-one has written the code to recover so someone has included a KASSERT(9) as a place-holder. I suspect a lot of the heat in this discussion is associated with points 3 & 4 - unexpected conditions that the code can't cope with. >It is annoying to run INVARIANTS kernel and trigger 3. I had this >problem few times, for example in TCP/IP stack. It turned out to be >non-fatal and KASSERT(9) was there to understand the code better. >I'd much prefer to see it logged than to panic my box. Of course it is >also sometimes hard to judge if it is fatal or not. > >I use plenty of assertions in my code to catch bugs early, but >assertions like any other part of the code might have bugs and during >rapid development they help a lot when the code around is changing a lot >and some earlier assumptions are no longer valid. Many of those >assertions are non-fatal. IMHO, having lots of assertions (ala design-by-contract) can make it easier to understand a function's interfaces. Of course, if there are errors in the interface conditions, you can wind up with spurious panic()s. >For me this problem is pretty complex, because: > >1. It would be nice to have a macro to test non-fatal conditions, This should be fairly trivial to implement - test the specified condition and if it's false print out a rate-limited message and backtrace. > but it > is hard to tell sometimes if it is fatal or not. As a rule of thumb, if the code can't handle the condition not being met then it's fatal - this includes my point 4 above. If the code can recover (which may require it to take drastic action like aborting a process or closing a socket) then it's not fatal. OTOH, IMHO, not holding an expected lock _is_ fatal, even if there's no immediately obvious downside to just continuing. >3. Logging non-fatal "assertions" might make them go unnoticed, but we > currently don't enable kernel dumps by default, so when panic occurs > user has no idea what has happend, especially if he is in X. Logging > would give better chance to actually notice the problem currently. Rebooting does mean that we restore the system to a sane state, even if we don't record the previous state. >In my opinion we should do the following: > >1. Leave the option to make KASSERTs non-fatal, but log big fat warning > that this is development feature and should not be used in production. I disagree with this but not strongly enough to say it should be backed out. IMHO, an active KASSERT() should be fatal. >2. Introduce handy macro that would log the problem, but won't panic the > box for non-fatal conditions, just like we do with LORs. Agreed. >3. Enable kernel dumps by default. The main obstacle to do that was lack > of a way to limit number of kernel dumps, which could lead to filling > /var/ after hitting some panic/reboot cycle. I agree that it's reasonable to enable crashdumps by default. Note that /var/crash/minfree is supposed to stop /var filling up. >My contribution to solve this is implementation of 3: > http://people.freebsd.org/~pjd/patches/savecore.patch Adding a "maxdumps" parameter seems a good idea. The behaviour when it hits the limit is less clear - throwing away the latest crashdump is probably as easy to justify as throwing away the oldest one. A further downside to enabling crashdumps is the time overhead - initially writing the dump, copying it to /var/crash and running crashinfo. This all directly increases the length of the outage. --=20 Peter Jeremy --JgQwtEuHJzHdouWu Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlDNSPQACgkQ/opHv/APuIdZvwCgtapsSSdtUfzuzuHhUI/O+j5O 5FQAn2Ua9j5BtgC8F90i5a2WewS+jHg5 =aOF4 -----END PGP SIGNATURE----- --JgQwtEuHJzHdouWu--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121216040717.GG35245>