Date:      Fri, 15 Oct 2010 20:25:42 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Attilio Rao <attilio@freebsd.org>
Cc:        FreeBSD Current <current@freebsd.org>, freebsd-net@freebsd.org, Sergey Kandaurov <pluknet@freebsd.org>, Jack F Vogel <jfv@freebsd.org>, Ryan Stone <rstone@sandvine.com>, Ryan Stone <rysto32@gmail.com>, Ed Maste <emaste@sandvine.com>
Subject:   Re: [PATCH] Netdump for review and testing -- preliminary version
Message-ID:  <alpine.BSF.2.00.1010152019450.83418@fledge.watson.org>
In-Reply-To: <AANLkTimusir1uCE_uxS0uRQCa4rgm_%2B26duep3%2Bo1XUH@mail.gmail.com>
References:  <AANLkTikA5OUYD1A9pqCqVEZ5qk%2BVECq8x-fnRXnpp0KE@mail.gmail.com> <AANLkTikau6omhWrXVM13zonFEPCxXM%2B8EqJauovDu0OU@mail.gmail.com> <alpine.BSF.2.00.1010090121310.1232@fledge.watson.org> <AANLkTimisSojDg2z_f1_v71evfooVdPQ44eu2Thhrf3O@mail.gmail.com> <C73FFD46-80B0-44F0-9A19-2B047C285134@freebsd.org> <AANLkTimLnRsa4v=A3Ui-1hKiVc5YLwkBND4NOmT4t%2BtB@mail.gmail.com> <15387E38-1E6C-4347-BEA1-61AEE31B5544@freebsd.org> <AANLkTimusir1uCE_uxS0uRQCa4rgm_%2B26duep3%2Bo1XUH@mail.gmail.com>

On Thu, 14 Oct 2010, Attilio Rao wrote:

>> No, what I'm saying is: UMA needs to not call its drain handlers, and 
>> ideally not call into VM to fill slabs, from the dumping context. That's 
>> easy to implement and will cause the dump to fail rather than causing the 
>> system to hang.
>
> My point is, however, still the same: that should not happen just for the 
> netdump-specific case but for all the dumping/KDB/panic cases (I know it is 
> unlikely that current non-netdump code calls into UMA, but that is not an 
> established prerequisite, and some newly added code may still do so). I still 
> see this as a weakness in the infrastructure, independent of netdump. I can 
> see that your point is that it is vital to netdump's correct behaviour, though, 
> so I wonder whether it is worth fixing now or later.

Quite a bit of our kernel and dumping infrastructure special-cases debugging 
and dumping behavior to avoid sources of non-robustness.  For example, serial 
drivers avoid locking, and for disk dumps we bypass GEOM to avoid the memory 
allocation, freeing, and threading that it depends on.

The goal here is to be robust when handling dumps: hanging is worse than not 
dumping, since you won't get the dump either way, and if you don't reboot then 
the system requires manual intervention to recover.  Examples of things that 
are critical to avoid include:

- The dumping thread tripping over locks held by the panicked thread, or by
   another now-suspended thread, leading to deadlock against a suspended
   thread.

- Corrupting dumps by increasing concurrency in the panic case.  We ran into a
   case a year or two ago where changing VM state during the dump on amd64
   caused file system corruption as the dump code assumed that the space
   required for a dump didn't change while dumping took place.

Any code dependency we add in the panic / KDB / dump path is one more risk 
that we don't successfully dump and reboot, so we need to minimize that code.

Robert


