Date: Sun, 28 Jul 2013 12:00:32 +0300
From: Daniel Braniss <danny@cs.huji.ac.il>
To: Dominic Fandrey <kamikaze@bsdforen.de>
Cc: Konstantin Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org
Subject: Re: stopping amd causes a freeze
Message-ID: <E1V3Mq8-000FLb-27@kabab.cs.huji.ac.il>
In-Reply-To: <51F4D57E.4040002@bsdforen.de>
References: <51ED0060.2050502@bsdforen.de> <20130722100720.GI5991@kib.kiev.ua> <51F0DA4B.3000809@bsdforen.de> <20130725100037.GM5991@kib.kiev.ua> <51F2AD8C.1000003@bsdforen.de> <51F385CE.1030606@bsdforen.de> <20130728062403.GD4972@kib.kiev.ua> <51F4D57E.4040002@bsdforen.de>
> On 28/07/2013 08:24, Konstantin Belousov wrote:
> > On Sat, Jul 27, 2013 at 10:33:18AM +0200, Dominic Fandrey wrote:
> >> On 26/07/2013 19:10, Dominic Fandrey wrote:
> >>> On 25/07/2013 12:00, Konstantin Belousov wrote:
> >>>> On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
> >>>>> On 22/07/2013 12:07, Konstantin Belousov wrote:
> >>>>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
> >>>>>>> ...
> >>>>>>>
> >>>>>>> I run amd through sysutils/automounter, which is a scripting solution
> >>>>>>> that generates an amd.map file based on encountered devices and devd
> >>>>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
> >>>>>>> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
> >>>>>>>
> >>>>>>> Nothing was mounted (by amd) during the last freeze.
> >>>>>>>
> >>>>>>> ...
> >>>>>>
> >>>>>> Are you sure that the machine did not paniced ? Do you have serial console ?
> >>>>>>
> >>>>>> The amd(8) locks itself into memory, most likely due to the fear of
> >>>>>> deadlock. There are some known issues with user wirings in stable/9.
> >>>>>> If the problem you see is indeed due to wiring, you might try to apply
> >>>>>> r253187-r253191.
> >>>>>
> >>>>> I tried that. Applying the diff was straightforward enough. But the
> >>>>> resulting kernel paniced as soon as it tried to mount the root fs.
> >>>> You did provided a useful info to diagnose the issue.
> >>>>
> >>>> Patch should keep KBI compatible, but, just in case, if you have any
> >>>> third-party module, rebuild it.
> >>>>
> >>>>>
> >>>>> So I'll wait for the MFC from someone who knows what he/she is doing.
> >>>>
> >>>> Patch below booted for me, and I run some sanity check tests for the
> >>>> mlockall(2), which also did not resulted in misbehaviour.
> >>>>
> >>>
> >>> Your patch applied cleanly and the system booted with the resulting
> >>> kernel.
> >>>
> >>> Amd exhibits several very strange behaviours. ...
> >>
> >> I can verify the whole thing with a clean world and kernel.
> >>
> >> This time I'll concentrate on the first instance of amd:
> >>
> >> # tail -n3 /var/log/messages
> >> Jul 27 10:08:56 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
> >> Jul 27 10:09:41 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
> >> Jul 27 10:11:41 mobileKamikaze last message repeated 3 times
> >>
> >> The process, it turns out, simply doesn't exist. There is another
> >> process, though:
> >> # ps auxww | grep -F sbin/amd
> >> root 5869 0.0 0.1 12036 8020 ?? S 10:08am 0:00.01 /usr/sbin/amd -r -p -a /var/run/automounter.amd -c 4 -w 2 /var/run/automounter.amd.mnt /var/run/automounter.amd.map
> >>
> >> # cat /var/run/automounter.amd.pid
> >> 5868
> >>
> >> Here is what I think happens, amd forks a subprocess and the main
> >> process, silently dies after it wrote its pidfile.
> > Nothing dies silently. Either process was killed by signal, or it
> > exited with the explicit call to exit(2). In the first case, default
> > kernel settings of kern.logsigexit should make a record in the syslog.
> > The machdep.uprintf_signal might be also useful, but not for daemons.
>
> Well, after I reverted your patch I got some things in the syslog.
> Sometimes amd works as expected, sometimes it dies right after starting:
> Jul 28 10:19:42 mobileKamikaze kernel: pid 24217 (amd), uid 0: exited on signal 11 (core dumped)
>
> This is just all over confusing.

just to confuse you a bit more :-)
I gave up with mlockall(2) so I compiled amd statically linked.

my 5 cents.

danny
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E1V3Mq8-000FLb-27>
