Message-ID: <5210ADA9.4000506@bsdforen.de>
Date: Sun, 18 Aug 2013 13:19:05 +0200
From: Dominic Fandrey
To: Konstantin Belousov
Cc: freebsd-stable@freebsd.org
Subject: Re: stopping amd causes a freeze
References: <51ED0060.2050502@bsdforen.de> <20130722100720.GI5991@kib.kiev.ua>
 <51F0DA4B.3000809@bsdforen.de> <20130725100037.GM5991@kib.kiev.ua>
 <51F2AD8C.1000003@bsdforen.de> <51F385CE.1030606@bsdforen.de>
 <20130728062403.GD4972@kib.kiev.ua>
In-Reply-To: <20130728062403.GD4972@kib.kiev.ua>
List-Id: Production branch of FreeBSD source code

On 28/07/2013 08:24, Konstantin Belousov wrote:
> On Sat, Jul 27, 2013 at 10:33:18AM +0200, Dominic Fandrey wrote:
>> On 26/07/2013 19:10, Dominic Fandrey wrote:
>>> On 25/07/2013 12:00, Konstantin Belousov wrote:
>>>> On Thu, Jul 25, 2013 at 09:56:59AM
+0200, Dominic Fandrey wrote:
>>>>> On 22/07/2013 12:07, Konstantin Belousov wrote:
>>>>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
>>>>>>> ...
>>>>>>>
>>>>>>> I run amd through sysutils/automounter, which is a scripting solution
>>>>>>> that generates an amd.map file based on encountered devices and devd
>>>>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
>>>>>>> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
>>>>>>>
>>>>>>> Nothing was mounted (by amd) during the last freeze.
>>>>>>>
>>>>>>> ...
>>>>>>
>>>>>> Are you sure that the machine did not paniced ? Do you have serial console ?
>>>>>>
>>>>>> The amd(8) locks itself into memory, most likely due to the fear of
>>>>>> deadlock. There are some known issues with user wirings in stable/9.
>>>>>> If the problem you see is indeed due to wiring, you might try to apply
>>>>>> r253187-r253191.
>>>>>
>>>>> I tried that. Applying the diff was straightforward enough. But the
>>>>> resulting kernel paniced as soon as it tried to mount the root fs.
>>>> You did provided a useful info to diagnose the issue.
>>>>
>>>> Patch should keep KBI compatible, but, just in case, if you have any
>>>> third-party module, rebuild it.
>>>>
>>>>>
>>>>> So I'll wait for the MFC from someone who knows what he/she is doing.
>>>>
>>>> Patch below booted for me, and I run some sanity check tests for the
>>>> mlockall(2), which also did not resulted in misbehaviour.
>>>>
>>>
>>> Your patch applied cleanly and the system booted with the resulting
>>> kernel.
>>>
>>> Amd exhibits several very strange behaviours. ...
>>
>> I can verify the whole thing with a clean world and kernel.
>>
>> This time I'll concentrate on the first instance of amd:
>>
>> # tail -n3 /var/log/messages
>> Jul 27 10:08:56 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
>> Jul 27 10:09:41 mobileKamikaze kernel: newnfs server pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
>> Jul 27 10:11:41 mobileKamikaze last message repeated 3 times
>>
>> The process, it turns out, simply doesn't exist. There is another
>> process, though:
>> # ps auxww | grep -F sbin/amd
>> root 5869 0.0 0.1 12036 8020 ?? S 10:08am 0:00.01 /usr/sbin/amd -r -p -a /var/run/automounter.amd -c 4 -w 2 /var/run/automounter.amd.mnt /var/run/automounter.amd.map
>>
>> # cat /var/run/automounter.amd.pid
>> 5868
>>
>> Here is what I think happens, amd forks a subprocess and the main
>> process, silently dies after it wrote its pidfile.
> Nothing dies silently. Either process was killed by signal, or it
> exited with the explicit call to exit(2). In the first case, default
> kernel settings of kern.logsigexit should make a record in the syslog.
> The machdep.uprintf_signal might be also useful, but not for daemons.

Well, it finally turned out that amd came up in this broken state
with missing processes because rpcbind wasn't running.

I think it would be a good idea for amd to fail with a bit of noise
instead of coming up broken, causing the kernel to spam syslog, and
confusing the user.

At this point I'd usually pull whoever works on amd into the
conversation, but the most recent change to src/contrib/amd is 4 years
old.

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
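
For anyone hitting the same thing: until amd itself complains, a wrapper
could check the two conditions from this thread before and after starting
it, namely that rpcbind answers and that the pid in the pidfile belongs to
a live process. The following is only a rough sketch. The names RPC_PROBE,
require_rpcbind and pidfile_alive are made up for illustration and are not
part of amd or sysutils/automounter; the only real tools assumed are
rpcinfo(8) and kill(1).

```shell
#!/bin/sh
# Sketch only: RPC_PROBE, require_rpcbind and pidfile_alive are hypothetical
# names, not part of amd or sysutils/automounter.

# Command used to probe the local portmapper. "rpcinfo -p" lists the RPC
# programs registered with rpcbind and fails if rpcbind cannot be reached.
: "${RPC_PROBE:=rpcinfo -p localhost}"

# Return 0 if the probe succeeds, otherwise complain on stderr and return 1.
require_rpcbind() {
    if ! $RPC_PROBE >/dev/null 2>&1; then
        echo "rpcbind is not reachable, refusing to start amd" >&2
        return 1
    fi
}

# Return 0 only if the pidfile is readable and names a process that can be
# signalled. "kill -0" delivers no signal, it only checks the target.
# Note: it also fails for a live process owned by another user unless the
# caller is root.
pidfile_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

A wrapper would call "require_rpcbind || exit 1" before launching amd and
something like "pidfile_alive /var/run/automounter.amd.pid" afterwards. In
the case quoted above, the second check would have failed, since pid 5868
was in the pidfile but no such process existed.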