From: Bengt Ahlgren
To: Kevin Oberman
Cc: FreeBSD-STABLE Mailing List, Walter Hop
Subject: Re: System hang on shutdown when running freebsd-update
In-Reply-To: (Kevin Oberman's message of "Tue, 28 Oct 2014 20:21:08 -0700")
Date: Wed, 29 Oct 2014 10:36:38 +0100

Kevin Oberman writes:

> On Tue, Oct 28, 2014 at 3:09 PM, Walter Hop wrote:
>
>> [Apologies for not replying directly to the thread; I found it at
>> https://lists.freebsd.org/pipermail/freebsd-stable/2014-October/080595.html
>> ]
>>
>> I noticed this same hang after upgrading from 10.0-RELEASE to 10.1-RC3 in
>> a VM running under VMware Fusion, so the problem appears still present.
>>
>> I could only make it happen in the single uptime just after the system was
>> freebsd-updated from FreeBSD 10.0 to 10.1-RC3.
>>
>> Here is a screenshot: http://lf.ms/wait-for-reboot.png
>>
>> It did not make any progress after 2 hours of waiting. When restarting the
>> VM, the disk was dirty.
>>
>> Some interesting facts:
>> - Note "swapoff: /dev/da0p2: Cannot allocate memory" in the screenshot
>> which might pose a clue. I haven’t seen this normally.
>> - FreeBSD does respond to ping while it is busy, so it is not a complete
>> "freeze".
>> - The VM is at 100% CPU while this is going on.
>>
>> I have created a snapshot of the VM in the failed state, so maybe some
>> useful information could be retrieved from it, although I don’t have any
>> experience with kernel debugging over VMware.
>>
>> Cheers,
>> WH
>>
>> --
>> Walter Hop | PGP key: https://lifeforms.nl/pgp
>>
>
> I am starting to suspect that some code that is needed to flush a resource
> that is blocking the complete shutdown is no longer available so waiting is
> not going to work. I tried a simple "shutdown now" and waited in single
> user mode for a minute before "reboot". It worked fine.
>
> This is based on guesswork, but seems to fit the symptoms.

Some more guesswork that fits Walter's symptoms better than Kevin's...

I have noticed that our server with large amounts of disk (three ZFS pools
with 22x4TB disks) and 128GB RAM often takes quite some time to shut down
after syncing the disks.
The last time it took on the order of 10 minutes, but it has always
completed.

It seems to be related to swap. Swap is on dedicated GPT partitions on two
system disks, and during those 10 minutes the system first accesses the
first of these disks, then the other. I know for sure that the accesses to
the second disk must be swap accesses, because the swap partition is the
only partition currently in use on that disk. I believe that it had on the
order of 6GB pushed out to swap the last time. It is running 9.3-REL
without Denninger's ZFS patches, so it tends to push some stuff to swap.

Is there some swap GC going on before shutdown that can take this time?

Bengt
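
One way to test the swap guess above would be to record how much swap each device holds just before shutdown and compare that with the length of the delay and the order in which the disks are accessed. The following is only a minimal sketch in Python: it parses the text printed by `swapinfo -k`, the function name `swap_used_kib` is made up for illustration, and the device names and numbers in `SAMPLE` are hypothetical, not measurements from any of the systems discussed in this thread.

```python
# Minimal sketch: sum per-device swap usage from `swapinfo -k` output.
# SAMPLE is hypothetical illustration data, not taken from a real system.

SAMPLE = """\
Device          1K-blocks     Used    Avail Capacity
/dev/ada0p2      33554432  3145728 30408704     9%
/dev/ada1p2      33554432  3145728 30408704     9%
Total            67108864  6291456 60817408     9%
"""


def swap_used_kib(swapinfo_output):
    """Return {device: used KiB} for each /dev/ line in `swapinfo -k` output."""
    used = {}
    for line in swapinfo_output.splitlines():
        fields = line.split()
        # Skip the header row, the "Total" row and anything malformed.
        if len(fields) < 5 or not fields[0].startswith("/dev/"):
            continue
        used[fields[0]] = int(fields[2])  # third column is "Used"
    return used


per_device = swap_used_kib(SAMPLE)
total_gib = sum(per_device.values()) / (1024 * 1024)  # KiB -> GiB
```

On a real system the input string would come from running `swapinfo -k` shortly before shutdown. If the delay grows with the amount in use, and the disks are touched in the same order as the swap devices are listed, that would be consistent with the guess that the time is spent on swap.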