From owner-freebsd-current@freebsd.org Thu Oct 27 20:13:52 2016
From: Ulrich Spörlein <uspoerlein@gmail.com>
Date: Thu, 27 Oct 2016 22:13:48 +0200
Subject: Re: 11.x deadlocking during pfault (was Re: FreeBSD 11.x grinds to a halt after about 48h of uptime)
To: Daniel Nebdal
Cc: Kevin Oberman, kib@freebsd.org, Hans Petter Selasky, FreeBSD Current
References: <20161015161848.GD2532@acme.spoerlein.net> <6926bd72-35c9-cb21-4785-b50a05e581be@selasky.org> <20161024174327.GB2734@acme.spoerlein.net> <20161026164343.GC2734@acme.spoerlein.net> <20161026164518.GD2734@acme.spoerlein.net>
List-Id: Discussions about the use of FreeBSD-current

2016-10-27 14:51 GMT+02:00 Daniel Nebdal:
>
> On Wed, Oct 26, 2016 at 6:45 PM, Ulrich Spörlein wrote:
> >
> > On Wed, 2016-10-26 at 18:43:43 +0200, Ulrich Spörlein wrote:
> > > On Mon, 2016-10-24 at 19:43:27 +0200, Ulrich Spörlein wrote:
> > > > On Sat, 2016-10-15 at 09:36:27 -0700, Kevin Oberman wrote:
> > > > > On Sat, Oct 15, 2016 at 9:26 AM, Hans Petter Selasky wrote:
> > > > >
> > > > > > On 10/15/16 18:18, Ulrich Spörlein wrote:
> > > > > >
> > > > > >> Hey all, while 11.x is -STABLE now, this happens to my machine ever
> > > > > >> since I upgraded it to 11-CURRENT years ago. I have no idea when this
> > > > > >> started, actually, but what always happens is this:
> > > > > >>
> > > > > >> - System and X11 are up and running; I keep the machine running overnight
> > > > > >> as I'm too lazy to reboot and restart everything.
> > > > > >> - There's a bunch of xterms, Chrome, Clementine-Player and some other
> > > > > >> programs running.
> > > > > >> - Coming back to the machine the next day (or the day after), it will
> > > > > >> exit the screensaver just fine, and then either I can use it for a couple
> > > > > >> of seconds before it freezes, or it's pretty much dead already. The
> > > > > >> mouse cursor still moves for a bit, but then it also freezes (so is this
> > > > > >> a GPU problem??)
> > > > > >>
> > > > > >> Now what I currently see on the screen is a clock widget stuck at 18:04,
> > > > > >> but conky itself last updated at 18:00:18 ...
> > > > > >>
> > > > > >> This time I had some SSH sessions from another machine to see some more
> > > > > >> useful things. There was nothing in the various logs under /var/log (I
> > > > > >> also can't run dmesg anymore ...)
> > > > > >> I had top(1) running in a loop; this is the last output:
> > > > > >>
> > > > > >> last pid: 25633;  load averages: 0.27, 0.39, 0.36   up 1+23:03:28  18:00:12
> > > > > >> 202 processes: 2 running, 188 sleeping, 11 zombie, 1 waiting
> > > > > >>
> > > > > >> Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free
> > > > > >> ARC: 1844M Total, 469M MFU, 268M MRU, 16K Anon, 96M Header, 1012M Other
> > > > > >> Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse
> > > > > >>
> > > > > >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> > > > > >>    11 root        8 155 ki31     0K   128K CPU0    0 364.6H 772.95% idle
> > > > > >>  3122 uqs        15  28    0  7113M  5861M uwait   0  94:44  13.96% chrome
> > > > > >>  2887 uqs        28  22    0  1394M   237M select  2 172:53   6.98% chrome
> > > > > >>  2890 uqs        11  21    0  1034M   178M select  5 231:21   1.95% chrome
> > > > > >>  1062 root        9  21    0   440M 47220K select  0  67:09   0.98% Xorg
> > > > > >>  3002 uqs        15  25    5  1159M   172M uwait   2  19:09   0.00% chrome
> > > > > >>  3139 uqs        17  25    5  1163M   156M uwait   2  16:15   0.00% chrome
> > > > > >>  3001 uqs        18  25    5  1639M   575M uwait   0  16:05   0.00% chrome
> > > > > >>    12 root       24 -64    -     0K   384K WAIT   -1  10:53   0.00% intr
> > > > > >>  3129 uqs        12  20    0  2820M  1746M uwait   6   8:36   0.00% chrome
> > > > > >>  2822 uqs         9  20    0   217M 81300K select  0   5:10   0.00% conky
> > > > > >>  3174 root        1  20    0 21532K  3188K select  0   4:20   0.00% systat
> > > > > >>  3130 uqs        16  20    0  1058M   131M uwait   4   3:03   0.00% chrome
> > > > > >>  2998 uqs        16  20    0  1110M   123M uwait   2   2:53   0.00% chrome
> > > > > >>  3165 uqs        10  20    0  1209M   215M uwait   6   2:52   0.00% chrome
> > > > > >>  3142 uqs        11  25    5  1344M   195M uwait   2   2:46   0.00% chrome
> > > > > >>  2876 uqs        19  20    0   580M 37164K select  3   2:42   0.00% clementine-player
> > > > > >>    20 root        2 -16    -     0K    32K psleep  6   2:25   0.00% pagedaemon
> > > > > >>
> > > > > >> I also had systat -vm running and it continued to update its screen ...
> > > > > >> for a short while; this is the last update before SSH died:
> > > > > >>
> > > > > >> Mem usage:   0k%Phy  5%Kmem
> > > > > >> Mem: KB    REAL            VIRTUAL                    VN PAGER   SWAP PAGER
> > > > > >>         Tot   Share      Tot    Share    Free         in   out    in   out
> > > > > >> Act  11051k   67868 71051992   255448   61840  count
> > > > > >> All  11051k   67924 71058776   262100         pages
> > > > > >> Proc:                                                          Interrupts
> > > > > >>   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt     ioflt  224 total
> > > > > >>      25     730  11  724  109  404  101   13           cow      2 ehci0 16
> > > > > >>                                                        zfod     3 ehci1 23
> > > > > >>  0.0%Sys  0.1%Intr  0.0%User  0.0%Nice 99.9%Idle       ozfod   16 cpu0:timer
> > > > > >> |    |    |    |    |    |    |    |    |    |         %ozfod      xhci0 264
> > > > > >>                                                        daefr    3 em0 265
> > > > > >>                            50 dtbuf                    prcfr   94 hdac1 266
> > > > > >> Namei     Name-cache   Dir-cache    349167 desvn       totfr       ahci0 270
> > > > > >>    Calls    hits   %    hits   %    349155 numvn       react    5 cpu1:timer
> > > > > >>      121     121 100                253501 frevn       pdwak    1 cpu2:timer
> > > > > >>                                                        pdpgs   29 cpu7:timer
> > > > > >> Disks  md0 ada0 ada1 pass0 pass1 pass2                 intrn   12 cpu3:timer
> > > > > >> KB/t  0.00 0.00 0.00  0.00  0.00  0.00   5318892 wire          41 cpu6:timer
> > > > > >> tps      0    0    0     0     0     0   9261404 act           12 cpu5:timer
> > > > > >> MB/s  0.00 0.00 0.00  0.00  0.00  0.00   1598184 inact          6 cpu4:timer
> > > > > >> %busy    0    0    0     0     0     0           cache            vgapci0
> > > > > >>                                            61840 free
> > > > > >>                                           712304 buf
> > > > > >>
> > > > > >>
> > > > > >> Why do I have a Chrome tab using about 6G? What other sort of debugging
> > > > > >> output can be helpful to get to the bottom of this? The machine still
> > > > > >> responds to pings just fine, TCP connections get set up, but the SSH
> > > > > >> handshake never completes.
> > > > > >>
> > > > > >> This always happens after 30-50h of uptime, is super annoying, and has
> > > > > >> been going on for >1 year. Help?
> > > > > >>
> > > > > >> Note, I cut the power to the monitor overnight to save electricity; can
> > > > > >> this mess up something in the Radeon card or X server? What combinations
> > > > > >> would be most useful to try next?
> > > > > >>
> > > > > >>
> > > > > > Hi,
> > > > > >
> > > > > > Sounds like a memory leak. Can you track the memory use over time?
> > > > > >
> > > > > > Did you look at the output from:
> > > > > >
> > > > > > vmstat -m ?
> > > > > >
> > > > > > --HPS
> > > > > >
> > > > > I have noted significant memory leakage in chromium for some time. If I
> > > > > leave it running overnight, my system is essentially frozen. If I terminate
> > > > > the chromium process, it slowly comes back to life. I always keep a gkrellm
> > > > > session on-screen where the memory and swap utilization is continuously
> > > > > displayed, and that clearly shows resources declining.
> > > >
> > > > That is not what is happening to my system, though; it actually
> > > > deadlocks. There's no way to recover from it, it seems.
> > > >
> > > > So I killed Chromium overnight each day, and I'm at this:
> > > >
> > > > % top -Sbores
> > > > last pid: 44526;  load averages: 0.10, 0.11, 0.56   up 7+09:53:30  19:33:25
> > > > 156 processes: 2 running, 153 sleeping, 1 waiting
> > > >
> > > > Mem: 315M Active, 550M Inact, 5671M Wired, 515M Buf, 9324M Free
> > > > ARC: 1852M Total, 541M MFU, 196M MRU, 16K Anon, 93M Header, 1022M Other
> > > > Swap: 4096M Total, 2186M Used, 1910M Free, 53% Inuse
> > > >
> > > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> > > >  2755 uqs        10  20    0  1697M   311M select  1  47:23   0.00% conky
> > > >  2736 uqs        32  20    0   699M   116M select  7  94:29   0.00% clementine-player
> > > >  3000 uqs        12  20    0  1126M 69380K select  5   9:48   0.00% digikam
> > > >   960 root        9  20    0   448M 59076K select  0 250:22   0.00% Xorg
> > > > 72608 uqs         8  20    0   939M 55432K uwait   5   0:01   0.00% chrome
> > > > 72599 uqs         9  52    0   929M 55116K uwait   0   0:00   0.00% chrome
> > > >  2567 root        1  20    0 89948K 42964K select  1   1:51   0.00% bsnmpd
> > > > 70476 uqs         1  20    0 93656K 25712K select  2   0:05   0.00% xterm
> > > >  2730 uqs         5  20    0   208M 14988K select  1   0:22   0.00% clock-applet
> > > >   880 root        1  20    0 22628K 12500K select  3   0:20   0.00% ntpd
> > > >  2726 uqs         4  20    0   206M 12456K select  6   0:09   0.00% mateweather-applet
> > > > 44352 uqs         1  20    0 75224K 12348K select  4   0:00   0.00% xterm
> > > > 43049 uqs         1  20    0 75224K 11792K select  5   0:00   0.00% xterm
> > > >  3074 uqs         2  20    0   308M  9692K select  1   0:02   0.00% kdeinit4
> > > >  2671 uqs         1  20    0   144M  9488K select  1   0:13   0.00% openbox
> > > >  3072 uqs         1  20    0   210M  8284K select  3   0:00   0.00% kdeinit4
> > > >  2724 uqs         4  20    0   154M  8256K select  2   0:19   0.00% wnck-applet
> > > >  2701 uqs         5  20    0   177M  8144K select  2   0:01   0.00% mate-panel
> > > >
> > > >
> > > > 7 days running, pretty good. But look closer: the system is doing pretty
> > > > much nothing, yet it has swapped out 2G. What?
> > > >
> > > > > Try closing your chromium at night and see if that fixes the problem.
> > > >
> > > > It's better, but I'm not sure it's a real fix. I've now turned off
> > > > "hardware acceleration" in Chromium, though chrome://gpu didn't really
> > > > inspire confidence that it was actually using any h/w accel at all.
> > > >
> > > >
> > > > > If you have never tried gkrellm (sysutils/gkrellm2), it is the best
> > > > > system monitor I have found, though it pulls in a lot of dependencies. It
> > > > > also can run as a server with remote systems displaying the data. Handy
> > > > > to monitor servers.
> > > >
> > > > I had a cacti setup that would also monitor my workstation (through an
> > > > OpenVPN tunnel), but that has bit-rotted and Apache only gives me 500s
> > > > on that cacti URL and nothing in the logs, oh well ...
> > > >
> > > > Hooking up a serial console and testing whether DDB works is probably
> > > > the next best step to take ...
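
For reference, the minimal serial console plus DDB setup meant here would look
roughly like the following -- a sketch only, assuming a kernel built with
"options DDB" (the -CURRENT GENERIC default) and a usable COM1 port:

    # /boot/loader.conf -- mirror the console onto COM1
    boot_multicons="YES"
    console="comconsole,vidconsole"
    comconsole_speed="115200"

    # allow the serial alternate break sequence (CR ~ Ctrl-B) to enter DDB;
    # the same knob exists as a loader tunable or can be set at runtime:
    sysctl debug.kdb.alt_break_to_debugger=1

Once the machine wedges, entering DDB from the serial terminal and running
"ps", "show page" and "trace <pid>" against one of the pfault'ed processes
should show where they are stuck.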
> > >
> > > Sigh, I forgot to shut down Chrome overnight, and my system is
> > > deadlocked again. I can still switch virtual desktops in X11, and a
> > > running xterm accepts keyboard input, but it is 18:35 now and I see that
> > > my X11 clock last updated at 17:18 and conky is stuck at 17:14:11.
> > >
> > > My top-in-a-loop is stuck here and no longer loops:
> > >
> > > last pid: 73731;  load averages: 0.23, 0.24, 0.23   up 9+07:34:15  17:14:10
> > > 160 processes: 3 running, 146 sleeping, 11 zombie
> > >
> > > Mem: 9302M Active, 763M Inact, 5682M Wired, 752M Buf, 113M Free
> > > ARC: 1731M Total, 549M MFU, 129M MRU, 16K Anon, 91M Header, 963M Other
> > > Swap:
> > >
> > >   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> > >   960 root         9  21    0   451M 55796K CPU7    7 299:15   2.98% Xorg
> > > 53884 uqs         27  22    0   904M   250M CPU4    4 105:05   2.98% chrome
> > > 54081 uqs         15  21    0  3601M  2527M uwait   6  38:11   0.98% chrome
> > >  2736 uqs         33  20    0   697M 84620K select  5  95:38   0.00% clementine-player
> > >  2755 uqs         10  20    0  2111M   721M select  0  60:47   0.00% conky
> > > 78943 uqs          1  20    0 21532K  3328K select  7  13:12   0.00% systat
> > > 54048 uqs         15  25    5  1093M   159M uwait   4   9:51   0.00% chrome
> > >  3000 uqs         12  20    0  1126M 53248K select  7   9:50   0.00% digikam
> > > 53999 uqs         19  25    5  1525M   514M uwait   5   6:28   0.00% chrome
> > >  2703 uqs          1  20    0   169M  5948K select  0   5:40   0.00% mate-volume-control
> > >  2707 uqs          1  20    0 60240K  2516K select  7   5:21   0.00% autocutsel
> > > 54077 uqs         15  20    0  1803M   821M uwait   6   3:51   0.00% chrome
> > > 78869 uqs          1  20    0 93480K  6636K select  7   3:38   0.00% sshd
> > > 38396 uqs          1  20    0 75224K  3896K select  0   3:18   0.00% xterm
> > >  2567 root         1  20    0 89948K 43892K select  6   2:25   0.00% bsnmpd
> > >   876 root         9  52    0 30120K  3468K uwait   7   2:20   0.00% nscd
> > >   883 root         1  20    0 10444K  1708K select  0   2:16   0.00% powerd
> > >  2586 haldaemon    2  20    0 56620K  3920K select  1   2:04   0.00% hald
> > >
> > >
> > > systat -vm:
> > >
> > > 10 users    Load  0.21  0.23  0.23                  Oct 26 17:14
> > > Mem usage:   0k%Phy  5%Kmem
> > > Mem: KB    REAL            VIRTUAL                    VN PAGER   SWAP PAGER
> > >         Tot   Share      Tot    Share    Free         in   out    in   out
> > > Act  10545k   51036 67451248   249344   61912  count  120     3
> > > All  10550k   55504 67480212   275720         pages   807     3
> > > Proc:                                                          Interrupts
> > >   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt  149 ioflt  845 total
> > >   6      781  12 9327 6923  24k  318   18 6649   20 cow          2 ehci0 16
> > >                                                 5064 zfod        3 ehci1 23
> > >  4.7%Sys  0.0%Intr  1.7%User  0.0%Nice 93.6%Idle   16 ozfod    106 cpu0:timer
> > > |    |    |    |    |    |    |    |    |    |       %ozfod        xhci0 264
> > > ==>                                               927 daefr      45 em0 265
> > >                            51 dtbuf              545 prcfr       94 hdac1 266
> > > Namei     Name-cache   Dir-cache    349167 desvn 1787 totfr      175 ahci0 270
> > >    Calls    hits   %    hits   %    337762 numvn      react       50 cpu1:timer
> > >      935     901  96                170078 frevn      pdwak       48 cpu3:timer
> > >                                    1435985            pdpgs       66 cpu2:timer
> > > Disks  md0 ada0 ada1 pass0 pass1 pass2               intrn        77 cpu7:timer
> > > KB/t  0.00 22.59 0.00  0.00  0.00  0.00   5819100 wire           75 cpu4:timer
> > > tps     17   168    0     0     0     0   9723008 act            51 cpu6:timer
> > > MB/s  0.00  3.70 0.00  0.00  0.00  0.00    636300 inact          53 cpu5:timer
> > > %busy    0     3    0     0     0     0           cache             vgapci0
> > >                                             61912 free
> > >                                            770732 buf
> > >
> > >
> > > I'm unable to scroll back the vmstat -m output in my tmux pane (running
> > > on a different system, this is super strange), so all I can show is this:
> > >
> > >
> > > inodedep          16  12293K       -   2770872  512
> > > bmsafemap          9     49K       -   2136243  256,8192
> > > newblk            28  24582K       -   4111401  256
> > > indirdep           4      1K       -    865983  128,16384
> > > freefrag           6      1K       -    682027  128
> > > freeblks           3      1K       -   1318985  256
> > > freefile           3      1K       -   1272832  64
> > > diradd             2      1K       -   1306741  128
> > > mkdir              0      0K       -      5120  128
> > > dirrem             0      0K       -   1305358  128
> > > newdirblk          0      0K       -      2610  64
> > > freework           9      1K       -   1566489  64,128
> > > sbdep              0      0K       -     45661  64
> > > savedino           0      0K       -    186280  256
> > > softdep            6      3K       -         6  512
> > > ufs_dirhash     1533    767K       -    109513  16,32,64,128,256,512,1024,2048,4096
> > > ufs_quota          1   2048K       -         1
> > > ufs_mount         18     55K       -        18  512,2048,4096,8192
> > > vm_pgdata          1   2048K       -         2  128
> > > UMAHash           23   5385K       -       108  512,1024,2048,4096,8192,16384,32768,65536
> > > memdesc            1      4K       -         1  4096
> > > atkbddev           2      1K       -         2  64
> > > entropy            1      1K       -    894489  32,4096
> > > apmdev             1      1K       -         1  128
> > > madt_table         0      0K       -         1  4096
> > > SCSI ENC          25    100K       -    120744  16,64,256,2048,32768
> > > io_apic            1      2K       -         1  2048
> > > MCA               18      3K       -        18  64,128
> > > msi               16      2K       -        16  128
> > > nexusdev           5      1K       -         5  16
> > > hdaa               5     54K       -         5  2048,16384,32768
> > > hdac               1      1K       -         1  1024
> > > hdacc              1      1K       -         1  32
> > > linux             29      2K       -        29  64
> > > solaris       295750 228696K       - 140520323  16,32,64,128,256,512,1024,2048,4096,8192,32768
> > > kstat_data         6      1K       -         6  64
> > > eli data          22      4K       -   6218901  64,256,512,1024,2048,4096,8192,16384,32768,65536
> > > ksem               1      8K       -         1  8192
> > > nullfs_hash        1   2048K       -         1
> > > nullfs_node        9      1K       -        41  64
> > > nullfs_mount       9      1K       -         9  32
> > > fdesc_mount        1      1K       -         1  16
> > > gem_name          46     14K       -       122  32,4096
> > > drm_global         2      1K       -         2  128,256
> > > drm_dma            1      1K       -         1  32
> > > drm_sarea          1      1K       -         1  16
> > > drm_driver        91   2278K       -       125  16,32,64,128,256,512,1024,2048,4096,8192,32768
> > > drm_minor          2      1K       -         2  128
> > > drm_files          1      1K       -         3  512
> > > drm_ctxbitmap      1      4K       -         1  4096
> > > drm_sman          41      6K       -        42  128
> > > drm_hashtab        3   4129K       -         4  128,32768
> > > drm_kms           69     19K       -       163  16,32,64,128,256,512,2048,4096,8192
> > > drm_vblank         7      1K       -         7  32,256
> > > ttm_pd            16     17K       -        18  16,128,2048,65536
> > > ttm_rman           2      1K       -         2  256
> > > ttm_zone           2      1K       -         2  64
> > > ttm_poolmgr        1      1K       -         1  512
> > >
> > >
> > > Now what?
> > >
> > > The xterm I have running locally with a stuck top is showing the top 3 chrome
> > > processes in pfault state, and it has "Swap: 11M In" in the header, so clearly
> > > 11.x is prone to deadlock during page faults and/or swapping. It last
> > > updated at 17:14:13 (compare to the other top at 17:14:10, which does not
> > > show the pfault state yet).
> >
> > Addendum: I still see the HDD indicator light flickering every couple
> > of seconds, so something is still doing I/O. My SSH sessions into the
> > machine haven't timed out, and the screensaver (just DPMS blank) is also
> > still kicking in correctly.
> >
>
> Just out of general interest, how is your swap set up? Is it on a zvol
> or something exotic like that, or is it just a normal partition?

Regular swap on a direct partition, no ZVOL involved. In fact, for *that*
deadlock above, I ran "swapoff -a" before I left it overnight, to make
sure that it's not the swap that is the problem.

And a 2nd addendum: as it was still writing to/reading from disk _somehow_,
I pushed the power button to see if it would do the ACPI shutdown thing.
It did! It took a while, but eventually the X server was killed, I saw it
syncing the disks, and then it did an orderly shutdown.

Very strange, and I've been having this for a year or so. Always with a
recent build, obviously.
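
For completeness, the kind of state worth grabbing from a surviving root
shell the next time it wedges, before reaching for DDB -- a sketch only,
and the output paths are just examples:

    # kernel stack of every thread, to see where the pfault'ed processes sit
    procstat -kk -a > /var/tmp/wedge-procstat.txt
    # paging and swap counters, for comparison against a healthy system
    vmstat -s > /var/tmp/wedge-vmstat.txt
    swapinfo -k >> /var/tmp/wedge-vmstat.txt

If the box still accepts keystrokes in an already-open xterm, that should at
least show whether the pagedaemon and the faulting chrome threads are all
sleeping on the same thing.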