From owner-freebsd-current@freebsd.org Thu Oct 27 20:13:52 2016
From: Ulrich Spörlein <uspoerlein@gmail.com>
Date: Thu, 27 Oct 2016 22:13:48 +0200
Subject: Re: 11.x deadlocking during pfault (was Re: FreeBSD 11.x grinds to a halt after about 48h of uptime)
To: Daniel Nebdal
Cc: Kevin Oberman, kib@freebsd.org, Hans Petter Selasky, FreeBSD Current
References: <20161015161848.GD2532@acme.spoerlein.net> <6926bd72-35c9-cb21-4785-b50a05e581be@selasky.org> <20161024174327.GB2734@acme.spoerlein.net> <20161026164343.GC2734@acme.spoerlein.net> <20161026164518.GD2734@acme.spoerlein.net>
List-Id: Discussions about the use of FreeBSD-current

2016-10-27 14:51 GMT+02:00 Daniel Nebdal:
>
> On Wed, Oct 26, 2016 at 6:45 PM, Ulrich Spörlein wrote:
> >
> > On Wed, 2016-10-26 at 18:43:43 +0200, Ulrich Spörlein wrote:
> > > On Mon, 2016-10-24 at 19:43:27 +0200, Ulrich Spörlein wrote:
> > > > On Sat, 2016-10-15 at 09:36:27 -0700, Kevin Oberman wrote:
> > > > > On Sat, Oct 15, 2016 at 9:26 AM, Hans Petter Selasky wrote:
> > > > >
> > > > > > On 10/15/16 18:18, Ulrich Spörlein wrote:
> > > > > >
> > > > > >> Hey all, while 11.x is -STABLE now, this happens to my machine ever
> > > > > >> since I upgraded it to 11-CURRENT years ago. I have no idea when this
> > > > > >> started, actually, but what always happens is this:
> > > > > >>
> > > > > >> - System and X11 are up and running; I keep the machine running overnight
> > > > > >> as I'm too lazy to reboot and restart everything.
> > > > > >> - There's a bunch of xterms, Chrome, Clementine-Player and some other
> > > > > >> programs running.
> > > > > >> - Coming back to the machine the next day (or the day after), it will
> > > > > >> exit the screensaver just fine, and then either I can use it for a couple
> > > > > >> of seconds before it freezes, or it's pretty much dead already. The
> > > > > >> mouse cursor still moves for a bit, but then it also freezes (so is this
> > > > > >> a GPU problem??)
> > > > > >>
> > > > > >> Now what I currently see on the screen is a clock widget stuck at 18:04,
> > > > > >> but conky itself last updated at 18:00:18 ...
> > > > > >>
> > > > > >> This time I had some SSH sessions from another machine to see some more
> > > > > >> useful things. There was nothing in the various logs under /var/log (I
> > > > > >> also can't run dmesg anymore ...)
> > > > > >> I had top(1) running in a loop; this is the last output:
> > > > > >>
> > > > > >> last pid: 25633;  load averages: 0.27, 0.39, 0.36   up 1+23:03:28  18:00:12
> > > > > >> 202 processes: 2 running, 188 sleeping, 11 zombie, 1 waiting
> > > > > >>
> > > > > >> Mem: 8873M Active, 1783M Inact, 5072M Wired, 567M Buf, 132M Free
> > > > > >> ARC: 1844M Total, 469M MFU, 268M MRU, 16K Anon, 96M Header, 1012M Other
> > > > > >> Swap: 4096M Total, 2395M Used, 1701M Free, 58% Inuse
> > > > > >>
> > > > > >>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> > > > > >>    11 root        8 155 ki31     0K   128K CPU0    0 364.6H 772.95% idle
> > > > > >>  3122 uqs        15  28    0  7113M  5861M uwait   0  94:44  13.96% chrome
> > > > > >>  2887 uqs        28  22    0  1394M   237M select  2 172:53   6.98% chrome
> > > > > >>  2890 uqs        11  21    0  1034M   178M select  5 231:21   1.95% chrome
> > > > > >>  1062 root        9  21    0   440M 47220K select  0  67:09   0.98% Xorg
> > > > > >>  3002 uqs        15  25    5  1159M   172M uwait   2  19:09   0.00% chrome
> > > > > >>  3139 uqs        17  25    5  1163M   156M uwait   2  16:15   0.00% chrome
> > > > > >>  3001 uqs        18  25    5  1639M   575M uwait   0  16:05   0.00% chrome
> > > > > >>    12 root       24 -64    -     0K   384K WAIT   -1  10:53   0.00% intr
> > > > > >>  3129 uqs        12  20    0  2820M  1746M uwait   6   8:36   0.00% chrome
> > > > > >>  2822 uqs         9  20    0   217M 81300K select  0   5:10   0.00% conky
> > > > > >>  3174 root        1  20    0 21532K  3188K select  0   4:20   0.00% systat
> > > > > >>  3130 uqs        16  20    0  1058M   131M uwait   4   3:03   0.00% chrome
> > > > > >>  2998 uqs        16  20    0  1110M   123M uwait   2   2:53   0.00% chrome
> > > > > >>  3165 uqs        10  20    0  1209M   215M uwait   6   2:52   0.00% chrome
> > > > > >>  3142 uqs        11  25    5  1344M   195M uwait   2   2:46   0.00% chrome
> > > > > >>  2876 uqs        19  20    0   580M 37164K select  3   2:42   0.00% clementine-player
> > > > > >>    20 root        2 -16    -     0K    32K psleep  6   2:25   0.00% pagedaemon
> > > > > >>
> > > > > >> I also had systat -vm running and it continued to update its screen ...
> > > > > >> for a short while; this is the last update before SSH died:
> > > > > >>
> > > > > >> Mem usage:   0k%Phy  5%Kmem
> > > > > >> Mem: KB    REAL            VIRTUAL                    VN PAGER   SWAP PAGER
> > > > > >>         Tot   Share      Tot    Share    Free         in   out    in   out
> > > > > >> Act  11051k   67868 71051992   255448   61840  count
> > > > > >> All  11051k   67924 71058776   262100         pages
> > > > > >> Proc:                                                          Interrupts
> > > > > >>   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt     ioflt  224 total
> > > > > >>      25     730  11  724  109  404  101   13           cow      2 ehci0 16
> > > > > >>                                                        zfod     3 ehci1 23
> > > > > >>  0.0%Sys  0.1%Intr  0.0%User  0.0%Nice 99.9%Idle       ozfod   16 cpu0:timer
> > > > > >> |    |    |    |    |    |    |    |    |    |         %ozfod      xhci0 264
> > > > > >>                                                        daefr    3 em0 265
> > > > > >>                            50 dtbuf                    prcfr   94 hdac1 266
> > > > > >> Namei     Name-cache   Dir-cache    349167 desvn       totfr       ahci0 270
> > > > > >>    Calls    hits   %    hits   %    349155 numvn       react    5 cpu1:timer
> > > > > >>      121     121 100                253501 frevn       pdwak    1 cpu2:timer
> > > > > >>                                                        pdpgs   29 cpu7:timer
> > > > > >> Disks  md0 ada0 ada1 pass0 pass1 pass2                 intrn   12 cpu3:timer
> > > > > >> KB/t  0.00 0.00 0.00  0.00  0.00  0.00   5318892 wire          41 cpu6:timer
> > > > > >> tps      0    0    0     0     0     0   9261404 act           12 cpu5:timer
> > > > > >> MB/s  0.00 0.00 0.00  0.00  0.00  0.00   1598184 inact          6 cpu4:timer
> > > > > >> %busy    0    0    0     0     0     0           cache            vgapci0
> > > > > >>                                            61840 free
> > > > > >>                                           712304 buf
> > > > > >>
> > > > > >>
> > > > > >> Why do I have a Chrome tab using about 6G? What other sort of debugging
> > > > > >> output can be helpful to get to the bottom of this? The machine still
> > > > > >> responds to pings just fine, TCP connections get set up, but the SSH
> > > > > >> handshake never completes.
> > > > > >>
> > > > > >> This always happens after 30-50h of uptime, is super annoying, and has
> > > > > >> been going on for >1 year. Help?
> > > > > >>
> > > > > >> Note, I cut the power to the monitor overnight to save electricity; can
> > > > > >> this mess up something in the Radeon card or X server? What combinations
> > > > > >> would be most useful to try next?
> > > > > >>
> > > > > >>
> > > > > > Hi,
> > > > > >
> > > > > > Sounds like a memory leak. Can you track the memory use over time?
> > > > > >
> > > > > > Did you look at the output from:
> > > > > >
> > > > > > vmstat -m ?
> > > > > >
> > > > > > --HPS
> > > > > >
> > > > > I have noted significant memory leakage in chromium for some time. If I
> > > > > leave it running overnight, my system is essentially frozen. If I terminate
> > > > > the chromium process, it slowly comes back to life. I always keep a gkrellm
> > > > > session on-screen where the memory and swap utilization is continuously
> > > > > displayed, and that clearly shows resources declining.
> > > >
> > > > That is not what is happening to my system, though; it actually
> > > > deadlocks. There's no way to recover from it, it seems.
> > > >
> > > > So I killed Chromium overnight each day, and I'm at this:
> > > >
> > > > % top -Sbores
> > > > last pid: 44526;  load averages: 0.10, 0.11, 0.56   up 7+09:53:30  19:33:25
> > > > 156 processes: 2 running, 153 sleeping, 1 waiting
> > > >
> > > > Mem: 315M Active, 550M Inact, 5671M Wired, 515M Buf, 9324M Free
> > > > ARC: 1852M Total, 541M MFU, 196M MRU, 16K Anon, 93M Header, 1022M Other
> > > > Swap: 4096M Total, 2186M Used, 1910M Free, 53% Inuse
> > > >
> > > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> > > >  2755 uqs        10  20    0  1697M   311M select  1  47:23   0.00% conky
> > > >  2736 uqs        32  20    0   699M   116M select  7  94:29   0.00% clementine-player
> > > >  3000 uqs        12  20    0  1126M 69380K select  5   9:48   0.00% digikam
> > > >   960 root        9  20    0   448M 59076K select  0 250:22   0.00% Xorg
> > > > 72608 uqs         8  20    0   939M 55432K uwait   5   0:01   0.00% chrome
> > > > 72599 uqs         9  52    0   929M 55116K uwait   0   0:00   0.00% chrome
> > > >  2567 root        1  20    0 89948K 42964K select  1   1:51   0.00% bsnmpd
> > > > 70476 uqs         1  20    0 93656K 25712K select  2   0:05   0.00% xterm
> > > >  2730 uqs         5  20    0   208M 14988K select  1   0:22   0.00% clock-applet
> > > >   880 root        1  20    0 22628K 12500K select  3   0:20   0.00% ntpd
> > > >  2726 uqs         4  20    0   206M 12456K select  6   0:09   0.00% mateweather-applet
> > > > 44352 uqs         1  20    0 75224K 12348K select  4   0:00   0.00% xterm
> > > > 43049 uqs         1  20    0 75224K 11792K select  5   0:00   0.00% xterm
> > > >  3074 uqs         2  20    0   308M  9692K select  1   0:02   0.00% kdeinit4
> > > >  2671 uqs         1  20    0   144M  9488K select  1   0:13   0.00% openbox
> > > >  3072 uqs         1  20    0   210M  8284K select  3   0:00   0.00% kdeinit4
> > > >  2724 uqs         4  20    0   154M  8256K select  2   0:19   0.00% wnck-applet
> > > >  2701 uqs         5  20    0   177M  8144K select  2   0:01   0.00% mate-panel
> > > >
> > > >
> > > > 7 days running, pretty good. But look closer: the system is doing pretty
> > > > much nothing, yet it has swapped out 2G. What?
> > > >
> > > > > Try closing your chromium at night and see if that fixes the problem.
> > > >
> > > > It's better, but I'm not sure it's a real fix. I've now turned off
> > > > "hardware acceleration" in Chromium, though chrome://gpu didn't really
> > > > inspire confidence that it was actually using any h/w accel at all.
> > > >
> > > >
> > > > > If you have never tried gkrellm (sysutils/gkrellm2), it is the best
> > > > > system monitor I have found, though it pulls in a lot of dependencies. It
> > > > > also can run as a server with remote systems displaying the data. Handy
> > > > > to monitor servers.
> > > >
> > > > I had a cacti setup that would also monitor my workstation (through an
> > > > OpenVPN tunnel), but that has bit-rotted and Apache only gives me 500s
> > > > on that cacti URL and nothing in the logs, oh well ...
> > > >
> > > > Hooking up a serial console and testing whether DDB works is probably
> > > > the next best step to take ...
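
For reference, the minimal serial console plus DDB setup meant here would look
roughly like the following -- a sketch only, assuming a kernel built with
"options DDB" (the -CURRENT GENERIC default) and a usable COM1 port:

    # /boot/loader.conf -- mirror the console onto COM1
    boot_multicons="YES"
    console="comconsole,vidconsole"
    comconsole_speed="115200"

    # allow the serial alternate break sequence (CR ~ Ctrl-B) to enter DDB;
    # the same knob exists as a loader tunable or can be set at runtime:
    sysctl debug.kdb.alt_break_to_debugger=1

Once the machine wedges, entering DDB from the serial terminal and running
"ps", "show page" and "trace <pid>" against one of the pfault'ed processes
should show where they are stuck.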
> > >
> > > Sigh, I forgot to shut down Chrome overnight, and my system is
> > > deadlocked again. I can still switch virtual desktops in X11, and a
> > > running xterm accepts keyboard input, but it is 18:35 now and I see that
> > > my X11 clock last updated at 17:18 and conky is stuck at 17:14:11.
> > >
> > > My top-in-a-loop is stuck here and no longer loops:
> > >
> > > last pid: 73731;  load averages: 0.23, 0.24, 0.23   up 9+07:34:15  17:14:10
> > > 160 processes: 3 running, 146 sleeping, 11 zombie
> > >
> > > Mem: 9302M Active, 763M Inact, 5682M Wired, 752M Buf, 113M Free
> > > ARC: 1731M Total, 549M MFU, 129M MRU, 16K Anon, 91M Header, 963M Other
> > > Swap:
> > >
> > >   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
> > >   960 root         9  21    0   451M 55796K CPU7    7 299:15   2.98% Xorg
> > > 53884 uqs         27  22    0   904M   250M CPU4    4 105:05   2.98% chrome
> > > 54081 uqs         15  21    0  3601M  2527M uwait   6  38:11   0.98% chrome
> > >  2736 uqs         33  20    0   697M 84620K select  5  95:38   0.00% clementine-player
> > >  2755 uqs         10  20    0  2111M   721M select  0  60:47   0.00% conky
> > > 78943 uqs          1  20    0 21532K  3328K select  7  13:12   0.00% systat
> > > 54048 uqs         15  25    5  1093M   159M uwait   4   9:51   0.00% chrome
> > >  3000 uqs         12  20    0  1126M 53248K select  7   9:50   0.00% digikam
> > > 53999 uqs         19  25    5  1525M   514M uwait   5   6:28   0.00% chrome
> > >  2703 uqs          1  20    0   169M  5948K select  0   5:40   0.00% mate-volume-control
> > >  2707 uqs          1  20    0 60240K  2516K select  7   5:21   0.00% autocutsel
> > > 54077 uqs         15  20    0  1803M   821M uwait   6   3:51   0.00% chrome
> > > 78869 uqs          1  20    0 93480K  6636K select  7   3:38   0.00% sshd
> > > 38396 uqs          1  20    0 75224K  3896K select  0   3:18   0.00% xterm
> > >  2567 root         1  20    0 89948K 43892K select  6   2:25   0.00% bsnmpd
> > >   876 root         9  52    0 30120K  3468K uwait   7   2:20   0.00% nscd
> > >   883 root         1  20    0 10444K  1708K select  0   2:16   0.00% powerd
> > >  2586 haldaemon    2  20    0 56620K  3920K select  1   2:04   0.00% hald
> > >
> > >
> > > systat -vm:
> > >
> > > 10 users    Load  0.21  0.23  0.23                  Oct 26 17:14
> > > Mem usage:   0k%Phy  5%Kmem
> > > Mem: KB    REAL            VIRTUAL                    VN PAGER   SWAP PAGER
> > >         Tot   Share      Tot    Share    Free         in   out    in   out
> > > Act  10545k   51036 67451248   249344   61912  count  120     3
> > > All  10550k   55504 67480212   275720         pages   807     3
> > > Proc:                                                          Interrupts
> > >   r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt  149 ioflt  845 total
> > >   6      781  12 9327 6923  24k  318   18 6649   20 cow          2 ehci0 16
> > >                                                 5064 zfod        3 ehci1 23
> > >  4.7%Sys  0.0%Intr  1.7%User  0.0%Nice 93.6%Idle   16 ozfod    106 cpu0:timer
> > > |    |    |    |    |    |    |    |    |    |       %ozfod        xhci0 264
> > > ==>                                               927 daefr      45 em0 265
> > >                            51 dtbuf              545 prcfr       94 hdac1 266
> > > Namei     Name-cache   Dir-cache    349167 desvn 1787 totfr      175 ahci0 270
> > >    Calls    hits   %    hits   %    337762 numvn      react       50 cpu1:timer
> > >      935     901  96                170078 frevn      pdwak       48 cpu3:timer
> > >                                    1435985            pdpgs       66 cpu2:timer
> > > Disks  md0 ada0 ada1 pass0 pass1 pass2               intrn        77 cpu7:timer
> > > KB/t  0.00 22.59 0.00  0.00  0.00  0.00   5819100 wire           75 cpu4:timer
> > > tps     17   168    0     0     0     0   9723008 act            51 cpu6:timer
> > > MB/s  0.00  3.70 0.00  0.00  0.00  0.00    636300 inact          53 cpu5:timer
> > > %busy    0     3    0     0     0     0           cache             vgapci0
> > >                                             61912 free
> > >                                            770732 buf
> > >
> > >
> > > I'm unable to scroll back the vmstat -m output in my tmux pane (running
> > > on a different system, this is super strange), so all I can show is this:
> > >
> > >
> > > inodedep          16  12293K       -   2770872  512
> > > bmsafemap          9     49K       -   2136243  256,8192
> > > newblk            28  24582K       -   4111401  256
> > > indirdep           4      1K       -    865983  128,16384
> > > freefrag           6      1K       -    682027  128
> > > freeblks           3      1K       -   1318985  256
> > > freefile           3      1K       -   1272832  64
> > > diradd             2      1K       -   1306741  128
> > > mkdir              0      0K       -      5120  128
> > > dirrem             0      0K       -   1305358  128
> > > newdirblk          0      0K       -      2610  64
> > > freework           9      1K       -   1566489  64,128
> > > sbdep              0      0K       -     45661  64
> > > savedino           0      0K       -    186280  256
> > > softdep            6      3K       -         6  512
> > > ufs_dirhash     1533    767K       -    109513  16,32,64,128,256,512,1024,2048,4096
> > > ufs_quota          1   2048K       -         1
> > > ufs_mount         18     55K       -        18  512,2048,4096,8192
> > > vm_pgdata          1   2048K       -         2  128
> > > UMAHash           23   5385K       -       108  512,1024,2048,4096,8192,16384,32768,65536
> > > memdesc            1      4K       -         1  4096
> > > atkbddev           2      1K       -         2  64
> > > entropy            1      1K       -    894489  32,4096
> > > apmdev             1      1K       -         1  128
> > > madt_table         0      0K       -         1  4096
> > > SCSI ENC          25    100K       -    120744  16,64,256,2048,32768
> > > io_apic            1      2K       -         1  2048
> > > MCA               18      3K       -        18  64,128
> > > msi               16      2K       -        16  128
> > > nexusdev           5      1K       -         5  16
> > > hdaa               5     54K       -         5  2048,16384,32768
> > > hdac               1      1K       -         1  1024
> > > hdacc              1      1K       -         1  32
> > > linux             29      2K       -        29  64
> > > solaris       295750 228696K       - 140520323  16,32,64,128,256,512,1024,2048,4096,8192,32768
> > > kstat_data         6      1K       -         6  64
> > > eli data          22      4K       -   6218901  64,256,512,1024,2048,4096,8192,16384,32768,65536
> > > ksem               1      8K       -         1  8192
> > > nullfs_hash        1   2048K       -         1
> > > nullfs_node        9      1K       -        41  64
> > > nullfs_mount       9      1K       -         9  32
> > > fdesc_mount        1      1K       -         1  16
> > > gem_name          46     14K       -       122  32,4096
> > > drm_global         2      1K       -         2  128,256
> > > drm_dma            1      1K       -         1  32
> > > drm_sarea          1      1K       -         1  16
> > > drm_driver        91   2278K       -       125  16,32,64,128,256,512,1024,2048,4096,8192,32768
> > > drm_minor          2      1K       -         2  128
> > > drm_files          1      1K       -         3  512
> > > drm_ctxbitmap      1      4K       -         1  4096
> > > drm_sman          41      6K       -        42  128
> > > drm_hashtab        3   4129K       -         4  128,32768
> > > drm_kms           69     19K       -       163  16,32,64,128,256,512,2048,4096,8192
> > > drm_vblank         7      1K       -         7  32,256
> > > ttm_pd            16     17K       -        18  16,128,2048,65536
> > > ttm_rman           2      1K       -         2  256
> > > ttm_zone           2      1K       -         2  64
> > > ttm_poolmgr        1      1K       -         1  512
> > >
> > >
> > > Now what?
> > >
> > > The xterm I have running locally with a stuck top is showing the top 3 chrome
> > > processes in pfault state, and it has "Swap: 11M In" in the header, so clearly
> > > 11.x is prone to deadlock during page faults and/or swapping. It last
> > > updated at 17:14:13 (compare to the other top at 17:14:10, which does not
> > > show the pfault state yet).
> >
> > Addendum: I still see the HDD indicator light flickering every couple
> > of seconds, so something is still doing I/O. My SSH sessions into the
> > machine haven't timed out, and the screensaver (just DPMS blank) is also
> > still kicking in correctly.
> >
>
> Just out of general interest, how is your swap set up? Is it on a zvol
> or something exotic like that, or is it just a normal partition?

Regular swap on a direct partition, no ZVOL involved. In fact, for *that*
deadlock above, I ran "swapoff -a" before I left it overnight, to make
sure that it's not the swap that is the problem.

And a 2nd addendum: as it was still writing to/reading from disk _somehow_,
I pushed the power button to see if it would do the ACPI shutdown thing.
It did! It took a while, but eventually the X server was killed, I saw it
syncing the disks, and then it did an orderly shutdown.

Very strange, and I've been having this for a year or so. Always with a
recent build, obviously.
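
For completeness, the kind of state worth grabbing from a surviving root
shell the next time it wedges, before reaching for DDB -- a sketch only,
and the output paths are just examples:

    # kernel stack of every thread, to see where the pfault'ed processes sit
    procstat -kk -a > /var/tmp/wedge-procstat.txt
    # paging and swap counters, for comparison against a healthy system
    vmstat -s > /var/tmp/wedge-vmstat.txt
    swapinfo -k >> /var/tmp/wedge-vmstat.txt

If the box still accepts keystrokes in an already-open xterm, that should at
least show whether the pagedaemon and the faulting chrome threads are all
sleeping on the same thing.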