From: Julien Cigar
Date: Tue, 1 Oct 2019 13:09:01 +0200
To: Reshad Patuck
Cc: FreeBSD FS
Subject: Re: [zfs] filesystem reads hanging
Message-ID: <20191001110901.GL49734@home.lan>
References: <20191001082837.GF49734@home.lan>

On Tue, Oct 01, 2019 at 03:46:40PM +0530, Reshad Patuck wrote:
> Hi Julien,

Hi Reshad,

>
> I did come across that one an hour or so back, can you let me know if there
> is any way to confirm that it is the same issue I am running up against.
> The command `procstat -kka` does have very similar (and in some cases
> identical) output to the lines in the PR mentioned.
>

I'm confident that it's the same issue.

> Unfortunately I need to stick to 12.0 till 12.1 is out, any idea if I can
> merge the same change into 12.0 and compile it?
> I can see the changes in the 12.1 branch, just wondering if I should jump
> to the beta or wait it out if I can't compile it into 12.0.
>

I can speak only for myself, but applying
https://bugs.freebsd.org/bugzilla/attachment.cgi?id=202890&action=diff
fixed the issue for me.
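For comparing `procstat -kka` output with the stack traces in the PR, a small filter may save some eyeballing. The sketch below is only an illustration and comes from neither the PR nor the patch: `threads_with_frames` is a hypothetical helper, and the frame names to search for are deliberately not filled in, since they have to be copied by hand from the PR. It assumes that each thread line starts with a numeric PID and TID and that kernel stack frames are printed as `function+0xoffset`.

```python
import re

# Assumption: a kernel stack frame is printed as "function+0xoffset".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+")
# Assumption: every thread line starts with a numeric PID and TID.
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.*)$")


def threads_with_frames(procstat_text, wanted):
    """Return (pid, tid, matched_frames) for each thread whose kernel
    stack contains at least one of the function names in `wanted`."""
    wanted = set(wanted)
    hits = []
    for line in procstat_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # header row or anything that is not a thread line
        frames = [f.group(1) for f in FRAME_RE.finditer(m.group(3))]
        matched = [name for name in frames if name in wanted]
        if matched:
            hits.append((int(m.group(1)), int(m.group(2)), matched))
    return hits


# Example use (the frame name is a placeholder, take real ones from the PR):
#   text = open("procstat.txt").read()
#   for pid, tid, frames in threads_with_frames(text, {"some_function"}):
#       print(pid, tid, " ".join(frames))
```

Capturing the `procstat -kka` output to a file once and running the filter on another machine is probably the safer route, since anything started on the affected box may itself block on the hung filesystem.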
> Thanks for your help,
>
> Reshad
>

cheers,
Julien

>
> On Tue, Oct 1, 2019 at 1:58 PM Julien Cigar wrote:
>
> > On Tue, Oct 01, 2019 at 10:26:32AM +0530, Reshad Patuck wrote:
> > > Hi,
> >
> > Hello,
> >
> > >
> > > I have a FreeBSD 12.0-RELEASE-p9 system running ZFS.
> > > The system runs an application that uses postgres and python (among
> > > other services).
> > >
> > > I have noticed that python is suddenly not able to connect to postgres.
> > > When I try to investigate further, certain files on disk cannot be read.
> > > The commands `cat` and `ls -l` hang (no output, and I cannot ctrl-c or
> > > kill -9 them); ps -aux shows them in a D+ state.
> > > On killing the SSH session these processes continue running as orphans,
> > > and I am not able to kill them.
> > >
> > > Someone on IRC suggested running a zfs scrub to check for data
> > > corruption, but running `zpool scrub zroot` has the same effect.
> > > The command does not return, ctrl-c does not kill it, and
> > > `zpool scrub -s zroot` says "cannot cancel scrubbing zroot: there is no
> > > active scrub".
> > >
> > > This has happened in the past month to two of my production servers, and
> > > since the application was critical they were rebooted; the boxes
> > > function as normal after the reboot.
> > > Files that were not cat-able on the production servers were working fine
> > > and a zfs scrub worked fine to show 0 errors and 0 fixes.
> > > One of these boxes needed a hard reboot as it got stuck in the shutting
> > > down stage of a soft reboot.
> > >
> > > I am not sure where to start debugging this or if there are any ways to
> > > get metrics on a box stuck in this state.
> > > Please let me know if you would like me to fetch any metrics or run any
> > > commands, etc. for you.
> > > Any help would be much appreciated.
> >
> > This is a known problem (see PR 236220) and has been fixed by r350894
> > (and MFC-ed into 12-STABLE, so I guess it should be in the upcoming
> > 12.1-RELEASE)
> >
> > >
> > > Best regards,
> > >
> > > Reshad

-- 
Julien Cigar
Belgian Biodiversity Platform (http://www.biodiversity.be)
PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.
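The original report describes `cat` and `ls -l` sitting in a D+ state. As a rough way to collect that kind of evidence before a reboot, the sketch below lists the processes in uninterruptible wait together with their wait channel. It is an illustration only: `uninterruptible` is a hypothetical helper, and it assumes the text comes from `ps -axo pid,state,wchan,comm`, so that the four columns appear in that order.

```python
def uninterruptible(ps_text):
    """Parse the output of `ps -axo pid,state,wchan,comm` and return
    (pid, state, wchan, comm) for every process whose state starts
    with "D" (uninterruptible wait)."""
    rows = []
    for line in ps_text.splitlines():
        parts = line.strip().split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit():
            continue  # header row or a malformed line
        pid, state, wchan, comm = parts
        if state.startswith("D"):
            rows.append((int(pid), state, wchan, comm))
    return rows


# Example use, with output saved beforehand to a file:
#   for pid, state, wchan, comm in uninterruptible(open("ps.txt").read()):
#       print(pid, state, wchan, comm)
```

The wait channel column is the interesting part here: if many stuck processes share the same value, that is a hint (not proof) that they are all waiting on the same thing.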