Date: Tue, 1 Oct 2019 11:04:59 +0530
From: Reshad Patuck <reshadpatuck1@gmail.com>
To: Warner Losh <imp@bsdimp.com>
Cc: FreeBSD FS <freebsd-fs@freebsd.org>
Subject: Re: [zfs] filesystem reads hanging
Message-ID: <CADaJeD2qnEbFiLE5Vj01=WGwTR90MCRKfceJs3GJ6K552tGQCw@mail.gmail.com>
In-Reply-To: <CANCZdfrUctOKCzee7ZS7eL%2B7_SspG77dt_L4phSqmDuXnq4RhA@mail.gmail.com>
References: <CADaJeD24HV0eW7nQT9jaQwEWp=1f4J2WL3OOLZiv--v1zyepwQ@mail.gmail.com> <CANCZdfrUctOKCzee7ZS7eL%2B7_SspG77dt_L4phSqmDuXnq4RhA@mail.gmail.com>

Hi Warner,

I will run a scrub as soon as I reboot the box.

As mentioned, running the zpool scrub command itself hangs (the command does
not return, and I cannot kill it). Furthermore, the scrub does not show up as
running in the zpool status output for zroot.

Is there any other way I can check the disk/hardware? (The pool is running on
a single SSD.)
I see nothing that looks like a disk error in /var/log/all.log or
/var/log/messages.

Thanks,

Reshad

On Tue, Oct 1, 2019 at 10:51 AM Warner Losh <imp@bsdimp.com> wrote:

> On Mon, Sep 30, 2019, 10:56 PM Reshad Patuck <reshadpatuck1@gmail.com> wrote:
>
>> Hi,
>>
>> I have a FreeBSD 12.0-RELEASE-p9 system running ZFS.
>> The system runs an application that uses postgres and python (among other
>> services).
>>
>> I have noticed that python is suddenly unable to connect to postgres.
>> When I investigate further, certain files on disk cannot be read.
>> The commands `cat` and `ls -l` hang (no output, and I cannot ctrl-c or
>> kill -9 them); ps -aux shows them in a D+ state.
>> After I kill the SSH session these processes continue running as orphans,
>> and I am not able to kill them.
>>
>> Someone on IRC suggested running a zfs scrub to check for data corruption,
>> but running `zpool scrub zroot` has the same effect.
>> The command does not return, ctrl-c does not kill it, and `zpool scrub -s
>> zroot` says "cannot cancel scrubbing zroot: there is no active scrub".
>>
>> This has happened in the past month to two of my production servers. Since
>> the application was critical they were rebooted, and the boxes functioned
>> as normal after the reboot.
>> Files that could not be read with cat on the production servers worked
>> fine afterwards, and a zfs scrub completed with 0 errors and 0 fixes.
>> One of these boxes needed a hard reboot as it got stuck in the shutting
>> down stage of a soft reboot.
>>
>> I am not sure where to start debugging this, or whether there is any way
>> to get metrics on a box stuck in this state.
>> Please let me know if you would like me to fetch any metrics or run any
>> commands, etc. for you.
>> Any help would be much appreciated.
>>
>
> Step 1 should be to make sure there are no disk errors... the successful
> scrub suggests not, but it doesn't hurt to rule out hardware...
>
> Warner
>
>> Best regards,
>>
>> Reshad
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>
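
The thread describes hung commands showing up in a D (uninterruptible) state in ps output. As a rough sketch of how such processes could be collected for a report, the Python helper below filters `ps aux`-style text for rows whose STAT column starts with D. The function name `find_uninterruptible` and the sample rows are hypothetical and made up for illustration; they are not output from the affected machines, and the column layout is an assumption about the usual FreeBSD `ps aux` header.

```python
def find_uninterruptible(ps_output):
    """Return rows of `ps aux`-style text whose STAT column starts with 'D'.

    Assumes the first line is a header containing PID and STAT columns and
    that the last column (COMMAND) may contain spaces.
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_idx = header.index("PID")
    stat_idx = header.index("STAT")
    cmd_idx = len(header) - 1  # COMMAND is the final column
    stuck = []
    for line in lines[1:]:
        # Split only up to the command column so the command keeps its spaces.
        fields = line.split(None, cmd_idx)
        if len(fields) <= cmd_idx:
            continue  # skip malformed or truncated rows
        if fields[stat_idx].startswith("D"):
            stuck.append({
                "pid": int(fields[pid_idx]),
                "stat": fields[stat_idx],
                "command": fields[cmd_idx].rstrip(),
            })
    return stuck


# Hypothetical sample text, invented for this example only.
SAMPLE = """
USER   PID %CPU %MEM   VSZ  RSS TT  STAT STARTED    TIME COMMAND
root  1234  0.0  0.0 10920 2316  0  D+   11:04   0:00.00 cat /some/file
root  1240  0.0  0.1 11500 2800  0  Ss   11:00   0:00.02 -sh (sh)
root  1250  0.0  0.0 10920 2316  -  D    11:05   0:00.00 zpool scrub zroot
"""

if __name__ == "__main__":
    for row in find_uninterruptible(SAMPLE):
        print(row["pid"], row["stat"], row["command"])
```

On a live system the text would come from running ps itself; the resulting PIDs are the ones worth inspecting further with whatever kernel-level tools are available on the box.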