Date:      Mon, 30 Sep 2019 23:20:48 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Reshad Patuck <reshadpatuck1@gmail.com>
Cc:        FreeBSD FS <freebsd-fs@freebsd.org>
Subject:   Re: [zfs] filesystem reads hanging
Message-ID:  <CANCZdfrUctOKCzee7ZS7eL%2B7_SspG77dt_L4phSqmDuXnq4RhA@mail.gmail.com>
In-Reply-To: <CADaJeD24HV0eW7nQT9jaQwEWp=1f4J2WL3OOLZiv--v1zyepwQ@mail.gmail.com>
References:  <CADaJeD24HV0eW7nQT9jaQwEWp=1f4J2WL3OOLZiv--v1zyepwQ@mail.gmail.com>

On Mon, Sep 30, 2019, 10:56 PM Reshad Patuck <reshadpatuck1@gmail.com>
wrote:

> Hi,
>
> I have a FreeBSD 12.0-RELEASE-p9 system running ZFS.
> The system runs an application that uses postgres, and python (among other
> services).
>
> I have noticed that python suddenly is not able to connect to postgres.
> When I try to investigate further, certain files on disk can not be read.
> The commands `cat` and `ls -l` hang (no output and I can not ctrl-c or kill
> -9 them), ps -aux shows them in a D+ state.
> On killing the SSH session these processes continue running as orphans, and
> I am still not able to kill them.
>
> Someone on IRC suggested running a zfs scrub to check for data corruption,
> but running `zpool scrub zroot` has the same effect.
> The command does not return, ctrl-c does not kill it and `zpool scrub -s
> zroot` says "cannot cancel scrubbing zroot: there is no active scrub".
>
> This has happened in the past 1 month to two of my production servers and
> since the application was critical they were rebooted and the boxes
> function as normal after the reboot.
> After the reboot, files that had not been cat-able on the production servers
> were readable again, and a zfs scrub completed with 0 errors and 0 fixes.
> One of these boxes needed a hard reboot as it got stuck in the shutting
> down stage of a soft reboot.
>
> I am not sure where to start debugging this or if there are any ways to get
> metrics on a box stuck in this state.
> Please let me know if you would like me to fetch any metrics or run any
> commands, etc. for you.
> Any help would be much appreciated.
>

Step 1 should be to make sure there are no disk errors... the successful
scrub suggests not, but it doesn't hurt to rule out hardware...
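
A minimal sketch of what such a check might look like, not a definitive
procedure. The pool name zroot comes from the report above; the device name
ada0, the helper names, and the grep patterns are illustrative assumptions,
and smartctl is only available if the sysutils/smartmontools port is
installed. On a box that is already hung, zpool commands may block as well,
so the sketch looks at the kernel log and SMART data first.

```sh
#!/bin/sh
# Hypothetical helpers; names and patterns are assumptions, not from the thread.

# Keep only kernel log lines that commonly accompany failing disks.
disk_error_lines() {
    grep -iE 'cam status|timeout|retrying command|retries exhausted|i/o error'
}

# check_disks [pool] [device]  (defaults: zroot, ada0)
check_disks() {
    pool="${1:-zroot}"
    dev="${2:-ada0}"
    # CAM/driver errors logged by the kernel
    dmesg | disk_error_lines
    # SMART attributes and the drive's own error log, if smartctl is installed
    if command -v smartctl >/dev/null 2>&1; then
        smartctl -a "/dev/${dev}"
    fi
    # Per-device READ/WRITE/CKSUM counters; run last because it may block
    zpool status -v "${pool}"
}
```

Calling `check_disks zroot ada0` as root on the affected host would run the
three checks in order; an empty result from the dmesg filter does not prove
the hardware is healthy, it only means none of these patterns were logged.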

Warner

> Best regards,
>
> Reshad


