Date:      Tue, 1 Oct 2019 11:04:59 +0530
From:      Reshad Patuck <reshadpatuck1@gmail.com>
To:        Warner Losh <imp@bsdimp.com>
Cc:        FreeBSD FS <freebsd-fs@freebsd.org>
Subject:   Re: [zfs] filesystem reads hanging
Message-ID:  <CADaJeD2qnEbFiLE5Vj01=WGwTR90MCRKfceJs3GJ6K552tGQCw@mail.gmail.com>
In-Reply-To: <CANCZdfrUctOKCzee7ZS7eL%2B7_SspG77dt_L4phSqmDuXnq4RhA@mail.gmail.com>
References:  <CADaJeD24HV0eW7nQT9jaQwEWp=1f4J2WL3OOLZiv--v1zyepwQ@mail.gmail.com> <CANCZdfrUctOKCzee7ZS7eL%2B7_SspG77dt_L4phSqmDuXnq4RhA@mail.gmail.com>

Hi Warner,

I will run a scrub as soon as I reboot the box.
As mentioned, the `zpool scrub` command itself hangs (it does not return,
and I cannot kill it). Furthermore, `zpool status` does not show a scrub
running on zroot.

Is there any other way I can check the disk/hardware? (The pool runs on a
single SSD.)
I see nothing that looks like a disk error in /var/log/all.log or
/var/log/messages.
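
For reference, a minimal sketch of how the stuck processes and their wait
channels could be listed, so there is something concrete to report from a box
in this state. The helper names `parse_blocked` and `list_blocked` are made up
for illustration, and the sketch assumes that
`ps -axo pid,state,wchan,command` still returns while the pool is hung, which
is not guaranteed.

```python
import subprocess


def parse_blocked(ps_output):
    """Return the rows of `ps` output whose state starts with 'D'.

    Expects the four columns produced by `ps -axo pid,state,wchan,command`.
    Rows with fewer than four fields are skipped.
    """
    blocked = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # pid, state, wchan, rest of command
        if len(fields) < 4:
            continue
        pid, state, wchan, command = fields
        if state.startswith("D"):  # D = uninterruptible (disk) wait
            blocked.append(
                {"pid": int(pid), "state": state, "wchan": wchan, "command": command}
            )
    return blocked


def list_blocked():
    """Run ps and return the processes in D state (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-axo", "pid,state,wchan,command"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_blocked(result.stdout)


# Usage on the affected box:
#     for row in list_blocked():
#         print(row["pid"], row["wchan"], row["command"])
```

The PIDs it reports can then be passed to `procstat -kk <pid>`, which prints
the kernel stack of each process and so shows where in the kernel it is
blocked.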

Thanks,

Reshad

On Tue, Oct 1, 2019 at 10:51 AM Warner Losh <imp@bsdimp.com> wrote:

>
>
> On Mon, Sep 30, 2019, 10:56 PM Reshad Patuck <reshadpatuck1@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a FreeBSD 12.0-RELEASE-p9 system running ZFS.
>> The system runs an application that uses postgres and python (among other
>> services).
>>
>> I have noticed that python is suddenly unable to connect to postgres.
>> When I investigate further, certain files on disk cannot be read.
>> The commands `cat` and `ls -l` hang (no output, and I cannot ctrl-c or
>> `kill -9` them); `ps -aux` shows them in a D+ state.
>> After I kill the SSH session, these processes keep running as orphans,
>> and I am still unable to kill them.
>>
>> Someone on IRC suggested running a ZFS scrub to check for data corruption,
>> but running `zpool scrub zroot` has the same effect.
>> The command does not return, ctrl-c does not kill it, and
>> `zpool scrub -s zroot` says "cannot cancel scrubbing zroot: there is no
>> active scrub".
>>
>> This has happened to two of my production servers in the past month.
>> Because the application is critical, they were rebooted, and the boxes
>> have functioned normally since the reboot.
>> Files that could not be read on the production servers were readable
>> after the reboot, and a ZFS scrub completed with 0 errors and 0 fixes.
>> One of these boxes needed a hard reboot because it got stuck in the
>> shutdown stage of a soft reboot.
>>
>> I am not sure where to start debugging this, or whether there is any way
>> to get metrics from a box stuck in this state.
>> Please let me know if you would like me to fetch any metrics or run any
>> commands for you.
>> Any help would be much appreciated.
>>
>
> Step 1 should be to make sure there are no disk errors... the successful
> scrub suggests not, but it doesn't hurt to rule out hardware...
>
> Warner
>
>> Best regards,
>>
>> Reshad
>


