Date:      Sat, 02 Feb 2019 06:37:17 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 235419] zpool scrub progress does not change for hours, heavy disk activity still present
Message-ID:  <bug-235419-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235419

            Bug ID: 235419
           Summary: zpool scrub progress does not change for hours, heavy
                    disk activity still present
           Product: Base System
           Version: 11.2-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: bobf@mrp3.com

Frequently, on one of my computers running 11-STABLE, a 'zpool scrub' will
continue for hours while progress does not increase.  The scrub is still
'active' and there is a LOT of disk activity, causing stuttering of
application response as you would expect.  This does not always happen, but
happens more often than not.  The previous scrub completed without any such
'hangs' 2 weeks ago, with no changes to the configuration since.
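
For reference, the 'progress' referred to above is the percent-done figure
on the 'scan:' line of zpool status output, checked with something like this
('zroot' is the pool on this system):

> zpool status zroot | grep scan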

This system uses a 'zfs everywhere' configuration, i.e. all partitions are
zfs.

A second computer that has UFS+J partitions for userland and kernel does not
appear to exhibit this particular problem.

uname output:

FreeBSD hack.SFT.local 11.2-STABLE FreeBSD 11.2-STABLE #1 r339273: Tue Oct  9
21:10:39 PDT 2018     root@hack.SFT.local:/usr/obj/usr/src/sys/GENERIC  amd64

This system had been running for 80+ days.

At first, I discovered that the scrub had 'hung' at around 74% complete.
After pausing the scrub for a while, and also terminating firefox and
thunderbird, the scrub re-started and continued.  I re-started firefox and
thunderbird, and allowed everything to continue.  The scrub then 'hung'
again at about 84%, and terminating applications (including Xorg) did not
seem to help.
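
The pause/resume cycle above used the standard scrub controls, roughly as
follows (a sketch, again with 'zroot' as the pool name):

> zpool scrub -p zroot     # pause the in-progress scrub
> zpool scrub zroot        # resume it from where it paused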

With the scrub paused I performed a reboot, and the scrub restarted on boot
[causing the boot process to be excruciatingly slow].  I have restarted most
of the applications that were running before, while the scrub was continuing
to run.  Now the zpool status shows that the scrub has completed with no
errors.

Here are some additional pieces of information that might help:

> mount
zroot/ROOT/default on / (zfs, NFS exported, local, noatime, nfsv4acls)
devfs on /dev (devfs, local, multilabel)
zroot/d-drive on /d-drive (zfs, NFS exported, local, noatime, nfsv4acls)
zroot/e-drive on /e-drive (zfs, NFS exported, local, noatime, nfsv4acls)
zroot/tmp on /tmp (zfs, local, noatime, nosuid, nfsv4acls)
zroot/usr/home on /usr/home (zfs, NFS exported, local, noatime, nfsv4acls)
zroot/usr/ports on /usr/ports (zfs, NFS exported, local, noatime, nosuid,
nfsv4acls)
zroot/usr/src on /usr/src (zfs, NFS exported, local, noatime, nfsv4acls)
zroot/var/audit on /var/audit (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/crash on /var/crash (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/log on /var/log (zfs, local, noatime, noexec, nosuid, nfsv4acls)
zroot/var/mail on /var/mail (zfs, local, nfsv4acls)
zroot/var/tmp on /var/tmp (zfs, local, noatime, nosuid, nfsv4acls)
zroot on /zroot (zfs, local, noatime, nfsv4acls)

> kldstat
Id Refs Address            Size     Name
 1   44 0xffffffff80200000 206b5d0  kernel
 2    1 0xffffffff8226d000 393200   zfs.ko
 3    2 0xffffffff82601000 a380     opensolaris.ko
 4    1 0xffffffff82821000 4090     cuse.ko
 5    1 0xffffffff82826000 6e40     uftdi.ko
 6    1 0xffffffff8282d000 3c58     ucom.ko
 7    3 0xffffffff82831000 50c70    vboxdrv.ko
 8    2 0xffffffff82882000 2ad0     vboxnetflt.ko
 9    2 0xffffffff82885000 9a20     netgraph.ko
10    1 0xffffffff8288f000 14b8     ng_ether.ko
11    1 0xffffffff82891000 3f70     vboxnetadp.ko
12    2 0xffffffff82895000 37528    linux.ko
13    2 0xffffffff828cd000 2d28     linux_common.ko
14    1 0xffffffff828d0000 31e80    linux64.ko
15    1 0xffffffff82902000 c60      coretemp.ko
16    1 0xffffffff82903000 965128   nvidia.ko


There were no messages regarding the zpool scrub that I could find.
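
The obvious places to look for such messages would be the kernel message
buffer and the system log, e.g.:

> dmesg | grep -i scrub
> grep -i scrub /var/log/messages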

port versions for things with kernel modules:

nvidia-driver-340-340.106
virtualbox-ose-5.1.18
virtualbox-ose-kmod-5.1.22
linux-c7-7.3.1611_1


This problem has happened since mid last year, around the time when the
-STABLE source went to 11.2 and I updated kernel+world on this computer.
The zpool has also been upgraded.  It is worth noting that this computer ran
11.0 for a long time without incident.  The problem may have been present
in 11.1.
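
The pool upgrade mentioned above was the standard feature-flag upgrade,
i.e. something like:

> zpool upgrade zroot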


Related: there is an apparent (random crash) bug in the NVidia module that
I have been trying to track down.  It causes occasional page fault crashes.
Sometimes I will see swap space in use when there does not seem to be any
reason for it, and I believe this NVidia bug is a part of that (the crash
happening from randomly accessing 'after free' or random memory addresses,
and swap space is allocated as a consequence?).  Whether this NVidia driver
bug is responsible for the zfs problem, I do not know, but this driver is
only on this particular computer, and so it's worth mentioning, as only this
computer seems to exhibit the problem.

-- 
You are receiving this mail because:
You are the assignee for the bug.


