Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 14 Nov 2012 07:27:30 +1100
From:      Peter Jeremy <peter@rulingia.com>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        freebsd-fs@FreeBSD.org
Subject:   Re: zfs diff deadlock
Message-ID:  <20121113202730.GA42238@server.rulingia.com>
In-Reply-To: <509F5E0A.1020501@FreeBSD.org>
References:  <20121110223249.GB506@server.rulingia.com> <20121111072739.GA4814@server.rulingia.com> <509F5E0A.1020501@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--y0ulUmNC+osPPQO6
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2012-Nov-11 10:12:58 +0200, Andriy Gapon <avg@FreeBSD.org> wrote:
>on 11/11/2012 09:27 Peter Jeremy said the following:
>> On 2012-Nov-11 09:32:49 +1100, Peter Jeremy <peter@server.rulingia.com>
>> wrote:
>>> I recently decided to do a "zfs diff" between two snapshots to try and
>>> identify why there was so much "USED" space in the snapshot. The diff r=
an
>>> for a while (though with very little IO) but has now wedged unkillably.
>>> There's nothing on the console or in any logs, the pool reports no
>>> problems and there are no other visible FS issues.  Any ideas on tracki=
ng
>>> this down?
>> ...
>>> The systems is running a 4-month old 8-stable (r237444)
>>=20
>> I've tried a second system running the same world with the same result, =
so=20
>> this looks like a real bug in ZFS rather than a system glitch.
>>=20
>
>Are you able to catch the state of all threads in the system?
>E.g. via procstat -k -a.
>Or a crash dump.

Unfortunately, neither of those systems are really suitable for
debugging.  I have setup a VBox and sent most of the offending FS to
it.  That gives somewhat different results: On a recent 8-stable
(r242865M), I get a panic whilst on a recent head, I get a "Unable to
determine path or stats" error.

On 8-stable, I have a crashdump and the panic is:
suspending ithread with the following locks held:
shared spin mutex  ({6") r =3D 0 (0xffffff005c395a80) locked @ /usr/src/sys=
/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:522
panic: witness_warn
cpuid =3D 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x1ce
witness_warn() at witness_warn+0x2b2
ithread_loop() at ithread_loop+0x112
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip =3D 0, rsp =3D 0xffffff800008ccf0, rbp =3D 0 ---

Note that zap.c:522 is the rw_enter() in zap_get_leaf_byblk() - which
is the offending function in the backtrace on r237444.

On head, I get some normal differences terminated by:
Unable to determine path or stats for object 2128453 in tank/beckett/home@2=
0120518: Invalid argument

A scrub reports no issues but the problem remains:
root@FB10-64:~ # zpool status=20
  pool: tank
 state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support=
 feature
        flags.
  scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          ada2      ONLINE       0     0     0

errors: No known data errors

I've done some searching and found 2 hits on the message - one in an
OI IRC log and the other in a ZFS-on-Linux list.  Neither offered any
insights.

I've tried ktracing the zfs diff and that ends:
  1856 zfs      CALL  read(0x7,0x7fffffbfc160,0x18)
  1856 zfs      GIO   fd 7 read 24 bytes
       0x0000 0400 0000 0000 0000 e079 2000 0000 0000 397a 2000 0000 0000  =
                                                                           =
                    |.........y .....9z .....|

  1856 zfs      RET   read 24/0x18
  1856 zfs      CALL  ioctl(0x3,0xd5985a36,0x7fffffbfc178)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  read(0x7,0x7fffffbfc160,0x18)
  1856 zfs      GIO   fd 7 read 24 bytes
       0x0000 0200 0000 0000 0000 3a7a 2000 0000 0000 4d7a 2000 0000 0000  =
                                                                           =
                    |........:z .....Mz .....|

  1856 zfs      RET   read 24/0x18
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 2 No such file or directory
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl 0
  1856 zfs      CALL  ioctl(0x3,0xd5985a39,0x7fffffbfab18)
  1856 zfs      RET   ioctl -1 errno 22 Invalid argument
  1856 zfs      CALL  close(0x1)
  1856 zfs      RET   close 0
  1856 zfs      CALL  close(0x7)
  1856 zfs      RET   ioctl -1 errno 32 Broken pipe
  1856 zfs      CALL  close(0x8)
  1856 zfs      RET   close 0
  1856 zfs      CALL  thr_kill(0x18adf,SIG 32)
  1856 zfs      RET   thr_kill 0
  1856 zfs      CALL  _umtx_op(0x802c06c00,0x2,0x18adf,0,0)
  1856 zfs      RET   close 0
  1856 zfs      PSIG  SIG 32 caught handler=3D0x8020537f0 mask=3D0x0 code=
=3DSI_LWP
  1856 zfs      CALL  sigreturn(0x7fffffbfbca0)
  1856 zfs      RET   sigreturn JUSTRETURN
  1856 zfs      CALL  thr_exit(0x802c06c00)
  1856 zfs      RET   _umtx_op 0
  1856 zfs      CALL  close(0x6)
  1856 zfs      RET   close 0
  1856 zfs      CALL  stat(0x7fffffffa900,0x7fffffffa888)
  1856 zfs      NAMI  "/usr/share/nls/C/libc.cat"
  1856 zfs      RET   stat -1 errno 2 No such file or directory
  1856 zfs      CALL  stat(0x7fffffffa900,0x7fffffffa888)
  1856 zfs      NAMI  "/usr/share/nls/libc/C"
  1856 zfs      RET   stat -1 errno 2 No such file or directory
  1856 zfs      CALL  stat(0x7fffffffa900,0x7fffffffa888)
  1856 zfs      NAMI  "/usr/local/share/nls/C/libc.cat"
  1856 zfs      RET   stat -1 errno 2 No such file or directory
  1856 zfs      CALL  stat(0x7fffffffa900,0x7fffffffa888)
  1856 zfs      NAMI  "/usr/local/share/nls/libc/C"
  1856 zfs      RET   stat -1 errno 2 No such file or directory
  1856 zfs      CALL  write(0x2,0x7fffffffa740,0x65)
  1856 zfs      GIO   fd 2 wrote 101 bytes
       "Unable to determine path or stats for object 2128453 in tank/becket=
t/home@20120518: Invalid argument
       "
  1856 zfs      RET   write 101/0x65
  1856 zfs      CALL  close(0x5)
  1856 zfs      RET   close 0
  1856 zfs      CALL  close(0x3)
  1856 zfs      RET   close 0
  1856 zfs      CALL  close(0x4)
  1856 zfs      RET   close 0
  1856 zfs      CALL  exit(0x1)



--=20
Peter Jeremy

--y0ulUmNC+osPPQO6
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlCirTIACgkQ/opHv/APuIccAACdGfzyqFTjb5UUcu7pqRgz3DiH
pB4An0fjLFS7wQwDVAJCxEiALo4kcJZB
=E0LV
-----END PGP SIGNATURE-----

--y0ulUmNC+osPPQO6--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121113202730.GA42238>