Date:      Tue, 24 Apr 2018 13:27:28 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 227740] concurrent zfs management operations may lead to a race/subsystem locking
Message-ID:  <bug-227740-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227740

            Bug ID: 227740
           Summary: concurrent zfs management operations may lead to a
                    race/subsystem locking
           Product: Base System
           Version: 11.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: emz@norma.perm.ru

Concurrent zfs management operations may lead to a race or subsystem lockup.

For instance, this is the current state, which has not changed for at least 30
minutes (the system got into it after issuing concurrent zfs commands; a rough
sketch of the workload follows the process listing):

===Cut===
[root@san1:~]# ps ax | grep zfs
    9  -  DL      7:41,34 [zfskern]
57922  -  Is      0:00,01 sshd: zfsreplica [priv] (sshd)
57924  -  I       0:00,00 sshd: zfsreplica@notty (sshd)
57925  -  Is      0:00,00 csh -c zfs list -t snapshot
57927  -  D       0:00,00 zfs list -t snapshot
58694  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
58695  -  D       0:00,00 /sbin/zfs list -t all
59512  -  Is      0:00,02 sshd: zfsreplica [priv] (sshd)
59516  -  I       0:00,00 sshd: zfsreplica@notty (sshd)
59517  -  Is      0:00,00 csh -c zfs list -t snapshot
59520  -  D       0:00,00 zfs list -t snapshot
59552  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59553  -  D       0:00,00 /sbin/zfs list -t all
59554  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59555  -  D       0:00,00 /sbin/zfs list -t all
59556  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59557  -  D       0:00,00 /sbin/zfs list -t all
59558  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59559  -  D       0:00,00 /sbin/zfs list -t all
59560  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59561  -  D       0:00,00 /sbin/zfs list -t all
59564  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59565  -  D       0:00,00 /sbin/zfs list -t all
59570  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59571  -  D       0:00,00 /sbin/zfs list -t all
59572  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59573  -  D       0:00,00 /sbin/zfs list -t all
59574  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
59575  -  D       0:00,00 /sbin/zfs list -t all
59878  -  Is      0:00,02 sshd: zfsreplica [priv] (sshd)
59880  -  I       0:00,00 sshd: zfsreplica@notty (sshd)
59881  -  Is      0:00,00 csh -c zfs list -t snapshot
59883  -  D       0:00,00 zfs list -t snapshot
60800  -  Is      0:00,01 sshd: zfsreplica [priv] (sshd)
60806  -  I       0:00,00 sshd: zfsreplica@notty (sshd)
60807  -  Is      0:00,00 csh -c zfs list -t snapshot
60809  -  D       0:00,00 zfs list -t snapshot
60917  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
60918  -  D       0:00,00 /sbin/zfs list -t all
60950  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
60951  -  D       0:00,00 /sbin/zfs list -t all
60966  -  Is      0:00,02 sshd: zfsreplica [priv] (sshd)
60968  -  I       0:00,00 sshd: zfsreplica@notty (sshd)
60969  -  Is      0:00,00 csh -c zfs list -t snapshot
60971  -  D       0:00,00 zfs list -t snapshot
61432  -  Is      0:00,03 sshd: zfsreplica [priv] (sshd)
61434  -  I       0:00,00 sshd: zfsreplica@notty (sshd)
61435  -  Is      0:00,00 csh -c zfs list -t snapshot
61437  -  D       0:00,00 zfs list -t snapshot
61502  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61503  -  D       0:00,00 /sbin/zfs list -t all
61504  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61505  -  D       0:00,00 /sbin/zfs list -t all
61506  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61507  -  D       0:00,00 /sbin/zfs list -t all
61508  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61509  -  D       0:00,00 /sbin/zfs list -t all
61510  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61511  -  D       0:00,00 /sbin/zfs list -t all
61512  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61513  -  D       0:00,00 /sbin/zfs list -t all
61569  -  I       0:00,01 /usr/local/bin/sudo /sbin/zfs list -t all
61570  -  D       0:00,00 /sbin/zfs list -t all
61851  -  Is      0:00,02 sshd: zfsreplica [priv] (sshd)
61853  -  I       0:00,00 sshd: zfsreplica@notty (sshd)
61854  -  Is      0:00,00 csh -c zfs list -t snapshot
61856  -  D       0:00,00 zfs list -t snapshot
57332  7  D+      0:00,04 zfs rename data/esx/boot-esx03 data/esx/boot-esx03_orig
58945  8  D+      0:00,00 zfs list
62119  3  S+      0:00,00 grep zfs
[root@san1:~]# ps ax | grep ctladm
62146  3  S+      0:00,00 grep ctladm
[root@san1:~]#
===Cut===
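
To give a rough idea, this is approximately the kind of concurrent workload that
was running (a sketch only; the loop structure is illustrative, not the exact
replication scripts):

    # remote replication sessions repeatedly listing snapshots
    while true; do ssh zfsreplica@san1 'zfs list -t snapshot'; done &
    # local monitoring repeatedly listing all datasets
    while true; do sudo /sbin/zfs list -t all; done &
    # an administrative rename issued while the above are running
    zfs rename data/esx/boot-esx03 data/esx/boot-esx03_orig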

This seems to be the operation that locks the system:

zfs rename data/esx/boot-esx03 data/esx/boot-esx03_orig
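
To confirm what the blocked processes are actually sleeping on, the kernel stacks
and wait channels can be inspected with standard FreeBSD tooling, for example:

    # kernel stack of the stuck rename (PID taken from the listing above)
    procstat -kk 57332
    # wait channels of all blocked zfs processes
    ps -ax -o pid,state,wchan,command | grep zfs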

The dataset info:

===Cut===
# zfs get all data/esx/boot-esx03
NAME                 PROPERTY              VALUE                       SOURCE
data/esx/boot-esx03  type                  volume                      -
data/esx/boot-esx03  creation              Wed Aug  2 15:48 2017       -
data/esx/boot-esx03  used                  8,25G                       -
data/esx/boot-esx03  available             9,53T                       -
data/esx/boot-esx03  referenced            555M                        -
data/esx/boot-esx03  compressratio         1.06x                       -
data/esx/boot-esx03  reservation           none                        default
data/esx/boot-esx03  volsize               8G                          local
data/esx/boot-esx03  volblocksize          8K                          default
data/esx/boot-esx03  checksum              on                          default
data/esx/boot-esx03  compression           lz4                         inherited from data
data/esx/boot-esx03  readonly              off                         default
data/esx/boot-esx03  copies                1                           default
data/esx/boot-esx03  refreservation        8,25G                       local
data/esx/boot-esx03  primarycache          all                         default
data/esx/boot-esx03  secondarycache        all                         default
data/esx/boot-esx03  usedbysnapshots       0                           -
data/esx/boot-esx03  usedbydataset         555M                        -
data/esx/boot-esx03  usedbychildren        0                           -
data/esx/boot-esx03  usedbyrefreservation  7,71G                       -
data/esx/boot-esx03  logbias               latency                     default
data/esx/boot-esx03  dedup                 off                         inherited from data/esx
data/esx/boot-esx03  mlslabel                                          -
data/esx/boot-esx03  sync                  standard                    default
data/esx/boot-esx03  refcompressratio      1.06x                       -
data/esx/boot-esx03  written               555M                        -
data/esx/boot-esx03  logicalused           586M                        -
data/esx/boot-esx03  logicalreferenced     586M                        -
data/esx/boot-esx03  volmode               dev                         inherited from data
data/esx/boot-esx03  snapshot_limit        none                        default
data/esx/boot-esx03  snapshot_count        none                        default
data/esx/boot-esx03  redundant_metadata    all                         default
===Cut===

Since the dataset is only 8G in size, it is unlikely that the rename should take
that amount of time, considering the disks are idle.

I got this two times in a row, and as a result all zfs/zpool commands stopped
working.

I manually forced the system to panic in order to get crash dumps.
The crash dumps are located here:

http://san1.linx.playkey.net/r332096M/

along with a brief description and the full kernel/module binaries.
Please note that vmcore.0 is from another panic; the crash dumps for this lockup
are 1 (unfortunately, no txt files were saved) and 2.
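
For reference, one standard way to force a panic and collect a dump on FreeBSD
(a minimal sketch, assuming a dump device is already configured via dumpon(8)):

    # force a kernel panic so a crash dump is written to the dump device
    sysctl debug.kdb.panic=1
    # after reboot, savecore(8) places vmcore.N and info.N under /var/crash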

-- 
You are receiving this mail because:
You are the assignee for the bug.


