Date: Mon, 08 Jul 2019 07:25:30 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 234576] hastd exits ungracefully
Message-ID: <bug-234576-227-R7MmKDU88F@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-234576-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-234576-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234576

Michel Le Cocq <nomad@neuronfarm.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nomad@neuronfarm.net

--- Comment #5 from Michel Le Cocq <nomad@neuronfarm.net> ---
Hi,

I see exactly the same hastd issue on 12.0-RELEASE-p5 and also on
12.0-RELEASE-p7. I tried with hast directly on top of the drives (no
partitions) and also on a ZFS GPT partition.

I use hast to sync the SSD ZIL drives of a ZFS pool:

  +---------------------------------+
  |           disk bay 1            |
  +---------------------------------+
        |                     |
  +----------+           +----------+
  | server A |           | server B |
  | ssds ZIL |-sync hast-| ssds ZIL |
  |          |           |          |
  +----------+           +----------+
        |                     |
  +---------------------------------+
  |           disk bay 2            |
  +---------------------------------+

So I have two raidz3 pools, one on 'disk bay 1' and one on 'disk bay 2'.
Each has its own ZIL cache. Server A and server B each have 4 SSDs.

Here is what server A sees when it manages baie1:

[root@server A ~/]# zpool status
        NAME                  STATE     READ WRITE CKSUM
        baie1                 ONLINE       0     0     0
          raidz3-0            ONLINE       0     0     0
            multipath/sas0    ONLINE       0     0     0
            [...]
            multipath/sas11   ONLINE       0     0     0
        logs
          mirror-1            ONLINE       0     0     0
            hast/zil-baie1-0  ONLINE       0     0     0
            hast/zil-baie1-1  ONLINE       0     0     0

[root@server A ~/]# hastctl status
Name          Status    Role       Components
zil-baie1-0   complete  primary    /dev/mfisyspd5   serverb.direct
zil-baie1-1   complete  primary    /dev/mfisyspd6   serverb.direct
zil-baie2-0   complete  secondary  /dev/mfisyspd8   serverb.direct
zil-baie2-1   complete  secondary  /dev/mfisyspd9   serverb.direct

Paul Thornton said:

1) All of the hastd worker threads die virtually simultaneously.

In fact, not exactly. I lose only the hast resources that manage the pool
doing the writes. If the second pool has no writes, its threads are still
alive and keep my 'second' ZIL alive.

2) This doesn't appear to happen immediately you start writing data, but a
very short while afterwards (order of a few seconds).

Yes: if you look at drive activity with gstat, you can see some writes hit
the ZIL, then hast crashes and the ZIL drives disappear. In my case it only
happens when my ZIL is used.

I didn't try the patch because I didn't want to risk a kernel panic, and I
can't use 11-RELEASE because I use LACP over a Broadcom 10Gb SFP+, which is
not available on 11-RELEASE.

--
You are receiving this mail because:
You are the assignee for the bug.
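[Archive note: a minimal sketch of the hast.conf and commands such a setup
implies, in case it helps reproduce the report. The resource names, the
server A device paths and the remote address serverb.direct come from the
hastctl output above; the "on" host names, the server B device paths and
servera.direct are assumptions, not taken from the report.]

  # /etc/hast.conf (sketch, baie1 ZIL resources only)
  resource zil-baie1-0 {
          on serverA {
                  local /dev/mfisyspd5        # from hastctl output
                  remote serverb.direct
          }
          on serverB {
                  local /dev/mfisyspd5        # assumed device on server B
                  remote servera.direct       # assumed peer address
          }
  }

  resource zil-baie1-1 {
          on serverA {
                  local /dev/mfisyspd6        # from hastctl output
                  remote serverb.direct
          }
          on serverB {
                  local /dev/mfisyspd6        # assumed device on server B
                  remote servera.direct       # assumed peer address
          }
  }

Bringing the resources up and attaching them as the mirrored log shown in
the zpool status above would then look roughly like:

  # hastctl create zil-baie1-0 && hastctl create zil-baie1-1
  # service hastd onestart
  # hastctl role primary zil-baie1-0 && hastctl role primary zil-baie1-1
  # zpool add baie1 log mirror hast/zil-baie1-0 hast/zil-baie1-1

Synchronous writes to the pool then go through the hast/zil-baie1-* log
devices, which is the point at which the reporter sees the hastd workers
for that pool die.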