From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 234576] hastd exits ungracefully
Date: Mon, 08 Jul 2019 07:25:30 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234576

Michel Le Cocq changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nomad@neuronfarm.net

--- Comment #5 from Michel Le Cocq ---
Hi,

I see exactly the same hastd issue on 12.0-RELEASE-p5 and also on
12.0-RELEASE-p7. I tried with hast directly on top of the drives (no
partitions) and also on a ZFS GPT partition. I use hast to sync the SSD ZIL
drives of a ZFS pool.

+---------------------------------+
|            disk bay 1           |
+---------------------------------+
      |                      |
+----------+           +----------+
| server A |           | server B |
| ssds ZIL |-sync hast-| ssds ZIL |
|          |           |          |
+----------+           +----------+
      |                      |
+---------------------------------+
|            disk bay 2           |
+---------------------------------+

So I have two raidz3 pools, on 'disk bay 1' and 'disk bay 2'. Each has its
own ZIL cache. Servers A and B each have 4 SSDs. Here is what server A sees
when it manages baie1:

[root@server A ~/]# zpool status
        NAME                  STATE     READ WRITE CKSUM
        baie1                 ONLINE       0     0     0
          raidz3-0            ONLINE       0     0     0
            multipath/sas0    ONLINE       0     0     0
            [...]
            multipath/sas11   ONLINE       0     0     0
        logs
          mirror-1            ONLINE       0     0     0
            hast/zil-baie1-0  ONLINE       0     0     0
            hast/zil-baie1-1  ONLINE       0     0     0

[root@server A ~/]# hastctl status
Name          Status     Role       Components
zil-baie1-0   complete   primary    /dev/mfisyspd5   serverb.direct
zil-baie1-1   complete   primary    /dev/mfisyspd6   serverb.direct
zil-baie2-0   complete   secondary  /dev/mfisyspd8   serverb.direct
zil-baie2-1   complete   secondary  /dev/mfisyspd9   serverb.direct

Paul Thornton said:

1) All of the hastd worker threads die virtually simultaneously.

In fact, not exactly. I lose only the hast resources that manage the pool
doing the writing. If the second pool has no writes, those threads (the
second pool's) stay alive and keep my 'second' ZIL alive.

2) This doesn't appear to happen immediately when you start writing data,
but a very short while afterwards (on the order of a few seconds).

Yes: if you look at drive activity with gstat you can see some writes hit
the ZIL, then hast crashes and the ZIL drives disappear. In my case it only
happens when the ZIL is used.

I didn't try the patch because I didn't want to risk a kernel panic, and I
can't use 11-RELEASE because I use LACP over a Broadcom 10Gb SFP+ NIC, which
is not available on 11-RELEASE.
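For reference, an /etc/hast.conf along the lines below would describe the
zil-baie1 resources shown in the hastctl output above. This is only a sketch:
the resource names and server A's local devices come from that output, while
the host labels 'servera'/'serverb', the remote addresses, and the device
names on server B are assumptions.

        # Hypothetical hast.conf fragment. Host labels and server B's
        # devices are assumptions; only the resource names and server A's
        # local devices are taken from the hastctl output above.
        resource zil-baie1-0 {
                on servera {
                        local /dev/mfisyspd5
                        remote serverb.direct
                }
                on serverb {
                        local /dev/mfisyspd5
                        remote servera.direct
                }
        }

        resource zil-baie1-1 {
                on servera {
                        local /dev/mfisyspd6
                        remote serverb.direct
                }
                on serverb {
                        local /dev/mfisyspd6
                        remote servera.direct
                }
        }

With the resources configured, attaching them as the pool's log mirror and
watching ZIL activity while reproducing the crash would look roughly like
this on the primary (pool name baie1 taken from the zpool status above):

        # on server A
        hastctl role primary zil-baie1-0 zil-baie1-1
        zpool add baie1 log mirror hast/zil-baie1-0 hast/zil-baie1-1

        # watch write activity on the hast ZIL providers
        gstat -f 'hast/zil'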