From owner-freebsd-stable@freebsd.org Mon Jun 11 12:48:12 2018
Subject: Re: Continuous crashing ZFS server
From: Willem Jan Withagen <wjw@digiware.nl>
To: Stefan Wendler
Cc: "stable@freebsd.org"
Date: Mon, 11 Jun 2018 14:48:11 +0200
Message-ID: <34c4a21b-9555-3b34-14a3-94cdacc22179@digiware.nl>
In-Reply-To: <25b13f67-76fd-621d-22b8-f1efdcc4ae0a@tngtech.com>
References: <17446f39-97a1-8603-11a0-32176e8cb833@FreeBSD.org>
 <100ea6d0-5cf4-1a00-0e3a-dfad6175df6c@FreeBSD.org>
 <17ee24dd-93e5-dede-d7aa-90239c72c287@digiware.nl>
 <25b13f67-76fd-621d-22b8-f1efdcc4ae0a@tngtech.com>

On 11-6-2018 14:35, Stefan Wendler wrote:
> Do you use L2ARC/ZIL disks? I had a similar problem that turned out to
> be a broken caching SSD. Scrubbing didn't help a bit because it reported
> that data was okay. And SMART was fine as well. Fortunately I could
> still send/recv snapshots to a backup disk but wasn't able to replace
> the SSDs without a pool restore. ZFS just wouldn't sync some older ZIL
> data to disk and also wouldn't release the SSDs from the pool. Did you
> also check the logs for entries that look like broken RAM?

That was one of the things I looked for: bad things in the log files.
But the server does not seem to have any hardware problems.
I'll dive a bit deeper into my ZIL SSDs.

Thanx,
--WjW

> Cheers,
> Stefan
>
> On 06/11/2018 01:29 PM, Willem Jan Withagen wrote:
>> On 11-6-2018 12:53, Andriy Gapon wrote:
>>> On 11/06/2018 13:26, Willem Jan Withagen wrote:
>>>> On 11/06/2018 12:13, Andriy Gapon wrote:
>>>>> On 08/06/2018 13:02, Willem Jan Withagen wrote:
>>>>>> My file server is crashing about every 15 minutes at the moment.
>>>>>> The panic looks like:
>>>>>>
>>>>>> Jun  8 11:48:43 zfs kernel: panic: Solaris(panic): zfs: allocating
>>>>>> allocated segment(offset=12922221670400 size=24576)
>>>>>> Jun  8 11:48:43 zfs kernel:
>>>>>> Jun  8 11:48:43 zfs kernel: cpuid = 1
>>>>>> Jun  8 11:48:43 zfs kernel: KDB: stack backtrace:
>>>>>> Jun  8 11:48:43 zfs kernel: #0 0xffffffff80aada57 at kdb_backtrace+0x67
>>>>>> Jun  8 11:48:43 zfs kernel: #1 0xffffffff80a6bb36 at vpanic+0x186
>>>>>> Jun  8 11:48:43 zfs kernel: #2 0xffffffff80a6b9a3 at panic+0x43
>>>>>> Jun  8 11:48:43 zfs kernel: #3 0xffffffff82488192 at vcmn_err+0xc2
>>>>>> Jun  8 11:48:43 zfs kernel: #4 0xffffffff821f73ba at zfs_panic_recover+0x5a
>>>>>> Jun  8 11:48:43 zfs kernel: #5 0xffffffff821dff8f at range_tree_add+0x20f
>>>>>> Jun  8 11:48:43 zfs kernel: #6 0xffffffff821deb06 at metaslab_free_dva+0x276
>>>>>> Jun  8 11:48:43 zfs kernel: #7 0xffffffff821debc1 at metaslab_free+0x91
>>>>>> Jun  8 11:48:43 zfs kernel: #8 0xffffffff8222296a at zio_dva_free+0x1a
>>>>>> Jun  8 11:48:43 zfs kernel: #9 0xffffffff8221f6cc at zio_execute+0xac
>>>>>> Jun  8 11:48:43 zfs kernel: #10 0xffffffff80abe827 at taskqueue_run_locked+0x127
>>>>>> Jun  8 11:48:43 zfs kernel: #11 0xffffffff80abf9c8 at taskqueue_thread_loop+0xc8
>>>>>> Jun  8 11:48:43 zfs kernel: #12 0xffffffff80a2f7d5 at fork_exit+0x85
>>>>>> Jun  8 11:48:43 zfs kernel: #13 0xffffffff80ec4abe at fork_trampoline+0xe
>>>>>> Jun  8 11:48:43 zfs kernel: Uptime: 9m7s
>>>>>>
>>>>>> Maybe a known bug?
>>>>>> Is there anything I can do about this?
>>>>>> Any debugging needed?
>>>>>
>>>>> Sorry to inform you, but your on-disk data got corrupted.
>>>>> The most straightforward thing you can do is try to save data from the pool in
>>>>> readonly mode.
>>>>
>>>> Hi Andriy,
>>>>
>>>> Ouch, that is a first in 12 years of using ZFS. "Fortunately" it was a test
>>>> ZVOL->iSCSI->Win10 disk on which I spool my CAMs.
>>>>
>>>> Removing the ZVOL actually fixed the rebooting, but now the question is:
>>>>     Is the remainder of the zpools on the same disks in danger?
>>>
>>> You can try to check with zdb -b on an idle (better: an exported) pool,
>>> and with zpool scrub.
>>
>> If scrub says things are okay, can I start breathing again?
>> Exporting the pool is something for the small hours.
>>
>> Thanx,
>> --WjW
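
For reference, a minimal sketch of the checks suggested above, assuming a pool
named "tank" (the pool name is only a placeholder, not taken from this thread):

    # Verify block/space accounting with zdb; -e operates on an exported pool,
    # so the on-disk state cannot change underneath the traversal.
    zpool export tank
    zdb -e -b tank
    zpool import tank

    # Have ZFS read back and checksum every allocated block.
    zpool scrub tank
    zpool status -v tank     # scrub progress/result plus any per-device errors

    # Per-vdev breakdown, including separate log (ZIL) and cache (L2ARC)
    # devices, as a starting point for looking closer at the SSDs.
    zpool iostat -v tank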