Date:      Mon, 11 Jun 2018 14:35:18 +0200
From:      Stefan Wendler <stefan.wendler@tngtech.com>
To:        Willem Jan Withagen <wjw@digiware.nl>
Cc:        "stable@freebsd.org" <stable@FreeBSD.org>
Subject:   Re: Continuous crashing ZFS server
Message-ID:  <25b13f67-76fd-621d-22b8-f1efdcc4ae0a@tngtech.com>
In-Reply-To: <17ee24dd-93e5-dede-d7aa-90239c72c287@digiware.nl>
References:  <f9ecab27-5201-4b60-ea75-e68dd5ffb44c@digiware.nl> <17446f39-97a1-8603-11a0-32176e8cb833@FreeBSD.org> <d75b7d81-67c8-d473-7652-c212700ef0d1@digiware.nl> <100ea6d0-5cf4-1a00-0e3a-dfad6175df6c@FreeBSD.org> <17ee24dd-93e5-dede-d7aa-90239c72c287@digiware.nl>

Do you use L2ARC/ZIL disks? I had a similar problem that turned out to
be a broken caching SSD. Scrubbing didn't help at all, because it reported
that the data was okay, and SMART was fine as well. Fortunately I could
still send/recv snapshots to a backup disk, but I wasn't able to replace
the SSDs without restoring the pool. ZFS just wouldn't sync some older ZIL
data to disk, and it also wouldn't release the SSDs from the pool. Did you
also check the logs for entries that point to broken RAM?
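A rough way to answer the first question, and the RAM one, from a script: this Python sketch pulls the device names listed under the `logs` (ZIL/SLOG) and `cache` (L2ARC) sections of `zpool status` output, and picks out kernel log lines containing `MCA:` (machine-check reports, which can indicate hardware faults such as memory errors). The exact `zpool status` layout and the `MCA:` prefix are assumptions about FreeBSD's output, not something taken from this thread, and the function names are made up for illustration.

```python
import re
from typing import Dict, List, Optional

# Section headers that `zpool status` prints for separate log (ZIL/SLOG)
# and cache (L2ARC) devices. Assumed layout; check against your output.
AUX_SECTIONS = ("logs", "cache")

# Grouping vdev names such as "mirror-1"; these are not physical devices.
GROUP_VDEV = re.compile(r"^(mirror|raidz\d?)-\d+$")


def find_log_and_cache_vdevs(status_text: str) -> Dict[str, List[str]]:
    """Return the leaf devices listed under "logs" and "cache" in
    `zpool status` output (hypothetical helper)."""
    found: Dict[str, List[str]] = {name: [] for name in AUX_SECTIONS}
    current: Optional[str] = None  # section we are inside, if any
    header_indent = 0              # indentation of that section header
    for line in status_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if stripped in AUX_SECTIONS:
            current, header_indent = stripped, indent
            continue
        if current is None:
            continue
        if indent <= header_indent:
            # A line at (or left of) the header's level ends the section.
            current = None
            continue
        device = stripped.split()[0]
        if not GROUP_VDEV.match(device):
            found[current].append(device)
    return found


def find_mca_lines(log_lines: List[str]) -> List[str]:
    """Return log lines containing a machine-check ("MCA:") report
    (hypothetical helper; the prefix is an assumption)."""
    return [line for line in log_lines if "MCA:" in line]
```

You would feed it the text of `zpool status <pool>` and the lines of `/var/log/messages`; empty `logs` and `cache` lists mean the pool has no separate log or cache devices.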

Cheers,
Stefan

On 06/11/2018 01:29 PM, Willem Jan Withagen wrote:
> On 11-6-2018 12:53, Andriy Gapon wrote:
>> On 11/06/2018 13:26, Willem Jan Withagen wrote:
>>> On 11/06/2018 12:13, Andriy Gapon wrote:
>>>> On 08/06/2018 13:02, Willem Jan Withagen wrote:
>>>>> My file server is crashing about every 15 minutes at the moment.
>>>>> The panic looks like:
>>>>>
>>>>> Jun  8 11:48:43 zfs kernel: panic: Solaris(panic): zfs: allocating allocated segment(offset=12922221670400 size=24576)
>>>>> Jun  8 11:48:43 zfs kernel:
>>>>> Jun  8 11:48:43 zfs kernel: cpuid = 1
>>>>> Jun  8 11:48:43 zfs kernel: KDB: stack backtrace:
>>>>> Jun  8 11:48:43 zfs kernel: #0 0xffffffff80aada57 at kdb_backtrace+0x67
>>>>> Jun  8 11:48:43 zfs kernel: #1 0xffffffff80a6bb36 at vpanic+0x186
>>>>> Jun  8 11:48:43 zfs kernel: #2 0xffffffff80a6b9a3 at panic+0x43
>>>>> Jun  8 11:48:43 zfs kernel: #3 0xffffffff82488192 at vcmn_err+0xc2
>>>>> Jun  8 11:48:43 zfs kernel: #4 0xffffffff821f73ba at zfs_panic_recover+0x5a
>>>>> Jun  8 11:48:43 zfs kernel: #5 0xffffffff821dff8f at range_tree_add+0x20f
>>>>> Jun  8 11:48:43 zfs kernel: #6 0xffffffff821deb06 at metaslab_free_dva+0x276
>>>>> Jun  8 11:48:43 zfs kernel: #7 0xffffffff821debc1 at metaslab_free+0x91
>>>>> Jun  8 11:48:43 zfs kernel: #8 0xffffffff8222296a at zio_dva_free+0x1a
>>>>> Jun  8 11:48:43 zfs kernel: #9 0xffffffff8221f6cc at zio_execute+0xac
>>>>> Jun  8 11:48:43 zfs kernel: #10 0xffffffff80abe827 at taskqueue_run_locked+0x127
>>>>> Jun  8 11:48:43 zfs kernel: #11 0xffffffff80abf9c8 at taskqueue_thread_loop+0xc8
>>>>> Jun  8 11:48:43 zfs kernel: #12 0xffffffff80a2f7d5 at fork_exit+0x85
>>>>> Jun  8 11:48:43 zfs kernel: #13 0xffffffff80ec4abe at fork_trampoline+0xe
>>>>> Jun  8 11:48:43 zfs kernel: Uptime: 9m7s
>>>>>
>>>>> Maybe a known bug?
>>>>> Is there anything I can do about this?
>>>>> Any debugging needed?
>>>>
>>>> Sorry to inform you but your on-disk data got corrupted.
>>>> The most straightforward thing you can do is try to save data from the pool in
>>>> readonly mode.
>>>
>>> Hi Andriy,
>>>
>>> Ouch, that is a first in 12 years of using ZFS. "Fortunately" it was a test
>>> ZVOL->iSCSI->Win10 disk on which I spool my CAMs.
>>>
>>> Removing the ZVOL actually fixed the rebooting, but now the question is:
>>>     Are the remaining zpools on the same disks in danger?
>>
>> You can try to check with zdb -b on an idle (preferably exported) pool, and
>> with zpool scrub.
>
> If scrub says things are OK, can I start breathing again?
> Exporting the pool is something for the small hours.
>
> Thanx,
> --WjW
>
>=20
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>
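For reference, the checks suggested in this thread can be written down as an ordered command plan. This Python sketch only builds and prints the argument lists unless `dry_run=False` is passed; the pool name `tank` is a placeholder, the helper names are hypothetical, and the ordering (export, `zdb -e -b`, re-import, scrub) is one reading of the advice above rather than a prescribed procedure. `zdb -e` is used because the pool is exported at that point.

```python
import subprocess
from typing import List, Sequence


def pool_check_plan(pool: str, readonly_rescue: bool = False) -> List[List[str]]:
    """Return the commands, in order, for checking `pool` (hypothetical helper)."""
    if readonly_rescue:
        # Assumes the pool is not currently imported. A read-only import
        # lets data be copied off while nothing is written to the pool.
        return [["zpool", "import", "-o", "readonly=on", pool]]
    return [
        ["zpool", "export", pool],        # make the pool idle
        ["zdb", "-e", "-b", pool],        # -e: examine the exported pool
        ["zpool", "import", pool],
        ["zpool", "scrub", pool],         # starts the scrub in the background
        ["zpool", "status", "-v", pool],  # repeat until the scrub finishes
    ]


def run_plan(plan: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command; run it (stopping on failure) unless dry_run."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(list(argv), check=True)


if __name__ == "__main__":
    # "tank" is a placeholder pool name.
    run_plan(pool_check_plan("tank"))
```

Since `zpool scrub` returns immediately, the final `zpool status -v` only shows progress; it has to be repeated until the scrub reports that it has finished.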

-- 
Stefan Wendler
stefan.wendler@tngtech.com
+49 (0) 176 -  2438 3835
Senior Consultant

TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Managing Directors: Henrik Klagges, Dr. Robert Dahlke, Gerhard Müller
Registered office: Unterföhring * Register court: Amtsgericht München * HRB 135082


