Date:      Mon, 11 Jun 2018 14:57:05 +0200
From:      Stefan Wendler <stefan.wendler@tngtech.com>
To:        Willem Jan Withagen <wjw@digiware.nl>
Cc:        "stable@freebsd.org" <stable@FreeBSD.org>
Subject:   Re: Continuous crashing ZFS server
Message-ID:  <324157ef-9565-69b4-1685-7b3ff45f9490@tngtech.com>
In-Reply-To: <34c4a21b-9555-3b34-14a3-94cdacc22179@digiware.nl>
References:  <f9ecab27-5201-4b60-ea75-e68dd5ffb44c@digiware.nl> <17446f39-97a1-8603-11a0-32176e8cb833@FreeBSD.org> <d75b7d81-67c8-d473-7652-c212700ef0d1@digiware.nl> <100ea6d0-5cf4-1a00-0e3a-dfad6175df6c@FreeBSD.org> <17ee24dd-93e5-dede-d7aa-90239c72c287@digiware.nl> <25b13f67-76fd-621d-22b8-f1efdcc4ae0a@tngtech.com> <34c4a21b-9555-3b34-14a3-94cdacc22179@digiware.nl>

Under normal circumstances you can just add/remove the caches from the
pool while the system is running. If something is fishy here, ZFS
should inform you that there is still "dirty" data that has to be synced
when you try to remove the cache. I don't know the exact message, but
it is pretty clear.
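
For reference, removal is a single command per device; the pool and device
names below are hypothetical, so check your own layout first:

    # show the pool layout, including cache (L2ARC) and log (ZIL) devices
    zpool status tank

    # a cache device holds no unique data and can be dropped at any time
    zpool remove tank gpt/cache0

    # a separate log device is only released once its intent-log records
    # have been synced; otherwise ZFS complains as described above
    zpool remove tank gpt/log0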


On 06/11/2018 02:48 PM, Willem Jan Withagen wrote:
> On 11-6-2018 14:35, Stefan Wendler wrote:
>> Do you use L2ARC/ZIL disks? I had a similar problem that turned out to
>> be a broken caching SSD. Scrubbing didn't help a bit because it reported
>> that data was okay. And SMART was fine as well. Fortunately I could
>> still send/recv snapshots to a backup disk but wasn't able to replace
>> the SSDs without a pool restore. ZFS just wouldn't sync some older ZIL
>> data to disk and also wouldn't release the SSDs from the pool. Did you
>> also check the logs for entries that look like broken RAM?
>
> That was one of the things I looked for, bad things in the log files.
> But the server does not seem to have any hardware problems.
>
> I'll dive a bit deeper into my ZIL SSDs.
>
> Thanx,
> --WjW
>
>> Cheers,
>> Stefan
>>
>> On 06/11/2018 01:29 PM, Willem Jan Withagen wrote:
>>> On 11-6-2018 12:53, Andriy Gapon wrote:
>>>> On 11/06/2018 13:26, Willem Jan Withagen wrote:
>>>>> On 11/06/2018 12:13, Andriy Gapon wrote:
>>>>>> On 08/06/2018 13:02, Willem Jan Withagen wrote:
>>>>>>> My file server is crashing about every 15 minutes at the moment.
>>>>>>> The panic looks like:
>>>>>>>
>>>>>>> Jun  8 11:48:43 zfs kernel: panic: Solaris(panic): zfs: allocating allocated segment(offset=12922221670400 size=24576)
>>>>>>> Jun  8 11:48:43 zfs kernel:
>>>>>>> Jun  8 11:48:43 zfs kernel: cpuid = 1
>>>>>>> Jun  8 11:48:43 zfs kernel: KDB: stack backtrace:
>>>>>>> Jun  8 11:48:43 zfs kernel: #0 0xffffffff80aada57 at kdb_backtrace+0x67
>>>>>>> Jun  8 11:48:43 zfs kernel: #1 0xffffffff80a6bb36 at vpanic+0x186
>>>>>>> Jun  8 11:48:43 zfs kernel: #2 0xffffffff80a6b9a3 at panic+0x43
>>>>>>> Jun  8 11:48:43 zfs kernel: #3 0xffffffff82488192 at vcmn_err+0xc2
>>>>>>> Jun  8 11:48:43 zfs kernel: #4 0xffffffff821f73ba at zfs_panic_recover+0x5a
>>>>>>> Jun  8 11:48:43 zfs kernel: #5 0xffffffff821dff8f at range_tree_add+0x20f
>>>>>>> Jun  8 11:48:43 zfs kernel: #6 0xffffffff821deb06 at metaslab_free_dva+0x276
>>>>>>> Jun  8 11:48:43 zfs kernel: #7 0xffffffff821debc1 at metaslab_free+0x91
>>>>>>> Jun  8 11:48:43 zfs kernel: #8 0xffffffff8222296a at zio_dva_free+0x1a
>>>>>>> Jun  8 11:48:43 zfs kernel: #9 0xffffffff8221f6cc at zio_execute+0xac
>>>>>>> Jun  8 11:48:43 zfs kernel: #10 0xffffffff80abe827 at taskqueue_run_locked+0x127
>>>>>>> Jun  8 11:48:43 zfs kernel: #11 0xffffffff80abf9c8 at taskqueue_thread_loop+0xc8
>>>>>>> Jun  8 11:48:43 zfs kernel: #12 0xffffffff80a2f7d5 at fork_exit+0x85
>>>>>>> Jun  8 11:48:43 zfs kernel: #13 0xffffffff80ec4abe at fork_trampoline+0xe
>>>>>>> Jun  8 11:48:43 zfs kernel: Uptime: 9m7s
>>>>>>>
>>>>>>> Maybe a known bug?
>>>>>>> Is there anything I can do about this?
>>>>>>> Any debugging needed?
>>>>>>
>>>>>> Sorry to inform you but your on-disk data got corrupted.
>>>>>> The most straightforward thing you can do is try to save data from
>>>>>> the pool in readonly mode.
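
A readonly rescue import along those lines might look like this; the pool
name "tank" and the backup target "backup/tank" are hypothetical:

    # import the damaged pool without allowing any writes
    zpool import -o readonly=on tank

    # copy data off via an existing snapshot (a readonly pool cannot
    # take new ones), or simply cp/rsync the mounted filesystems
    zfs send -R tank@lastsnap | zfs receive -F backup/tank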
>>>>>
>>>>> Hi Andriy,
>>>>>
>>>>> Ouch, that is a first in 12 years of using ZFS. "Fortunately" it was
>>>>> of a test ZVOL->iSCSI->Win10 disk on which I spool my CAMs.
>>>>>
>>>>> Removing the ZVOL actually fixed the rebooting, but now the question is:
>>>>>     Is the remainder of the zpools on the same disks in danger?
>>>>
>>>> You can try to check with zdb -b on an idle (better exported) pool.
>>>> And zpool scrub.
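
Such a check could look roughly like this (pool name hypothetical):

    # zdb -b walks all block pointers and reports leaked or
    # double-allocated space; on an exported pool, -e is needed to
    # search the devices instead of the zpool.cache file
    zpool export tank
    zdb -eb tank
    zpool import tank

    # a scrub verifies the checksums of all data while the pool is live
    zpool scrub tank
    zpool status -v tank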
>>>
>>> If scrub says things are okay, I can start breathing again?
>>> Exporting the pool is something for the small hours.
>>>
>>> Thanx,
>>> --WjW
>>>
>>>
>>
>


