From owner-freebsd-stable@freebsd.org Mon Jun 11 12:57:10 2018
Subject: Re: Continuous crashing ZFS server
To: Willem Jan Withagen
Cc: "stable@freebsd.org"
References: <17446f39-97a1-8603-11a0-32176e8cb833@FreeBSD.org> <100ea6d0-5cf4-1a00-0e3a-dfad6175df6c@FreeBSD.org> <17ee24dd-93e5-dede-d7aa-90239c72c287@digiware.nl> <25b13f67-76fd-621d-22b8-f1efdcc4ae0a@tngtech.com> <34c4a21b-9555-3b34-14a3-94cdacc22179@digiware.nl>
From: Stefan Wendler
Message-ID: <324157ef-9565-69b4-1685-7b3ff45f9490@tngtech.com>
Date: Mon, 11 Jun 2018 14:57:05 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0
MIME-Version: 1.0
In-Reply-To: <34c4a21b-9555-3b34-14a3-94cdacc22179@digiware.nl>
Content-Type: text/plain;
  charset=utf-8
Content-Language: en-US
List-Id: Production branch of FreeBSD source code

Under normal circumstances you can just add/remove the caches from the pool
while the system is running. If something is fishy here, ZFS should inform you
that there is still "dirty" data that has to be synced when you try to remove
the cache. I don't know the exact message, but it is pretty clear.

On 06/11/2018 02:48 PM, Willem Jan Withagen wrote:
> On 11-6-2018 14:35, Stefan Wendler wrote:
>> Do you use L2ARC/ZIL disks? I had a similar problem that turned out to
>> be a broken caching SSD. Scrubbing didn't help a bit because it reported
>> that data was okay. And SMART was fine as well. Fortunately I could
>> still send/recv snapshots to a backup disk but wasn't able to replace
>> the SSDs without a pool restore. ZFS just wouldn't sync some older ZIL
>> data to disk and also wouldn't release the SSDs from the pool. Did you
>> also check the logs for entries that look like broken RAM?
>
> That was one of the things I looked for, bad things in log files.
> But the server does not seem to have any hardware problems.
>
> I'll dive a bit deeper into my ZIL SSDs
>
> Thanx,
> --WjW
>
>> Cheers,
>> Stefan
>>
>> On 06/11/2018 01:29 PM, Willem Jan Withagen wrote:
>>> On 11-6-2018 12:53, Andriy Gapon wrote:
>>>> On 11/06/2018 13:26, Willem Jan Withagen wrote:
>>>>> On 11/06/2018 12:13, Andriy Gapon wrote:
>>>>>> On 08/06/2018 13:02, Willem Jan Withagen wrote:
>>>>>>> My file server is crashing about every 15 minutes at the moment.
>>>>>>> The panic looks like:
>>>>>>>
>>>>>>> Jun  8 11:48:43 zfs kernel: panic: Solaris(panic): zfs: allocating
>>>>>>> allocated segment(offset=12922221670400 size=24576)
>>>>>>> Jun  8 11:48:43 zfs kernel:
>>>>>>> Jun  8 11:48:43 zfs kernel: cpuid = 1
>>>>>>> Jun  8 11:48:43 zfs kernel: KDB: stack backtrace:
>>>>>>> Jun  8 11:48:43 zfs kernel: #0 0xffffffff80aada57 at kdb_backtrace+0x67
>>>>>>> Jun  8 11:48:43 zfs kernel: #1 0xffffffff80a6bb36 at vpanic+0x186
>>>>>>> Jun  8 11:48:43 zfs kernel: #2 0xffffffff80a6b9a3 at panic+0x43
>>>>>>> Jun  8 11:48:43 zfs kernel: #3 0xffffffff82488192 at vcmn_err+0xc2
>>>>>>> Jun  8 11:48:43 zfs kernel: #4 0xffffffff821f73ba at zfs_panic_recover+0x5a
>>>>>>> Jun  8 11:48:43 zfs kernel: #5 0xffffffff821dff8f at range_tree_add+0x20f
>>>>>>> Jun  8 11:48:43 zfs kernel: #6 0xffffffff821deb06 at metaslab_free_dva+0x276
>>>>>>> Jun  8 11:48:43 zfs kernel: #7 0xffffffff821debc1 at metaslab_free+0x91
>>>>>>> Jun  8 11:48:43 zfs kernel: #8 0xffffffff8222296a at zio_dva_free+0x1a
>>>>>>> Jun  8 11:48:43 zfs kernel: #9 0xffffffff8221f6cc at zio_execute+0xac
>>>>>>> Jun  8 11:48:43 zfs kernel: #10 0xffffffff80abe827 at
>>>>>>> taskqueue_run_locked+0x127
>>>>>>> Jun  8 11:48:43 zfs kernel: #11 0xffffffff80abf9c8 at
>>>>>>> taskqueue_thread_loop+0xc8
>>>>>>> Jun  8 11:48:43 zfs kernel: #12 0xffffffff80a2f7d5 at fork_exit+0x85
>>>>>>> Jun  8 11:48:43 zfs kernel: #13 0xffffffff80ec4abe at fork_trampoline+0xe
>>>>>>> Jun  8 11:48:43 zfs kernel: Uptime: 9m7s
>>>>>>>
>>>>>>> Maybe a known bug?
>>>>>>> Is there anything I can do about this?
>>>>>>> Any debugging needed?
>>>>>>
>>>>>> Sorry to inform you but your on-disk data got corrupted.
>>>>>> The most straightforward thing you can do is try to save data from the pool in
>>>>>> readonly mode.
>>>>>
>>>>> Hi Andriy,
>>>>>
>>>>> Auch, that is a first in 12 years of using ZFS. "Fortunately" it was of a test
>>>>> ZVOL->iSCSI->Win10 disk on which I spool my CAMs.
>>>>>
>>>>> Removing the ZVOL actually fixed the rebooting, but now the question is:
>>>>>     Is the remainder of the zpools on the same disks in danger?
>>>>
>>>> You can try to check with zdb -b on an idle (better exported) pool. And zpool
>>>> scrub.
>>>
>>> If scrub says things are oke, I can start breathing again?
>>> exporting the pool is something for the small hours.
>>>
>>> Thanx,
>>> --WjW
>>>
>>>
>>> _______________________________________________
>>> freebsd-stable@freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>>>
>>
>
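The panic quoted in this thread is raised from range_tree_add() while metaslab_free_dva() is freeing a block (frames #5 and #6 of the backtrace). Despite the wording "allocating allocated segment", on this path it means the segment being freed was already present in the range tree it was being added to, i.e. the same space was being freed twice, which fits Andriy's diagnosis of inconsistent on-disk data. Below is a minimal Python sketch of that kind of overlap check, not the ZFS source: it assumes a toy set of disjoint [start, end) segments kept in sorted lists, and the names FreeRangeTree and SegmentOverlapError are hypothetical. This toy rejects any overlap, while the real code is more involved (for example, it merges adjacent segments).

```python
import bisect


class SegmentOverlapError(Exception):
    """Hypothetical error type standing in for the ZFS panic."""


class FreeRangeTree:
    """Toy set of disjoint [start, end) byte ranges (hypothetical model).

    Segments are kept in two parallel sorted lists. Unlike real ZFS, this
    sketch never merges neighbours and rejects any overlap at all.
    """

    def __init__(self):
        self._starts = []  # sorted start offsets
        self._ends = []    # end offsets, parallel to self._starts

    def __len__(self):
        return len(self._starts)

    def add(self, offset, size):
        """Record [offset, offset + size) as free; refuse overlapping ranges."""
        if size <= 0:
            raise ValueError("size must be positive")
        start, end = offset, offset + size
        # Index of the first stored segment whose start is greater than ours.
        i = bisect.bisect_right(self._starts, start)
        # Stored segments are disjoint and sorted, so only the two
        # neighbours of the insertion point can overlap the new range.
        overlaps_left = i > 0 and self._ends[i - 1] > start
        overlaps_right = i < len(self._starts) and self._starts[i] < end
        if overlaps_left or overlaps_right:
            # Same wording as the kernel message in the panic above.
            raise SegmentOverlapError(
                f"zfs: allocating allocated segment(offset={offset} size={size})"
            )
        self._starts.insert(i, start)
        self._ends.insert(i, end)


if __name__ == "__main__":
    tree = FreeRangeTree()
    tree.add(12922221670400, 24576)         # first free of the block: accepted
    tree.add(12922221670400 + 24576, 4096)  # adjacent, not overlapping: accepted
    try:
        tree.add(12922221670400, 24576)     # the same block freed a second time
    except SegmentOverlapError as err:
        print(err)  # zfs: allocating allocated segment(offset=12922221670400 size=24576)
```

Checking only the two neighbours of the insertion point is enough because the stored segments never overlap each other; a balanced tree gives the same guarantee with cheaper inserts than a Python list.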