From owner-freebsd-stable@freebsd.org Mon Jun 11 12:35:31 2018
From: Stefan Wendler
To: Willem Jan Withagen
Cc: "stable@freebsd.org"
Subject: Re: Continuous crashing ZFS server
Date: Mon, 11 Jun 2018 14:35:18 +0200
Message-ID: <25b13f67-76fd-621d-22b8-f1efdcc4ae0a@tngtech.com>
In-Reply-To: <17ee24dd-93e5-dede-d7aa-90239c72c287@digiware.nl>
References: <17446f39-97a1-8603-11a0-32176e8cb833@FreeBSD.org> <100ea6d0-5cf4-1a00-0e3a-dfad6175df6c@FreeBSD.org> <17ee24dd-93e5-dede-d7aa-90239c72c287@digiware.nl>
List-Id: Production branch of FreeBSD source code

Do you use L2ARC/ZIL disks? I had a similar problem that turned out to be
a broken caching SSD. Scrubbing didn't help at all, because it reported
that the data was okay, and SMART looked fine as well. Fortunately I could
still send/recv snapshots to a backup disk, but I wasn't able to replace
the SSDs without a pool restore: ZFS just wouldn't sync some older ZIL
data to disk, and it also wouldn't release the SSDs from the pool.

Did you also check the logs for entries that hint at broken RAM?

Cheers,
Stefan

On 06/11/2018 01:29 PM, Willem Jan Withagen wrote:
> On 11-6-2018 12:53, Andriy Gapon wrote:
>> On 11/06/2018 13:26, Willem Jan Withagen wrote:
>>> On 11/06/2018 12:13, Andriy Gapon wrote:
>>>> On 08/06/2018 13:02, Willem Jan Withagen wrote:
>>>>> My file server is crashing about every 15 minutes at the moment.
>>>>> The panic looks like:
>>>>>
>>>>> Jun  8 11:48:43 zfs kernel: panic: Solaris(panic): zfs: allocating allocated segment(offset=12922221670400 size=24576)
>>>>> Jun  8 11:48:43 zfs kernel:
>>>>> Jun  8 11:48:43 zfs kernel: cpuid = 1
>>>>> Jun  8 11:48:43 zfs kernel: KDB: stack backtrace:
>>>>> Jun  8 11:48:43 zfs kernel: #0 0xffffffff80aada57 at kdb_backtrace+0x67
>>>>> Jun  8 11:48:43 zfs kernel: #1 0xffffffff80a6bb36 at vpanic+0x186
>>>>> Jun  8 11:48:43 zfs kernel: #2 0xffffffff80a6b9a3 at panic+0x43
>>>>> Jun  8 11:48:43 zfs kernel: #3 0xffffffff82488192 at vcmn_err+0xc2
>>>>> Jun  8 11:48:43 zfs kernel: #4 0xffffffff821f73ba at zfs_panic_recover+0x5a
>>>>> Jun  8 11:48:43 zfs kernel: #5 0xffffffff821dff8f at range_tree_add+0x20f
>>>>> Jun  8 11:48:43 zfs kernel: #6 0xffffffff821deb06 at metaslab_free_dva+0x276
>>>>> Jun  8 11:48:43 zfs kernel: #7 0xffffffff821debc1 at metaslab_free+0x91
>>>>> Jun  8 11:48:43 zfs kernel: #8 0xffffffff8222296a at zio_dva_free+0x1a
>>>>> Jun  8 11:48:43 zfs kernel: #9 0xffffffff8221f6cc at zio_execute+0xac
>>>>> Jun  8 11:48:43 zfs kernel: #10 0xffffffff80abe827 at taskqueue_run_locked+0x127
>>>>> Jun  8 11:48:43 zfs kernel: #11 0xffffffff80abf9c8 at taskqueue_thread_loop+0xc8
>>>>> Jun  8 11:48:43 zfs kernel: #12 0xffffffff80a2f7d5 at fork_exit+0x85
>>>>> Jun  8 11:48:43 zfs kernel: #13 0xffffffff80ec4abe at fork_trampoline+0xe
>>>>> Jun  8 11:48:43 zfs kernel: Uptime: 9m7s
>>>>>
>>>>> Maybe a known bug?
>>>>> Is there anything I can do about this?
>>>>> Any debugging needed?
>>>>
>>>> Sorry to inform you but your on-disk data got corrupted.
>>>> The most straightforward thing you can do is try to save data from the pool in
>>>> readonly mode.
>>>
>>> Hi Andriy,
>>>
>>> Auch, that is a first in 12 years of using ZFS. "Fortunately" it was of a test
>>> ZVOL->iSCSI->Win10 disk on which I spool my CAMs.
>>>
>>> Removing the ZVOL actually fixed the rebooting, but now the question is:
>>>     Is the remainder of the zpools on the same disks in danger?
>>
>> You can try to check with zdb -b on an idle (better exported) pool. And zpool
>> scrub.
>
> If scrub says things are oke, I can start breathing again?
> exporting the pool is something for the small hours.
>
> Thanx,
> --WjW
>
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>

-- 
Stefan Wendler
stefan.wendler@tngtech.com
+49 (0) 176 - 2438 3835
Senior Consultant
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Dr. Robert Dahlke, Gerhard Müller
Sitz: Unterföhring * Amtsgericht München * HRB 135082