From: Mike Tancsa <mike@sentex.net>
To: Josh Gitlin, freebsd-fs@freebsd.org
Subject: Re: Troubleshooting kernel panic with zfs
Date: Fri, 21 Sep 2018 09:04:38 -0400
Organization: Sentex Communications
List-Id: Filesystems
On 9/20/2018 8:27 PM, Josh Gitlin wrote:
>
> My hunch is that, given this was inside kmem_malloc, we were unable
> to allocate memory for a zio_write_compress call (the pool does have
> ZFS compression on) and hence this is a tuning issue and not a
> bug... but I am looking for confirmation and/or suggested
> changes/troubleshooting steps. The ZFS tuning configuration has been
> stable for years, so it may be a change in behavior or traffic... If
> this looks like it might be a bug, I will be able to get more
> information from a minidump if it reoccurs and can follow up on this
> thread.

Given all the issues with memory pressure, perhaps the ARC is eating
too much RAM; I would try artificially limiting it so it cannot. We
had a similarly purposed server, but with 32G of RAM, and one day I
found the system randomly shooting processes to reclaim RAM, e.g.

pid xxx (yyy), uid zzzz, was killed: out of swap space

Perhaps try

sysctl -w vfs.zfs.arc_max=6000000000

See e.g.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229764
https://lists.freebsd.org/pipermail/freebsd-questions/2018-July/282261.html

	---Mike

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400 x203
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada
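
P.S. If the cap helps and you want it to survive a reboot, a
loader.conf entry should do it. Untested sketch; the "6G" value is
only an example, size it to leave enough RAM for everything else on
your box:

    # /boot/loader.conf
    # Cap the ZFS ARC; accepts byte counts or suffixed values like "6G"
    # (example value only -- pick one appropriate for your workload)
    vfs.zfs.arc_max="6G"

You can then watch what the ARC is actually using against the cap
with:

    sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max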