From owner-freebsd-current@FreeBSD.ORG Wed Oct 16 15:57:19 2013
Date: Wed, 16 Oct 2013 18:57:16 +0300
From: Vitalij Satanivskij
To: Steven Hartland
Subject: Re: ZFS secondarycache on SSD problem on r255173
Message-ID: <20131016155716.GA90462@hell.ukr.net>
References: <1379333192.127359970.ma5jnbc5@fmst-6.ukr.net>
 <1379334340.567465877.0b1lli6r@fmst-6.ukr.net>
 <8365CE736DC749DF95D0030A725211F6@multiplay.co.uk>
 <02549AD9-C456-4E17-927C-B4BCC97F8CC8@freebsd.org>
 <4AA28730F331444AB13108ABF0CD68B7@multiplay.co.uk>
 <1379496242.750778745.m0ksff1m@fmst-6.ukr.net>
 <20131016080100.GA27758@hell.ukr.net>
 <0023A5B16E614B67A2D8A4A63641D1D8@multiplay.co.uk>
 <20131016141053.GA43384@hell.ukr.net>
 <6DE320B20F7844B9ADC34A214AED8055@multiplay.co.uk>
In-Reply-To: <6DE320B20F7844B9ADC34A214AED8055@multiplay.co.uk>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Vitalij Satanivskij, Dmitriy Makarov, "Justin T.
Gibbs", Borja Marcos, freebsd-current@freebsd.org
List-Id: Discussions about the use of FreeBSD-current

Steven Hartland wrote:
SH> I'm not clear what you rolled back there as r255173 has nothing to do
SH> with this. Could you clarify

r255173 with your patch from the email dated Tue, 17 Sep 2013 23:53:12 +0100
with the subject "Re: ZFS secondarycache on SSD problem on r255173".

The errors we get are counted in arcstats, not in messages, and were described
some time ago in mails by me and Dmitriy Makarov with the subject
"ZFS L2ARC - incorrect size and abnormal system load on r255173".

On r255173 without the patch and with vfs.zfs.max_auto_ashift=9, when we added 2 SSDs to the pool as caches, we get:

cache
  gpt/cache1  ONLINE  0  0  0  block size: 512B configured, 4096B native
  gpt/cache2  ONLINE  0  0  0  block size: 512B configured, 4096B native

We saw the same message with the default vfs.zfs.max_auto_ashift.

We will wait some time to see how it works.

SH> Any errors recorded in /var/log/messages?
SH>
SH> Could you add code to record the non-zero value of zio->io_error in
SH> l2arc_read_done as this may give some indication of the underlying
SH> issue.
SH>
SH> Additionally you could always put a panic in that code path too and then
SH> create a dump so the details can be fully examined.
SH>
SH> In terms of the slowness, that's going to be a side effect of the cache
SH> failures.
SH>
SH> Oh, could you also confirm that the issue doesn't exist if you
SH> 1. Exclude r255753
SH> 2. Set vfs.zfs.max_auto_ashift=9
SH>
SH> Regards
SH> Steve
SH> ----- Original Message -----
SH> From: "Vitalij Satanivskij"
SH> To: "Steven Hartland"
SH> Cc: "Vitalij Satanivskij"; "Dmitriy Makarov"; "Justin T.
Gibbs"; "Borja Marcos"
SH> Sent: Wednesday, October 16, 2013 3:10 PM
SH> Subject: Re: ZFS secondarycache on SSD problem on r255173
SH>
SH>
SH> > Yes
SH> >
SH> > We have 15 servers, and all of them have the problem when running with the ashift patch, so we rolled back the patch (for r255173)
SH> > and all of them have worked for a week without that problem. Yesterday one of the servers was updated to stable/10 (beta1),
SH> >
SH> > which includes the patch, and after around 12 hours of work the L2ARC began to get errors like these:
SH> >
SH> > kstat.zfs.misc.arcstats.l2_cksum_bad
SH> > kstat.zfs.misc.arcstats.l2_io_error
SH> >
SH> >
SH> > For now the patch is disabled in our production.
SH> >
SH> >
SH> > Please note we have a very heavy load on the ZFS pool, so the 90GB ARC and 3x180GB L2ARC take a very large number of hits.
SH> >
SH> >
SH> > The SSDs used for the caches are Intel SSD 530 series; SMART for all devices is in a normal state,
SH> > with no bad values on it.
SH> >
SH> > Steven Hartland wrote:
SH> > SH> Have you confirmed the ashift changes are the actual cause of this
SH> > SH> by backing out just those changes and retesting on the same hardware?
SH> > SH>
SH> > SH> Also worth checking your disks' SMART values to confirm there are no
SH> > SH> visible signs of HW errors.
SH> > SH>
SH> > SH> Regards
SH> > SH> Steve
SH> > SH>
SH> > SH> ----- Original Message -----
SH> > SH> From: "Vitalij Satanivskij"
SH> > SH> To: "Dmitriy Makarov"
SH> > SH> Cc: "Steven Hartland"; "Justin T. Gibbs"; "Borja Marcos"
SH> > SH> Sent: Wednesday, October 16, 2013 9:01 AM
SH> > SH> Subject: Re: ZFS secondarycache on SSD problem on r255173
SH> > SH>
SH> > SH>
SH> > SH> > Hello.
SH> > SH> >
SH> > SH> > The patch broke cache functionality.
SH> > SH> >
SH> > SH> > Look at Dmitriy's mail from Mon, 07 Oct 2013 21:09:06 +0300
SH> > SH> >
SH> > SH> > with the subject "ZFS L2ARC - incorrect size and abnormal system load on r255173".
SH> > SH> >
SH> > SH> > As the patch is already in head and BETA, that's not good.
SH> > SH> >
SH> > SH> > Yesterday we updated one machine to beta1 and forgot about the patch. So after 12 hours the cache was broken... :((
SH> > SH> >
SH> > SH> >
SH> > SH> > Dmitriy Makarov wrote:
SH> > SH> > DM> The attached patch by Steven Hartland fixes the issue for me too. Thank you!
SH> > SH> > DM>
SH> > SH> > DM>
SH> > SH> > DM> --- Original message ---
SH> > SH> > DM> From: "Steven Hartland" < killing@multiplay.co.uk >
SH> > SH> > DM> Date: 18 September 2013, 01:53:10
SH> > SH> > DM>
SH> > SH> > DM> ----- Original Message -----
SH> > SH> > DM> From: "Justin T. Gibbs" <
SH> > SH> > DM>
SH> > SH> > DM> ---
SH> > SH> > DM> Дмитрий Макаров
SH> > SH> > DM> _______________________________________________
SH> > SH> > DM> freebsd-current@freebsd.org mailing list
SH> > SH> > DM> http://lists.freebsd.org/mailman/listinfo/freebsd-current
SH> > SH> > DM> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
SH> > SH> >
SH> > SH>
SH> > SH>
SH> > SH> ================================================
SH> > SH> This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the
SH> > SH> event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any
SH> > SH> information contained in it.
SH> > SH>
SH> > SH> In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
SH> > SH> or return the E.mail to postmaster@multiplay.co.uk.
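
For anyone following the thread, a small sketch may help relate the numbers quoted above. It assumes the usual ZFS convention that a vdev's sector size is 2**ashift, so ashift 9 is 512B and ashift 12 is 4096B, which matches the "block size: 512B configured, 4096B native" status lines. It also shows one possible way to flag non-zero L2ARC error counters in text captured from `sysctl kstat.zfs.misc.arcstats`. The function names and the sample counter values are hypothetical and are not taken from any message in this thread.

```python
# Sketch only: relate ashift to sector size and flag non-zero L2ARC
# error counters in captured `sysctl` output ("name: value" lines).
# Function names and SAMPLE values are hypothetical illustrations.

L2_ERROR_COUNTERS = (
    "kstat.zfs.misc.arcstats.l2_cksum_bad",
    "kstat.zfs.misc.arcstats.l2_io_error",
)


def sector_size(ashift: int) -> int:
    """Return the sector size in bytes for an ashift (size = 2**ashift)."""
    return 1 << ashift


def nonzero_l2_errors(sysctl_text: str) -> dict:
    """Return the L2ARC error counters whose value is greater than zero."""
    found = {}
    for line in sysctl_text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip lines without a "name: value" shape or for other counters.
        if sep and name in L2_ERROR_COUNTERS:
            count = int(value.strip())
            if count > 0:
                found[name] = count
    return found


# Hypothetical capture; real values would come from the affected host.
SAMPLE = """\
kstat.zfs.misc.arcstats.l2_hits: 1000
kstat.zfs.misc.arcstats.l2_cksum_bad: 42
kstat.zfs.misc.arcstats.l2_io_error: 0
"""

if __name__ == "__main__":
    print(sector_size(9), sector_size(12))
    print(nonzero_l2_errors(SAMPLE))
```

Comparing two such captures taken some hours apart would show whether the counters are still growing; that is only a suggestion and does not replace the in-kernel logging of zio->io_error that Steven asked for.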