From: Freddie Cash <fjwcash@gmail.com>
To: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Date: Thu, 14 Mar 2013 11:13:38 -0700
Subject: Strange slowdown when cache devices enabled in ZFS

3 storage systems are running this:

# uname -a
FreeBSD alphadrive.sd73.bc.ca 9.1-STABLE FreeBSD 9.1-STABLE #0 r245466M: Fri Feb 1 09:38:24 PST 2013 root@alphadrive.sd73.bc.ca:/usr/obj/usr/src/sys/ZFSHOST amd64

1 storage system is running this:

# uname -a
FreeBSD omegadrive.sd73.bc.ca 9.1-STABLE FreeBSD 9.1-STABLE #0 r247804M: Mon Mar 4 10:27:26 PST 2013 root@omegadrive.sd73.bc.ca:/usr/obj/usr/src/sys/ZFSHOST amd64

The last system has the ZFS "deadman" patch (r247265 from -CURRENT) manually merged in.

All 4 systems exhibit the same symptoms: if a cache device is enabled in the pool, the l2arc_feed_thread of zfskern spins until it consumes 100% of a CPU core, at which point all I/O to the pool stops. "zpool iostat 1" and "zpool iostat -v 1" show 0 reads and 0 writes to the pool. "gstat -I 1s -f gpt" shows 0 activity on the pool disks. If I remove the cache device from the pool, I/O starts up right away (although the remove operation takes several minutes to complete).

During the "0 I/O period", any attempt to access the pool "hangs". CTRL+T shows the process waiting on either spa_namespace_lock or tx->tx_something-or-other (the wait channel shown when trying to write a transaction to disk). It stays like that until the cache device is removed.
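For reference, this is roughly how I watch it happen ("storage" stands in for the real pool name, and the gstat filter assumes gpt-labelled partitions; adjust both to your setup):

# top -SHb | grep l2arc         <- zfskern{l2arc_feed_thread} pegged at ~100% WCPU
# zpool iostat -v storage 1     <- 0 reads and 0 writes on every vdev
# gstat -I 1s -f gpt            <- 0 ops/s on the pool disks
# procstat -kk <zfskern-pid>    <- kernel stacks; similar info to what CTRL+T prints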
Hardware is almost the same in all 4 boxes:

3x storage boxes:

alphadrive:
  SuperMicro H8DGi-F motherboard
  AMD Opteron 6128 CPU (8 cores at 2.0 GHz)
  64 GB of DDR3 ECC SDRAM
  32 GB SSD for the OS and cache device (GPT partitioned)
  24x 2.0 TB WD and Seagate SATA harddrives (4x 6-drive raidz2 vdevs)
  SuperMicro AOC-USAS-8i SATA controller using mpt driver
  SuperMicro 4U chassis

betadrive:
  SuperMicro H8DGi-F motherboard
  AMD Opteron 6128 CPU (8 cores at 2.0 GHz)
  48 GB of DDR3 ECC SDRAM
  32 GB SSD for the OS and cache device (GPT partitioned)
  16x 2.0 TB WD and Seagate SATA harddrives (3x 5-drive raidz2 vdevs + spare)
  SuperMicro AOC-USAS2-8i SATA controller using mps driver
  SuperMicro 3U chassis

zuludrive:
  SuperMicro H8DGi-F motherboard
  AMD Opteron 6128 CPU (8 cores at 2.0 GHz)
  32 GB of DDR3 ECC SDRAM
  32 GB SSD for the OS and cache device (GPT partitioned)
  24x 2.0 TB WD and Seagate SATA harddrives (4x 6-drive raidz2 vdevs)
  SuperMicro AOC-USAS2-8i SATA controller using mps driver
  SuperMicro 836 chassis

1x storage box:

omegadrive:
  SuperMicro H8DG6-F motherboard
  2x AMD Opteron 6128 CPUs (8 cores at 2.0 GHz; 16 cores total)
  128 GB of DDR3 ECC SDRAM
  2x 60 GB SSDs for the OS (gmirror'd) and log devices (ZFS mirror)
  2x 120 GB SSDs for cache devices
  45x 2.0 TB WD and Seagate SATA harddrives (7x 6-drive raidz2 vdevs + 3 spares)
  LSI 9211-8e SAS controllers using mps driver
  Onboard LSI 2008 SATA controller using mps driver for OS/log/cache
  SuperMicro 4U JBOD chassis
  SuperMicro 2U chassis for motherboard/OS

alphadrive, betadrive, and omegadrive all have dedup and lzjb compression enabled. zuludrive has lzjb compression enabled (no dedup).

alpha/beta/zulu run rsync backups every night from various local and remote Linux and FreeBSD boxes, then ZFS send the snapshots to omegadrive during the day. The "0 I/O periods" occur most often and most quickly on omegadrive while receiving snapshots, but eventually occur on all systems during the rsyncs.

Things I've tried (exact settings sketched in the P.S. below):
- limiting ARC to only 32 GB on each system
- limiting L2ARC to 30 GB on each system
- enabling the "deadman" patch in case I/O requests were being lost by the drives/controllers
- changing primarycache between all and metadata
- increasing arc_meta_limit to just shy of arc_max
- removing the cache devices completely

So far, only the last option works. Without L2ARC, the systems are 100% stable and can push 200 MB/s of rsync writes and just shy of 500 MB/s of ZFS recv (saturates the gigabit link, bursting writes; usually hovering around 50-80 MB/s of continuous writes).

I'm baffled. An L2ARC is supposed to make things faster, especially with dedup enabled, since the DDT can be cached.

-- 
Freddie Cash
fjwcash@gmail.com
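P.S. A rough sketch of the settings behind the list above, in case anyone wants to compare. The ARC limits are boot-time tunables; the primarycache and cache-device changes are done live. "storage" and the gpt labels are placeholders for the real pool and partition names, and the byte values are illustrative:

In /boot/loader.conf:

  vfs.zfs.arc_max="34359738368"         # cap ARC at 32 GB
  vfs.zfs.arc_meta_limit="34000000000"  # just shy of arc_max

At runtime, per pool:

  # zfs set primarycache=metadata storage   <- or =all to flip it back
  # zpool remove storage gpt/cache0         <- drop the L2ARC device
  # zpool add storage cache gpt/cache0      <- re-add it later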