From: Kevin Day
Subject: Improving ZFS performance for large directories
Date: Tue, 29 Jan 2013 17:20:15 -0600
To: FreeBSD Filesystems
Message-Id: <19DB8F4A-6788-44F6-9A2C-E01DEA01BED9@dragondata.com>

I'm trying to improve performance when using ZFS in large (>60000 files) directories. A common activity is to use "getdirentries" to enumerate all the files in the directory, then "lstat" on each one to get information about it. Doing an "ls -l" in a large directory like this can take 10-30 seconds to complete.

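The enumerate-then-lstat pattern described above can be reproduced and timed outside of "ls". The following is only a rough sketch, not what "ls" itself does internally: the function names time_lstat_calls and summarize are made up for illustration, and it assumes that Python's os.listdir() ends up in getdirentries via libc's readdir() on FreeBSD.

```python
import os
import time


def time_lstat_calls(directory):
    """Enumerate a directory, lstat() every entry, and return the
    per-call wall-clock timings in seconds (hypothetical helper)."""
    timings = []
    # os.listdir() only returns names, so every entry needs its own lstat().
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        start = time.perf_counter()
        os.lstat(path)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return count/min/max/mean/total for a list of timings, or None if empty."""
    if not timings:
        return None
    total = sum(timings)
    return {
        "count": len(timings),
        "min": min(timings),
        "max": max(timings),
        "mean": total / len(timings),
        "total": total,
    }
```

Calling summarize(time_lstat_calls("/path/to/large/directory")) on a cold cache and again on a warm one would give a rough idea of how much of the delay is metadata that has to be read from disk.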
Trying to figure out why, I ran:

    ktrace ls -l /path/to/large/directory
    kdump -R | sort -rn | more

to see which system calls were taking the most time. I ended up with:

    69247 ls       0.190729 STRU  struct stat {dev=846475008, ino=46220085, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1333196714, stime=1201004393, ctime=1333196714.547566024, birthtime=1333196714.547566024, size=30784, blksize=31232, blocks=62, flags=0x0 }
    69247 ls       0.180121 STRU  struct stat {dev=846475008, ino=46233417, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1333197088, stime=1209814737, ctime=1333197088.913571042, birthtime=1333197088.913571042, size=3162220, blksize=131072, blocks=6409, flags=0x0 }
    69247 ls       0.152370 RET   getdirentries 4088/0xff8
    69247 ls       0.139939 CALL  stat(0x800d8f598,0x7fffffffcca0)
    69247 ls       0.130411 RET   __acl_get_link 0
    69247 ls       0.121602 RET   __acl_get_link 0
    69247 ls       0.105799 RET   getdirentries 4064/0xfe0
    69247 ls       0.105069 RET   getdirentries 4068/0xfe4
    69247 ls       0.096862 RET   getdirentries 4028/0xfbc
    69247 ls       0.085012 RET   getdirentries 4088/0xff8
    69247 ls       0.082722 STRU  struct stat {dev=846475008, ino=72941319, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1348686155, stime=1348347621, ctime=1348686155.768875422, birthtime=1348686155.768875422, size=6686225, blksize=131072, blocks=13325, flags=0x0 }
    69247 ls       0.070318 STRU  struct stat {dev=846475008, ino=46211679, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1333196475, stime=1240230314, ctime=1333196475.038567672, birthtime=1333196475.038567672, size=829895, blksize=131072, blocks=1797, flags=0x0 }
    69247 ls       0.068060 RET   getdirentries 4048/0xfd0
    69247 ls       0.065118 RET   getdirentries 4088/0xff8
    69247 ls       0.062536 RET   getdirentries 4096/0x1000
    69247 ls       0.061118 RET   getdirentries 4020/0xfb4
    69247 ls       0.055038 STRU  struct stat {dev=846475008, ino=46220358, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1333196720, stime=1274282669, ctime=1333196720.972567345, birthtime=1333196720.972567345, size=382344, blksize=131072, blocks=773, flags=0x0 }
    69247 ls       0.054948 STRU  struct stat {dev=846475008, ino=75025952, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1351071350, stime=1349726805, ctime=1351071350.800873870, birthtime=1351071350.800873870, size=2575559, blksize=131072, blocks=5127, flags=0x0 }
    69247 ls       0.054828 STRU  struct stat {dev=846475008, ino=65021883, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1335730367, stime=1332843230, ctime=1335730367.541567371, birthtime=1335730367.541567371, size=226347, blksize=131072, blocks=517, flags=0x0 }
    69247 ls       0.053743 STRU  struct stat {dev=846475008, ino=46222016, mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=4294967295, atime=1333196765, stime=1257110706, ctime=1333196765.206574132, birthtime=1333196765.206574132, size=62112, blksize=62464, blocks=123, flags=0x0 }
    69247 ls       0.052015 RET   getdirentries 4060/0xfdc
    69247 ls       0.051388 RET   getdirentries 4068/0xfe4
    69247 ls       0.049875 RET   getdirentries 4088/0xff8
    69247 ls       0.049156 RET   getdirentries 4032/0xfc0
    69247 ls       0.048609 RET   getdirentries 4040/0xfc8
    69247 ls       0.048279 RET   getdirentries 4032/0xfc0
    69247 ls       0.048062 RET   getdirentries 4064/0xfe0
    69247 ls       0.047577 RET   getdirentries 4076/0xfec
    (snip)

The STRU records are the returns from calling lstat(). It looks like both getdirentries and lstat are taking quite a while to return. The shortest return for any lstat() call is 0.000004 seconds, the maximum is 0.190729 and the average is around 0.0004. With more than 60000 files in the directory, lstat() alone makes "ls" take over 20 seconds.

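For anyone who wants to repeat the min/max/average calculation on their own trace, here is a rough sketch of how it could be done. It assumes the "kdump -R" layout seen above (pid, command, relative timestamp, record type, rest of line) and, like the figures above, treats the relative timestamp on a STRU record as the cost of the lstat() that produced it. The names parse_kdump_line, stru_stats and projected_seconds are invented for this example, and the STRU lines in SAMPLE are abbreviated.

```python
import re

# Assumed layout of one "kdump -R" record:
# pid, command name, relative timestamp, record type, remainder of the line.
RECORD = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+\.\d+)\s+(\S+)\s*(.*)$")


def parse_kdump_line(line):
    """Split one kdump -R line into its fields, or return None if it does not match."""
    m = RECORD.match(line)
    if m is None:
        return None
    pid, command, elapsed, kind, rest = m.groups()
    return {"pid": int(pid), "command": command,
            "elapsed": float(elapsed), "kind": kind, "rest": rest}


def stru_stats(lines):
    """Return count/min/max/mean of the relative timestamps on STRU records."""
    elapsed = [r["elapsed"] for r in map(parse_kdump_line, lines)
               if r is not None and r["kind"] == "STRU"]
    if not elapsed:
        return None
    return {"count": len(elapsed), "min": min(elapsed),
            "max": max(elapsed), "mean": sum(elapsed) / len(elapsed)}


def projected_seconds(file_count, mean_lstat_seconds):
    """Total time spent in lstat() if every file costs the mean."""
    return file_count * mean_lstat_seconds


# A few records from the trace above (struct stat contents abbreviated).
SAMPLE = [
    "69247 ls       0.190729 STRU  struct stat {dev=846475008, ino=46220085}",
    "69247 ls       0.180121 STRU  struct stat {dev=846475008, ino=46233417}",
    "69247 ls       0.152370 RET   getdirentries 4088/0xff8",
    "69247 ls       0.139939 CALL  stat(0x800d8f598,0x7fffffffcca0)",
]

if __name__ == "__main__":
    print(stru_stats(SAMPLE))
    # 60000 files at an average of 0.0004 s each is about 24 seconds.
    print(projected_seconds(60000, 0.0004))
```

Feeding it the full, unsorted kdump output instead of SAMPLE would cover every lstat() in the run, not just the slowest ones.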
I'm prepared to try an L2ARC cache device (with secondarycache=metadata), but I'm having trouble determining how big a device I'd need. We've got >30M inodes now on this filesystem, including some files with extremely long names. Is there some way to determine the amount of metadata on a ZFS filesystem?