From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 05:13:22 2011
From: Patrick Donnelly <batrick@batbytes.com>
To: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 00:45:50 -0400
Subject: [ZFS] Using SSD with partitions

Hi list,

I've got an array for home use where my boot drive (UFS) finally died.
I've decided to upgrade to an SSD as a replacement, and am looking to
perhaps simultaneously improve the performance of my ZFS array.
Naturally, a FreeBSD install doesn't use much space, so partitioning
the drive to get maximum usage seems wise. I was thinking, for a
hypothetical 40GB drive:

20GB -- FreeBSD / partition
 2GB -- ZFS ZIL
18GB -- ZFS cache

What I'm wondering is whether this is a bad idea. I know that SSDs are
not designed to be written to *a lot*, which is exactly what a ZIL will
experience. I'm hoping for experiences from people in similar
scenarios. As I'm not an enterprise IT person who can simply choose to
throw more mon-- I mean SSDs -- at the problem, I need to be efficient.
:) [I'm thinking the cache partition might be pointless, as I don't
think I'd benefit that much from it.]

Disclaimer: I've looked at a lot of guides, including the standard
best practices guide, and none of them seemed helpful for my particular
problem, especially given that I'm using FreeBSD.

Thanks for any advice,

-- 
- Patrick Donnelly
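[For concreteness, such a layout could be cut with gpart roughly as
follows. This is only a sketch: it assumes GPT, that the SSD attaches
as ada0 (it may be da0 or similar depending on the controller), and the
gpt/* labels are illustrative, not anything specified in the thread.]

  gpart create -s gpt ada0
  gpart add -t freebsd-boot -s 64k ada0
  gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 ada0  # UFS-root boot code
  gpart add -t freebsd-ufs -s 20G -l os ada0               # FreeBSD / partition
  gpart add -t freebsd-zfs -s 2G -l slog ada0              # ZFS ZIL (separate log)
  gpart add -t freebsd-zfs -l cache ada0                   # rest of the disk for L2ARC

The partitions then show up under /dev/gpt/ by label, which keeps the
pool configuration readable across device renumbering.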
From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 05:25:27 2011
From: Freddie Cash <fjwcash@gmail.com>
To: Patrick Donnelly
Cc: freebsd-fs@freebsd.org
Date: Sat, 15 Oct 2011 22:25:25 -0700
Subject: Re: [ZFS] Using SSD with partitions

On Sat, Oct 15, 2011 at 9:45 PM, Patrick Donnelly wrote:
> I was thinking, for a hypothetical 40GB drive:
>
> 20GB -- FreeBSD / partition
>  2GB -- ZFS ZIL
> 18GB -- ZFS cache
>
> What I'm wondering is whether this is a bad idea. I know that SSDs are
> not designed to be written to *a lot*, which is exactly what a ZIL will
> experience.
> [...]

For home use, there's nothing wrong with doing this. Unless it's an NFS
server used by multiple clients, you won't be pounding the ZIL; you may
not even need a separate log device.

Create the pool, create a test filesystem, and do some benchmarks to
get a baseline (preferably with the "normal" workload you'd be
running). Then destroy/create the filesystem again, "zfs set
sync=disabled" on the filesystem, and benchmark it again. If you get a
huge performance gain, set sync back to standard, add the separate log,
and test once more.
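[In command form, that A/B test might look like the following sketch.
The pool name "tank", the dataset "tank/test", the placeholder
"your_benchmark", and the gpt/slog partition from the sketch above are
all assumptions; on a v28 pool the sync property takes the values
standard, always, or disabled.]

  zfs create tank/test
  time your_benchmark /tank/test      # baseline, sync=standard

  zfs destroy tank/test
  zfs create tank/test
  zfs set sync=disabled tank/test
  time your_benchmark /tank/test      # upper bound with no sync waits

  # if the gap is large, a separate log device is worth trying:
  zfs set sync=standard tank/test
  zpool add tank log gpt/slog
  time your_benchmark /tank/test

  # the cache device is added the same way:
  zpool add tank cache gpt/cache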
Using the SSD for the OS and the cache will be fine. L2ARC is throttled
to about 7 MB/s of writes and is otherwise read-heavy, so it is very
easy on the drive. Whether you benefit from the L2ARC depends on
whether you will be using dedupe, and whether your files are accessed
multiple times within a short period of time.

If you are really worried about the longevity of the SSD,
under-provision it: only partition/format 36 GB of it, leaving the
extra 4 GB to be used internally for extra wear-leveling.

-- 
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 12:14:30 2011
From: Andriy Gapon <avg@FreeBSD.org>
To: Florian Wagner
Cc: freebsd-fs@FreeBSD.org
Date: Sun, 16 Oct 2011 15:14:23 +0300
Subject: Re: Extending zfsboot.c to allow selecting filesystem from boot.config

on 15/10/2011 22:43 Florian Wagner said the following:
> Hi,
>
> from looking at the code in sys/boot/i386/zfsboot/zfsboot.c, the
> ZFS-aware boot block already allows selecting the pool to load the
> kernel from by adding "<pool>:" to boot.config. As this code calls the
> zfs_mount_pool function, it will look for the bootfs property on the
> new pool, or use its root dataset, to get the file from there.
>
> How much work would it be to extend the loader to also allow selecting
> a ZFS filesystem?
>
> What I'd like to do is place a boot.config on the (otherwise empty)
> root of my system pool and then tell it to get the loader from another
> filesystem by putting "rpool/root/stable-8-r226381:/boot/zfsloader" in
> there.
Please check out the following changes:

https://gitorious.org/~avg/freebsd/avgbsd/commit/8c3808c4bb2a2cd746db3e9c46871c9bdf943ef6
https://gitorious.org/~avg/freebsd/avgbsd/commit/0b4279c0d366d9f2b5bb9d4c0dd3229d8936d92b
https://gitorious.org/~avg/freebsd/avgbsd/commit/b29ab78b079f27918de1683e88bcb1817a0e5969
https://gitorious.org/~avg/freebsd/avgbsd/commit/f49add15516dfd582258b6820b8f0254cf9419a3
https://gitorious.org/~avg/freebsd/avgbsd/commit/e072b443b0f59fe1ff54a70d2437d63698bbf597
https://gitorious.org/~avg/freebsd/avgbsd/commit/f701760c10812c5b6925352fb003408c19170063

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 14:44:33 2011
From: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
To: Patrick Donnelly
Cc: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 17:16:05 +0300
Subject: Re: [ZFS] Using SSD with partitions

On 16/10/2011 07:45, Patrick Donnelly wrote:
> I was thinking, for a hypothetical 40GB drive:
>
> 20GB -- FreeBSD / partition
>  2GB -- ZFS ZIL
> 18GB -- ZFS cache
>
> What I'm wondering is whether this is a bad idea.
> [...]
There are other, much more knowledgeable people around who might give
you better advice, but let me just make a few points:

1. If you can afford more RAM, it's (much) better for ZFS than L2ARC.

2. It's not just the ZIL devices that get heavily written; L2ARC ones
also get their hefty share of writes. And even when the cache becomes
"hot" enough, keep in mind that...

3. You lose all L2ARC contents once the system is rebooted. It's kind
of counter-intuitive, but that's how it is (and for a reason).

4. L2ARC and ZIL have almost opposite performance requirements, so
putting them on the same device is likely never going to be optimal
(unless you spend a fortune on that SSD).

5. Check the output of "zpool upgrade". If your zpool version is
anything below 19 (likely 14 or 15), I'd strongly recommend that you
avoid setting up a separate ZIL. Pools before v19 fail critically when
the ZIL is removed or corrupted, which means you lose them for good.
You might mitigate the risk with a mirrored ZIL, but it's still likely
not worth it in your case.

6. If, OTOH, you're running a reasonably recent -STABLE (8 or 9), then
your zpool version is likely 28 (thanks, pjd@), which means ZIL is not
that scary, but you might still lose some data. Even an unexpected
power failure might cause trouble, unless the SSD is designed to handle
it gracefully (this typically involves some sort of capacitor).

The topic is quite popular, and I'd suggest you do some searching and
reading around ("ZFS SSD" on Google brings up a lot of interesting and
helpful things, especially on the OpenSolaris and FreeBSD forums). If
you don't feel that geeky, a good starting point might be this one:

http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-flash-memory-ssds-and-zfs

It really depends on your needs, your current (and potential future)
system configuration, and the time and effort you're ready to spend.
Again, I'm no expert in these things, so take all my comments with a
grain of salt. Good luck!

Cheers,
Luchesar

-- 
i.dea.is/luchesar
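[The version check in point 5 is quick. A sketch, with "tank" as a
placeholder pool name:]

  zpool get version tank   # prints the pool's on-disk format version
  zpool upgrade            # lists pools not yet at the newest supported version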
From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 16:17:51 2011
From: Daniel Kalchev <daniel@digsys.bg>
To: "Luchesar V. ILIEV"
Cc: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 19:17:39 +0300
Subject: Re: [ZFS] Using SSD with partitions

On Oct 16, 2011, at 17:16, Luchesar V. ILIEV wrote:

> 6. If, OTOH, you're running a reasonably recent -STABLE (8 or 9), then
> your zpool version is likely 28 (thanks, pjd@), which means ZIL is not
> that scary, but you might still lose some data. Even an unexpected
> power failure might cause trouble, unless the SSD is designed to
> handle it gracefully (this typically involves some sort of capacitor).

Just for the record: even without a ZIL, you will most definitely lose
data in a power outage. In most cases this will not damage the ZFS
filesystem, but data will be lost. There is nothing that can prevent
this. Therefore, with ZFS v28, adding a ZIL does not introduce any more
risk to your data.

One thing to keep in mind is that a ZIL will help only under certain
workloads; sequential write is not one of them. It helps most with
database-type loads and sync writes, such as an NFS server that is
written to heavily. Freddie gave good advice on determining whether it
will help.

L2ARC, on the other hand, may help enormously, especially if the pool
is big. Workstation-class motherboards until recently topped out at 8GB
RAM, and ZFS is happy with as much RAM as you can offer; adding L2ARC
may provide more headroom. The benefits of course depend on the
workload. Neither L2ARC nor ZIL provides magical benefits.
Daniel

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 18:02:11 2011
From: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
To: Daniel Kalchev, Patrick Donnelly
Cc: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 21:02:06 +0300
Subject: Re: [ZFS] Using SSD with partitions

On 16/10/2011 19:17, Daniel Kalchev wrote:
> Just for the record: even without a ZIL, you will most definitely lose
> data in a power outage. In most cases this will not damage the ZFS
> filesystem, but data will be lost. There is nothing that can prevent
> this.
>
> Therefore, with ZFS v28, adding a ZIL does not introduce any more risk
> to your data.

I might be wrong in my interpretation, but from what I remember, when
the power goes down, an unprotected SSD is likely to lose _more_ data
than simply its write buffers -- that's quite unlike a hard drive. So
much, in fact, that the whole ZIL might become corrupted (and that's
potentially way more data than any device cache).

_If_ that's true, then isn't an array of only "conventional" HDDs,
where the ZIL is interleaved with the zpool itself, at least a bit
safer from power failures?
Again, that is if we are taking the cheaper SSDs into account.

> One thing to keep in mind is that a ZIL will help only under certain
> workloads; sequential write is not one of them. It helps most with
> database-type loads and sync writes, such as an NFS server that is
> written to heavily. Freddie gave good advice on determining whether it
> will help.
>
> L2ARC, on the other hand, may help enormously, especially if the pool
> is big. [...] Neither L2ARC nor ZIL provides magical benefits.

Which is yet another reason to go for more RAM, as it tends to be quite
magic-yielding. Just kidding here, but, seriously, if Patrick has room
for a RAM upgrade, I think he should consider it, at least for
performance (a boot and OS drive, obviously, is another matter).

Cheers,
Luchesar

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 18:10:05 2011
From: Daniel Kalchev <daniel@digsys.bg>
To: "Luchesar V. ILIEV"
Cc: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 21:09:53 +0300
Subject: Re: [ZFS] Using SSD with partitions

On Oct 16, 2011, at 21:02, Luchesar V. ILIEV wrote:

> I might be wrong in my interpretation, but from what I remember, when
> the power goes down, an unprotected SSD is likely to lose _more_ data
> than simply its write buffers -- that's quite unlike a hard drive. So
> much, in fact, that the whole ZIL might become corrupted (and that's
> potentially way more data than any device cache).

The real risk with low-grade "unprotected" SSDs is that the SSD may
well become damaged, sometimes beyond repair.

Otherwise, it is the same risk with SSDs as with magnetic drives: if
the drive lies to the OS that it has safely written data, then data
will be lost. Thing is, we know what a cheap HDD is. Most SSDs,
however, lie, because otherwise they would offer very poor write
performance.

ZIL is not about RAM. ZIL is for low-latency synchronous writing.
It does not matter how much RAM you have -- it will not help if you
have heavy synchronous writing (of small records).

Anyway, as was mentioned: with moderate activity on the pool, it is not
a problem to use the same SSD for boot/ZIL/L2ARC.

Daniel

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 18:30:13 2011
From: Jeremy Chadwick <jdc@parodius.com>
To: "Luchesar V. ILIEV"
Cc: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 11:30:03 -0700
Subject: Re: [ZFS] Using SSD with partitions

On Sun, Oct 16, 2011 at 09:02:06PM +0300, Luchesar V. ILIEV wrote:
> I might be wrong in my interpretation, but from what I remember, when
> the power goes down, an unprotected SSD is likely to lose _more_ data
> than simply its write buffers -- that's quite unlike a hard drive. So
> much, in fact, that the whole ZIL might become corrupted (and that's
> potentially way more data than any device cache).
>
> _If_ that's true, then isn't an array of only "conventional" HDDs,
> where the ZIL is interleaved with the zpool itself, at least a bit
> safer from power failures? Again, that is if we are taking the cheaper
> SSDs into account.

Please expand on the above, providing reference materials or links to
things you've read that help shed light on all of this.
More specifically:

1) I would like a definition of what an "unprotected SSD" is and what a
"protected SSD" is.

2) I would like an explanation of what "SSDs are more likely than an
MHDD to lose data on a power outage" means exactly (on a technical
level, not something vague) and where you got this interpretation.

Thanks!

-- 
Jeremy Chadwick                                 jdc at parodius.com
Parodius Networking                        http://www.parodius.com/
UNIX Systems Administrator                   Mountain View, CA, US
Making life hard for others since 1977.              PGP 4BD6C0CB

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 18:40:13 2011
From: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
To: Daniel Kalchev
Cc: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 21:40:09 +0300
Subject: Re: [ZFS] Using SSD with partitions

On 16/10/2011 21:09, Daniel Kalchev wrote:
> The real risk with low-grade "unprotected" SSDs is that the SSD may
> well become damaged, sometimes beyond repair.
> Otherwise, it is the same risk with SSDs as with magnetic drives: if
> the drive lies to the OS that it has safely written data, then data
> will be lost. Thing is, we know what a cheap HDD is. Most SSDs,
> however, lie, because otherwise they would offer very poor write
> performance.

That's true, but my understanding is that the differences go further
beyond that. To quote one paper: "Our data show that flash memory's
behavior under power failure is surprising in several ways. First,
operations that come closer to completion do not necessarily exhibit
fewer bit errors. Second, power failure not only results in failure of
the operation in progress, it can also corrupt data already present in
the flash device. Third, power failure can negatively impact the
integrity of future data written to the device."

http://cseweb.ucsd.edu/users/swanson/papers/DAC2011PowerCut.pdf

However, that's probably getting too academic (also beyond my own
qualifications), and I wouldn't like to hijack the thread.

> ZIL is not about RAM. ZIL is for low-latency synchronous writing. It
> does not matter how much RAM you have -- it will not help if you have
> heavy synchronous writing (of small records).

From what I understand, Patrick is talking about a home system, which
is not very likely to be heavy on synchronous writes unless, of course,
he's using NFS or a database. On the other hand, most desktop
applications would happily use the additional memory, so it benefits
not just the storage subsystem. That's why I'm making the point about
the RAM upgrade; but, apart from that, you're absolutely correct about
ZIL and synchronous writes.

Cheers,
Luchesar

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 18:52:12 2011
From: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 21:52:08 +0300
Subject: Re: [ZFS] Using SSD with partitions

On 16/10/2011 21:30, Jeremy Chadwick wrote:
> Please expand on the above, providing reference materials or links to
> things you've read that help shed light on all of this.

I haven't really dug that much into it. Apart from general comments
(mostly on the OpenSolaris forums), the most technical (and academic)
source of information is the paper that I already quoted:

Hung-Wei Tseng, Laura M. Grupp, Steven Swanson, "Understanding the
Impact of Power Loss on Flash Memory", DCSE-UCSD.
http://cseweb.ucsd.edu/users/swanson/papers/DAC2011PowerCut.pdf

> 1) I would like a definition of what an "unprotected SSD" is and what
> a "protected SSD" is.

Let me better quote again: "Many high-end SSDs have backup batteries
or capacitors to ensure that operations complete even if power fails.
Our results argue that these systems should provide power until the
chip signals that the operation is finished rather than until the data
appears to be correct. Low-end SSDs and embedded systems, however,
often do not contain backup power sources due to cost or space
constraints, and these systems must be extremely careful to prevent
data loss and/or reduced reliability after a power failure."
> 2) I would like an explanation of what "SSDs are more likely than an
> MHDD to lose data on a power outage" means exactly (on a technical
> level, not something vague) and where you got this interpretation.

Again, to quote: "The flash memory devices we studied in this work
demonstrated unexpected behavior when power failure occurs. The error
rates do not always decrease as the operation proceeds, and power
failure can corrupt the data from operations that completed
successfully. We also found that relying on blocks that have been
programmed or erased during a power failure is unreliable, even if the
data appears to be intact."

I'd actually be interested to hear what the more experienced folks here
think about this; however, again, it's probably not right to hijack the
current thread.

Cheers,
Luchesar

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 19:10:41 2011
From: Florian Wagner <florian@wagner-flo.net>
To: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 21:10:38 +0200
Subject: Re: [ZFS] Using SSD with partitions

> > 1) I would like a definition of what an "unprotected SSD" is and
> > what a "protected SSD" is.
>
> Let me better quote again: "Many high-end SSDs have backup batteries
> or capacitors to ensure that operations complete even if power fails.
> [...]"

I can provide a practical data point on this. I've tested power failure
with a Corsair Force SSD, using one of the tools available for that.
The process goes like this:

1. Start some kind of server application which waits for messages.
2. Start a client application which in a loop does:
   a. Write a block of data to disk.
   b. Call fsync/fdatasync to make sure the written data is on disk.
      This system call should block the application until all layers
      (including the disk driver, and thus the disk) signal that the
      write has completed.
   c. Send a message to the server, which then displays the block
      number written.
3. Cut power to the SSD.

A correctly behaving drive should have at least as many data blocks on
disk as are displayed by the server application -- sometimes even more.

For the tested SSD, data blocks amounting to about 1 to 1.2 MB of data
were consistently missing, even though they were signaled to be on
disk. Care was taken to ensure that all involved OS subsystems were
capable of handling the fsync/fdatasync calls correctly and passing
them to lower layers (which is not the case for all filesystems in
older versions of Linux, for example).

I've just recently repeated the test with an Intel 320 drive (the
120 GB version, but it should be the same for all models), which
includes a set of capacitors. It exhibits correct behavior: no missing
data in about a dozen trials.

Regards
Florian Wagner
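[The procedure matches brad's diskchecker.pl, which Charles names later
in the thread. From memory, a run looks roughly like this -- a sketch
only: the host name, file name, and size are placeholders, and the
exact flags may differ between versions of the script.]

  # on a second machine that keeps running:
  ./diskchecker.pl -l

  # on the machine with the SSD under test; cut power mid-run:
  ./diskchecker.pl -s server.example.com create testfile 500

  # after reboot, verify which acknowledged writes actually survived:
  ./diskchecker.pl -s server.example.com verify testfile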
From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 20:03:12 2011
From: Julian Elischer <julian@freebsd.org>
To: "Luchesar V. ILIEV"
Cc: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 13:03:02 -0700
Subject: Re: [ZFS] Using SSD with partitions

On 10/16/11 11:40 AM, Luchesar V. ILIEV wrote:
> That's true, but my understanding is that the differences go further
> beyond that. To quote one paper: "[...] power failure not only results
> in failure of the operation in progress, it can also corrupt data
> already present in the flash device. Third, power failure can
> negatively impact the integrity of future data written to the device."
>
> http://cseweb.ucsd.edu/users/swanson/papers/DAC2011PowerCut.pdf

However, one must not confuse flash memory with drives using flash
memory. Part of the added value brought to the table by SSD
manufacturers is (or SHOULD BE) the addition of mechanisms to cope with
unexpected power-down, whether that be hold-up capacitors,
battery-backed RAM, or any other mechanism they can think of. Certainly
Fusion-io flash cards will never lose data that they have reported as
having been written.

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 21:13:27 2011
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
To: Jeremy Chadwick
Cc: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 16:13:26 -0500 (CDT)
Subject: Re: [ZFS] Using SSD with partitions

On Sun, 16 Oct 2011, Jeremy Chadwick wrote:
> 2) I would like an explanation of what "SSDs are more likely than an
> MHDD to lose data on a power outage" means exactly (on a technical
> level, not something vague) and where you got this interpretation.

The reason is that normal operation of an SSD will move and/or rewrite
existing data, which is also likely to be much older than the data
currently being written. Common reasons are wear leveling, garbage
collection (compacting), and the fact that the block written is not
identically sized and aligned with the SSD's native underlying blocks.
While data is being rewritten, moved, or copied, a copy resides in RAM.
An SSD that is more defensive about avoiding corruption of old data is
also likely to be slower at synchronous writes.
There are certainly algorithms (e.g., as used by ZFS) which can help an
SSD avoid issues.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 21:49:42 2011
From: Charles Sprickman <spork@bway.net>
To: Florian Wagner
Cc: freebsd-fs@freebsd.org
Date: Sun, 16 Oct 2011 17:22:59 -0400
Subject: Re: [ZFS] Using SSD with partitions

On Oct 16, 2011, at 3:10 PM, Florian Wagner wrote:
> I can provide a practical data point on this. I've tested power
> failure with a Corsair Force SSD, using one of the tools available
> for that. [...]
>
> A correctly behaving drive should have at least as many data blocks
> on disk as are displayed by the server application -- sometimes even
> more.
> For the tested SSD, data blocks amounting to about 1 to 1.2 MB of
> data were consistently missing, even though they were signaled to be
> on disk.
>
> I've just recently repeated the test with an Intel 320 drive (the
> 120 GB version, but it should be the same for all models), which
> includes a set of capacitors. It exhibits correct behavior: no
> missing data in about a dozen trials.

This sounds like diskchecker:

http://brad.livejournal.com/2116715.html

There are finally some affordable drives (e.g., $100 for 40GB, fine for
ZIL use) that incorporate a capacitor allowing the drive to flush its
cache to flash. My understanding is that the Intel 320 series has this,
and supposedly some of the OCZ drives (Vertex 2 Pro, Vertex 3 Pro) do
as well. Intel is the only one I'm finding that has an explicit
declaration that data is safely flushed, though:

http://newsroom.intel.com/servlet/JiveServlet/download/38-4324/Intel_SSD_320_Series_Enhance_Power_Loss_Technology_Brief.pdf

The PostgreSQL lists have some interesting info, as they are pretty
conservative about the definition of "safe" writes. Also, one of the
devs posted this earlier this year:

http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html

For ZFS ZIL use, it also seems like mirroring is generally recommended.
Since the ZIL doesn't benefit from large drives, the lowest-priced
Intel 320 x2 is not a bank-breaker.

Charles
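[Attaching a mirrored log is a one-liner at pool level. A sketch: the
pool name "tank" and the gpt/slog0 and gpt/slog1 labels are
placeholders for log partitions on two separate SSDs.]

  zpool add tank log mirror gpt/slog0 gpt/slog1
  zpool status tank      # the log mirror shows up as its own vdev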
From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 22:48:44 2011
From: eadler@FreeBSD.org
To: freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
Date: Sun, 16 Oct 2011 22:48:43 GMT
Subject: Re: kern/161674: [ufs] snapshot on journaled ufs doesn't work

Old Synopsis: snapshot on journaled ufs doesn't work
New Synopsis: [ufs] snapshot on journaled ufs doesn't work
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: eadler
Responsible-Changed-When: Sun Oct 16 22:48:01 UTC 2011
Responsible-Changed-Why: over to maintainer

http://www.freebsd.org/cgi/query-pr.cgi?pr=161674

From owner-freebsd-fs@FreeBSD.ORG Sun Oct 16 23:13:53 2011
From: eadler@FreeBSD.org
To: freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
Date: Sun, 16 Oct 2011 23:13:53 GMT
Subject: Re: kern/161205: [nfs] [pfsync] [regression] [build] Bug report freebsd 9.0 beta 3

Old Synopsis: [build] Bug report freebsd 9.0 beta 3 [nfs] [pfsync] [regression]
New Synopsis: [nfs] [pfsync] [regression] [build] Bug report freebsd 9.0 beta 3
Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: eadler
Responsible-Changed-When: Sun Oct 16 23:12:53 UTC 2011
Responsible-Changed-Why: over to maintainer

http://www.freebsd.org/cgi/query-pr.cgi?pr=161205

From owner-freebsd-fs@FreeBSD.ORG Mon Oct 17 08:24:59 2011
From: Maurizio Vairani <maurizio.vairani@cloverinformatica.it>
To: Gleb Kurtsou, freebsd-fs@freebsd.org
Date: Mon, 17 Oct 2011 10:24:51 +0200
Subject: [TMPFS] patch for FreeBSD 8.2-RELEASE

Hi list,

Gleb Kurtsou, in this thread
http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012650.html
proposes a patch for solving the well-known tmpfs
problem: the free space drops down to zero when ZFS consumes the kernel memory and there isn't enough free swap space. Unfortunately the patch is not directly applicable to FreeBSD 8.2-RELEASE so I have modified the source code using the Gleb's patch as reference, recompiled and installed the new driver. I am testing it for a week on my AMD64 16G RAM server reducing the swap space from 28G to 8G, 4G or none and seems the the problem is solved. Regards -Maurizio /sys/fs/tmpfs/tmpfs.h =================================================================== --- tmpfs.h.orig 2010-12-21 18:09:00.000000000 +0100 (v 1.17.2.2.2.1) +++ tmpfs.h 2011-10-13 15:16:26.900043000 +0200 (working copy) @@ -304,10 +304,30 @@ #define TMPFS_NODE_LOCK(node) mtx_lock(&(node)->tn_interlock) #define TMPFS_NODE_UNLOCK(node) mtx_unlock(&(node)->tn_interlock) -#define TMPFS_NODE_MTX(node) (&(node)->tn_interlock) +#define TMPFS_NODE_MTX(node) (&(node)->tn_interlock) + +#ifdef INVARIANTS +#define TMPFS_ASSERT_LOCKED(node) do { \ + MPASS(node != NULL); \ + MPASS(node->tn_vnode != NULL); \ + if (!VOP_ISLOCKED(node->tn_vnode) && \ + !mtx_owned(TMPFS_NODE_MTX(node))) \ + panic("tmpfs: node is not locked: %p", node); \ + } while (0) +#define TMPFS_ASSERT_ELOCKED(node) do { \ + MPASS((node) != NULL); \ + MPASS((node)->tn_vnode != NULL); \ + mtx_assert(TMPFS_NODE_MTX(node), MA_OWNED); \ + ASSERT_VOP_LOCKED((node)->tn_vnode, "tmpfs"); \ + } while (0) +#else +#define TMPFS_ASSERT_LOCKED(node) (void)0 +#define TMPFS_ASSERT_ELOCKED(node) (void)0 +#endif #define TMPFS_VNODE_ALLOCATING 1 #define TMPFS_VNODE_WANT 2 +#define TMPFS_VNODE_DOOMED 4 /* --------------------------------------------------------------------- */ /* @@ -467,65 +487,30 @@ * Memory management stuff. */ -/* Amount of memory pages to reserve for the system (e.g., to not use by - * tmpfs). - * XXX: Should this be tunable through sysctl, for instance? */ -#define TMPFS_PAGES_RESERVED (4 * 1024 * 1024 / PAGE_SIZE) - /* - * Returns information about the number of available memory pages, - * including physical and virtual ones. - * - * If 'total' is TRUE, the value returned is the total amount of memory - * pages configured for the system (either in use or free). - * If it is FALSE, the value returned is the amount of free memory pages. - * - * Remember to remove TMPFS_PAGES_RESERVED from the returned value to avoid - * excessive memory usage. - * + * Number of reserved swap pages should not be lower than + * swap_pager_almost_full high water mark. */ +#define TMPFS_SWAP_MINRESERVED 1024 + static __inline size_t -tmpfs_mem_info(void) +tmpfs_pages_max(struct tmpfs_mount *tmp) { - size_t size; - - size = swap_pager_avail + cnt.v_free_count + cnt.v_inactive_count; - size -= size > cnt.v_wire_count ? cnt.v_wire_count : size; - return size; + return (tmp->tm_pages_max); } -/* Returns the maximum size allowed for a tmpfs file system. This macro - * must be used instead of directly retrieving the value from tm_pages_max. - * The reason is that the size of a tmpfs file system is dynamic: it lets - * the user store files as long as there is enough free memory (including - * physical memory and swap space). Therefore, the amount of memory to be - * used is either the limit imposed by the user during mount time or the - * amount of available memory, whichever is lower. To avoid consuming all - * the memory for a given mount point, the system will always reserve a - * minimum of TMPFS_PAGES_RESERVED pages, which is also taken into account - * by this macro (see above). 
*/ static __inline size_t -TMPFS_PAGES_MAX(struct tmpfs_mount *tmp) +tmpfs_pages_used(struct tmpfs_mount *tmp) { - size_t freepages; - - freepages = tmpfs_mem_info(); - freepages -= freepages < TMPFS_PAGES_RESERVED ? - freepages : TMPFS_PAGES_RESERVED; - - return MIN(tmp->tm_pages_max, freepages + tmp->tm_pages_used); + const size_t node_size = sizeof(struct tmpfs_node) + + sizeof(struct tmpfs_dirent); + size_t meta_pages; + + meta_pages = howmany((uintmax_t)tmp->tm_nodes_inuse * node_size, + PAGE_SIZE); + return (meta_pages + tmp->tm_pages_used); } -/* Returns the available space for the given file system. */ -#define TMPFS_META_PAGES(tmp) (howmany((tmp)->tm_nodes_inuse * (sizeof(struct tmpfs_node) \ - + sizeof(struct tmpfs_dirent)), PAGE_SIZE)) -#define TMPFS_FILE_PAGES(tmp) ((tmp)->tm_pages_used) - -#define TMPFS_PAGES_AVAIL(tmp) (TMPFS_PAGES_MAX(tmp) > \ - TMPFS_META_PAGES(tmp)+TMPFS_FILE_PAGES(tmp)? \ - TMPFS_PAGES_MAX(tmp) - TMPFS_META_PAGES(tmp) \ - - TMPFS_FILE_PAGES(tmp):0) - #endif /* --------------------------------------------------------------------- */ /sys/fs/tmpfs/tmpfs_subr.c =================================================================== --- tmpfs_subr.c.orig 2010-12-21 18:09:00.000000000 +0100 (v 1.23.2.2.2.1) +++ tmpfs_subr.c 2011-10-06 14:31:26.007163000 +0200 (working copy) @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -55,6 +56,60 @@ #include #include +static long tmpfs_swap_reserved = TMPFS_SWAP_MINRESERVED * 2; + +SYSCTL_NODE(_vfs, OID_AUTO, tmpfs, CTLFLAG_RW, 0, "tmpfs memory file system"); + +static int +sysctl_swap_reserved(SYSCTL_HANDLER_ARGS) +{ + int error; + long pages, bytes; + + pages = *(long *)arg1; + bytes = pages * PAGE_SIZE; + + error = sysctl_handle_long(oidp, &bytes, 0, req); + if (error || !req->newptr) + return (error); + + pages = bytes / PAGE_SIZE; + if (pages < TMPFS_SWAP_MINRESERVED) + return (EINVAL); + + *(long *)arg1 = pages; + return (0); +} + +SYSCTL_PROC(_vfs_tmpfs, OID_AUTO, swap_reserved, CTLTYPE_LONG|CTLFLAG_RW, + &tmpfs_swap_reserved, 0, sysctl_swap_reserved, "L", "reserved swap space"); + +static __inline size_t +tmpfs_pages_avail(struct tmpfs_mount *tmp, size_t req_pages) +{ + vm_ooffset_t avail; + + if (tmpfs_pages_max(tmp) < tmpfs_pages_used(tmp) + req_pages) + return (0); + + if (!vm_page_count_target()) + return (1); + + /* + * Fail if pagedaemon wasn't able to free desired number of pages and + * we are running out of swap. 
+ */ + avail = swap_pager_avail - vm_paging_target() - req_pages; + if (avail < tmpfs_swap_reserved) { /* avail is signed */ + printf("tmpfs: low memory: available %jd, " + "paging target %d, requested %zd\n", + (intmax_t)swap_pager_avail, vm_paging_target(), req_pages); + return (0); + } + + return (1); +} + /* --------------------------------------------------------------------- */ /* @@ -95,6 +150,8 @@ if (tmp->tm_nodes_inuse > tmp->tm_nodes_max) return (ENOSPC); + if (tmpfs_pages_avail(tmp, 1) == 0) + return (ENOSPC); nnode = (struct tmpfs_node *)uma_zalloc_arg( tmp->tm_node_pool, tmp, M_WAITOK); @@ -882,7 +939,7 @@ newpages = round_page(newsize) / PAGE_SIZE; if (newpages > oldpages && - newpages - oldpages > TMPFS_PAGES_AVAIL(tmp)) { + tmpfs_pages_avail(tmp, newpages - oldpages) == 0) { error = ENOSPC; goto out; } /sys/fs/tmpfs/tmpfs_vfsops.c =================================================================== --- tmpfs_vfsops.c.orig 2010-12-21 18:09:00.000000000 +0100 (v 1.21.2.1.6.1) +++ tmpfs_vfsops.c 2011-10-07 14:10:15.137747000 +0200 (working copy) @@ -85,53 +85,6 @@ #define SWI_MAXMIB 3 -static u_int -get_swpgtotal(void) -{ - struct xswdev xsd; - char *sname = "vm.swap_info"; - int soid[SWI_MAXMIB], oid[2]; - u_int unswdev, total, dmmax, nswapdev; - size_t mibi, len; - - total = 0; - - len = sizeof(dmmax); - if (kernel_sysctlbyname(curthread, "vm.dmmax", &dmmax, &len, - NULL, 0, NULL, 0) != 0) - return total; - - len = sizeof(nswapdev); - if (kernel_sysctlbyname(curthread, "vm.nswapdev", - &nswapdev, &len, - NULL, 0, NULL, 0) != 0) - return total; - - mibi = (SWI_MAXMIB - 1) * sizeof(int); - oid[0] = 0; - oid[1] = 3; - - if (kernel_sysctl(curthread, oid, 2, - soid, &mibi, (void *)sname, strlen(sname), - NULL, 0) != 0) - return total; - - mibi = (SWI_MAXMIB - 1); - for (unswdev = 0; unswdev < nswapdev; ++unswdev) { - soid[mibi] = unswdev; - len = sizeof(struct xswdev); - if (kernel_sysctl(curthread, - soid, mibi + 1, &xsd, &len, NULL, 0, - NULL, 0) != 0) - return total; - if (len == sizeof(struct xswdev)) - total += (xsd.xsw_nblks - dmmax); - } - - /* Not Reached */ - return total; -} - /* --------------------------------------------------------------------- */ static int tmpfs_node_ctor(void *mem, int size, void *arg, int flags) @@ -179,14 +132,13 @@ static int tmpfs_mount(struct mount *mp) { + const size_t nodes_per_page = howmany(PAGE_SIZE, + sizeof(struct tmpfs_dirent) + sizeof(struct tmpfs_node)); struct tmpfs_mount *tmp; struct tmpfs_node *root; - size_t pages, mem_size; - ino_t nodes; + u_quad_t pages; + u_quad_t nodes_max, size_max, maxfilesize; int error; - /* Size counters. */ - ino_t nodes_max; - size_t size_max; /* Root node attributes. */ uid_t root_uid; @@ -223,42 +175,55 @@ if (mp->mnt_cred->cr_ruid != 0 || vfs_scanopt(mp->mnt_optnew, "mode", "%ho", &root_mode) != 1) root_mode = va.va_mode; - if (vfs_scanopt(mp->mnt_optnew, "inodes", "%d", &nodes_max) != 1) + if (vfs_scanopt(mp->mnt_optnew, "inodes", "%qu", &nodes_max) != 1) nodes_max = 0; if (vfs_scanopt(mp->mnt_optnew, "size", "%qu", &size_max) != 1) size_max = 0; - - /* Do not allow mounts if we do not have enough memory to preserve - * the minimum reserved pages. */ - mem_size = cnt.v_free_count + cnt.v_inactive_count + get_swpgtotal(); - mem_size -= mem_size > cnt.v_wire_count ? 
cnt.v_wire_count : mem_size; - if (mem_size < TMPFS_PAGES_RESERVED) + if (vfs_scanopt(mp->mnt_optnew, "maxfilesize", "%qu", &maxfilesize) != 0) + maxfilesize = 0; + /* + * XXX Deny mounts if pagedaemon wasn't able to recovery desired + * number of pages. + */ + if (vm_page_count_target()) return ENOSPC; /* Get the maximum number of memory pages this file system is * allowed to use, based on the maximum size the user passed in - * the mount structure. A value of zero is treated as if the - * maximum available space was requested. */ - if (size_max < PAGE_SIZE || size_max >= SIZE_MAX) - pages = SIZE_MAX; + * the mount structure. Use half of RAM by default. */ + if (size_max < PAGE_SIZE*4 || size_max > SIZE_MAX - PAGE_SIZE) + pages = cnt.v_page_count / 2; else pages = howmany(size_max, PAGE_SIZE); MPASS(pages > 0); + MPASS(pages < SIZE_MAX); - if (nodes_max <= 3) - nodes = 3 + pages * PAGE_SIZE / 1024; + if (pages < SIZE_MAX / PAGE_SIZE) + size_max = pages * PAGE_SIZE; else - nodes = nodes_max; - MPASS(nodes >= 3); + size_max = SIZE_MAX; + + if (nodes_max <= 3) { + if (pages < UINT32_MAX / nodes_per_page) + nodes_max = pages * nodes_per_page; + else + nodes_max = UINT32_MAX; + } + if (nodes_max > UINT32_MAX) + nodes_max = UINT32_MAX; + MPASS(nodes_max >= 3); + + if (maxfilesize < PAGE_SIZE || maxfilesize > size_max) + maxfilesize = size_max; /* Allocate the tmpfs mount structure and fill it. */ tmp = (struct tmpfs_mount *)malloc(sizeof(struct tmpfs_mount), M_TMPFSMNT, M_WAITOK | M_ZERO); mtx_init(&tmp->allnode_lock, "tmpfs allnode lock", NULL, MTX_DEF); - tmp->tm_nodes_max = nodes; + tmp->tm_nodes_max = nodes_max; tmp->tm_nodes_inuse = 0; - tmp->tm_maxfilesize = (u_int64_t)(cnt.v_page_count + get_swpgtotal()) * PAGE_SIZE; + tmp->tm_maxfilesize = maxfilesize; LIST_INIT(&tmp->tm_nodes_used); tmp->tm_pages_max = pages; @@ -427,22 +392,23 @@ static int tmpfs_statfs(struct mount *mp, struct statfs *sbp) { - fsfilcnt_t freenodes; struct tmpfs_mount *tmp; + size_t used; tmp = VFS_TO_TMPFS(mp); sbp->f_iosize = PAGE_SIZE; sbp->f_bsize = PAGE_SIZE; - sbp->f_blocks = TMPFS_PAGES_MAX(tmp); - sbp->f_bavail = sbp->f_bfree = TMPFS_PAGES_AVAIL(tmp); - - freenodes = MIN(tmp->tm_nodes_max - tmp->tm_nodes_inuse, - TMPFS_PAGES_AVAIL(tmp) * PAGE_SIZE / sizeof(struct tmpfs_node)); - - sbp->f_files = freenodes + tmp->tm_nodes_inuse; - sbp->f_ffree = freenodes; + sbp->f_blocks = tmpfs_pages_max(tmp); + used = tmpfs_pages_used(tmp); + if (tmpfs_pages_max(tmp) <= used) + sbp->f_bavail = 0; + else + sbp->f_bavail = tmpfs_pages_max(tmp) - used; + sbp->f_bfree = sbp->f_bavail; + sbp->f_files = tmp->tm_nodes_max; + sbp->f_ffree = tmp->tm_nodes_max - tmp->tm_nodes_inuse; /* sbp->f_owner = tmp->tn_uid; */ return 0; From owner-freebsd-fs@FreeBSD.ORG Mon Oct 17 08:50:42 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9179A106566B for ; Mon, 17 Oct 2011 08:50:42 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 1C66E8FC0C for ; Mon, 17 Oct 2011 08:50:40 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1RFiu7-0001HU-91 for freebsd-fs@freebsd.org; Mon, 17 Oct 2011 10:50:39 +0200 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 17 Oct 2011 10:50:39 +0200 Received: from ivoras by 
lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 17 Oct 2011 10:50:39 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Mon, 17 Oct 2011 10:50:17 +0200 Lines: 50 Message-ID: References: <4E97FEDD.7060205@quip.cz> <4E982C0E.2060900@quip.cz> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig77B17ED0F6D5B60FA9E29D3C" X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:7.0.1) Gecko/20111004 Thunderbird/7.0.1 In-Reply-To: <4E982C0E.2060900@quip.cz> X-Enigmail-Version: 1.1.2 Subject: Re: dirhash and dynamic memory allocation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Oct 2011 08:50:42 -0000 On 14/10/2011 14:33, Miroslav Lachman wrote: > > > Ivan Voras wrote: >> On 14/10/2011 11:20, Miroslav Lachman wrote: >>> Hi all, >>> >>> I tried some tuning of dirhash on our servers and after googling a bit, I >>> found an old GSoC project wiki page about Dynamic Memory Allocation for >>> Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory >>> Is there any reason not to use it / not commit it to HEAD? >> >> AFAIK it's sort-of already present. In 8-stable and recent kernels you >> can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem >> (but except in really large edge cases I don't think you *need* more >> than 32 MB), and the kernel will scale-down or free the memory if not >> needed. > > Is this change documented somewhere? Maybe it could be noticed on > DirhashDynamicMemory wiki page. Otherwise it seems as abandoned GSoC > project. I'm not touching the wiki page because I don't really know whether the functionality was committed from the GSoC project or from somewhere else; what I did was much later and much smaller in scope.
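For anyone who wants to experiment with this, the knobs involved look like the following (the 32 MB figure is only an example, echoing the comment above):

# sysctl vfs.ufs.dirhash_maxmem          <- the ceiling, in bytes
# sysctl vfs.ufs.dirhash_mem             <- what dirhash is actually using
# sysctl vfs.ufs.dirhash_maxmem=33554432

Putting the same vfs.ufs.dirhash_maxmem=33554432 line in /etc/sysctl.conf makes it stick across reboots.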
From owner-freebsd-fs@FreeBSD.ORG Mon Oct 17 11:07:02 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 18A0A1065673 for ; Mon, 17 Oct 2011 11:07:02 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 06AE48FC1D for ; Mon, 17 Oct 2011 11:07:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9HB727M099209 for ; Mon, 17 Oct 2011 11:07:02 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9HB71nh099207 for freebsd-fs@FreeBSD.org; Mon, 17 Oct 2011 11:07:01 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 17 Oct 2011 11:07:01 GMT Message-Id: <201110171107.p9HB71nh099207@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Oct 2011 11:07:02 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp.
Description -------------------------------------------------------------------------------- o kern/161674 fs [ufs] snapshot on journaled ufs doesn't work o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161511 fs [unionfs] Filesystem deadlocks when using multiple uni o kern/161493 fs [nfs] NFS v3 directory structure update slow o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic o kern/160860 fs Random UFS root filesystem corruption with SU+J [regre o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159971 fs [ffs] [panic] panic with soft updates journaling durin o kern/159930 fs [ufs] [panic] kernel core o kern/159418 fs [tmpfs] [panic] tmpfs kernel panic: recursing on non r o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159233 fs [ext2fs] [patch] fs/ext2fs: finish reallocblk implemen o kern/159232 fs [ext2fs] [patch] fs/ext2fs: merge ext2_readwrite into o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs [amd] amd(8) ICMP storm and unkillable process. 
o kern/158711 fs [ffs] [panic] panic in ffs_blkfree and ffs_valloc o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157722 fs [geli] unable to newfs a geli encrypted partition o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156168 fs [nfs] [panic] Kernel panic under concurrent access ove o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs o kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 o kern/154447 fs [zfs] [panic] Occasional panics - solaris assert somew p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153847 fs [nfs] [panic] Kernel panic from incorrect m_free in nf o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small p kern/152488 fs [tmpfs] [patch] mtime of file updated when only inode o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o kern/151845 fs [smbfs] [patch] smbfs should be upgraded to support Un o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/151111 fs [zfs] vnodes leakage during zfs unmount o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/150207 fs zpool(1): zpool import -d /dev tries to open weird dev o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris 
installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148204 fs [nfs] UDP NFS causes overload o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an o bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... 
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs f kern/127375 fs [zfs] If vm.kmem_size_max>"1073741823" then write spee o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files f sparc/123566 fs [zfs] zpool import issue: EOVERFLOW o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) 
and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 257 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Oct 17 12:25:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 29620106566B for ; Mon, 17 Oct 2011 12:25:30 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.freebsd.org (Postfix) with ESMTP id 986CC8FC0A for ; Mon, 17 Oct 2011 12:25:29 +0000 (UTC) Received: by people.fsn.hu (Postfix, from userid 1001) id 23A5CAADD9E; Mon, 17 Oct 2011 14:25:28 +0200 (CEST) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 27.1455] X-CRM114-CacheID: sfid-20111017_14252_85334530 X-CRM114-Status: Good ( pR: 27.1455 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Mon Oct 17 14:25:28 2011 X-DSPAM-Confidence: 0.9969 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 4e9c1eb8873599527411137 X-DSPAM-Factors: 27, From*Attila Nagy , 0.00010, >+On, 0.00099, the+>, 0.00134, the+>, 0.00134, >+the, 0.00134, >+the, 0.00134, wrote+>, 0.00178, conf, 0.00227, cache, 0.00256, cache, 0.00256, >+If, 0.00267, wrote+>>, 0.00267, )+>, 0.00279, >+>, 0.00286, >+>, 0.00286, in+>, 0.00307, I+>, 0.00341, you+>, 0.00361, >>+>>, 0.00396, >>+>>, 0.00396, >+You, 0.00409, wrote, 0.00490, wrote, 0.00490, adding, 0.00510, 2011+at, 0.00510, STABLE, 0.00556, X-Spambayes-Classification: ham; 0.00 Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99]) by people.fsn.hu (Postfix) with ESMTPSA id B369CAADD91; Mon, 17 Oct 2011 14:25:27 +0200 (CEST) Message-ID: <4E9C1EB7.60907@fsn.hu> Date: Mon, 17 Oct 2011 14:25:27 +0200 From: Attila Nagy User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.23) Gecko/20090817 Thunderbird/2.0.0.23 Mnenhy/0.7.6.0 MIME-Version: 1.0 To: Jeremy Chadwick References: <4E97F710.8000004@fsn.hu> <20111014090000.GA66602@icarus.home.lan> In-Reply-To: <20111014090000.GA66602@icarus.home.lan> X-Stationery: 0.7.5 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: cache devices come up as dsk/original_device_name in zpools X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Oct 2011 12:25:30 -0000 On 10/14/11 11:00, Jeremy Chadwick wrote: > On Fri, Oct 14, 2011 at 10:47:12AM +0200, Attila Nagy wrote: >> Hi, >> >> I have a zpool with cache devices on 8-STABLE (csup'ed and compiled >> at Sep 14 15:01:25 CEST 2011). The problem is every time I reboot, >> the cache devices turn to UNAVAIL (because the device name changes to >> dsk/daXX): >> dsk/da37 UNAVAIL 0 0 0 cannot open >> dsk/da38 UNAVAIL 0 0 0 cannot open >> >> After removing and re-adding them, everything goes back to normal, >> until the next reboot. I have no /boot/zfs/zpool.cache (because the >> machine is netbooted), maybe this is the cause? In previous versions >> everything was fine. > Obviously at some point when you built this system you entered > "dsk/da37" and "dsk/da38". So the metadata on those drives probably > contains references to those strings. You need to clear/change that. Pretty unlikely. And given that this happens on all machines upgraded to a recent 8-STABLE (and it never happened before), I would say something has changed regarding this.
In the user space tools there are a lot of occurrences of /dev/dsk... > > I'm not sure how to go about doing that, especially on a system which > lacks /boot/zfs/zpool.cache. A one-time "zpool export" then a reboot, I > imagine, would suffice, but I'm not sure if export actually changes the > metadata on the disk itself or just updates the zpool.cache file. > > If you ran "zdb" on this system (the output will be HUGE given the > number of vdevs and devices you have!), you should see some relevant > information under each disk (child), specifically "path" vs. > "phys_path". Maybe these differ? Well, zdb seems to be quite useless without zpool.cache... But I found a spare machine where I could do an export. The zdb output does not contain the above disks (da37, da38, the cache devices). I let the tool run for about 30 minutes only. > > You might also try tinkering about with the loader.conf(5) variables > zpool_cache_*. Depending on your setup, you might be able to move the > zpool.cache file to a different location -- I realise you PXE boot, but > if you have any sort of storage media on that system that isn't under > ZFS that *is* available (e.g. a small UFS partition, etc.) then you > might consider storing it there. See /boot/defaults/loader.conf. I have no UFS on these machines, and that is just fine. It has always been fine, and I hope it stays that way. :) > > Otherwise I'm not sure how to go about changing the actual strings in > the disk metadata. Maybe remove the cache devices entirely, zero out > the first and last ~16MBytes of the da37 and da38 disks (using dd), then > re-add them using their "daXX" name? That might suffice. > I'm not sure whether this is in the on-disk metadata. How could I add a /dev/dsk/da38 disk with zpool? It does not exist. From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 00:36:03 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5D6CB106564A for ; Tue, 18 Oct 2011 00:36:03 +0000 (UTC) (envelope-from haroldp@internal.org) Received: from pluto.internal.org (mail.internal.org [64.191.53.117]) by mx1.freebsd.org (Postfix) with ESMTP id 0785A8FC13 for ; Tue, 18 Oct 2011 00:36:02 +0000 (UTC) Received: from [10.0.0.79] (99-46-24-87.lightspeed.renonv.sbcglobal.net [99.46.24.87]) by pluto.internal.org (Postfix) with ESMTPA id 79A5DECBD4 for ; Mon, 17 Oct 2011 17:17:32 -0700 (PDT) From: Harold Paulson Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Mon, 17 Oct 2011 17:17:31 -0700 Message-Id: <4D8047A6-930E-4DE8-BA55-051890585BFE@internal.org> To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Apple Message framework v1084) X-Mailer: Apple Mail (2.1084) Subject: Damaged directory on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 00:36:03 -0000 Hello, I've had a server that boots from ZFS panicking for a couple days. I have worked around the problem for now, but I hope someone can give me some insight into what's going on, and how I can solve it properly.
The server is running 8.2-STABLE (zfs v28) with 8G of RAM and 4 SATA disks in a raid10 type arrangement: # uname -a FreeBSD jane.sierraweb.com 8.2-STABLE-201105 FreeBSD 8.2-STABLE-201105 #0: Tue May 17 05:18:48 UTC 2011 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 And zpool status:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            gpt/disk2  ONLINE       0     0     0
            gpt/disk3  ONLINE       0     0     0

It started panicking under load a couple days ago. We replaced RAM and motherboard, but problems persisted. I don't know if a hardware issue originally caused the problem or what. When it panics, I get the usual panic message, but I don't get a core file, and it never reboots itself. http://pastebin.com/F1J2AjSF While I was trying to figure out the source of the problem, I noticed various stuck processes that peg a CPU and can't be killed, such as:

  PID JID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
48735   0 root       1  46    0 11972K   924K CPU3   3 415:14 100.00% find

They are not marked zombie, but I can't kill them, and restarting the jail they are in won't even get rid of them. truss just hangs with no output on them. On different occasions, I noticed pop3d processes for the same user getting stuck in this way. On a hunch I ran a "find" through the files in the user's Maildir and got a panic. I disabled this account and now the server is stable again. At least until locate.updatedb walks through that directory, I suppose. Evidently, there is some kind of hole in the file system below that directory tree causing the panic. I can move that directory out of the way and carry on, but is there anything I can do to really *repair* the problem? Thanks.
- H From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 00:54:50 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7CBC01065674 for ; Tue, 18 Oct 2011 00:54:50 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta07.emeryville.ca.mail.comcast.net (qmta07.emeryville.ca.mail.comcast.net [76.96.30.64]) by mx1.freebsd.org (Postfix) with ESMTP id 63A168FC12 for ; Tue, 18 Oct 2011 00:54:50 +0000 (UTC) Received: from omta14.emeryville.ca.mail.comcast.net ([76.96.30.60]) by qmta07.emeryville.ca.mail.comcast.net with comcast id m0ty1h0041HpZEsA70ujiE; Tue, 18 Oct 2011 00:54:43 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta14.emeryville.ca.mail.comcast.net with comcast id m0ti1h00k1t3BNj8a0tijs; Tue, 18 Oct 2011 00:53:43 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 545D1102C1C; Mon, 17 Oct 2011 17:54:48 -0700 (PDT) Date: Mon, 17 Oct 2011 17:54:48 -0700 From: Jeremy Chadwick To: Harold Paulson Message-ID: <20111018005448.GA2855@icarus.home.lan> References: <4D8047A6-930E-4DE8-BA55-051890585BFE@internal.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4D8047A6-930E-4DE8-BA55-051890585BFE@internal.org> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Damaged directory on ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 00:54:50 -0000 On Mon, Oct 17, 2011 at 05:17:31PM -0700, Harold Paulson wrote: > I've had a server that boots from ZFS panicking for a couple days. I have worked around the problem for now, but I hope someone can give me some insight into what's going on, and how I can solve it properly. > > The server is running 8.2-STABLE (zfs v28) with 8G of ram and 4 SATA disks in a raid10 type arrangement: > > # uname -a > FreeBSD jane.sierraweb.com 8.2-STABLE-201105 FreeBSD 8.2-STABLE-201105 #0: Tue May 17 05:18:48 UTC 2011 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 First thing to do is to consider upgrading to a newer RELENG_8 date. There have been *many* ZFS fixes since May. > And zpool status: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > mirror ONLINE 0 0 0 > gpt/disk0 ONLINE 0 0 0 > gpt/disk1 ONLINE 0 0 0 > mirror ONLINE 0 0 0 > gpt/disk2 ONLINE 0 0 0 > gpt/disk3 ONLINE 0 0 0 > > It started panicking under load a couple days ago. We replaced RAM and motherboard, but problems persisted. I don't know if a hardware issue originally caused the problem or what. When it panics, I get the usual panic message, but I don't get a core file, and it never reboots itself. > > http://pastebin.com/F1J2AjSF ZFS developers will need to comment on the state of the backtrace. You may be requested to examine the core using kgdb and be given some commands to run on it. > While I was trying to figure out the source of the problem, I notice stuck various stuck processes that peg a CPU and can't be killed, such as: > > PID JID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND > 48735 0 root 1 46 0 11972K 924K CPU3 3 415:14 100.00% find Had you done procstat -k -k 48735 (the "double -k" is not a typo), you probably would have seen that the process was "stuck" in a ZFS-related thread. 
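Concretely, that looks something like this, using the PID from the top output above (ps -d is the closest thing in base to Solaris ptree; the sysutils/pstree port is an alternative):

# procstat -k -k 48735       <- dumps the kernel stack of each thread
# ps -axwwd                  <- -d arranges processes into descendancy order

If the KSTACK column ends in a run of zfs/zio-related functions, the process is wedged inside the kernel rather than in userland, which is why no signal can reach it.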
These are processes which the kernel is hanging on to and will not let go of, so even kill -9 won't kill these. It would also have been worthwhile to get the "process tree" of what spawned the PID. (Solaris has ptree; I think we have something similar under FreeBSD but I forget what.) The reason that matters is that it's probably a periodic job that runs (there are many which use find), traversing your ZFS filesystems and tickling a bug/issue somewhere. You even hint at this in your next paragraph, re: locate.updatedb. > They are not marked zombie, but I can't kill them, and restarting the jail they are in won't even get rid of them. truss just hangs with no output on them. On different occasions, I noticed pop3d processes for the same user getting stuck in this way. On a hunch I ran a "find" through the files in the user's Maildir and got a panic. I disabled this account and now the server is stable again. At least until locate.updatedb walks through that directory, I suppose. Evidently, there is some kind of hole in the file system below that directory tree causing the panic. The fact that jails are involved complicates things even more. truss and ktrace won't show anything going on because of what I said above: the kernel bits associated with the process are hung or spinning, not the actual syscall/userland bits. Furthermore, truss on FreeBSD is basically worthless; use ktrace. > I can move that directory out of the way, and carry on, but is there anything I can do to really *repair* the problem? I would recommend starting with "zpool scrub" on the pool which is associated with the Maildir/ directory of the account you disabled. I will not be surprised if it comes back 100% clean. Given what the backtrace looks like, I would say the Maildir/ has a ton of files in it. Is that the case? Does "echo *" say something about the argument list being too long? You should also be aware that Maildir on ZFS performs horribly. I've experienced this, and there are old discussions about it as well. Here are some of my findings. http://koitsu.wordpress.com/2009/06/01/freebsd-and-zfs-horrible-raidz1-read-speed/ http://koitsu.wordpress.com/2009/06/01/freebsd-and-zfs-horrible-raidz1-speed-part-2/ http://koitsu.wordpress.com/2009/10/29/unix-mail-format-annoyances/ The state of mail spools on UNIX is a complete disgrace, and everyone involved in it should feel ashamed. MIX is probably the best solution to this problem, but it's not being adopted by all the major players, which is very sad. I realise that doesn't solve your problem, but my strong recommendation is to use classic UNIX mail spools (one file for many messages) when the filesystem is ZFS-based. However, someone familiar with the ZFS internals, as I said, should investigate the crash you're experiencing regardless. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 03:28:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0EC66106564A for ; Tue, 18 Oct 2011 03:28:14 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id C09BF8FC19 for ; Tue, 18 Oct 2011 03:28:13 +0000 (UTC) Received: from outgoing.leidinger.net (p4FC42468.dip.t-dialin.net [79.196.36.104]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 2441F844016; Tue, 18 Oct 2011 05:27:55 +0200 (CEST) Received: from unknown (IO.Leidinger.net [192.168.1.12]) by outgoing.leidinger.net (Postfix) with ESMTP id 4B03E2929; Tue, 18 Oct 2011 05:27:52 +0200 (CEST) Date: Tue, 18 Oct 2011 05:27:51 +0200 From: Alexander Leidinger To: Patrick Donnelly Message-ID: <20111018052751.0000273f@unknown> In-Reply-To: References: X-Mailer: Claws Mail 3.7.10cvs7 (GTK+ 2.16.6; i586-pc-mingw32msvc) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 2441F844016.A45C0 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-1, required 6, autolearn=disabled, ALL_TRUSTED -1.00) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1319513278.37207@oxF/gA22yq8e7f9X9iW/og X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org Subject: Re: [ZFS] Using SSD with partitions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 03:28:14 -0000 On Sun, 16 Oct 2011 00:45:50 -0400 Patrick Donnelly wrote: > Hi list, > > I've got an array for home use where my boot drive (UFS) finally died. > I've decided to upgrade to a SSD for a replacement but am looking to > maybe simultaneously improving performance of my ZFS array. Naturally What do you mean by "array"? How many disks and which type (HD or SSD)? Which pool configuration are you aiming at? > a FreeBSD install doesn't use much space so partitioning the drive to > get maximum usage seems wise. I was thinking for a hypothetical 40GB > drive: > > 20GB -- FreeBSD / partition > 2GB -- ZFS ZIL > 18GB -- ZFS Cache > > What I'm wondering is if this will be a bad idea. I know that SSDs are > not designed to be written to *a lot*, which a ZIL will experience. Is > this a bad idea? I'm hoping for experiences from people in similar > scenarios. As I'm not an enterprise IT person who can't simply choose > to just throw more mon-- I mean SSDs -- at the problem, I need to be > efficient. :) [I'm thinking the cache drive partition might be > pointless as I don't think I'd benefit that much from it.] ZIL and cache should be on devices which are faster than the rest of the pool (with one edge-case exception: a saturated I/O path to the main pool disks, an unsaturated I/O path to the ZIL/cache, and data in the cache or targeted for the ZIL). If your ZIL/cache partitions on SSD are aimed at a pool which consists of normal HDs, go for it. If you have only SSDs (or only one single disk in total), it is most probably better (certainly for a desktop system) to keep it simple (no ZIL/cache partitions). Bye, Alexander.
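For the mechanics, a rough sketch, assuming the SSD (ada0 here) already carries a GPT scheme and the pool is called "tank" (names and sizes are placeholders matching the layout above):

# gpart add -t freebsd-zfs -s 2G -l log0 ada0
# gpart add -t freebsd-zfs -s 18G -l cache0 ada0
# zpool add tank log gpt/log0
# zpool add tank cache gpt/cache0

A cache device can be dropped again with "zpool remove tank gpt/cache0" if it turns out not to help.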
-- http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 03:53:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 08621106564A for ; Tue, 18 Oct 2011 03:53:44 +0000 (UTC) (envelope-from batrick@batbytes.com) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id 98E408FC18 for ; Tue, 18 Oct 2011 03:53:44 +0000 (UTC) Received: by iaky10 with SMTP id y10so255295iak.13 for ; Mon, 17 Oct 2011 20:53:44 -0700 (PDT) MIME-Version: 1.0 Received: by 10.231.20.227 with SMTP id g35mr277776ibb.32.1318910023866; Mon, 17 Oct 2011 20:53:43 -0700 (PDT) Received: by 10.231.19.66 with HTTP; Mon, 17 Oct 2011 20:53:43 -0700 (PDT) In-Reply-To: <4E9AE725.4040001@gmail.com> References: <4E9AE725.4040001@gmail.com> Date: Mon, 17 Oct 2011 23:53:43 -0400 Message-ID: From: Patrick Donnelly To: "Luchesar V. ILIEV" Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: [ZFS] Using SSD with partitions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 03:53:45 -0000 Since people have asked about more details for my system: It uses old desktop hardware with 5 1TB WD Caviar Blues in a raidz configuration with 1 of those drives being a hot spare (4 1TB drives in the raidz). The system currently has 2 GB of RAM IIRC. I've been using NFS to access the data on my home network, which has worked pretty well. Writing to NFS over my VPN from across the country is really bad, which is one of the reasons I wanted to use an SSD for a ZIL. Read/write performance overall tends to be bad, though, so I don't really know how much it will help. After fiddling around with NFS settings for a long time, I gave up and instead use SSH when outside a LAN. That's another matter though and off-topic. :) On Sun, Oct 16, 2011 at 10:16 AM, Luchesar V. ILIEV wrote: > 1. If you can afford more RAM, it's (much) better for ZFS than L2ARC. I think I may end up doing this as well. RAM seems to have gotten extraordinarily large and cheap in the last few years. > 5. Check the output of "zpool upgrade". If your zpool version is > anything below 19 (likely 14 or 15), I'd strongly recommend that you > avoid setting up a separate ZIL. Pools before v19 fail critically when > the ZIL is removed or is corrupted, which means you lose them for good. > You might mitigate the risk with a mirrored ZIL, but it's still likely > not worth it in your case. Yes, I plan to upgrade the pool to v28 and FreeBSD when I get the SSD. Speaking of which, should there be any problems with installing the SSD, putting FreeBSD 9.0-RELEASE (when it comes out) on it, and then trying to import the pool? > Again, I'm no expert in those things, so take all my comments with a > grain of salt. Good luck! Thank you for your advice!
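The import itself should be uneventful, since a newer ZFS can always read an older pool; just note that the upgrades are one-way, so run them only once the new install is settled (pool name "tank" assumed):

# zpool import               <- lists pools visible to the new system
# zpool import tank
# zpool upgrade tank         <- one-way; moves the pool to the version the kernel supports
# zfs upgrade -a             <- likewise one-way, for the filesystem versions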
-- - Patrick Donnelly From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 04:28:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B4ED9106566B for ; Tue, 18 Oct 2011 04:28:43 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta15.emeryville.ca.mail.comcast.net (qmta15.emeryville.ca.mail.comcast.net [76.96.27.228]) by mx1.freebsd.org (Postfix) with ESMTP id 9AC658FC0C for ; Tue, 18 Oct 2011 04:28:43 +0000 (UTC) Received: from omta19.emeryville.ca.mail.comcast.net ([76.96.30.76]) by qmta15.emeryville.ca.mail.comcast.net with comcast id m4KR1h0051eYJf8AF4Uc0Q; Tue, 18 Oct 2011 04:28:36 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta19.emeryville.ca.mail.comcast.net with comcast id m4BW1h00U1t3BNj014BXbP; Tue, 18 Oct 2011 04:11:31 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 6D130102C1C; Mon, 17 Oct 2011 21:28:38 -0700 (PDT) Date: Mon, 17 Oct 2011 21:28:38 -0700 From: Jeremy Chadwick To: Patrick Donnelly Message-ID: <20111018042838.GA6246@icarus.home.lan> References: <4E9AE725.4040001@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: [ZFS] Using SSD with partitions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 04:28:43 -0000 On Mon, Oct 17, 2011 at 11:53:43PM -0400, Patrick Donnelly wrote: > Since people have asked about more details for my system: > > It uses old desktop hardware with 5 1TB WD Caviar Blues in a raidz > configuration with 1 of those drives being a hot spare (4 1TB drives > in the raidz). The system currently has 2 GB of RAM IIRC. > > I've been using NFS to access the data on my home network, which has > worked pretty well. Writing to NFS over my VPN from across the country > is really bad, which is one of the reasons I wanted to use an SSD for a > ZIL. Read/write performance overall tends to be bad, though, so I don't > really know how much it will help. After fiddling around with NFS > settings for a long time, I gave up and instead use SSH when > outside a LAN. That's another matter though and off-topic. :) I don't see how using an SSD for ZIL is going to address issues of latency and underlying network filesystems (not NFS but the idea of a networked filesystem). So many people think you can just throw a VPN in between two locations and everything will work perfectly fine with only nominal delays -- that isn't what happens at all at a packet level. Network I/O tuning when a VPN is involved is a completely separate topic. NFS, by the way, is mainly intended to be used in very low-latency environments (read: LANs). You can fiddle with NFS and TCP window settings all day and accomplish nothing considering cross-country latency (within the US anyway) is around ~75ms on a good day. When it comes to networked filesystems on UNIX, we have very little choice. NFS is the main one. Then there's CMU's Coda filesystem thing, or maybe that's now part of AFS, I don't know. Then there's sshfs, which sounds wonderful until you realise all the dependencies and nuances involved (mainly due to use of fuse, which we know on FreeBSD is not so great).
Then there's Samba (CIFS/SMB, and now with Samba 3.6 offering SMB2 for Windows 7 clients), but that gets into issues of security and cannot be forwarded via SSH (e.g. VPN would be needed) given all its protocol some of which are UDP (not sure what the state of NetBIOS is). And none of this even begins to touch base on resiliency/reliability. I have to remind people on a weekly basis that the Internet *truly* is broken 24x7x365. It cannot be relied upon 100% of the time, or even 90% of the time. VPN, SSH, plain-text packets... none of it matters when the backbone carriers don't take the Internet seriously (and most do not; it's still a best-effort service, at least that's how it's treated by NOCs and technicians). So what I'm saying is: I wish you luck in your endeavour to find something that works for you. In general I find SSH to be the most convenient -- easy to deal with on a firewall level, authentication that makes sense, and is easily manageable in general. You might try making use of things like Compression=yes in SSH as well. If you find an SSD that is able to solve all of these problems, let me know and I'll invest in one. It'll be the first SSD with an RJ45 port I'm sure. :-) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 10:46:25 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C8C201065673; Tue, 18 Oct 2011 10:46:25 +0000 (UTC) (envelope-from hlh@restart.be) Received: from tignes.restart.be (tignes.restart.be [94.23.211.191]) by mx1.freebsd.org (Postfix) with ESMTP id 760178FC14; Tue, 18 Oct 2011 10:46:25 +0000 (UTC) Received: from restart.be (avoriaz.restart.be [IPv6:2001:41d0:2:56bf:1:1::]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "smtp.restart.be", Issuer "CA master" (verified OK)) by tignes.restart.be (Postfix) with ESMTPS id C4EEC13F0B; Tue, 18 Oct 2011 12:35:28 +0200 (CEST) Received: from morzine.restart.bel (morzine.restart.be [IPv6:2001:41d0:2:56bf:1:2::]) (authenticated bits=0) by restart.be (8.14.5/8.14.5) with ESMTP id p9IAZRbc002917; Tue, 18 Oct 2011 12:35:28 +0200 (CEST) (envelope-from hlh@restart.be) X-DKIM: Sendmail DKIM Filter v2.8.3 restart.be p9IAZRbc002917 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=restart.be; s=avoriaz; t=1318934128; bh=tyrc5bxAewlcX03p8JZT8A6e3iZiGPpNkXtYgWBKlCY=; h=Message-ID:Date:From:MIME-Version:To:Subject:References: In-Reply-To:Content-Type:Content-Transfer-Encoding; b=AI5EMhhFe69cSc6qV9TMdpCFVd6xDHnKIUf3rZ/z5IBcd6q964eWesP7KgIOcfYNh HOLHSqStfs+xMe1R3vaXg== X-DomainKeys: Sendmail DomainKeys Filter v1.0.2 restart.be p9IAZRbc002917 DomainKey-Signature: a=rsa-sha1; s=avoriaz; d=restart.be; c=nofws; q=dns; h=message-id:date:from:organization:user-agent:mime-version:to: subject:references:in-reply-to:content-type:content-transfer-encoding; b=gmlZF1llrpXGXQVocZgcuja/AE25crFaXIgPrZmP6p9+WjxZBVBBOSakP/Sg+Lgu+ LJTOOe6XlTYwHzkTjAaDA== Message-ID: <4E9D566F.1040104@restart.be> Date: Tue, 18 Oct 2011 12:35:27 +0200 From: Henri Hennebert Organization: RestartSoft User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:7.0.1) Gecko/20111006 Thunderbird/7.0.1 MIME-Version: 1.0 To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org, avg@freebsd.org References: <4E8D7406.4090302@restart.be> 
<4E8D86A2.1040508@FreeBSD.org> <4E8D9F57.70506@restart.be> <4E8DAEE5.4020004@FreeBSD.org> In-Reply-To: <4E8DAEE5.4020004@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: Re: zfsloader 9.0 BETA3 r225759 - i/o error - all block copies unavailable X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 10:46:26 -0000 On 10/06/2011 15:36, Andriy Gapon wrote: > on 06/10/2011 15:30 Henri Hennebert said the following: >> The pool is a mirror: >> >> [root@morzine ~]# zpool status rpool >> pool: rpool >> state: ONLINE >> scan: scrub repaired 0 in 1h0m with 0 errors on Wed Aug 24 15:04:36 2011 >> config: >> >> NAME STATE READ WRITE CKSUM >> rpool ONLINE 0 0 0 >> mirror-0 ONLINE 0 0 0 >> gptid/e915c6a0-fc72-11de-aa21-00e081706b68 ONLINE 0 0 0 >> gptid/eac8497d-fc72-11de-aa21-00e081706b68 ONLINE 0 0 0 >> >> errors: No known data errors >> >> and rpool/root is not compressed: >> >> [root@morzine ~]# zfs get compression rpool/root >> NAME PROPERTY VALUE SOURCE >> rpool/root compression off inherited from rpool >> >> pool is v28 and filesystems are v5 > > No particular recipes for this environment, just a general suggestion. > If you run into a situation like this again, please try to use > tools/tools/zfsboottest to diagnose where exactly an error originates. > I upgraded another system to 9.0-RC1 and encountered the same problem; this time zfsloader does not run. After mv /mnt/boot /mnt/Boot mkdir /mnt/boot cd /mnt/Boot find . | cpio -pvdmu /mnt/boot FreeBSD boots OK [root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2 /dev/ada1p2 ZFS: SPA version 28 pool: rpool config: NAME STATE rpool ONLINE mirror ONLINE ada0p2 ONLINE ada1p2 ONLINE ZFS: i/o error - all block copies unavailable can't lookup 10 minutes later: [root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2 /dev/ada1p2|less ZFS: SPA version 28 pool: rpool config: NAME STATE rpool ONLINE mirror ONLINE ada0p2 ONLINE ada1p2 ONLINE it seems ok :-o and another time: [root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2 segmentation fault... Strange, isn't it?
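(For anyone wanting to reproduce this, I built the test tool roughly as follows -- a sketch only, and it assumes the 9.0 sources are checked out in /usr/src; adjust to your tree:

cd /usr/src/tools/tools/zfsboottest
make
./zfsboottest /Boot/zfsloader /dev/ada0p2 /dev/ada1p2

)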
Henri From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 11:27:41 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 630C2106566B for ; Tue, 18 Oct 2011 11:27:41 +0000 (UTC) (envelope-from ml@my.gd) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 010098FC16 for ; Tue, 18 Oct 2011 11:27:40 +0000 (UTC) Received: by wwi18 with SMTP id 18so656289wwi.31 for ; Tue, 18 Oct 2011 04:27:40 -0700 (PDT) Received: by 10.227.59.147 with SMTP id l19mr705867wbh.38.1318935550684; Tue, 18 Oct 2011 03:59:10 -0700 (PDT) Received: from dfleuriot-at-hi-media.com ([83.167.62.196]) by mx.google.com with ESMTPS id eu16sm2779680wbb.7.2011.10.18.03.59.08 (version=SSLv3 cipher=OTHER); Tue, 18 Oct 2011 03:59:09 -0700 (PDT) Message-ID: <4E9D5BFB.5060609@my.gd> Date: Tue, 18 Oct 2011 12:59:07 +0200 From: Damien Fleuriot User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4E9AE725.4040001@gmail.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [ZFS] Using SSD with partitions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 11:27:41 -0000 On 10/18/11 5:53 AM, Patrick Donnelly wrote: > Since people have asked about more details for my system: > > It uses old desktop hardware with 5 1TB WD Caviar Blues in a raidz > configuration with 1 of those drives being a hot spare (4 1TB drives > in the raidz). The system currently has 2 GB of RAM IIRC. > > I've been using NFS to access the data on my home network which has > worked pretty well. Writing to NFS over my VPN from across the country > is really bad which is one of the reasons I wanted to use a SSD for a > ZIL. read/write performance overall tends to be bad though so I don't > really know how much it will help. After fiddling around with NFS > settings for a long time I soon gave up and instead use SSH when > outside a LAN. That's another matter though and off-topic. :) > > This is going to sound a bit rude, my preemptive apologies. Just where did you get the notion that changing your bike's tires would make your car run faster ? In what world does your *storage* configuration affect your *network* latency and performance ? 
From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 14:40:05 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A15D106566C; Tue, 18 Oct 2011 14:40:05 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 8CB008FC08; Tue, 18 Oct 2011 14:40:04 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA04559; Tue, 18 Oct 2011 17:40:00 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4E9D8FBF.3030502@FreeBSD.org> Date: Tue, 18 Oct 2011 17:39:59 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:7.0.1) Gecko/20111003 Thunderbird/7.0.1 MIME-Version: 1.0 To: Henri Hennebert References: <4E8D7406.4090302@restart.be> <4E8D86A2.1040508@FreeBSD.org> <4E8D9F57.70506@restart.be> <4E8DAEE5.4020004@FreeBSD.org> <4E9D566F.1040104@restart.be> In-Reply-To: <4E9D566F.1040104@restart.be> X-Enigmail-Version: undefined Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org Subject: Re: zfsloader 9.0 BETA3 r225759 - i/o error - all block copies unavailable X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 14:40:05 -0000 on 18/10/2011 13:35 Henri Hennebert said the following: > I upgraded another system to 9.0-RC1 and encountered the same problem; this time > zfsloader does not run. > > After > > mv /mnt/boot /mnt/Boot > mkdir /mnt/boot > cd /mnt/Boot > find . | cpio -pvdmu /mnt/boot > > FreeBSD boots OK > > > [root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2 /dev/ada1p2 > ZFS: SPA version 28 > pool: rpool > config: > > NAME STATE > rpool ONLINE > mirror ONLINE > ada0p2 ONLINE > ada1p2 ONLINE > ZFS: i/o error - all block copies unavailable > can't lookup > > 10 minutes later: > > [root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2 > /dev/ada1p2|less > ZFS: SPA version 28 > pool: rpool > config: > > NAME STATE > rpool ONLINE > mirror ONLINE > ada0p2 ONLINE > ada1p2 ONLINE > > > it seems ok :-o > > and another time: > [root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2 > segmentation fault... > > Strange, isn't it? I think that it would be smart not to do any filesystem modifications after the problem is detected / reproduced. Also, zfsboottest currently doesn't do much in the way of self-diagnostics, so using gdb and/or adding some printfs to the code is required to understand the nature of a problem: what kind of block gives the I/O error, whether it is the actual read that fails or the checksum verification, and so on.
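For example, a minimal gdb session along these lines (a sketch only -- the arguments are the ones from your transcript, and the SIGSEGV line is illustrative, not captured output):

# gdb ./zfsboottest
(gdb) run /Boot/zfsloader /dev/ada0p2 /dev/ada1p2
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt

Even just the backtrace from the segfault case would tell us a lot about where to add the printfs.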
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 15:55:27 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CBECC106564A for ; Tue, 18 Oct 2011 15:55:27 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id 9DAE78FC13 for ; Tue, 18 Oct 2011 15:55:27 +0000 (UTC) Received: from gjp by noop.in-addr.com with local (Exim 4.76 (FreeBSD)) (envelope-from ) id 1RGC0j-000CSa-Ub; Tue, 18 Oct 2011 11:55:25 -0400 Date: Tue, 18 Oct 2011 11:55:25 -0400 From: Gary Palmer To: Patrick Donnelly Message-ID: <20111018155525.GH38162@in-addr.com> References: <4E9AE725.4040001@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false Cc: freebsd-fs@freebsd.org Subject: Re: [ZFS] Using SSD with partitions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 15:55:27 -0000 On Mon, Oct 17, 2011 at 11:53:43PM -0400, Patrick Donnelly wrote: > Since people have asked about more details for my system: > > It uses old desktop hardware with 5 1TB WD Caviar Blues in a raidz > configuration with 1 of those drives being a hot spare (4 1TB drives > in the raidz). The system currently has 2 GB of RAM IIRC. > > I've been using NFS to access the data on my home network which has > worked pretty well. Writing to NFS over my VPN from across the country > is really bad which is one of the reasons I wanted to use a SSD for a > ZIL. read/write performance overall tends to be bad though so I don't > really know how much it will help. After fiddling around with NFS > settings for a long time I soon gave up and instead use SSH when > outside a LAN. That's another matter though and off-topic. :) Block access protocols (NFS, CIFS) suck over anything other than a LAN. If you think about it you basically have (and yes, this is WAY over simplified but it illustrates the point) Client: request block 0 of file 1 Server: block 0 Client: request block 1 of file 1 Server: block 1 So for each block (whether its 512 bytes, 4k or larger is irrelevant) you spend most of your time waiting for the request to transit the network. You spend very little time actually sending or receiving data. You're probably better off with something like WebDAV or the like as they are less impacted by RTT issues as they request the entire file at once. 
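To put rough numbers on it (back-of-the-envelope; the 75ms RTT and 32k read size are assumptions for illustration, not measurements from your setup):

  100 MB file / 32 KB per request = 3200 round trips
  3200 round trips x 0.075 s      = 240 s, i.e. about 4 minutes

almost all of it spent waiting, for a transfer the link itself could move in a fraction of that time. Run the same math with a 0.2 ms LAN RTT and the waiting drops to well under a second, which is why NFS feels fine at home and terrible across the country.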
Gary From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 17:46:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 176FD106566B; Tue, 18 Oct 2011 17:46:58 +0000 (UTC) (envelope-from olivier@gid0.org) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id AD9858FC14; Tue, 18 Oct 2011 17:46:57 +0000 (UTC) Received: by qadz30 with SMTP id z30so884629qad.13 for ; Tue, 18 Oct 2011 10:46:56 -0700 (PDT) MIME-Version: 1.0 Received: by 10.224.185.19 with SMTP id cm19mr3025416qab.8.1318960016655; Tue, 18 Oct 2011 10:46:56 -0700 (PDT) Received: by 10.224.60.206 with HTTP; Tue, 18 Oct 2011 10:46:56 -0700 (PDT) In-Reply-To: <20111005092603.GA1874@tops> References: <20111002020231.GA70864@icarus.home.lan> <20111005092603.GA1874@tops> Date: Tue, 18 Oct 2011 19:46:56 +0200 Message-ID: From: Olivier Smedts To: Gleb Kurtsou Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: "freebsd-fs@freebsd.org" , Ivan Voras Subject: Re: is TMPFS still highly experimental? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 17:46:58 -0000 2011/10/5 Gleb Kurtsou : > Free RAM is a bit tricky with virtual memory and overcommit support all > over the place. There are at least 3 memory hungry subsystems: buffer > cache, ZFS ARC, tmpfs. > > For the first two there is defined maximum size and they can be shrunk > in low memory situations. Tmpfs grows as much as it can trying to > calculate "free" memory available. Another difference is that tmpfs > can't be shrunk in low memory situation. > > I proposed a patch changing tmpfs memory allocation: > - Define maximum file system size (RAM/2 by default) > - Don't try to check if free memory available, check free swap >  instead and allocate more aggressively, i.e. allocate until >  swap or file system limit is reached. Patch tested and approved ! I did not test the maximum tmpfs default size because I allocated a max size in my fstab. %cat /etc/fstab none /tmp tmpfs rw,mode=1777,size=2147483648 0 0 %df -h /tmp Filesystem Size Used Avail Capacity Mounted on tmpfs 2.0G 124k 2G 0% /tmp Mem: 622M Active, 351M Inact, 6491M Wired, 4940K Cache, 2160K Buf, 385M Free Swap: 2048M Total, 36M Used, 2012M Free, 1% Inuse (ZFS is using all my wired memory, the ARC is now full, and I deleted my nearly-never-touched 8G swap in favor of a 2G swap) A little test now : %dd if=/dev/zero of=/tmp/test bs=1M count=1500 1500+0 records in 1500+0 records out 1572864000 bytes transferred in 0.763368 secs (2060427243 bytes/sec) %df -h /tmp Filesystem Size Used Avail Capacity Mounted on tmpfs 2.0G 1.5G 542M 74% /tmp % top Mem: 2559M Active, 514M Inact, 4506M Wired, 1656K Cache, 2160K Buf, 274M Free Swap: 2048M Total, 39M Used, 2009M Free, 1% Inuse So tmpfs made the ZFS ARC cache shrink, without swapping. I did not test filling my active memory to see if the max tmpfs size was shrinking. Cheers ! > > Patch: > http://marc.info/?l=freebsd-fs&m=129747367322954&w=2 > https://github.com/glk/freebsd-head/tree/tmpfs > > Thanks, > Gleb.
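P.S. For anyone wanting to reproduce the observation, the quick way I watched the ARC give memory back is a sketch like this (the sysctl name is what my box reports; double-check it on yours):

% sysctl kstat.zfs.misc.arcstats.size   # ARC size in bytes, before
% dd if=/dev/zero of=/tmp/test bs=1M count=1500
% sysctl kstat.zfs.misc.arcstats.size   # noticeably smaller afterwards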
> -- Olivier Smedts                                                 _                                         ASCII ribbon campaign ( ) e-mail: olivier@gid0.org        - against HTML email & vCards  X www: http://www.gid0.org    - against proprietary attachments / \   "There are only 10 kinds of people in the world:   those who understand binary,   and those who don't." From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 19:30:10 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D2C12106566B for ; Tue, 18 Oct 2011 19:30:10 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id BCBFF8FC12 for ; Tue, 18 Oct 2011 19:30:10 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9IJUAtc060289 for ; Tue, 18 Oct 2011 19:30:10 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9IJUAO6060285; Tue, 18 Oct 2011 19:30:10 GMT (envelope-from gnats) Date: Tue, 18 Oct 2011 19:30:10 GMT Message-Id: <201110181930.p9IJUAO6060285@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: 3zstbn24xn@snkmail.com Cc: Subject: Re: kern/160777: [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/import on 9.0-BETA2/amd64 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: 3zstbn24xn@snkmail.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 19:30:10 -0000 The following reply was made to PR kern/160777; it has been noted by GNATS. From: 3zstbn24xn@snkmail.com To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/160777: [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/import on 9.0-BETA2/amd64 Date: Tue, 18 Oct 2011 19:00:58 +0000 The following kernel trace may be of relevance (console output shown below). I received this on 9.0-BETA3.
lock order reversal: 1st 0xfffffe000656c278 zfs (zfs) @ /usr/src/sys/kern/vfs_vnops.c:618 2nd 0xfffffe017795e098 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2134 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x807 __lockmgr_args() at __lockmgr_args+0x109c ffs_lock() at ffs_lock+0x8c VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b _vn_lock() at _vn_lock+0x47 vget() at vget+0x7b vm_fault_hold() at vm_fault_hold+0x1976 trap_pfault() at trap_pfault+0x118 trap() at trap+0x39b calltrap() at calltrap+0x8 --- trap 0xc, rip = 0xffffffff80b0aa8d, rsp = 0xffffff82331b5640, rbp = 0xffffff82331b56a0 --- copyin() at copyin+0x3d zfs_freebsd_write() at zfs_freebsd_write+0x46f VOP_WRITE_APV() at VOP_WRITE_APV+0x103 vn_write() at vn_write+0x2a2 dofilewrite() at dofilewrite+0x85 kern_writev() at kern_writev+0x6c sys_write() at sys_write+0x55 amd64_syscall() at amd64_syscall+0x3ba Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (4, FreeBSD ELF64, sys_write), rip = 0x80094533c, rsp = 0x7fffffffd9d8, rbp = 0x80065b000 --- From owner-freebsd-fs@FreeBSD.ORG Tue Oct 18 21:49:10 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8BFEC106564A; Tue, 18 Oct 2011 21:49:10 +0000 (UTC) (envelope-from gleb.kurtsou@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id DDE608FC15; Tue, 18 Oct 2011 21:49:09 +0000 (UTC) Received: by bkbzu17 with SMTP id zu17so1701178bkb.13 for ; Tue, 18 Oct 2011 14:49:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=3RSAw/uH7jgEnEyW5qUid/L+zTWTbwHYuROS800fAAs=; b=KNpkrgga9aIq1jY6K20WTc8W48dgukZ6qRY5M+yjGBEbL7jFyg3jXmRFGOk5s+IYCK rvLBoQNQL5jGCTfHQXOhKHJbPIFsnYnBF/V9rXnqn+rXOAD1sIazk/dnGXURHI11R9vU 7Wo9MkefqOSxXFVdrNbWeHjhZJv9x5YOqoj38= Received: by 10.204.157.142 with SMTP id b14mr3103164bkx.44.1318974547777; Tue, 18 Oct 2011 14:49:07 -0700 (PDT) Received: from localhost ([78.157.92.5]) by mx.google.com with ESMTPS id z9sm3661323bkn.7.2011.10.18.14.49.05 (version=SSLv3 cipher=OTHER); Tue, 18 Oct 2011 14:49:06 -0700 (PDT) Date: Wed, 19 Oct 2011 00:46:34 +0300 From: Gleb Kurtsou To: Olivier Smedts Message-ID: <20111018214634.GA55276@tops> References: <20111002020231.GA70864@icarus.home.lan> <20111005092603.GA1874@tops> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: "freebsd-fs@freebsd.org" , Ivan Voras Subject: Re: is TMPFS still highly experimental? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2011 21:49:10 -0000 On (18/10/2011 19:46), Olivier Smedts wrote: > 2011/10/5 Gleb Kurtsou : > > Free RAM is a bit tricky with virtual memory and overcommit support all > > over the place. There are at least 3 memory hungry subsystems: buffer > > cache, ZFS ARC, tmpfs. > > > > For the first two there is defined maximum size and they can be shrunk > > in low memory situations. 
Tmpfs grows as much as it can trying to > > calculate "free" memory available. Another difference is that tmpfs > > can't be shrunk in low memory situation. > > > > I proposed a patch changing tmpfs memory allocation: > > - Define maximum file system size (RAM/2 by default) > > - Don't try to check if free memory available, check free swap > >  instead and allocate more aggressively, i.e. allocate until > >  swap or file system limit is reached. > > Patch tested and approved ! I did not test the maximum tmpfs default > size because I allocated a max size in my fstab. > > %cat /etc/fstab > none /tmp tmpfs rw,mode=1777,size=2147483648 0 0 You may specify a human-friendly size, e.g. size=2G. I'll add live tmpfs resizing once the decision is made that the patch can be committed. > > %df -h /tmp > Filesystem Size Used Avail Capacity Mounted on > tmpfs 2.0G 124k 2G 0% /tmp > > Mem: 622M Active, 351M Inact, 6491M Wired, 4940K Cache, 2160K Buf, 385M Free > Swap: 2048M Total, 36M Used, 2012M Free, 1% Inuse > > (ZFS is using all my wired memory, the ARC is now full, and I deleted > my nearly-never-touched 8G swap in favor of a 2G swap) > > A little test now : > %dd if=/dev/zero of=/tmp/test bs=1M count=1500 > 1500+0 records in > 1500+0 records out > 1572864000 bytes transferred in 0.763368 secs (2060427243 bytes/sec) > %df -h /tmp > Filesystem Size Used Avail Capacity Mounted on > tmpfs 2.0G 1.5G 542M 74% /tmp > % top > Mem: 2559M Active, 514M Inact, 4506M Wired, 1656K Cache, 2160K Buf, 274M Free > Swap: 2048M Total, 39M Used, 2009M Free, 1% Inuse > > So tmpfs made the ZFS ARC cache shrink, without swapping. I did not > test filling my active memory to see if the max tmpfs size was > shrinking. The tmpfs size won't change: you'll be able to write to tmpfs until either the filesystem size limit or the swap limit is reached. It's for the administrator to decide how large tmpfs can grow. Simply put, there is no way to compete with ZFS and the buffer cache in trying to use all "free" memory, because unlike theirs, tmpfs data can't be freed when needed. > > Cheers ! > > > > > Patch: > > http://marc.info/?l=freebsd-fs&m=129747367322954&w=2 > > https://github.com/glk/freebsd-head/tree/tmpfs > > > > Thanks, > > Gleb. > > > -- > Olivier Smedts                                                 _ >                                         ASCII ribbon campaign ( ) > e-mail: olivier@gid0.org        - against HTML email & vCards  X > www: http://www.gid0.org    - against proprietary attachments / \ > >   "There are only 10 kinds of people in the world: >   those who understand binary, >   and those who don't."
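P.S. Spelled out, your fstab line with the human-friendly size would be (a sketch -- mode and size to taste):

none  /tmp  tmpfs  rw,mode=1777,size=2G  0  0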
From owner-freebsd-fs@FreeBSD.ORG Wed Oct 19 02:23:22 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DF2BA106566B for ; Wed, 19 Oct 2011 02:23:22 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-7.mit.edu (DMZ-MAILSEC-SCANNER-7.MIT.EDU [18.7.68.36]) by mx1.freebsd.org (Postfix) with ESMTP id 7DBB08FC0A for ; Wed, 19 Oct 2011 02:23:22 +0000 (UTC) X-AuditID: 12074424-b7ef76d0000008dc-21-4e9e34997345 Received: from mailhub-auth-1.mit.edu ( [18.9.21.35]) by dmz-mailsec-scanner-7.mit.edu (Symantec Messaging Gateway) with SMTP id 0B.01.02268.9943E9E4; Tue, 18 Oct 2011 22:23:21 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-1.mit.edu (8.13.8/8.9.2) with ESMTP id p9J2NLOg007413; Tue, 18 Oct 2011 22:23:21 -0400 Received: from multics.mit.edu (MULTICS.MIT.EDU [18.187.1.73]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id p9J2NJcC015843 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Tue, 18 Oct 2011 22:23:20 -0400 (EDT) Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id p9J2NIOR005263; Tue, 18 Oct 2011 22:23:18 -0400 (EDT) Date: Tue, 18 Oct 2011 22:23:18 -0400 (EDT) From: Benjamin Kaduk To: Jeremy Chadwick In-Reply-To: <20111018042838.GA6246@icarus.home.lan> Message-ID: References: <4E9AE725.4040001@gmail.com> <20111018042838.GA6246@icarus.home.lan> User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrNIsWRmVeSWpSXmKPExsUixCmqrDvTZJ6fwfz/+hbHHv9ks2j8cZrd gcljxqf5LB5rrl5lDWCK4rJJSc3JLEst0rdL4Mp42/2fveACe8W3m6uYGhib2LoYOTkkBEwk 1lztYoGwxSQu3FsPFOfiEBLYxyjxadZ1sISQwAZGiZP7giESB5gk3m/bwwzhNDBKHP26gRWk ikVAW+L9k4WMIDabgIrEzDcbwVaICOhJrF21A8jm4GAWkJK4s7YCJCws4Cbx98RFFpAwJ9AV f7/6gJi8AvYSyz44QEy/xSixouslO0i5qICOxOr9U8Du4RUQlDg58wmYzSxgKXHuz3W2CYyC s5CkZiFJLWBkWsUom5JbpZubmJlTnJqsW5ycmJeXWqRrrpebWaKXmlK6iREUpuwuKjsYmw8p HWIU4GBU4uHdITfPT4g1say4MvcQoyQHk5IorxAwyIX4kvJTKjMSizPii0pzUosPMUpwMCuJ 8N7hAMrxpiRWVqUW5cOkpDlYlMR5bXY6+AkJpCeWpGanphakFsFkZTg4lCR4OUGGChalpqdW pGXmlCCkmTg4QYbzAA0/ZwwyvLggMbc4Mx0if4pRUUqc9xZIQgAkkVGaB9cLSyOvGMWBXhHm FQRZwQNMQXDdr4AGMwENPqo4F2RwSSJCSqqBUV6hvCH5IO/Dxv8HGnVfKv7yStbzdqnYWHzj cVnP7FmHV4q8LHWccmXDhVPF5YbiTNdTam9Jz45dmXU9RtizydaS3yfKqadC7K79cqMHkwJK E1flLZrH9eXvbyn/PVO0D08Qzq/g/BZSu6AtViY/fom78r5p+uaZeSZzQn3+pZ/fsFr5dWil EktxRqKhFnNRcSIAJ1iAlf4CAAA= Cc: freebsd-fs@freebsd.org Subject: network filesystems (was Re: [ZFS] Using SSD with partitions) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Oct 2011 02:23:22 -0000 On Mon, 17 Oct 2011, Jeremy Chadwick wrote: > > When it comes to networked filesystems on UNIX, we have very little > choice. NFS is the main one. Then there's Stanford's Coda filesystem > thing, or maybe that's now part of AFS, I don't know. Then there's Coda and AFS are different codebases, but implemenent similar sorts of things. I believe that Coda is still "research-grade", and I know that OpenAFS is not ready for production deployment on FreeBSD. (But I'm working on it.) 
-Ben Kaduk > sshfs, which sounds wonderful until you realise all the dependencies and > nuances involved (mainly due to use of fuse, which we know on FreeBSD is > not so great). Then there's Samba (CIFS/SMB, and now with Samba 3.6 > offering SMB2 for Windows 7 clients), but that gets into issues of > security and cannot be forwarded via SSH (e.g. VPN would be needed) > given all its protocol some of which are UDP (not sure what the state of > NetBIOS is). From owner-freebsd-fs@FreeBSD.ORG Wed Oct 19 03:49:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CFFAD1065675 for ; Wed, 19 Oct 2011 03:49:56 +0000 (UTC) (envelope-from jwd@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id BFBC28FC17 for ; Wed, 19 Oct 2011 03:49:56 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9J3nuai032929 for ; Wed, 19 Oct 2011 03:49:56 GMT (envelope-from jwd@freefall.freebsd.org) Received: (from jwd@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9J3nuv8032928 for freebsd-fs@freebsd.org; Wed, 19 Oct 2011 03:49:56 GMT (envelope-from jwd) Date: Wed, 19 Oct 2011 03:49:56 +0000 From: John To: freebsd-fs@freebsd.org Message-ID: <20111019034956.GA8345@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.3i Subject: nfsstats for new nfsserver X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Oct 2011 03:49:56 -0000 Hi Folks, I've been looking into different performance aspects of running the new nfsserver serving out zfs filesystems with 9. I've run into an nfsstat question I thought I would ask about. From nfsstat on a system that's been up for a few hours: # nfsstat Client Info: ... deleted ... all 0. Server Info: Getattr Setattr Lookup Readlink Read Write Create Remove 1014376791 95502 1815135267 10181 8613463 6005951 0 0 Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access 240 0 0 0 0 47964 0 547155 Mknod Fsstat Fsinfo PathConf Commit 0 595932 45 0 74154 Server Ret-Failed 0 Server Faults 0 Server Cache Stats: Inprog Idem Non-idem Misses 6308 0 3852 -1448802368 Server Write Gathering: WriteOps WriteRPC Opsaved 6005951 6005951 0 The 'Misses' value is very large. When looking at the source, if I'm following the code correctly (and I might not be), would it make sense to try increasing the size of the cache, or simply disabling it? Can do either - looking for opinions. The Opsaved value being 0, would it make sense to simply disable gathering also? Last, just more of a comment, would it make sense to go ahead and treat these values as unsigned? They'll still wrap, but they would stay positive.
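(To illustrate the last point -- a toy standalone program, not the actual nfsstat code; the counter value is the one from the output above, and the only difference between the two printf lines is the format specifier:

#include <stdio.h>

int
main(void)
{
	/* the server cache 'Misses' counter after it has passed INT_MAX */
	unsigned int misses = 2846164928u;

	printf("as signed:   %d\n", (int)misses); /* prints -1448802368 */
	printf("as unsigned: %u\n", misses);      /* prints 2846164928 */
	return (0);
}

The count is the same either way; only the printed interpretation wraps negative.)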
Thanks, John From owner-freebsd-fs@FreeBSD.ORG Wed Oct 19 06:07:21 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 88BDA106564A for ; Wed, 19 Oct 2011 06:07:21 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-2.mit.edu (DMZ-MAILSEC-SCANNER-2.MIT.EDU [18.9.25.13]) by mx1.freebsd.org (Postfix) with ESMTP id 2B1C78FC0C for ; Wed, 19 Oct 2011 06:07:20 +0000 (UTC) X-AuditID: 1209190d-b7f726d0000008d1-a3-4e9e6918e402 Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) by dmz-mailsec-scanner-2.mit.edu (Symantec Messaging Gateway) with SMTP id 6F.7C.02257.8196E9E4; Wed, 19 Oct 2011 02:07:20 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id p9J67Ko2008830; Wed, 19 Oct 2011 02:07:20 -0400 Received: from multics.mit.edu (MULTICS.MIT.EDU [18.187.1.73]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id p9J67Ieh008769 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Wed, 19 Oct 2011 02:07:19 -0400 (EDT) Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id p9J67IUA010311; Wed, 19 Oct 2011 02:07:18 -0400 (EDT) Date: Wed, 19 Oct 2011 02:07:17 -0400 (EDT) From: Benjamin Kaduk To: rmacklem@freebsd.org Message-ID: User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrDIsWRmVeSWpSXmKPExsUixG6nriuROc/P4EwXv8Wxxz/ZLOb+3c/o wOQx49N8lgDGKC6blNSczLLUIn27BK6MB7eOMhc0i1ccvuzRwHiRp4uRg0NCwETi6BrVLkZO IFNM4sK99WxdjFwcQgL7GCUmHtvDCOFsYJSY/HcKO4RzgEli2d+9jCAtQgINjBLTjweB2CwC 2hIvjj9lArHZBFQkZr7ZyAZiiwhISJy8d4wZZBuzgJTEnbUVIKawgLHE1DYLkApeAXuJa0eX s4LYogI6Eqv3T2GBiAtKnJz5BMxmFrCU+Lf2F+sERv5ZSFKzkKQWMDKtYpRNya3SzU3MzClO TdYtTk7My0st0jXSy80s0UtNKd3ECAo0TkneHYzvDiodYhTgYFTi4d0hN89PiDWxrLgy9xCj JAeTkiivWgZQiC8pP6UyI7E4I76oNCe1+BCjBAezkgjv4TSgHG9KYmVValE+TEqag0VJnLdw h4OfkEB6YklqdmpqQWoRTFaGg0NJgnc2yFDBotT01Iq0zJwShDQTByfIcB6g4bEgNbzFBYm5 xZnpEPlTjIpSQKNBEgIgiYzSPLheWCJ4xSgO9IowbwJIFQ8wicB1vwIazAQ0+KjiXJDBJYkI KakGRkeH9d8+756/PX7nX5V/BSlFZ2a9mM6+ijVpgmjvcwl92/fpiavOVM2XPL8g++iRR+vX xSQEV7MZ3wxhXL32gl+hz9dA3mf3r/ctsVn1j+vjuQhhJ+nlSRM2VWwVzmsSmXZl71LJ5Haf 7eZC4VLbGupPP1+zv1Nr7e/v8+4E5JhtbzFN1Zuxk1eJpTgj0VCLuag4EQCPPn543wIAAA== Cc: freebsd-fs@freebsd.org Subject: lock status of dvp in lookup error return? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Oct 2011 06:07:21 -0000 Hi Rick, In tracking down a panic trying to recursively lock a vnode in openafs, I started questioning my behavior in the ISDOTDOT case, in particular whether to drop the dvp lock before the actual call over the network; this naturally led me to look at the NFS code as a reference. Unfortunately, this left me more confused than when I began ... sys/fs/nfs_clvnops.c, in nfs_lookup(): 1211 if (flags & ISDOTDOT) { 1212 ltype = NFSVOPISLOCKED(dvp); 1213 error = vfs_busy(mp, MBF_NOWAIT); 1214 if (error != 0) { 1215 vfs_ref(mp); 1216 NFSVOPUNLOCK(dvp, 0); 1217 error = vfs_busy(mp, 0); 1218 NFSVOPLOCK(dvp, ltype | LK_RETRY); If we fail to busy the mountpoint, drop the directory lock and try again, then relock dvp afterward. 
1219 vfs_rel(mp); 1220 if (error == 0 && (dvp->v_iflag & VI_DOOMED)) { 1221 vfs_unbusy(mp); 1222 error = ENOENT; 1223 } 1224 if (error != 0) 1225 return (error); But if the second vfs_busy failed, or dvp is DOOMED, return with dvp locked. 1226 } 1227 NFSVOPUNLOCK(dvp, 0); But now we always unlock dvp. 1228 error = nfscl_nget(mp, dvp, nfhp, cnp, td, &np, NULL, 1229 cnp->cn_lkflags); The call to the network (?) 1230 if (error == 0) 1231 newvp = NFSTOV(np); 1232 vfs_unbusy(mp); 1233 if (newvp != dvp) 1234 NFSVOPLOCK(dvp, ltype | LK_RETRY); 1235 if (dvp->v_iflag & VI_DOOMED) { 1236 if (error == 0) { 1237 if (newvp == dvp) 1238 vrele(newvp); 1239 else 1240 vput(newvp); 1241 } 1242 error = ENOENT; 1243 } 1244 if (error != 0) 1245 return (error); And here if there was an error hearing from the network, we return with dvp still unlocked. 1246 if (attrflag) 1247 (void) nfscl_loadattrcache(&newvp, &nfsva, NULL, NULL, 1248 0, 1); So, I'm still confused about whether I should be unlocking dvp in the error case for ISDOTDOT (though presumably looking at other filesystems would help). This inconsistency in the NFS client looks like a bug at my current level of understanding -- what do you think? Thanks, Ben Kaduk From owner-freebsd-fs@FreeBSD.ORG Wed Oct 19 06:21:45 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6B4B41065673 for ; Wed, 19 Oct 2011 06:21:45 +0000 (UTC) (envelope-from florian@wagner-flo.net) Received: from umbracor.wagner-flo.net (umbracor.wagner-flo.net [213.165.81.202]) by mx1.freebsd.org (Postfix) with ESMTP id 09A8D8FC08 for ; Wed, 19 Oct 2011 06:21:44 +0000 (UTC) Received: from auedv3.syscomp.de (umbracor.wagner-flo.net [127.0.0.1]) by umbracor.wagner-flo.net (Postfix) with ESMTPSA id D6B373C06C30; Wed, 19 Oct 2011 08:21:46 +0200 (CEST) Date: Wed, 19 Oct 2011 08:21:39 +0200 From: Florian Wagner To: Andriy Gapon Message-ID: <20111019082139.1661868e@auedv3.syscomp.de> In-Reply-To: <4E9ACA9F.5090308@FreeBSD.org> References: <20111015214347.09f68e4e@naclador.mos32.de> <4E9ACA9F.5090308@FreeBSD.org> X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/MqGgbxzbrnCGjY92aocJCb="; protocol="application/pgp-signature" Cc: freebsd-fs@FreeBSD.org Subject: Re: Extending zfsboot.c to allow selecting filesystem from boot.config X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Oct 2011 06:21:45 -0000 --Sig_/MqGgbxzbrnCGjY92aocJCb= Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable > on 15/10/2011 22:43 Florian Wagner said the following: > > Hi, > >=20 > > from looking at the code in sys/boot/i386/zfsboot/zfsboot.c the ZFS > > aware boot block already allows to select pool to load the kernel > > from by adding : to the boot.config. As this > > code calls the zfs_mount_pool function it will look for the bootfs > > property on the new pool or use its root dataset to get the file > > from there. > >=20 > > How much work would it be to extend the loader to also allow > > selecting a ZFS filesystem? 
> > What I'd like to do is place a boot.config on the (otherwise empty) > > root of my system pool and then tell it to get the loader from > > another filesystem by putting > > "rpool/root/stable-8-r226381:/boot/zfsloader" in there. > Please check out the following changes: > https://gitorious.org/~avg/freebsd/avgbsd/commit/8c3808c4bb2a2cd746db3e9c46871c9bdf943ef6 > https://gitorious.org/~avg/freebsd/avgbsd/commit/0b4279c0d366d9f2b5bb9d4c0dd3229d8936d92b > https://gitorious.org/~avg/freebsd/avgbsd/commit/b29ab78b079f27918de1683e88bcb1817a0e5969 > https://gitorious.org/~avg/freebsd/avgbsd/commit/f49add15516dfd582258b6820b8f0254cf9419a3 > https://gitorious.org/~avg/freebsd/avgbsd/commit/e072b443b0f59fe1ff54a70d2437d63698bbf597 > https://gitorious.org/~avg/freebsd/avgbsd/commit/f701760c10812c5b6925352fb003408c19170063 Looks great! I've applied the patches to my checkout of Stable 8 and gave the resulting gptzfsboot and zfsloader a cursory try in a virtual machine. Commit f701760c10812c5b6925352fb003408c19170063 breaks the build of the non-ZFS-enabled bootcode. The syntax is wrong in the following snippet if LOADER_ZFS_SUPPORT is not defined. Moving the closing bracket ("};") right after the second #endif into the preprocessor conditional fixes that.

@@ -52,14 +52,21 @@
 	u_int32_t	howto;
 	u_int32_t	bootdev;
 	u_int32_t	bootflags;
+#ifdef LOADER_ZFS_SUPPORT
 	union {
+#endif
 	    struct {
 		u_int32_t	pxeinfo;
 		u_int32_t	res2;
 	    };
+#ifdef LOADER_ZFS_SUPPORT
 	    uint64_t	zfspool;
+#endif
 	};
 	u_int32_t	bootinfo;
+#ifdef LOADER_ZFS_SUPPORT
+	uint64_t	zfsroot;
+#endif
 } *kargs;

The only thing I was a bit confused by is that on the boot prompt only the pool and filename to be booted are printed. Apart from that it worked as expected. Not having to set vfs.root.mountfrom in the loader is nice.
Regards and thanks Florian Wagner --Sig_/MqGgbxzbrnCGjY92aocJCb= Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iEYEARECAAYFAk6ebHMACgkQLvW/2gp2pPw0rQCeN81YhLpkyZtw+KyMScOOSl1s bxgAoILoMmdsz1lWUC9ex6wunDl+rPRA =F9Av -----END PGP SIGNATURE----- --Sig_/MqGgbxzbrnCGjY92aocJCb=-- From owner-freebsd-fs@FreeBSD.ORG Wed Oct 19 11:18:04 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E51A71065670 for ; Wed, 19 Oct 2011 11:18:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 7F6FD8FC15 for ; Wed, 19 Oct 2011 11:18:03 +0000 (UTC) Received: from alf.home (alf.kiev.zoral.com.ua [10.1.1.177]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p9JBHkVe036547 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 19 Oct 2011 14:17:47 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from alf.home (kostik@localhost [127.0.0.1]) by alf.home (8.14.5/8.14.5) with ESMTP id p9JBHkDG063304; Wed, 19 Oct 2011 14:17:46 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by alf.home (8.14.5/8.14.5/Submit) id p9JBHksA063303; Wed, 19 Oct 2011 14:17:46 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: alf.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 19 Oct 2011 14:17:46 +0300 From: Kostik Belousov To: Benjamin Kaduk Message-ID: <20111019111746.GP50300@deviant.kiev.zoral.com.ua> References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="rQ7Ovc9/RBrrr0/1" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org Subject: Re: lock status of dvp in lookup error return? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Oct 2011 11:18:05 -0000 --rQ7Ovc9/RBrrr0/1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Oct 19, 2011 at 02:07:17AM -0400, Benjamin Kaduk wrote: > Hi Rick, > > In tracking down a panic trying to recursively lock a vnode in openafs, I > started questioning my behavior in the ISDOTDOT case, in particular > whether to drop the dvp lock before the actual call over the network; this > naturally led me to look at the NFS code as a reference. > Unfortunately, this left me more confused than when I began ...
> > sys/fs/nfs_clvnops.c, in nfs_lookup(): > 1211 if (flags & ISDOTDOT) { > 1212 ltype = NFSVOPISLOCKED(dvp); > 1213 error = vfs_busy(mp, MBF_NOWAIT); > 1214 if (error != 0) { > 1215 vfs_ref(mp); > 1216 NFSVOPUNLOCK(dvp, 0); > 1217 error = vfs_busy(mp, 0); > 1218 NFSVOPLOCK(dvp, ltype | LK_RETRY); > > If we fail to busy the mountpoint, drop the directory lock and try again, > then relock dvp afterward. > > 1219 vfs_rel(mp); > 1220 if (error == 0 && (dvp->v_iflag & VI_DOOMED)) { > 1221 vfs_unbusy(mp); > 1222 error = ENOENT; > 1223 } > 1224 if (error != 0) > 1225 return (error); > > But if the second vfs_busy failed, or dvp is DOOMED, return with dvp > locked. > > 1226 } > 1227 NFSVOPUNLOCK(dvp, 0); > > But now we always unlock dvp. > > 1228 error = nfscl_nget(mp, dvp, nfhp, cnp, td, &np, > NULL, > 1229 cnp->cn_lkflags); > > The call to the network (?) > > 1230 if (error == 0) > 1231 newvp = NFSTOV(np); > 1232 vfs_unbusy(mp); > 1233 if (newvp != dvp) > 1234 NFSVOPLOCK(dvp, ltype | LK_RETRY); Did you missed line 1234 ? The code is the copy of the vn_vget_ino(). The logic in the function might be slightly easier to follow. > 1235 if (dvp->v_iflag & VI_DOOMED) { > 1236 if (error == 0) { > 1237 if (newvp == dvp) > 1238 vrele(newvp); > 1239 else > 1240 vput(newvp); > 1241 } > 1242 error = ENOENT; > 1243 } > 1244 if (error != 0) > 1245 return (error); > > And here if there was an error hearing from the network, we return with > dvp still unlocked. > > 1246 if (attrflag) > 1247 (void) nfscl_loadattrcache(&newvp, &nfsva, > NULL, NULL, > 1248 0, 1); > > > So, I'm still confused about whether I should be unlocking dvp in the > error case for ISDOTDOT (though presumably looking at other filesystems > would help). This inconsistency in the NFS client looks like a bug at my > current level of understanding -- what do you think?
>=20 > Thanks, >=20 > Ben Kaduk > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" --rQ7Ovc9/RBrrr0/1 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk6esdoACgkQC3+MBN1Mb4gc7wCg45DKU1WoaUBlwI1FrJl1HnPy pmsAnRkGuubGe5QT55xcNtTpq69GoSFQ =5Bav -----END PGP SIGNATURE----- --rQ7Ovc9/RBrrr0/1-- From owner-freebsd-fs@FreeBSD.ORG Wed Oct 19 14:53:52 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A7A0C1065673; Wed, 19 Oct 2011 14:53:52 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-1.mit.edu (DMZ-MAILSEC-SCANNER-1.MIT.EDU [18.9.25.12]) by mx1.freebsd.org (Postfix) with ESMTP id 32B868FC12; Wed, 19 Oct 2011 14:53:51 +0000 (UTC) X-AuditID: 1209190c-b7fd26d0000008df-52-4e9ee47f8e7b Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) by dmz-mailsec-scanner-1.mit.edu (Symantec Messaging Gateway) with SMTP id 48.C4.02271.F74EE9E4; Wed, 19 Oct 2011 10:53:51 -0400 (EDT) Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103]) by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id p9JErpeG023501; Wed, 19 Oct 2011 10:53:51 -0400 Received: from multics.mit.edu (MULTICS.MIT.EDU [18.187.1.73]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id p9JErnOD026397 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Wed, 19 Oct 2011 10:53:50 -0400 (EDT) Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id p9JErnEL016061; Wed, 19 Oct 2011 10:53:49 -0400 (EDT) Date: Wed, 19 Oct 2011 10:53:48 -0400 (EDT) From: Benjamin Kaduk To: Kostik Belousov In-Reply-To: <20111019111746.GP50300@deviant.kiev.zoral.com.ua> Message-ID: References: <20111019111746.GP50300@deviant.kiev.zoral.com.ua> User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrMIsWRmVeSWpSXmKPExsUixCmqrVv/ZJ6fwYOr8hbHHv9ks2iY9pjN Yu7f/YwOzB4zPs1n8dg56y57AFMUl01Kak5mWWqRvl0CV8afu00sBfOFKp4fbmJrYJzE3cXI ySEhYCJxdO1RNghbTOLCvfVANheHkMA+Rok9Ta1MEM4GRom3jd9YIJwDTBLbvh1kh3AaGCV6 N3eyg/SzCGhLLF+9E8xmE1CRmPlmI9hcEQFNiWub7jOB2MwCBhIz2uYzgtjCAuYS69ftYwWx OQXsJWZtPgdWwwtkb1r+CCwuJFAkMevNfLC4qICOxOr9U1ggagQlTs58wgIx01Li3J/rbBMY BWchSc1CklrAyLSKUTYlt0o3NzEzpzg1Wbc4OTEvL7VI11AvN7NELzWldBMjOGwleXYwvjmo dIhRgINRiYd3h9w8PyHWxLLiytxDjJIcTEqivM2PgUJ8SfkplRmJxRnxRaU5qcWHGCU4mJVE eF/dAMrxpiRWVqUW5cOkpDlYlMR5D+5w8BMSSE8sSc1OTS1ILYLJynBwKEnw/gMZKliUmp5a kZaZU4KQZuLgBBnOAzT8O0gNb3FBYm5xZjpE/hSjopQ473mQhABIIqM0D64XllZeMYoDvSLM +xukigeYkuC6XwENZgIafFRxLsjgkkSElFQDo2LU+tgmwcvBsz/M/qbSI7u5snL9vs02c5dE BJ/6n6Eur6L4eZLEjv9xIo/OZz2arN/PeSP1McuLD7I54X2pTko2KR5Pzy+a0TxlG29a+Cu2 dFGOyLJXk7nXryh6GRig9r/SmFvS2u6VTffcAHbuqP+qtTP7L8z3eNr1L+74S95zgg/6AwqL lFiKMxINtZiLihMBdIwNewYDAAA= Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org Subject: Re: lock status of dvp in lookup error return? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Oct 2011 14:53:52 -0000 On Wed, 19 Oct 2011, Kostik Belousov wrote: > On Wed, Oct 19, 2011 at 02:07:17AM -0400, Benjamin Kaduk wrote: >> Hi Rick, >> >> In tracking down a panic trying to recursively lock a vnode in openafs, I >> started questioning my behavior in the ISDOTDOT case, in particular >> whether to drop the dvp lock before the actual call over the network; this >> naturally led me to look at the NFS code as a reference. >> Unfortunately, this left me more confused than when I began ... >> >> sys/fs/nfs_clvnops.c, in nfs_lookup(): >> 1211 if (flags & ISDOTDOT) { >> 1212 ltype = NFSVOPISLOCKED(dvp); >> 1213 error = vfs_busy(mp, MBF_NOWAIT); >> 1214 if (error != 0) { >> 1215 vfs_ref(mp); >> 1216 NFSVOPUNLOCK(dvp, 0); >> 1217 error = vfs_busy(mp, 0); >> 1218 NFSVOPLOCK(dvp, ltype | LK_RETRY); >> >> If we fail to busy the mountpoint, drop the directory lock and try again, >> then relock dvp afterward. >> >> 1219 vfs_rel(mp); >> 1220 if (error == 0 && (dvp->v_iflag & VI_DOOMED)) { >> 1221 vfs_unbusy(mp); >> 1222 error = ENOENT; >> 1223 } >> 1224 if (error != 0) >> 1225 return (error); >> >> But if the second vfs_busy failed, or dvp is DOOMED, return with dvp >> locked. >> >> 1226 } >> 1227 NFSVOPUNLOCK(dvp, 0); >> >> But now we always unlock dvp. >> >> 1228 error = nfscl_nget(mp, dvp, nfhp, cnp, td, &np, >> NULL, >> 1229 cnp->cn_lkflags); >> >> The call to the network (?) >> >> 1230 if (error == 0) >> 1231 newvp = NFSTOV(np); >> 1232 vfs_unbusy(mp); >> 1233 if (newvp != dvp) >> 1234 NFSVOPLOCK(dvp, ltype | LK_RETRY); > Did you missed line 1234 ? > > The code is the copy of the vn_vget_ino(). The logic in the function > might be slightly easier to follow. Ah, I did miss that, thanks. Maybe 0200h is not the best time to do these things ... 
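For my own notes, the shape of the pattern as I now understand it (a simplified paraphrase of the code quoted above, with get_child_locked() standing in for nfscl_nget() -- this is not a verbatim copy of vn_vget_ino()):

	/* dvp is locked on entry; remember how. */
	ltype = VOP_ISLOCKED(dvp);
	VOP_UNLOCK(dvp, 0);		/* drop dvp across the sleep */
	error = get_child_locked(mp, dvp, &newvp); /* may hit the network */
	if (newvp != dvp)
		vn_lock(dvp, ltype | LK_RETRY);	/* always re-take dvp's lock */
	if (dvp->v_iflag & VI_DOOMED) {	/* dvp was recycled while unlocked */
		if (error == 0) {
			if (newvp == dvp)
				vrele(newvp);
			else
				vput(newvp);
		}
		error = ENOENT;
	}
	/* dvp ends up locked on both the success and the error return */

So the error return does leave dvp locked after all, which is what I should be doing in openafs too.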
-Ben From owner-freebsd-fs@FreeBSD.ORG Wed Oct 19 15:39:53 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C5F851065687 for ; Wed, 19 Oct 2011 15:39:53 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 171338FC1D for ; Wed, 19 Oct 2011 15:39:52 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA29079; Wed, 19 Oct 2011 18:39:50 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1RGYFC-000GOA-33; Wed, 19 Oct 2011 18:39:50 +0300 Message-ID: <4E9EEF45.9020404@FreeBSD.org> Date: Wed, 19 Oct 2011 18:39:49 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:7.0.1) Gecko/20111002 Thunderbird/7.0.1 MIME-Version: 1.0 To: Florian Wagner References: <20111015214347.09f68e4e@naclador.mos32.de> <4E9ACA9F.5090308@FreeBSD.org> <20111019082139.1661868e@auedv3.syscomp.de> In-Reply-To: <20111019082139.1661868e@auedv3.syscomp.de> X-Enigmail-Version: undefined Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: Extending zfsboot.c to allow selecting filesystem from boot.config X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Oct 2011 15:39:53 -0000 on 19/10/2011 09:21 Florian Wagner said the following: Somewhere about this line an attribution of my text was expected :-) >> on 15/10/2011 22:43 Florian Wagner said the following: >>> Hi, >>> >>> from looking at the code in sys/boot/i386/zfsboot/zfsboot.c the ZFS >>> aware boot block already allows to select pool to load the kernel from >>> by adding : to the boot.config. As this code calls >>> the zfs_mount_pool function it will look for the bootfs property on the >>> new pool or use its root dataset to get the file from there. >>> >>> How much work would it be to extend the loader to also allow selecting >>> a ZFS filesystem? >>> >>> What I'd like to do is place a boot.config on the (otherwise empty) >>> root of my system pool and then tell it to get the loader from another >>> filesystem by putting "rpool/root/stable-8-r226381:/boot/zfsloader" in >>> there. >> >> Please check out the following changes: >> https://gitorious.org/~avg/freebsd/avgbsd/commit/8c3808c4bb2a2cd746db3e9c46871c9bdf943ef6 >> >> https://gitorious.org/~avg/freebsd/avgbsd/commit/0b4279c0d366d9f2b5bb9d4c0dd3229d8936d92b >> https://gitorious.org/~avg/freebsd/avgbsd/commit/b29ab78b079f27918de1683e88bcb1817a0e5969 >> >> https://gitorious.org/~avg/freebsd/avgbsd/commit/f49add15516dfd582258b6820b8f0254cf9419a3 >> https://gitorious.org/~avg/freebsd/avgbsd/commit/e072b443b0f59fe1ff54a70d2437d63698bbf597 >> >> https://gitorious.org/~avg/freebsd/avgbsd/commit/f701760c10812c5b6925352fb003408c19170063 > > Looks great! > > I've applied the patches to my checkout of Stable 8 and gave the resulting > gptzfsboot and zfsloader a cursory try in a virtual machine. Thank you for testing! > Commit f701760c10812c5b6925352fb003408c19170063 breaks the build of the > non-ZFS-enabled bootcode. 
The syntax is wrong in the following snippet if > LOADER_ZFS_SUPPORT is not defined. Moving the closing bracket ("};") right > after the second #endif into the preprocessor conditional fixes that. Thank you for reporting this!

> @@ -52,14 +52,21 @@
> 	u_int32_t	howto;
> 	u_int32_t	bootdev;
> 	u_int32_t	bootflags;
> +#ifdef LOADER_ZFS_SUPPORT
> 	union {
> +#endif
> 	    struct {
> 		u_int32_t	pxeinfo;
> 		u_int32_t	res2;
> 	    };
> +#ifdef LOADER_ZFS_SUPPORT
> 	    uint64_t	zfspool;
> +#endif
> 	};
> 	u_int32_t	bootinfo;
> +#ifdef LOADER_ZFS_SUPPORT
> +	uint64_t	zfsroot;
> +#endif
> } *kargs;

> The only thing I was a bit confused by is that on the boot prompt only the > pool and filename to be booted are printed. Do you mean the (gpt)zfsboot prompt? > Apart from that it worked as expected. Not having to set vfs.root.mountfrom > in the loader is nice. Thanks! -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Oct 19 15:55:41 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 84010106564A for ; Wed, 19 Oct 2011 15:55:41 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 17CBA8FC0C for ; Wed, 19 Oct 2011 15:55:39 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAMrynk6DaFvO/2dsb2JhbABEhHWlBIFuAQEBAwEBAQEgBCcgCwUWGAICDRkCKQEJJgYIBwQBHASHXwijcJIBgTCFV4EUBJFkghqRcw X-IronPort-AV: E=Sophos;i="4.69,373,1315195200"; d="scan'208";a="140488433" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 19 Oct 2011 11:55:32 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id A07ACB3F7E; Wed, 19 Oct 2011 11:55:32 -0400 (EDT) Date: Wed, 19 Oct 2011 11:55:32 -0400 (EDT) From: Rick Macklem To: John Message-ID: <1951661642.102471.1319039732646.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20111019034956.GA8345@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: nfsstats for new nfsserver X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Oct 2011 15:55:41 -0000 John wrote: > Hi Folks, > > I've been looking into different performance aspects of running > the new nfsserver serving out zfs filesystems with 9. > > I've run into an nfsstat question I thought I would ask about. > From nfsstat on a system that's been up for a few hours: > > # nfsstat > Client Info: > > ... deleted ... all 0. > > Server Info: > Getattr Setattr Lookup Readlink Read Write Create Remove > 1014376791 95502 1815135267 10181 8613463 6005951 0 0 > Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access > 240 0 0 0 0 47964 0 547155 > Mknod Fsstat Fsinfo PathConf Commit > 0 595932 45 0 74154 > Server Ret-Failed > 0 > Server Faults > 0 > Server Cache Stats: > Inprog Idem Non-idem Misses > 6308 0 3852 -1448802368 > Server Write Gathering: > WriteOps WriteRPC Opsaved > 6005951 6005951 0 > > > The 'Misses' value is very large.
When looking at the source, if I'm > following the code correctly (and I might not be), would it make sense > to try increasing the size of the cache, or simply disabling it? Can > do > either - looking for opinions. > Well, if everything is working well, you should always have misses and nothing else. A hit means that an RPC has been retried. On TCP, this implies that there should have been a network partitioning for a significant period of time (or the client is retrying too agressively, since the TCP layer should take care of packet loss, etc). I'm actually surprised you see any hits. Are you using UDP? (In that case there probably will be spurious RPC retries and the cache avoids re-doing the RPC for those cases.) The DRC is weird, in that it does not improve performance, but correctness. Since the overhead should be minimal, I wouldn't disable it, especially if you are using UDP. > The Opsaved value being 0, would it make sense to simply disable > gathering also? > The new server doesn't do write gathering. It was mainly useful for NFSv2, where all writes were synchronous. --> The Opsaved field will always be 0. It's there for the default "nfsstat" output for backwards compatibility only. If you use "nfsstat -e -s" you'll get the newer style of server stats. > Last, just more of a comment, would it make sense to go ahead and > treat > these values as unsigned? They'll still wrap, but they would stay > positive. > Yea, I suppose the printfs should be changed someday, rick > Thanks, > John > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Oct 19 16:04:16 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 44213106564A for ; Wed, 19 Oct 2011 16:04:16 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id E36CA8FC0A for ; Wed, 19 Oct 2011 16:04:15 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: At8EANTznk6DaFvO/2dsb2JhbABEhHWhRYM/gW4BAQEEAQEBIAQnIAsbDgMDAQIBERkCBB8GAQkeCAYIBwQBHASHZ6NukgGDMINXgRQEkWSCGoowh0M X-IronPort-AV: E=Sophos;i="4.69,373,1315195200"; d="scan'208";a="140490268" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 19 Oct 2011 12:04:14 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 09A30B3F6A; Wed, 19 Oct 2011 12:04:15 -0400 (EDT) Date: Wed, 19 Oct 2011 12:04:15 -0400 (EDT) From: Rick Macklem To: Mark Saad Message-ID: <436946680.103454.1319040255028.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <201109291600.p8TG0OI4040954@freefall.freebsd.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_103453_1537264991.1319040255027" X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@FreeBSD.org Subject: Re: kern/156168: [nfs] [panic] Kernel panic under concurrent access over NFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Wed, 19 Oct 2011 16:04:16 -0000 ------=_Part_103453_1537264991.1319040255027 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Mark Saad wrote: > The following reply was made to PR kern/156168; it has been noted by > GNATS. > > From: Mark Saad > To: bug-followup@FreeBSD.org, niakrisn@gmail.com > Cc: > Subject: Re: kern/156168: [nfs] [panic] Kernel panic under concurrent > access > over NFS > Date: Thu, 29 Sep 2011 11:32:12 -0400 > > All > I am seeing a similar crash on 7.3-RELEASE-p2 amd64 when using > apache-1.3.34 with accf_httpd and a nfs docroot > The servers that have crashed are all FreeBSD 7.3-RELEASE amd64. > Hardware is HP Dl145 g2 > They have 2G of ram and 2G swap with one single core opteron cpu. > > > We are using the following sysctls . > > kern.ipc.maxsockbuf=2097152 > kern.ipc.nmbclusters=32768 > kern.ipc.somaxconn=1024 > kern.maxfiles=131072 > kern.maxfilesperproc=32768 > net.inet.tcp.inflight.enable=0 > net.inet.tcp.path_mtu_discovery=0 > net.inet.tcp.recvbuf_inc=524288 > net.inet.tcp.recvbuf_max=8388608 > net.inet.tcp.recvspace=32768 > net.inet.tcp.sendbuf_inc=16384 > net.inet.tcp.sendbuf_max=8388608 > net.inet.tcp.sendspace=32768 > net.inet.udp.recvspace=42080 > net.isr.direct=1 > vm.pmap.shpgperproc=600 > > > Up time prior to the crash was not the other system was up for 11 days > this one was 6 days. > > Here is the contents of my crash > > > [root@web29 /var/crash]# kgdb /boot/kernel/kernel /var/crash/vmcore.0 > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and > you are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for > details. > This GDB was configured as "amd64-marcel-freebsd"... > > Unread portion of the kernel message buffer: > > > Fatal trap 12: page fault while in kernel mode > cpuid = 0; apic id = 00 > fault virtual address = 0x258 > fault code = supervisor read data, page not present > instruction pointer = 0x8:0xffffffff8051a66d > stack pointer = 0x10:0xffffff803e69b1c0 > frame pointer = 0x10:0xffffff0001b50ae0 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 9336 (libhttpd.ep) > trap number = 12 > panic: page fault > cpuid = 0 > Uptime: 6d5h18m39s > Physical memory: 2034 MB > Dumping 1451 MB: 1436 1420 1404 1388 1372 1356 1340 1324 1308 1292 > 1276 1260 1244 1228 1212 1196 1180 1164 1148 1132 1116 1100 1084 1068 > 1052 1036 1020 1004 988 972 956 940 924 908 892 876 860 844 828 812 > 796 780 764 748 732 716 700 684 668 652 636 620 604 588 572 556 540 > 524 508 492 476 460 444 428 412 396 380 364 348 332 316 300 284 268 > 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12 > > Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from > /boot/kernel/accf_http.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/accf_http.ko > #0 doadump () at pcpu.h:195 > 195 pcpu.h: No such file or directory. > in pcpu.h > (kgdb) bt > #0 doadump () at pcpu.h:195 > #1 0x0000000000000004 in ?? () > #2 0xffffffff805285f9 in boot (howto=260) at > /usr/src/sys/kern/kern_shutdown.c:418 > #3 0xffffffff80528a02 in panic (fmt=0x104
bounds>) at /usr/src/sys/kern/kern_shutdown.c:574 > #4 0xffffffff807ec813 in trap_fatal (frame=0xffffff0001b50ae0, > eva=Variable "eva" is not available. > ) at /usr/src/sys/amd64/amd64/trap.c:777 > #5 0xffffffff807ecbe5 in trap_pfault (frame=0xffffff803e69b110, > usermode=0) at /usr/src/sys/amd64/amd64/trap.c:693 > #6 0xffffffff807ed50c in trap (frame=0xffffff803e69b110) at > /usr/src/sys/amd64/amd64/trap.c:464 > #7 0xffffffff807d614e in calltrap () at > /usr/src/sys/amd64/amd64/exception.S:218 > #8 0xffffffff8051a66d in _mtx_lock_sleep (m=0xffffff002f3d7a80, > tid=18446742974226565856, opts=Variable "opts" is not available. > ) > at /usr/src/sys/kern/kern_mutex.c:339 > #9 0xffffffff80701f60 in clnt_dg_create (so=0xffffff00017755a0, > svcaddr=0xffffff803e69b310, program=100000, version=4, sendsz=Variable > "sendsz" is not available. > ) > at /usr/src/sys/rpc/clnt_dg.c:259 > #10 0xffffffff806e97c9 in nlm_get_rpc (sa=Variable "sa" is not > available. > ) at /usr/src/sys/nlm/nlm_prot_impl.c:327 > #11 0xffffffff806e9d39 in nlm_host_get_rpc (host=0xffffff0001705000) > at /usr/src/sys/nlm/nlm_prot_impl.c:1199 > #12 0xffffffff806e680f in nlm_clearlock (host=0xffffff0001705000, > ext=0xffffff803e69b9a0, vers=4, timo=0xffffff803e69b9d0, > retries=2147483647, vp=0xffffff004881edc8, op=2, > fl=0xffffff803e69bac0, flags=64, svid=9336, fhlen=32, > fh=0xffffff803e69b750, > size=689) at /usr/src/sys/nlm/nlm_advlock.c:943 > #13 0xffffffff806e7801 in nlm_advlock_internal (vp=0xffffff004881edc8, > id=Variable "id" is not available. > ) at /usr/src/sys/nlm/nlm_advlock.c:355 > #14 0xffffffff806e8166 in nlm_advlock (ap=Variable "ap" is not > available. > ) at /usr/src/sys/nlm/nlm_advlock.c:392 > #15 0xffffffff806ced28 in nfs_advlock (ap=0xffffff803e69ba90) at > /usr/src/sys/nfsclient/nfs_vnops.c:3153 > #16 0xffffffff804f40e2 in closef (fp=0xffffff0073716d80, > td=0xffffff0001b50ae0) at vnode_if.h:1036 > #17 0xffffffff804f462b in kern_close (td=0xffffff0001b50ae0, > fd=Variable "fd" is not available. > ) at /usr/src/sys/kern/kern_descrip.c:1125 > #18 0xffffffff807ece67 in syscall (frame=0xffffff803e69bc80) at > /usr/src/sys/amd64/amd64/trap.c:920 > #19 0xffffffff807d635b in Xfast_syscall () at > /usr/src/sys/amd64/amd64/exception.S:339 > #20 0x00000008009c5b1c in ?? () > Previous frame inner to this frame (corrupt stack?) > You could try the attached patch, which contains some of the changes in the newer versions of clnt_dg.c. (There have been many changes, so carrying them all across isn't practical, for me at least.) 
I have no way of testing this patch at this time, so all I did was compile it, rick > -- > mark saad | nonesuch@longcount.org > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" ------=_Part_103453_1537264991.1319040255027 Content-Type: text/x-patch; name=nlmdg7.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=nlmdg7.patch LS0tIHJwYy9jbG50X2RnLmMuc2F2CTIwMTEtMTAtMTkgMDk6Mzk6MjguMDAwMDAwMDAwIC0wNDAw CisrKyBycGMvY2xudF9kZy5jCTIwMTEtMTAtMTkgMDk6Mzk6MzkuMDAwMDAwMDAwIC0wNDAwCkBA IC0xMjAsOSArMTIwLDExIEBAIHN0cnVjdCBjdV9zb2NrZXQgewogCXN0cnVjdCBtdHgJCWNzX2xv Y2s7CiAJaW50CQkJY3NfcmVmczsJLyogQ291bnQgb2YgY2xpZW50cyAqLwogCXN0cnVjdCBjdV9y ZXF1ZXN0X2xpc3QJY3NfcGVuZGluZzsJLyogUmVxdWVzdHMgYXdhaXRpbmcgcmVwbGllcyAqLwot CQorCWludAkJCWNzX3VwY2FsbHJlZnM7CS8qIFJlZmNudCBvZiB1cGNhbGxzIGluIHByb2cuKi8K IH07CiAKK3N0YXRpYyB2b2lkIGNsbnRfZGdfdXBjYWxsc2RvbmUoc3RydWN0IHNvY2tldCAqLCBz dHJ1Y3QgY3Vfc29ja2V0ICopOworCiAvKgogICogUHJpdmF0ZSBkYXRhIGtlcHQgcGVyIGNsaWVu dCBoYW5kbGUKICAqLwpAQCAtMjc2LDYgKzI3OCw3IEBAIHJlY2hlY2tfc29ja2V0OgogCQl9CiAJ CW10eF9pbml0KCZjcy0+Y3NfbG9jaywgImNzLT5jc19sb2NrIiwgTlVMTCwgTVRYX0RFRik7CiAJ CWNzLT5jc19yZWZzID0gMTsKKwkJY3MtPmNzX3VwY2FsbHJlZnMgPSAwOwogCQlUQUlMUV9JTklU KCZjcy0+Y3NfcGVuZGluZyk7CiAJCXNvLT5zb191cGNhbGxhcmcgPSBjczsKIAkJc28tPnNvX3Vw Y2FsbCA9IGNsbnRfZGdfc291cGNhbGw7CkBAIC04MTEsMTggKzgxNCwyMyBAQCBjbG50X2RnX2Rl c3Ryb3koQ0xJRU5UICpjbCkKIAl3aGlsZSAoY3UtPmN1X3RocmVhZHMpCiAJCW1zbGVlcChjdSwg JmNzLT5jc19sb2NrLCAwLCAicnBjY2xvc2UiLCAwKTsKIAorCW10eF91bmxvY2soJmNzLT5jc19s b2NrKTsJCS8qIFRvIGF2b2lkIGEgTE9SLiAqLworCVNPQ0tCVUZfTE9DSygmY3UtPmN1X3NvY2tl dC0+c29fcmN2KTsKKwltdHhfbG9jaygmY3MtPmNzX2xvY2spOwogCWNzLT5jc19yZWZzLS07CiAJ aWYgKGNzLT5jc19yZWZzID09IDApIHsKLQkJbXR4X2Rlc3Ryb3koJmNzLT5jc19sb2NrKTsKLQkJ U09DS0JVRl9MT0NLKCZjdS0+Y3Vfc29ja2V0LT5zb19yY3YpOworCQltdHhfdW5sb2NrKCZjcy0+ Y3NfbG9jayk7CiAJCWN1LT5jdV9zb2NrZXQtPnNvX3VwY2FsbGFyZyA9IE5VTEw7CiAJCWN1LT5j dV9zb2NrZXQtPnNvX3VwY2FsbCA9IE5VTEw7CiAJCWN1LT5jdV9zb2NrZXQtPnNvX3Jjdi5zYl9m bGFncyAmPSB+U0JfVVBDQUxMOworCQljbG50X2RnX3VwY2FsbHNkb25lKGN1LT5jdV9zb2NrZXQs IGNzKTsKIAkJU09DS0JVRl9VTkxPQ0soJmN1LT5jdV9zb2NrZXQtPnNvX3Jjdik7CisJCW10eF9k ZXN0cm95KCZjcy0+Y3NfbG9jayk7CiAJCW1lbV9mcmVlKGNzLCBzaXplb2YoKmNzKSk7CiAJCWxh c3Rzb2NrZXRyZWYgPSBUUlVFOwogCX0gZWxzZSB7CiAJCW10eF91bmxvY2soJmNzLT5jc19sb2Nr KTsKKwkJU09DS0JVRl9VTkxPQ0soJmN1LT5jdV9zb2NrZXQtPnNvX3Jjdik7CiAJCWxhc3Rzb2Nr ZXRyZWYgPSBGQUxTRTsKIAl9CiAKQEAgLTg2Myw2ICs4NzEsOSBAQCBjbG50X2RnX3NvdXBjYWxs KHN0cnVjdCBzb2NrZXQgKnNvLCB2b2lkCiAJaW50IGVycm9yLCByY3ZmbGFnLCBmb3VuZHJlcTsK IAl1aW50MzJfdCB4aWQ7CiAKKwltdHhfbG9jaygmY3MtPmNzX2xvY2spOworCWNzLT5jc191cGNh bGxyZWZzKys7CisJbXR4X3VubG9jaygmY3MtPmNzX2xvY2spOwogCXVpby51aW9fcmVzaWQgPSAx MDAwMDAwMDAwOwogCXVpby51aW9fdGQgPSBjdXJ0aHJlYWQ7CiAJZG8gewpAQCAtOTM1LDUgKzk0 NiwyMiBAQCBjbG50X2RnX3NvdXBjYWxsKHN0cnVjdCBzb2NrZXQgKnNvLCB2b2lkCiAJCWlmICgh Zm91bmRyZXEpCiAJCQltX2ZyZWVtKG0pOwogCX0gd2hpbGUgKG0pOworCW10eF9sb2NrKCZjcy0+ Y3NfbG9jayk7CisJY3MtPmNzX3VwY2FsbHJlZnMtLTsKKwltdHhfdW5sb2NrKCZjcy0+Y3NfbG9j ayk7CisJd2FrZXVwKCZjcy0+Y3NfdXBjYWxscmVmcyk7CiB9CiAKKy8qCisgKiBXYWl0IGZvciBh bGwgdXBjYWxscyBpbiBwcm9ncmVzcyB0byBjb21wbGV0ZS4KKyAqLworc3RhdGljIHZvaWQKK2Ns bnRfZGdfdXBjYWxsc2RvbmUoc3RydWN0IHNvY2tldCAqc28sIHN0cnVjdCBjdV9zb2NrZXQgKmNz KQoreworCisJU09DS0JVRl9MT0NLX0FTU0VSVCgmc28tPnNvX3Jjdik7CisKKwl3aGlsZSAoY3Mt PmNzX3VwY2FsbHJlZnMgPiAwKQorCQkodm9pZCkgbXNsZWVwKCZjcy0+Y3NfdXBjYWxscmVmcywg 
U09DS0JVRl9NVFgoJnNvLT5zb19yY3YpLCAwLAorCQkgICAgInJwY2RndXAiLCAwKTsKK30K ------=_Part_103453_1537264991.1319040255027-- From owner-freebsd-fs@FreeBSD.ORG Wed Oct 19 16:21:33 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8DC32106564A; Wed, 19 Oct 2011 16:21:33 +0000 (UTC) (envelope-from florian@wagner-flo.net) Received: from umbracor.wagner-flo.net (umbracor.wagner-flo.net [213.165.81.202]) by mx1.freebsd.org (Postfix) with ESMTP id 4B20F8FC16; Wed, 19 Oct 2011 16:21:33 +0000 (UTC) Received: from naclador.mos32.de (ppp-188-174-59-72.dynamic.mnet-online.de [188.174.59.72]) by umbracor.wagner-flo.net (Postfix) with ESMTPSA id 700D53C06C30; Wed, 19 Oct 2011 18:21:35 +0200 (CEST) Date: Wed, 19 Oct 2011 18:21:30 +0200 From: Florian Wagner To: Andriy Gapon Message-ID: <20111019182130.27446750@naclador.mos32.de> In-Reply-To: <4E9EEF45.9020404@FreeBSD.org> References: <20111015214347.09f68e4e@naclador.mos32.de> <4E9ACA9F.5090308@FreeBSD.org> <20111019082139.1661868e@auedv3.syscomp.de> <4E9EEF45.9020404@FreeBSD.org> X-Mailer: Claws Mail 3.7.9 (GTK+ 2.24.5; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/2Ss+PquXy3E6/Gv5vI3TZ8D"; protocol="application/pgp-signature" Cc: freebsd-fs@FreeBSD.org Subject: Re: Extending zfsboot.c to allow selecting filesystem from boot.config X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Oct 2011 16:21:33 -0000 --Sig_/2Ss+PquXy3E6/Gv5vI3TZ8D Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable > [...] > > > The only thing I was a bit confused by is that on the boot prompt > > only the pool and filename to be booted are printed. >=20 > Do you mean the (gpt)zfsboot prompt? Yes. For a boot.config with "rpool:root/something:" it prints: Booting from Hard Disk... 
/boot.config: rpool FreeBSD/x86 boot Default: rpool:/boot/zfsloader boot: Regards Florian Wagner --Sig_/2Ss+PquXy3E6/Gv5vI3TZ8D Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iEYEARECAAYFAk6e+QsACgkQLvW/2gp2pPyrVwCgh0KS8z/cfwXpHujMgy5VWHqr OmUAoI0GZnF12/M1SJ3xPrNJRa7UAtlR =4KjI -----END PGP SIGNATURE----- --Sig_/2Ss+PquXy3E6/Gv5vI3TZ8D-- From owner-freebsd-fs@FreeBSD.ORG Thu Oct 20 05:26:01 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F04121065673; Thu, 20 Oct 2011 05:26:01 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C7DFC8FC08; Thu, 20 Oct 2011 05:26:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9K5Q1lO007001; Thu, 20 Oct 2011 05:26:01 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9K5Q1ad006997; Thu, 20 Oct 2011 05:26:01 GMT (envelope-from linimon) Date: Thu, 20 Oct 2011 05:26:01 GMT Message-Id: <201110200526.p9K5Q1ad006997@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: bin/161807: [patch] add option for explicitly specifying metadata version to geli(8) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Oct 2011 05:26:02 -0000 Old Synopsis: [patch] add option for explicitly specifying metadata version to geli New Synopsis: [patch] add option for explicitly specifying metadata version to geli(8) Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Thu Oct 20 05:25:18 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=161807 From owner-freebsd-fs@FreeBSD.ORG Thu Oct 20 11:55:44 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F19F2106566C for ; Thu, 20 Oct 2011 11:55:44 +0000 (UTC) (envelope-from unix.co@gmail.com) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id D18978FC12 for ; Thu, 20 Oct 2011 11:55:44 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1RGqvq-00083f-Ii for freebsd-fs@freebsd.org; Thu, 20 Oct 2011 04:37:06 -0700 Date: Thu, 20 Oct 2011 04:37:06 -0700 (PDT) From: umar To: freebsd-fs@freebsd.org Message-ID: <1319110626566-4921174.post@n5.nabble.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Subject: 8TB Partition Problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Oct 2011 11:55:45 -0000 Hi Members, I recently bought a Dell PowerVault NX3100 with four 3 TB hard drives; after creating the RAID array it gives me one 8.xx TB volume. There are also two 140 GB drives attached as RAID 1, and I have installed FreeBSD 8.2 (64-bit) on that RAID 1 volume. After the installation I tried to create a new partition on the 8 TB volume through sysinstall, but it did not work. Then I tried bsdlabel, but it also failed; below is the error message from bsdlabel. bsdlabel: disks with more than 2^32-1 sectors are not supported Would you please help me solve this problem? If FreeBSD cannot support an 8 TB volume, which Linux can, so that I can move to Linux? Best Regards, Umar -- View this message in context: http://freebsd.1045724.n5.nabble.com/8TB-Partition-Problem-tp4921174p4921174.html Sent from the freebsd-fs mailing list archive at Nabble.com.
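[Note: the bsdlabel error above is the classic 2 TB limit. bsdlabel (and MBR) keep sector counts in 32-bit fields, so, assuming 512-byte sectors, the largest disk they can describe is 2^32 x 512 bytes = 2 TiB. An 8 TB volume is roughly 15.6 billion 512-byte sectors, far past that, which is exactly what the error message says. The GPT scheme recommended in the replies below uses 64-bit sector addresses and has no such limit.]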
From owner-freebsd-fs@FreeBSD.ORG Thu Oct 20 12:01:42 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E957C10656D6 for ; Thu, 20 Oct 2011 12:01:42 +0000 (UTC) (envelope-from nowakpl@platinum.linux.pl) Received: from platinum.linux.pl (platinum.edu.pl [81.161.192.4]) by mx1.freebsd.org (Postfix) with ESMTP id A278F8FC23 for ; Thu, 20 Oct 2011 12:01:42 +0000 (UTC) Received: by platinum.linux.pl (Postfix, from userid 87) id 1880447E15; Thu, 20 Oct 2011 14:01:40 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on platinum.linux.pl X-Spam-Level: X-Spam-Status: No, score=-0.4 required=3.0 tests=ALL_TRUSTED,AWL,URI_HEX autolearn=disabled version=3.3.2 Received: from [172.19.191.2] (078088011125.bialystok.vectranet.pl [78.88.11.125]) by platinum.linux.pl (Postfix) with ESMTPA id 0DD5F47E11 for ; Thu, 20 Oct 2011 14:01:35 +0200 (CEST) Message-ID: <4EA00D97.7000005@platinum.linux.pl> Date: Thu, 20 Oct 2011 14:01:27 +0200 From: Adam Nowacki User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.23) Gecko/20110920 Thunderbird/3.1.15 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <1319110626566-4921174.post@n5.nabble.com> In-Reply-To: <1319110626566-4921174.post@n5.nabble.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: 8TB Partition Problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Oct 2011 12:01:43 -0000 On 2011-10-20 13:37, umar wrote: > Hi Member, > > Recently I have buy Dell Power Vault NX3100, with 4 3 TB hardrive after > creating the raid it give me 8.xx TB partition. There is another 2 HD with > 140GB attached with Raid 1. I have install FreeBSD 8.2 64bit on Raid1 > partition. > > After the installation I have tried to create new partition on 8TB drive > through sysinstall but its not working. Then I tried bsdlable but its also > failed below is the error message of bsdlable. > > bsdlabel: disks with more than 2^32-1 sectors are not supported > > Would you please help me how i can solve this problem. If freebsd is not > supported 8 TB then which Linux is supporting so I can move to Linux Use GPT, gpart create -s GPT /dev/device then add partitions with gpart, see http://www.freebsd.org/cgi/man.cgi?query=gpart&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&arch=default&format=html > > Best Regards, > > Umar > > > -- > View this message in context: http://freebsd.1045724.n5.nabble.com/8TB-Partition-Problem-tp4921174p4921174.html > Sent from the freebsd-fs mailing list archive at Nabble.com. 
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Oct 20 12:23:55 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 745051065673 for ; Thu, 20 Oct 2011 12:23:55 +0000 (UTC) (envelope-from numisemis@gmail.com) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id 347F98FC13 for ; Thu, 20 Oct 2011 12:23:54 +0000 (UTC) Received: by ggnq2 with SMTP id q2so1845817ggn.13 for ; Thu, 20 Oct 2011 05:23:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; bh=PMpfrxfJVo/gZHEHGUZ3ZS0ICbxB8QkYgKAjJ3wwIPw=; b=etc99BVK7+jShDFsdgnurSJJfNnjdoA3KpAllnH9D3Y51e1kw09ZKG+AEEX9hpy8hG jdzvLK+uZ8aSx5y5rL9O1CP21DHGONroyVEBAo9r+Z5zwAzjVM4CdsWNMmzMHx7U9Ju7 BMH6YDgB8IlzBm174rIk5Z+rITvI8jh+vlc18= Received: by 10.223.77.77 with SMTP id f13mr10159615fak.19.1319112048967; Thu, 20 Oct 2011 05:00:48 -0700 (PDT) Received: from sime-imac.logos.hr ([213.147.110.159]) by mx.google.com with ESMTPS id y8sm15472870faj.10.2011.10.20.05.00.47 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 20 Oct 2011 05:00:47 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Šimun Mikecin In-Reply-To: <1319110626566-4921174.post@n5.nabble.com> Date: Thu, 20 Oct 2011 14:00:44 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <667290D4-6F62-41C4-A690-28FC11C2AD5F@gmail.com> References: <1319110626566-4921174.post@n5.nabble.com> To: umar X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs Subject: Re: 8TB Partition Problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Oct 2011 12:23:55 -0000 On 20. lis. 2011., at 13:37, umar wrote: > > After the installation I have tried to create new partition on 8TB drive > through sysinstall but its not working. Then I tried bsdlable but its also > failed below is the error message of bsdlable. > > bsdlabel: disks with more than 2^32-1 sectors are not supported > > Would you please help me how i can solve this problem. If freebsd is not > supported 8 TB then which Linux is supporting so I can move to Linux Disks larger than 2TB should be partitioned using GPT instead of MBR. It doesn't matter if you are using FreeBSD or Linux. Instead of using sysinstall and/or bsdlabel for partitioning, you should partition your disk using GPT, see gpart(8).
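[Note: a minimal sketch of the gpart workflow recommended above, assuming the RAID volume shows up as /dev/mfid1 - that device name is only a guess for a Dell PERC-style controller, so substitute whatever actually appears in /dev on your system:

gpart create -s GPT mfid1        # write a GPT partition table
gpart add -t freebsd-ufs mfid1   # one UFS partition; older gpart may need explicit -b/-s values
newfs -U /dev/mfid1p1            # UFS2 with soft updates
mount /dev/mfid1p1 /mnt

"gpart show mfid1" should then list the partition at its full ~8 TB size.]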
From owner-freebsd-fs@FreeBSD.ORG Thu Oct 20 15:09:11 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E2877106566B for ; Thu, 20 Oct 2011 15:09:10 +0000 (UTC) (envelope-from bgold@simons-rock.edu) Received: from hedwig.simons-rock.edu (hedwig.simons-rock.edu [208.81.88.14]) by mx1.freebsd.org (Postfix) with ESMTP id AAC5B8FC17 for ; Thu, 20 Oct 2011 15:09:10 +0000 (UTC) Received: from hp6000new (behemoth.simons-rock.edu [10.30.2.44]) by hedwig.simons-rock.edu (Postfix) with ESMTP id 833402BB35B for ; Thu, 20 Oct 2011 10:40:37 -0400 (EDT) From: "Brian Gold" To: Date: Thu, 20 Oct 2011 10:40:33 -0400 Message-ID: <05b701cc8f36$3934cf70$ab9e6e50$@simons-rock.edu> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AcyPNjkccBdkHrdhSMKChgRJOdOxbQ== Content-Language: en-us Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: system hangs using zfs28 after enabiling nfsshare on a new dataset X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Oct 2011 15:09:11 -0000 Hello all, I've been using the 8.2-RELEASE w/ ZFSv28 from mfsBSD (http://mfsbsd.vx.sk/) for a while now without any issues. Today I decided to try out sharenfs to see how it works. I got everything configured and successfully shared a new dataset (backup/vmimages) to another system on my network. After an hour or so however, the nfs host locked up. I tried rebooting it and it hung shortly after mounting my zfsroot pool (see screenshot http://i.imgur.com/qjJO2.jpg). I rebooted into single-user mode and attempted to run a "zfs set sharenfs=off backup/vmimages", but that hung the system as well. I rebooted again into single-user mode and ran "zfs destroy backup/vmimages" (nothing important on there yet) and even that hangs the system. Any thoughts as to what I can do to troubleshoot this further? 
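[Note: not a diagnosis, but one possible way to get the pool back from a rescue shell (e.g. the mfsBSD image) without re-triggering the share code, assuming the v28 zpool in use supports the -N flag:

zpool import -N backup                  # import the pool but mount nothing
zfs set sharenfs=off backup/vmimages    # turn the share off while unmounted
zfs mount -a                            # now mount the datasets normally

If the hang reproduces after that, capturing "procstat -kk" output for the stuck nfsd/zfs processes would give the list something concrete to look at.]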
From owner-freebsd-fs@FreeBSD.ORG Thu Oct 20 22:44:42 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5C68C1065674; Thu, 20 Oct 2011 22:44:42 +0000 (UTC) (envelope-from rmacklem@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 33DC68FC18; Thu, 20 Oct 2011 22:44:42 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9KMigAl001888; Thu, 20 Oct 2011 22:44:42 GMT (envelope-from rmacklem@freefall.freebsd.org) Received: (from rmacklem@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9KMifS1001884; Thu, 20 Oct 2011 22:44:41 GMT (envelope-from rmacklem) Date: Thu, 20 Oct 2011 22:44:41 GMT Message-Id: <201110202244.p9KMifS1001884@freefall.freebsd.org> To: niakrisn@gmail.com, rmacklem@FreeBSD.org, freebsd-fs@FreeBSD.org, rmacklem@FreeBSD.org From: rmacklem@FreeBSD.org Cc: Subject: Re: kern/156168: [nfs] [panic] Kernel panic under concurrent access over NFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Oct 2011 22:44:42 -0000 Synopsis: [nfs] [panic] Kernel panic under concurrent access over NFS State-Changed-From-To: open->feedback State-Changed-By: rmacklem State-Changed-When: Thu Oct 20 22:42:03 UTC 2011 State-Changed-Why: I have sent the person that reported this a patch to test and am waiting for feedback. I've taken responsibility for this. Responsible-Changed-From-To: freebsd-fs->rmacklem Responsible-Changed-By: rmacklem Responsible-Changed-When: Thu Oct 20 22:42:03 UTC 2011 Responsible-Changed-Why: I have sent the person that reported this a patch for testing and will update the status when I hear back from them. http://www.freebsd.org/cgi/query-pr.cgi?pr=156168 From owner-freebsd-fs@FreeBSD.ORG Fri Oct 21 06:49:46 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E7B131065670 for ; Fri, 21 Oct 2011 06:49:46 +0000 (UTC) (envelope-from unix.co@gmail.com) Received: from sam.nabble.com (sam.nabble.com [216.139.236.26]) by mx1.freebsd.org (Postfix) with ESMTP id C463A8FC08 for ; Fri, 21 Oct 2011 06:49:46 +0000 (UTC) Received: from [192.168.236.26] (helo=sam.nabble.com) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1RH8vK-0000Xd-1e for freebsd-fs@freebsd.org; Thu, 20 Oct 2011 23:49:46 -0700 Date: Thu, 20 Oct 2011 23:49:46 -0700 (PDT) From: umar To: freebsd-fs@freebsd.org Message-ID: <1319179786044-4923804.post@n5.nabble.com> In-Reply-To: <4EA00D97.7000005@platinum.linux.pl> References: <1319110626566-4921174.post@n5.nabble.com> <4EA00D97.7000005@platinum.linux.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Subject: Re: 8TB Partition Problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Oct 2011 06:49:47 -0000 Hi, Thanks it works. 
Best Regards, Umar -- View this message in context: http://freebsd.1045724.n5.nabble.com/8TB-Partition-Problem-tp4921174p4923804.html Sent from the freebsd-fs mailing list archive at Nabble.com. From owner-freebsd-fs@FreeBSD.ORG Fri Oct 21 09:19:36 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2AD251065670 for ; Fri, 21 Oct 2011 09:19:36 +0000 (UTC) (envelope-from shuriku@shurik.kiev.ua) Received: from it-profi.org.ua (graal.shurik.kiev.ua [193.239.74.7]) by mx1.freebsd.org (Postfix) with ESMTP id CBC618FC12 for ; Fri, 21 Oct 2011 09:19:35 +0000 (UTC) Received: from [93.183.237.246] (helo=lenovo-b570.it-profi.org.ua) by it-profi.org.ua with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.76 (FreeBSD)) (envelope-from ) id 1RHAYu-0003IS-5F for freebsd-fs@freebsd.org; Fri, 21 Oct 2011 11:34:44 +0300 Message-ID: <4EA12EC0.2040907@shurik.kiev.ua> Date: Fri, 21 Oct 2011 11:35:12 +0300 From: Alexandr User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0.2) Gecko/20110926 Thunderbird/6.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 (-) X-Spam-Report: Spam detection software, running on the system "graal.it-profi.org.ua", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see The administrator of that system for details. Content preview: Hello! A few weeks ago I have migrated to a new laptop Lenovo B570 and I cannot boot from my HDD. At the start of the boot process HDD led blinks some times and boot continues from Network PXE. Bootloader from 8-RELEASE, 9-STABLE, and 10-CURRENT connot solve this issue. I am using a GPT scheme on my laptop: [...] Content analysis details: (-1.0 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP X-SA-Exim-Connect-IP: 93.183.237.246 X-SA-Exim-Mail-From: shuriku@shurik.kiev.ua X-SA-Exim-Scanned: No (on it-profi.org.ua); SAEximRunCond expanded to false Subject: cannot boot from HDD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Oct 2011 09:19:36 -0000 Hello! A few weeks ago I migrated to a new laptop, a Lenovo B570, and now I cannot boot from my HDD. At the start of the boot process the HDD LED blinks a few times and booting continues from network PXE. Bootloaders from 8-RELEASE, 9-STABLE, and 10-CURRENT cannot solve this issue. I am using a GPT scheme on my laptop:

lenovo-b570# gpart show
=>       34  976773101  ada0  GPT  (465G)
         34        128     1  freebsd-boot  (64k)
        162  976772973     2  freebsd-zfs  (465G)

For now, to boot my system I am using a bootable USB flash drive with GRUB2 installed; choosing "boot from HDD MBR" in its menu boots my system. I discussed this problem in our local mailing list, but with no success.
The only way I see to resolve my problem at this time is to use such scheme: bios-boot (with Grub2) freebsd-boot freebsd-zfs From owner-freebsd-fs@FreeBSD.ORG Fri Oct 21 15:38:47 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4AFC5106566B; Fri, 21 Oct 2011 15:38:47 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) by mx1.freebsd.org (Postfix) with ESMTP id 0B0D88FC18; Fri, 21 Oct 2011 15:38:46 +0000 (UTC) Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id 36DEF28424; Fri, 21 Oct 2011 17:38:45 +0200 (CEST) Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz [86.49.61.235]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id DF63D28422; Fri, 21 Oct 2011 17:38:43 +0200 (CEST) Message-ID: <4EA19203.5050503@quip.cz> Date: Fri, 21 Oct 2011 17:38:43 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14 MIME-Version: 1.0 To: Ivan Voras References: <4E97FEDD.7060205@quip.cz> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: dirhash and dynamic memory allocation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Oct 2011 15:38:47 -0000 Hi, I am back on this topic... Ivan Voras wrote: > On 14/10/2011 11:20, Miroslav Lachman wrote: >> Hi all, >> >> I tried some tuning of dirhash on our servers and after googlig a bit, I >> found an old GSoC project wiki page about Dynamic Memory Allocation for >> Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory >> Is there any reason not to use it / not commit it to HEAD? > > AFAIK it's sort-of already present. In 8-stable and recent kernels you > can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem > (but except in really large edge cases I don't think you *need* more > than 32 MB), and the kernel will scale-down or free the memory if not > needed. > > In effect, vfs.ufs.dirhash_maxmem is the upper limit - the kernel will > use less and will free the allocated memory in low memory situations > (which I've tried and it works). So the current behavior is that on 7.3+ and 8.x we have smaller average dirhash buffer (by default) than it was initialy 10 years ago. Because it starts as 2MB fixed size and now we have 2MB max, which is lowered in low mem situations... and sometimes it is set to 0MB! I caught this 2 days ago: root@rip ~/# sysctl vfs.ufs vfs.ufs.dirhash_reclaimage: 5 vfs.ufs.dirhash_lowmemcount: 36953 vfs.ufs.dirhash_docheck: 0 vfs.ufs.dirhash_mem: 0 vfs.ufs.dirhash_maxmem: 8388608 vfs.ufs.dirhash_minsize: 2560 I set maxmem to 8MB in sysctl.conf to increase performance and dirhash_mem 0 is really bad surprise! 
I am worrying about bad performance in situation where dirhash is emptied in situations, where server is already running at maximum performance (there is some memory hungry process and system can start swapping to disk + dirhash is efectively disabled) I found a PR kern/145246 http://www.freebsd.org/cgi/query-pr.cgi?pr=145246 Is it possible to add some dirhash_minmem limit to not clear all the dirhash memory? So I can set dirhash_minmem=2MB dirhash_maxmem=16MB and then dirhash_mem will be allways between these two limits? >> And second question - is there any negative impact with higher >> vfs.ufs.dirhash_maxmem? It stil defaults to 2MB (on FreeBSD 8.2) after > > Not that I know of. > >> 10 years, but I think we all are using bigger FS in these days with lot >> of files and directories and 2MB is not enough. > > AFAIK I've changed it to autotune so it's configured to approximately 4 > MB on a 4 GB machine (and scales up) in 9. I didn't tried 9 yet. Does it mean dirhash_maxmem is initially set to approximately 1% of physical RAM and then it can be set higher by sysctl as in older versions? Miroslav Lachman From owner-freebsd-fs@FreeBSD.ORG Fri Oct 21 16:20:27 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D881B1065673 for ; Fri, 21 Oct 2011 16:20:27 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta13.westchester.pa.mail.comcast.net (qmta13.westchester.pa.mail.comcast.net [76.96.59.243]) by mx1.freebsd.org (Postfix) with ESMTP id 852EF8FC13 for ; Fri, 21 Oct 2011 16:20:27 +0000 (UTC) Received: from omta19.westchester.pa.mail.comcast.net ([76.96.62.98]) by qmta13.westchester.pa.mail.comcast.net with comcast id nU2J1h00927AodY5DULT8E; Fri, 21 Oct 2011 16:20:27 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta19.westchester.pa.mail.comcast.net with comcast id nULS1h00k1t3BNj3fULTcD; Fri, 21 Oct 2011 16:20:27 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 30DE3102C1C; Fri, 21 Oct 2011 09:20:25 -0700 (PDT) Date: Fri, 21 Oct 2011 09:20:25 -0700 From: Jeremy Chadwick To: Miroslav Lachman <000.fbsd@quip.cz> Message-ID: <20111021162025.GA89885@icarus.home.lan> References: <4E97FEDD.7060205@quip.cz> <4EA19203.5050503@quip.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4EA19203.5050503@quip.cz> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Ivan Voras Subject: Re: dirhash and dynamic memory allocation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Oct 2011 16:20:27 -0000 On Fri, Oct 21, 2011 at 05:38:43PM +0200, Miroslav Lachman wrote: > Hi, I am back on this topic... > > Ivan Voras wrote: > >On 14/10/2011 11:20, Miroslav Lachman wrote: > >>Hi all, > >> > >>I tried some tuning of dirhash on our servers and after googlig a bit, I > >>found an old GSoC project wiki page about Dynamic Memory Allocation for > >>Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory > >>Is there any reason not to use it / not commit it to HEAD? > > > >AFAIK it's sort-of already present. 
In 8-stable and recent kernels you > >can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem > >(but except in really large edge cases I don't think you *need* more > >than 32 MB), and the kernel will scale-down or free the memory if not > >needed. > > > >In effect, vfs.ufs.dirhash_maxmem is the upper limit - the kernel will > >use less and will free the allocated memory in low memory situations > >(which I've tried and it works). > > So the current behavior is that on 7.3+ and 8.x we have smaller > average dirhash buffer (by default) than it was initialy 10 years > ago. Because it starts as 2MB fixed size and now we have 2MB max, > which is lowered in low mem situations... and sometimes it is set to > 0MB! > > I caught this 2 days ago: > > root@rip ~/# sysctl vfs.ufs > vfs.ufs.dirhash_reclaimage: 5 > vfs.ufs.dirhash_lowmemcount: 36953 > vfs.ufs.dirhash_docheck: 0 > vfs.ufs.dirhash_mem: 0 > vfs.ufs.dirhash_maxmem: 8388608 > vfs.ufs.dirhash_minsize: 2560 > > I set maxmem to 8MB in sysctl.conf to increase performance and > dirhash_mem 0 is really bad surprise! Actually, the "bad surprise" is dirhash_lowmemcount of 36953. You increasing dirhash_maxmem is fine -- what you're seeing is that your machine keeps running out of memory, or that your directories are filled with so many files that you're exhausting the dirhash repetitively. I'm going to be blunt and just ask it: why does that happen? Or do you have a filesystem that has an absurdly high number of files in a single directory? If the former, ignore the next paragraph I've harped on this before on the mailing list: one of the first things I learned as a system administrator was that you Do Not(tm) fill directories with tens of thousands of files. Split them up into subdirs. Even caching daemons (squid, etc.) support this kind of thing; filename "aj1j11hsfkqXaj21" should really be aj/1j/11hsfkqXaj21. You get the idea. DNS/BIND administrators of systems which have tens of thousands of domains are quite familiar with this scenario too. > I am worrying about bad performance in situation where dirhash is > emptied in situations, where server is already running at maximum > performance (there is some memory hungry process and system can > start swapping to disk + dirhash is efectively disabled) > > I found a PR kern/145246 > http://www.freebsd.org/cgi/query-pr.cgi?pr=145246 > > Is it possible to add some dirhash_minmem limit to not clear all the > dirhash memory? > So I can set dirhash_minmem=2MB dirhash_maxmem=16MB and then > dirhash_mem will be allways between these two limits? dirhash shouldn't be "disabled", it's that memory pressure from other things has priority over the dirhash, but I understand what you mean. This is quite evident from dirhash_lowmemcount being so high. I understand what you want, and maybe there is a way to get what you want (with little effort), but I am strongly inclined to say you need to figure out on your system what is causing such memory pressure and solve that. Honest: try to solve the real problem rather than dancing around it. If you have a process that skyrockets in RSS/RES usage due to a memory leak or out-of-control design (such as a daemonised perl script which blindly uses .= to append data to a scalar, or blindly keeps appending data to the end of a list), then fix that problem. Basically I'm trying to say that it shouldn't be the responsibility of dirhash to "work around" other problems happening on the system that diminish or exhaust available memory. 
You end up with a kernel design that has tons of one-offs in it and that does nothing but bite you in the butt down the road. (Linux has been through this many times over.) -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Fri Oct 21 19:11:13 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2EC9F106566C; Fri, 21 Oct 2011 19:11:13 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) by mx1.freebsd.org (Postfix) with ESMTP id A14E68FC21; Fri, 21 Oct 2011 19:11:12 +0000 (UTC) Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id 897D128424; Fri, 21 Oct 2011 21:11:10 +0200 (CEST) Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz [86.49.61.235]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id 9438428423; Fri, 21 Oct 2011 21:11:08 +0200 (CEST) Message-ID: <4EA1C3CC.3090500@quip.cz> Date: Fri, 21 Oct 2011 21:11:08 +0200 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14 MIME-Version: 1.0 To: Jeremy Chadwick References: <4E97FEDD.7060205@quip.cz> <4EA19203.5050503@quip.cz> <20111021162025.GA89885@icarus.home.lan> In-Reply-To: <20111021162025.GA89885@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Ivan Voras Subject: Re: dirhash and dynamic memory allocation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Oct 2011 19:11:13 -0000 Jeremy Chadwick wrote: > On Fri, Oct 21, 2011 at 05:38:43PM +0200, Miroslav Lachman wrote: >> Hi, I am back on this topic... >> >> Ivan Voras wrote: >>> On 14/10/2011 11:20, Miroslav Lachman wrote: >>>> Hi all, >>>> >>>> I tried some tuning of dirhash on our servers and after googlig a bit, I >>>> found an old GSoC project wiki page about Dynamic Memory Allocation for >>>> Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory >>>> Is there any reason not to use it / not commit it to HEAD? >>> >>> AFAIK it's sort-of already present. In 8-stable and recent kernels you >>> can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem >>> (but except in really large edge cases I don't think you *need* more >>> than 32 MB), and the kernel will scale-down or free the memory if not >>> needed. >>> >>> In effect, vfs.ufs.dirhash_maxmem is the upper limit - the kernel will >>> use less and will free the allocated memory in low memory situations >>> (which I've tried and it works). >> >> So the current behavior is that on 7.3+ and 8.x we have smaller >> average dirhash buffer (by default) than it was initialy 10 years >> ago. Because it starts as 2MB fixed size and now we have 2MB max, >> which is lowered in low mem situations... and sometimes it is set to >> 0MB! 
>> >> I caught this 2 days ago: >> >> root@rip ~/# sysctl vfs.ufs >> vfs.ufs.dirhash_reclaimage: 5 >> vfs.ufs.dirhash_lowmemcount: 36953 >> vfs.ufs.dirhash_docheck: 0 >> vfs.ufs.dirhash_mem: 0 >> vfs.ufs.dirhash_maxmem: 8388608 >> vfs.ufs.dirhash_minsize: 2560 >> >> I set maxmem to 8MB in sysctl.conf to increase performance and >> dirhash_mem 0 is really bad surprise! > > Actually, the "bad surprise" is dirhash_lowmemcount of 36953. You > increasing dirhash_maxmem is fine -- what you're seeing is that your > machine keeps running out of memory, or that your directories are filled > with so many files that you're exhausting the dirhash repetitively. > > I'm going to be blunt and just ask it: why does that happen? Or do you > have a filesystem that has an absurdly high number of files in a single > directory? If the former, ignore the next paragraph There are not absurdly high number of files in a single directory, because I know this potential problem and I am fighting against it with webapp developers. But I see similar lowmemcount on almost all UFS based servers. Most of them are for webhosting (running OpenSource or proprietary CMS, so the most content is in MySQL). Many of our servers have long uptime (about or more than year), so the lowmemcount numbers are higher on them. Webservers are hosting about 100-150 websites. Examples from 4 of our servers: vfs.ufs.dirhash_lowmemcount: 45295 up 39 days vfs.ufs.dirhash_lowmemcount: 164782 up 419 days vfs.ufs.dirhash_lowmemcount: 391452 up 102 days vfs.ufs.dirhash_lowmemcount: 633202 up 417 days Only few of our servers have lowmemcount lower than 1000 (but stil higher than 500) One example is server with jails, where UFS is used only for host system, and jails are on ZFS. This server has 4GB of RAM and 362MB used swap space: vfs.ufs.dirhash_lowmemcount: 936 up 284 days > I've harped on this before on the mailing list: one of the first things > I learned as a system administrator was that you Do Not(tm) fill > directories with tens of thousands of files. Split them up into > subdirs. Even caching daemons (squid, etc.) support this kind of thing; > filename "aj1j11hsfkqXaj21" should really be aj/1j/11hsfkqXaj21. You > get the idea. DNS/BIND administrators of systems which have tens of > thousands of domains are quite familiar with this scenario too. > >> I am worrying about bad performance in situation where dirhash is >> emptied in situations, where server is already running at maximum >> performance (there is some memory hungry process and system can >> start swapping to disk + dirhash is efectively disabled) >> >> I found a PR kern/145246 >> http://www.freebsd.org/cgi/query-pr.cgi?pr=145246 >> >> Is it possible to add some dirhash_minmem limit to not clear all the >> dirhash memory? >> So I can set dirhash_minmem=2MB dirhash_maxmem=16MB and then >> dirhash_mem will be allways between these two limits? > > dirhash shouldn't be "disabled", it's that memory pressure from other > things has priority over the dirhash, but I understand what you mean. > This is quite evident from dirhash_lowmemcount being so high. > > I understand what you want, and maybe there is a way to get what you > want (with little effort), but I am strongly inclined to say you need to > figure out on your system what is causing such memory pressure and solve > that. Honest: try to solve the real problem rather than dancing around > it. 
If you have a process that skyrockets in RSS/RES usage due to a > memory leak or out-of-control design (such as a daemonised perl script > which blindly uses .= to append data to a scalar, or blindly keeps > appending data to the end of a list), then fix that problem. As the servers are running 3rd party apps (customer's websites), it is out of my control to fix issues with PHP CMS etc. So low memory fix "is easy" - buy and add more RAM. > Basically I'm trying to say that it shouldn't be the responsibility of > dirhash to "work around" other problems happening on the system that > diminish or exhaust available memory. You end up with a kernel design > that has tons of one-offs in it and that does nothing but bite you in > the butt down the road. (Linux has been through this many times over.) You are partially right. But dirhash lowmemhook seems too sensitive to me. I see high lowmemcount numbers on systems with almost empty swap. (few kB in swap, not MBs) That's why I am looking for dirhash_minmem. Miroslav Lachman From owner-freebsd-fs@FreeBSD.ORG Fri Oct 21 20:31:54 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63BDF1065675 for ; Fri, 21 Oct 2011 20:31:54 +0000 (UTC) (envelope-from subbsd@gmail.com) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 25EE58FC1E for ; Fri, 21 Oct 2011 20:31:53 +0000 (UTC) Received: by vcbfo13 with SMTP id fo13so5686429vcb.13 for ; Fri, 21 Oct 2011 13:31:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:cc:content-type; bh=Q4CXmt28iRppaeiBMeKXq1KIWNbC2C5CC0in4Cy3T9M=; b=iHnQqk2VjZcJbesFWFX3DCqoaxkWznOU92Ixa1jmfffDc+SqRlJUnhykuZlZ2E3hN5 P+eKEe/i6Vs+kMDSnwK0QCIIHSknaKTh4Sr1o2n10i9ABk9Mn7ogAfm6z6aVPnZ0oQcP LE+nXroyOlekbp5ImNJlXABTJbVGZLc7u7KqE= MIME-Version: 1.0 Received: by 10.220.106.206 with SMTP id y14mr1164604vco.109.1319227735002; Fri, 21 Oct 2011 13:08:55 -0700 (PDT) Received: by 10.220.160.197 with HTTP; Fri, 21 Oct 2011 13:08:54 -0700 (PDT) Date: Sat, 22 Oct 2011 00:08:54 +0400 Message-ID: From: Subbsd To: freebsd-stable@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org Subject: VFS problem with ?fcntl SETLK? and nullfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Oct 2011 20:31:54 -0000 Hi I found a bad issue in FreeBSD mounts nullfs file system, which may appear in the random. Initially, I get problems on FreeBSD-current on the host that have a large number JAIL at the time when they start. Handbook scenario: 1) have readonly base (for example /usr/jails/base) 2) have write area for jail personal data (for example: /usr/jails/j1data/{home,var,local,...}) 3) mount RO base to new jail location, then mount RW part data above RO In some cases, i watched the freeze of the system when working nullfs mount, but could not find a reason. 
On a test environment I tried to stress mount_nullfs with different kinds of activity on the source directory:
- a huge read overload via dd(1) - no effect
- a huge write overload via dd(1) - no effect
- a script deleting and creating random files in large numbers - no effect

But now I can show the problem easily and with a 100% guarantee: it is triggered by running "svn cleanup", for example on the /usr/src directory obtained from SVN. If you start "svn cleanup" in /usr/src and at the same time try to mount_nullfs it, the problem appears. As far as I can see, cleanup locks files frequently; it seems to me that one of those locks is left behind and turns into a deadlock.

I wrote a sample script simulating the problem. I do the mount-ro + mount-rw rotation deliberately - it repeats the procedure described in the jail section of the Handbook. Since the problem can appear at a random moment I made an infinite loop, but I usually get the problem on the first pass. Here it is:

-------/cut/-----
#!/bin/sh
SRCROOT="/usr/src"
DSTROOT="/usr/nullfstest"
ITER=`seq 100`
MOUNTO=`find ${SRCROOT} -type d -maxdepth 1 -exec basename {} \;`
[ -d "${DSTROOT}" ] || mkdir $DSTROOT
mount_subdir() {
    for mto in ${MOUNTO}; do
        if [ -d "${1}/$mto" ]; then
            mount -orw -t nullfs /bin ${1}/${mto}
        fi
    done
}
cd ${SRCROOT}
while [ 1 ]; do
    echo "Mount phase"
    lockf -s -t0 /tmp/svn.lock svn cleanup &
    for iter in $ITER; do
        DST="${DSTROOT}/${iter}"
        [ -d "${DST}" ] || mkdir ${DST}
        mount -oro -t nullfs ${SRCROOT} ${DST}
        mount_subdir ${DST}
    done
    echo "Unmount phase"
    mount -t nullfs |awk {'printf "umount -f "$3"\n"'} |sh
done
-------/end of cut/-----

The last syscall I can see from svn cleanup is fcntl(3,F_SETLK,0x7fffffffc9b0), where 3 is the fd of some .svn/ file. It looks like the system (kernel) still works at this point, but if any process or session then touches the source directory (in this example /usr/src), for example:

cd /usr/src
fstat /usr/src/*
ls /usr/src/

you get a filesystem deadlock. In addition, the system in this state does not reboot without help - it never returns from the "free buffer" stage of shutdown. The bug also exists in FreeBSD 9.0-RC1.

PS: An important detail - I could not reproduce the problem on FreeBSD running under a virtual machine (VirtualBox) - maybe due to a tick/kern.hz issue?

PS2: The file system does not matter; I get the problem on ZFS as well as on UFS.

Please check this information; it seems that this is serious. Thanks.
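[Note: a small suggestion for anyone reproducing this - if the console still responds when the deadlock hits, the kernel stacks of the wedged processes are usually the first thing developers ask for, e.g.:

procstat -kk $(pgrep svn)      # kernel stack of the stuck "svn cleanup"
procstat -kk $(pgrep -x ls)    # and of whatever else is hung in the tree

If the machine is wedged harder but DDB is compiled in, "show lockedvnods" and "ps" from the debugger give a similar picture.]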
From owner-freebsd-fs@FreeBSD.ORG Fri Oct 21 20:40:16 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 85076106566B for ; Fri, 21 Oct 2011 20:40:16 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5B0748FC12 for ; Fri, 21 Oct 2011 20:40:16 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9LKeGYd070338 for ; Fri, 21 Oct 2011 20:40:16 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9LKeGPG070337; Fri, 21 Oct 2011 20:40:16 GMT (envelope-from gnats) Date: Fri, 21 Oct 2011 20:40:16 GMT Message-Id: <201110212040.p9LKeGPG070337@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Robert Millan Cc: Subject: Re: kern/150207: zpool(1): zpool import -d /dev tries to open weird devices X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Robert Millan List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Oct 2011 20:40:16 -0000 The following reply was made to PR kern/150207; it has been noted by GNATS. From: Robert Millan To: bug-followup@FreeBSD.org, aurelien@aurel32.net Cc: pjd@freebsd.org Subject: Re: kern/150207: zpool(1): zpool import -d /dev tries to open weird devices Date: Fri, 21 Oct 2011 22:12:38 +0200 This might have been fixed by pjd@ in r219089. A quick look in the code doesn't show traces of this problem. From owner-freebsd-fs@FreeBSD.ORG Sat Oct 22 02:16:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35]) by hub.freebsd.org (Postfix) with ESMTP id 39F8F106566C; Sat, 22 Oct 2011 02:16:28 +0000 (UTC) (envelope-from dougb@FreeBSD.org) Received: from 172-17-198-245.globalsuite.net (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id B5CEA14E61A; Sat, 22 Oct 2011 02:16:27 +0000 (UTC) Message-ID: <4EA2277B.5080306@FreeBSD.org> Date: Fri, 21 Oct 2011 19:16:27 -0700 From: Doug Barton Organization: http://SupersetSolutions.com/ User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:7.0.1) Gecko/20111001 Thunderbird/7.0.1 MIME-Version: 1.0 To: Miroslav Lachman <000.fbsd@quip.cz> References: <4E97FEDD.7060205@quip.cz> <4EA19203.5050503@quip.cz> In-Reply-To: <4EA19203.5050503@quip.cz> X-Enigmail-Version: undefined OpenPGP: id=1A1ABC84 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Ivan Voras Subject: Re: dirhash and dynamic memory allocation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Oct 2011 02:16:28 -0000 On 10/21/2011 08:38, Miroslav Lachman wrote: > Hi, I am back on this topic... > > Ivan Voras wrote: >> On 14/10/2011 11:20, Miroslav Lachman wrote: >>> Hi all, >>> >>> I tried some tuning of dirhash on our servers and after googlig a bit, I >>> found an old GSoC project wiki page about Dynamic Memory Allocation for >>> Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory >>> Is there any reason not to use it / not commit it to HEAD? 
>>
>> AFAIK it's sort-of already present. In 8-stable and recent kernels you can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem (but except in really large edge cases I don't think you *need* more than 32 MB), and the kernel will scale down or free the memory if not needed.
>>
>> In effect, vfs.ufs.dirhash_maxmem is the upper limit - the kernel will use less and will free the allocated memory in low memory situations (which I've tried, and it works).
>
> So the current behavior is that on 7.3+ and 8.x we have a smaller average dirhash buffer (by default) than we had initially 10 years ago, because it started as a 2MB fixed size and now we have a 2MB maximum, which is lowered in low memory situations... and sometimes it is set to 0MB!
>
> I caught this 2 days ago:
>
> root@rip ~/# sysctl vfs.ufs
> vfs.ufs.dirhash_reclaimage: 5
> vfs.ufs.dirhash_lowmemcount: 36953
> vfs.ufs.dirhash_docheck: 0
> vfs.ufs.dirhash_mem: 0
> vfs.ufs.dirhash_maxmem: 8388608
> vfs.ufs.dirhash_minsize: 2560
>
> I set maxmem to 8MB in sysctl.conf to increase performance, and dirhash_mem 0 is a really bad surprise!
>
> I am worried about bad performance when dirhash is emptied while the server is already running at maximum load (some memory-hungry process makes the system start swapping to disk, and on top of that dirhash is effectively disabled).
>
> I found a PR kern/145246
> http://www.freebsd.org/cgi/query-pr.cgi?pr=145246
>
> Is it possible to add some dirhash_minmem limit so that the dirhash memory is never cleared completely? So I can set dirhash_minmem=2MB and dirhash_maxmem=16MB, and then dirhash_mem will always be between these two limits?

Isn't that what vfs.ufs.dirhash_minsize is for? I think, given that there is a lot more memory in modern systems, setting that higher by default is probably a good idea. Or maybe I'm misunderstanding what that knob does?

>>> And second question - is there any negative impact with higher
>>> vfs.ufs.dirhash_maxmem? It still defaults to 2MB (on FreeBSD 8.2) after
>>
>> Not that I know of.
>>
>>> 10 years, but I think we are all using bigger filesystems these days, with lots of files and directories, and 2MB is not enough.
>>
>> AFAIK I've changed it to autotune so it's configured to approximately 4 MB on a 4 GB machine (and scales up) in 9.
>
> I didn't try 9 yet. Does it mean dirhash_maxmem is initially set to approximately 1% of physical RAM and then can be set higher by sysctl as in older versions?

I'm not sure that's what's happening; I have 6G of RAM in this box and I have this by default:

vfs.ufs.dirhash_maxmem: 9977856
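For reference, checking and adjusting the limit looks something like this (an illustrative sketch only - the 16 MB value is an arbitrary example, not a recommendation):

# compare current usage against the ceiling
sysctl vfs.ufs.dirhash_mem vfs.ufs.dirhash_maxmem
# raise the ceiling to 16 MB on the running system
sysctl vfs.ufs.dirhash_maxmem=16777216
# make the setting persistent across reboots
echo 'vfs.ufs.dirhash_maxmem=16777216' >> /etc/sysctl.conf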
--
Nothin' ever doesn't change, but nothin' changes much. -- OK Go
Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price. :) http://SupersetSolutions.com/

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 22 03:04:06 2011
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 52F431065673 for ; Sat, 22 Oct 2011 03:04:06 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org)
Received: from qmta08.westchester.pa.mail.comcast.net (qmta08.westchester.pa.mail.comcast.net [76.96.62.80]) by mx1.freebsd.org (Postfix) with ESMTP id 139068FC17 for ; Sat, 22 Oct 2011 03:04:05 +0000 (UTC)
Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88]) by qmta08.westchester.pa.mail.comcast.net with comcast id nepN1h0011uE5Es58f462m; Sat, 22 Oct 2011 03:04:06 +0000
Received: from koitsu.dyndns.org ([67.180.84.87]) by omta16.westchester.pa.mail.comcast.net with comcast id nf451h00J1t3BNj3cf45rQ; Sat, 22 Oct 2011 03:04:06 +0000
Received: by icarus.home.lan (Postfix, from userid 1000) id BFFAA102C1C; Fri, 21 Oct 2011 20:04:03 -0700 (PDT)
Date: Fri, 21 Oct 2011 20:04:03 -0700
From: Jeremy Chadwick
To: Doug Barton
Message-ID: <20111022030403.GA176@icarus.home.lan>
References: <4E97FEDD.7060205@quip.cz> <4EA19203.5050503@quip.cz> <4EA2277B.5080306@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4EA2277B.5080306@FreeBSD.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org, Ivan Voras
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 22 Oct 2011 03:04:06 -0000

On Fri, Oct 21, 2011 at 07:16:27PM -0700, Doug Barton wrote:
> On 10/21/2011 08:38, Miroslav Lachman wrote:
> > Hi, I am back on this topic...
> >
> > Ivan Voras wrote:
> >> On 14/10/2011 11:20, Miroslav Lachman wrote:
> >>> Hi all,
> >>>
> >>> I tried some tuning of dirhash on our servers and after googling a bit, I found an old GSoC project wiki page about Dynamic Memory Allocation for Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory
> >>> Is there any reason not to use it / not commit it to HEAD?
> >>
> >> AFAIK it's sort-of already present. In 8-stable and recent kernels you can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem (but except in really large edge cases I don't think you *need* more than 32 MB), and the kernel will scale down or free the memory if not needed.
> >>
> >> In effect, vfs.ufs.dirhash_maxmem is the upper limit - the kernel will use less and will free the allocated memory in low memory situations (which I've tried, and it works).
> >
> > So the current behavior is that on 7.3+ and 8.x we have a smaller average dirhash buffer (by default) than we had initially 10 years ago, because it started as a 2MB fixed size and now we have a 2MB maximum, which is lowered in low memory situations... and sometimes it is set to 0MB!
> >
> > I caught this 2 days ago:
> >
> > root@rip ~/# sysctl vfs.ufs
> > vfs.ufs.dirhash_reclaimage: 5
> > vfs.ufs.dirhash_lowmemcount: 36953
> > vfs.ufs.dirhash_docheck: 0
> > vfs.ufs.dirhash_mem: 0
> > vfs.ufs.dirhash_maxmem: 8388608
> > vfs.ufs.dirhash_minsize: 2560
> >
> > I set maxmem to 8MB in sysctl.conf to increase performance, and dirhash_mem 0 is a really bad surprise!
> >
> > I am worried about bad performance when dirhash is emptied while the server is already running at maximum load (some memory-hungry process makes the system start swapping to disk, and on top of that dirhash is effectively disabled).
> >
> > I found a PR kern/145246
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=145246
> >
> > Is it possible to add some dirhash_minmem limit so that the dirhash memory is never cleared completely? So I can set dirhash_minmem=2MB and dirhash_maxmem=16MB, and then dirhash_mem will always be between these two limits?
>
> Isn't that what vfs.ufs.dirhash_minsize is for? I think, given that there is a lot more memory in modern systems, setting that higher by default is probably a good idea. Or maybe I'm misunderstanding what that knob does?

I believe the function of that sysctl is different. It's not the "minimum amount of dirhash memory to retain", it's:

$ sysctl -d vfs.ufs.dirhash_minsize
vfs.ufs.dirhash_minsize: minimum directory size in bytes for which to use hashed lookup

The sysctl should really be named "dirhash_mindirsize".

--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 22 03:13:22 2011
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35]) by hub.freebsd.org (Postfix) with ESMTP id D7FA31065675; Sat, 22 Oct 2011 03:13:22 +0000 (UTC) (envelope-from dougb@FreeBSD.org)
Received: from 172-17-198-245.globalsuite.net (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 492011507AF; Sat, 22 Oct 2011 03:13:22 +0000 (UTC)
Message-ID: <4EA234D1.7000805@FreeBSD.org>
Date: Fri, 21 Oct 2011 20:13:21 -0700
From: Doug Barton
Organization: http://SupersetSolutions.com/
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:7.0.1) Gecko/20111001 Thunderbird/7.0.1
MIME-Version: 1.0
To: Jeremy Chadwick
References: <4E97FEDD.7060205@quip.cz> <4EA19203.5050503@quip.cz> <4EA2277B.5080306@FreeBSD.org> <20111022030403.GA176@icarus.home.lan>
In-Reply-To: <20111022030403.GA176@icarus.home.lan>
X-Enigmail-Version: undefined
OpenPGP: id=1A1ABC84
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, Ivan Voras
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 22 Oct 2011 03:13:23 -0000

On 10/21/2011 20:04, Jeremy Chadwick wrote:
> On Fri, Oct 21, 2011 at 07:16:27PM -0700, Doug Barton wrote:
>> Isn't that what vfs.ufs.dirhash_minsize is for? I think, given that there is a lot more memory in modern systems, setting that higher by default is probably a good idea. Or maybe I'm misunderstanding what that knob does?
>
> I believe the function of that sysctl is different. It's not the "minimum amount of dirhash memory to retain", it's:
>
> $ sysctl -d vfs.ufs.dirhash_minsize
> vfs.ufs.dirhash_minsize: minimum directory size in bytes for which to use hashed lookup

Ah, silly me. I'm so used to 'sysctl -d' not working that I didn't even try it this time. Thanks for setting me straight.

In that case I agree with the OP that a knob for minimum setting would be desirable.
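Until such a knob exists, something like this crude watchdog could at least flag the situation (an untested sketch using only base-system tools):

# poll dirhash_mem once a minute and log whenever the hash
# has been reclaimed all the way down to zero
while sleep 60; do
    [ "$(sysctl -n vfs.ufs.dirhash_mem)" -eq 0 ] && \
        logger -t dirhash "dirhash_mem reclaimed to 0"
done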
Doug

--
Nothin' ever doesn't change, but nothin' changes much. -- OK Go
Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price. :) http://SupersetSolutions.com/

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 22 05:06:03 2011
Return-Path:
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 86637106564A; Sat, 22 Oct 2011 05:06:03 +0000 (UTC) (envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5E3978FC0A; Sat, 22 Oct 2011 05:06:03 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9M563sH052058; Sat, 22 Oct 2011 05:06:03 GMT (envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9M563tA052054; Sat, 22 Oct 2011 05:06:03 GMT (envelope-from linimon)
Date: Sat, 22 Oct 2011 05:06:03 GMT
Message-Id: <201110220506.p9M563tA052054@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc:
Subject: Re: kern/161864: [ufs] removing journaling from UFS partition fails on gpt-labelled disk
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 22 Oct 2011 05:06:03 -0000

Old Synopsis: removing journaling from UFS partition fails on gpt-labelled disk
New Synopsis: [ufs] removing journaling from UFS partition fails on gpt-labelled disk

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sat Oct 22 05:05:27 UTC 2011
Responsible-Changed-Why: This kind of spans several areas, so I'll try to guess at a label for it as I assign it.
http://www.freebsd.org/cgi/query-pr.cgi?pr=161864

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 22 09:52:04 2011
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A0F32106566B; Sat, 22 Oct 2011 09:52:04 +0000 (UTC) (envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) by mx1.freebsd.org (Postfix) with ESMTP id 4B4198FC14; Sat, 22 Oct 2011 09:52:03 +0000 (UTC)
Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id 582F928424; Sat, 22 Oct 2011 11:52:02 +0200 (CEST)
Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz [86.49.61.235]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id DC2BB28423; Sat, 22 Oct 2011 11:52:00 +0200 (CEST)
Message-ID: <4EA2923F.7060303@quip.cz>
Date: Sat, 22 Oct 2011 11:51:59 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14
MIME-Version: 1.0
To: Doug Barton
References: <4E97FEDD.7060205@quip.cz> <4EA19203.5050503@quip.cz> <4EA2277B.5080306@FreeBSD.org> <20111022030403.GA176@icarus.home.lan> <4EA234D1.7000805@FreeBSD.org>
In-Reply-To: <4EA234D1.7000805@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, Ivan Voras
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 22 Oct 2011 09:52:04 -0000

Doug Barton wrote:
> On 10/21/2011 20:04, Jeremy Chadwick wrote:
>> On Fri, Oct 21, 2011 at 07:16:27PM -0700, Doug Barton wrote:
>
>>> Isn't that what vfs.ufs.dirhash_minsize is for? I think, given that there is a lot more memory in modern systems, setting that higher by default is probably a good idea. Or maybe I'm misunderstanding what that knob does?
>>
>> I believe the function of that sysctl is different. It's not the "minimum amount of dirhash memory to retain", it's:
>>
>> $ sysctl -d vfs.ufs.dirhash_minsize
>> vfs.ufs.dirhash_minsize: minimum directory size in bytes for which to use hashed lookup
>
> Ah, silly me. I'm so used to 'sysctl -d' not working that I didn't even try it this time. Thanks for setting me straight.

sysctls are becoming a mess as new ones are added, but only a few of them are well named or documented. Even if some have a 'sysctl -d' description, the description is not helpful for non-developers like me.

And the second aspect is that sometimes there are two sysctl OIDs with slightly different naming schemes doing the same thing (or at least having the same description), for example low_mem vs. lowmemcount:

# sysctl -a | egrep 'low_?mem'
kern.geom.journal.stats.low_mem: 46443
vfs.ufs.dirhash_lowmemcount: 46443

# sysctl -d {kern.geom.journal.stats.low_mem,vfs.ufs.dirhash_lowmemcount}
kern.geom.journal.stats.low_mem: Number of times low_mem hook was called.
vfs.ufs.dirhash_lowmemcount: number of times low memory hook called

And the problem for a non-developer is that this description raises more questions than it answers: "What condition causes it? What does the low mem hook do? What should I do about it?..."
(I already know these things, not from documentation, but from discussions on the mailing lists.)

FreeBSD is known for its good documentation, so it is sad that this important part of the system lacks really good documentation. FreeBSD does not ship with optimal default settings or autotuning, so almost everybody needs to set something in loader.conf or sysctl.conf to achieve better performance. But that is hard without good sysctl documentation. I know there were some attempts in the past (a GSoC project and http://wiki.freebsd.org/IdeasPage#Document_all_sysctls) but none of them was successful.

Maybe it's time for a stronger policy on committing new code to the tree - if a change adds a new sysctl OID, it must come with a 'sysctl -d' description and some documentation of its behavior in the Handbook or on a wiki page. I don't know where the right place to document all sysctls is (something like man rc.conf). Maybe a wiki page with tuning tips would be the best place, since more people can edit it.

...I am sorry for being off-topic; this is really not related to my original dirhash problem. :)

> In that case I agree with the OP that a knob for minimum setting would be desirable.

Miroslav Lachman

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 22 13:29:37 2011
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3650A106566C for ; Sat, 22 Oct 2011 13:29:37 +0000 (UTC) (envelope-from ivoras@gmail.com)
Received: from mail-yw0-f54.google.com (mail-yw0-f54.google.com [209.85.213.54]) by mx1.freebsd.org (Postfix) with ESMTP id D6B2F8FC0A for ; Sat, 22 Oct 2011 13:29:36 +0000 (UTC)
Received: by ywt32 with SMTP id 32so1024047ywt.13 for ; Sat, 22 Oct 2011 06:29:36 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type; bh=sTJcaxVVzBYx65wf8I8jOOnvoc6eHMx0wcpofdd8gfg=; b=lQm31adJUXMj/SAr5hHym89KhJOjSEgUt2yCMbkOxoVUppcPwgiLQJSa2yNhZPn9uq 5atR7cCIeCpInSKmfO0SQahHn5jQAWQejc/rkk3QuwsORaIKvH8qpKzqIupV+gLhReRM ABZgrDJmVY+InJNXl5SSqUV7yEFOyvL7nt0us=
Received: by 10.100.56.32 with SMTP id e32mr4154911ana.66.1319290176242; Sat, 22 Oct 2011 06:29:36 -0700 (PDT)
MIME-Version: 1.0
Sender: ivoras@gmail.com
Received: by 10.100.189.14 with HTTP; Sat, 22 Oct 2011 06:28:56 -0700 (PDT)
In-Reply-To: <4EA2277B.5080306@FreeBSD.org>
References: <4E97FEDD.7060205@quip.cz> <4EA19203.5050503@quip.cz> <4EA2277B.5080306@FreeBSD.org>
From: Ivan Voras
Date: Sat, 22 Oct 2011 15:28:56 +0200
X-Google-Sender-Auth: 6-wGmCaBNb6Fy_YVHxD33mm2RKQ
Message-ID:
To: Doug Barton
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-fs@freebsd.org
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 22 Oct 2011 13:29:37 -0000

On 22 October 2011 04:16, Doug Barton wrote:
> On 10/21/2011 08:38, Miroslav Lachman wrote:
>> will always be between these two limits?
>
> Isn't that what vfs.ufs.dirhash_minsize is for? I think, given that there is a lot more memory in modern systems, setting that higher by default is probably a good idea. Or maybe I'm misunderstanding what that knob does?
Directories are AFAIK cached "all or nothing", so if there are some large directories in the dirhash and they are evicted, it's possible to end up with "0" dirhash used, without being able to fit a directory back into the dirhash for any reasonable amount of time.

>>> AFAIK I've changed it to autotune so it's configured to approximately 4 MB on a 4 GB machine (and scales up) in 9.
>>
>> I didn't try 9 yet. Does it mean dirhash_maxmem is initially set to approximately 1% of physical RAM and then can be set higher by sysctl as in older versions?
>
> I'm not sure that's what's happening; I have 6G of RAM in this box and I have this by default:
>
> vfs.ufs.dirhash_maxmem: 9977856

It's actually not a direct percentage of memory; it's tied to hibufspace, which is itself auto-tuned here: http://fxr.watson.org/fxr/source/kern/vfs_bio.c?v=FREEBSD8#L606 . So yes, it's nonlinear (and it probably doesn't matter).

From owner-freebsd-fs@FreeBSD.ORG Sat Oct 22 16:05:48 2011
Return-Path:
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5F8D2106564A; Sat, 22 Oct 2011 16:05:48 +0000 (UTC) (envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 3757D8FC0A; Sat, 22 Oct 2011 16:05:48 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9MG5mT7095808; Sat, 22 Oct 2011 16:05:48 GMT (envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9MG5lCV095801; Sat, 22 Oct 2011 16:05:48 GMT (envelope-from linimon)
Date: Sat, 22 Oct 2011 16:05:48 GMT
Message-Id: <201110221605.p9MG5lCV095801@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc:
Subject: Re: kern/161897: [zfs] [patch] zfs partition probing causing long delay at BTX loader
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 22 Oct 2011 16:05:48 -0000

Old Synopsis: zfs parition probing causing long delay at BTX loader
New Synopsis: [zfs] [patch] zfs partition probing causing long delay at BTX loader

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sat Oct 22 16:05:13 UTC 2011
Responsible-Changed-Why: Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=161897