From owner-freebsd-fs@FreeBSD.ORG Sun Sep 1 01:26:02 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 81186BFF; Sun, 1 Sep 2013 01:26:02 +0000 (UTC) (envelope-from kuangche@gmail.com) Received: from mail-ob0-x229.google.com (mail-ob0-x229.google.com [IPv6:2607:f8b0:4003:c01::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 40DC12DE0; Sun, 1 Sep 2013 01:26:02 +0000 (UTC) Received: by mail-ob0-f169.google.com with SMTP id es8so3341315obc.28 for ; Sat, 31 Aug 2013 18:26:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=7yamaM1pKwR3D/bf+SRHsg769FRX5f18fzCTnABb8sI=; b=dZ0BakHAUcEVmaOhNxmaWdCovJCRhHTHlrF4rO3lGW0GlJ0DHN7wTWwAGOY5OQB869 iEeNOVoxGu8vHlSDgsFxkJssc8ECLRBa5nfSsBWqYWJ7f6/Ds+PHTDS0J442q23jclWX AJ1sp7DZp9OZJwL2MavFNc3s+G16Wt7F1MB8fGDSLuXxVRNJd5a0VlxBbYQ0woQIWFfz PTHjWCtHsCtreMNpjA0OmU53E7h8xlIaaDa1FIfo8E9WTZHcaMVp2PhTn5TbQkorH5jT RWSLOqllcyCppl701MOwCIjfwlVJ6Z8p625sNOrEtYihsZMNpyGIvtpH2Sm3W6NIZQC8 LFfA== MIME-Version: 1.0 X-Received: by 10.60.62.101 with SMTP id x5mr12143486oer.24.1377998760924; Sat, 31 Aug 2013 18:26:00 -0700 (PDT) Received: by 10.76.167.7 with HTTP; Sat, 31 Aug 2013 18:26:00 -0700 (PDT) In-Reply-To: <521F1B74.7020402@FreeBSD.org> References: <521F1B74.7020402@FreeBSD.org> Date: Sun, 1 Sep 2013 09:26:00 +0800 Message-ID: Subject: Re: zfs dead lock From: Kuang-che Wu To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 Sep 2013 01:26:02 -0000 2013/8/29 Andriy Gapon > on 15/08/2013 21:36 Kuang-che Wu said the following: > > I suspect I encountered zfs deadlock yesterday. > > Thank you very much for the report! And sorry for the delay. > Do you still have this locked up system? > Or are you able to reproduce the lock up? > I would like to examine some things with kgdb. After few days of the deadlock, I found my hard drive has bad sectors and the number is increasing. I replaced the drive and no more dead locks. I suspect the dead locks were triggered by slow disk i/o. From owner-freebsd-fs@FreeBSD.ORG Sun Sep 1 01:31:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id C61A8CA4 for ; Sun, 1 Sep 2013 01:31:47 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-7.mit.edu (dmz-mailsec-scanner-7.mit.edu [18.7.68.36]) by mx1.freebsd.org (Postfix) with ESMTP id 6CE0D2E2F for ; Sun, 1 Sep 2013 01:31:47 +0000 (UTC) X-AuditID: 12074424-b7f228e00000096b-8b-522298fcd7b0 Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) by dmz-mailsec-scanner-7.mit.edu (Symantec Messaging Gateway) with SMTP id 91.E5.02411.CF892225; Sat, 31 Aug 2013 21:31:40 -0400 (EDT) Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id r811VaQj017666; Sat, 31 Aug 2013 21:31:36 -0400 Received: from multics.mit.edu (system-low-sipb.mit.edu [18.187.2.37]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id r811VYS4007599 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Sat, 31 Aug 2013 21:31:35 -0400 Received: (from 
kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id r811VYd9028709; Sat, 31 Aug 2013 21:31:34 -0400 (EDT) Date: Sat, 31 Aug 2013 21:31:34 -0400 (EDT) From: Benjamin Kaduk To: Rick Macklem Subject: Re: fixing "umount -f" for the NFS client In-Reply-To: <1251021093.15594833.1377867650267.JavaMail.root@uoguelph.ca> Message-ID: References: <1251021093.15594833.1377867650267.JavaMail.root@uoguelph.ca> User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrLIsWRmVeSWpSXmKPExsUixG6nrvtnhlKQwelX/BbHHv9ks3i47BqT A5PHjE/zWTx+b97LFMAUxWWTkpqTWZZapG+XwJVx9cIEpoK/LBXt79tZGxi/MXcxcnJICJhI /D3wnRHCFpO4cG89WxcjF4eQwD5GiZa3r1ggnI2MEpca3oFVCQkcYpL4OCUUItHAKHH7yjMm kASLgLbEmx3zWEFsNgEViZlvNrKB2CIC6hKbV/eDrWMGshuaprCD2MICRhLzVzSDDeUU8JJ4 vnol2BxeAUeJLysbmSGWeUqs3nMRzBYV0JFYvX8KC0SNoMTJmU9YIGZaSpz7c51tAqPgLCSp WUhSCxiZVjHKpuRW6eYmZuYUpybrFicn5uWlFuma6+VmluilppRuYgSHqovKDsbmQ0qHGAU4 GJV4eFdEKwUJsSaWFVfmHmKU5GBSEuXdOBEoxJeUn1KZkVicEV9UmpNafIhRgoNZSYSXoQko x5uSWFmVWpQPk5LmYFES53329GygkEB6YklqdmpqQWoRTFaGg0NJglcVGJNCgkWp6akVaZk5 JQhpJg5OkOE8QMMvTwcZXlyQmFucmQ6RP8WoKCXOywzSLACSyCjNg+uFpZJXjOJArwjzfgZp 5wGmIbjuV0CDmYAGX5uoCDK4JBEhJdXA2NLUaNBxy2wy81+Rlky+S+nTfoWcEb2bPnl/T3ts 1LfL2nOzF/wzKVlue8VTceviqgoRA6sw0Uc2B72yf07LYxIqy0mo/fTrzLoys9mv7jucNO1l 4jKcfPXbxSlGV9abuK/8E8zkaC565dOznqnhl/LNJp7dZdenVhSS/M/PVLhV6Zv6FKsWJZbi jERDLeai4kQAzo4adAADAAA= Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 Sep 2013 01:31:47 -0000 On Fri, 30 Aug 2013, Rick Macklem wrote: > Kostik wrote: >> On Thu, Aug 29, 2013 at 07:43:34PM -0400, Rick Macklem wrote: >>>>> I assume I would also need to bump __FreeBSD_version (and maybe >>>>> VFS_VERSION?). >>>> I think you could avoid it. 
>>>> >>> Do you mean I don't need to bump __FreeBSD_version or VFS_VERSION >>> or both? >> I do not see much sense in bumping either of them. >> You might want to bump __FreeBSD_version when merging to stable. Please do bump __FreeBSD_version when merging to stable. I will not make much noise about -current at the moment, as I'm behind on tracking it. Thanks, Ben From owner-freebsd-fs@FreeBSD.ORG Sun Sep 1 14:42:05 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 2318A375 for ; Sun, 1 Sep 2013 14:42:05 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id D4AA32FE8 for ; Sun, 1 Sep 2013 14:42:04 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAFNRI1KDaFve/2dsb2JhbABaDoN/gye9RYE0dIIkAQEEASNWBRYYAgINGQJZBogPBqd2kWqBKY4iNAeCaYE0A6lbgmFbIIFu X-IronPort-AV: E=Sophos;i="4.89,1001,1367985600"; d="scan'208";a="48843693" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 01 Sep 2013 10:41:57 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id DA67DB3F15; Sun, 1 Sep 2013 10:41:57 -0400 (EDT) Date: Sun, 1 Sep 2013 10:41:57 -0400 (EDT) From: Rick Macklem To: Benjamin Kaduk Message-ID: <1247162688.16775666.1378046517881.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: fixing "umount -f" for the NFS client MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org 
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 01 Sep 2013 14:42:05 -0000

Benjamin Kaduk wrote:
> On Fri, 30 Aug 2013, Rick Macklem wrote:
>
> > Kostik wrote:
> >> On Thu, Aug 29, 2013 at 07:43:34PM -0400, Rick Macklem wrote:
> >>>>> I assume I would also need to bump __FreeBSD_version (and maybe
> >>>>> VFS_VERSION?).
> >>>> I think you could avoid it.
> >>>>
> >>> Do you mean I don't need to bump __FreeBSD_version or VFS_VERSION
> >>> or both?
> >> I do not see much sense in bumping either of them.
> >> You might want to bump __FreeBSD_version when merging to stable.
>
> Please do bump __FreeBSD_version when merging to stable. I will not make
> much noise about -current at the moment, as I'm behind on tracking it.
>
Actually, I'm "on the fence" as to whether or not this one should be
MFC'd, due to the VFS ABI breakage. Since you (well, actually OpenAFS;-)
are the main guy affected by VFS ABI breakage these days, maybe you'd
like to comment on this?

Also, if anyone else has an opinion w.r.t. MFC'ing a patch that adds a
VFS op and, therefore, breaks the VFS ABI, please feel free to comment.

Thanks, rick
ps: And, yes, I will bump __FreeBSD_version if I end up doing the MFC.
> Thanks, > > Ben > From owner-freebsd-fs@FreeBSD.ORG Sun Sep 1 17:41:52 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id B3AA9D58; Sun, 1 Sep 2013 17:41:52 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-ea0-x22f.google.com (mail-ea0-x22f.google.com [IPv6:2a00:1450:4013:c01::22f]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 0BAAE2AF8; Sun, 1 Sep 2013 17:41:51 +0000 (UTC) Received: by mail-ea0-f175.google.com with SMTP id m14so1912317eaj.34 for ; Sun, 01 Sep 2013 10:41:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=LtwKjNeLy7LB4V/JjYOhTOaG0e9nzM4vAai0pja/x1I=; b=C5eFS659ApT47TUA/a7uB5eNk0Hb9uOt/8oIBC/O5Yxw7iR6dKbeTFvz7PxiMxfpl5 B7onMDN6LdfWgllTy/AznJJHVO4JUWtPgmtd5HoIdu/aHcKTIaLRvhvtg+WgLE32F3bN poL+9JwjRFyZrXOAMdlH+zfkAt0PEVaq9TM99eC0nufzBICqIzqnsLurknZTLlLuV3yn W9SbgQdkz8DjOR0rsAfa5FdAoZikaVQlehJopd4okmMSWgwtCI8hWhVQzNZEAQf2K4O6 Hu3zKolmhpd5P06UN+SQjRHwlMdMgmBCd/Uk/cbLbFNX8JB+GeKGq7jXimcRquiGsBqj MGEw== X-Received: by 10.15.81.132 with SMTP id x4mr287727eey.100.1378057310115; Sun, 01 Sep 2013 10:41:50 -0700 (PDT) Received: from localhost ([178.150.115.244]) by mx.google.com with ESMTPSA id r48sm14972840eev.14.1969.12.31.16.00.00 (version=TLSv1.2 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 01 Sep 2013 10:41:49 -0700 (PDT) Sender: Mikolaj Golub Date: Sun, 1 Sep 2013 20:41:47 +0300 From: Mikolaj Golub To: Yamagi Burmeister Subject: Re: 9.2-RC1: LORs / Deadlock with SU+J on HAST in "memsync" mode Message-ID: <20130901174146.GA15654@gmail.com> References: 
<20130819115101.ae9c0cf788f881dc4de464c5@yamagi.org> <20130822121341.0f27cb5e372d12bab8725654@yamagi.org> <20130825175616.GA3472@gmail.com> <20130826160125.3b62df57515c45be3c9b2723@yamagi.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20130826160125.3b62df57515c45be3c9b2723@yamagi.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 01 Sep 2013 17:41:52 -0000

Hi, Yamagi, sorry for the delay. I can only work on this in spare
time, which is mostly (some) weekends.

On Mon, Aug 26, 2013 at 04:01:25PM +0200, Yamagi Burmeister wrote:

> I'm sorry but the patch doesn't change anything. Processes accessing
> the UFS on top of HAST still deadlock within a couple of minutes.

OK, my patch fixed the issue that might occur on secondary
disconnect. As you did not have disconnects according to your log, your
issue is different. The core provided suggests this too.

Anyway, I have updated my patch, as the first version had some issues
(e.g. the approach of using one flags variable to store different hio
states was not correct without locking, because the flags could be
changed by two threads simultaneously). Here is the updated version:

http://people.freebsd.org/~trociny/patches/hast.primary.c.memsync_secondary_disconnect.1.patch

Pawel, what do you think about this?

> trasz@ suggested that all "buf" may be exhausted which would result in
> an IO deadlock, but at least increasing their number by four times by
> "kern.nbuf" doesn't change anything.
>
> > If it does not help, please, after the hang, get core images of the
> > worker processes (both primary and secondary) using gcore(1) and
> > provide them together with hastd binary and libraries it is linked
> > with (from `ldd /sbin/hastd' list). Note, core files might expose
> > secure information from your host, if this worries you, you can send
> > them to me privately.
>
> No problem, it's a test setup without any production data. You can find
> a tar archive with the binary and libs (all with debug symbols) here:
> http://deponie.yamagi.org/freebsd/debug/lor_hast/hast_cores.tar.xz
>
> I have two HAST providers, therefore two core dumps for each host:
> hast_deadlocked.core -> worker for the provider on which the processes
> deadlocked.
> hast_not_deadlocked.core -> worker for the other provider

Thanks.

The state of the primary node at the moment the core was generated:

253 requests in the local send queue, the other queues are empty, no
requests leaked.

The state of the threads:

ggate_recv_thread:
  got hio 0x801cac880 from free queue
  lock res->hr_amp_lock to activemap_write_start()
  blocked on hast_activemap_flush->pwrite(activemap)

ggate_send_thread:
  got hio 0x801cac040 from done queue,
  waiting for res->hr_amp_lock to activemap_write_complete()

local_send_thread:
  got hio 0x801cabfa0 from local send queue,
  blocked on pwrite(hio data)

sync_thread:
  got hio 0x801cf9820 from free queue
  put hio to local send queue (read data from disk)
  waiting for read to complete

remote_recv_thread:
  waiting for a hio in remote recv queue

remote_send_thread:
  waiting for a hio in remote send queue

ctrl_thread:
  waiting for a control request

guard_thread:
  sleeping on sigtimedwait()

So a deficit of I/O buffers made two threads block on writing data and
metadata, and another one block waiting for the lock held by the
thread that was writing metadata. This last thread was about to return
the request to UFS, potentially freeing a buffer, had it not been for
that lock on the activemap.
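For illustration only: the contention in the thread states above (one
thread holding the map lock across a slow metadata write while another
waits for the same lock just to touch the in-memory map) can be
modelled with two mutexes. This is a minimal sketch under stated
assumptions, not hastd code. struct amap, amap_init(),
amap_write_start() and the mem_lock/disk_lock names are all
hypothetical, and the flush is simulated with memcpy() where a real
daemon would call pwrite(2). The idea shown is to take a second lock
that only serializes flushes, snapshot the map, and drop the in-memory
lock before the slow write.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define NEXTENTS 64

/* Hypothetical stand-in for an activemap; not the hastd structure. */
struct amap {
	pthread_mutex_t mem_lock;	/* protects dirty[] */
	pthread_mutex_t disk_lock;	/* serializes flushes of ondisk[] */
	unsigned char dirty[NEXTENTS];	/* in-memory map */
	unsigned char ondisk[NEXTENTS];	/* stands in for the on-disk copy */
	unsigned flushes;		/* number of simulated flushes */
};

static void
amap_init(struct amap *am)
{
	memset(am, 0, sizeof(*am));
	pthread_mutex_init(&am->mem_lock, NULL);
	pthread_mutex_init(&am->disk_lock, NULL);
}

/*
 * Mark an extent dirty. Returns true if the on-disk copy had to be
 * flushed. The slow part runs with only disk_lock held, so threads
 * that merely update dirty[] are not stuck behind the write.
 */
static bool
amap_write_start(struct amap *am, unsigned extent)
{
	unsigned char snapshot[NEXTENTS];

	pthread_mutex_lock(&am->mem_lock);
	if (am->dirty[extent]) {
		/* Already dirty: in-memory update only, no flush. */
		pthread_mutex_unlock(&am->mem_lock);
		return (false);
	}
	am->dirty[extent] = 1;
	/* Take the flush lock first so that flushes stay ordered. */
	pthread_mutex_lock(&am->disk_lock);
	memcpy(snapshot, am->dirty, sizeof(snapshot));
	pthread_mutex_unlock(&am->mem_lock);

	/* Simulated slow write; a real daemon would pwrite(2) here. */
	memcpy(am->ondisk, snapshot, sizeof(snapshot));
	am->flushes++;
	pthread_mutex_unlock(&am->disk_lock);
	return (true);
}
```

The locks are always taken in the order mem_lock, then disk_lock, so
the two cannot deadlock against each other. Note the limit of this
arrangement: a second thread that also needs a flush still waits for
disk_lock while holding mem_lock, so it only helps threads whose
update does not require a flush.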
As it involves flushing metadata to disk, Yamagi, you might try
reducing metadata updates by tuning the extentsize and keepdirty
parameters. You can change these parameters only by recreating the
providers, though. What you need to change depends on your workload.
If your applications mostly update some set of blocks, then increasing
the number of keepdirty extents so that they cover all updated blocks
should reduce metadata updates. If your applications mostly write
(append) new data, then the only way to reduce metadata flushes is to
increase the extentsize. See hastctl(8) for a description of the
parameters and what they affect. You can monitor the activemap updates
using the `hastctl list' command, comparing them with the number of
writes.

It is rather unfortunate that we flush metadata to disk under the
lock, so another thread that might only have to update the in-memory
map waits for the first thread to complete. It looks like we can
improve this by introducing an additional on-disk map lock: when the
in-memory map is updated and it is detected that the on-disk map needs
an update too, the on-disk map lock is acquired and the in-memory lock
is released before flushing the map.

http://people.freebsd.org/~trociny/patches/hast.primary.c.activemap_flush_lock.1.patch

Pawel, what do you think about this?

Yamagi, you might want to try this patch to see if it changes anything
for you. If it does not, could you please tell me a little more about
your workload so that I can try to reproduce it?

Also, if you are going to try my suggestions I would recommend this
patch too. It fixes the stalls I observed while trying to reproduce
your case: ggate_recv_thread() got stuck sleeping while taking a free
request, because it had not received the signal to wake up.

http://people.freebsd.org/~trociny/patches/hast.primary.c.cv_broadcast.1.patch

> While all processes accessing the UFS filesystem on top of the provider
> deadlocked, HAST still seemed to transfer data to the secondary.
> At least the process generated CPU load, the switch LEDs were blinking
> and the harddrive LEDs showed activity on both sides.

According to the core state it might not be a deadlock but rather
starvation. You could use the `hastctl list' command to monitor HAST
I/O statistics on both primary and secondary nodes, and see whether
the counters are changing (and how) while the issue is observed.

Also, you might be interested in this patch that adds the current
queue sizes to `hastctl list' output and helps to see in real time
what is going on with a HAST node:

http://people.freebsd.org/~trociny/patches/hast.queue_stats.1.patch

The output looks like this:

  queues: local: 237, send: 0, recv: 3, done: 0, idle: 17

The local queue is for the thread that does local I/O. The send/recv
queues are for remote requests. The requests in the done queue are
those that completed locally and remotely and are waiting to be
returned to the idle queue. The idle queue keeps free (unused) request
buffers.

In the example above the bottleneck is on local I/O operations. If the
idle queue is empty it means that HAST is overloaded.

I would like to commit this patch and the patch that fixes the lost
wakeups on the free queue, if Pawel does not have objections.
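As a rough illustration of reading that "queues:" line from a script
or a small monitoring tool, here is a minimal sketch. It assumes the
line keeps exactly the format quoted above, which comes from an
uncommitted patch and may change. struct hast_queues, parse_queues(),
busiest_queue() and is_overloaded() are hypothetical names, not part
of hastctl.

```c
#include <stdio.h>

/* Hypothetical container for the five counters on the "queues:" line. */
struct hast_queues {
	int local, send, recv, done, idle;
};

/* Parse one "queues:" line; returns 0 on success, -1 on mismatch. */
static int
parse_queues(const char *line, struct hast_queues *q)
{
	int n;

	n = sscanf(line,
	    " queues: local: %d, send: %d, recv: %d, done: %d, idle: %d",
	    &q->local, &q->send, &q->recv, &q->done, &q->idle);
	return (n == 5 ? 0 : -1);
}

/* Name of the fullest working queue (ties go to the earlier one). */
static const char *
busiest_queue(const struct hast_queues *q)
{
	const char *name = "local";
	int max = q->local;

	if (q->send > max) { max = q->send; name = "send"; }
	if (q->recv > max) { max = q->recv; name = "recv"; }
	if (q->done > max) { max = q->done; name = "done"; }
	return (name);
}

/* An empty idle queue means no free request buffers are left. */
static int
is_overloaded(const struct hast_queues *q)
{
	return (q->idle == 0);
}
```

For the sample line above, busiest_queue() reports "local", matching
the reading that the bottleneck is local I/O, and is_overloaded() is
false because 17 request buffers are still idle.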
-- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Sun Sep 1 18:16:36 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id AA19290A for ; Sun, 1 Sep 2013 18:16:36 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-ee0-x235.google.com (mail-ee0-x235.google.com [IPv6:2a00:1450:4013:c00::235]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 3C7A02CF9 for ; Sun, 1 Sep 2013 18:16:36 +0000 (UTC) Received: by mail-ee0-f53.google.com with SMTP id b15so1914968eek.40 for ; Sun, 01 Sep 2013 11:16:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=X6Ae8IPXzxQRPmn8gZUOlvSMIusNslEJs5EJPNg03uI=; b=Xk5GcI0rwDgyzueFJhU61uEwJ5C2+Br5ZgknrGEgeJK8BOEHiRtkBKSeOieIL2qqvg Xq34q2qmW8DzFXeTVOAuU9AAOo3UCobHYbk+LlO0YDnOF04g7J9euPauC995TzdNtJVn IIaM2+xt+yiYz82BE87pEEE4lOyBqdhRkDYlhkXAKVZpro22D0eihJvEAz9wX4UEqak6 MQsQTng+UXNrMnvUUwYgOB+AVTcw7xA7ua2o97zphJ6BS11oFkTGiWEJ+VSWav5Muh18 zRBWgPC+R2yfrBaGZ9oVOCZp6E2FvNLCA3O3cmn5v4329G0sKiZJuNbxQodk+Ca7APq8 4HIQ== X-Received: by 10.14.115.133 with SMTP id e5mr30617731eeh.27.1378059394515; Sun, 01 Sep 2013 11:16:34 -0700 (PDT) Received: from localhost ([178.150.115.244]) by mx.google.com with ESMTPSA id b45sm15274066eef.4.1969.12.31.16.00.00 (version=TLSv1.2 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 01 Sep 2013 11:16:33 -0700 (PDT) Sender: Mikolaj Golub Date: Sun, 1 Sep 2013 21:16:31 +0300 From: Mikolaj Golub To: Rick Macklem Subject: Re: NFS on ZFS pure SSD pool Message-ID: <20130901181630.GB15654@gmail.com> References: 
<258054624.15907722.1377905324980.JavaMail.root@uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <258054624.15907722.1377905324980.JavaMail.root@uoguelph.ca>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: FreeBSD FS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 01 Sep 2013 18:16:36 -0000

On Fri, Aug 30, 2013 at 07:28:44PM -0400, Rick Macklem wrote:
> Sam Fourman Jr. wrote:
> > $ cat /var/log/messages | grep failed
> > Aug 30 10:22:20 students nfsd[1978]: accept failed: Software caused
> > connection abort
> > Aug 30 10:27:16 students nfsd[1978]: accept failed: Software caused
> > connection abort
> > Aug 30 11:46:30 students nfsd[1978]: accept failed: Software caused
> > connection abort
> > Aug 30 11:47:10 students nfsd[1978]: accept failed: Software caused
> > connection abort
> >
> Since the master socket that is accepting connections isn't being closed,
> I believe this error (ECONNABORTED returned by accept()) occurs when the
> client closes the new TCP connection before it has been accepted. Why
> would an NFS client do this? I have no idea.

Maybe because nfsd is too slow accepting new connections and the
client aborts due to its timeout? Might nfssvc(2) block for some
considerable time?

Sam, you could monitor the nfsd listen queue by running netstat -nL
periodically, and the current client connections to nfsd with
netstat -na, to see what is going on. Also, enabling ktrace on the nfsd
process when the issue is observed could tell whether it is due to
nfssvc(2) being slow.

BTW, I noticed that nfsd sets the listen backlog to 5. Isn't that a bit
low for servers that might have hundreds of clients?
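To make the ECONNABORTED case concrete: a server can treat it as
"this one connection went away before we picked it up" and keep
accepting, instead of treating it as a failure of the listening
socket. The following is a minimal sketch of that pattern, not the
nfsd source. accept_error_is_transient() and accept_retry() are
hypothetical helpers, and which errors count as transient is an
assumption of this example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Errors after which it is reasonable to call accept() again:
 * the peer aborted the connection while it sat in the listen
 * queue, or the call was interrupted by a signal.
 */
static bool
accept_error_is_transient(int err)
{
	switch (err) {
	case ECONNABORTED:
	case EINTR:
		return (true);
	default:
		return (false);
	}
}

/*
 * Accept one connection, retrying on transient errors. Returns the
 * new descriptor, or -1 with errno set on a non-transient error.
 */
static int
accept_retry(int lsock)
{
	int s;

	for (;;) {
		s = accept(lsock, NULL, NULL);
		if (s >= 0 || !accept_error_is_transient(errno))
			return (s);
	}
}
```

Retrying hides the symptom only. If the aborts come from clients
timing out while waiting in the listen queue, the queue depth and the
accept rate are still worth checking with the netstat commands above.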
-- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Sun Sep 1 18:28:47 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 57550ECB; Sun, 1 Sep 2013 18:28:47 +0000 (UTC) (envelope-from sfourman@gmail.com) Received: from mail-vb0-x231.google.com (mail-vb0-x231.google.com [IPv6:2607:f8b0:400c:c02::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 050E62D91; Sun, 1 Sep 2013 18:28:46 +0000 (UTC) Received: by mail-vb0-f49.google.com with SMTP id w16so2466001vbb.36 for ; Sun, 01 Sep 2013 11:28:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=mp37uCeqZcyF51i7e2WinrPG+vYertUBdPq/AJHWJJs=; b=IpveykH85tegXcwn14GIwbHtXc2KaFCjAMXiD8yRzyiq4M1Xo15iWulo5xdC8ygmjD 3/3TEnFDefuXML8sORH09P4hOi9r/4LkhzFySFGS7ycEvammSkuYrgOobLmiArQgl5Pk Ct9c/VGEbXEXoo+jJTJmtlf29yaJPDGwstTYShLn6nVlPkl1+mLKafCozva+pdQxk+cH vw1covMxAiHlpkO25rPlUamSgqlD9r86b/zNr0BDWs8eCrqvlwXgv0M1iMjN8faBhpM5 KMaJz1FeYP2cw7cF6RYiIdsnb9cBGmITBfUSAAXDAydwawCqdlYUFtFsSnNtHTqecXlG 6UTA== MIME-Version: 1.0 X-Received: by 10.58.201.69 with SMTP id jy5mr265913vec.29.1378060125767; Sun, 01 Sep 2013 11:28:45 -0700 (PDT) Received: by 10.220.96.78 with HTTP; Sun, 1 Sep 2013 11:28:45 -0700 (PDT) In-Reply-To: <20130901181630.GB15654@gmail.com> References: <258054624.15907722.1377905324980.JavaMail.root@uoguelph.ca> <20130901181630.GB15654@gmail.com> Date: Sun, 1 Sep 2013 14:28:45 -0400 Message-ID: Subject: Re: NFS on ZFS pure SSD pool From: "Sam Fourman Jr." 
To: Mikolaj Golub Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 Sep 2013 18:28:47 -0000 On Sun, Sep 1, 2013 at 2:16 PM, Mikolaj Golub wrote: > On Fri, Aug 30, 2013 at 07:28:44PM -0400, Rick Macklem wrote: > > Sam Fourman Jr. wrote: > > > $ cat /var/log/messages | grep failed > > > Aug 30 10:22:20 students nfsd[1978]: accept failed: Software caused > > > connection abort > > > Aug 30 10:27:16 students nfsd[1978]: accept failed: Software caused > > > connection abort > > > Aug 30 11:46:30 students nfsd[1978]: accept failed: Software caused > > > connection abort > > > Aug 30 11:47:10 students nfsd[1978]: accept failed: Software caused > > > connection abort > > > > > Since the master socket that is accepting connections isn't being closed, > > I believe this error (ECONNABORTED returned by accept()) occurs when the > > client closes the new TCP connection before it has been accepted. Why > > would an NFS client do this? I have no idea. > > May be because nfsd is too slow accepting new connections and the > client aborts due to its timeout? May nfssvc(2) block for some > considerable time? > > Sam, you could monitor nfsd listen queue running netstat -nL > periodically and current client connections to nfsd with netstat -na, > to see what is going on. Also enabling ktrace on the nfsd process when > the issue is observed could tell if it is due to nfssvc(2) is slow. > > BTW, I noticed that nfsd sets listen backlog to 5. Isn't it a bit low > for servers that might have hundreds clients? > is there a sysctl to increase the listen backlog for nfsd? > > -- > Mikolaj Golub > -- Sam Fourman Jr. 
From owner-freebsd-fs@FreeBSD.ORG Sun Sep 1 19:11:20 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 7379ED0B for ; Sun, 1 Sep 2013 19:11:20 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-ee0-x231.google.com (mail-ee0-x231.google.com [IPv6:2a00:1450:4013:c00::231]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 0435A208C for ; Sun, 1 Sep 2013 19:11:19 +0000 (UTC) Received: by mail-ee0-f49.google.com with SMTP id d41so1930469eek.36 for ; Sun, 01 Sep 2013 12:11:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=45ppacPuUey+7gqwXij2grpafrlALoO0KIX3rv/GJFA=; b=s9Yqi0dvR9rsKg57exxAEENN+uHpXTesfyZtNdHpx3NATViYgxNc7LCD3GLQ+uWmI7 wzeJJUeqNJjY3IugV9mWlmDYOJvWgPpi8/IS28Je6mDV8OuVdjGBYchCUFSP9TZyIp+r Nu56mcZY4I79LxqW3MMjkTZJf9CdtlnSSHZaPAmjXeXlDlgVcSkRwmx0UluYE65UuyGo GQqkA779oHzMYVCQ30e84nhBoGypQcCf+DGP4caKp5PfiUs2z6TNoucs99+EQarJGB11 bHbipG0UwmPOfGAbuR77PfEuAu1qAwNLVT65HS0WWdk6GZGIxyN6ThOZI3Gr3vjqRFss qH5Q== X-Received: by 10.15.35.196 with SMTP id g44mr30866529eev.18.1378062678376; Sun, 01 Sep 2013 12:11:18 -0700 (PDT) Received: from localhost ([178.150.115.244]) by mx.google.com with ESMTPSA id bn13sm15603035eeb.11.1969.12.31.16.00.00 (version=TLSv1.2 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 01 Sep 2013 12:11:17 -0700 (PDT) Sender: Mikolaj Golub Date: Sun, 1 Sep 2013 22:11:14 +0300 From: Mikolaj Golub To: "Sam Fourman Jr." 
Subject: Re: NFS on ZFS pure SSD pool Message-ID: <20130901191113.GC15654@gmail.com> References: <258054624.15907722.1377905324980.JavaMail.root@uoguelph.ca> <20130901181630.GB15654@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 Sep 2013 19:11:20 -0000 On Sun, Sep 01, 2013 at 02:28:45PM -0400, Sam Fourman Jr. wrote: > > BTW, I noticed that nfsd sets listen backlog to 5. Isn't it a bit low > > for servers that might have hundreds clients? > > > > is there a sysctl to increase the listen backlog for nfsd? No. It is hardcoded to 5 in nfsd.c. If it had been set to -1 instead, the system default would have been used and could be tuned (kern.ipc.soacceptqueue). But my note was just an observation and I am not sure this makes any difference in your case, without seeing netstat output. 
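For readers unfamiliar with the -1 convention mentioned above: on
FreeBSD, listen(2) documents that a negative backlog (or one above the
limit) is silently forced to the kern.ipc.soacceptqueue value. A
minimal sketch of a listener that relies on this follows. It is not
taken from nfsd. open_listener() is a hypothetical helper, and the
loopback address with an ephemeral port is chosen only to keep the
example self-contained. Other systems cap the backlog through their
own limits, so the exact effect of -1 is platform-specific.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Open a TCP listener on 127.0.0.1 with an ephemeral port.
 * Passing backlog = -1 asks for the system-wide maximum instead
 * of a value hardcoded in the program.
 * Returns the listening descriptor, or -1 on error.
 */
static int
open_listener(int backlog)
{
	struct sockaddr_in sin;
	int s;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s < 0)
		return (-1);

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
	    listen(s, backlog) < 0) {
		close(s);
		return (-1);
	}
	return (s);
}
```

A caller would use open_listener(-1) and then tune the queue depth
with the sysctl rather than by rebuilding the program. As noted above,
whether a deeper queue helps in this case is unknown without the
netstat output.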
-- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Mon Sep 2 08:35:59 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id AE32F996 for ; Mon, 2 Sep 2013 08:35:59 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from cpsmtpb-ews07.kpnxchange.com (cpsmtpb-ews07.kpnxchange.com [213.75.39.10]) by mx1.freebsd.org (Postfix) with ESMTP id 4DD9C2763 for ; Mon, 2 Sep 2013 08:35:59 +0000 (UTC) Received: from cpsps-ews01.kpnxchange.com ([10.94.84.168]) by cpsmtpb-ews07.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Mon, 2 Sep 2013 10:35:50 +0200 Received: from CPSMTPM-TLF104.kpnxchange.com ([195.121.3.7]) by cpsps-ews01.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Mon, 2 Sep 2013 10:35:50 +0200 Received: from sjakie.klop.ws ([212.182.167.131]) by CPSMTPM-TLF104.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Mon, 2 Sep 2013 10:35:49 +0200 Received: from 212-182-167-131.ip.telfort.nl (localhost [127.0.0.1]) by sjakie.klop.ws (Postfix) with ESMTP id B6841CA3A for ; Mon, 2 Sep 2013 10:35:49 +0200 (CEST) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Date: Mon, 02 Sep 2013 10:35:49 +0200 Subject: fsck does not mark as clean MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Ronald Klop" Message-ID: User-Agent: Opera Mail/12.16 (FreeBSD) X-OriginalArrivalTime: 02 Sep 2013 08:35:50.0073 (UTC) FILETIME=[6DA55A90:01CEA7B7] X-RcptDomain: freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Sep 2013 08:35:59 -0000 Hello, I have a usb stick which I mounted async and than removed it (without writing anything). 
Of course its FS is marked dirty now. Running fsck gives this:

# fsck_ufs -y /dev/da5s2
** /dev/da5s2
** Last Mounted on /jails/mailserver/mnt/blabla
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
13178 files, 255459 used, 698006 free (238 frags, 87221 blocks, 0.0% fragmentation)

***** FILE SYSTEM STILL DIRTY *****

***** PLEASE RERUN FSCK *****
#

How can I mark it as clean again? Fsck does not complain about specific
problems, but it does not fix anything either.

NB: I'm running fsck on 9.2-STABLE/amd64 of Aug 5th.

Ronald.

From owner-freebsd-fs@FreeBSD.ORG Mon Sep 2 09:18:24 2013
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 7E520C37 for ; Mon, 2 Sep 2013 09:18:24 +0000 (UTC) (envelope-from maurizio.vairani@cloverinformatica.it)
Received: from smtpdg12.aruba.it (smtpdg7.aruba.it [62.149.158.237]) by mx1.freebsd.org (Postfix) with ESMTP id 748912B05 for ; Mon, 2 Sep 2013 09:18:22 +0000 (UTC)
Received: from cloverinformatica.it ([188.10.129.202]) by smtpcmd04.ad.aruba.it with bizsmtp id L9JA1m00a4N8xN4019JBNL; Mon, 02 Sep 2013 11:18:13 +0200
Received: from [192.168.0.81] (ASUS-TERMINATOR [192.168.0.81]) by cloverinformatica.it (Postfix) with ESMTP id D9E0117EDD; Mon, 2 Sep 2013 11:18:10 +0200 (CEST)
Message-ID: <522457D2.1070304@cloverinformatica.it>
Date: Mon, 02 Sep 2013 11:18:10 +0200
From: Maurizio Vairani
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
MIME-Version: 1.0
To: Andriy Gapon
Subject: Re: Boot problem if a ZFS log device is missing
References: <521F05F0.4090607@cloverinformatica.it> <521F0DEB.20408@FreeBSD.org> <522075D6.70600@FreeBSD.org> <52207800.2060901@FreeBSD.org>
In-Reply-To:
<52207800.2060901@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Sep 2013 09:18:24 -0000 On 30/08/2013 12.46, Andriy Gapon wrote: > on 30/08/2013 13:37 Andriy Gapon said the following: >> on 30/08/2013 00:38 Charles Sprickman said the following: >>> If one is willing to accept that data is lost (like the log device is totally smoked), is there a way to boot knowing that you may have some data loss, or is the only option to boot alternate media and force a pool import (assuming that works without the log device)? >> I think it's the latter. I am not aware of any way to select a behavior similar >> to import -m or import -F during boot. >> Perhaps... ZFS_IMPORT_MISSING_LOG should be a default behavior for a root pool >> or maybe the behavior could be controllable by a tunable. >> > Maurizio, > > you might want to try the following patch as an interim solution for your > environment: > > --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c > +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c > @@ -4112,6 +4112,7 @@ spa_import_rootpool(const char *name) > } > spa->spa_is_root = B_TRUE; > spa->spa_import_flags = ZFS_IMPORT_VERBATIM; > + spa->spa_import_flags |= ZFS_IMPORT_MISSING_LOG; /* XXX make tunable */ > > /* > * Build up a vdev tree based on the boot device's label config. > > Hi all, unfortunately the patch doesn't work. The laptop returns the same error message: "Mounting from zfs:tank0 failed with error 6" and the same "mountroot>" prompt. I am available for further testing if needed.
Thanks anyway, Maurizio From owner-freebsd-fs@FreeBSD.ORG Mon Sep 2 10:25:49 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id BCBE5BCB for ; Mon, 2 Sep 2013 10:25:49 +0000 (UTC) (envelope-from jhs@berklix.com) Received: from land.berklix.org (land.berklix.org [144.76.10.75]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 330B62FC4 for ; Mon, 2 Sep 2013 10:25:49 +0000 (UTC) Received: from park.js.berklix.net (pD9FBF3C8.dip0.t-ipconnect.de [217.251.243.200]) (authenticated bits=128) by land.berklix.org (8.14.5/8.14.5) with ESMTP id r829Y4J2041050; Mon, 2 Sep 2013 09:34:04 GMT (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by park.js.berklix.net (8.14.3/8.14.3) with ESMTP id r829Xs28043903; Mon, 2 Sep 2013 11:33:54 +0200 (CEST) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (localhost.js.berklix.net [127.0.0.1]) by fire.js.berklix.net (8.14.4/8.14.4) with ESMTP id r829XVGu020704; Mon, 2 Sep 2013 11:33:53 +0200 (CEST) (envelope-from jhs@fire.js.berklix.net) Message-Id: <201309020933.r829XVGu020704@fire.js.berklix.net> To: "Ronald Klop" Subject: Re: fsck does not mark as clean From: "Julian H. Stacey" Organization: http://berklix.com BSD Unix Linux Consultancy, Munich Germany User-agent: EXMH on FreeBSD http://berklix.com/free/ X-URL: http://www.berklix.com In-reply-to: Your message "Mon, 02 Sep 2013 10:35:49 +0200." 
Date: Mon, 02 Sep 2013 11:33:31 +0200 Sender: jhs@berklix.com Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Sep 2013 10:25:49 -0000 Hi, Reference: > From: "Ronald Klop" > Date: Mon, 02 Sep 2013 10:35:49 +0200 "Ronald Klop" wrote: > Hello, > > I have a usb stick which I mounted async and then removed it (without > writing anything). Of course its FS is marked dirty now. > Running fsck gives this: > > # fsck_ufs -y /dev/da5s2 > ** /dev/da5s2 > ** Last Mounted on /jails/mailserver/mnt/blabla > ** Phase 1 - Check Blocks and Sizes > ** Phase 2 - Check Pathnames > ** Phase 3 - Check Connectivity > ** Phase 4 - Check Reference Counts > ** Phase 5 - Check Cyl groups > 13178 files, 255459 used, 698006 free (238 frags, 87221 blocks, 0.0% > fragmentation) > > ***** FILE SYSTEM STILL DIRTY ***** > > ***** PLEASE RERUN FSCK ***** > > # > > How can I mark it as clean again? Fsck does not complain about specific > problems, but it does not fix anything either. > NB: I'm running fsck on 9.2-STABLE/amd64 of Aug 5th. > > Ronald. If a second fsck does not fix it, have a look in /var/log/messages to see whether, e.g., you have bad media, with some blocks on the USB stick failing to write. If it's not private data, make an image copy to hard disk with dd, then compress the image (e.g. with gzip or bzip2), so that if there is a real bug in fsck you'll have an image hopefully small enough to send to a developer if they later see your post & respond.
After that, there's fsdb in base from src/, & also cd /usr/ports/sysutils; echo *fs* shows e.g. ffs2recov. If you play with fsdb it's very likely you'll trash things & want to revert, so first make another copy of /dev/da5s2 with dd, then use approximately, e.g., mdconfig -a -t vnode -f myusbimage.dd and then fsdb /dev/md0, & experiment/search/learn with fsdb. Cheers, Julian -- Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com Reply below not above, like a play script. Indent old text with "> ". Send plain text. No quoted-printable, HTML, base64, multipart/alternative. From owner-freebsd-fs@FreeBSD.ORG Mon Sep 2 11:06:44 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 07F49F1 for ; Mon, 2 Sep 2013 11:06:44 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id E9A9F239E for ; Mon, 2 Sep 2013 11:06:43 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r82B6hrk016001 for ; Mon, 2 Sep 2013 11:06:43 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r82B6hcV015999 for freebsd-fs@FreeBSD.org; Mon, 2 Sep 2013 11:06:43 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 2 Sep 2013 11:06:43 GMT Message-Id: <201309021106.r82B6hcV015999@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Subject: Current problem reports assigned to
freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Sep 2013 11:06:44 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/181565 fs [swap] Problem with vnode-backed swap space. o kern/181377 fs [zfs] zfs recv causes an inconsistant pool o kern/181281 fs [msdosfs] stack trace after successfull 'umount /mnt' o kern/181082 fs [fuse] [ntfs] Write to mounted NTFS filesystem using F o kern/180979 fs [netsmb][patch]: Fix large files handling o kern/180876 fs [zfs] [hast] ZFS with trim,bio_flush or bio_delete loc o kern/180678 fs [NFS] succesfully exported filesystems being reported o kern/180438 fs [smbfs] [patch] mount_smbfs fails on arm because of wr p kern/180236 fs [zfs] [nullfs] Leakage free space using ZFS with nullf o kern/178854 fs [ufs] FreeBSD kernel crash in UFS o kern/178713 fs [nfs] [patch] Correct WebNFS support in NFS server and s kern/178467 fs [zfs] [request] Optimized Checksum Code for ZFS o kern/178412 fs [smbfs] Coredump when smbfs mounted o kern/178388 fs [zfs] [patch] allow up to 8MB recordsize o kern/178387 fs [zfs] [patch] sparse files performance improvements o kern/178349 fs [zfs] zfs scrub on deduped data could be much less see o kern/178329 fs [zfs] extended attributes leak o kern/178238 fs [nullfs] nullfs don't release i-nodes on unlink. 
f kern/178231 fs [nfs] 8.3 nfsv4 client reports "nfsv4 client/server pr o kern/178103 fs [kernel] [nfs] [patch] Correct support of index files o kern/177985 fs [zfs] disk usage problem when copying from one zfs dat o kern/177971 fs [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3, o kern/177966 fs [zfs] resilver completes but subsequent scrub reports o kern/177658 fs [ufs] FreeBSD panics after get full filesystem with uf o kern/177536 fs [zfs] zfs livelock (deadlock) with high write-to-disk o kern/177445 fs [hast] HAST panic o kern/177240 fs [zfs] zpool import failed with state UNAVAIL but all d o kern/176978 fs [zfs] [panic] zfs send -D causes "panic: System call i o kern/176857 fs [softupdates] [panic] 9.1-RELEASE/amd64/GENERIC panic o bin/176253 fs zpool(8): zfs pool indentation is misleading/wrong o kern/176141 fs [zfs] sharesmb=on makes errors for sharenfs, and still o kern/175950 fs [zfs] Possible deadlock in zfs after long uptime o kern/175897 fs [zfs] operations on readonly zpool hang o kern/175449 fs [unionfs] unionfs and devfs misbehaviour o kern/175179 fs [zfs] ZFS may attach wrong device on move o kern/175071 fs [ufs] [panic] softdep_deallocate_dependencies: unrecov o kern/174372 fs [zfs] Pagefault appears to be related to ZFS o kern/174315 fs [zfs] chflags uchg not supported o kern/174310 fs [zfs] root point mounting broken on CURRENT with multi o kern/174279 fs [ufs] UFS2-SU+J journal and filesystem corruption o kern/173830 fs [zfs] Brain-dead simple change to ZFS error descriptio o kern/173718 fs [zfs] phantom directory in zraid2 pool f kern/173657 fs [nfs] strange UID map with nfsuserd o kern/173363 fs [zfs] [panic] Panic on 'zpool replace' on readonly poo o kern/173136 fs [unionfs] mounting above the NFS read-only share panic o kern/172942 fs [smbfs] Unmounting a smb mount when the server became o kern/172348 fs [unionfs] umount -f of filesystem in use with readonly o kern/172334 fs [unionfs] unionfs permits recursive union mounts; caus o 
kern/171626 fs [tmpfs] tmpfs should be noisier when the requested siz o kern/171415 fs [zfs] zfs recv fails with "cannot receive incremental o kern/170945 fs [gpt] disk layout not portable between direct connect o bin/170778 fs [zfs] [panic] FreeBSD panics randomly o kern/170680 fs [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA o kern/170497 fs [xfs][panic] kernel will panic whenever I ls a mounted o kern/169945 fs [zfs] [panic] Kernel panic while importing zpool (afte o kern/169480 fs [zfs] ZFS stalls on heavy I/O o kern/169398 fs [zfs] Can't remove file with permanent error o kern/169339 fs panic while " : > /etc/123" o kern/169319 fs [zfs] zfs resilver can't complete o kern/168947 fs [nfs] [zfs] .zfs/snapshot directory is messed up when o kern/168942 fs [nfs] [hang] nfsd hangs after being restarted (not -HU o kern/168158 fs [zfs] incorrect parsing of sharenfs options in zfs (fs o kern/167979 fs [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste o kern/167977 fs [smbfs] mount_smbfs results are differ when utf-8 or U o kern/167688 fs [fusefs] Incorrect signal handling with direct_io o kern/167685 fs [zfs] ZFS on USB drive prevents shutdown / reboot o kern/167612 fs [portalfs] The portal file system gets stuck inside po o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor o kern/167067 fs [zfs] [panic] ZFS panics the server o kern/167065 fs [zfs] boot fails when a spare is the boot disk o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di o kern/166477 fs [nfs] NFS data corruption. 
o kern/165950 fs [ffs] SU+J and fsck problem o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31 o kern/165392 fs Multiple mkdir/rmdir fails with errno 31 o kern/165087 fs [unionfs] lock violation in unionfs o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS o kern/164256 fs [zfs] device entry for volume is not created after zfs o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap' o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to o kern/162944 fs [coda] Coda file system module looks broken in 9.0 o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph o kern/162751 fs [zfs] [panic] kernel panics during file operations o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo o kern/161864 fs [ufs] removing journaling from UFS partition fails on o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_ o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou o kern/161280 fs [zfs] Stack overflow in gptzfsboot o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3 o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic f kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J o kern/160801 fs [zfs] zfsboot on 
8.2-RELEASE fails to boot from root-o o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha o kern/159930 fs [ufs] [panic] kernel core o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs amd(8) ICMP storm and unkillable process. 
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o bin/153142 fs [zfs] ls -l outputs `ls: ./.zfs: Operation not support o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o 
kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server o kern/145750 fs [unionfs] [hang] unionfs 
locks the machine s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141950 fs [unionfs] [lor] ufs/unionfs/ufs Lock order reversal o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/137588 fs [unionfs] [lor] LOR nfs/ufs/nfs o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... 
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis p kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126973 fs [unionfs] [hang] System hang with unionfs and init chr o kern/126553 fs [unionfs] unionfs move directory problem 2 (files appe o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o bin/123574 fs [unionfs] df(1) -t option destroys info for unionfs (a o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] 
pwd(1)/getcwd(2) fails with Permission denied o kern/121385 fs [unionfs] unionfs cross mount -> kernel panic o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o 
bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/67326 fs [msdosfs] crash after attempt to mount write protected o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t o kern/9619 fs [nfs] Restarting mountd kills existing mounts 335 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Sep 2 14:10:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 1070CE62 for ; Mon, 2 Sep 2013 14:10:41 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id CA4AD25E0 for ; Mon, 2 Sep 2013 14:10:40 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEAA6cJFKDaFve/2dsb2JhbABahA2DJ7pMgnOBPXSCKyNWGxoCDRkCWQaIFacykXeBKY4iNAeCaYE0A6lbgzwggW4 X-IronPort-AV: E=Sophos;i="4.89,1007,1367985600"; d="scan'208";a="48116724" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 02 Sep 2013 10:10:18 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 53324B3F49; Mon, 2 Sep 2013 10:10:18 -0400 (EDT) Date: Mon, 2 Sep 2013 10:10:18 -0400 (EDT) From: Rick Macklem To: "Sam Fourman Jr." Message-ID: <590302855.16994708.1378131018328.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: NFS on ZFS pure SSD pool MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Sep 2013 14:10:41 -0000 Sam Fourman Jr. 
wrote: [lots of stuff snipped for brevity] > root@students:/users # nfsstat -e -s > > Server Info: > Getattr Setattr Lookup Readlink Read Write Create > Remove > 106273793 1417764 19593633 12021 2497674 7927757 1047249 > 772450 > Rename Link Symlink Mkdir Rmdir Readdir RdirPlus > Access > 319284 924 13813 63500 20980 526257 0 > 677005862 I didn't spot this when it was first posted, because of the wrap around. That's a *lot* of Access operations, imho. Maybe tweaking the Mac OS X client could reduce these? I see there are a couple of options in their nfs.conf(5) file that might help: nfs.client.access_cache_timeout nfs.client.access_for_getattr You could also look at their mount_nfs man page to see if there are other settings for access related stuff. As well, I'm wondering if the Macs may be doing Access ops like crazy because they see that ACLs are enabled. I think ZFS always has ACLs enabled, but you can change the line in sys/fs/nfs/nfs_commonsubs.c int nfsrv_useacl = 1; to int nfsrv_useacl = 0; and then build/boot a new kernel on the server to disable them. (It isn't a sysctl, because it normally depends on the server file system to say if they are supported.) Also, pull up a terminal window and do an "ls -l" on some directory, to make sure everything isn't owned by "nobody". If it is, the name/uid mapping for NFSv4 isn't working correctly.
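To put a number on how lopsided the counters quoted above are, here is a rough tally in Python. It is only a sketch and not part of the original exchange: the dictionary restates the sixteen server counters from the quoted nfsstat output (the rest of that output was snipped), and the names server_ops and op_shares are made up for illustration.

```python
# Counters copied by hand from the quoted "nfsstat -e -s" Server Info block.
# Only the operations shown in the message are included; the remainder of
# the nfsstat output was snipped, so the total covers these sixteen ops only.
server_ops = {
    "Getattr": 106273793,
    "Setattr": 1417764,
    "Lookup": 19593633,
    "Readlink": 12021,
    "Read": 2497674,
    "Write": 7927757,
    "Create": 1047249,
    "Remove": 772450,
    "Rename": 319284,
    "Link": 924,
    "Symlink": 13813,
    "Mkdir": 63500,
    "Rmdir": 20980,
    "Readdir": 526257,
    "RdirPlus": 0,
    "Access": 677005862,
}


def op_shares(counts):
    """Return (name, count, fraction_of_total) tuples, largest count first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, count, count / total) for name, count in ranked]


shares = op_shares(server_ops)

# Show the three busiest operations and their share of the listed total.
for name, count, fraction in shares[:3]:
    print(f"{name:<8} {count:>12} {fraction:6.1%}")
```

On these numbers Access is the largest single operation by a wide margin, well over three quarters of the listed operations, with Getattr a distant second. Rerunning the same tally after changing the client settings would be one way to see whether the ratio moves.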
Good luck with it, rick From owner-freebsd-fs@FreeBSD.ORG Mon Sep 2 15:54:31 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id F22964C2 for ; Mon, 2 Sep 2013 15:54:30 +0000 (UTC) (envelope-from ericbrowning@skaggscatholiccenter.org) Received: from mail-pb0-f46.google.com (mail-pb0-f46.google.com [209.85.160.46]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id CA6CE2CE8 for ; Mon, 2 Sep 2013 15:54:30 +0000 (UTC) Received: by mail-pb0-f46.google.com with SMTP id rq2so4919700pbb.33 for ; Mon, 02 Sep 2013 08:54:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=CvjmeKUDlQwLMDZD7EBmOXqWf85bZcHXlzO4Sl5EfLs=; b=AdqxJAYY3g4ctH+EpsjvkAMtoTrr86xE4QD7fB+g13/OCnv77s9xiaA9l275KdLr+/ aIbnFIcx1NwCg/ZT7BtsuyA9br5vCHRxrgmwU2Rzb6BN0gy7g1IpgEhb7NJGvTkKhJuK lmvAqvayCM2rlhvJGrHuARgF2daLLlGXPQf05x6LVQgXqhuTQCclNlEw/o8FNrKCC+1K 1ymsti5uzgP4nJYZ+NvYbyL5z5JtJdKCWzqhE+4fDQSQrn+9m745lTPUFDGQEaMhI3uB ROUKjQos1LvKUMjxgY69V7JYD4oRgTjeWAX7UqzHqzID1/+FYJtreRrFefpR/ZHJfX4K gZJw== X-Gm-Message-State: ALoCoQnPXNFOHcrVJ8506Jsh4q0lBfXxzHUy5o+b3Z0vnD+UGAORi9wruLhQJAhAFY8Wi1BlDzF2 MIME-Version: 1.0 X-Received: by 10.68.137.170 with SMTP id qj10mr26265772pbb.31.1378136925752; Mon, 02 Sep 2013 08:48:45 -0700 (PDT) Received: by 10.70.26.4 with HTTP; Mon, 2 Sep 2013 08:48:45 -0700 (PDT) In-Reply-To: <590302855.16994708.1378131018328.JavaMail.root@uoguelph.ca> References: <590302855.16994708.1378131018328.JavaMail.root@uoguelph.ca> Date: Mon, 2 Sep 2013 09:48:45 -0600 Message-ID: Subject: Re: NFS on ZFS pure SSD pool From: Eric Browning To: Rick 
Macklem Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: FreeBSD FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 Sep 2013 15:54:31 -0000 Rick, Thanks for the suggestions. mapping of names/uid is working correctly, on my own files I can see my uid on my files. Because of my mix of clients 10.6/7/8 I'm using just nfs v3 until those 10.6 clients die out. I'd like to eventually switch to nfsv4. I can't jail off the nfsv3 10.6 clients via MAC address since many of them overlap with the 10.7 clients which are just one model newer. nfs.client.access_cache_timeout is now set to 600 seconds instead of 60 nfs.client.access_for_getattr by default is off and of the three versions of OSX (10.6/7/8) Sam and I can experiment with sys/fs/nfs/nfs_commonsubs.c after we see if the mac client tweaks work this week. It's currently labor day and the school is closed. On Mon, Sep 2, 2013 at 8:10 AM, Rick Macklem wrote: > Sam Fourman Jr. wrote: > [lots of stuff snipped for brevity] > > root@students:/users # nfsstat -e -s > > > > Server Info: > > Getattr Setattr Lookup Readlink Read Write Create > > Remove > > 106273793 1417764 19593633 12021 2497674 7927757 1047249 > > 772450 > > Rename Link Symlink Mkdir Rmdir Readdir RdirPlus > > Access > > 319284 924 13813 63500 20980 526257 0 > > 677005862 > I didn't spot this when it first was posted, because of the wrap around. > > That's a *lot* of Access operations, imho. Maybe tweaking the Mac OS X > client > could reduce these? > > I see there are a couple of options in their nfs.conf(5) file that might > help? > nfs.client.access_cache_timeout > nfs.client.access_for_getattr > > You could also look at their mount_nfs man page to see if there are other > settings for access related stuff. 
> > As well, I'm wondering if the Macs may be doing Access ops like crazy > because > they see that ACLs are enabled. > I think ZFS always have ACLs enabled, but you can change the line in > sys/fs/nfs/nfs_commonsubs.c > int nfsrv_useacl = 1; > to > int nfsrv_useacl = 0; > and then build/boot a new kernel on the server to disable them. (It isn't > a sysctl, because it normally depends on the server file system to say if > they are supported.) > > Also, pull up a terminal window and do an "ls -l" on some directory, to > make > sure everything isn;t owned by "nobody". If it is, the name/uid mapping for > NFSv4 isn;t working correctly. > > Good luck with it, rick > > -- Eric Browning Systems Administrator 801-984-7623 Skaggs Catholic Center Juan Diego Catholic High School Saint John the Baptist Middle Saint John the Baptist Elementary From owner-freebsd-fs@FreeBSD.ORG Tue Sep 3 04:00:02 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 5E884362 for ; Tue, 3 Sep 2013 04:00:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 4D08221F1 for ; Tue, 3 Sep 2013 04:00:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r83401ZD031716 for ; Tue, 3 Sep 2013 04:00:01 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r83401rL031715; Tue, 3 Sep 2013 04:00:01 GMT (envelope-from gnats) Date: Tue, 3 Sep 2013 04:00:01 GMT Message-Id: <201309030400.r83401rL031715@freefall.freebsd.org> 
To: freebsd-fs@FreeBSD.org Cc: From: Berend de Boer Subject: Re: bin/121898: [nullfs] pwd(1)/getcwd(2) fails with Permission denied X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Berend de Boer List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2013 04:00:02 -0000 The following reply was made to PR bin/121898; it has been noted by GNATS. From: Berend de Boer To: bug-followup@FreeBSD.org, ota@j.email.ne.jp Cc: Subject: Re: bin/121898: [nullfs] pwd(1)/getcwd(2) fails with Permission denied Date: Tue, 03 Sep 2013 15:55:23 +1200 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --I7VwP6VLJ4NMtdHJ57x1qVnLNlcoQhgXp Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Looks to me like a duplicate of http://www.freebsd.org/cgi/query-pr.cgi?pr=161424&cat= -- All the best, Berend. --I7VwP6VLJ4NMtdHJ57x1qVnLNlcoQhgXp Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBCAAGBQJSJV2rAAoJEKOfeD48G3g51VEP/2WKzPrB+g+zNcaj+j24NcBS 7mTMs3ulUbR5jN1saXa0teSMvSD1cRz2C/51gFiiEmRghWl9Cu45Re/zXOzK9mHV JMAoTYVg7opmEmkW7njXa9X2k2AIsORJNcQuV8eFVXau+9/x9Zfop3YuqGshsjSe xGuBmZA49oekRQq02LtMoSY72Z0Z4L87Fa4ogltjU42+369bMfcuhX8xjWD0LTqw PIh6eSSc3eyPO2U9ReqBVpHk7BajcL3Kg0AvuwfBSFSpF1JemHkgqbL3KJPsrKM2 Yj30n5M3srb/WtBeWKZ9KljhUiRo/SovE6d1dF1gpUDu53wmcFpAnfvWCNJMqChl f5IR1NT/Jj3YO2bc2IaK6Xo+eXrOi26Z7fLo4/pJwwZ3pDJkYy57ot/0i7zBxYE7 XIKSfzYJnM3wdMzW9p0KFM2muRFrLBOTLJ7S3vGalJPpEGFwzyXd7NfTIeYuG4l6 zGpxBRjklyzdf1axx4G6kZq7QC+cqUkiCGIARNhmULGFnpWAe7KUJo4EhHXBmW36 9Mq/nxTTzRlyRAyLcuWgp19qWfkUhQUEH0gukLZz4vl34WMpSkQVB/77NZ1I4S/g /t5O8b8YbwwZlHNroB5npZgQXb8ErP43CyRtEnWBEQMcpIQue0Rnhmhggp6hoXqC
07LSyMu6dd9YBOV2jdmz =2VpQ -----END PGP SIGNATURE----- --I7VwP6VLJ4NMtdHJ57x1qVnLNlcoQhgXp-- From owner-freebsd-fs@FreeBSD.ORG Tue Sep 3 04:00:03 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 80FEA363 for ; Tue, 3 Sep 2013 04:00:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 55AF321F2 for ; Tue, 3 Sep 2013 04:00:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r834030d031722 for ; Tue, 3 Sep 2013 04:00:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r83403nX031721; Tue, 3 Sep 2013 04:00:03 GMT (envelope-from gnats) Date: Tue, 3 Sep 2013 04:00:03 GMT Message-Id: <201309030400.r83403nX031721@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: Berend de Boer Subject: Re: kern/161424: [nullfs] __getcwd() calls fail when used on nullfs mount X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: Berend de Boer List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2013 04:00:03 -0000 The following reply was made to PR kern/161424; it has been noted by GNATS. 
From: Berend de Boer To: bug-followup@FreeBSD.org, v.haisman@sh.cvut.cz Cc: Subject: Re: kern/161424: [nullfs] __getcwd() calls fail when used on nullfs mount Date: Tue, 03 Sep 2013 15:53:06 +1200 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --3cPG6dhSiJ0Xq4uUIATFsmLvXWcRkoajv Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable

I have exactly the same problem in FreeBSD 9.1-RELEASE (almost, it runs on AWS so some minor patches).

I have a ZFS pool, and mounted it via nullfs. I wrote a simple C program which exhibits the problem:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    int main () {
      void *r;
      char c[1024] = "";  /* stays empty if getcwd() fails */

      /* getcwd() returns NULL and sets errno on failure */
      r = getcwd(c, 1024);
      printf ("r = %p, errno = %i\n", r, errno);
      printf ("dir = %s\n", c);
      return 0;
    }

Result:

    r = 0x0, errno = 13
    dir =

Very annoying, as I have my home directories on /u1/home and remount them on /home to avoid having to rename a bunch of stuff.

All the best, Berend.

--3cPG6dhSiJ0Xq4uUIATFsmLvXWcRkoajv Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBCAAGBQJSJV0jAAoJEKOfeD48G3g5cukQAK6YVzIBffLP8SLefAPLSpr8 7Rcu52TIx5/afJgYJc9sYCv/1tEG5g8QIGO7j1/b2lxlFPSj+qOlltm/T4wErET2 +Nsb+dn+CE+ZNimeel9QstuqUFrs5P+OvlfthOtIG0vhD7bPdFpSq1rjFhzC2+ww tvE/48eBhtO0oQ4ek/lhOmnipSy/FKdGF0N4rAKdTyjPOLPn4OzyDEFoG1xJ0OQp OM3E5LbilssIjzBxgKnfxLBtbHwqEsotyrCcAZyA7qSZxhs7tN4KR2C+K78QIOmG H20RdnffbcbLyWRza6SA2tpe+pGZGeV1umPlmJNKVNk5vgp1pPerhZAyv8es3EzF 0ip0fRO4D6/7sfgq1wC4oUnlZTqEbahgm6B3dkherDwowJAw3WzqR5geQ6e/xcj4 lEYgs1suCKVDRwbmo1aHnG1FbS0iYeYiJCnwQG0gRala71Z+JGCKmo6MuE9kMEWR SJehMtlfaRJX35R1yTmm/D5VUuhkwcpwv/WWok6OjAT0sqxB3vSeShCu4o8aojGK LccenGdCl28gzKt25K50uK+itN+XD8RL//G/gHnoIQX+cDjlmXw111iGnkRbCuSp f6PIPbPaeHH2y8Kgl33nIQZ5Df3YtxHy+Cr7BRoNPnWjHn1okQTEZHRFPlTSd5Eq
HHK2WzOnk6eIX9UIN5kM =mF/w -----END PGP SIGNATURE----- --3cPG6dhSiJ0Xq4uUIATFsmLvXWcRkoajv-- From owner-freebsd-fs@FreeBSD.ORG Tue Sep 3 08:20:44 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 4589F917 for ; Tue, 3 Sep 2013 08:20:44 +0000 (UTC) (envelope-from grant@grantgray.id.au) Received: from mail.grantgray.id.au (aurora.evps.com.au [116.240.200.42]) by mx1.freebsd.org (Postfix) with ESMTP id D192B2323 for ; Tue, 3 Sep 2013 08:20:42 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.grantgray.id.au (Postfix) with ESMTP id C77AA36BD12 for ; Tue, 3 Sep 2013 18:11:22 +1000 (EST) X-Virus-Scanned: amavisd-new at mail.grantgray.id.au Received: from mail.grantgray.id.au ([127.0.0.1]) by localhost (mail.grantgray.id.au [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8s0TJ2BjRbqu for ; Tue, 3 Sep 2013 18:11:21 +1000 (EST) Received: from localhost.localdomain (c27-253-54-200.thoms4.vic.optusnet.com.au [27.253.54.200]) by mail.grantgray.id.au (Postfix) with ESMTPSA id 7A8DA36BD10 for ; Tue, 3 Sep 2013 18:11:21 +1000 (EST) Message-ID: <522599A9.9070107@grantgray.id.au> Date: Tue, 03 Sep 2013 18:11:21 +1000 From: Grant Gray User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: ZFS livelock / deadlock on pure SSD pool Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2013 08:20:44 -0000 Hello All, I have been experiencing a ZFS livelock on a 9.1 
system since introducing pools containing only SSDs. The livelock typically occurs every 1-2 days, sometimes as often as twice a day.

ZFS filesystems:
http://pastebin.com/raw.php?i=svTZRd7m

The pool configuration is as follows:
http://pastebin.com/raw.php?i=KAdSGWu4

/boot/loader.conf:
http://pastebin.com/raw.php?i=J1cZNPjS

There were a couple of livelock issues associated with 9.1 (one in ZFS, one in CAM) that prompted an upgrade to 9.2RC2 and then to 9.2RC3; however, the problem persists. When the system has locked, it can still be pinged and socket connections can be made (SSH begins its handshake, for example, but doesn't get as far as prompting for a password).

Some details:
* Regular (hourly, daily, weekly) rolling snapshots via zfs-snapshot,
* Regular (hourly) cron jobs that traverse at least one filesystem of tens of thousands of files,
* NFS exports of some ZFS filesystems,
* iSCSI exports via istgt of zvols,
* Host controller is LSI 3801E (IT) with latest firmware,
* Storage array is Dell MD1000 with latest firmware,
* Host system is Sun X4200 M2 w/32GB RAM, 2 x dual core Opterons,
* SSDs (4 of) are Crucial M500 960GB in two mirrored pools (san1 & san2).

I haven't yet enabled the kernel debugger to get a stack trace/lock status, but procstat -kk -a is here:
http://pastebin.com/raw.php?i=SYhmyhGj

Once the livelock occurs, any ZFS command hangs, and it appears that any command that doesn't happen to be in cache may also hang.

Any suggestions are warmly welcomed!

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 3 08:22:42 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id EEB99A48 for ; Tue, 3 Sep 2013 08:22:41 +0000 (UTC) (envelope-from grant@grantgray.id.au) Received: from mail.grantgray.id.au (aurora.evps.com.au [116.240.200.42]) by mx1.freebsd.org (Postfix) with ESMTP id 8783B2341 for ; Tue, 3 Sep 2013 08:22:41 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.grantgray.id.au (Postfix) with ESMTP id 8381936BD12 for ; Tue, 3 Sep 2013 18:22:40 +1000 (EST) X-Virus-Scanned: amavisd-new at mail.grantgray.id.au Received: from mail.grantgray.id.au ([127.0.0.1]) by localhost (mail.grantgray.id.au [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BlTZAKFjs11z for ; Tue, 3 Sep 2013 18:22:40 +1000 (EST) Received: from localhost.localdomain (c27-253-54-200.thoms4.vic.optusnet.com.au [27.253.54.200]) by mail.grantgray.id.au (Postfix) with ESMTPSA id EF0E136BD10 for ; Tue, 3 Sep 2013 18:22:39 +1000 (EST) Message-ID: <52259C4F.6020705@grantgray.id.au> Date: Tue, 03 Sep 2013 18:22:39 +1000 From: Grant Gray User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS livelock / deadlock on pure SSD pool References: <522599A9.9070107@grantgray.id.au> In-Reply-To: <522599A9.9070107@grantgray.id.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2013 08:22:42 -0000 I forgot to mention the device list: at scbus0 target 0 lun 0 (pass0,cd0) at scbus3 target 39 lun 0 
(pass1,ses0) at scbus3 target 40 lun 0 (pass2,da0) at scbus3 target 41 lun 0 (pass3,da1) at scbus3 target 42 lun 0 (pass4,da2) at scbus3 target 43 lun 0 (pass5,da3) at scbus3 target 44 lun 0 (pass6,da4) at scbus3 target 45 lun 0 (pass7,da5) at scbus3 target 46 lun 0 (pass8,da6) at scbus3 target 47 lun 0 (pass9,da7) at scbus3 target 48 lun 0 (pass10,da8) at scbus3 target 49 lun 0 (pass11,da9) at scbus3 target 50 lun 0 (pass12,da10) at scbus3 target 51 lun 0 (pass13,da11) at scbus3 target 52 lun 0 (pass14,da12) at scbus3 target 53 lun 0 (pass15,da13) at scbus3 target 54 lun 0 (pass16,da14) at scbus4 target 0 lun 0 (pass17,da15) at scbus4 target 1 lun 0 (pass18,da16) at scbus4 target 2 lun 0 (pass19,da17) at scbus4 target 3 lun 0 (pass20,da18) On 09/03/2013 06:11 PM, Grant Gray wrote: > Hello All, > > I have been experiencing a ZFS livelock on a 9.1 system since > introducing pools containing only SSDs. The livelock occurs typically > every 1-2 days, sometimes as much as twice a day. > > ZFS filesystems: > http://pastebin.com/raw.php?i=svTZRd7m > > The pool configuration is as follows: > http://pastebin.com/raw.php?i=KAdSGWu4 > > /boot/loader.conf: > http://pastebin.com/raw.php?i=J1cZNPjS > > There were a couple of livelock issues associated with 9.1 (one in > ZFS, one in CAM) that prompted an upgrade to 9.2RC2 and then to > 9.2RC3, however the problem persists. When the system has locked, it > can still be pinged and socket connections can be made (SSH begins > handshake for example, but doesn't get as far as prompting for password). 
> > Some details: > * Regular (hourly, daily, weekly) rolling snapshots via zfs-snapshot, > * Regular (hourly) cron jobs that traverse at least one filesystem of > tens of thousands of files, > * NFS exports of some ZFS filesystems, > * iSCSI exports via istgt of zvols, > * Host controller is LSI 3801E (IT) with latest firmware, > * Storage array is Dell MD1000 with latest firmware, > * Host system is Sun X4200 M2 w/32GB RAM, 2 x dual core Opterons, > * SSDs (4 of) are Crucial M500 960GB in two mirrored pools (san1 & san2). > > > I haven't yet enabled the kernel debugger to get a stack trace/lock > status, but procstat -kk -a is here: > http://pastebin.com/raw.php?i=SYhmyhGj > > Once livelock occurs, any ZFS command hangs, and it appears any > command that doesn't happen to be in cache may also hang. > > Any suggestions are warmly welcomed! > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Sep 3 09:28:36 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 61D5482A for ; Tue, 3 Sep 2013 09:28:36 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 5DCFD2704 for ; Tue, 3 Sep 2013 09:28:34 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA06532; Tue, 03 Sep 2013 12:28:17 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1VGmuF-0001WF-W4; Tue, 03 
Sep 2013 12:28:16 +0300 Message-ID: <5225AB77.9020208@FreeBSD.org> Date: Tue, 03 Sep 2013 12:27:19 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130810 Thunderbird/17.0.8 MIME-Version: 1.0 To: Grant Gray Subject: Re: ZFS livelock / deadlock on pure SSD pool References: <522599A9.9070107@grantgray.id.au> In-Reply-To: <522599A9.9070107@grantgray.id.au> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2013 09:28:36 -0000

on 03/09/2013 11:11 Grant Gray said the following:
> I haven't yet enabled the kernel debugger to get a stack trace/lock status, but
> procstat -kk -a is here:
> http://pastebin.com/raw.php?i=SYhmyhGj

I believe that this is another ARC deadlock triggered by a low memory condition. This time it seems to be FreeBSD-specific too:

6 100059 zfskern arc_reclaim_thre mi_switch+0x194 sleepq_wait+0x42
_sx_xlock_hard+0x4d6 _sx_xlock+0x75 arc_buf_remove_ref+0x8a
dbuf_rele_and_unlock+0x132 dbuf_evict+0x11 dbuf_do_evict+0x53
arc_do_user_evicts+0xe2 arc_reclaim_thread+0x264 fork_exit+0x11f fork_trampoline+0xe

5338 102410 vorbisgain - mi_switch+0x194 sleepq_wait+0x42
_sx_xlock_hard+0x4d6 _sx_xlock+0x75 arc_lowmem+0x38 kmem_malloc+0xb0
uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0x1f4 arc_read+0x225
dbuf_read+0x445 dmu_buf_hold_array_by_dnode+0x168 dmu_buf_hold_array+0x67
dmu_read_uio+0x3f zfs_freebsd_read+0x483 VOP_READ_APV+0x6e vn_read+0xed
vn_io_fault+0x90

Thread 100059 acquired arc_reclaim_thr_lock before calling arc_do_user_evicts and now it wants to take a buf header hash lock.
Thread 102410 acquired the hash lock in arc_read, then it got into arc_lowmem because of a memory allocation problem (and M_WAIT flag) and now it wants to take arc_reclaim_thr_lock. A classic deadlock. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue Sep 3 10:36:09 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 04A7C958; Tue, 3 Sep 2013 10:36:09 +0000 (UTC) (envelope-from grant@gray.id.au) Received: from mail.grantgray.id.au (aurora.evps.com.au [116.240.200.42]) by mx1.freebsd.org (Postfix) with ESMTP id B9663234B; Tue, 3 Sep 2013 10:36:07 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.grantgray.id.au (Postfix) with ESMTP id 3207A37BA59; Tue, 3 Sep 2013 20:36:05 +1000 (EST) X-Virus-Scanned: amavisd-new at mail.grantgray.id.au Received: from mail.grantgray.id.au ([127.0.0.1]) by localhost (mail.grantgray.id.au [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id iMz1y0UZkFv8; Tue, 3 Sep 2013 20:36:04 +1000 (EST) Received: from [192.168.1.159] (musicm2.lnk.telstra.net [110.142.98.231]) by mail.grantgray.id.au (Postfix) with ESMTPSA id 9B53637BA44; Tue, 3 Sep 2013 20:36:03 +1000 (EST) Message-ID: <5225BB8C.5050802@gray.id.au> Date: Tue, 03 Sep 2013 20:35:56 +1000 From: Grant Gray User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Andriy Gapon Subject: Re: ZFS livelock / deadlock on pure SSD pool References: <522599A9.9070107@grantgray.id.au> <5225AB77.9020208@FreeBSD.org> In-Reply-To: <5225AB77.9020208@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Grant Gray X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2013 10:36:09 -0000 On 3/09/2013 7:27 PM, Andriy Gapon wrote: > on 03/09/2013 11:11 Grant Gray said the following: >> I haven't yet enabled the kernel debugger to get a stack trace/lock status, but >> procstat -kk -a is here: >> http://pastebin.com/raw.php?i=SYhmyhGj > I believe that this another ARC deadlock triggered by low memory condition. > This time it seems to be FreeBSD-specific too: > > 6 100059 zfskern arc_reclaim_thre mi_switch+0x194 sleepq_wait+0x42 > _sx_xlock_hard+0x4d6 _sx_xlock+0x75 arc_buf_remove_ref+0x8a > dbuf_rele_and_unlock+0x132 dbuf_evict+0x11 dbuf_do_evict+0x53 > arc_do_user_evicts+0xe2 arc_reclaim_thread+0x264 fork_exit+0x11f fork_trampoline+0xe > > 5338 102410 vorbisgain - mi_switch+0x194 sleepq_wait+0x42 > _sx_xlock_hard+0x4d6 _sx_xlock+0x75 arc_lowmem+0x38 kmem_malloc+0xb0 > uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0x1f4 arc_read+0x225 > dbuf_read+0x445 dmu_buf_hold_array_by_dnode+0x168 dmu_buf_hold_array+0x67 > dmu_read_uio+0x3f zfs_freebsd_read+0x483 VOP_READ_APV+0x6e vn_read+0xed > vn_io_fault+0x90 > > Thread 100059 acquired arc_reclaim_thr_lock before calling arc_do_user_evicts > and now it wants to take a buf header hash lock. > Thread 102410 acquired the hash lock in arc_read, then it got into arc_lowmem > because of a memory allocation problem (and M_WAIT flag) and now it wants to > take arc_reclaim_thr_lock. > > A classic deadlock. Thanks for the feedback. Do you think it may be triggered when the ARC is evicting pages because it is full, or a genuine low-memory case? The system has 32GB of RAM, of which the ARC is typically about 24G (I think). 
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 3 16:30:02 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id D40692CD for ; Tue, 3 Sep 2013 16:30:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C169629C6 for ; Tue, 3 Sep 2013 16:30:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r83GU24k098429 for ; Tue, 3 Sep 2013 16:30:02 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r83GU22a098428; Tue, 3 Sep 2013 16:30:02 GMT (envelope-from gnats) Date: Tue, 3 Sep 2013 16:30:02 GMT Message-Id: <201309031630.r83GU22a098428@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org Cc: From: John Kozubik Subject: RE: kern/156781: zfs is losing the snapshot directory X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: John Kozubik List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2013 16:30:02 -0000 The following reply was made to PR kern/156781; it has been noted by GNATS. From: John Kozubik To: bug-followup@FreeBSD.org Cc: Subject: RE: kern/156781: zfs is losing the snapshot directory Date: Tue, 3 Sep 2013 09:12:15 -0700 (PDT) I can confirm that this still occurs, this time on 8.3-RELEASE on amd64. Exactly as described by the previous submitters - nothing special going on on our system (no NFS, etc.) 
- just plain old snapshots being created and removed and ... eventually some .zfs/snapshot directories just disappear. Two things: 1) In our experience, enough retrying of ls/find/etc. of the missing snapshot directory will eventually lock the system up 2) non-critical severity is incorrect - snapshots are a critical feature and this is a critical bug. Thanks. From owner-freebsd-fs@FreeBSD.ORG Tue Sep 3 17:28:35 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id E4A9FA78 for ; Tue, 3 Sep 2013 17:28:35 +0000 (UTC) (envelope-from feld@FreeBSD.org) Received: from out2-smtp.messagingengine.com (out2-smtp.messagingengine.com [66.111.4.26]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id BAD9E2326 for ; Tue, 3 Sep 2013 17:28:35 +0000 (UTC) Received: from compute6.internal (compute6.nyi.mail.srv.osa [10.202.2.46]) by gateway1.nyi.mail.srv.osa (Postfix) with ESMTP id D498321638 for ; Tue, 3 Sep 2013 13:18:20 -0400 (EDT) Received: from web3 ([10.202.2.213]) by compute6.internal (MEProxy); Tue, 03 Sep 2013 13:18:20 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=message-id:from:to:mime-version :content-transfer-encoding:content-type:subject:date:in-reply-to :references; s=smtpout; bh=3UUJx4ePyzhKtUk++NlMTZj2sKc=; b=TGfUo 0lqXtIjFOhHwZyD5zKqd6gFH5XfAgZFXdpmV+5jnH+zGbhI3YZrzVcb3KuHFM56v RJw6yUR9tgMMyRn9iHScyzIFuzL9eRegNu3c/Qv8TtOGn6BEBDHBUo2EmF1SM1Xd ucz7LBwSaFXT5wHL7AsGK1FwDK/Gul3q7tATBA= Received: by web3.nyi.mail.srv.osa (Postfix, from userid 99) id ABB2CB00077; Tue, 3 Sep 2013 13:18:20 -0400 (EDT) Message-Id: <1378228700.21135.17445033.2AD5D05D@webmail.messagingengine.com> X-Sasl-Enc: sTlY2fI8PvnyXGIKNPoNXy69KUCHTcylOZIx2Zto1FaR 
1378228700 From: Mark Felder To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain X-Mailer: MessagingEngine.com Webmail Interface - ajax-ed2f0e98 Subject: Re: kern/156781: zfs is losing the snapshot directory Date: Tue, 03 Sep 2013 12:18:20 -0500 In-Reply-To: <201309031630.r83GU22a098428@freefall.freebsd.org> References: <201309031630.r83GU22a098428@freefall.freebsd.org> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2013 17:28:36 -0000 I used to see this in 8.x but haven't seen this in 9.x or HEAD. Would it be possible for you to zfs send your pool to another box on 9.x or HEAD and see if you can recreate it? From owner-freebsd-fs@FreeBSD.ORG Tue Sep 3 19:08:37 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id A0604652; Tue, 3 Sep 2013 19:08:37 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 66DA42FC2; Tue, 3 Sep 2013 19:08:37 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 21196B990; Tue, 3 Sep 2013 15:08:36 -0400 (EDT) From: John Baldwin To: freebsd-fs@freebsd.org Subject: Re: Call fo comments - raising vfs.ufs.dirhash_reclaimage? 
Date: Tue, 3 Sep 2013 15:07:32 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p28; KDE/4.5.5; amd64; ; ) References: <20130828181228.0d3618dd@ernst.home> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201309031507.33098.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 03 Sep 2013 15:08:36 -0400 (EDT) Cc: freebsd-hackers , Ivan Voras X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Sep 2013 19:08:37 -0000 On Wednesday, August 28, 2013 12:40:15 pm Ivan Voras wrote: > On 28 August 2013 18:12, Gary Jennejohn wrote: > > > So, if I understand this correctly, a normal desktop user won't > > notice any real change, except that buildworld might get faster, > > and big servers will benefit? > > Basically, yes, but read on... > > > But could this negatively impact small, embedded systems, which > > usually have only small memory footprints? Although I suppose > > one could argue that they usually don't have large numbers of > > files cached in memory at any given time. > > Unless I'm wrong, the only pathological case coming from this change > would be the following sequence of events: > > 1) Memory is scarce [*] > 2) There's a sudden surge of requests for a huge number of different directories > 3) There's an urgent lowmem event which is observed by dirhash, which > attempts to free memory but is prevented in doing so for the next 60 > seconds because all entries are young (the idea behind dirhash being > that if a directory is accessed, it will probably soon be accessed > again - think "ls" then "fopen", so we won't evict him until > reclaimage seconds) > 4) the kernel runs out of memory, game over. 
Just to play devil's advocate, the only way your change can benefit is if:

1) Memory is scarce thus triggering a lowmem event
2) There are requests for a huge number of directories that haven't been
   accessed in over 5 seconds.

That is to say, what your change does is increase the relative importance of
dirhash memory relative to other memory in the machine when the machine is
under memory pressure. If the machine is not under memory pressure then the
lowmem handler will not be triggered and your change will never matter.

Keep in mind that if pagedaemon is able to keep up, the lowmem event handler
will not be called. This handler only triggers when you are really low on
memory and trying to allocate it faster than pagedaemon can reclaim free
pages. In that sort of environment you generally want caches to return pages
sooner rather than later.

What would perhaps be better than a hardcoded reclaim age would be to use an
LRU-type approach and perhaps set a target percent to reclaim. That is,
suppose you were to reclaim the oldest 10% of hashes on each lowmem call (and
make the '10%' the tunable value). Then you will always make some amount of
progress in a low memory situation (and if the situation remains dire you
will eventually empty the entire cache), but the effective maximum age will
be more dynamic. Right now if you haven't touched UFS in 5 seconds it throws
the entire thing out on the first lowmem event. The LRU-approach would only
throw the oldest 10% out on the first call, but eventually throw it all out
if the situation remains dire.
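To make the LRU idea concrete, here is a rough userland sketch of "reclaim the oldest N% on each lowmem call". It is not dirhash code: `struct entry`, `struct lru_cache`, `lru_insert` and `lru_reclaim` are hypothetical names invented for illustration, the list is assumed to be kept oldest-first, and `percent` stands in for the proposed tunable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache entry; the real dirhash structures differ. */
struct entry {
    struct entry *next;   /* toward more recently used entries */
    int id;
};

/* List kept in LRU order: head is the oldest entry, tail the newest. */
struct lru_cache {
    struct entry *head;
    struct entry *tail;
    size_t count;
};

/* Append a new entry as the most recently used one. */
static void
lru_insert(struct lru_cache *c, int id)
{
    struct entry *e;

    assert(c != NULL);
    e = malloc(sizeof(*e));
    if (e == NULL)
        abort();
    e->id = id;
    e->next = NULL;
    if (c->tail != NULL)
        c->tail->next = e;
    else
        c->head = e;
    c->tail = e;
    c->count++;
}

/*
 * Intended to run once per low-memory event. Frees the oldest `percent`
 * of the entries, rounding up so that a non-empty cache gives back at
 * least one entry whenever percent > 0. Returns the number freed.
 */
static size_t
lru_reclaim(struct lru_cache *c, unsigned percent)
{
    size_t target, freed = 0;

    assert(c != NULL);
    if (percent > 100)
        percent = 100;
    target = (c->count * percent + 99) / 100;   /* ceiling division */
    while (freed < target && c->head != NULL) {
        struct entry *e = c->head;

        c->head = e->next;
        if (c->head == NULL)
            c->tail = NULL;
        free(e);
        c->count--;
        freed++;
    }
    return (freed);
}
```

With 20 entries and percent set to 10, the first call frees the two oldest entries and leaves 18. Because the target is rounded up, repeated calls keep making progress and eventually drain the list if the pressure persists.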
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Wed Sep 4 06:03:29 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 883E3827 for ; Wed, 4 Sep 2013 06:03:29 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id DA394230E for ; Wed, 4 Sep 2013 06:03:28 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id JAA20707; Wed, 04 Sep 2013 09:03:21 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1VH6BV-0006BT-62; Wed, 04 Sep 2013 09:03:21 +0300 Message-ID: <5226CCEF.7090002@FreeBSD.org> Date: Wed, 04 Sep 2013 09:02:23 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:17.0) Gecko/20130810 Thunderbird/17.0.8 MIME-Version: 1.0 To: Grant Gray Subject: Re: ZFS livelock / deadlock on pure SSD pool References: <522599A9.9070107@grantgray.id.au> <5225AB77.9020208@FreeBSD.org> <5225BB8C.5050802@gray.id.au> In-Reply-To: <5225BB8C.5050802@gray.id.au> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, Grant Gray X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Sep 2013 06:03:29 -0000 on 03/09/2013 13:35 Grant Gray said the following: > On 3/09/2013 7:27 PM, Andriy Gapon wrote: >> arc_lowmem+0x38 kmem_malloc+0xb0 > Thanks for the feedback. 
Do you think it may be triggered when the ARC is > evicting pages because it is full, or a genuine low-memory case? The system has > 32GB of RAM, of which the ARC is typically about 24G (I think). Given the kmem_malloc -> arc_lowmem call chain it was a KVA shortage. Probably because of KVA fragmentation. Setting KVA size to a value larger than your physical memory size (1.5x or 2x) may work around this problem. The cost of the workaround is that some memory will be used for the additional page table pages. Some recent changes in head are supposed to help with the KVA fragmentation problem in general. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Sep 4 13:37:41 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 45634EB9 for ; Wed, 4 Sep 2013 13:37:41 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-1.mit.edu (dmz-mailsec-scanner-1.mit.edu [18.9.25.12]) by mx1.freebsd.org (Postfix) with ESMTP id DDFFF24B6 for ; Wed, 4 Sep 2013 13:37:40 +0000 (UTC) X-AuditID: 1209190c-b7fac8e000006335-49-522736703b70 Received: from mailhub-auth-3.mit.edu ( [18.9.21.43]) by dmz-mailsec-scanner-1.mit.edu (Symantec Messaging Gateway) with SMTP id 9F.B2.25397.07637225; Wed, 4 Sep 2013 09:32:32 -0400 (EDT) Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id r84DWV1T004656; Wed, 4 Sep 2013 09:32:32 -0400 Received: from multics.mit.edu (system-low-sipb.mit.edu [18.187.2.37]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id r84DWT1e015181 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Wed, 4 Sep 2013 09:32:31 -0400 Received: (from kaduk@localhost) by multics.mit.edu 
(8.12.9.20060308) id r84DWTAC014124; Wed, 4 Sep 2013 09:32:29 -0400 (EDT) Date: Wed, 4 Sep 2013 09:32:29 -0400 (EDT) From: Benjamin Kaduk To: Rick Macklem Subject: Re: fixing "umount -f" for the NFS client In-Reply-To: <1247162688.16775666.1378046517881.JavaMail.root@uoguelph.ca> Message-ID: References: <1247162688.16775666.1378046517881.JavaMail.root@uoguelph.ca> User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Sep 2013 13:37:41 -0000 On Sun, 1 Sep 2013, Rick Macklem wrote: > Benjamin Kaduk wrote: >> On Fri, 30 Aug 2013, Rick Macklem wrote: >> >>> Kostik wrote: >>>> On Thu, Aug 29, 2013 at 07:43:34PM -0400, Rick Macklem wrote: >>>>>>> I assume I would also need to bump __FreeBSD_version (and maybe >>>>>>> VFS_VERSION?). >>>>>> I think you could avoid it. 
>>>>>> >>>>> Do you mean I don't need to bump __FreeBSD_version or VFS_VERSION >>>>> or both? >>>> I do not see much sense in bumping either of them. >>>> You might want to bump __FreeBSD_version when merging to stable. >> >> Please do bump __FreeBSD_version when merging to stable. I will not >> make >> much noise about -current at the moment, as I'm behind on tracking >> it. >> > Actually, I'm "on the fence" as to whether or not this one should be > MFC'd, due to the VFS ABI breakage. > > Since you (well, actually OpenAFS;-) are the main guy affected by VFS > ABI breakage these days, maybe you'd like to comment on this? > > Also, if anyone else has an opinion w.r.t. MFC'ng a patch that adds > a VFS op and, therefore, breaks the VFS ABI, please feel free to comment. Oops, this mail got lost. I think there are spare vfsops fields, so the MFC can be done in an ABI-compatible way. The new routine is for optional functionality, so it seems fine. -Ben From owner-freebsd-fs@FreeBSD.ORG Wed Sep 4 20:56:06 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 7D9EAEDD for ; Wed, 4 Sep 2013 20:56:06 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 43E3D24C5 for ; Wed, 4 Sep 2013 20:56:05 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvgEAHudJ1KDaFve/2dsb2JhbABaDoN/gyi9D4ENgT50giQBAQUjVhsYAgINGQJZBogVqDSSBYEpjH6BBTQHgmmBNAORVZgGgmFbIIE1OQ X-IronPort-AV: E=Sophos;i="4.89,1023,1367985600"; d="scan'208";a="48652845" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 04 Sep 2013 16:55:59 -0400 Received: from zcs3.mail.uoguelph.ca 
(localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 46949B4064; Wed, 4 Sep 2013 16:55:59 -0400 (EDT) Date: Wed, 4 Sep 2013 16:55:59 -0400 (EDT) From: Rick Macklem To: Benjamin Kaduk Message-ID: <1345367028.18318718.1378328159276.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: fixing "umount -f" for the NFS client MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 Sep 2013 20:56:06 -0000 Benjamin Kaduk wrote: > On Sun, 1 Sep 2013, Rick Macklem wrote: > > > Benjamin Kaduk wrote: > >> On Fri, 30 Aug 2013, Rick Macklem wrote: > >> > >>> Kostik wrote: > >>>> On Thu, Aug 29, 2013 at 07:43:34PM -0400, Rick Macklem wrote: > >>>>>>> I assume I would also need to bump __FreeBSD_version (and > >>>>>>> maybe > >>>>>>> VFS_VERSION?). > >>>>>> I think you could avoid it. > >>>>>> > >>>>> Do you mean I don't need to bump __FreeBSD_version or > >>>>> VFS_VERSION > >>>>> or both? > >>>> I do not see much sense in bumping either of them. > >>>> You might want to bump __FreeBSD_version when merging to stable. > >> > >> Please do bump __FreeBSD_version when merging to stable. I will > >> not > >> make > >> much noise about -current at the moment, as I'm behind on tracking > >> it. > >> > > Actually, I'm "on the fence" as to whether or not this one should > > be > > MFC'd, due to the VFS ABI breakage. > > > > Since you (well, actually OpenAFS;-) are the main guy affected by > > VFS > > ABI breakage these days, maybe you'd like to comment on this? > > > > Also, if anyone else has an opinion w.r.t. 
MFC'ng a patch that adds > > a VFS op and, therefore, breaks the VFS ABI, please feel free to > > comment. > > Oops, this mail got lost. > > I think there are spare vfsops fields, so the MFC can be done in an > ABI-compatible way. The new routine is for optional functionality, > so it > seems fine. > There are spares vfs ops in 10/current, but not in stable/9. An MFC will result in a VFS ABI change. (Since 10.0 hasn't been released yet, I didn't use one of the recently added spares.) rick > -Ben > From owner-freebsd-fs@FreeBSD.ORG Thu Sep 5 15:20:33 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 4166C8FF for ; Thu, 5 Sep 2013 15:20:33 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-7.mit.edu (dmz-mailsec-scanner-7.mit.edu [18.7.68.36]) by mx1.freebsd.org (Postfix) with ESMTP id DDA612808 for ; Thu, 5 Sep 2013 15:20:32 +0000 (UTC) X-AuditID: 12074424-b7f228e00000096b-88-5228a13a0392 Received: from mailhub-auth-2.mit.edu ( [18.7.62.36]) by dmz-mailsec-scanner-7.mit.edu (Symantec Messaging Gateway) with SMTP id D3.FC.02411.A31A8225; Thu, 5 Sep 2013 11:20:26 -0400 (EDT) Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) by mailhub-auth-2.mit.edu (8.13.8/8.9.2) with ESMTP id r85FKPfF010778; Thu, 5 Sep 2013 11:20:25 -0400 Received: from multics.mit.edu (system-low-sipb.mit.edu [18.187.2.37]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id r85FKNLf009965 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 5 Sep 2013 11:20:24 -0400 Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id r85FKMGI017680; Thu, 5 Sep 2013 11:20:22 -0400 (EDT) Date: Thu, 5 Sep 2013 11:20:22 -0400 (EDT) From: Benjamin 
Kaduk To: Rick Macklem Subject: Re: fixing "umount -f" for the NFS client In-Reply-To: <1345367028.18318718.1378328159276.JavaMail.root@uoguelph.ca> Message-ID: References: <1345367028.18318718.1378328159276.JavaMail.root@uoguelph.ca> User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Sep 2013 15:20:33 -0000 On Wed, 4 Sep 2013, Rick Macklem wrote: > Benjamin Kaduk wrote: >> >> I think there are spare vfsops fields, so the MFC can be done in an >> ABI-compatible way. The new routine is for optional functionality, >> so it >> seems fine. >> > There are spares vfs ops in 10/current, but not in stable/9. An MFC will > result in a VFS ABI change. (Since 10.0 hasn't been released yet, I didn't > use one of the recently added spares.) Oh, right, I was looking at 10/current. 
Unless there are pressing calls for the feature in the stable branches, it's probably best to hold off on the MFC, then. OpenAFS has encountered a few KBI incompatibilities over the years (mostly in the networking bits, if I remember correctly), and we can deal in the future, but not having to is nice. Thanks, Ben From owner-freebsd-fs@FreeBSD.ORG Thu Sep 5 23:05:51 2013 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 1606316E; Thu, 5 Sep 2013 23:05:51 +0000 (UTC) (envelope-from grant@grantgray.id.au) Received: from mail.grantgray.id.au (aurora.evps.com.au [116.240.200.42]) by mx1.freebsd.org (Postfix) with ESMTP id A08BB26CB; Thu, 5 Sep 2013 23:05:49 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.grantgray.id.au (Postfix) with ESMTP id 9C41037BA59; Fri, 6 Sep 2013 09:05:40 +1000 (EST) X-Virus-Scanned: amavisd-new at mail.grantgray.id.au Received: from mail.grantgray.id.au ([127.0.0.1]) by localhost (mail.grantgray.id.au [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id CyWQ4j5wwI7f; Fri, 6 Sep 2013 09:05:39 +1000 (EST) Received: from localhost.localdomain (c27-253-54-200.thoms4.vic.optusnet.com.au [27.253.54.200]) by mail.grantgray.id.au (Postfix) with ESMTPSA id B07AB37BA44; Fri, 6 Sep 2013 09:05:39 +1000 (EST) Message-ID: <52290E43.1090203@grantgray.id.au> Date: Fri, 06 Sep 2013 09:05:39 +1000 From: Grant Gray User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8 MIME-Version: 1.0 To: Andriy Gapon Subject: Re: ZFS livelock / deadlock on pure SSD pool References: <522599A9.9070107@grantgray.id.au> <5225AB77.9020208@FreeBSD.org> <5225BB8C.5050802@gray.id.au> <5226CCEF.7090002@FreeBSD.org> In-Reply-To: <5226CCEF.7090002@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; 
format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 Sep 2013 23:05:51 -0000 On 09/04/2013 04:02 PM, Andriy Gapon wrote: > on 03/09/2013 13:35 Grant Gray said the following: >> On 3/09/2013 7:27 PM, Andriy Gapon wrote: >>> arc_lowmem+0x38 kmem_malloc+0xb0 >> Thanks for the feedback. Do you think it may be triggered when the ARC is >> evicting pages because it is full, or a genuine low-memory case? The system has >> 32GB of RAM, of which the ARC is typically about 24G (I think). > Given the kmem_malloc -> arc_lowmem call chain it was a KVA shortage. Probably > because of KVA fragmentation. > Setting KVA size to a value larger than your physical memory size (1.5x or 2x) > may work around this problem. The cost of the workaround is that some memory > will be used for the additional page table pages. > > Some recent changes in head are supposed to help with the KVA fragmentation > problem in general. I've had to revert the problem server to spinning disks as my customer can't bear any more downtime. I'm happy to test any proposed patches/workarounds on a non-production system. I have no idea how the FreeBSD allocator works. Does the suggested increase in KVA size merely defer the problem as it will still eventually run out of contiguous pages? 
PR has been submitted: http://www.freebsd.org/cgi/query-pr.cgi?pr=181791 From owner-freebsd-fs@FreeBSD.ORG Fri Sep 6 00:57:52 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id EBCCC681; Fri, 6 Sep 2013 00:57:52 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C15362B33; Fri, 6 Sep 2013 00:57:52 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id r860vq44029545; Fri, 6 Sep 2013 00:57:52 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id r860vqor029544; Fri, 6 Sep 2013 00:57:52 GMT (envelope-from linimon) Date: Fri, 6 Sep 2013 00:57:52 GMT Message-Id: <201309060057.r860vqor029544@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Subject: Re: kern/181834: [nfs] amd mounting NFS directories can drive a dead-lock [regression] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Sep 2013 00:57:53 -0000 Old Synopsis: amd mounting NFS directories can drive a dead-lock New Synopsis: [nfs] amd mounting NFS directories can drive a dead-lock [regression] Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Fri Sep 6 00:57:34 UTC 2013 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=181834