From: Ivan Voras
To: freebsd-fs@freebsd.org
Date: Thu, 12 Jul 2012 00:15:45 +0200
Subject: wdrain hang

Hello,

I started writing a tutorial on ggate and have encountered a bug I thought was solved long ago, but apparently it was only worked around:

http://ivoras.net/blog/tree/2012-07-06.writing-a-geom-gate-module-part-4.html

The problem is that writing to a file system from within a ggate module (a similar thing used to happen with md(4)) hangs
when a certain amount of data is in flight. I think this happens when the amount of in-flight data from the upper layer (i.e. the file system sitting on top of a ggate device) plus the amount of data on the lower layer (the file system to which the userland ggate module writes) exceeds hirunningspace, which somehow causes a deadlock in waitrunningbufspace().

I don't understand exactly how this deadlock happens: I would expect one of the writing processes (either the one writing to the ggate module or the ggate module itself) to hang in mtx_lock(), but apparently both hang in the "wdrain" state. Can someone explain what happens here?

So far this issue has been worked around by using O_DIRECT, but that's not possible in the case of this tutorial, so I'm wondering if there is another workaround?
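
To make the hypothesis above concrete, here is a toy userland model of it in C. It is only a sketch of my guess, not the kernel code: the struct, the helper names must_wait() and stacked_write_completes(), and the numbers are all made up for illustration, and the model simply assumes that both layers are charged against one shared counter and that nothing drains while the ggate module sleeps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: field names echo the ones in the message, values are invented. */
struct model {
    long hirunningspace;   /* threshold above which a writer goes to sleep */
    long runningbufspace;  /* bytes of write I/O currently counted as in flight */
};

/* Hypothetical helper: would a writer checking now have to sleep ("wdrain")? */
static bool must_wait(const struct model *m)
{
    return m->runningbufspace > m->hirunningspace;
}

/*
 * Returns true if the stacked write finishes, false if it wedges.
 *   upper: bytes the upper file system has in flight towards the ggate device
 *   lower: bytes the ggate module writes to the lower file system in response
 * Assumption: the upper bytes stay in flight until the module finishes.
 */
static bool stacked_write_completes(struct model *m, long upper, long lower)
{
    assert(upper >= 0 && lower >= 0);

    m->runningbufspace += upper;  /* upper-layer write is issued */
    m->runningbufspace += lower;  /* module issues its own write below */

    if (must_wait(m)) {
        /*
         * The module sleeps waiting for the counter to drop, but the
         * upper bytes cannot complete until the module does, so in this
         * model nothing ever drains and both sides stay asleep.
         */
        return false;
    }

    /* Below the threshold: both writes complete and are uncharged. */
    m->runningbufspace -= lower;
    m->runningbufspace -= upper;
    return true;
}
```

With hirunningspace set to 1024 in this model, stacked_write_completes(&m, 256, 256) returns true and leaves the counter at 0, while a following call with 768 and 768 returns false and leaves 1536 bytes charged. Whether the real code paths behave like this is exactly what I am asking about.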