From owner-freebsd-fs@FreeBSD.ORG Sun Jun 25 04:59:30 2006
X-Original-To: freebsd-fs@freebsd.org
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id DA5B316A4A9;
	Sun, 25 Jun 2006 04:59:30 +0000 (UTC)
	(envelope-from user@dhp.com)
Received: from shell.dhp.com (shell.dhp.com [199.245.105.1])
	by mx1.FreeBSD.org (Postfix) with ESMTP id ED4EB43D60;
	Sun, 25 Jun 2006 04:59:29 +0000 (GMT)
	(envelope-from user@dhp.com)
Received: by shell.dhp.com (Postfix, from userid 896)
	id 5083F31311; Sun, 25 Jun 2006 00:59:29 -0400 (EDT)
Date: Sun, 25 Jun 2006 00:59:29 -0400 (EDT)
From: Ensel Sharon
To: Robert Watson
In-Reply-To: <20060624232457.D8526@fledge.watson.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: freebsd-fs@freebsd.org
Subject: Re: 6.1 quota bugs cause adaptec 2820sa kernel to crash ?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Sun, 25 Jun 2006 04:59:30 -0000

On Sat, 24 Jun 2006, Robert Watson wrote:

> > After loading, the system frequently (multi-daily) crashed with the error:
> >
> > Warning! Controller is no longer running! code=0xbcef0100
> >
> > (after a page or so of aac0 timeout messages)
> >
> > So I disabled quotas on the system, and it has been completely stable ever
> > since.
> >
> > -----
>
> The above sounds a lot like a problem with {Adaptec driver, controller,
> disks}, rather than a quota problem.  So it might be that there's an I/O load
> change with quotas running that triggers the problem.  Alternatively, there's
> a memory corruption bug (or the like) in quotas that corrupts data structures
> for the adaptec driver, but hopefully not.  I believe Scott Long follows this
> list, but if you don't hear back in a bit, you might forward that description
> to him and see if he has thoughts.  In the past, he's maintained the Adaptec
> drivers, but I'm not sure what his level of involvement with them is at this
> point.

Yes, that is why I was asking - is it reasonable that quotas cause this, or
are quotas just "hard", so that if I did something equally "hard" the system
would also crash?

I can tell you this though - it is a busy fileserver with a lot of intensive
rsyncs going on, and with quotas involved it crashes hourly, and as soon as
you turn them off it never crashes.  No other variables have changed.

What quota-like activity, that doesn't involve quotas, and also won't destroy
data (!) can I run to see if it crashes?

I am more than happy to use this system to test things that could solve this
problem for everyone - it is not the availability of this system that
matters, it is just that the data must remain safe...

thanks...

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 26 03:43:50 2006
Message-ID: <449F4C1A.1000704@samsco.org>
Date: Sun, 25 Jun 2006 20:53:14 -0600
From: Scott Long
To:
 Pedro Martelletto
References: <20060622153504.GB835@static.protection.cx>
In-Reply-To: <20060622153504.GB835@static.protection.cx>
Cc: freebsd-fs@freebsd.org
Subject: Re: plug memory leaks and fix nested loops in udf_find_partmaps()

Thanks for this, I'll look at committing it right now.

Scott

Pedro Martelletto wrote:
> currently, there are two nested 'for' loops in udf_find_partmaps() which
> use the same control variable (i), as well as memory leaks in two error
> paths, which the following diff should fix.
>
> -p.
>
> Index: udf_vfsops.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/fs/udf/udf_vfsops.c,v
> retrieving revision 1.41
> diff -u -p -r1.41 udf_vfsops.c
> --- udf_vfsops.c	26 May 2006 01:21:51 -0000	1.41
> +++ udf_vfsops.c	22 Jun 2006 15:08:25 -0000
> @@ -728,7 +728,7 @@ udf_find_partmaps(struct udf_mnt *udfmp,
>  	struct regid *pmap_id;
>  	struct buf *bp;
>  	unsigned char regid_id[UDF_REGID_ID_SIZE + 1];
> -	int i, ptype, psize, error;
> +	int i, k, ptype, psize, error;
>  
>  	for (i = 0; i < le32toh(lvd->n_pm); i++) {
>  		pmap = (union udf_pmap *)&lvd->maps[i * UDF_PMAP_SIZE];
> @@ -776,6 +776,7 @@ udf_find_partmaps(struct udf_mnt *udfmp,
>  			brelse(bp);
>  			printf("Failed to read Sparing Table at sector %d\n",
>  			    le32toh(pms->st_loc[0]));
> +			FREE(udfmp->s_table, M_UDFMOUNT);
>  			return (error);
>  		}
>  		bcopy(bp->b_data, udfmp->s_table, le32toh(pms->st_size));
> @@ -783,15 +784,16 @@ udf_find_partmaps(struct udf_mnt *udfmp,
>  
>  		if (udf_checktag(&udfmp->s_table->tag, 0)) {
>  			printf("Invalid sparing table found\n");
> +			FREE(udfmp->s_table, M_UDFMOUNT);
>  			return (EINVAL);
>  		}
>  
>  		/* See how many valid entries there are here.  The list is
>  		 * supposed to be sorted. 0xfffffff0 and higher are not valid
>  		 */
> -		for (i = 0; i < le16toh(udfmp->s_table->rt_l); i++) {
> -			udfmp->s_table_entries = i;
> -			if (le32toh(udfmp->s_table->entries[i].org) >=
> +		for (k = 0; k < le16toh(udfmp->s_table->rt_l); k++) {
> +			udfmp->s_table_entries = k;
> +			if (le32toh(udfmp->s_table->entries[k].org) >=
>  			    0xfffffff0)
>  				break;
>  		}
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Mon Jun 26 11:46:12 2006
Date: Mon, 26 Jun 2006 08:28:39 -0300
From: Pedro Martelletto
To: freebsd-fs@freebsd.org
Message-ID: <20060626112839.GA12741@static.protection.cx>
Subject: Fix parsing of UDF type 1 partition maps

In udf_find_partmaps(), when we find a type 1 partition map, we have to skip
the actual type 1 length (6 bytes) rather than a fixed 64 bytes. With this
diff, it is possible to correctly spot the VAT partition map on a couple of
discs I have.

Thanks for your attention,

-p.
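[Archive note] The idea in the message above, advancing through the partition maps by each map's own length instead of a fixed 64-byte stride, can be sketched in isolation. This is only an illustration, not the kernel code: the function name and the PMAP_* macros are hypothetical, the buffer is a plain byte array rather than a struct logvol_desc, and the validity check is deliberately stricter than the one in the diff that follows (here the length must match the type).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the sizes are the ones given in the message. */
#define PMAP_TYPE1_SIZE 6
#define PMAP_TYPE2_SIZE 64

/*
 * Walk n_pm partition maps packed back to back in maps[0..len) and return
 * the byte offset of the first type 2 map, or -1 if there is none or a map
 * header is invalid.  Byte 0 of each map is its type, byte 1 its length.
 */
static long
first_type2_offset(const uint8_t *maps, size_t len, uint32_t n_pm)
{
    size_t off = 0;
    uint32_t i;

    for (i = 0; i < n_pm; i++) {
        uint8_t ptype, psize;

        /* Need at least the two header bytes (off never exceeds len). */
        if (len - off < 2)
            return (-1);
        ptype = maps[off];
        psize = maps[off + 1];

        /* Assumption: stricter than the patch, length must fit the type. */
        if (!((ptype == 1 && psize == PMAP_TYPE1_SIZE) ||
            (ptype == 2 && psize == PMAP_TYPE2_SIZE)))
            return (-1);
        if (psize > len - off)
            return (-1);
        if (ptype == 2)
            return ((long)off);

        /* Advance by this map's own length, not by a fixed 64 bytes. */
        off += psize;
    }
    return (-1);
}
```

With a 6-byte type 1 map followed by a 64-byte type 2 map, the walk lands on offset 6. A fixed 64-byte stride would instead look for the second map at offset 64, in the middle of the type 2 map, which is consistent with the symptom described in the message.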
Index: ecma167-udf.h
===================================================================
RCS file: /home/ncvs/src/sys/fs/udf/ecma167-udf.h,v
retrieving revision 1.6
diff -u -p -r1.6 ecma167-udf.h
--- ecma167-udf.h	3 Feb 2006 15:25:52 -0000	1.6
+++ ecma167-udf.h	26 Jun 2006 10:45:12 -0000
@@ -201,8 +201,6 @@ struct logvol_desc {
 	uint8_t maps[1];
 } __packed;
 
-#define UDF_PMAP_SIZE 64
-
 /* Type 1 Partition Map [3/10.7.2] */
 struct part_map_1 {
 	uint8_t type;
@@ -211,6 +209,8 @@ struct part_map_1 {
 	uint16_t part_num;
 } __packed;
 
+#define UDF_PMAP_TYPE1_SIZE 6
+
 /* Type 2 Partition Map [3/10.7.3] */
 struct part_map_2 {
 	uint8_t type;
@@ -218,6 +218,8 @@ struct part_map_2 {
 	uint8_t part_id[62];
 } __packed;
 
+#define UDF_PMAP_TYPE2_SIZE 64
+
 /* Virtual Partition Map [UDF 2.01/2.2.8] */
 struct part_map_virt {
 	uint8_t type;
@@ -245,7 +247,6 @@ struct part_map_spare {
 } __packed;
 
 union udf_pmap {
-	uint8_t data[UDF_PMAP_SIZE];
 	struct part_map_1 pm1;
 	struct part_map_2 pm2;
 	struct part_map_virt pmv;
Index: udf_vfsops.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/udf/udf_vfsops.c,v
retrieving revision 1.42
diff -u -p -r1.42 udf_vfsops.c
--- udf_vfsops.c	26 Jun 2006 03:21:19 -0000	1.42
+++ udf_vfsops.c	26 Jun 2006 10:45:12 -0000
@@ -723,30 +723,31 @@ udf_vptofh (struct vnode *vp, struct fid
 static int
 udf_find_partmaps(struct udf_mnt *udfmp, struct logvol_desc *lvd)
 {
-	union udf_pmap *pmap;
 	struct part_map_spare *pms;
 	struct regid *pmap_id;
 	struct buf *bp;
 	unsigned char regid_id[UDF_REGID_ID_SIZE + 1];
 	int i, k, ptype, psize, error;
+	uint8_t *pmap = (uint8_t *) &lvd->maps[0];
 
 	for (i = 0; i < le32toh(lvd->n_pm); i++) {
-		pmap = (union udf_pmap *)&lvd->maps[i * UDF_PMAP_SIZE];
-		ptype = pmap->data[0];
-		psize = pmap->data[1];
+		ptype = pmap[0];
+		psize = pmap[1];
 		if (((ptype != 1) && (ptype != 2)) ||
-		    ((psize != UDF_PMAP_SIZE) && (psize != 6))) {
+		    ((psize != UDF_PMAP_TYPE1_SIZE) &&
+		    (psize != UDF_PMAP_TYPE2_SIZE))) {
 			printf("Invalid partition map found\n");
 			return (1);
 		}
 
 		if (ptype == 1) {
 			/* Type 1 map.  We don't care */
+			pmap += UDF_PMAP_TYPE1_SIZE;
 			continue;
 		}
 
 		/* Type 2 map.  Gotta find out the details */
-		pmap_id = (struct regid *)&pmap->data[4];
+		pmap_id = (struct regid *)&pmap[4];
 		bzero(&regid_id[0], UDF_REGID_ID_SIZE);
 		bcopy(&pmap_id->id[0], &regid_id[0], UDF_REGID_ID_SIZE);
 
@@ -756,7 +757,8 @@ udf_find_partmaps(struct udf_mnt *udfmp,
 			return (1);
 		}
 
-		pms = &pmap->pms;
+		pms = (struct part_map_spare *)pmap;
+		pmap += UDF_PMAP_TYPE2_SIZE;
 		MALLOC(udfmp->s_table, struct udf_sparing_table *,
 		    le32toh(pms->st_size), M_UDFMOUNT, M_NOWAIT | M_ZERO);
 		if (udfmp->s_table == NULL)

From owner-freebsd-fs@FreeBSD.ORG Tue Jun 27 07:32:12 2006
Date: Tue, 27 Jun 2006 15:32:10 +0800
From: "Leo Huang"
To: freebsd-fs@freebsd.org
Content-Type:
 text/plain; charset=GB2312; format=flowed
Subject: Is the fsync() fake on FreeBSD6.1?

Hi,

I benchmarked MySQL 4.1.18 on FreeBSD 6.1 and Debian 3.1 using Super
Smack 1.3 some days ago.

The benchmark table  is
CREATE TABLE `Account` (
  `aid` int(11) NOT NULL auto_increment,
  `name` char(20) NOT NULL default '',
  `flag` int(11) NOT NULL default '0',
  `uidcount` int(11) NOT NULL default '0',
  `balance` int(11) NOT NULL default '0',
  `point` int(11) NOT NULL default '0',
  `blocktm` int(11) NOT NULL default '0',
  `ipnum` int(10) unsigned default NULL,
  `newdate` datetime default NULL,
  PRIMARY KEY  (`aid`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

And it has 1,000,000 rows.

The SQL statement is
update Account set balance= balance + 1 where aid=?;

The result is followed:
OS           Clients   Result(queries per second)   TPS(got from iostat)
FreeBSD6.1   50        516.1                        about 2000
Debian3.1    50        49.8                         about 200

The result is surprise me. The MySQL Performance on FreeBSD6.1 is
about 10 times of on Debian3.1, and the output of iostat also shows it.

I know that MySQL uses fsync() to flush both the data and log files at
default when using innodb
engine(http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html).
Our evaluating computer only has a 10000RPM SCSI hard disk. I think it
can do about 200 sequential fsync() calls per second if the fsync() is
real.

Is the fsync() on FreeBSD6.1 fake? I mean than the data is only
written to the drives memory and so can be lost if power goes down.
And how I can confirm this?

If the fsync() is fake, how can I get the real fsync?

Any comment is welcome!

PS:
1. Our evaluating computer is DELL PowerEdge 1650. Its hardware
configuration is followed:
	CPU: 2 * Intel Pentium III 1.33GHz 512KB Level 2 Cache(smp)
	Memory: 1024MB ECC SDRAM
	HD: SEAGATE ST336706LC (36GB Ultra160 SCSI 10000RPM)
	NIC	: Intel(R) PRO/1000 Network Connection

2. Some important parameters in MySQL configuration file are here:
	log-bin
	sync_binlog=1
	innodb_safe_binlog
	innodb_buffer_pool_size = 384M
	innodb_additional_mem_pool_size = 20M
	innodb_log_file_size = 100M
	innodb_log_buffer_size = 8M
	innodb_flush_log_at_trx_commit = 1
	innodb_lock_wait_timeout = 50

3. on FreeBSD6.1 the /etc/fstab is followed:
	# Device                Mountpoint      FStype  Options         Dump    Pass#
	/dev/da0s1g             /home           ufs     rw              2       2
   and the outputs of tunefs is:
   mysql-test-4# tunefs -p /home
   tunefs: ACLs: (-a)                                         disabled
   tunefs: MAC multilabel: (-l)                               disabled
   tunefs: soft updates: (-n)                                 enabled
   tunefs: maximum blocks per file in a cylinder group: (-e)  2048
   tunefs: average file size: (-f)                            16384
   tunefs: average number of files in a directory: (-s)       64
   tunefs: minimum percentage of free space: (-m)             8%
   tunefs: optimization preference: (-o)                      time
   tunefs: volume label: (-L)

4. on Debian3.1 the /etc/fstab is followed:
   # <file system> <mount point>   <type>  <options>       <dump>  <pass>
   /dev/sda9       /home           ext3    defaults        0       2

From owner-freebsd-fs@FreeBSD.ORG Tue Jun 27 14:47:18 2006
Message-ID: <44A144F1.1090909@freebsd.org>
Date: Tue, 27 Jun 2006 23:47:13 +0900
From: Daichi GOTO
To: freebsd-hackers@freebsd.org, freebsd-current@freebsd.org,
	freebsd-fs@freebsd.org, rodrigc@crodrigues.org
Cc: daichi@freebsd.org, ozawa@ongs.co.jp
Subject: [ANN] unionfs
 patchset-15 release, it is ready for the merge

Hi Guys!

It is my pleasure and honor to announce the availability of
the unionfs patchset-15. The p15 is an important milestone.
It is ready for the merge.

Patchset-15:
  For 7-current
    http://people.freebsd.org/~daichi/unionfs/unionfs-p15.diff

  For 6.x
    http://people.freebsd.org/~daichi/unionfs/unionfs6-p15.diff

Changes in unionfs-p15.diff
  - deleted -r option in limbo and updated mount_unionfs.8
  - changed source code style and naming rules for fitting
    to FreeBSD kernel src style

The documents of those unionfs patches:

  http://people.freebsd.org/~daichi/unionfs/  (English)
  http://people.freebsd.org/~daichi/unionfs/index-ja.html  (Japanese)

We think that patchset-15 is ready for the merge to FreeBSD base
system. It has extensive trials, good stability and source code
quality enough to get the merge.

I'll commit it to FreeBSD base system after my mentor's check.

Thanks

HEADS UP: to Mr. Rodrigues (rodrigc)
  Please check the unionfs patchset-15. After your mention, I'll
  do the commit work :)

-- 
  Daichi GOTO, http://people.freebsd.org/~daichi

From owner-freebsd-fs@FreeBSD.ORG Tue Jun 27 21:17:49 2006
Message-ID: <262949390606271213q6c253aadm11277d3cf6e9ba83@mail.gmail.com>
Date: Tue, 27 Jun 2006 20:13:05 +0100
From: "Nuno Antunes"
To: "Daichi GOTO"
In-Reply-To: <44A144F1.1090909@freebsd.org>
References: <44A144F1.1090909@freebsd.org>
Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org,
	freebsd-current@freebsd.org, ozawa@ongs.co.jp
Subject: Re: [ANN] unionfs patchset-15 release, it is ready for the merge

On 6/27/06, Daichi GOTO wrote:
> Hi Guys!
>
> It is my pleasure and honor to announce the availability of
> the unionfs patchset-15. The p15 is an important milestone.
> It is ready for the merge.
>
> Patchset-15:
>   For 7-current
>     http://people.freebsd.org/~daichi/unionfs/unionfs-p15.diff
>
>   For 6.x
>     http://people.freebsd.org/~daichi/unionfs/unionfs6-p15.diff
>
> Changes in unionfs-p15.diff
>   - deleted -r option in limbo and updated mount_unionfs.8
>   - changed source code style and naming rules for fitting
>     to FreeBSD kernel src style
>
> The documents of those unionfs patches:
>
>   http://people.freebsd.org/~daichi/unionfs/  (English)
>   http://people.freebsd.org/~daichi/unionfs/index-ja.html  (Japanese)
>
> We think that patchset-15 is ready for the merge to FreeBSD base
> system. It has extensive trials, good stability and source code
> quality enough to get the merge.
>
> I'll commit it to FreeBSD base system after my mentor's check.
>
> Thanks
>
>
> HEADS UP: to Mr. Rodrigues (rodrigc)
>   Please check the unionfs patchset-15. After your mention, I'll
>   do the commit work :)
>
> --
>   Daichi GOTO, http://people.freebsd.org/~daichi
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>

Many thanks for the excellent work! Your patchsets made it possible to
start and run jails with unionfs mounted userland :-)

Can't wait to see it committed.

Thanks,
Nuno

From owner-freebsd-fs@FreeBSD.ORG Tue Jun 27 22:31:13 2006
Message-ID: <44A1B1AF.3050506@elischer.org>
Date: Tue, 27 Jun 2006 15:31:11 -0700
From: Julian Elischer
To: Leo Huang
Cc: freebsd-fs@freebsd.org
Subject: Re: Is the fsync() fake on FreeBSD6.1?

Leo Huang wrote:

> Hi,
>
> I benchmarked MySQL 4.1.18 on FreeBSD 6.1 and Debian 3.1 using Super
> Smack 1.3 some days ago.
>
> The benchmark table  is
> CREATE TABLE `Account` (
>  `aid` int(11) NOT NULL auto_increment,
>  `name` char(20) NOT NULL default '',
>  `flag` int(11) NOT NULL default '0',
>  `uidcount` int(11) NOT NULL default '0',
>  `balance` int(11) NOT NULL default '0',
>  `point` int(11) NOT NULL default '0',
>  `blocktm` int(11) NOT NULL default '0',
>  `ipnum` int(10) unsigned default NULL,
>  `newdate` datetime default NULL,
>  PRIMARY KEY  (`aid`),
>  UNIQUE KEY `name` (`name`)
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
>
> And it has 1,000,000 rows.
>
> The SQL statement is
> update Account set balance= balance + 1 where aid=?;
>
> The result is followed:
> OS           Clients   Result(queries per second)   TPS(got from iostat)
> FreeBSD6.1   50        516.1                        about 2000
> Debian3.1    50        49.8                         about 200
>
> The result is surprise me. The MySQL Performance on FreeBSD6.1 is
> about 10 times of on Debian3.1, and the output of iostat also shows it.
>
> I know that MySQL uses fsync() to flush both the data and log files at
> default when using innodb
> engine(http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html).
> Our evaluating computer only has a 10000RPM SCSI hard disk. I think it
> can do about 200 sequential fsync() calls per second if the fsync() is
> real.
>
> Is the fsync() on FreeBSD6.1 fake? I mean than the data is only
> written to the drives memory and so can be lost if power goes down.
> And how I can confirm this?
>
> If the fsync() is fake, how can I get the real fsync?

fsync is certainly not fake.

Is there any chance that the log file or any other part of your
infrastructure is not on a real disk? E.g. a memory filesystem for /tmp?

The disk layout may allow the disks to lie to the OS in some situations
(the disks may be set to cache writes; there are utilities to check this
sort of thing, e.g. camcontrol).

You say that iostat gives you "the same info"; what does that actually
mean/show?

>
> Any comment is welcome!
>
> PS:
> 1. Our evaluating computer is DELL PowerEdge 1650. Its hardware
> configuration is followed:
>     CPU: 2 * Intel Pentium III 1.33GHz 512KB Level 2 Cache(smp)
>     Memory: 1024MB ECC SDRAM
>     HD: SEAGATE ST336706LC (36GB Ultra160 SCSI 10000RPM)
>     NIC    : Intel(R) PRO/1000 Network Connection
>
> 2. Some important parameters in MySQL configuration file are here:
>     log-bin
>     sync_binlog=1
>     innodb_safe_binlog
>     innodb_buffer_pool_size = 384M
>     innodb_additional_mem_pool_size = 20M
>     innodb_log_file_size = 100M
>     innodb_log_buffer_size = 8M
>     innodb_flush_log_at_trx_commit = 1
>     innodb_lock_wait_timeout = 50
>
> 3. on FreeBSD6.1 the /etc/fstab is followed:
>     # Device                Mountpoint      FStype  Options         Dump    Pass#
>     /dev/da0s1g             /home           ufs     rw              2       2
>   and the outputs of tunefs is:
>   mysql-test-4# tunefs -p /home
>   tunefs: ACLs: (-a)                                         disabled
>   tunefs: MAC multilabel: (-l)                               disabled
>   tunefs: soft updates: (-n)                                 enabled
>   tunefs: maximum blocks per file in a cylinder group: (-e)  2048
>   tunefs: average file size: (-f)                            16384
>   tunefs: average number of files in a directory: (-s)       64
>   tunefs: minimum percentage of free space: (-m)             8%
>   tunefs: optimization preference: (-o)                      time
>   tunefs: volume label: (-L)
>
> 4. on Debian3.1 the /etc/fstab is followed:
>   # <file system> <mount point>   <type>  <options>       <dump>  <pass>
>   /dev/sda9       /home           ext3    defaults        0       2
>

From owner-freebsd-fs@FreeBSD.ORG Tue Jun 27 23:04:06 2006
Message-ID: <44A1B958.4030204@fer.hr>
Date: Wed, 28 Jun 2006 01:03:52 +0200
From: Ivan Voras
To: Leo Huang
Cc: freebsd-fs@freebsd.org
Subject: Re: Is the fsync() fake on FreeBSD6.1?

Leo Huang wrote:

> The result is followed:
> OS           Clients   Result(queries per second)   TPS(got from iostat)
> FreeBSD6.1   50        516.1                        about 2000
> Debian3.1    50        49.8                         about 200
>
> I know that MySQL uses fsync() to flush both the data and log files at

I tried to see the effects from fsync() with this little program:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUF_SIZE 512
#define COUNT 50000

int main() {
    int fd;
    char buf[BUF_SIZE];
    int i;

    fd = open("test.file", O_CREAT|O_TRUNC|O_WRONLY, 0600);
    if (fd < 0) {
        printf("cannot open\n");
        exit(1);
    }

    for (i = 0; i < COUNT; i++) {
        if (write(fd, buf, BUF_SIZE) != BUF_SIZE) {
            printf("error writing\n");
            exit(1);
        }
        if (fsync(fd) < 0) {
            printf("error in fsync\n");
            exit(1);
        }
    }
    close(fd);
    unlink("test.file");
    return 0;
}

But I see strange results with iostat. It shows 16KB transactions, ~2900
tps and 46 MB/s. On the other hand, the program runs for ~36 seconds,
which gives ~1390 tps (this is a single desktop drive). Since 36 seconds
of 46MB/s would result in a file 1.6 GB in size, while it's clearly
50000*512=25MB, iostat is lying. I think it's too valuable a tool to be
lying.

For what it's worth, gstat is also lying in the same way.
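[Archive note] One way to cross-check the figures in the message above, offered as a sketch under stated assumptions rather than a diagnosis. iostat counts device transactions, not fsync() calls, so the two rates need not agree. If each fsync() caused roughly two 16 KB device writes (an assumption, for example one data block and one metadata block), then about 1390 calls per second would show up as roughly 2900 tps, and 2900 tps at 16 KB each is about 45 MB/s at the device even though the file only grows by 512 bytes per call. All function names below are hypothetical; the timing helper is a variant of the loop above that reports elapsed seconds directly.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Write `count` buffers of `buf_size` bytes (1..512) to `path`, calling
 * fsync() after each write.  Returns the elapsed wall-clock seconds, or
 * -1.0 on error.  The file is removed afterwards.
 */
static double
time_fsync_writes(const char *path, int count, size_t buf_size)
{
    char buf[512] = { 0 };
    struct timespec t0, t1;
    int fd, i, failed = 0;

    if (buf_size == 0 || buf_size > sizeof(buf))
        return (-1.0);
    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return (-1.0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < count && !failed; i++) {
        if (write(fd, buf, buf_size) != (ssize_t)buf_size ||
            fsync(fd) < 0)
            failed = 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    unlink(path);
    if (failed)
        return (-1.0);
    return ((double)(t1.tv_sec - t0.tv_sec) +
        (double)(t1.tv_nsec - t0.tv_nsec) / 1e9);
}

/* Device throughput implied by iostat's tps and KB/t columns, in MB/s. */
static double
device_mb_per_s(double tps, double kb_per_t)
{
    return (tps * kb_per_t / 1024.0);
}

/* Device transactions observed per fsync() call made by the program. */
static double
transactions_per_call(double tps, int count, double seconds)
{
    return (tps / ((double)count / seconds));
}
```

Plugging in the numbers from the message, device_mb_per_s(2900, 16) comes to about 45 MB/s and transactions_per_call(2900, 50000, 36) to a little over 2, so the iostat output and the 25 MB file size are not necessarily in conflict. Calling time_fsync_writes("test.file", 50000, 512) on the machine in question would give the elapsed time to divide by.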
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 28 05:22:10 2006
Message-ID: <44A211F6.9090004@gmail.com>
Date: Wed, 28 Jun 2006 00:21:58 -0500
From: Dennis Olvany
To: freebsd-fs@freebsd.org
Subject: smbfs.ko: no dns support

It seems that the smbfs module only supports NetBIOS name resolution, and
not DNS names or IP addresses. Such support would prove invaluable.
From owner-freebsd-fs@FreeBSD.ORG Wed Jun 28 14:44:02 2006 Return-Path: X-Original-To: freebsd-fs@FreeBSD.org Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A7C9A16AA32 for ; Wed, 28 Jun 2006 14:44:02 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.115]) by mx1.FreeBSD.org (Postfix) with ESMTP id 688A0441BB for ; Wed, 28 Jun 2006 14:21:26 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.2.162]) by mailout2.pacific.net.au (Postfix) with ESMTP id 514B77206E; Thu, 29 Jun 2006 00:21:23 +1000 (EST) Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) by mailproxy1.pacific.net.au (8.13.4/8.13.4/Debian-3sarge1) with ESMTP id k5SELJxX026464; Thu, 29 Jun 2006 00:21:21 +1000 Date: Thu, 29 Jun 2006 00:21:19 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Ivan Voras In-Reply-To: <44A1B958.4030204@fer.hr> Message-ID: <20060628230439.M75051@delplex.bde.org> References: <44A1B958.4030204@fer.hr> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@FreeBSD.org Subject: Re: Is the fsync() fake on FreeBSD6.1? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Jun 2006 14:44:02 -0000

On Wed, 28 Jun 2006, Ivan Voras wrote:

> Leo Huang wrote:
>
>> The result is followed:
>> OS          Clients  Result(queries per second)  TPS(got from iostat)
>> FreeBSD6.1  50       516.1                       about 2000

Seems normal for drives that do write caching.

>> Debian3.1   50       49.8                        about 200

Seems too slow for disks that do write caching. Maybe Debian does something
to force the drive to complete its i/o, or just does a full sync() like
someone mentioned Linux doing.

>> I know that MySQL uses fsync() to flush both the data and log files at
>
> I tried to see the effects from fsync() with this little program:
>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> #define BUF_SIZE 512
> #define COUNT 50000
>
> int main() {
>     int fd;
>     char buf[BUF_SIZE];
>     int i;
>
>     fd = open("test.file", O_CREAT|O_TRUNC|O_WRONLY, 0600);
>     if (fd < 0) {
>         printf("cannot open\n");
>         exit(1);
>     }
>
>     for (i = 0; i < COUNT; i++) {
>         if (write(fd, buf, BUF_SIZE) != BUF_SIZE) {
>             printf("error writing\n");
>             exit(1);
>         }

The results are much clearer with BUF_SIZE == 1 and COUNT <= fs_blocksize.
Then the file system keeps writing the same block and inode, and drives
with write caching are limited mainly by their software overhead. With
a program similar to the above, I get the following times on a 7200 RPM
ATA drive:

COUNT = fs_blocksize = 8192
    to regular file in /tmp (mount options: none, no soft updates)
        7.76 seconds (iostat 500-3500 tps 4.5-7.7 KB/t)
    to /dev/null on a devfs-free system:
        9.67 seconds (iostat 450-2200 tps 8.0 KB/t)
    to /dev/ttyv on a devfs-free system:
        16.30 seconds (iostat 500-550 tps 8.0 KB/t) (yes, /dev/ttyv0 is slowest!)

Er, the results were clear. In a previous run, with different mount options,
(-async and maybe -noatime), and COUNT = 1000, I got 4000+ tps 4.5 KB/t
consistently for the regular file. 4.5 is the average of 8 and 1 (which I
thought was 1 8K data block and 1 1K inode block, but now think was 1 1K
data block and 1 8K inode block). Changing COUNT back to 1000 now gives
a consistent 4.5KB/t but only about 500 tps. The variation on the block
size is caused by 8192 being larger than 1000 -- the file usually consists
of 1-7 fragments, except at the limits, where it is empty or 1 block.

fsync()ing /dev/null and /dev/ttyv1 is apparently slow because I (or
someone at my request) prematurely removed the hack for not syncing
file times for device files.
IN_LAZYMOD was supposed to make the hack unnecessary, but I never got
around to making IN_LAZYMOD apply more generally. In -current, it only
applies to device files that are not in devfs and on ffs without soft
updates, but there are no such files so it never applies. In my kernel,
it applies to all files but still only for atimes and not for soft updates.

It is strange that fsync()ing /dev/null is slower than fsync()ing
a regular file, and especially strange that fsync()ing /dev/ttyv1
is much slower than fsync()ing /dev/null. Both should be about twice
as fast since only 1 block needs to be written (an inode block).

>         if (fsync(fd) < 0) {
>             printf("error in fsync\n");
>             exit(1);
>         }
>     }
>
>     close(fd);
>     unlink("test.file");
>
>     return 0;
> }
>
> But I see strange results with iostat. It shows 16KB transactions, ~2900 tps
> and 46 MB/s. On the other hand, the program runs for ~36 seconds, which gives
> ~1390 tps (this is a single desktop drive). Since 36 seconds of 46MB/s would
> result in a file 1.6 GB in size, while it's clearly 50000*512=25MB, iostat is
> lying.

This is because you fsync() every 512 bytes. The file system then writes
a 16K inode block and a 16K data block, giving 64 times as much i/o as
necessary.

> I think it's too valuable a tool to be lying. For what it's worth, gstat is
> also lying in the same way.

iostat and gstat just report whatever is recorded by devstat(9). The
recording is done at a fairly low level but not low enough to be
correct. Recorders lie mainly for block sizes larger than 64K. E.g.,
geom claims that all (?) disk devices can handle block sizes up to
MAXPHYS (128K) and then splits up i/o's into whatever sizes the disk
device drivers handle. Most disk device drivers claim to handle
DFLTPHYS (64K) whether or not the disk drive can handle that, and may
further split up the i/o as necessary. This makes it hard to see the
sizes that actually reach the hardware.
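The 64x figure can be checked against the numbers reported earlier in the thread. The sketch below only restates that arithmetic in code; the function names are hypothetical, and the model (exactly one full inode block plus one full data block written per fsync(), each counted as one transaction) is an assumption taken from the explanation above, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes sent to the disk if every fsync() rewrites one full data block
 * and one full inode block of fs_block_bytes each (assumed model).
 */
uint64_t bytes_on_disk(uint64_t fsync_calls, uint64_t fs_block_bytes)
{
    return fsync_calls * 2 * fs_block_bytes;
}

/* Ratio of disk i/o to payload for one write of write_bytes. */
uint64_t amplification(uint64_t fs_block_bytes, uint64_t write_bytes)
{
    assert(write_bytes > 0);
    return 2 * fs_block_bytes / write_bytes;
}

/* Disk transactions per second, counting two transactions per fsync(). */
double implied_tps(uint64_t fsync_calls, double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return 2.0 * (double)fsync_calls / elapsed_s;
}
```

With the thread's values (50000 calls, 16K blocks, 512-byte writes, 36 seconds), bytes_on_disk gives 1,638,400,000 bytes, amplification gives 64, and implied_tps gives about 2778. Spread over 36 seconds that is roughly 45.5 MB/s, which is consistent with the ~46 MB/s and ~2900 tps that iostat showed.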
Bruce From owner-freebsd-fs@FreeBSD.ORG Wed Jun 28 14:50:28 2006 Return-Path: X-Original-To: freebsd-fs@FreeBSD.org Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8274B16A417 for ; Wed, 28 Jun 2006 14:50:28 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.115]) by mx1.FreeBSD.org (Postfix) with ESMTP id CE38E43D70 for ; Wed, 28 Jun 2006 14:50:27 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.2.163]) by mailout2.pacific.net.au (Postfix) with ESMTP id A40D46E2DE; Thu, 29 Jun 2006 00:50:26 +1000 (EST) Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) by mailproxy2.pacific.net.au (8.13.4/8.13.4/Debian-3sarge1) with ESMTP id k5SEoNUZ009114; Thu, 29 Jun 2006 00:50:24 +1000 Date: Thu, 29 Jun 2006 00:50:23 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Ivan Voras In-Reply-To: <20060628230439.M75051@delplex.bde.org> Message-ID: <20060629002637.L75228@delplex.bde.org> References: <44A1B958.4030204@fer.hr> <20060628230439.M75051@delplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@FreeBSD.org Subject: Re: Is the fsync() fake on FreeBSD6.1? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Jun 2006 14:50:28 -0000 On Thu, 29 Jun 2006, I wrote: > ... > fsync()ing /dev/null and /dev/ttyv1 is apparently slow because I (or > someone at my request) prematurely removed the hack for not syncing > file times for device files. IN_LAZYMOD was supposed to make the > hack unnecessary, but I never got around to making IN_LAZYMOD apply > more generally. 
In -current, it only applies to device files that are

However, it always applied to all timestamps in device files in the non
soft updates case, so removing the hack was correct and there is a bug
that makes fsync() not honor IN_LAZYMOD. I tested the effectiveness of
IN_LAZYMOD mainly for sync().

The bug is probably in my code. From my version of ffs_fsync():

%1      if (ap->a_waitfor == MNT_WAIT &&
%2          (vp->v_mount->mnt_flag & MNT_RDONLY) == 0)
%3              VTOI(vp)->i_flag |= IN_MODIFIED;
%4      VI_UNLOCK(vp);
%5      splx(s);
%6      return (UFS_UPDATE(vp, wait));

I added lines 1-3. This forces an inode write.

Line 6 is dubious too. If there was a delayed write pending for the
inode (which is normal after a write(2) to the file), then previous
code in ffs_fsync() should have written the inode. Line 6 then risks
dirtying the inode and writing it again. I think this happens in the
following case: any activity on the file that causes a delayed write
for the inode (e.g., a write(2)), followed by activity that just marks
a timestamp for update. Then previous code in ffs_fsync() should write
the inode, but ffs_update() will find that a timestamp is not up to
date, set the timestamp, and write the inode again. The second write
is often not very slow since it is a delayed write, but here we have
wait != 0, so a sync write is forced modulo bugs in the async mode
case. I think the UFS_UPDATE() in the above should be at the beginning
of ffs_fsync() instead of at the end, or in both places (an async one
at the beginning to update the timestamps in the in-core inode so that
the second sync one at the end is usually null).

> not in devfs and on ffs without soft updates, but there are no such
> files so it never applies. In my kernel, it applies to all files but
> still only for atimes and not for soft updates.

In my kernel, it applies for all timestamps on device files and to
atimes on all files, except in the soft updates case.
Bruce From owner-freebsd-fs@FreeBSD.ORG Wed Jun 28 21:06:23 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2D46216A6F4 for ; Wed, 28 Jun 2006 21:06:22 +0000 (UTC) (envelope-from ivoras@fer.hr) Received: from lara.cc.fer.hr (lara.cc.fer.hr [161.53.72.113]) by mx1.FreeBSD.org (Postfix) with ESMTP id CAA6F4431D for ; Wed, 28 Jun 2006 20:05:28 +0000 (GMT) (envelope-from ivoras@fer.hr) Received: from [127.0.0.1] (localhost.cc.fer.hr [127.0.0.1]) by lara.cc.fer.hr (8.13.6/8.13.4) with ESMTP id k5SK5HKw095507 for ; Wed, 28 Jun 2006 22:05:22 +0200 (CEST) (envelope-from ivoras@fer.hr) Message-ID: <44A2E0FD.6060302@fer.hr> Date: Wed, 28 Jun 2006 22:05:17 +0200 From: Ivan Voras User-Agent: Mozilla Thunderbird 1.0.6 (X11/20050921) X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <44A1B958.4030204@fer.hr> <20060628230439.M75051@delplex.bde.org> In-Reply-To: <20060628230439.M75051@delplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Is the fsync() fake on FreeBSD6.1? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Jun 2006 21:06:24 -0000 Bruce Evans wrote: >> But I see strange results with iostat. It shows 16KB transactions, >> ~2900 tps and 46 MB/s. On the other hand, the program runs for ~36 >> seconds, which gives ~1390 tps (this is a single desktop drive). Since >> 36 seconds of 46MB/s would result in a file 1.6 GB in size, while it's >> clearly 50000*512=25MB, iostat is lying. > > This is because you fsync() every 512 bytes. The file system then writes > a 16K inode block and a 16K data block, giving 64 times as much i/o as > necessary. 
Ok, so you're saying that it actually does 46MB/s, rewriting 16K FS blocks over and over? In that case, wouldn't all writes to the FS, especially with soft-updates be minimally 16K+16K? It doesn't look like it when I monitor a live server - there are 8KB and 4KB writes... maybe UFS fragments complicate the (ac)counting. From owner-freebsd-fs@FreeBSD.ORG Thu Jun 29 02:20:35 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7E73216A410 for ; Thu, 29 Jun 2006 02:20:35 +0000 (UTC) (envelope-from leo.huang.gd@gmail.com) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.172]) by mx1.FreeBSD.org (Postfix) with ESMTP id BDDD644CF9 for ; Thu, 29 Jun 2006 02:20:34 +0000 (GMT) (envelope-from leo.huang.gd@gmail.com) Received: by ug-out-1314.google.com with SMTP id m3so98238uge for ; Wed, 28 Jun 2006 19:20:33 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=DdmCxdxx7OkIWpOr1FgYRcEdj1tKWC2wUcVN3H8IMasp57P3Z/CKSM98ZbOuaMuNv8IccFuXNjbnLEXpsJhzP7Wa3RnfKm7xORH5cUSh9cXE+5B830GvZS8LL7sowKFWx9jcYr2dvm+oWLMXAQekl8a5BL7ip8jLnEufZJ23ZnE= Received: by 10.67.26.7 with SMTP id d7mr1370842ugj; Wed, 28 Jun 2006 19:20:33 -0700 (PDT) Received: by 10.67.27.12 with HTTP; Wed, 28 Jun 2006 19:20:33 -0700 (PDT) Message-ID: Date: Thu, 29 Jun 2006 10:20:33 +0800 From: "Leo Huang" To: "Bruce Evans" In-Reply-To: <20060628230439.M75051@delplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <44A1B958.4030204@fer.hr> <20060628230439.M75051@delplex.bde.org> Cc: freebsd-fs@freebsd.org, Ivan Voras Subject: Re: Is the fsync() fake on FreeBSD6.1? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jun 2006 02:20:35 -0000

hi,

> >> OS          Clients  Result(queries per second)  TPS(got from iostat)
> >> FreeBSD6.1  50       516.1                       about 2000
>
> Seems normal for drives that do write caching.

I disabled the driver write caching as Bjorn Gronvall suggested, and the
result shows that the TPS comes down to about 200. So I think you and
Bjorn Gronvall are right. It is the disk write caching that makes the TPS
so high.

> >> Debian3.1   50       49.8                        about 200
>
> Seems too slow for disks that do write caching. Maybe Debian does something
> to force the drive to complete its i/o, or just does a full sync() like
> someone mentioned Linux doing.

I used sginfo to find that the disk write caching is also enabled by
default. After the disk write caching is disabled, the TPS also comes
down from 200 to 110. This really puzzles me. Can you give me more
information about it?

regards,
Leo Huang

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 29 05:31:29 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4C49316A492; Thu, 29 Jun 2006 05:31:29 +0000 (UTC) (envelope-from user@dhp.com) Received: from shell.dhp.com (shell.dhp.com [199.245.105.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3925E43DB2; Thu, 29 Jun 2006 05:31:26 +0000 (GMT) (envelope-from user@dhp.com) Received: by shell.dhp.com (Postfix, from userid 896) id 398CE31316; Thu, 29 Jun 2006 01:31:25 -0400 (EDT) Date: Thu, 29 Jun 2006 01:31:25 -0400 (EDT) From: Ensel Sharon To: Robert Watson In-Reply-To: <20060624232457.D8526@fledge.watson.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-fs@freebsd.org Subject: Re: 6.1 quota bugs cause adaptec 2820sa kernel to crash ? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jun 2006 05:31:29 -0000

Robert,

On Sat, 24 Jun 2006, Robert Watson wrote:

> > After loading, the system frequently (multi-daily) crashed with the error:
> >
> > Warning! Controller is no longer running! code=0xbcef0100
> >
> > (after a page or so of aac0 timeout messages)
> >
> > So I disabled quotas on the system, and it has been completely stable ever
> > since.
> >
> > -----
>
> The above sounds a lot like a problem with {Adaptec driver, controller,
> disks}, rather than a quota problem. So it might be that there's an I/O load
> change with quotas running that triggers the problem. Alternatively, there's
> a memory corruption bug (or the like) in quotas that corrupts data structures
> for the adaptec driver, but hopefully not. I believe Scott Long follows this
> list, but if you don't hear back in a bit, you might forward that description
> to him and see if he has thoughts. In the past, he's maintained the Adaptec
> drivers, but I'm not sure what his level of involvement with them is at this
> point.

I have gotten no responses of any kind from -hackers or from -fs, or
privately.

I am currently trying to reproduce this on a totally different machine, to
see if I can pin it down as a quota problem or an aac problem or a 2820sa
problem. My efforts are amateurish, I'm afraid, and I'd be happy to run
any kind of tests, etc., for anyone that is better at this than I am.

In the meantime, if _anyone_ has any insight into my original post in this
thread (June 23) it would be much appreciated. Especially some details as
to what the problems with quotas really are and how they are being fixed.

Thanks.
From owner-freebsd-fs@FreeBSD.ORG Thu Jun 29 10:43:42 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9075116A5B7 for ; Thu, 29 Jun 2006 10:43:42 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id 36F95443B9 for ; Thu, 29 Jun 2006 10:10:53 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.2.162]) by mailout1.pacific.net.au (Postfix) with ESMTP id 81C3E61FECC; Thu, 29 Jun 2006 20:10:51 +1000 (EST) Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) by mailproxy1.pacific.net.au (8.13.4/8.13.4/Debian-3sarge1) with ESMTP id k5TAAnoD012755; Thu, 29 Jun 2006 20:10:50 +1000 Date: Thu, 29 Jun 2006 20:10:49 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Ivan Voras In-Reply-To: <44A2E0FD.6060302@fer.hr> Message-ID: <20060629195739.L77878@delplex.bde.org> References: <44A1B958.4030204@fer.hr> <20060628230439.M75051@delplex.bde.org> <44A2E0FD.6060302@fer.hr> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: Is the fsync() fake on FreeBSD6.1? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jun 2006 10:43:42 -0000 On Wed, 28 Jun 2006, Ivan Voras wrote: > Bruce Evans wrote: > >>> But I see strange results with iostat. It shows 16KB transactions, ~2900 >>> tps and 46 MB/s. On the other hand, the program runs for ~36 seconds, >>> which gives ~1390 tps (this is a single desktop drive). 
Since 36 seconds >>> of 46MB/s would result in a file 1.6 GB in size, while it's clearly >>> 50000*512=25MB, iostat is lying. >> >> This is because you fsync() every 512 bytes. The file system then writes >> a 16K inode block and a 16K data block, giving 64 times as much i/o as >> necessary. > > Ok, so you're saying that it actually does 46MB/s, rewriting 16K FS blocks > over and over? Yes. It's actually surprising that the speed is only 46MB/s if the drive caches the write. > In that case, wouldn't all writes to the FS, especially with soft-updates be > minimally 16K+16K? It doesn't look like it when I monitor a live server - > there are 8KB and 4KB writes... maybe UFS fragments complicate the > (ac)counting. Yes, the minimum i/o size is the fragment size, and the average file size is still probably smaller than 16K. However, for sequential writes to large files, most writes should be 64K+0K (DFLTPHYS+0K) or 128K+0K (MAXPHYS+0K) depending on how broken MAXPHYS vs MINPHYS is. Clustering combines 16K-blocks into either 64K or 128K-blocks, and at least without soft updates and without sync mounting or fsync(), inode updates are normally delayed longer than data updates. 
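The effect of clustering on the number of transactions that iostat counts can be illustrated with a little arithmetic. This is a rough sketch only: io_requests is a hypothetical helper, and it deliberately ignores inode updates, fragments and any further splitting done by the drivers, all of which the discussion above says do occur.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of i/o requests needed to write file_bytes sequentially when
 * each request carries at most io_bytes (rounded up).
 */
uint64_t io_requests(uint64_t file_bytes, uint64_t io_bytes)
{
    assert(io_bytes > 0);
    return (file_bytes + io_bytes - 1) / io_bytes;
}
```

For a 1 MB sequential write, this gives 64 requests at the 16K block size, 16 requests when clustered to 64K (DFLTPHYS) and 8 requests when clustered to 128K (MAXPHYS). A per-write fsync() defeats this by forcing at least one small request for every call.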
Bruce

From owner-freebsd-fs@FreeBSD.ORG Thu Jun 29 10:47:08 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2B62D16A410 for ; Thu, 29 Jun 2006 10:47:08 +0000 (UTC) (envelope-from bde@zeta.org.au) Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id 844A843D69 for ; Thu, 29 Jun 2006 10:47:06 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au [61.8.2.162]) by mailout1.pacific.net.au (Postfix) with ESMTP id 7C3AD24D0A8; Thu, 29 Jun 2006 20:47:05 +1000 (EST) Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) by mailproxy1.pacific.net.au (8.13.4/8.13.4/Debian-3sarge1) with ESMTP id k5TAl3XM008341; Thu, 29 Jun 2006 20:47:04 +1000 Date: Thu, 29 Jun 2006 20:47:03 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Leo Huang In-Reply-To: Message-ID: <20060629201157.N77878@delplex.bde.org> References: <44A1B958.4030204@fer.hr> <20060628230439.M75051@delplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, Ivan Voras Subject: Re: Is the fsync() fake on FreeBSD6.1? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jun 2006 10:47:08 -0000

On Thu, 29 Jun 2006, Leo Huang wrote:

>> >> OS          Clients  Result(queries per second)  TPS(got from iostat)
>> >> FreeBSD6.1  50       516.1                       about 2000
>>
>> Seems normal for drives that do write caching.
>
> I disabled the driver write caching as Bjorn Gronvall suggested, and the
> result shows that the TPS comes down to about 200. So I think you and
> Bjorn Gronvall are right. It is the disk write caching that makes the TPS
> so high.
>
>> >> Debian3.1   50       49.8                        about 200
>>
>> Seems too slow for disks that do write caching. Maybe Debian does something
>> to force the drive to complete its i/o, or just does a full sync() like
>> someone mentioned Linux doing.
>
> I used sginfo to find that the disk write caching is also enabled by
> default. After the disk write caching is disabled, the TPS also comes
> down from 200 to 110. This really puzzles me. Can you give me more
> information about it?

How did you disable "driver" write caching? I think both Bjorn and I
meant the _drive_ write caching, and that is what you refer to as "disk"
write caching. Only turn off the caching in the lowest layer (disk ==
drive).

I wonder when all drives will have enough fast enough nonvolatile RAM for
write caching to just work.

Bruce

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 30 03:57:07 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D30E116A412 for ; Fri, 30 Jun 2006 03:57:07 +0000 (UTC) (envelope-from leo.huang.gd@gmail.com) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.170]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2106A43D49 for ; Fri, 30 Jun 2006 03:57:06 +0000 (GMT) (envelope-from leo.huang.gd@gmail.com) Received: by ug-out-1314.google.com with SMTP id m3so719798uge for ; Thu, 29 Jun 2006 20:57:05 -0700 (PDT) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=X2I9SW3ndHF+Ab48OjBABxmvkyPXEl10W5ltk1RCQzqf6HL+LMJL6MCgvgUjqhbzyJCYx5eEXWaIe+nuX0zByp6zlTu1SEO2g9Vt9W/3VxR371VY0lHhPprcMLrCL/MAz8jXomSmG5KiBWKQTJ9Dp81t5BlMABM32oy773/TLhE= Received: by 10.67.101.8 with SMTP id d8mr2630211ugm; Thu, 29 Jun 2006 20:57:05 -0700 (PDT) Received: by 10.67.27.12 with HTTP; Thu, 29 Jun 2006
20:57:05 -0700 (PDT) Message-ID: Date: Fri, 30 Jun 2006 11:57:05 +0800 From: "Leo Huang" To: "Bruce Evans" In-Reply-To: <20060629201157.N77878@delplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <44A1B958.4030204@fer.hr> <20060628230439.M75051@delplex.bde.org> <20060629201157.N77878@delplex.bde.org> Cc: freebsd-fs@freebsd.org, Ivan Voras Subject: Re: Is the fsync() fake on FreeBSD6.1? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jun 2006 03:57:07 -0000

hi, Bruce

> How did you disable "driver" write caching? I think both Bjorn and I
> meant the _drive_ write caching, and that is what you refer to as "disk"
> write caching. Only turn off the caching in the lowest layer (disk ==
> drive).

Yes, by "disk" I mean the drive.

On FreeBSD, I use camcontrol to disable the drive write caching.

Enable:
mysql-test-4# camcontrol modepage da0 -m8
IC: 0
ABPF: 0
CAP: 0
DISC: 1
SIZE: 0
WCE: 1
MF: 0
RCD: 0
Demand Retention Priority: 0
Write Retention Priority: 0
Disable Pre-fetch Transfer Length: 65535
Minimum Pre-fetch: 0
Maximum Pre-fetch: 0
Maximum Pre-fetch Ceiling: 65535

Disable:
mysql-test-4# camcontrol modepage da0 -m8
IC: 0
ABPF: 0
CAP: 0
DISC: 1
SIZE: 0
WCE: 0
MF: 0
RCD: 0
Demand Retention Priority: 0
Write Retention Priority: 0
Disable Pre-fetch Transfer Length: 65535
Minimum Pre-fetch: 0
Maximum Pre-fetch: 0
Maximum Pre-fetch Ceiling: 65535

On Debian, I use sginfo to disable it.

Enable:
mysql-test-1:/home/huangjy# sginfo -c /dev/sda
Caching mode page (0x8)
-----------------------
Initiator Control                  0
ABPF                               0
CAP                                0
DISC                               1
SIZE                               0
Write Cache Enabled                1
MF                                 0
Read Cache Disabled                0
Demand Read Retention Priority     0
Demand Write Retention Priority    0
Disable Pre-fetch Transfer Length  65535
Minimum Pre-fetch                  0
Maximum Pre-fetch                  0
Maximum Pre-fetch Ceiling          65535
FSW                                1
LBCSS                              0
DRA                                0
Number of Cache Segments           8
Cache Segment size                 0
Non-Cache Segment size             0

Disable:
mysql-test-1:/home/huangjy# sginfo -c /dev/sda
Caching mode page (0x8)
-----------------------
Initiator Control                  0
ABPF                               0
CAP                                0
DISC                               1
SIZE                               0
Write Cache Enabled                0
MF                                 0
Read Cache Disabled                0
Demand Read Retention Priority     0
Demand Write Retention Priority    0
Disable Pre-fetch Transfer Length  65535
Minimum Pre-fetch                  0
Maximum Pre-fetch                  0
Maximum Pre-fetch Ceiling          65535
FSW                                1
LBCSS                              0
DRA                                0
Number of Cache Segments           8
Cache Segment size                 0
Non-Cache Segment size             0

> I wonder when all drives will have enough fast enough nonvolatile RAM for
> write caching to just work.

Our test computer has only one SCSI disk, so I think that the data in the
drive's write cache will be lost when the power goes off.

Regards,
Leo Huang

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 30 09:28:58 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D98C616A412; Fri, 30 Jun 2006 09:28:58 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from fw.zoral.com.ua (ll-227.216.82.212.sovam.net.ua [212.82.216.227]) by mx1.FreeBSD.org (Postfix) with ESMTP id 11D7F43D6B; Fri, 30 Jun 2006 09:28:42 +0000 (GMT) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by fw.zoral.com.ua (8.13.4/8.13.4) with ESMTP id k5U9SbZZ025715 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 30 Jun 2006 12:28:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.13.6/8.13.6) with ESMTP id k5U9SbU4058634; Fri, 30 Jun 2006 12:28:37 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.13.6/8.13.6/Submit) id k5U9SToY058633; Fri, 30 Jun 2006 12:28:29 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 30 Jun 2006 12:28:29 +0300 From: Kostik Belousov To: Mike Jakubik Message-ID: <20060630092829.GE1258@deviant.kiev.zoral.com.ua>
References: <20060523181638.GC767@dimma.mow.oilspace.com> <6eb82e0605231120q37224c6r3b25982f556bed72@mail.gmail.com> <447366AD.30203@rogers.com> <44736E11.6060104@mkproductions.org> <20060523203521.GA48061@xor.obsecurity.org> <20060524062118.GA766@dimma.mow.oilspace.com> <447400BB.9060603@samsco.org> <4485C010.9040402@rogers.com> <20060606182234.GB72368@deviant.kiev.zoral.com.ua> <44A490E6.1000502@rogers.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6Vw0j8UKbyX0bfpA" Content-Disposition: inline In-Reply-To: <44A490E6.1000502@rogers.com> User-Agent: Mutt/1.4.2.1i X-Virus-Scanned: ClamAV version 0.88.2, clamav-milter version 0.88.2 on fw.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=0.4 required=5.0 tests=ALL_TRUSTED, DNS_FROM_RFC_ABUSE,SPF_NEUTRAL autolearn=no version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on fw.zoral.com.ua Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org, Dmitriy Kirhlarov Subject: md deadlocks on wdrain. Was: [Re: quota and snapshots in 6.1-RELEASE] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jun 2006 09:28:59 -0000 --6Vw0j8UKbyX0bfpA Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Jun 29, 2006 at 10:48:06PM -0400, Mike Jakubik wrote: > Konstantin Belousov wrote: > >On Tue, Jun 06, 2006 at 01:49:04PM -0400, Mike Jakubik wrote: > > =20 > >>Scott Long wrote: > >> =20 > >>>Dmitriy Kirhlarov wrote: > >>> > >>> =20 > >>>>Hi! > >>>> > >>>>On Tue, May 23, 2006 at 04:35:21PM -0400, Kris Kennaway wrote: > >>>> > >>>> > >>>> =20 > >>>>>>>>6.1-STABLE after 6.1-RELEASE is releases. So I think you may want > >>>>>>>> =20 > >>>>>If you use snapshots with your quotas, update to 6.1-STABLE. 
If you > >>>>> =20 > >>>>Sorry, guys. You are mean RELENG_6_1 or RELENG_6? > >>>> > >>>>WBR > >>>> =20 > >>>RELENG_6. However, the changes will likely make their way into=20 > >>>RELENG_6_1 in a few weeks as part of an errata update. > >>> > >>>Scott > >>> =20 > >>I have just done tests on 6.1-R and RELENG_6 as of yesterday evening.= =20 > >>Unfortunately both still lock up hard, no crash, just a frozen system. = I=20 > >>cant enter the KDB (ddb) via the console, but its unusable, as it wont= =20 > >>let me type in anything. There must be some other change in -CURRENT=20 > >>that fixes this, as -CURRENT did not freeze during my previous tests. > >> > >> > >>Just to confirm, here is the ID of ufs_quota.c on my RELENG_6 system: > >> > >>/usr/src/sys/ufs/ufs/ufs_quota.c: > >> $FreeBSD: src/sys/ufs/ufs/ufs_quota.c,v 1.74.2.4 2006/05/14=20 > >>00:23:27 tegge Exp $ > >> =20 > >The hangs are mostly related to snapshots. It would be better to > >update to the latest RELENG_6. > > > >Hangs on RELENG_6_1 is not so much interesting. For > >hanged RELENG_6 system, please do what described below and post > >the log of the ddb session. > > > >I'm not sure whether kbdmux was MFCed into RELENG_6 (AFAIR, yes). > >If you have it in your kernel, add the line > >hint.kbdmux.0.disabled=3D"1" > >into the /boot/device.hints to make ddb usable. > > > >After that, on the hang, enter ddb, and > >do ps and tr for all suspected processes. > >Better yet, add the following options to your kernel: > > > >options INVARIANTS > >options INVARIANT_SUPPORT > >options WITNESS > >options DEBUG_LOCKS > >options DEBUG_VFS_LOCKS > >options DIAGNOSTIC > > > >and, after hang, do in ddb > > > >show allpcpu > >show alllocks > >show lockedvnods > >ps > > > >For each process mentioned in show output, do where > >(for threaded processes, do thread ; where). > > > >BTW, it would be great to add this instructions to the FAQ. 
> > =20 >=20 > Well, i finally got around to setting up a serial console on this box,=20 > the following is the output from the debugger after the system stopped=20 > responding. Let me know if you need any more/different information, i=20 > also made the kernel changes you recommended. >=20 > FreeBSD 6.1-STABLE #1: Thu Jun 10 00:22:29 EDT 2006 >=20 > --- > KDB: enter: Line break on console > [thread pid 12 tid 100004 ] > Stopped at kdb_enter+0x30: leave =20 > db> ps > pid proc uid ppid pgrp flag stat wmesg wchan cmd > 552 c3622830 2 550 549 0004000 [SLPQ flswai 0xc0707c24][SLP] rm > 550 c3570830 2 549 549 0004000 [SLPQ wait 0xc3570830][SLP] sh > 549 c342ec48 2 548 549 0004000 [SLPQ wait 0xc342ec48][SLP] sh > 548 c3622624 0 422 422 0000000 [SLPQ piperd 0xc36027f8][SLP] cron > 547 c361f830 0 524 547 0004002 [SLPQ ufs 0xc3777c94][SLP] ls > 546 c36bc418 0 544 544 0004002 [SLPQ wdrain 0xc0707be4][SLP]=20 > fsck_4.2bsd > 544 c36bcc48 0 511 544 0004002 [SLPQ wait 0xc36bcc48][SLP] fsck > 524 c35e020c 0 522 524 0004002 [SLPQ wait 0xc35e020c][SLP] bash > 522 c3570c48 0 406 522 0004100 [SLPQ flswai 0xc0707c24][SLP] sshd > 515 c36bc20c 0 0 0 0000204 [SLPQ wdrain 0xc0707be4][SLP] md0 > 511 c36bb624 0 500 511 0004002 [SLPQ wait 0xc36bb624][SLP] bash > 509 c3570418 65 1 509 0000100 [SLPQ select 0xc0707644][SLP]=20 > dhclient > 500 c361fa3c 0 406 500 0004100 [SLPQ flswai 0xc0707c24][SLP] sshd > 480 c342ea3c 0 1 256 0000000 [SLPQ select 0xc0707644][SLP]=20 > dhclient > 465 c361f624 0 1 465 0004002 [SLPQ ttyin 0xc342b010][SLP] getty > 464 c35e0c48 0 1 464 0004002 [SLPQ ttyin 0xc3429410][SLP] getty > 463 c356fa3c 0 1 463 0004002 [SLPQ ttyin 0xc3429810][SLP] getty > 462 c356f418 0 1 462 0004002 [SLPQ ttyin 0xc343f010][SLP] getty > 422 c342e624 0 1 422 0000000 [SLPQ nanslp 0xc06ba32c][SLP] cron > 416 c356f000 25 1 416 0000100 [SLPQ pause 0xc356f034][SLP]=20 > sendmail > 412 c356f624 0 1 412 0000100 [SLPQ select 0xc0707644][SLP]=20 > sendmail > 406 c35e0000 0 1 406 0000100 [SLPQ select 
0xc0707644][SLP] sshd > 290 c361f20c 0 1 290 0000000 [SLPQ flswai 0xc0707c24][SLP]=20 > syslogd > 256 c3622418 0 1 256 0000000 [SLPQ select 0xc0707644][SLP] devd > 145 c356f830 0 1 145 0000000 [SLPQ pause 0xc356f864][SLP]=20 > adjkerntz > 38 c3378c48 0 0 0 0000204 [SLPQ - 0xd56f5cf8][SLP] schedcpu > 37 c342d000 0 0 0 0000204 [SLPQ sdflush 0xc070a3b4][SLP]=20 > softdepflush > 36 c342d20c 0 0 0 0000204 [SLPQ vlruwt 0xc342d20c][SLP] vnlru > 35 c342d418 0 0 0 0000204 [SLPQ ufs 0xc363c46c][SLP] syncer > 34 c342d624 0 0 0 0000204 [SLPQ wdrain 0xc0707be4][SLP]=20 > bufdaemon > 33 c342d830 0 0 0 000020c [SLPQ pgzero 0xc070b324][SLP]=20 > pagezero > 32 c342da3c 0 0 0 0000204 [SLPQ psleep 0xc070ae74][SLP]=20 > vmdaemon > 31 c342dc48 0 0 0 0000204 [SLPQ psleep 0xc070ae30][SLP]=20 > pagedaemon > 30 c342e000 0 0 0 0000204 [IWAIT] irq7: ppc0 > 29 c342e20c 0 0 0 0000204 [IWAIT] swi0: sio > 28 c342e418 0 0 0 0000204 [IWAIT] irq1: atkbd0 > 27 c3319624 0 0 0 0000204 [SLPQ - 0xc32c943c][SLP] fdc0 > 26 c3319830 0 0 0 0000204 [IWAIT] irq16: fxp0 > 25 c3319a3c 0 0 0 0000204 [SLPQ aifthd 0xc3319a3c][SLP]=20 > aac0aif > 24 c3319c48 0 0 0 0000204 [SLPQ idle 0xc32c8400][SLP]=20 > aic_recovery0 > 23 c3378000 0 0 0 0000204 [IWAIT] irq30: ahc0 > 22 c337820c 0 0 0 0000204 [SLPQ idle 0xc32c8400][SLP]=20 > aic_recovery0 > 21 c3378418 0 0 0 0000204 [IWAIT] irq9: acpi0 > 9 c3378624 0 0 0 0000204 [SLPQ - 0xc3321200][SLP] thread=20 > taskq > 20 c3378830 0 0 0 0000204 [IWAIT] swi6: + > 19 c3378a3c 0 0 0 0000204 [IWAIT] swi6: task queue > 8 c32da20c 0 0 0 0000204 [SLPQ - 0xc3321480][SLP] acpi_task2 > 7 c32da418 0 0 0 0000204 [SLPQ - 0xc3321480][SLP] acpi_task1 > 6 c32da624 0 0 0 0000204 [SLPQ - 0xc3321480][SLP] acpi_task0 > 5 c32da830 0 0 0 0000204 [SLPQ - 0xc3321500][SLP] kqueue=20 > taskq > 18 c32daa3c 0 0 0 0000204 [IWAIT] swi2: cambio > 17 c32dac48 0 0 0 0000204 [IWAIT] swi5: + > 16 c3319000 0 0 0 0000204 [SLPQ - 0xc06b6b60][SLP] yarrow > 4 c331920c 0 0 0 0000204 [SLPQ - 0xc06b7828][SLP] g_down > 3 
c3319418 0 0 0 0000204 [SLPQ - 0xc06b7824][SLP] g_up > 2 c32d5000 0 0 0 0000204 [SLPQ - 0xc06b781c][SLP] g_event > 15 c32d520c 0 0 0 0000204 [IWAIT] swi1: net > 14 c32d5418 0 0 0 0000204 [IWAIT] swi3: vm > 13 c32d5624 0 0 0 000020c [IWAIT] swi4: clock sio > 12 c32d5830 0 0 0 000020c [CPU 0] idle: cpu0 > 11 c32d5a3c 0 0 0 000020c [CPU 1] idle: cpu1 > 1 c32d5c48 0 0 1 0004200 [SLPQ wait 0xc32d5c48][SLP] init > 10 c32da000 0 0 0 0000204 [SLPQ ktrace 0xc06b8258][SLP] ktra= ce > 0 c06b7920 0 0 0 0000200 [IWAIT] swapper > db> tr 524 > Tracing pid 524 tid 100057 td 0xc35e1d80 > sched_switch(c35e1d80,0,1,10a,73683eb3) at sched_switch+0x190 > mi_switch(1,0,c066587f,1ba,c35e020c) at mi_switch+0x2e6 > sleepq_switch(c35e020c,c06b9a00,1,c0661d3b,0) at sleepq_switch+0x112 > sleepq_wait_sig(c35e020c,0,c0663282,c8,100) at sleepq_wait_sig+0x25 > msleep(c35e020c,c35e0274,15c,c0667749,0) at msleep+0x326 > kern_wait(c35e1d80,ffffffff,dab4cc7c,6,0) at kern_wait+0x8bd > wait4(c35e1d80,dab4cd04,10,41d,4) at wait4+0x3c > syscall(3b,3b,3b,1,0) at syscall+0x300 > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (7, FreeBSD ELF32, wait4), eip =3D 0x2829d273, esp =3D=20 > 0xbfbfe56c, ebp =3D 0xbfbfe588 --- > db> tr 544 > Tracing pid 544 tid 100090 td 0xc36c0000 > sched_switch(c36c0000,0,1,10a,753725b3) at sched_switch+0x190 > mi_switch(1,0,c066587f,1ba,c36bcc48) at mi_switch+0x2e6 > sleepq_switch(c36bcc48,c06b9a00,1,c0661d3b,0) at sleepq_switch+0x112 > sleepq_wait_sig(c36bcc48,0,c0663282,c8,100) at sleepq_wait_sig+0x25 > msleep(c36bcc48,c36bccb0,15c,c0667749,0) at msleep+0x326 > kern_wait(c36c0000,222,dabc3c7c,0,0) at kern_wait+0x8bd > wait4(c36c0000,dabc3d04,10,41d,4) at wait4+0x3c > syscall(3b,3b,3b,8050100,2) at syscall+0x300 > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (7, FreeBSD ELF32, wait4), eip =3D 0x280d4273, esp =3D=20 > 0xbfbfe30c, ebp =3D 0xbfbfe328 --- > db> tr 511 > Tracing pid 511 tid 100080 td 0xc35e2c00 > sched_switch(c35e2c00,0,1,10a,ba5fc6b3) at 
sched_switch+0x190 > mi_switch(1,0,c066587f,1ba,c36bb624) at mi_switch+0x2e6 > sleepq_switch(c36bb624,c06b9a00,1,c0661d3b,0) at sleepq_switch+0x112 > sleepq_wait_sig(c36bb624,0,c0663282,c8,100) at sleepq_wait_sig+0x25 > msleep(c36bb624,c36bb68c,15c,c0667749,0) at msleep+0x326 > kern_wait(c35e2c00,ffffffff,dab67c7c,6,0) at kern_wait+0x8bd > wait4(c35e2c00,dab67d04,10,41d,4) at wait4+0x3c > syscall(3b,3b,bfbf003b,1,0) at syscall+0x300 > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (7, FreeBSD ELF32, wait4), eip =3D 0x2829d273, esp =3D=20 > 0xbfbfe88c, ebp =3D 0xbfbfe8a8 --- > db> show allpcpu > Current CPU: 0 >=20 > cpuid =3D 0 > curthread =3D 0xc32d6900: pid 12 "idle: cpu0" > curpcb =3D 0xd44dad90 > fpcurthread =3D none > idlethread =3D 0xc32d6900: pid 12 "idle: cpu0" > APIC ID =3D 1 > currentldt =3D 0x50 > spin locks held: >=20 > cpuid =3D 1 > curthread =3D 0xc32d6780: pid 11 "idle: cpu1" > curpcb =3D 0xd44d7d90 > fpcurthread =3D none > idlethread =3D 0xc32d6780: pid 11 "idle: cpu1" > APIC ID =3D 0 > currentldt =3D 0x50 > spin locks held: >=20 > db> show alllocks > db> show lockedvnods > Locked vnodes >=20 > 0xc35d76cc: tag syncer, type VNON > usecount 1, writecount 0, refcount 2 mountedhere 0 > flags () > lock type syncer: EXCL (count 1) by thread 0xc32dbc00 (pid 35)#0=20 > 0xc04d300c at lockmgr+0x5bc > #1 0xc0541a72 at vop_stdlock+0x32 > #2 0xc06419d4 at VOP_LOCK_APV+0xb4 > #3 0xc055b77c at vn_lock+0xec > #4 0xc054beb2 at sync_vnode+0x132 > #5 0xc054c1ff at sched_sync+0x26f > #6 0xc04c7851 at fork_exit+0xc1 > #7 0xc0614fac at fork_trampoline+0x8 >=20 >=20 > 0xc363c414: tag ufs, type VREG > usecount 1, writecount 1, refcount 1536 mountedhere 0 > flags () > v_object 0xc36c2210 ref 0 pages 52780 > lock type ufs: EXCL (count 1) by thread 0xc35e2480 (pid 515) with 1= =20 > pending#0 0xc04d300c at lockmgr+0x5bc > #1 0xc05b7ac6 at ffs_lock+0xa6 > #2 0xc06419d4 at VOP_LOCK_APV+0xb4 > #3 0xc055b77c at vn_lock+0xec > #4 0xc373e485 at mdstart_vnode+0xe5 > #5 
0xc373ec5f at md_kthread+0x14f > #6 0xc04c7851 at fork_exit+0xc1 > #7 0xc0614fac at fork_trampoline+0x8 >=20 > ino 1515, on dev aacd0s1f >=20 > 0xc368f000: tag ufs, type VDIR > usecount 4, writecount 0, refcount 6 mountedhere 0 > flags () > v_object 0xc360318c ref 0 pages 1 > lock type ufs: EXCL (count 1) by thread 0xc3573d80 (pid 547)#0=20 > 0xc04d300c at lockmgr+0x5bc > #1 0xc05b7ac6 at ffs_lock+0xa6 > #2 0xc06419d4 at VOP_LOCK_APV+0xb4 > #3 0xc055b77c at vn_lock+0xec > #4 0xc0543ce6 at lookup+0xe6 > #5 0xc0543918 at namei+0x488 > #6 0xc055467f at kern_lstat+0x4f > #7 0xc05545ff at lstat+0x2f > #8 0xc062aea0 at syscall+0x300 > #9 0xc0614f9f at Xint0x80_syscall+0x1f >=20 > ino 3, on dev md0a >=20 > 0xc3777c3c: tag ufs, type VREG > usecount 1, writecount 0, refcount 239 mountedhere 0 > flags () > lock type ufs: EXCL (count 1) by thread 0xc35e2300 (pid 546) with 1= =20 > pending#0 0xc04d300c at lockmgr+0x5bc > #1 0xc0542cd6 at vfs_hash_insert+0x36 > #2 0xc05b657e at ffs_vget+0x1ce > #3 0xc05960c7 at ffs_valloc+0x137 > #4 0xc05c4d19 at ufs_makeinode+0x79 > #5 0xc05c1826 at ufs_create+0x36 > #6 0xc063f332 at VOP_CREATE_APV+0xd2 > #7 0xc05a015a at ffs_snapshot+0x33a > #8 0xc05b3cb1 at ffs_mount+0xa81 > #9 0xc05469fe at vfs_domount+0x6be > #10 0xc05460ea at vfs_donmount+0x47a > #11 0xc054906e at kernel_mount+0x7e > #12 0xc05b3ee4 at ffs_cmount+0x84 > #13 0xc0546326 at mount+0x1e6 > #14 0xc062aea0 at syscall+0x300 > #15 0xc0614f9f at Xint0x80_syscall+0x1f >=20 > ino 4, on dev md0a > db> where 35 > Tracing pid 35 tid 100030 td 0xc32dbc00 > sched_switch(c32dbc00,0,1,10a,3a53d433) at sched_switch+0x190 > mi_switch(1,0,c066587f,1ba,1) at mi_switch+0x2e6 > sleepq_switch(c363c46c,0,c066587f,20c,d451ca24) at sleepq_switch+0x112 > sleepq_wait(c363c46c,0,c0663282,c8,0) at sleepq_wait+0x65 > msleep(c363c46c,c06b96fc,50,c0669f9f,0) at msleep+0x335 > acquire(d451cad0,40,60000,b1,c32dbc00) at acquire+0x8e > lockmgr(c363c46c,2002,c363c4dc,c32dbc00,c363c4dc) at lockmgr+0x516 > 
ffs_lock(d451cb38,d451cb1c,c04d73fd,2002,c363c414) at ffs_lock+0xa6 > VOP_LOCK_APV(c06a5920,d451cb38,c0673d22,d451cb3c,c050ccb0) at=20 > VOP_LOCK_APV+0xb4 > vn_lock(c363c414,2002,c32dbc00,7a5,2002) at vn_lock+0xec > vget(c363c414,2002,c32dbc00,2fc,c3558c90) at vget+0xff > qsync(c3558c00,0,c0673039,47c,c0661d3b) at qsync+0x13d > ffs_sync(c3558c00,3,c32dbc00,c32dbc00,c3558c00) at ffs_sync+0x392 > sync_fsync(d451cca0,c0680e2f,c35d76cc,c35d76cc,c35d77d8) at sync_fsync+0x= 19e > VOP_FSYNC_APV(c069f540,d451cca0,c32dbc00,620,0) at VOP_FSYNC_APV+0xd2 > sync_vnode(c35d77d8,c32dbc00,c066c1b0,657,0) at sync_vnode+0x158 > sched_sync(0,d451cd38,c065fd98,31d,0) at sched_sync+0x26f > fork_exit(c054bf90,0,d451cd38) at fork_exit+0xc1 > fork_trampoline() at fork_trampoline+0x8 > --- trap 0x1, eip =3D 0, esp =3D 0xd451cd6c, ebp =3D 0 --- > db> where 515 > Tracing pid 515 tid 100085 td 0xc35e2480 > sched_switch(c35e2480,0,1,10a,f85dbeb3) at sched_switch+0x190 > mi_switch(1,0,c066587f,1ba,1) at mi_switch+0x2e6 > sleepq_switch(c0707be4,0,c066587f,20c,dab58894) at sleepq_switch+0x112 > sleepq_wait(c0707be4,0,c0663282,c8,0) at sleepq_wait+0x65 > msleep(c0707be4,c0707c00,44,c066a543,0) at msleep+0x335 > waitrunningbufspace(c363c520,cd895584,cd8955e4,4000,cd895584) at=20 > waitrunningbufspace+0x72 > bufwrite(cd895584,c05b8419,c06400ac,246,c06969c4) at bufwrite+0x1a1 > vfs_bio_awrite(cd895584,0,c06733a7,e4,c80000) at vfs_bio_awrite+0x29e > ffs_syncvnode(c363c414,2,c06a5920,dab58980,c0640922) at ffs_syncvnode+0x3= 52 > ffs_fsync(dab589bc,c0680e2f,c35e2480,c35e2480,cd81e4b8) at ffs_fsync+0x1c > VOP_FSYNC_APV(c06a5920,dab589bc,c066a527,37f,c363c414) at VOP_FSYNC_APV+0= xd2 > bdwrite(cd81e4b8,c3671948,cd851fa8,7fa,54c540) at bdwrite+0x12b > ffs_balloc_ufs2(c363c414,f6352000,2f,2000,c3655300) at=20 > ffs_balloc_ufs2+0x193e > ffs_write(dab58c78,c0680c61,0,0,0) at ffs_write+0x369 > VOP_WRITE_APV(c06a5920,dab58c78,c35e2480,1ea,c3558c00) at=20 > VOP_WRITE_APV+0x17c > 
mdstart_vnode(c36b2000,c37df39c,c373ff9e,2a3,0) at mdstart_vnode+0x126 > md_kthread(c36b2000,dab58d38,c065fd98,31d,c35e2480) at md_kthread+0x14f > fork_exit(c373eb10,c36b2000,dab58d38) at fork_exit+0xc1 > fork_trampoline() at fork_trampoline+0x8 > --- trap 0x1, eip =3D 0, esp =3D 0xdab58d6c, ebp =3D 0 --- > db> where 547 > Tracing pid 547 tid 100067 td 0xc3573d80 > sched_switch(c3573d80,0,1,10a,5da4c133) at sched_switch+0x190 > mi_switch(1,0,c066587f,1ba,1) at mi_switch+0x2e6 > sleepq_switch(c3777c94,0,c066587f,20c,dab2e71c) at sleepq_switch+0x112 > sleepq_wait(c3777c94,0,c0663282,c8,0) at sleepq_wait+0x65 > msleep(c3777c94,c06b8814,50,c0669f9f,0) at msleep+0x335 > acquire(dab2e7c8,40,60000,b1,c3573d80) at acquire+0x8e > lockmgr(c3777c94,2002,c3777d04,c3573d80,c3777d04) at lockmgr+0x516 > ffs_lock(dab2e830,dab2e814,c04d73fd,2002,c3777c3c) at ffs_lock+0xa6 > VOP_LOCK_APV(c06a5920,dab2e830,c066b6c8,dab2e834,c050ccb0) at=20 > VOP_LOCK_APV+0xb4 > vn_lock(c3777c3c,2002,c3573d80,7a5,2002) at vn_lock+0xec > vget(c3777c3c,2002,c3573d80,50,dab2ebc0) at vget+0xff > vfs_hash_get(c3429c00,4,2,c3573d80,dab2e98c) at vfs_hash_get+0xe2 > ffs_vget(c3429c00,4,2,dab2e98c,dab2e990) at ffs_vget+0x49 > ufs_lookup(dab2ea40,c0680949,c368f000,c368f000,dab2ebc0) at ufs_lookup+0x= bdf > VOP_CACHEDLOOKUP_APV(c06a5920,dab2ea40,dab2ebc0,c3573d80,c376f380) at=20 > VOP_CACHEDLOOKUP_APV+0xd2 > vfs_cache_lookup(dab2eaec,dab2eaec,0,c368f000,dab2ebc0) at=20 > vfs_cache_lookup+0xd0 > VOP_LOOKUP_APV(c06a5920,dab2eaec,c3573d80,3,1) at VOP_LOOKUP_APV+0xb4 > lookup(dab2eb98,0,c066b85d,b6,6b2) at lookup+0x528 > namei(dab2eb98,dab2ebe8,60,854,c3573d80) at namei+0x488 > kern_lstat(c3573d80,80524a8,0,dab2ec6c,dab2ec88) at kern_lstat+0x4f > lstat(c3573d80,dab2ed04,8,41d,2) at lstat+0x2f > syscall(3b,3b,3b,8052448,8052400) at syscall+0x300 > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (190, FreeBSD ELF32, lstat), eip =3D 0x28182613, esp =3D=20 > 0xbfbfe55c, ebp =3D 0xbfbfe5f8 --- > db> where 546 > 
Tracing pid 546 tid 100086 td 0xc35e2300
> sched_switch(c35e2300,0,1,10a,6af4f133) at sched_switch+0x190
> mi_switch(1,0,c066587f,1ba,2) at mi_switch+0x2e6
> sleepq_switch(c0707be4,0,c066587f,20c,dab5560c) at sleepq_switch+0x112
> sleepq_wait(c0707be4,0,c0663282,c8,0) at sleepq_wait+0x65
> msleep(c0707be4,c0707c00,44,c066a543,0) at msleep+0x335
> waitrunningbufspace(c3777d48,cd832380,c3527c00,c3527c00,c3580000) at waitrunningbufspace+0x72
> bufwrite(cd832380,0,dab55934,c05a043f,cd832380) at bufwrite+0x1a1
> bawrite(cd832380,4a030000,2b,4000,c3655300) at bawrite+0x6b
> ffs_snapshot(c3429c00,c3572680,dab559a8,6c,1) at ffs_snapshot+0x61f
> ffs_mount(c3429c00,c35e2300,c066ba58,331,c06c0ea0) at ffs_mount+0xa81
> vfs_domount(c35e2300,c354a680,c3391030,1211000,c3391230) at vfs_domount+0x6be
> vfs_donmount(c35e2300,1211000,dab55bf4,c3779080,e) at vfs_donmount+0x47a
> kernel_mount(c3391910,1211000,dab55c38,6c,805bb00) at kernel_mount+0x7e
> ffs_cmount(c3391910,bfbfec70,1211000,c35e2300,c06a5600) at ffs_cmount+0x84
> mount(c35e2300,dab55d04,c067d6fd,3cb,4) at mount+0x1e6
> syscall(3b,3b,3b,bfbfec70,805dab8) at syscall+0x300
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (21, FreeBSD ELF32, mount), eip = 0x280cddb7, esp = 0xbfbfea0c, ebp = 0xbfbfed18 ---
> db>
>

First, I have set the followup to the right mailing list.

Second, I am really curious what you did. My understanding is as follows: you set up a vnode-backed md device (md0a) on a sparse file, created a UFS2 filesystem on it, mounted it with quotas, and ran background fsck on that filesystem. At the same time, you ran rm on the snapshot file created by fsck. Right?

Anyway, the problem seems to be related to neither snapshots nor quotas. In your trace, process 35 (syncer) tries to sync the vnode 0xc363c414, which is inode 1515 on aacd0s1f, the file that backs md0. That vnode is already locked by process 515 (the md0 kthread). Process 515 is stuck in the wdrain state, waiting for buffers to be flushed.
It seems that a huge number of dirty buffers are waiting to be written to md0 as a result of snapshotting the filesystem. As a result, the system deadlocks: the md0 thread hangs waiting for running buffer space, and that space is occupied by pending write requests to md0, which cannot complete until the md0 thread runs.

Do -fs@ readers agree with this analysis?

I propose setting the TDP_NORUNNINGBUF thread flag for both swap- and file-backed md threads to prevent such deadlocks. That I/O is already accounted for in the upper layer. Moreover, the already-accounted requests do not really differ from the requests (re)issued by md.

Please comment.

--6Vw0j8UKbyX0bfpA
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (FreeBSD)

iD4DBQFEpO68C3+MBN1Mb4gRAlo+AKDhO2wjG289EAcx80RCaYc3zzGkvgCVG66T
JuFCzZZM2kkdQGV0L5IRTQ==
=4WVF
-----END PGP SIGNATURE-----

--6Vw0j8UKbyX0bfpA--

From owner-freebsd-fs@FreeBSD.ORG Fri Jun 30 10:40:34 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3932A16A403 for ; Fri, 30 Jun 2006 10:40:34 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.FreeBSD.org (Postfix) with ESMTP id C4CA543D48 for ; Fri, 30 Jun 2006 10:40:33 +0000 (GMT) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 4E79446C63; Fri, 30 Jun 2006 06:40:33 -0400 (EDT) Date: Fri, 30 Jun 2006 11:40:33 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ensel Sharon In-Reply-To: Message-ID: <20060630113926.X3964@fledge.watson.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: 6.1 quota bugs cause adaptec 2820sa kernel to crash ?
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jun 2006 10:40:34 -0000 On Thu, 29 Jun 2006, Ensel Sharon wrote: > I have gotten no responses of any kind from -hackers or from -fs, or > privately. > > I am currently trying to reproduce this on a totally different machine, to > see if I can pin it down as a quota problem or an aac problem or a 2820sa > problem. My efforts are ametuerish, I'm afraid, and I'd be happy to run any > kind of tests, etc., for anyone that is better at this than I am. > > In the meantime, if _anyone_ has any insight into my original post in this > thread (June 23) it would be much appreciated. Especially some details as > to what the problems with quotas really are and how they are being fixed. Ensel, I chatted with Scott a day or two ago, and he told me he believes he has tracked down a bug in the adaptec device driver under high I/O load, and thinks it is possible that quotas are triggering slightly higher load resulting in the bug being exercised. He has a patch, which hopefully he has now sent to you? 
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-fs@FreeBSD.ORG Fri Jun 30 13:41:57 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6A00C16A403; Fri, 30 Jun 2006 13:41:57 +0000 (UTC) (envelope-from user@dhp.com) Received: from shell.dhp.com (shell.dhp.com [199.245.105.1]) by mx1.FreeBSD.org (Postfix) with ESMTP id 79A0B43D53; Fri, 30 Jun 2006 13:41:56 +0000 (GMT) (envelope-from user@dhp.com) Received: by shell.dhp.com (Postfix, from userid 896) id 2EC9F3131D; Fri, 30 Jun 2006 09:41:51 -0400 (EDT) Date: Fri, 30 Jun 2006 09:41:51 -0400 (EDT) From: Ensel Sharon To: Robert Watson In-Reply-To: <20060630113926.X3964@fledge.watson.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-fs@freebsd.org Subject: Re: 6.1 quota bugs cause adaptec 2820sa kernel to crash ? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jun 2006 13:41:57 -0000 On Fri, 30 Jun 2006, Robert Watson wrote: > I chatted with Scott a day or two ago, and he told me he believes he has > tracked down a bug in the adaptec device driver under high I/O load, and > thinks it is possible that quotas are triggering slightly higher load > resulting in the bug being exercised. He has a patch, which hopefully he has > now sent to you? Got it. Many thanks! 
From owner-freebsd-fs@FreeBSD.ORG Fri Jun 30 18:31:21 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B9B7716A407; Fri, 30 Jun 2006 18:31:21 +0000 (UTC) (envelope-from mikej@rogers.com) Received: from H43.C18.B96.tor.eicat.ca (H43.C18.B96.tor.eicat.ca [66.96.18.43]) by mx1.FreeBSD.org (Postfix) with ESMTP id 03BA143D6B; Fri, 30 Jun 2006 18:31:20 +0000 (GMT) (envelope-from mikej@rogers.com) Received: from [172.16.0.200] (desktop.home.local [172.16.0.200]) by H43.C18.B96.tor.eicat.ca (Postfix) with ESMTP id 719801141A; Fri, 30 Jun 2006 14:31:05 -0400 (EDT) Message-ID: <44A56E0F.1070904@rogers.com> Date: Fri, 30 Jun 2006 14:31:43 -0400 From: Mike Jakubik User-Agent: Thunderbird 1.5.0.4 (Windows/20060516) MIME-Version: 1.0 To: Kostik Belousov References: <20060523181638.GC767@dimma.mow.oilspace.com> <6eb82e0605231120q37224c6r3b25982f556bed72@mail.gmail.com> <447366AD.30203@rogers.com> <44736E11.6060104@mkproductions.org> <20060523203521.GA48061@xor.obsecurity.org> <20060524062118.GA766@dimma.mow.oilspace.com> <447400BB.9060603@samsco.org> <4485C010.9040402@rogers.com> <20060606182234.GB72368@deviant.kiev.zoral.com.ua> <44A490E6.1000502@rogers.com> <20060630092829.GE1258@deviant.kiev.zoral.com.ua> In-Reply-To: <20060630092829.GE1258@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-SpamToaster-Information: This messages has been scanned by SpamToaster http://www.digitalprogression.ca X-SpamToaster: Found to be clean X-SpamToaster-SpamCheck: not spam, SpamAssassin (not cached, score=-2.491, required 3.5, ALL_TRUSTED -1.80, BAYES_00 -2.60, DNS_FROM_RFC_ABUSE 0.20, DNS_FROM_RFC_POST 1.71) X-SpamToaster-From: mikej@rogers.com X-Spam-Status: No Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org Subject: Re: md deadlocks on wdrain. 
Was: [Re: quota and snapshots in 6.1-RELEASE] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jun 2006 18:31:21 -0000

Kostik Belousov wrote:
> First, I set the followup to the right mailing list.
>
> Second, I am really curious what you do. My understanding follows: you
> have set up vnode-backed md device (md0a) on sparce file, created ufs2
> on it, mounted it with quotas, and run background fsck on that fs. At
> the same time, you did rm for the snapshot file created by fsck. Right ?
>

This is the procedure I followed. Although I have quotas enabled, they were not set on the test filesystem.

1) dd if=/dev/zero of=/usr/bigfile bs=1024 seek=209715200 count=0
2) mdconfig -a -t vnode -f /usr/bigfile
3) bsdlabel -w md0 auto
4) newfs -U md0a
5) fsck -v /dev/md0a # ^C this after a second or so; this makes the FS dirty
6) mount /dev/md0a /mnt
7) fsck -v -B /dev/md0a

In another window:

8) while true; do ls -al /mnt/.snap; sleep 1; done

> Anyway, the problem seems to be not related to neither snapshots nor
> quotas. In your trace, process 35 (syncer) tries to sync the vnode
> 0xc363c414, that is inode 1515 on aacd0s1f, that is used for md0. That
> vnode is already locked by process 515 (md0 kthread). Process 515 is
> stuck in the wdrain state, waiting for buffers to be flushed. It seems
> that there is huge amount of dirty buffers going to be written to md0,
> caused by snapshotting the fs. As result, system deadlocks due to md0
> hung waiting for buffer' runspace, that is occupied by pending write
> requests to md0.
>
> Do -fs@ readers agree with analysis ?
>
> I propose to set TDP_NORUNNINGBUF thread flag for both swap- and file-
> backed md threads to prevent such deadlocks. That i/o is already
> accounted for in the upper layer. Moreover, that already accounted
> requests do not really differ from requests (re)issued by md.
> > Please, comment. > FYI, -CURRENT passes this test without locking up, so the fix is already there somewhere. From owner-freebsd-fs@FreeBSD.ORG Sat Jul 1 03:49:30 2006 Return-Path: X-Original-To: freebsd-fs@freebsd.org Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 778B216A407 for ; Sat, 1 Jul 2006 03:49:30 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from fw.zoral.com.ua (ll-227.216.82.212.sovam.net.ua [212.82.216.227]) by mx1.FreeBSD.org (Postfix) with ESMTP id 906E243D45 for ; Sat, 1 Jul 2006 03:49:29 +0000 (GMT) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by fw.zoral.com.ua (8.13.4/8.13.4) with ESMTP id k613nNeZ061742 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 1 Jul 2006 06:49:23 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.13.6/8.13.6) with ESMTP id k613nNJ8039022; Sat, 1 Jul 2006 06:49:23 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.13.6/8.13.6/Submit) id k613nN6M039021; Sat, 1 Jul 2006 06:49:23 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 1 Jul 2006 06:49:22 +0300 From: Kostik Belousov To: Mike Jakubik Message-ID: <20060701034922.GA37822@deviant.kiev.zoral.com.ua> References: <447366AD.30203@rogers.com> <44736E11.6060104@mkproductions.org> <20060523203521.GA48061@xor.obsecurity.org> <20060524062118.GA766@dimma.mow.oilspace.com> <447400BB.9060603@samsco.org> <4485C010.9040402@rogers.com> <20060606182234.GB72368@deviant.kiev.zoral.com.ua> <44A490E6.1000502@rogers.com> <20060630092829.GE1258@deviant.kiev.zoral.com.ua> <44A56E0F.1070904@rogers.com> 
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="vkogqOf2sHV7VnPd"
Content-Disposition: inline
In-Reply-To: <44A56E0F.1070904@rogers.com>
User-Agent: Mutt/1.4.2.1i
X-Virus-Scanned: ClamAV version 0.88.2, clamav-milter version 0.88.2 on fw.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=0.4 required=5.0 tests=ALL_TRUSTED, DNS_FROM_RFC_ABUSE,SPF_NEUTRAL autolearn=no version=3.1.3
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on fw.zoral.com.ua
Cc: freebsd-fs@freebsd.org
Subject: Re: md deadlocks on wdrain. Was: [Re: quota and snapshots in 6.1-RELEASE]
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 01 Jul 2006 03:49:30 -0000

--vkogqOf2sHV7VnPd
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jun 30, 2006 at 02:31:43PM -0400, Mike Jakubik wrote:
> Kostik Belousov wrote:
> >Second, I am really curious what you do. My understanding follows: you
> >have set up a vnode-backed md device (md0a) on a sparse file, created ufs2
> >on it, mounted it with quotas, and run background fsck on that fs. At
> >the same time, you did rm for the snapshot file created by fsck. Right ?
> >
>
> This is the procedure I followed. While I have quota enabled, it was not
> set on the test filesystem.
>
> 1) dd if=/dev/zero of=/usr/bigfile bs=1024 seek=209715200 count=0
> 2) mdconfig -a -t vnode -f /usr/bigfile
> 3) bsdlabel -w md0 auto
> 4) newfs -U md0a
> 5) fsck -v /dev/md0a # ^C this after a second or so, this makes the FS dirty
> 6) mount /dev/md0a /mnt
> 7) fsck -v -B /dev/md0a
>
> in another window:
> 8) while true; do ls -al /mnt/.snap;sleep 1;done

Thanks for the description.
>
> FYI, -CURRENT passes this test without locking up, so the fix is already
> there somewhere.

Maybe. Maybe not, and other issues just prevent complete exhaustion of
the buffer run space on CURRENT. Did you test it on CURRENT many times,
or only once? The same question for STABLE: does it lock up every time
you do that?

Please try this patch and report the results.

? sys/dev/md/.arch-ids
Index: sys/dev/md/md.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/dev/md/md.c,v
retrieving revision 1.164
diff -u -r1.164 md.c
--- sys/dev/md/md.c	28 Mar 2006 21:25:11 -0000	1.164
+++ sys/dev/md/md.c	1 Jul 2006 03:48:41 -0000
@@ -650,6 +650,8 @@
 	mtx_lock_spin(&sched_lock);
 	sched_prio(curthread, PRIBIO);
 	mtx_unlock_spin(&sched_lock);
+	if (sc->type == MD_VNODE)
+		curthread->td_pflags |= TDP_NORUNNINGBUF;
 
 	for (;;) {
 		mtx_lock(&sc->queue_mtx);

--vkogqOf2sHV7VnPd
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (FreeBSD)

iD8DBQFEpfDCC3+MBN1Mb4gRAs8/AKDd/PLJzPcT4/X+rJyW2wwOGmTKSACffhix
8rQyFB+fcMHKwEntmUgeYCU=
=X2WK
-----END PGP SIGNATURE-----

--vkogqOf2sHV7VnPd--
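
[Editorial sketch, not part of the original message.] The deadlock
discussed in this thread, and the effect of exempting the md thread from
write throttling, can be illustrated with a toy user-space model. This is
not kernel code. The names toy_thread, toy_write, toy_complete, simulate
and the limit HI_RUNNING_SPACE are hypothetical and exist only for this
illustration; the boolean norunningbuf merely stands in for the
TDP_NORUNNINGBUF flag set by the patch. The model assumes that every
request queued to md0 needs one write to the backing file before it can
complete, and that only md0 can cause completions.

```c
#include <stdbool.h>

/* Hypothetical cap on in-flight write buffers (toy stand-in for the
 * kernel's limit on running buffer space). */
#define HI_RUNNING_SPACE 4

struct toy_thread {
	const char *name;
	bool norunningbuf;	/* stands in for TDP_NORUNNINGBUF */
};

/* Writes issued but not yet completed, shared by all toy threads. */
static int running_bufs;

/*
 * Try to issue one write. Returns false where a real thread would go to
 * sleep waiting for in-flight writes to drain ("wdrain" in the trace).
 */
static bool
toy_write(const struct toy_thread *td)
{
	if (!td->norunningbuf && running_bufs > HI_RUNNING_SPACE)
		return (false);
	running_bufs++;
	return (true);
}

/* One in-flight write finished. */
static void
toy_complete(void)
{
	if (running_bufs > 0)
		running_bufs--;
}

/*
 * Flood md0 with writes from the upper layer, then let md0 service them.
 * Returns the number of requests that can never complete.
 */
int
simulate(bool md_exempt)
{
	struct toy_thread fs = { "fs-writer", false };
	struct toy_thread md = { "md0", md_exempt };
	int queued = 0;

	running_bufs = 0;

	/* The upper layer queues writes to md0 until it is throttled. */
	while (toy_write(&fs))
		queued++;

	/* md0 turns each queued request into a write to the backing file. */
	while (queued > 0) {
		if (!toy_write(&md))
			break;		/* md0 sleeps; nothing else completes I/O */
		toy_complete();		/* backing-file write finishes */
		toy_complete();		/* ...which completes the md0 request */
		queued--;
	}
	return (queued);
}
```

Under these assumptions, simulate(false) leaves every queued request
stuck, because md0 is throttled by the very writes it is supposed to
finish, while simulate(true) drains the queue. Calling both from a small
main() and printing the return values is enough to see the difference.
The real accounting in the kernel is more involved; the sketch only
shows why exempting the servicing thread breaks the cycle.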