From: Jachym Holecek
To: Sam Cole
Cc: freebsd-arch@FreeBSD.org
Date: Sun, 6 Mar 2005 01:14:44 +0100
Subject: Re: FreeBSD for Macs?
Message-ID: <20050306001444.GA4999@bedna>

Hi,

> I'm sort of a novice about different types of processors and operating
> systems other than Windows. Is there a version of FreeBSD that will
> run on Apple's g4/g5 processors? Thanks!

Don't know about FreeBSD, but there's NetBSD/macppc. See

	http://www.NetBSD.org
	http://www.NetBSD.org/Ports/macppc

for details.

Regards,
-- Jachym Holecek

From: Sam Cole
To: Jachym Holecek
Cc: freebsd-arch@freebsd.org
Date: Sat, 5 Mar 2005 20:23:50 -0900
Subject: Re: FreeBSD for Macs?
Message-ID: <2b50dc970503052123ad10ef8@mail.gmail.com>

Thanks very much for all of your info!!
Sam

On Sun, 6 Mar 2005 01:14:44 +0100, Jachym Holecek wrote:
> Hi,
>
> > I'm sort of a novice about different types of processors and operating
> > systems other than Windows. Is there a version of FreeBSD that will
> > run on Apple's g4/g5 processors? Thanks!
>
> Don't know about FreeBSD, but there's NetBSD/macppc. See
>
> http://www.NetBSD.org
> http://www.NetBSD.org/Ports/macppc
>
> for details.
>
> Regards,
> -- Jachym Holecek
>

From: Nate Lawson
To: arch@freebsd.org
Date: Sat, 05 Mar 2005 23:42:18 -0800
Subject: patch: clean up msdosfs conversion routine
Message-ID: <422AB45A.9040809@root.org>

The attached patch optimizes the unix2win conversion routine.
It uses 16-bit accesses instead of 8-bit ones and jumps to "out" once it
hits the trailing NUL rather than dropping through each loop. I'd like to
make sure my use of the endian routines is correct, if someone can check
this.

Thanks,
--
Nate

Content-Disposition: inline; filename="msd1.diff"

Index: msdosfs_conv.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_conv.c,v
retrieving revision 1.39
diff -u -r1.39 msdosfs_conv.c
--- msdosfs_conv.c	8 Feb 2005 07:51:14 -0000	1.39
+++ msdosfs_conv.c	2 Mar 2005 17:48:52 -0000
@@ -52,6 +52,7 @@
  * System include files.
  */
 #include 
+#include 
 #include 
 #include 		/* defines tz */
 #include 
@@ -708,9 +711,8 @@
 	int chksum;
 	struct msdosfsmount *pmp;
 {
-	u_int8_t *wcp;
-	int i, end;
-	u_int16_t code;
+	u_int16_t *wcp;
+	int end, i;
 
 	/*
 	 * Drop trailing blanks and dots
@@ -726,7 +728,7 @@
 	/*
 	 * Initialize winentry to some useful default
 	 */
-	for (wcp = (u_int8_t *)wep, i = sizeof(*wep); --i >= 0; *wcp++ = 0xff);
+	memset(wep, 0xff, sizeof(*wep));
 	wep->weCnt = cnt;
 	wep->weAttributes = ATTR_WIN95;
 	wep->weReserved1 = 0;
@@ -737,29 +739,34 @@
 	 * Now convert the filename parts
 	 */
 	end = 0;
-	for (wcp = wep->wePart1, i = sizeof(wep->wePart1)/2; --i >= 0 && !end;) {
-		code = unix2winchr(&un, &unlen, 0, pmp);
-		*wcp++ = code;
-		*wcp++ = code >> 8;
-		if (!code)
+	wcp = (uint16_t *)wep->wePart1;
+	for (i = sizeof(wep->wePart1)/2; --i >= 0;) {
+		*wcp = htole16(unix2winchr(&un, &unlen, 0, pmp));
+		if (*wcp++ == 0) {
 			end = WIN_LAST;
+			goto out;
+		}
 	}
-	for (wcp = wep->wePart2, i = sizeof(wep->wePart2)/2; --i >= 0 && !end;) {
-		code = unix2winchr(&un, &unlen, 0, pmp);
-		*wcp++ = code;
-		*wcp++ = code >> 8;
-		if (!code)
+	wcp = (uint16_t *)wep->wePart2;
+	for (i = sizeof(wep->wePart2)/2; --i >= 0;) {
+		*wcp = htole16(unix2winchr(&un, &unlen, 0, pmp));
+		if (*wcp++ == 0) {
 			end = WIN_LAST;
+			goto out;
+		}
 	}
-	for (wcp = wep->wePart3, i = sizeof(wep->wePart3)/2; --i >= 0 && !end;) {
-		code = unix2winchr(&un, &unlen, 0, pmp);
-		*wcp++ = code;
-		*wcp++ = code >> 8;
-		if (!code)
+	wcp = (uint16_t *)wep->wePart3;
+	for (i = sizeof(wep->wePart3)/2; --i >= 0;) {
+		*wcp = htole16(unix2winchr(&un, &unlen, 0, pmp));
+		if (*wcp++ == 0) {
 			end = WIN_LAST;
+			goto out;
+		}
 	}
 	if (*un == '\0')
 		end = WIN_LAST;
+
+out:
 	wep->weCnt |= end;
 	return !end;
 }

From: Nate Lawson
To: arch@freebsd.org
Date: Sat, 05 Mar 2005 23:51:47 -0800
Subject: patch: optimize mbnambuf routines in msdosfs
Message-ID: <422AB693.9040600@root.org>

In my profiling, I found mbnambuf_* were the top 4 CPU hogs on my system
when doing directory I/O on an msdosfs partition. After examination, no
wonder: repeated malloc/free, excessive strcpy, etc.

The mbnambuf is indexed by the Windows ID, a sequential one-based value.
The attached patch optimizes this routine with this in mind, using the
ID to index into a single array and concatenating each WIN_CHARS chunk
at once. (The last chunk is variable-length.)

The part I'd especially like reviewed is the struct dirent semantics.
sys/dirent.h says that names should always be NUL-terminated and padded
with NUL bytes to a multiple of 4 bytes. The original code did not do
this, and neither does mine. (It always NUL-terminates, but the result
can be an odd number of bytes long.)

This code has been tested as working on an FS with difficult filename
sizes (255, 13, 26, etc.). It gives a whopping 77.1% decrease in
profiled time (total across all functions) and a 73.7% decrease in wall
time.
Test was "ls -laR > /dev/null".

Individual performance gains are below:

mbnambuf_init:	-90.7%
mbnambuf_write:	-18.7%
mbnambuf_flush:	-67.1%

--
Nate

Content-Disposition: inline; filename="msd2.diff"

Index: msdosfs_conv.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/msdosfs/msdosfs_conv.c,v
retrieving revision 1.39
diff -u -r1.39 msdosfs_conv.c
--- msdosfs_conv.c	8 Feb 2005 07:51:14 -0000	1.39
+++ msdosfs_conv.c	2 Mar 2005 17:48:52 -0000
@@ -101,11 +102,13 @@
 static u_int16_t win2unixchr(u_int16_t, struct msdosfsmount *);
 static u_int16_t unix2winchr(const u_char **, size_t *, int, struct msdosfsmount *);
 
+#if 0
 struct mbnambuf {
 	char * p;
 	size_t n;
 };
 static struct mbnambuf subent[WIN_MAXSUBENTRIES];
+#endif
 
 /*
  * Convert the unix version of time to dos's idea of time to be used in
@@ -1194,6 +1198,7 @@
 	return (wc);
 }
 
+#if 0
 /*
  * Make subent empty
  */
@@ -1251,3 +1257,55 @@
 	mbnambuf_init();
 	return (dp->d_name);
 }
+#else
+
+static char *nambuf_ptr;
+static size_t nambuf_len;
+static int nambuf_max_id;
+
+void
+mbnambuf_init(void)
+{
+
+	if (nambuf_ptr == NULL) {
+		nambuf_ptr = malloc(MAXNAMLEN + 1, M_MSDOSFSMNT, M_WAITOK);
+		nambuf_ptr[MAXNAMLEN] = '\0';
+	}
+	nambuf_len = 0;
+	nambuf_max_id = -1;
+}
+
+void
+mbnambuf_write(char *name, int id)
+{
+	size_t count;
+
+	count = WIN_CHARS;
+	if (id > nambuf_max_id) {
+		count = strlen(name);
+		nambuf_len = id * WIN_CHARS + count;
+		if (nambuf_len > MAXNAMLEN) {
+			printf("msdosfs: file name %d too long\n", nambuf_len);
+			return;
+		}
+		nambuf_max_id = id;
+	}
+	memcpy(nambuf_ptr + (id * WIN_CHARS), name, count);
+}
+
+char *
+mbnambuf_flush(struct dirent *dp)
+{
+
+	if (nambuf_len > sizeof(dp->d_name) - 1) {
+		mbnambuf_init();
+		return (NULL);
+	}
+	nambuf_ptr[nambuf_len] = '\0';
+	memcpy(dp->d_name, nambuf_ptr, nambuf_len + 1);
+	dp->d_namlen = nambuf_len;
+
+	mbnambuf_init();
+	return (dp->d_name);
+}
+#endif

From: Ryan Sommers
To: Nate Lawson
Cc: arch@freebsd.org
Date: Sun, 06 Mar 2005 07:34:21 -0600
Subject: Re: patch: clean up msdosfs conversion routine
Message-ID: <422B06DD.8050801@gamersimpact.com>

Nate Lawson wrote:
> -	u_int16_t code;
> +	u_int16_t *wcp;

Aren't we converting from BSD u_intXX_t to the new C99 uint16_t?
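
Whichever spelling of the 16-bit type is used, the byte-order question in the first patch can be sidestepped by storing each UTF-16 code unit one byte at a time, as the pre-patch loop did. There is also an alignment angle: in the standard FAT long-name entry layout, the first name field (wePart1) follows the one-byte weCnt and so starts at byte offset 1, which means a uint16_t pointer cast onto it is misaligned and 16-bit stores through it may trap on strict-alignment CPUs. FreeBSD's <sys/endian.h> also provides le16enc() for this kind of store. The following is a minimal user-space sketch under those assumptions; le16_store() and fill_name_part() are hypothetical helpers, not msdosfs code, and the input is assumed to be UTF-16 code units already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store a 16-bit value little-endian, one byte at a time.  Only byte
 * accesses are used, so p may have any alignment, and the resulting bytes
 * are the same on big- and little-endian hosts.
 */
static void le16_store(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/*
 * Hypothetical helper: copy up to nchars UTF-16 code units from *src into
 * a name field, stopping right after the NUL terminator.  Slots after the
 * terminator keep whatever padding the caller put there (0xff in a long-name
 * entry).  Returns 1 if the terminator was stored, 0 if the field filled up.
 */
static int fill_name_part(uint8_t *field, size_t nchars, const uint16_t **src)
{
	assert(field != NULL && src != NULL && *src != NULL);
	for (size_t i = 0; i < nchars; i++) {
		uint16_t code = *(*src)++;

		le16_store(field + 2 * i, code);
		if (code == 0)
			return 1;	/* name ended inside this field */
	}
	return 0;		/* field full; name continues in the next part */
}
```

Calling fill_name_part() on a buffer offset by one byte mirrors the position of the first name field, and the same call works unchanged for the evenly aligned second and third fields.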

--
Ryan Sommers
ryans@gamersimpact.com

From: Willem Jan Withagen
To: arch@freebsd.org
Date: Thu, 10 Mar 2005 02:19:55 +0100
Subject: SCGTEST.horses.nl
Message-ID: <422FA0BB.4060105@digiware.nl>

How is this gentleman doing??? A suspicious amount of radio silence?

--WjW

From: Jeff Roberson
To: arch@freebsd.org, pete@isilon.com
Date: Thu, 10 Mar 2005 03:58:09 -0500 (EST)
Subject: Cleaning up vgone.
Message-ID: <20050310034922.Y20708@mail.chesapeake.net>

I've run into a few vclean races and some related problems with VOP_CLOSE
not using locks. I've made some fairly major changes to the way vfs
handles vnode teardown in the process of fixing this. I'll summarize what
I've done here.

The main problem with teardown was the two stage locking scheme involving
the XLOCK. I got rid of the XLOCK and simply require the vnode lock
throughout the whole operation. To accommodate this, VOP_INACTIVE,
VOP_RECLAIM, VOP_CLOSE, and VOP_REVOKE all require the vnode lock. As
does vgone().
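
The single-lock arrangement in the paragraph above can be illustrated with a toy, single-threaded model. This is a sketch, not FreeBSD code: toy_vnode, toy_vgone(), toy_vn_lock() and the doomed flag are hypothetical stand-ins, and the lock is a plain flag because nothing here runs concurrently (in a kernel, taking it would sleep until the holder releases it). It assumes that, with no XLOCK left to signal an identity change, lockers re-check a doomed flag once they hold the ordinary vnode lock; every teardown step asserts that the caller already holds that lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: one lock and one "doomed" flag per object, no XLOCK. */
struct toy_vnode {
	bool locked;	/* the single vnode lock (a flag: no concurrency here) */
	bool doomed;	/* set by teardown: no longer tied to its filesystem */
	int  data;	/* stands in for filesystem-private state (v_data) */
};

static void toy_lock(struct toy_vnode *vp)   { vp->locked = true; }
static void toy_unlock(struct toy_vnode *vp) { vp->locked = false; }

/* Teardown steps require the vnode lock instead of a separate XLOCK. */
static void toy_inactive(struct toy_vnode *vp) { assert(vp->locked); }
static void toy_reclaim(struct toy_vnode *vp)  { assert(vp->locked); vp->data = 0; }

/* The caller must already hold the vnode lock, as with the steps above. */
static void toy_vgone(struct toy_vnode *vp)
{
	assert(vp->locked);
	vp->doomed = true;	/* whoever gets the lock next will see this */
	toy_inactive(vp);
	toy_reclaim(vp);
}

/*
 * Lockers check the flag before and after acquiring the lock.  Only the
 * second check is authoritative: teardown may have run while the caller
 * was waiting for the lock.
 */
static int toy_vn_lock(struct toy_vnode *vp)
{
	if (vp->doomed)
		return ENOENT;
	toy_lock(vp);
	if (vp->doomed) {
		toy_unlock(vp);
		return ENOENT;
	}
	return 0;
}
```

Waiting for a teardown to finish then needs no extra machinery: a thread simply blocks on the same lock the teardown holds, and the flag tells it what happened.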

Prior to this, vgone() would set XLOCK and then do a LK_DRAIN to make
sure there were no callers waiting in VOP_LOCK so that they would always
see the VI_XLOCK and know that the vnode had changed identities. Now,
vgone sets XLOCK, and all lockers who use vget() and vn_lock() check for
VI_DOOMED before and after acquiring the vnode lock. To wait for the
transition to complete, you simply wait on the vnode lock.

This really only required minor changes of the filesystems in the tree.
Most only required the removal of a VOP_UNLOCK in VOP_INACTIVE, and a few
acquired the lock in VOP_CLOSE to do operations which they otherwise could
not. There is one change to ffs and coda which inspect v_data in their
vop_lock routines. This is only safe with the interlock held, where
before the XLOCK would have protected v_data in the one case that could
lead to panic.

The patch is available at http://www.chesapeake.net/~jroberson/vgone.diff

Cheers,
Jeff

From: Jeff Roberson
To: arch@freebsd.org, pete@isilon.com
Date: Thu, 10 Mar 2005 04:05:32 -0500 (EST)
Subject: Re: Cleaning up vgone.
Message-ID: <20050310040417.A20708@mail.chesapeake.net>

On Thu, 10 Mar 2005, Jeff Roberson wrote:

> I've run into a few vclean races and some related problems with VOP_CLOSE
> not using locks. I've made some fairly major changes to the way vfs
> handles vnode teardown in the process of fixing this. I'll summarize what
> I've done here.
>
> The main problem with teardown was the two stage locking scheme involving
> the XLOCK. I got rid of the XLOCK and simply require the vnode lock
> throughout the whole operation. To accommodate this, VOP_INACTIVE,
> VOP_RECLAIM, VOP_CLOSE, and VOP_REVOKE all require the vnode lock. As
> does vgone().
>
> Prior to this, vgone() would set XLOCK and then do a LK_DRAIN to make
> sure there were no callers waiting in VOP_LOCK so that they would always
> see the VI_XLOCK and know that the vnode had changed identities. Now,
> vgone sets XLOCK, and all lockers who use vget() and vn_lock() check for

This should be "vgone sets VI_DOOMED" which now means "the vnode has been
dissociated from its filesystem".

> VI_DOOMED before and after acquiring the vnode lock. To wait for the
> transition to complete, you simply wait on the vnode lock.
>
> This really only required minor changes of the filesystems in the tree.
> Most only required the removal of a VOP_UNLOCK in VOP_INACTIVE, and a few
> acquired the lock in VOP_CLOSE to do operations which they otherwise could
> not. There is one change to ffs and coda which inspect v_data in their
> vop_lock routines.
> This is only safe with the interlock held, where
> before the XLOCK would have protected v_data in the one case that could
> lead to panic.
>
> The patch is available at http://www.chesapeake.net/~jroberson/vgone.diff
>
> Cheers,
> Jeff

From: Ian Dowse
To: Jeff Roberson
Cc: arch@freebsd.org, pete@isilon.com
Date: Thu, 10 Mar 2005 10:19:55 +0000
Subject: Re: Cleaning up vgone.
Message-ID: <200503101019.aa42819@salmon.maths.tcd.ie>

In message <20050310040417.A20708@mail.chesapeake.net>, Jeff Roberson writes:
>On Thu, 10 Mar 2005, Jeff Roberson wrote:
>> Prior to this, vgone() would set XLOCK and then do a LK_DRAIN to make
>> sure there were no callers waiting in VOP_LOCK so that they would always
>> see the VI_XLOCK and know that the vnode had changed identities.
>> Now, vgone sets XLOCK, and all lockers who use vget() and vn_lock() check for
>
>This should be "vgone sets VI_DOOMED" which now means "the vnode has been
>dissociated from its filesystem".

I think one of the other original reasons for the XLOCK/LK_DRAIN
code was filesystems that used shared locks such as NFS, since
locking the vnode did not provide exclusive access. But there are
no more shared locking filesystems now and we are not going to
support this again, right?

Is there a new potential race where the vnode could be reused and
so have the VI_DOOMED flag cleared before a thread waiting for the
original vnode wakes up and sees the VI_DOOMED flag?

>> The patch is available at http://www.chesapeake.net/~jroberson/vgone.diff

BTW, I believe the vgonel() code can be further simplified since the
addition of the VI_DOINGINACT flag some time ago. Below is a patch
extracted from some local changes that I've been using for quite a long
time without problems. The new vbusy() call addresses one case where it
appeared a vnode could be picked up for reuse while it was being
recycled, but that may be impossible now. The main part of the patch
works by letting the VI_DOINGINACT flag take the place of the non-zero
usecount.

Ian

Index: vfs_subr.c
===================================================================
RCS file: /dump/FreeBSD-CVS/src/sys/kern/vfs_subr.c,v
retrieving revision 1.581
diff -u -r1.581 vfs_subr.c
--- vfs_subr.c	17 Feb 2005 10:49:50 -0000	1.581
+++ vfs_subr.c	18 Jan 2005 03:54:52 -0000
@@ -2279,7 +2308,6 @@
 void
 vgonel(struct vnode *vp, struct thread *td)
 {
-	int active;
 
 	/*
 	 * If a vgone (or vclean) is already in progress,
@@ -2291,18 +2319,11 @@
 
 	vx_lock(vp);
 
+	/* The vnode must not be on the free list while being cleaned. */
+	if (vp->v_iflag & VI_FREE)
+		vbusy(vp);
 	/*
-	 * Check to see if the vnode is in use. If so we have to reference it
-	 * before we clean it out so that its count cannot fall to zero and
-	 * generate a race against ourselves to recycle it.
-	 */
-	if ((active = vp->v_usecount))
-		v_incr_usecount(vp, 1);
-
-	/*
-	 * Even if the count is zero, the VOP_INACTIVE routine may still
-	 * have the object locked while it cleans it out. The VOP_LOCK
-	 * ensures that the VOP_INACTIVE routine is done with its work.
+	 * Lock the vnode, draining so that we wait on shared locks too.
 	 * For active vnodes, it ensures that no other activity can
 	 * occur while the underlying object is being cleaned out.
 	 */
@@ -2328,7 +2349,7 @@
 	 * deactivated before being reclaimed. Note that the
 	 * VOP_INACTIVE will unlock the vnode.
 	 */
-	if (active) {
+	if (vp->v_usecount) {
 		VOP_CLOSE(vp, FNONBLOCK, NOCRED, td);
 		VI_LOCK(vp);
 		if ((vp->v_iflag & VI_DOINGINACT) == 0) {
@@ -2352,28 +2373,6 @@
 	if (VOP_RECLAIM(vp, td))
 		panic("vclean: cannot reclaim");
 
-	VNASSERT(vp->v_object == NULL, vp,
-	    ("vop_reclaim left v_object vp=%p, tag=%s", vp, vp->v_tag));
-
-	if (active) {
-		/*
-		 * Inline copy of vrele() since VOP_INACTIVE
-		 * has already been called.
-		 */
-		VI_LOCK(vp);
-		v_incr_usecount(vp, -1);
-		if (vp->v_usecount <= 0) {
-#ifdef INVARIANTS
-			if (vp->v_usecount < 0 || vp->v_writecount != 0) {
-				vprint("vclean: bad ref count", vp);
-				panic("vclean: ref cnt");
-			}
-#endif
-			if (VSHOULDFREE(vp))
-				vfree(vp);
-		}
-		VI_UNLOCK(vp);
-	}
 	/*
 	 * Delete from old mount point vnode list.
 	 */

From: Matthew Dillon
To: Jeff Roberson
Cc: arch@freebsd.org, pete@isilon.com
Date: Thu, 10 Mar 2005 02:26:34 -0800 (PST)
Subject: Re: Cleaning up vgone.
Message-ID: <200503101026.j2AAQYHa022048@apollo.backplane.com>

:I've run into a few vclean races and some related problems with VOP_CLOSE
:not using locks. I've made some fairly major changes to the way vfs
:handles vnode teardown in the process of fixing this. I'll summarize what
:I've done here.
:
:The main problem with teardown was the two stage locking scheme involving
:the XLOCK. I got rid of the XLOCK and simply require the vnode lock
:throughout the whole operation. To accommodate this, VOP_INACTIVE,
:VOP_RECLAIM, VOP_CLOSE, and VOP_REVOKE all require the vnode lock. As
:does vgone().
:
:Prior to this, vgone() would set XLOCK and then do a LK_DRAIN to make
:sure there were no callers waiting in VOP_LOCK so that they would always
:see the VI_XLOCK and know that the vnode had changed identities. Now,
:vgone sets XLOCK, and all lockers who use vget() and vn_lock() check for
:VI_DOOMED before and after acquiring the vnode lock. To wait for the
:transition to complete, you simply wait on the vnode lock.
:
:This really only required minor changes of the filesystems in the tree.
:Most only required the removal of a VOP_UNLOCK in VOP_INACTIVE, and a few
:acquired the lock in VOP_CLOSE to do operations which they otherwise could
:not. There is one change to ffs and coda which inspect v_data in their
:vop_lock routines. This is only safe with the interlock held, where
:before the XLOCK would have protected v_data in the one case that could
:lead to panic.
:
:The patch is available at http://www.chesapeake.net/~jroberson/vgone.diff
:
:Cheers,
:Jeff

I did basically the same thing in DragonFly. I can only caution that
you have to be very, *very* careful to ensure that there are no races
between entities trying to ffs_vget() through the inode versus the
related vnode undergoing a vgone().

Be aware of the fact that there is a major non-atomicity problem between
the destruction of the vnode and the clearing of the inode's bit in the
bitmap and the destruction of the inode in the inode hash table.

In DragonFly there were races in the inode-reuse case where an inode
would be reused before it was entirely disconnected from the vnode. This
case can occur because the inode's bitmap bit is cleared before the vnode
is unlocked (and you have to do it that way; you can't clear the inode's
bitmap bit after the vnode has been unlocked without creating yet more
issues).
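
The window described here, where an inode's bitmap bit is already free while its old in-core inode is still live, can be modeled with two toy tables. A minimal sketch, assuming the old inode stays in the in-core hash until reclaim has completely detached it from the vnode: the allocator then treats "free in the bitmap but still hashed" as not yet allocatable. toy_fs, TOY_NINODES and toy_ialloc() are hypothetical names, not UFS or DragonFly code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NINODES 8	/* illustrative size only */

/* Toy model of the two structures involved in the race. */
struct toy_fs {
	bool bitmap_used[TOY_NINODES];	/* inode allocated on disk */
	bool in_hash[TOY_NINODES];	/* in-core inode still hashed */
};

/*
 * Allocate an inode number, skipping any number that is free in the bitmap
 * but whose old in-core inode has not left the hash yet, i.e. a reclaim of
 * that inode is still in progress.  Returns -1 if nothing is safe to use.
 */
static int toy_ialloc(struct toy_fs *fs)
{
	for (int ino = 0; ino < TOY_NINODES; ino++) {
		if (fs->bitmap_used[ino] || fs->in_hash[ino])
			continue;
		fs->bitmap_used[ino] = true;
		fs->in_hash[ino] = true;	/* the new in-core inode is hashed */
		return ino;
	}
	return -1;
}
```

With this check, a new file can never be handed an inode number whose previous in-core inode and vnode are still mid-teardown.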

This means that a totally unrelated process creating a file or directory
could allocate the SAME inode number out of the inode bitmap and call
FFS_VGET() *BEFORE* the original vnode controlling that inode number
finishes being reclaimed. In fact, the original inode is still pointing
at the vnode undergoing destruction and the vnode's v_data is still
pointing at the inode. The result: bad things happen.

It got so hairy in DFly that I wound up reordering the code as follows:

	(1) lock vnode
	(2) free the inode in the bitmap (UFS_IFREE() equivalent)
	(3) Set v_data to NULL
	(4) Remove inode from the inode hash table (clear back pointer)
	    [inode now fully disassociated]
	(5) unlock vnode

	*** Modify the bitmap allocation code for inodes to check whether
	the candidate inode is still present in the inode hash and SKIP
	THAT INODE if it is. (This in the UFS code.)

If you do not do that you wind up with a case where the filesystem
cannot differentiate between someone trying to lock the original vnode
and someone trying to vget() the vnode related to a newly created file
whose inode was just allocated out of the inode bitmap.

The word 'nasty' doesn't even begin to describe the problem. There are
*THREE* races here: the vnode lock, the inode hash table, AND the inode
bitmap. Due to disk I/O it is possible for the system to block in
between any of the related operations, so you have to make sure that
races against all three points are handled properly. My solution was to
end-run around the points by leaving the inode hash table intact until
the very end. In DragonFly I don't call ufs_ihashrem() until the inode
has been completely divorced from the vnode, and that gives me a way to
check for the race (by checking whether the inode is present in the
inode hash table or not).

Another thing I did in DragonFly was to get rid of the crazy handling of
the vnode's reference count during the reclaim.
I don't know what FreeBSD-HEAD is doing now, but FreeBSD-4 dropped the ref count to 0 and then did the crazy VXLOCK stuff. In DragonFly I changed that so the ref count is left at 1 (not 0) during the reclaim. This required fixing up a few cases that were checking the refcount against an absolute 0 or 1 (I forget the cases), but it made the code a whole lot easier to understand. -Matt Matthew Dillon From owner-freebsd-arch@FreeBSD.ORG Thu Mar 10 11:21:12 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BC4D416A4CE for ; Thu, 10 Mar 2005 11:21:12 +0000 (GMT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2EF9E43D46 for ; Thu, 10 Mar 2005 11:21:12 +0000 (GMT) (envelope-from jroberson@chesapeake.net) Received: from mail.chesapeake.net (localhost [127.0.0.1]) by mail.chesapeake.net (8.12.10/8.12.10) with ESMTP id j2ABL4d4046569; Thu, 10 Mar 2005 06:21:04 -0500 (EST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost)j2ABL49R046566; Thu, 10 Mar 2005 06:21:04 -0500 (EST) (envelope-from jroberson@chesapeake.net) X-Authentication-Warning: mail.chesapeake.net: jroberson owned process doing -bs Date: Thu, 10 Mar 2005 06:21:04 -0500 (EST) From: Jeff Roberson To: Ian Dowse In-Reply-To: <200503101019.aa42819@salmon.maths.tcd.ie> Message-ID: <20050310061419.Y20708@mail.chesapeake.net> References: <200503101019.aa42819@salmon.maths.tcd.ie> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org cc: pete@isilon.com Subject: Re: Cleaning up vgone. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2005 11:21:12 -0000 On Thu, 10 Mar 2005, Ian Dowse wrote: > In message <20050310040417.A20708@mail.chesapeake.net>, Jeff Roberson writes: > >On Thu, 10 Mar 2005, Jeff Roberson wrote: > >> Prior to this, vgone() would set XLOCK and then do a LK_DRAIN to make > >> sure there were no callers waiting in VOP_LOCK so that they would always > >> see the VI_XLOCK and know that the vnode had changed identities. Now, > >> vgone sets XLOCK, and all lockers who use vget() and vn_lock() check for > > > >This should be "vgone sets VI_DOOMED" which now means "the vnode has been > >dissociated from its filesystem". > > I think one of the other original reasons for the XLOCK/LK_DRAIN > code was filesystems that used shared locks such as NFS, since > locking the vnode did not provide exclusive access. But there are > no more shared locking filesystems now and we are not going to > support this again, right? According to Kirk, XLOCK was introduced because the vnode lock was not always in struct vnode, and so you couldn't rely on it after VOP_RECLAIM was called. This is no longer the case. A hypothetical filesystem which does not use the standard vnode lock would now have to lock the vnode lock in RECLAIM, set up its ops vector to use the default vnode lock, wait for all waiters to transfer to the vnode lock by releasing the original lock, and then return from RECLAIM. Shared locks are OK as long as you have an exclusive lock when you call vgone(). > > Is there a new potential race where the vnode could be reused and > so have the VI_DOOMED flag cleared before a thread waiting for the > original vnode wakes up and sees the VI_DOOMED flag? > In all cases callers should hold a reference while they're waiting on a lock for that vnode to succeed.
After they acquire the lock, they should see the VI_DOOMED flag, and eventually drop their reference, allowing the vnode to be recycled. > >> The patch is available at http://www.chesapeake.net/~jroberson/vgone.diff > > BTW, I believe the vgonel() code can be further simplified since > the addition of the VI_DOINGINACT flag some time ago. Below is a > patch extracted from some local changes that I've been using locally > for quite a long time without problems. The new vbusy() call addresses > one case where it appeared a vnode could be picked up for reuse > while it was being recycled, but that may be impossible now. The > main part of the patch works by letting the VI_DOINGINACT flag take > the place of the non-zero usecount. I'd much rather get rid of DOINGINACT and simply vhold the vnode so it doesn't have a 0 ref count. It seems more natural to me to use a reference than to special-case another flag. I like the idea of the patch below, but it conflicts with the patch I just posted as I removed the DOINGINACT check from VSHOULDFREE.
>
> Ian
>
> Index: vfs_subr.c
> ===================================================================
> RCS file: /dump/FreeBSD-CVS/src/sys/kern/vfs_subr.c,v
> retrieving revision 1.581
> diff -u -r1.581 vfs_subr.c
> --- vfs_subr.c	17 Feb 2005 10:49:50 -0000	1.581
> +++ vfs_subr.c	18 Jan 2005 03:54:52 -0000
> @@ -2279,7 +2308,6 @@
>  void
>  vgonel(struct vnode *vp, struct thread *td)
>  {
> -	int active;
>  
>  	/*
>  	 * If a vgone (or vclean) is already in progress,
> @@ -2291,18 +2319,11 @@
>  
>  	vx_lock(vp);
>  
> +	/* The vnode must not be on the free list while being cleaned. */
> +	if (vp->v_iflag & VI_FREE)
> +		vbusy(vp);
>  	/*
> -	 * Check to see if the vnode is in use. If so we have to reference it
> -	 * before we clean it out so that its count cannot fall to zero and
> -	 * generate a race against ourselves to recycle it.
> -	 */
> -	if ((active = vp->v_usecount))
> -		v_incr_usecount(vp, 1);
> -
> -	/*
> -	 * Even if the count is zero, the VOP_INACTIVE routine may still
> -	 * have the object locked while it cleans it out. The VOP_LOCK
> -	 * ensures that the VOP_INACTIVE routine is done with its work.
> +	 * Lock the vnode, draining so that we wait on shared locks too.
>  	 * For active vnodes, it ensures that no other activity can
>  	 * occur while the underlying object is being cleaned out.
>  	 */
> @@ -2328,7 +2349,7 @@
>  	 * deactivated before being reclaimed. Note that the
>  	 * VOP_INACTIVE will unlock the vnode.
>  	 */
> -	if (active) {
> +	if (vp->v_usecount) {
>  		VOP_CLOSE(vp, FNONBLOCK, NOCRED, td);
>  		VI_LOCK(vp);
>  		if ((vp->v_iflag & VI_DOINGINACT) == 0) {
> @@ -2352,28 +2373,6 @@
>  	if (VOP_RECLAIM(vp, td))
>  		panic("vclean: cannot reclaim");
>  
> -	VNASSERT(vp->v_object == NULL, vp,
> -	    ("vop_reclaim left v_object vp=%p, tag=%s", vp, vp->v_tag));
> -
> -	if (active) {
> -		/*
> -		 * Inline copy of vrele() since VOP_INACTIVE
> -		 * has already been called.
> -		 */
> -		VI_LOCK(vp);
> -		v_incr_usecount(vp, -1);
> -		if (vp->v_usecount <= 0) {
> -#ifdef INVARIANTS
> -			if (vp->v_usecount < 0 || vp->v_writecount != 0) {
> -				vprint("vclean: bad ref count", vp);
> -				panic("vclean: ref cnt");
> -			}
> -#endif
> -			if (VSHOULDFREE(vp))
> -				vfree(vp);
> -		}
> -		VI_UNLOCK(vp);
> -	}
>  	/*
>  	 * Delete from old mount point vnode list.
>  	 */
>
From owner-freebsd-arch@FreeBSD.ORG Thu Mar 10 11:27:53 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id CD8C316A4CE for ; Thu, 10 Mar 2005 11:27:53 +0000 (GMT) Received: from mail.chesapeake.net (chesapeake.net [208.142.252.6]) by mx1.FreeBSD.org (Postfix) with ESMTP id 42EFC43D1D for ; Thu, 10 Mar 2005 11:27:53 +0000 (GMT) (envelope-from jroberson@chesapeake.net) Received: from mail.chesapeake.net (localhost [127.0.0.1]) by mail.chesapeake.net (8.12.10/8.12.10) with ESMTP id j2ABRid4048417; Thu, 10 Mar 2005 06:27:44 -0500 (EST) (envelope-from jroberson@chesapeake.net) Received: from localhost (jroberson@localhost)j2ABRiHg048411; Thu, 10 Mar 2005 06:27:44 -0500 (EST) (envelope-from jroberson@chesapeake.net) X-Authentication-Warning: mail.chesapeake.net: jroberson owned process doing -bs Date: Thu, 10 Mar 2005 06:27:44 -0500 (EST) From: Jeff Roberson To: Matthew Dillon In-Reply-To: <200503101026.j2AAQYHa022048@apollo.backplane.com> Message-ID: <20050310062109.L20708@mail.chesapeake.net> References: <20050310034922.Y20708@mail.chesapeake.net> <200503101026.j2AAQYHa022048@apollo.backplane.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: arch@freebsd.org cc: pete@isilon.com Subject: Re: Cleaning up vgone. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2005 11:27:53 -0000 On Thu, 10 Mar 2005, Matthew Dillon wrote: > :I've run into a few vclean races and some related problems with VOP_CLOSE > :not using locks. I've made some fairly major changes to the way vfs > :handles vnode teardown in the process of fixing this. I'll summarize what > :I've done here.
> : > :The main problem with teardown was the two stage locking scheme involving > :the XLOCK. I got rid of the XLOCK and simply require the vnode lock > :throughout the whole operation. To accommodate this, VOP_INACTIVE, > :VOP_RECLAIM, VOP_CLOSE, and VOP_REVOKE all require the vnode lock. As > :does vgone(). > : > :Prior to this, vgone() would set XLOCK and then do a LK_DRAIN to make > :sure there were no callers waiting in VOP_LOCK so that they would always > :see the VI_XLOCK and know that the vnode had changed identities. Now, > :vgone sets XLOCK, and all lockers who use vget() and vn_lock() check for > :VI_DOOMED before and after acquiring the vnode lock. To wait for the > :transition to complete, you simply wait on the vnode lock. > : > :This really only required minor changes of the filesystems in the tree. > :Most only required the removal of a VOP_UNLOCK in VOP_INACTIVE, and a few > :acquired the lock in VOP_CLOSE to do operations which they otherwise could > :not. There is one change to ffs and coda which inspect v_data in their > :vop_lock routines. This is only safe with the interlock held, where > :before the XLOCK would have protected v_data in the one case that could > :lead to panic. > : > :The patch is available at http://www.chesapeake.net/~jroberson/vgone.diff > : > :Cheers, > :Jeff > > I did basically the same thing in DragonFly. I can only caution that > you have to be very, *very* careful to ensure that there are no races > between entities trying to ffs_vget() through the inode versus the > related vnode undergoing a vgone(). Be aware of the fact that there > is a major non-atomicity problem between the destruction of the vnode > and the clearing of the inode's bit in the bitmap and the destruction > of the inode in the inode hash table. I haven't looked at DragonFly, so I don't know why you had this problem. In FreeBSD 5, you will get ENOENT back from the vget() in ufs_ihashget() if the vnode is in the process of being torn down.
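The rule in the quoted post (check VI_DOOMED before and after acquiring the vnode lock) together with the advice earlier in this thread to hold a reference while waiting can be illustrated with a small single-threaded model of a vget()-style call that returns ENOENT when the vnode was torn down while the caller slept. This is a sketch under stated assumptions, not kernel code: `struct vn_model`, `toy_vget()`, and the callback that stands in for "another thread ran vgone() while we were asleep on the lock" are all hypothetical, and the real vget() has a different signature and real locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; not struct vnode. */
struct vn_model {
	int  refs;     /* references held by callers */
	bool locked;   /* the vnode lock */
	bool doomed;   /* VI_DOOMED stand-in: vnode has left its filesystem */
};

/* Simulates whatever the lock holder does while we sleep on the lock. */
typedef void (*vn_sleep_hook)(struct vn_model *);

static int
toy_vget(struct vn_model *vp, vn_sleep_hook while_sleeping)
{
	if (vp->doomed)
		return (ENOENT);        /* first check, before sleeping */
	vp->refs++;                     /* hold a reference across the sleep */
	if (while_sleeping != NULL)
		while_sleeping(vp);     /* someone else owns the lock meanwhile */
	assert(!vp->locked);
	vp->locked = true;              /* lock acquired */
	if (vp->doomed) {               /* second check, after acquiring it */
		vp->locked = false;
		vp->refs--;             /* let the dead vnode be recycled */
		return (ENOENT);
	}
	return (0);                     /* locked and referenced */
}

/* Model of vgone(): mark the vnode doomed while holding its lock. */
static void
toy_vgone_while_locked(struct vn_model *vp)
{
	vp->locked = true;
	vp->doomed = true;
	vp->locked = false;
}
```

In the model, a caller that loses the race gets ENOENT back with the lock released and its reference dropped, so it can go back and look the inode up again instead of using the old identity.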
Interlocking between the vnode interlock and the ihash_mtx ensures that the hash data is still valid when you attempt to acquire the lock. If it is invalidated afterwards, you are guaranteed to get ENOENT and eventually pick up the correct inode. > > In DragonFly there were races in the inode-reuse case where an inode > would be reused before it was entirely disconnected from the vnode. > This case can occur because the inode's bitmap bit is cleared before > the vnode is unlocked (and you have to do it that way; you can't clear > the inode's bitmap bit after the vnode has been unlocked without creating > yet more issues). > > This means that a totally unrelated process creating a file or directory > could allocate the SAME inode number out of the inode bitmap and call > FFS_VGET() *BEFORE* the original vnode controlling that inode number > finishes being reclaimed. In fact, the original inode is still pointing > at the vnode undergoing destruction and the vnode's v_data is still > pointing at the inode. The result: bad things happen. > > It got so hairy in DFly that I wound up reordering the code as follows:
>
>	(1) lock vnode
>	(2) free the inode in the bitmap (UFS_IFREE() equivalent)
>	(3) Set v_data to NULL
>	(4) Remove inode from the inode hash table (clear back pointer)
>	    [inode now fully disassociated]
>	(5) unlock vnode.
>
> *** Modify the bitmap allocation code for inodes to check whether the > candidate inode is still present in the inode hash and SKIP THAT > INODE if it is. (This is in the UFS code.)
>
> If you do not do that you wind up with a case where the filesystem cannot > differentiate between someone trying to lock the original vnode and > someone trying to vget() the vnode related to a newly created file whose > inode was just allocated out of the inode bitmap. > > The word 'nasty' doesn't even begin to describe the problem. There are > *THREE* races here: The vnode lock, the inode hash table, AND the > inode bitmap.
Due to disk I/O it is possible for the system to block > in between any of the related operations so you have to make sure that > races against all three points are handled properly. My solution was > to end-run around the points by leaving the inode hash table intact until > the very end. In DragonFly I don't call ufs_ihashrem() until the inode > has been completely divorced from the vnode and that gives me a way > to check for the race (by checking whether the inode is present in the > inode hash table or not). > > Another thing I did in DragonFly was to get rid of the crazy handling of > the vnode's reference count during the reclaim. I don't know what > FreeBSD-HEAD is doing now, but FreeBSD-4 dropped the ref count to 0 > and then did the crazy VXLOCK stuff. In DragonFly I changed that so > the ref count is left at 1 (not 0) during the reclaim. This required > fixing up a few cases that were checking the refcount against an absolute > 0 or 1 (I forget the cases), but it made the code a whole lot easier to > understand. 
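The DragonFly allocator change quoted above (skip any candidate inode that is still present in the inode hash) can be illustrated with a toy, single-threaded model of the reuse window. This is a sketch under stated assumptions, not UFS or DragonFly code: the fixed-size bitmap, the hash array indexed by inode number, and every `toy_*` name are hypothetical, and the allocator simply hands out the lowest free number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NINODES 8   /* hypothetical, tiny filesystem */

struct toy_vnode {
	void *v_data;               /* points at the inode while associated */
};

struct toy_inode {
	int               i_number;
	struct toy_vnode *i_vnode;  /* back pointer to the vnode */
};

struct toy_fs {
	bool              bitmap[NINODES]; /* true = allocated in the bitmap */
	struct toy_inode *hash[NINODES];   /* in-core inode hash by number */
};

/*
 * Allocate the lowest free inode number.  With skip_hashed set, a number
 * whose bitmap bit is clear but whose old in-core inode is still hashed
 * is passed over, because it is not fully disassociated yet.
 */
static int
toy_ialloc(struct toy_fs *fs, bool skip_hashed)
{
	for (int ino = 0; ino < NINODES; ino++) {
		if (fs->bitmap[ino])
			continue;
		if (skip_hashed && fs->hash[ino] != NULL)
			continue;
		fs->bitmap[ino] = true;
		return (ino);
	}
	return (-1);                /* nothing free */
}

/* Step (2): free the inode in the bitmap while the vnode is locked. */
static void
toy_ifree(struct toy_fs *fs, int ino)
{
	fs->bitmap[ino] = false;
}

/* Steps (3) and (4): clear v_data, then drop the back pointer and hash entry. */
static void
toy_disassociate(struct toy_fs *fs, struct toy_vnode *vp, struct toy_inode *ip)
{
	vp->v_data = NULL;
	ip->i_vnode = NULL;
	fs->hash[ip->i_number] = NULL;
}
```

Freeing inode 0 while its old in-core inode is still hashed lets the naive allocator hand out 0 again; with the check it returns 1 instead, and 0 only becomes available once the old inode has been removed from the hash.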
> > -Matt > Matthew Dillon > > From owner-freebsd-arch@FreeBSD.ORG Thu Mar 10 21:11:46 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9391E16A4CE for ; Thu, 10 Mar 2005 21:11:46 +0000 (GMT) Received: from critter.freebsd.dk (f170.freebsd.dk [212.242.86.170]) by mx1.FreeBSD.org (Postfix) with ESMTP id C10B843D2D for ; Thu, 10 Mar 2005 21:11:45 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.13.1/8.13.1) with ESMTP id j2ALBiPq005045 for ; Thu, 10 Mar 2005 22:11:44 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: arch@freebsd.org From: Poul-Henning Kamp Date: Thu, 10 Mar 2005 22:11:44 +0100 Message-ID: <5044.1110489104@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Subject: HEADSUP: linux dev_t emulation X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2005 21:11:46 -0000 Linux has only 8 bit major + 8 bit minor dev_t. We have 8 bit major + 24 bit minor dev_t. Obviously, a Linux dev_t cannot be guaranteed to represent a FreeBSD dev_t correctly. This used to be a bigger problem because disk partitions used the upper bit of the minor. Then it became less of a problem because GEOM numbered from the bottom at all times. Now it _may_ be even less of a problem because the FreeBSD dev_t is now bottom allocated all over. BUT! That still doesn't change the disparity in size, a fact we need to keep in mind. Any Linux program which depends on specific major/minor numbers will have to be dealt with in some magic manner; I have no idea which. I have diked out a lot of code in my development tree and it will need to get fixed properly before it can work in the new world order.
The main difference is that when looking up a cdev from a dev_t, a reference will be gained which must be dropped again. Whoever is working on Linux emulation should start looking at p4::phk_bufwork now. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Thu Mar 10 22:26:57 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 15BC416A4CE for ; Thu, 10 Mar 2005 22:26:57 +0000 (GMT) Received: from pentafluge.infradead.org (pentafluge.infradead.org [213.146.154.40]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8133443D4C for ; Thu, 10 Mar 2005 22:26:54 +0000 (GMT) SRS0+3dc48974f7cf3ae35923+564+infradead.org+hch@pentafluge.srs.infradead.org) Received: from hch by pentafluge.infradead.org with local (Exim 4.43 #1 (Red Hat Linux)) id 1D9W7I-0008HQ-UN; Thu, 10 Mar 2005 22:26:52 +0000 Date: Thu, 10 Mar 2005 22:26:52 +0000 From: Christoph Hellwig To: Poul-Henning Kamp Message-ID: <20050310222652.GA31757@infradead.org> References: <5044.1110489104@critter.freebsd.dk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5044.1110489104@critter.freebsd.dk> User-Agent: Mutt/1.4.1i X-SRS-Rewrite: SMTP reverse-path rewritten from by pentafluge.infradead.org See http://www.infradead.org/rpr.html cc: arch@freebsd.org Subject: Re: HEADSUP: linux dev_t emulation X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2005 22:26:57 -0000 On Thu, Mar 10, 2005 at 10:11:44PM +0100, Poul-Henning Kamp wrote: > > Linux has only 8 bit major + 8 bit minor dev_t. 
Not true anymore. Current Linux has 12bit major and 20bit minor. From owner-freebsd-arch@FreeBSD.ORG Fri Mar 11 02:31:08 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2A8E316A4CE for ; Fri, 11 Mar 2005 02:31:08 +0000 (GMT) Received: from salmon.maths.tcd.ie (salmon.maths.tcd.ie [134.226.81.11]) by mx1.FreeBSD.org (Postfix) with SMTP id 4A2D443D53 for ; Fri, 11 Mar 2005 02:31:07 +0000 (GMT) (envelope-from iedowse@maths.tcd.ie) Received: from walton.maths.tcd.ie by salmon.maths.tcd.ie with SMTP id ; 11 Mar 2005 02:31:06 +0000 (GMT) To: Jeff Roberson In-Reply-To: Your message of "Thu, 10 Mar 2005 06:21:04 EST." <20050310061419.Y20708@mail.chesapeake.net> Date: Fri, 11 Mar 2005 02:31:04 +0000 From: Ian Dowse Message-ID: <200503110231.aa37257@salmon.maths.tcd.ie> cc: arch@freebsd.org cc: pete@isilon.com Subject: Re: Cleaning up vgone. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Mar 2005 02:31:08 -0000 In message <20050310061419.Y20708@mail.chesapeake.net>, Jeff Roberson writes: [Good info about shared lock filesystems and VI_DOOMED snipped] >I'd much rather get rid of DOINGINACT and simply vhold the vnode so it >doesn't have a 0 ref count. It seems more natural to me to use a >reference than special case another flag. I like the idea of the patch >below, but it conflicts with the patch I just posted as I removed the >DOINGINACT check from VSHOULDFREE. Yes, I'm all for a cleaner way of doing this too. Many of the reasons for DOINGINACT will become unnecessary with your changes anyway, such as the code in nfs_inactive() that used to have to worry about the vnode going away while it did the sillyrename I/O. 
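Jeff's suggestion quoted above, holding a reference across the teardown instead of relying on a special flag such as DOINGINACT, can be sketched with a plain counter. The names are hypothetical and this is not FreeBSD's vhold()/vdrop(): the teardown takes its own reference first, so a step that releases the caller's reference cannot bring the count to zero midway, and the "last reference gone" path runs exactly once, at the end. The assert() turns an extra drop into an immediate failure at the point of the bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a vnode reference count; not the kernel's. */
struct rc_vnode {
	int  refcnt;             /* outstanding references */
	int  frees;              /* times the count reached zero */
	bool in_teardown;        /* set while the teardown steps run */
	bool freed_mid_teardown; /* records a premature zero */
};

static void
rc_hold(struct rc_vnode *vp)
{
	vp->refcnt++;
}

static void
rc_drop(struct rc_vnode *vp)
{
	/* One drop too many fails here instead of freeing twice. */
	assert(vp->refcnt > 0);
	if (--vp->refcnt == 0) {
		if (vp->in_teardown)
			vp->freed_mid_teardown = true;
		vp->frees++;
	}
}

/* An INACTIVE-like step that releases the caller's reference. */
static void
rc_inactive_drops_caller_ref(struct rc_vnode *vp)
{
	rc_drop(vp);
}

/*
 * Teardown holds its own reference so the count stays at least 1 while
 * the (hypothetical) inactive and reclaim steps run.
 */
static void
rc_teardown(struct rc_vnode *vp, void (*inactive)(struct rc_vnode *))
{
	rc_hold(vp);
	vp->in_teardown = true;
	if (inactive != NULL)
		inactive(vp);
	/* ... reclaim would run here with refcnt >= 1 ... */
	vp->in_teardown = false;
	rc_drop(vp);
}
```

With one caller reference and an inactive step that drops it, the count bottoms out at 1 during the teardown and reaches zero only on the teardown's own final drop.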
The only advantage I can think of for using a special flag instead of holding a reference in vgone() is the following: if code called by vgone() has a bug and drops one too many references, then the negative reference count will be caught sooner. It's worth at least making sure that we get a clean panic in this case since recursion caused by the reference count hitting zero twice can lead to messy panics that are difficult to debug. Ian From owner-freebsd-arch@FreeBSD.ORG Fri Mar 11 06:21:59 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3BF0416A4CE for ; Fri, 11 Mar 2005 06:21:59 +0000 (GMT) Received: from critter.freebsd.dk (f170.freebsd.dk [212.242.86.170]) by mx1.FreeBSD.org (Postfix) with ESMTP id 41AA843D5D for ; Fri, 11 Mar 2005 06:21:58 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.13.1/8.13.1) with ESMTP id j2B6LuNh007577; Fri, 11 Mar 2005 07:21:56 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Christoph Hellwig From: "Poul-Henning Kamp" In-Reply-To: Your message of "Thu, 10 Mar 2005 22:26:52 GMT." <20050310222652.GA31757@infradead.org> Date: Fri, 11 Mar 2005 07:21:56 +0100 Message-ID: <7576.1110522116@critter.freebsd.dk> Sender: phk@critter.freebsd.dk cc: arch@freebsd.org Subject: Re: HEADSUP: linux dev_t emulation X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Mar 2005 06:21:59 -0000 In message <20050310222652.GA31757@infradead.org>, Christoph Hellwig writes: >On Thu, Mar 10, 2005 at 10:11:44PM +0100, Poul-Henning Kamp wrote: >> >> Linux has only 8 bit major + 8 bit minor dev_t. > >Not true anymore. Current Linux has 12bit major and 20bit minor. 
Well, that doesn't change the situation: the linuxolator has a task to learn. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Fri Mar 11 13:29:40 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id DD8F316A4CE for ; Fri, 11 Mar 2005 13:29:40 +0000 (GMT) Received: from f30.mail.ru (f30.mail.ru [194.67.57.23]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8CE8143D48 for ; Fri, 11 Mar 2005 13:29:40 +0000 (GMT) (envelope-from _pppp@mail.ru) Received: from mail by f30.mail.ru with local id 1D9kCx-000BeX-00; Fri, 11 Mar 2005 16:29:39 +0300 Received: from [81.200.13.122] by win.mail.ru with HTTP; Fri, 11 Mar 2005 16:29:39 +0300 From: dima <_pppp@mail.ru> To: Poul-Henning Kamp Mime-Version: 1.0 X-Mailer: mPOP Web-Mail 2.19 X-Originating-IP: [81.200.13.122] Date: Fri, 11 Mar 2005 16:29:39 +0300 In-Reply-To: <7576.1110522116@critter.freebsd.dk> Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: 8bit Message-Id: cc: arch@freebsd.org Subject: Re[2]: HEADSUP: linux dev_t emulation X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: dima <_pppp@mail.ru> List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Mar 2005 13:29:41 -0000 > In message <20050310222652.GA31757@infradead.org>, Christoph Hellwig writes: >>On Thu, Mar 10, 2005 at 10:11:44PM +0100, Poul-Henning Kamp wrote: >>> >>> Linux has only 8 bit major + 8 bit minor dev_t. >> >>Not true anymore. Current Linux has 12bit major and 20bit minor. > > Well, that doesn't change the situation: the linuxolator has a task > to learn. A translation table probably? 
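To put numbers on the size mismatch discussed in this thread, the sketch below converts a FreeBSD dev_t into the two Linux encodings mentioned. It assumes the FreeBSD 5 `<sys/types.h>` layout (major in bits 8 to 15, the 24-bit minor spread over bits 0 to 7 and 16 to 31) and mirrors the bit layout of the Linux kernel's `new_encode_dev()` for the 12-bit major / 20-bit minor form. The function names are hypothetical, and a real linuxolator would still have to decide what to do when a value does not fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed FreeBSD 5 layout: build a dev_t from a linear 24-bit minor. */
static uint32_t
fbsd_makedev(uint32_t maj, uint32_t min)
{
	return (((maj & 0xffu) << 8) | (min & 0xffu) |
	    ((min & 0xffff00u) << 8));
}

/* Split it back into a major and a linear 24-bit minor. */
static void
fbsd_split(uint32_t dev, uint32_t *maj, uint32_t *min)
{
	*maj = (dev >> 8) & 0xffu;
	*min = (dev & 0xffu) | ((dev >> 16) << 8);
}

/* Old Linux dev_t: 8-bit major, 8-bit minor in 16 bits. */
static bool
linux_old_encode(uint32_t maj, uint32_t min, uint16_t *out)
{
	if (maj > 0xffu || min > 0xffu)
		return (false);         /* would silently lose bits */
	*out = (uint16_t)((maj << 8) | min);
	return (true);
}

/* Newer Linux encoding (same layout as new_encode_dev()): 12-bit major, 20-bit minor. */
static bool
linux_new_encode(uint32_t maj, uint32_t min, uint32_t *out)
{
	if (maj > 0xfffu || min > 0xfffffu)
		return (false);
	*out = (min & 0xffu) | (maj << 8) | ((min & ~0xffu) << 12);
	return (true);
}
```

A FreeBSD minor of 0x1234 cannot be expressed in the old 8:8 form but fits in the 12:20 form, while a full 24-bit minor such as 0x123456 fits in neither, which is why the numbers alone cannot carry the mapping.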
From owner-freebsd-arch@FreeBSD.ORG Fri Mar 11 16:12:50 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 20D3116A4CE for ; Fri, 11 Mar 2005 16:12:50 +0000 (GMT) Received: from critter.freebsd.dk (f170.freebsd.dk [212.242.86.170]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5D9FF43D49 for ; Fri, 11 Mar 2005 16:12:49 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.13.1/8.13.1) with ESMTP id j2BGClWx002280; Fri, 11 Mar 2005 17:12:47 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: dima <_pppp@mail.ru> From: "Poul-Henning Kamp" In-Reply-To: Your message of "Fri, 11 Mar 2005 16:29:39 +0300." Date: Fri, 11 Mar 2005 17:12:47 +0100 Message-ID: <2279.1110557567@critter.freebsd.dk> Sender: phk@critter.freebsd.dk cc: arch@freebsd.org Subject: Re: Re[2]: HEADSUP: linux dev_t emulation X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Mar 2005 16:12:50 -0000 In message , dima writes: >> In message <20050310222652.GA31757@infradead.org>, Christoph Hellwig writes: >>>On Thu, Mar 10, 2005 at 10:11:44PM +0100, Poul-Henning Kamp wrote: >>>> >>>> Linux has only 8 bit major + 8 bit minor dev_t. >>> >>>Not true anymore. Current Linux has 12bit major and 20bit minor. >> >> Well, that doesn't change the situation: the linuxolator has a task >> to learn. > >A translation table probably? If so then the table will have to key off the name in the cdevsw associated with the cdev since our major/minor numbers are no longer constant across reboots (just like hardware configurations are not). 
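A translation table along the lines described here would be keyed by the driver name from the cdevsw rather than by FreeBSD's major/minor numbers. The sketch below is hypothetical throughout: the struct, the lookup function and the driver names "null", "zero" and "sio" are assumptions for illustration. Only the Linux numbers are the conventional ones (1:3 for /dev/null, 1:5 for /dev/zero, 4:64+n for /dev/ttyS<n>).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry: FreeBSD driver name (+ unit) -> Linux major/minor. */
struct linux_devmap {
	const char *d_name;  /* name taken from the driver's cdevsw */
	int         unit;    /* exact unit to match, or -1 for any unit */
	unsigned    lmajor;  /* Linux major number */
	unsigned    lminor;  /* Linux minor, or base minor when unit is -1 */
};

/* Illustrative entries only; the driver names are assumptions. */
static const struct linux_devmap devmap[] = {
	{ "null",  0, 1,  3 },  /* Linux /dev/null is 1:3 */
	{ "zero",  0, 1,  5 },  /* Linux /dev/zero is 1:5 */
	{ "sio",  -1, 4, 64 },  /* Linux /dev/ttyS<n> is 4:(64+n) */
};

/* Look up by name and unit; FreeBSD's own numbers never enter the key. */
static bool
linux_devmap_lookup(const char *d_name, int unit, unsigned *maj, unsigned *min)
{
	for (size_t i = 0; i < sizeof(devmap) / sizeof(devmap[0]); i++) {
		const struct linux_devmap *e = &devmap[i];

		if (strcmp(e->d_name, d_name) != 0)
			continue;
		if (e->unit == -1 && unit >= 0) {
			*maj = e->lmajor;
			*min = e->lminor + (unsigned)unit;
			return (true);
		}
		if (e->unit == unit) {
			*maj = e->lmajor;
			*min = e->lminor;
			return (true);
		}
	}
	return (false);      /* no translation known */
}
```

Because the key is a name, the table keeps working even when FreeBSD assigns different numbers to the same device on the next boot.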
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. From owner-freebsd-arch@FreeBSD.ORG Sat Mar 12 00:33:43 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 92B1D16A4CE for ; Sat, 12 Mar 2005 00:33:43 +0000 (GMT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0FE7743D2D for ; Sat, 12 Mar 2005 00:33:43 +0000 (GMT) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.1/8.13.1) with ESMTP id j2C0XNun049904; Fri, 11 Mar 2005 16:33:27 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <200503120033.j2C0XNun049904@gw.catspoiler.org> Date: Fri, 11 Mar 2005 16:33:23 -0800 (PST) From: Don Lewis To: dillon@apollo.backplane.com In-Reply-To: <200503101026.j2AAQYHa022048@apollo.backplane.com> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii cc: pete@isilon.com cc: arch@FreeBSD.org Subject: Re: Cleaning up vgone. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Mar 2005 00:33:43 -0000 On 10 Mar, Matthew Dillon wrote: > Another thing I did in DragonFly was to get rid of the crazy handling of > the vnode's reference count during the reclaim. I don't know what > FreeBSD-HEAD is doing now, but FreeBSD-4 dropped the ref count to 0 > and then did the crazy VXLOCK stuff. In DragonFly I changed that so > the ref count is left at 1 (not 0) during the reclaim. 
This required > fixing up a few cases that were checking the refcount against an absolute > 0 or 1 (I forget the cases), but it made the code a whole lot easier to > understand. I suggested doing this a long time ago, but there seemed to be enough side effects that I didn't try to implement it. From owner-freebsd-arch@FreeBSD.ORG Sat Mar 12 21:08:36 2005 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 5E38C16A4CE for ; Sat, 12 Mar 2005 21:08:36 +0000 (GMT) Received: from kientzle.com (h-66-166-149-50.snvacaid.covad.net [66.166.149.50]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4FE7A43D41 for ; Sat, 12 Mar 2005 21:08:35 +0000 (GMT) (envelope-from kientzle@freebsd.org) Received: from freebsd.org (p54.kientzle.com [66.166.149.54]) by kientzle.com (8.12.9/8.12.9) with ESMTP id j2CL8ZOZ051937 for ; Sat, 12 Mar 2005 13:08:35 -0800 (PST) (envelope-from kientzle@freebsd.org) Message-ID: <42335A52.9060208@freebsd.org> Date: Sat, 12 Mar 2005 13:08:34 -0800 From: Tim Kientzle User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.4) Gecko/20031006 X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-arch@freebsd.org Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Subject: Removing gtar from base X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Mar 2005 21:08:36 -0000 My plan with bsdtar has been to have both bsdtar and gtar in 5.x, bsdtar only in 6.x. I'd like to remove gtar from -CURRENT as the next step in that transition. Proposed timeline:

* 1 week from today (March 19): Disable WITH_GTAR in -CURRENT.
* End of March: Disconnect gtar from build in -CURRENT.
* End of May: Remove gtar from -CURRENT.
Note:

* gtar will remain in 5.x tree indefinitely.
* WITH_GTAR will continue to be supported in 5.x.
* gtar will continue to be available in ports indefinitely.

If there are no objections, I'll start this process 1 week from today.

Tim Kientzle