From owner-freebsd-arch@FreeBSD.ORG Sun Mar 10 03:59:32 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 01E82FFC; Sun, 10 Mar 2013 03:59:31 +0000 (UTC) (envelope-from bright@mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id CF2E4A4C; Sun, 10 Mar 2013 03:59:31 +0000 (UTC) Received: from Alfreds-MacBook-Pro-9.local (c-67-180-208-218.hsd1.ca.comcast.net [67.180.208.218]) by elvis.mu.org (Postfix) with ESMTPSA id 67EFC1A3C25; Sat, 9 Mar 2013 19:59:22 -0800 (PST) Message-ID: <513C0515.4050603@mu.org> Date: Sat, 09 Mar 2013 19:59:17 -0800 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:17.0) Gecko/20130216 Thunderbird/17.0.3 MIME-Version: 1.0 To: Peter Grehan Subject: Re: [RFC] Moving bhyve to head References: <50ECFD6C.4000408@freebsd.org> In-Reply-To: <50ECFD6C.4000408@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Mar 2013 03:59:32 -0000 Peter, bhyve has been a huge boon and very successful for many of us. A few people have asked me if the code will be available in -stable soon. Is that possible? Can I help in any fashion? -Alfred On 1/8/13 9:17 PM, Peter Grehan wrote: > Neel and I would like to move bhyve development from the > projects/bhyve branch into CURRENT. This will allow the code > to reach a wider audience before 10, and provide us with better > feedback on what features should be prioritized. > > The intent of bhyve is to provide a small, extendible codebase > that allows FreeBSD users to easily run virtual machines. 
Currently, > bhyve supports running FreeBSD/amd64 guests on FreeBSD/amd64 > hosts with Intel VT-x and EPT CPU support. Additional guest operating > systems should be available in the near future, as will AMD-SVM CPU > support. > > bhyve is implemented as a kernel module and user-level utilities. Note > that it has zero impact on the system until the module is loaded. > > The raw diff against CURRENT can be viewed at > http://people.freebsd.org/~neel/bhyve/diff.txt > > (A sanitized diff, without the svn mergeinfo, is at: > http://people.freebsd.org/~neel/bhyve/diff_without_mergeinfo.txt > > A listing of modified and added files with annotations is at: > http://people.freebsd.org/~neel/bhyve/diff_filenames_only.txt) > > Info on bhyve and installation instructions can be found at > http://wiki.freebsd.org/BHyVe > http://bhyve.org > > Comments and review requested :) > > later, > > Peter & Neel. > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Sun Mar 10 05:33:38 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id A4678A49 for ; Sun, 10 Mar 2013 05:33:38 +0000 (UTC) (envelope-from grehan@freebsd.org) Received: from alto.onthenet.com.au (alto.OntheNet.com.au [203.13.68.12]) by mx1.freebsd.org (Postfix) with ESMTP id 4234ED87 for ; Sun, 10 Mar 2013 05:33:37 +0000 (UTC) Received: from dommail.onthenet.com.au (dommail.OntheNet.com.au [203.13.70.57]) by alto.onthenet.com.au (Postfix) with ESMTPS id A3ED511B82; Sun, 10 Mar 2013 15:33:36 +1000 (EST) Received: from Peter-Grehans-MacBook-Pro.local (c-67-190-11-104.hsd1.co.comcast.net [67.190.11.104]) by dommail.onthenet.com.au (MOS 4.2.4-GA) with ESMTP id BKN09690 (AUTH peterg@ptree32.com.au); Sun, 10 Mar 2013 
15:33:34 +1000 Message-ID: <513C1B2C.9080707@freebsd.org> Date: Sat, 09 Mar 2013 22:33:32 -0700 From: Peter Grehan User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.28) Gecko/20120306 Thunderbird/3.1.20 MIME-Version: 1.0 To: Alfred Perlstein Subject: Re: [RFC] Moving bhyve to head References: <50ECFD6C.4000408@freebsd.org> <513C0515.4050603@mu.org> In-Reply-To: <513C0515.4050603@mu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Mar 2013 05:33:38 -0000 Hi Alfred, > Peter, bhyve has been a huge boon and very successful for many of us. Great to hear! > A few people have asked me if the code will be available in -stable soon. The initial plan was to not do this - it seemed a lot of work in addition to the massive TODO list. Also, we weren't sure how stable the interfaces would be. But, I'm not going to stop anyone who is willing to do the backport :) > Is that possible? Can I help in any fashion? Absolutely possible. All early development was done on 8.1. I think most of the relevant host bits (FPU/AVX save/restore) have now been MFC'd, though there might be some more required. rpaulo@ is already doing some work on this - you may wish to coordinate with him. later, Peter. 
From owner-freebsd-arch@FreeBSD.ORG Sun Mar 10 08:22:09 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0CF582CB; Sun, 10 Mar 2013 08:22:09 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id 640A21F3; Sun, 10 Mar 2013 08:22:07 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r2A8LvSb028477 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 10 Mar 2013 19:21:59 +1100 Date: Sun, 10 Mar 2013 19:21:57 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: "Kenneth D. Merry" Subject: Re: patches to add new stat(2) file flags In-Reply-To: <20130308232155.GA47062@nargothrond.kdm.org> Message-ID: <20130310181127.D2309@besplex.bde.org> References: <20130307000533.GA38950@nargothrond.kdm.org> <20130307222553.P981@besplex.bde.org> <20130308232155.GA47062@nargothrond.kdm.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=bNdOu4CZ c=1 sm=1 a=n2O7wv11oSwA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=YOiZBDKP_E4A:10 a=7RUHTw8TR_SxG-0Q0ScA:9 a=CjuIK1q_8ugA:10 a=iA5AuRVOsPQzuK-W:21 a=yF7AlGMdZxl7LVJH:21 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: arch@FreeBSD.org, fs@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 10 Mar 2013 08:22:09 -0000 On Fri, 8 Mar 2013, Kenneth D. Merry wrote: > On Fri, Mar 08, 2013 at 00:37:15 +1100, Bruce Evans wrote: >> On Wed, 6 Mar 2013, Kenneth D. 
Merry wrote: >> >>> I have attached diffs against head for some additional stat(2) file flags. >>> >>> The primary purpose of these flags is to improve compatibility with CIFS, >>> both from the client and the server side. >>> ... >> >> I missed looking at the diffs in my previous reply. >> >> % --- //depot/users/kenm/FreeBSD-test3/bin/chflags/chflags.1 2013-03-04 >> 17:51:12.000000000 -0700 >> % +++ /usr/home/kenm/perforce4/kenm/FreeBSD-test3/bin/chflags/chflags.1 >> 2013-03-04 17:51:12.000000000 -0700 >> % --- /tmp/tmp.49594.86 2013-03-06 16:42:43.000000000 -0700 >> % +++ /usr/home/kenm/perforce4/kenm/FreeBSD-test3/bin/chflags/chflags.1 >> 2013-03-06 14:47:25.987128763 -0700 >> % @@ -117,6 +117,16 @@ >> % set the user immutable flag (owner or super-user only) >> % .It Cm uunlnk , uunlink >> % set the user undeletable flag (owner or super-user only) >> % +.It Cm system , usystem >> % +set the Windows system flag (owner or super-user only) >> >> This begins unsorting of the list. > > Fixed. > >> It's not just a Windows flag, since it also works in DOS. > > Fixed. Thanks. Hopefully all the simple bugs are fixed now. >> "Owner or" is too strict for msdosfs, since files can only have a >> single owner, so controlling access using groups is needed. I >> use owner root and group msdosfs for msdosfs mounts. This works for >> normal operations like open/read/write, but fails for most attributes >> including file flags. msdosfs doesn't support many attributes but >> this change is supposed to add support for 3 new file flags so it would >> be good if it didn't restrict the support to root. > > I wasn't trying to change the existing security model for msdosfs, but if > you've got a suggested patch to fix it I can add that in. I can't think of anything better than making group write permission enough for attributes. msdosfs also has some style bugs in this area. 
It uses VOP_ACCESS() with VADMIN for the non-VA_UTIMES_NULL case of utimes(), but for all other attributes it hard-codes a direct uid check followed by a priv_check_cred() with PRIV_VFS_ADMIN. VADMIN requires even more than user write permission for POSIX file systems and using it unchanged for all the attributes would be even more restrictive unless we changed it, but it would be easier to make it uniformly less restrictive for msdosfs by using it consistently. Oops, that was in the old version of ffs. ffs now has related complications and unnecessary style bugs (verboseness and misformatting) to support ACLs. It now uses VOP_ACCESSX() with VWRITE_ATTRIBUTES for utimes(), and VOP_ACCESSX() with other VFOO for all attributes except flags. It still uses VOP_ACCESS() with VADMIN for flags. >> ... >> % .It Dv SF_ARCHIVED >> ... >> % +Filesystems in FreeBSD may or may not have special handling for this >> flag. >> % +For instance, ZFS tracks changes to files and will clear this bit when a >> % +file is updated. >> % +UFS only stores the flag, and relies on the application to change it when >> % +needed. >> >> I think that is useless, since changing it is needed whenever the file >> changes, and applications can do that (short of running as daemons and >> watching for changes). > > Do you mean applications can't do that or can? Oops, can't. It is still hard for users to know what their file system supports. Even programmers don't know that it is backwards :-). 
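For readers following the "backwards" sense discussed here, a minimal userland sketch of the inversion may help. It is not the msdosfs code. The helper names dos_attr_to_flags() and flags_to_dos_attr() are hypothetical, and the two constants are defined locally only so that the example compiles on its own; the authoritative definitions are in <sys/stat.h> and the msdosfs headers.

```c
#include <stdint.h>

/*
 * Local stand-ins so the sketch builds outside the kernel.
 * Check the real headers before relying on these values.
 */
#define SF_ARCHIVED	0x00010000u	/* BSD: file HAS been archived */
#define ATTR_ARCHIVE	0x20u		/* DOS: file NEEDS to be archived */

/* Hypothetical helper: derive the BSD flag from a DOS attribute byte. */
static uint32_t
dos_attr_to_flags(uint8_t attr)
{
	uint32_t flags = 0;

	/* Inverted sense: ARCHIVE clear means "already archived". */
	if ((attr & ATTR_ARCHIVE) == 0)
		flags |= SF_ARCHIVED;
	return (flags);
}

/* Hypothetical helper: apply the BSD flag to a DOS attribute byte. */
static uint8_t
flags_to_dos_attr(uint8_t attr, uint32_t flags)
{
	if (flags & SF_ARCHIVED)
		attr &= (uint8_t)~ATTR_ARCHIVE;	/* archived: clear the DOS bit */
	else
		attr |= ATTR_ARCHIVE;		/* not archived: set the DOS bit */
	return (attr);
}
```

A UF_ARCHIVE flag "with the correct sense", as suggested later in this message, would make both helpers a direct bit copy with no inversion.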
>> % --- //depot/users/kenm/FreeBSD-test3/sys/fs/msdosfs/msdosfs_vnops.c >> 2013-03-04 17:51:12.000000000 -0700 >> % +++ >> /usr/home/kenm/perforce4/kenm/FreeBSD-test3/sys/fs/msdosfs/msdosfs_vnops.c >> 2013-03-04 17:51:12.000000000 -0700 >> % --- /tmp/tmp.49594.370 2013-03-06 16:42:43.000000000 -0700 >> % +++ >> /usr/home/kenm/perforce4/kenm/FreeBSD-test3/sys/fs/msdosfs/msdosfs_vnops.c >> 2013-03-06 14:49:47.179130318 -0700 >> % @@ -345,8 +345,17 @@ >> % vap->va_birthtime.tv_nsec = 0; >> % } >> % vap->va_flags = 0; >> % + /* >> % + * The DOS Archive attribute means that a file needs to be >> % + * archived. The BSD SF_ARCHIVED attribute means that a file has >> % + * been archived. Thus the inversion here. >> % + */ >> >> No need to document it again. It goes without saying that ARCHIVE >> != ARCHIVED. > > I disagree. It wasn't immediately obvious to me that SF_ARCHIVED was > generally used as the inverse of the DOS Archived bit until I started > digging into this. If this helps anyone figure that out more quickly, it's > useful. The surprising thing is that it is backwards in FreeBSD and not really supported except in msdosfs. Now several file systems have the comment about it being inverted, but man pages still don't. >> % @@ -420,12 +429,21 @@ >> % if (error) >> % return (error); >> % } >> >> The permissions check before this is delicate and was broken and is >> more broken now. It is still short-circuit to handle setting the >> single flag that used to be supported, and is slightly broken for that: >> - unprivileged user asks to set ARCHIVE by passing !SF_ARCHIVED. We >> allow that, although this may toggle the flag and normal semantics >> for SF flags is to not allow toggling. >> - unprivileged user asks to clear ARCHIVE by passing SF_ARCHIVED. We >> don't allow that. But we should allow preserving ARCHIVE if it is >> already clear. >> The bug wasn't very important when only 1 flag was supported. 
Now it >> prevents unprivileged users managing the new UF flags if ARCHIVE is >> clear. Fortunately, this is the unusual case. Anyway, unprivileged >> users can set ARCHIVE by doing some other operation. Even the chflags() >> operation should set ARCHIVE and thus allow further chflags()'s that now >> keep ARCHIVE set. Except it is very confusing if a chflags() asks for >> ARCHIVE to be clear. This request might be just to try to preserve >> the current setting and not want it if other things are changed, or >> it might be to purposely clear it. Changing it from set to clear should >> still be privileged. > > I changed it to allow setting or clearing SF_ARCHIVED. Now I can set or > clear the flag as non-root: Actually, it seems OK, since there are no old or new SF_ immutable flags. Some of the actions are broken in the old and new code for directories -- see below. >> See the more complicated permissions check in ffs. It would be safest >> to duplicate most of it, to get different permissions checking for the >> SF and UF flags. Then decide if we want to keep allowing setting >> ARCHIVE without privilege. > > I think we should allow getting and setting SF_ARCHIVED without special > privileges. Given how it is generally used, I don't think it should be > restricted to the super-user. I don't really like that since changing the flags is mainly needed for the fairly privileged operation of managing other OS's file systems. However, since we're mapping the SYSTEM flag to a UF_ flag, the SYSTEM flag will require less privilege than the ARCHIVE flag. This is backwards, so we might as well require less privilege for ARCHIVE too. I think we, that is, you should use a new UF_ARCHIVE flag with the correct sense. > Can you provide some code demonstrating how the permissions code should > be changed in msdosfs? I don't know that much about that sort of thing, > so I'll probably spend an inordinate amount of time stumbling > through it. Now I think only cleanups are needed. 
>> % return EOPNOTSUPP; >> % if (vap->va_flags & SF_ARCHIVED) >> % dep->de_Attributes &= ~ATTR_ARCHIVE; >> % else if (!(dep->de_Attributes & ATTR_DIRECTORY)) >> % dep->de_Attributes |= ATTR_ARCHIVE; >> >> The comment before this says that we ignore attempts to set ATTR_ARCHIVE >> for directories. However, it is out of date. WinXP allows setting it >> and all the new flags for directories, and so do we. > > Do you mean we allow setting it in UFS, or where? Obviously the code above > won't set it on a directory. I meant it here. Actually, the comment matches the code -- I somehow missed the test in the code. However, the code is wrong. All directories except the root directory have this and other attributes, but FreeBSD refuses to set them. More below. >> The WinXP attrib command (at least on a FAT32 fs) doesn't allow setting >> or clearing ARCHIVE (even if it is already set or clear) if any of >> HIDDEN, READONLY or SYSTEM is already set and remains set after the >> command. Thus the HRS attributes act a bit like immutable flags, but >> subtly differently. (ffs has the quite different and worse behaviour >> of allowing chflags() irrespective of immutable flags being set before >> or after, provided there is enough privilege to change the immutable >> flags.) Anyway, they should all give some aspects of immutability. > > We could do that for msdosfs, but why add more things for the user to trip > over given how the filesystem is typically used? Most people probably > use it for USB thumb drives these days. Or perhaps on a dual boot system > to access their Windows partition. The small data drives won't have many files with attributes (except ARCHIVE). For multiple-boot, I think the permissions shouldn't be too much different than the foreign OS's. 
I used not to worry about this and liked deleting WinXP files without asking it, but recently I spent a lot of time recovering a WinXP ntfs partition and changed a bit too much using FreeBSD and Cygwin because I didn't understand the permissions (especially ACLs). ntfs in FreeBSD was less than r/o so it couldn't even back up the permissions (for file flags, it returned the garbage in its internal inode flags without translation...). > *** src/bin/chflags/chflags.1.orig > --- src/bin/chflags/chflags.1 > *************** > *** 101,120 **** > .Bl -tag -offset indent -width ".Cm opaque" > .It Cm arch , archived > set the archived flag (super-user only) > .It Cm opaque > set the opaque flag (owner or super-user only) > - .It Cm nodump > - set the nodump flag (owner or super-user only) > .It Cm sappnd , sappend The opaque flag is UF_ too. > + .It Cm snapshot > + set the snapshot flag (most filesystems do not allow changing this flag) I think none do. It can only be displayed. chflags(1) doesn't display flags, so this shouldn't be here. The problem is that this man page is the only place where the flag names are documented. ls(1) and strtofflags(3) just point to here. strtofflags(3) says that the flag names are documented here, but ls(1) just has an Xref to here. > *** src/lib/libc/sys/chflags.2.orig > --- src/lib/libc/sys/chflags.2 > --- 71,127 ---- > the following values > .Pp > .Bl -tag -width ".Dv SF_IMMUTABLE" -compact -offset indent > ! .It Dv SF_APPEND > The file may only be appended to. > .It Dv SF_ARCHIVED > ! The file has been archived. > ! This flag means the opposite of the Windows and CIFS FILE_ATTRIBUTE_ARCHIVE DOS, Windows and CIFS... > ! attribute. > ! That attribute means that the file should be archived, whereas > ! .Dv SF_ARCHIVED > ! means that the file has been archived. > ! Filesystems in FreeBSD may or may not have special handling for this flag. > ! For instance, ZFS tracks changes to files and will clear this bit when a > ! file is updated. 
Does zfs clear it in other circumstances? WinXP doesn't for msdosfs (or ntfs?), but FreeBSD clears it when changing some attributes, even for null changes (these are: times except for atimes, and the HIDDEN attribute when it is changed by chmod() -- even for null changes --, but not for the HIDDEN attribute when it is changed (or preserved) by chflags() in your new code). I want it to be cleared for metadata so that backup utilities can trust the ARCHIVE flag for metadata changes. > + .It Dv UF_IMMUTABLE > + The file may not be changed. > + Filesystems may use this flag to maintain compatibility with the Windows and > + CIFS FILE_ATTRIBUTE_READONLY attribute. So READONLY is only mapped to UF_IMMUTABLE if it gives immutability? > *** src/sys/fs/msdosfs/msdosfs_vnops.c.orig > --- src/sys/fs/msdosfs/msdosfs_vnops.c > *************** > *** 415,431 **** > * set ATTR_ARCHIVE for directories `cp -pr' from a more > * sensible filesystem attempts it a lot. > */ > ! if (vap->va_flags & SF_SETTABLE) { > error = priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0); > if (error) > return (error); > } > ! if (vap->va_flags & ~SF_ARCHIVED) > return EOPNOTSUPP; > if (vap->va_flags & SF_ARCHIVED) > dep->de_Attributes &= ~ATTR_ARCHIVE; > else if (!(dep->de_Attributes & ATTR_DIRECTORY)) > dep->de_Attributes |= ATTR_ARCHIVE; > dep->de_flag |= DE_MODIFIED; > } > > --- 424,448 ---- > * set ATTR_ARCHIVE for directories `cp -pr' from a more > * sensible filesystem attempts it a lot. > */ > ! if (vap->va_flags & (SF_SETTABLE & ~(SF_ARCHIVED))) { Excessive parentheses. > error = priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0); > if (error) > return (error); > } VADMIN is still needed, and that is too strict. This is a general problem and should be fixed separately. > ! 
if (vap->va_flags & ~(SF_ARCHIVED | UF_HIDDEN | UF_SYSTEM)) > return EOPNOTSUPP; > if (vap->va_flags & SF_ARCHIVED) > dep->de_Attributes &= ~ATTR_ARCHIVE; > else if (!(dep->de_Attributes & ATTR_DIRECTORY)) > dep->de_Attributes |= ATTR_ARCHIVE; > + if (vap->va_flags & UF_HIDDEN) > + dep->de_Attributes |= ATTR_HIDDEN; > + else > + dep->de_Attributes &= ~ATTR_HIDDEN; > + if (vap->va_flags & UF_SYSTEM) > + dep->de_Attributes |= ATTR_SYSTEM; > + else > + dep->de_Attributes &= ~ATTR_SYSTEM; > dep->de_flag |= DE_MODIFIED; > } Technical old and new problems with msdosfs: - all directories except the root directory support the 3 attributes handled above, and READONLY - the special case for the root directory is because before FAT32, the root directory didn't have an entry for itself (and was otherwise special). With FAT32, the root directory is not so special, but still doesn't have an entry for itself. - thus the old code in the above is wrong for all directories except the root directory - thus the new code in the above is wrong for the root directory. It will make changes to the in-core denode. These can be seen by stat() for a while, but go away when the vnode is recycled. - other code is wrong for directories too. deupdat() refuses to convert from the in-core denode to the disk directory entry for directories. So even when the above changes values for directories, the changes only get synced to the disk accidentally when there is a large change (such as for extending the directory), to the directory entry. - being the root directory is best tested for using VV_ROOT. I use the following to fix the corresponding bugs in utimes(): /* Was: silently ignore the non-error or error for all dirs. */ if (DETOV(dep)->v_vflag & VV_ROOT) return (EINVAL); /* Otherwise valid. */ deupdat() needs a similar change to not ignore all directories. 
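The points in the list above can be put together in a small userland model. This is only a sketch under stated assumptions, not a patch: struct model_denode and model_setflags() are hypothetical names, the constants are local stand-ins for the kernel definitions, the privilege checks are left out, and returning EINVAL for the root directory is carried over from the utimes() fix quoted above rather than taken from any committed code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; the real definitions are in the kernel headers. */
#define SF_ARCHIVED	0x00010000u
#define UF_HIDDEN	0x00008000u
#define UF_SYSTEM	0x00000080u
#define ATTR_HIDDEN	0x02u
#define ATTR_SYSTEM	0x04u
#define ATTR_ARCHIVE	0x20u

/* Hypothetical, much reduced model of an in-core denode. */
struct model_denode {
	uint8_t	attributes;	/* plays the role of de_Attributes */
	bool	is_root;	/* plays the role of the VV_ROOT test */
	bool	modified;	/* plays the role of DE_MODIFIED */
};

/* Returns 0 on success or an errno value, kernel style. */
static int
model_setflags(struct model_denode *dep, uint32_t flags)
{
	/* Assumption: the root directory has no entry, so refuse it. */
	if (dep->is_root)
		return (EINVAL);
	if (flags & ~(SF_ARCHIVED | UF_HIDDEN | UF_SYSTEM))
		return (EOPNOTSUPP);

	/* SF_ARCHIVED has the opposite sense of ATTR_ARCHIVE. */
	if (flags & SF_ARCHIVED)
		dep->attributes &= (uint8_t)~ATTR_ARCHIVE;
	else
		dep->attributes |= ATTR_ARCHIVE;	/* directories too */

	if (flags & UF_HIDDEN)
		dep->attributes |= ATTR_HIDDEN;
	else
		dep->attributes &= (uint8_t)~ATTR_HIDDEN;

	if (flags & UF_SYSTEM)
		dep->attributes |= ATTR_SYSTEM;
	else
		dep->attributes &= (uint8_t)~ATTR_SYSTEM;

	dep->modified = true;
	return (0);
}
```

The model treats non-root directories like regular files, which only helps in the real code if deupdat() also stops skipping directories, as noted above.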
Bruce From owner-freebsd-arch@FreeBSD.ORG Mon Mar 11 09:18:57 2013 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DC92E915; Mon, 11 Mar 2013 09:18:57 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 45E76F98; Mon, 11 Mar 2013 09:18:57 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r2B9IrE6008284; Mon, 11 Mar 2013 11:18:53 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.0 kib.kiev.ua r2B9IrE6008284 Received: (from kostik@localhost) by tom.home (8.14.6/8.14.6/Submit) id r2B9IqNH008283; Mon, 11 Mar 2013 11:18:52 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 11 Mar 2013 11:18:52 +0200 From: Konstantin Belousov To: arch@freebsd.org Subject: Unmapped buffers: to be merged in several days Message-ID: <20130311091852.GR3794@kib.kiev.ua> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="25AjPoXvrcQXrjmi" Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Mar 2013 09:18:57 -0000 --25AjPoXvrcQXrjmi Content-Type: text/plain; charset=us-ascii Content-Disposition: inline The latest version of the unmapped buffers patch is available 
at http://people.freebsd.org/~kib/misc/unmapped.17.patch The patch makes user data buffers and page-ins for UFS, swap-in/out, and clustering use unmapped buffers, removing the TLB shootdown overhead and the buffer map contention and fragmentation. The ahci(4) and md(4) drivers are converted to accept unmapped BIO requests. Other drivers and geom classes get compat mapped BIOs; the transient mapping is established by the geom down thread. The KVA for the transient mapping is carved from the buffer map, up to 10% of which is repurposed as transient bio KVA. The hope is that the rest of the drivers and geom classes will be converted to accept unmapped i/o shortly, making the transient map unused. The patch was tested by Peter Holm using the whole stress2 suite, on both i386 and amd64, on ahci(4) and ad(4) attached disks. ad(4) uses the transient remapping for unmapped requests, so the testing should cover both the new and the old i/o paths. The previous version of the patch is already used on some high-load machines by Scott Long, on ahci(4), isci(4) and mps(4). Brendan Fabeny did useful testing in his environment. The biggest change compared to the previous mail is the prevention of deadlocks caused by bugs in the bufspace limit code. In HEAD, bufspace is equal to the size of the buffer map, which, due to buffer map fragmentation, effectively makes the code that limits the total space allocated to buffers by maxbufspace a nop. In the patch, filesystem metadata is no longer subject to the maxbufspace limit. Since metadata buffers are always mapped, they still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on them. The non-metadata buffer allocations, both mapped and unmapped, are accounted against maxbufspace, as before. Effectively, this means that maxbufspace is enforced on mapped and unmapped buffers separately. 
I intend to commit the change as is, with the following modifications: - the pmap_copy_pages() will be a stub for all architectures where it was not tested. The only tested arches are i386, amd64 and powerpc64. - For all architectures where pmap_copy_pages() is a stub, the GB_UNMAPPED flag for the buffer allocators will be nop. FYI. --25AjPoXvrcQXrjmi Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQIcBAEBAgAGBQJRPaF8AAoJEJDCuSvBvK1B25gP/1Nimbk1v61oYGHKm7n3B8Qb AOXO0yNlOb0ZuAE4ZFVzpBWbwCf/AOtEpGsoeFlyBm1Y4J0mYhZgKZUH1rYTxGZ9 qM8FMg09LU+z6n72MIaUxDCrkFCfM+kUuXi2vvLtqCQIEMsUkiiD8s3wGI1yhE6e wKrZygfmPpnOAsu9v8vGTdOWXgE045JIZwc+mf2eZuIIoNswcVAAgVEkrbpB7fKO 1PUfqdHl9BBzLms/o/zh+j9jdTGXRNToXTi6uDUZBNZHv00DsmhhQ11HuhckNL2Y vFSG0E416sXmOHTNMj+pp0gtP4qxFE3lvaUw9+WbgMDzOUJaWmG3iGyv2EF0TNcU V9czQx7GBiAtBkfylldOzaKLU0+piUh5QKlx/ONAa0J7CCO22jbzCPYOPl8Ssofc gL5b8ArzZkAH5TKe5lYdAwsEEH2o5wCdvt8KMmX1zz+8TvNbap+7QtirttsgNddA NFqn5qVo4WIe64R+QXSqbUByJcNSy/XibsnGTGu4zlpspQEbjWC0S4Z7vLNnMcWj P5CZR62nvWpXwwjJ/D2KSYnS6c3x5NqDv0ynb0/rz0JRYPrKZL/OFYYippGt48YB MdYeqSdJU4p1UkdCs4ZSj98Ovfie74nZ+l30racq2dTO0oDEG8zhavfwBvJ+jcbw dvDyhsuBzYcoQ+fvg98x =IS+v -----END PGP SIGNATURE----- --25AjPoXvrcQXrjmi-- From owner-freebsd-arch@FreeBSD.ORG Mon Mar 11 09:59:39 2013 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 21A06F5C for ; Mon, 11 Mar 2013 09:59:39 +0000 (UTC) (envelope-from des@des.no) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id DC63020A for ; Mon, 11 Mar 2013 09:59:38 +0000 (UTC) Received: from ds4.des.no (smtp.des.no [194.63.250.102]) by smtp-int.des.no (Postfix) with ESMTP id 08C75BA8B; Mon, 11 Mar 2013 09:59:37 +0000 (UTC) Received: by ds4.des.no (Postfix, from userid 1001) id C97FD94C4; Mon, 11 Mar 2013 10:59:36 +0100 (CET) From: 
Dag-Erling Smørgrav To: Konstantin Belousov Subject: Re: Unmapped buffers: to be merged in several days References: <20130311091852.GR3794@kib.kiev.ua> Date: Mon, 11 Mar 2013 10:59:36 +0100 In-Reply-To: <20130311091852.GR3794@kib.kiev.ua> (Konstantin Belousov's message of "Mon, 11 Mar 2013 11:18:52 +0200") Message-ID: <86k3pe1cl3.fsf@ds4.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 11 Mar 2013 09:59:39 -0000 Konstantin Belousov writes: > The ahci(4) and md(4) drivers are converted to accept unmapped BIO requests. > Other drivers and geom classes get the compat mapped BIOs, the > transient mapping is established by the geom down thread. The KVA > for the transient mapping is carved from the buffer map, up to 10% > of which is repurposed to the transient bio KVA. The hope is that > the rest of drivers and geom classes will be converted to accept > unmapped i/o shortly, making the transient map unused. Could you briefly summarize what needs to be done to convert a driver? 
DES -- Dag-Erling Smørgrav - des@des.no From owner-freebsd-arch@FreeBSD.ORG Mon Mar 11 18:25:02 2013 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 44EEFA92 for ; Mon, 11 Mar 2013 18:25:02 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 97D3675F for ; Mon, 11 Mar 2013 18:25:01 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r2BIOtAi004101; Mon, 11 Mar 2013 20:24:55 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.8.0 kib.kiev.ua r2BIOtAi004101 Received: (from kostik@localhost) by tom.home (8.14.6/8.14.6/Submit) id r2BIOsc8004100; Mon, 11 Mar 2013 20:24:54 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 11 Mar 2013 20:24:54 +0200 From: Konstantin Belousov To: Dag-Erling Smørgrav Subject: Re: Unmapped buffers: to be merged in several days Message-ID: <20130311182454.GX3794@kib.kiev.ua> References: <20130311091852.GR3794@kib.kiev.ua> <86k3pe1cl3.fsf@ds4.des.no> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="oMQ/0Eds4GFUZehv" Content-Disposition: inline In-Reply-To: <86k3pe1cl3.fsf@ds4.des.no> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Mon, 11 Mar 2013 18:25:02 -0000 --oMQ/0Eds4GFUZehv Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Mar 11, 2013 at 10:59:36AM +0100, Dag-Erling Smørgrav wrote: > Konstantin Belousov writes: > > The ahci(4) and md(4) drivers are converted to accept unmapped BIO requests. > > Other drivers and geom classes get the compat mapped BIOs, the > > transient mapping is established by the geom down thread. The KVA > > for the transient mapping is carved from the buffer map, up to 10% > > of which is repurposed to the transient bio KVA. The hope is that > > the rest of drivers and geom classes will be converted to accept > > unmapped i/o shortly, making the transient map unused. > > Could you briefly summarize what needs to be done to convert a driver? There are different kinds of drivers; each kind requires its own approach, and each has had different groundwork done to ease the transition. For the geom classes, if the provider does not require access to the bio data, the conversion is as easy as setting G_PF_ACCEPT_UNMAPPED in the flags. For an example, look at sys/geom/part/g_part.c, which marks all gpart providers as accepting unmapped bios. If the class does need to access the bio data, it has to be rewritten before it can be marked as processing unmapped bios. The class should now check whether the bio passed is mapped or not, and process the pages passed in the bio_ma array instead of bio_data. A more involved example is sys/dev/md/md.c. For the disk(9) drivers, the flag to set is DISKFLAG_UNMAPPED_BIO. Again, the driver should be inspected before the flag is set. For the CAM-backed HBAs, there is support provided to make the conversion trivial. 
If the hardware uses dma to perform the i/o and the driver uses busdma(9) to prepare the dma-able buffers for the dma engine, all the driver author needs is to ensure that bus_dmamap_load_ccb(9) function is used to handle the buffer load, and set the PIM_UNMAPPED flag. Most of the drivers were converted already to use load_ccb() during the Jeff' physbio work. For an examples of the trivial conversion, look at the sys/dev/ahci/ahci.c and sys/dev/siis/siis.c. I was conservative on setting the PIM_UNMAPPED for the untested drivers, but the only work that is needed is testing. Quite non-trivial driver to convert is ad(4), either CAMified or not. The issue is that even the drive that uses DMA could fall back to the PIO mode at runtime, requiring driver access to the potentially unmapped data. There are some preparations done for providing easy to use transient mapping KPI for such drivers, but it is not in the state where it could be usefully applied to the task, yet. --oMQ/0Eds4GFUZehv Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQIcBAEBAgAGBQJRPiF2AAoJEJDCuSvBvK1B4LcP/1NjEO+XgYIHaTwAHcJvDr9J 3RZZtno75xpp0W/GeGynwT0S6XS/+NM+5YM2AShwzlzf41pk4aQ1Kq2iWMBmsl8u xyfGBSj0LwbqZpioTLwc3p77xdqH1esnrnDebTThuMy2FFtgxJWpARKALceyLcji WYdZPw0Rl0Oo03yB6qButBYLOO6kTG0V0kgJmUx83xMMSLD/0vhk0OJuL+iMh33h 5IA7KUOilJcQFxqM82E0Q08kBsLfxt+7AqhcfGa0B6Y5ihZUWReLGf6h0/Ayv/MZ Bd3MjNm8auZx6Ou4AGNDiZfIOee9cwjH3ODt2Pc2ztJrKOnNiUzdAwYW4pHcyxQH tuY0NVoxSDxfxOMMdZjYs3kxJ99q3BcAIB4z7Fqpz0k2mzbhdJ8DdPW5pE1xqLK+ a1niidq49okJHcyZpiCY9RxpJaOi2fjY6coXH0Fdof3MXlc7+hn18Ldf/d3GqGkW I9Ux+VtMEWathZOSuj0q8HtQnZI70ffwq/Ik6AeOp1ivuusI9tSmoHLx9hQujoMm iKUY8ksR9jYDhcoEcEmq/LONowcRtnhcPJQ47LSTgQJpatdVHMktchoeJTj5nNYs RUodo50TuLaSIpjI5tWchqP1/0nGuO57xL3oC+7hdZOSvNovGq7ks5Nn+167/x4Y d0PHAyw8N/Tsy/tyPdCK =p/xK -----END PGP SIGNATURE----- --oMQ/0Eds4GFUZehv-- From owner-freebsd-arch@FreeBSD.ORG Mon Mar 11 21:04:28 2013 Return-Path: Delivered-To: arch@freebsd.org 
Date: Tue, 12 Mar 2013 01:04:25 +0400
From: Lev Serebryakov
To: Konstantin Belousov
Cc: Dag-Erling Smørgrav, arch@freebsd.org
Subject: Re: Unmapped buffers: to be merged in several days
Message-ID: <329178079.20130312010425@serebryakov.spb.ru>
In-Reply-To: <20130311182454.GX3794@kib.kiev.ua>

Hello, Konstantin.
You wrote on 11 March 2013, 22:24:54:

KB> If the class does need to access the bio data, then to be marked as
KB> unmapped-processing, the class should be rewritten. Now, the class
KB> should verify whether the bio passed is mapped or not, and process the
KB> pages passed in the bio_ma array instead of bio_data. The involved
KB> example is sys/dev/md/md.c.
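As an aside, the mapped-versus-unmapped check described in the quoted
paragraph can be sketched in plain userspace C. This is only a model under
stated assumptions, not kernel code: struct model_bio, MODEL_BIO_UNMAPPED,
MODEL_PAGE_SIZE and model_bio_sum() are hypothetical names, only bio_data
and bio_ma come from the thread, and the "pages" here are ordinary byte
arrays, whereas a real class receives VM pages that it has to make
addressable itself (sys/dev/md/md.c is the real example).

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE    4096u  /* assumed page size for this model */
#define MODEL_BIO_UNMAPPED 0x01u  /* hypothetical "request is unmapped" marker */

/* Userspace stand-in for a bio; field names loosely follow the thread. */
struct model_bio {
	unsigned  flags;      /* MODEL_BIO_UNMAPPED or 0 */
	uint8_t  *data;       /* plays the role of bio_data; valid only if mapped */
	uint8_t **ma;         /* plays the role of bio_ma; one pointer per page */
	size_t    ma_offset;  /* assumed: byte offset of the data in the first page */
	size_t    length;     /* number of data bytes in the request */
};

/* Touch every byte of the request (here: sum them), whichever form it has. */
static uint32_t
model_bio_sum(const struct model_bio *bp)
{
	uint32_t sum = 0;

	if ((bp->flags & MODEL_BIO_UNMAPPED) == 0) {
		/* Mapped request: the data is one contiguous buffer. */
		for (size_t i = 0; i < bp->length; i++)
			sum += bp->data[i];
		return (sum);
	}

	/* Unmapped request: walk the page array instead of the data pointer. */
	size_t off = bp->ma_offset;
	size_t left = bp->length;
	size_t pg = 0;

	while (left > 0) {
		size_t chunk = MODEL_PAGE_SIZE - off;

		if (chunk > left)
			chunk = left;
		for (size_t i = 0; i < chunk; i++)
			sum += bp->ma[pg][off + i];
		left -= chunk;
		off = 0;	/* only the first page starts at an offset */
		pg++;
	}
	return (sum);
}
```

The point of the sketch is that the consumer never dereferences the data
pointer of an unmapped request; it walks the page array in page-sized
chunks, honouring the offset into the first page.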
  Will a GEOM class which needs to touch data (like raid3 or my off-tree
raid5) benefit from conversion, compared to the generic mechanism
provided for non-converted classes by your patch?

-- 
// Black Lion AKA Lev Serebryakov

From owner-freebsd-arch@FreeBSD.ORG Mon Mar 11 21:12:08 2013
Date: Mon, 11 Mar 2013 23:11:58 +0200
From: Konstantin Belousov
To: Lev Serebryakov
Cc: Dag-Erling Smørgrav, arch@freebsd.org
Subject: Re: Unmapped buffers: to be merged in several days
Message-ID: <20130311211158.GE3794@kib.kiev.ua>
In-Reply-To: <329178079.20130312010425@serebryakov.spb.ru>

On Tue, Mar 12, 2013 at 01:04:25AM +0400, Lev Serebryakov wrote:
> Hello, Konstantin.
> You wrote on 11 March 2013, 22:24:54:
>
> KB> If the class does need to access the bio data, then to be marked as
> KB> unmapped-processing, the class should be rewritten. Now, the class
> KB> should verify whether the bio passed is mapped or not, and process the
> KB> pages passed in the bio_ma array instead of bio_data. The involved
> KB> example is sys/dev/md/md.c.
>   Will a GEOM class which needs to touch data (like raid3 or my off-tree
> raid5) benefit from conversion, compared to the generic mechanism
> provided for non-converted classes by your patch?

First, what do you mean by 'benefit'? The answer would obviously depend
on the criteria.

Second, I do not think that any wizard can usefully answer this question,
for usual criteria like speed or code maintainability.

FWIW, I tried to get the Intel documentation for the IOAT engine, which
should allow performing XOR checksumming of unmapped buffers, suitable
for e.g. hardware-assisted software raid5, but did not succeed.

From owner-freebsd-arch@FreeBSD.ORG Mon Mar 11 21:22:54 2013
Date: Tue, 12 Mar 2013 01:22:50 +0400
From: Lev Serebryakov
To: Konstantin Belousov
Cc: arch@freebsd.org
Subject: Re: Unmapped buffers: to be merged in several days
Message-ID: <1106238192.20130312012250@serebryakov.spb.ru>
In-Reply-To: <20130311211158.GE3794@kib.kiev.ua>

Hello, Konstantin.
You wrote on 12 March 2013, 1:11:58:

>> KB> If the class does need to access the bio data, then to be marked as
>> KB> unmapped-processing, the class should be rewritten. Now, the class
>> KB> should verify whether the bio passed is mapped or not, and process the
>> KB> pages passed in the bio_ma array instead of bio_data. The involved
>> KB> example is sys/dev/md/md.c.
>>   Will a GEOM class which needs to touch data (like raid3 or my off-tree
>> raid5) benefit from conversion, compared to the generic mechanism
>> provided for non-converted classes by your patch?
KB> First, what do you mean by 'benefit'? The answer would obviously depend
KB> on the criteria.
  When you did this work (unmapped buffers), you had some meaning of
`benefit' for the whole system (FreeBSD) in mind, didn't you? I don't
think you would do such complex work to make things worse, not better :)

  As far as I understand, this code significantly decreases TLB
shootdowns for disk-intensive tasks on multi-core/multi-CPU systems (and
that leads to better I/O and overall system performance), at the cost of
more complex code in some places. Am I right?

  So, my question could be reformulated in this framework like this:
could custom code in a GEOM class (code that needs access to every data
byte in every BIO) be less stressful for the system / faster than the
generic code that you provide in the GEOM framework for old classes?
KB> Second, I do not think that any wizard can usefully answer this question,
KB> for usual criteria like speed or code maintainability.
  But you have an answer for YOUR code, right? You could predict the
behavior of the "generic" code for GEOMs without the proper flag (you
wrote this code, after all!) and of the "reasonable" code which should be
written by a GEOM class author for a GEOM class which needs to touch
every byte in the BIO buffer. I don't think many variants of such code
exist, am I right?

KB> FWIW, I tried to get the Intel documentation for the IOAT engine, which
KB> should allow performing XOR checksumming of unmapped buffers, suitable
KB> for e.g. hardware-assisted software raid5, but did not succeed.
  :-(

-- 
// Black Lion AKA Lev Serebryakov

From owner-freebsd-arch@FreeBSD.ORG Mon Mar 11 22:22:49 2013
Date: Mon, 11 Mar 2013 15:22:47 -0700
From: Jim Harris
To: Konstantin Belousov
Cc: Dag-Erling Smørgrav, Lev Serebryakov, arch@freebsd.org
Subject: Re: Unmapped buffers: to be merged in several days
In-Reply-To: <20130311211158.GE3794@kib.kiev.ua>

On Mon, Mar 11, 2013 at 2:11 PM, Konstantin Belousov wrote:

> FWIW, I tried to get the Intel documentation for the IOAT engine, which
> should allow performing XOR checksumming of unmapped buffers, suitable
> for e.g. hardware-assisted software raid5, but did not succeed.
>

Please note that XOR checksumming support only exists in the
E5-1600/2400/2600/4600 series (Sandy Bridge Xeon) and C5500/C3500 series
(Jasper Forest - based on Nehalem Xeon) processors. The SNB Xeon series
EDS section for IOAT isn't public, but the Jasper Forest datasheet volume 2
does contain the register interfaces. See sections 3.9 and 3.10.
The ioat HW interface for XOR is the same for both series save for some
delta in errata.
http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-c5500-c3500-datasheet-vol-2.pdf

Also note there is an ioat driver (DMA operations only) on the
user/jimharris/ioat branch in the FreeBSD SVN repo. The Linux ioat driver
with XOR support is dual-licensed BSD/GPL.

Regards,

-Jim

From owner-freebsd-arch@FreeBSD.ORG Mon Mar 11 23:14:16 2013
Date: Mon, 11 Mar 2013 16:07:40 -0700
From: Matthew Jacob
Organization: FreeBSD
Reply-To: mjacob@freebsd.org
To: freebsd-arch@freebsd.org
Subject: Re: Unmapped buffers: to be merged in several days
Message-ID: <513E63BC.9020105@freebsd.org>

On 3/11/2013 3:22 PM, Jim Harris wrote:
> On Mon, Mar 11, 2013 at 2:11 PM, Konstantin Belousov wrote:
>
>> FWIW, I tried to get the Intel documentation for the IOAT engine, which
>> should allow performing XOR checksumming of unmapped buffers, suitable
>> for e.g. hardware-assisted software raid5, but did not succeed.
>>
> Please note that XOR checksumming support only exists in the
> E5-1600/2400/2600/4600 series (Sandy Bridge Xeon) and C5500/C3500 series
> (Jasper Forest - based on Nehalem Xeon) processors. The SNB Xeon series
>
It's been our experience at Xyratex that the h/w XOR checksum engine is
generally slower than the CPU, at least with more modern Sandy Bridge
chipsets. But, as you know, YMMV.

From owner-freebsd-arch@FreeBSD.ORG Tue Mar 12 06:09:08 2013
Date: Tue, 12 Mar 2013 08:08:59 +0200
From: Konstantin Belousov
To: Jim Harris
Cc: Dag-Erling Smørgrav, Lev Serebryakov, arch@freebsd.org
Subject: Re: Unmapped buffers: to be merged in several days
Message-ID: <20130312060859.GI3794@kib.kiev.ua>

On Mon, Mar 11, 2013 at 03:22:47PM -0700, Jim Harris wrote:
> On Mon, Mar 11, 2013 at 2:11 PM, Konstantin Belousov wrote:
>
> > FWIW, I tried to get the Intel documentation for the IOAT engine, which
> > should allow performing XOR checksumming of unmapped buffers, suitable
> > for e.g. hardware-assisted software raid5, but did not succeed.
> >
>
> Please note that XOR checksumming support only exists in the
> E5-1600/2400/2600/4600 series (Sandy Bridge Xeon) and C5500/C3500 series
> (Jasper Forest - based on Nehalem Xeon) processors. The SNB Xeon series
> EDS section for IOAT isn't public, but the Jasper Forest datasheet volume 2
> does contain the register interfaces. See sections 3.9 and 3.10.
> The ioat HW interface for XOR is the same for both series save for some
> delta in errata.
>
> http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-c5500-c3500-datasheet-vol-2.pdf
Thank you for this pointer too. I remember from looking at the e5-1600
register descriptions that all the magic happens in the channel commands,
whose specifications are absent from the chipset documentation.

>
> Also note there is an ioat driver (DMA operations only) on the
> user/jimharris/ioat branch in the FreeBSD SVN repo. The Linux ioat driver
> with XOR support is dual-licensed BSD/GPL.
>
Yes, I did some reading of this code.

From owner-freebsd-arch@FreeBSD.ORG Tue Mar 12 06:53:32 2013
Date: Tue, 12 Mar 2013 10:53:26 +0400
From: Lev Serebryakov
Organization: FreeBSD
Reply-To: lev@FreeBSD.org
To: Jim Harris
Cc: Konstantin Belousov, arch@freebsd.org
Subject: Re: Unmapped buffers: to be merged in several days
Message-ID: <913397478.20130312105326@serebryakov.spb.ru>

Hello, Jim.
You wrote on 12 March 2013, 2:22:47:

JH> Please note that XOR checksumming support only exists in the
JH> E5-1600/2400/2600/4600 series (Sandy Bridge Xeon) and C5500/C3500 series
JH> (Jasper Forest - based on Nehalem Xeon) processors. The SNB Xeon series
  And what about the ICHxR/ICHxDO (where x = 7, 8, 9, 10) chipsets? They
support software RAID5 in Windows drivers; is that support pure
software? I understand that it is not true hardware RAID, but do they
have an XOR engine for unmapped buffers?
-- 
// Black Lion AKA Lev Serebryakov

From owner-freebsd-arch@FreeBSD.ORG Tue Mar 12 15:19:45 2013
Date: Tue, 12 Mar 2013 08:19:43 -0700
From: Jim Harris
To: lev@freebsd.org
Cc: Konstantin Belousov, arch@freebsd.org
Subject: Re: Unmapped buffers: to be merged in several days
In-Reply-To: <913397478.20130312105326@serebryakov.spb.ru>

On Mon, Mar 11, 2013 at 11:53 PM, Lev Serebryakov wrote:

> Hello, Jim.
> You wrote on 12 March 2013, 2:22:47:
>
> JH> Please note that XOR checksumming support only exists in the
> JH> E5-1600/2400/2600/4600 series (Sandy Bridge Xeon) and C5500/C3500 series
> JH> (Jasper Forest - based on Nehalem Xeon) processors. The SNB Xeon series
>   And what about the ICHxR/ICHxDO (where x = 7, 8, 9, 10) chipsets? They
> support software RAID5 in Windows drivers; is that support pure
> software? I understand that it is not true hardware RAID, but do they
> have an XOR engine for unmapped buffers?
>
The XOR calculations for ICH-based RAID5 are all done strictly in software
on mapped buffers - no hardware offload.
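For illustration, the software XOR that a RAID5 implementation performs
on mapped buffers amounts to roughly the following. This is a generic
sketch of byte-wise parity, not code from any ICH driver or GEOM class;
xor_into(), raid5_parity() and raid5_rebuild() are hypothetical names, and
a production implementation would normally work on machine words or SIMD
registers rather than single bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR len bytes of src into parity, in place. */
static void
xor_into(uint8_t *parity, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		parity[i] ^= src[i];
}

/* Compute the parity block over nstripes data blocks of len bytes each. */
static void
raid5_parity(uint8_t *parity, const uint8_t *const *stripes,
    size_t nstripes, size_t len)
{
	for (size_t i = 0; i < len; i++)
		parity[i] = 0;
	for (size_t s = 0; s < nstripes; s++)
		xor_into(parity, stripes[s], len);
}

/* Rebuild one lost data block from the parity and the surviving blocks. */
static void
raid5_rebuild(uint8_t *out, const uint8_t *parity,
    const uint8_t *const *survivors, size_t nsurvivors, size_t len)
{
	for (size_t i = 0; i < len; i++)
		out[i] = parity[i];
	for (size_t s = 0; s < nsurvivors; s++)
		xor_into(out, survivors[s], len);
}
```

Because every byte of every data block has to be read by the CPU, code of
this shape needs the buffers to be addressable, which is why it operates
on mapped buffers.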
From owner-freebsd-arch@FreeBSD.ORG Tue Mar 12 23:01:39 2013
Date: Wed, 13 Mar 2013 00:00:57 +0100
From: "Julian H. Stacey"
Organization: http://berklix.com BSD Linux Unix Consultancy, Munich Germany
To: arch@freebsd.org
Cc: Wolfgang Stief
Subject: IBM Active Memory Expansion = compression in the idle loop.
Message-Id: <201303122301.r2CN0vHY068859@fire.js.berklix.net>

Hi arch@freebsd.org
cc Wolfgang Stief

FYI Just mentioning this for general interest:

At an IBM presentation to SAGE (Sys Admin Guild) in Munich yesterday
2013-03-12, IBM mentioned "Active Memory Expansion", which is memory
compression using spare CPU cycles. IBM tend to have more spare CPU
cycles, as sometimes software is only licenced for so many CPUs, & other
CPUs are idling.

Interesting idea, though presumably less useful for FreeBSD (& Linux
etc) where we don't generally have those licensed-binary-per-CPU issues,
so perhaps fewer spare unused CPU cycles.

http://ixquick.com (search engine) found this:
https://www.ibm.com/developerworks/wikis/display/WikiPtype/IBM+Active+Memory+Expansion

The projector slides were some German, some American; some PDFs of the
evening's 3 presentations will be linked here in a couple of days, I
expect.
http://www.guug.de/lokal/muenchen/index.html

I didn't ask (but wondered) what sort of loads would have lots of flabby
data that could be easily & cheaply compressed. IBM were obviously focused
on business databases (I wonder if they still have fixed length records?)
Presumably less interesting to compress RAM data if a CPU is working on
e.g. geographic topography data or some such?
I also didn't ask if it was patented (in case any think "great idea" &
rush off to code :-)

Apparently IBM's Linux dev drivers are all public source, not binary
only (but presumably FSF licence), but I think this IBM Active Memory
Expansion (AME) is only (as per URL above) for
 * HMC: V7R7.1.0.0
 * eFW: 7.1
 * AIX: 6.1 TL4 SP2
& not for Linux, so probably there's no public source to browse.

Anyway, seemed an odd new idea to me.

Cheers,
Julian
-- 
Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com
 Reply below not above, like a play script. Indent old text with "> ".
 Send plain text. No quoted-printable, HTML, base64, multipart/alternative.

From owner-freebsd-arch@FreeBSD.ORG Wed Mar 13 00:32:37 2013
Date: Tue, 12 Mar 2013 18:32:33 -0600
From: Ian Lepore
To: Konstantin Belousov
Cc: arch@FreeBSD.org
Subject: Re: Unmapped buffers: to be merged in several days
Message-ID: <1363134753.1291.287.camel@revolution.hippie.lan>
In-Reply-To: <20130311091852.GR3794@kib.kiev.ua>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-uwgpPeDn+da0oGAf/a9S"

--=-uwgpPeDn+da0oGAf/a9S
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Mon, 2013-03-11 at 11:18 +0200, Konstantin Belousov wrote:
> The latest version of the unmapped buffers patch is available at
> http://people.freebsd.org/~kib/misc/unmapped.17.patch
> The patch makes the user data buffers, as well as the page-ins, for
> UFS, the swap-in/out, clustering use unmapped buffers, removing the TLB
> shootdown overhead and buffer map contention and fragmentation.
> The ahci(4) and md(4) drivers are converted to accept unmapped BIO requests.
>
> Other drivers and geom classes get the compat mapped BIOs, the
> transient mapping is established by the geom down thread. The KVA
> for the transient mapping is carved from the buffer map, up to 10%
> of which is repurposed to the transient bio KVA. The hope is that
> the rest of drivers and geom classes will be converted to accept
> unmapped i/o shortly, making the transient map unused.
>
> The patch was tested by Peter Holm using the whole stress2 suite,
> on both i386 and amd64, on ahci(4) and ad(4) attached disks. ad(4)
> uses the transient remapping for unmapped requests, so the testing
> should cover both new and old i/o paths. The previous version of the
> patch is already used on some high-load machines by Scott Long, on
> ahci(4), isci(4) and mps(4). Brendan Fabeny did useful testing in his
> environment.
>
> The biggest change compared to the previous mail is the prevention of
> deadlocks due to bugs in the bufspace limit code. In HEAD,
> bufspace is equal to the size of the buffer map, which effectively
> makes the code which limits the total space allocated to buffers, by
> maxbufspace, a nop, due to buffer map fragmentation.
>
> In the patch, filesystem metadata is no longer subject to the maxbufspace
> limit. Since the metadata buffers are always mapped, the buffers
> still have to fit into the buffer map, which provides a reasonable
> (but practically unreachable) upper bound on it. The non-metadata buffer
> allocations, both mapped and unmapped, are accounted against maxbufspace,
> as before. Effectively, this means that maxbufspace is enforced on
> mapped and unmapped buffers separately.
>
> I intend to commit the change as is, with the following modifications:
> - the pmap_copy_pages() will be a stub for all architectures where
>   it was not tested. The only tested arches are i386, amd64 and powerpc64.
> - For all architectures where pmap_copy_pages() is a stub, the GB_UNMAPPED
>   flag for the buffer allocators will be nop.
>
> FYI.

I tested this for armv4 today, and it works. I had a (bogus) used-before-init warning from gcc, and I had to add a couple lines of code to the pmap_copy_pages() to increment some variables; patch attached. I think the pmap-v6 routine needs the same change, but I didn't get as far as testing v6 yet.

I tested with both the md and ahci drivers on armv4. Performance seemed to be about the same before and after based on some crude tests such as "time tar -cf - /mnt >/dev/null" where I had the ahci drive (a fast ssd with a few hundred MB of data on ufs) mounted on /mnt.

I don't have a v6 board with a sata interface running yet, but I can test with md, hopefully I'll get to it tomorrow.
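
For illustration, the loop bookkeeping mentioned above ("increment some variables") can be modelled in user space. This is only a sketch and not the kernel's pmap_copy_pages(): the tiny page size, the copy_pages() and min_size() names, and the plain byte arrays standing in for vm_page_t are all assumptions made for the example. It copies xfersize bytes between two arrays of pages one chunk at a time; the three updates at the bottom of the loop are the kind of advance the patch adds, and without them the loop would never make progress.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16     /* hypothetical, tiny on purpose */

/* Return the smaller of two sizes. */
static size_t
min_size(size_t a, size_t b)
{
        return (a < b ? a : b);
}

/*
 * Copy xfersize bytes starting at byte offset a_offset within the page
 * array ma[] to byte offset b_offset within the page array mb[].
 * Each chunk is limited so it never crosses a page boundary on either side.
 */
static void
copy_pages(unsigned char *ma[], size_t a_offset, unsigned char *mb[],
    size_t b_offset, size_t xfersize)
{
        size_t a_pg_offset, b_pg_offset, cnt;

        assert(ma != NULL && mb != NULL);
        while (xfersize > 0) {
                a_pg_offset = a_offset % SKETCH_PAGE_SIZE;
                b_pg_offset = b_offset % SKETCH_PAGE_SIZE;
                cnt = min_size(xfersize, SKETCH_PAGE_SIZE - a_pg_offset);
                cnt = min_size(cnt, SKETCH_PAGE_SIZE - b_pg_offset);
                memcpy(mb[b_offset / SKETCH_PAGE_SIZE] + b_pg_offset,
                    ma[a_offset / SKETCH_PAGE_SIZE] + a_pg_offset, cnt);
                /* Advance past the chunk just copied. */
                xfersize -= cnt;
                a_offset += cnt;
                b_offset += cnt;
        }
}
```

With two 16-byte source pages and two 16-byte destination pages, a call such as copy_pages(ma, 10, mb, 14, 12) is split into three chunks (2, 4 and 6 bytes), because each chunk stops at whichever page boundary comes first.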
-- Ian

--=-uwgpPeDn+da0oGAf/a9S
Content-Disposition: inline; filename="unmapped.17.armfixes.diff"
Content-Type: text/x-patch; name="unmapped.17.armfixes.diff"; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Minimal changes required to get unmapped.17 to build and run.

diff -r 179fcc6b2485 -r 2f1c61450df0 sys/arm/arm/pmap.c
--- a/sys/arm/arm/pmap.c Tue Mar 12 13:41:10 2013 -0600
+++ b/sys/arm/arm/pmap.c Tue Mar 12 13:45:34 2013 -0600
@@ -4458,6 +4458,9 @@ pmap_copy_pages(vm_page_t ma[], vm_offse
                 pmap_copy_page_offs_func(VM_PAGE_TO_PHYS(a_pg), a_pg_offset,
                     VM_PAGE_TO_PHYS(b_pg), b_pg_offset, cnt);
 #endif
+                xfersize -= cnt;
+                a_offset += cnt;
+                b_offset += cnt;
         }
 }
 
diff -r 179fcc6b2485 -r 2f1c61450df0 sys/dev/md/md.c
--- a/sys/dev/md/md.c Tue Mar 12 13:41:10 2013 -0600
+++ b/sys/dev/md/md.c Tue Mar 12 13:45:34 2013 -0600
@@ -753,9 +753,10 @@ mdstart_vnode(struct md_s *sc, struct bi
 
         KASSERT(bp->bio_length <= MAXPHYS, ("bio_length %jd",
             (uintmax_t)bp->bio_length));
-        if ((bp->bio_flags & BIO_UNMAPPED) == 0)
+        if ((bp->bio_flags & BIO_UNMAPPED) == 0) {
+                pb = NULL;
                 aiov.iov_base = bp->bio_data;
-        else {
+        } else {
                 pb = getpbuf(&md_vnode_pbuf_freecnt);
                 pmap_qenter((vm_offset_t)pb->b_data, bp->bio_ma, bp->bio_ma_n);
                 aiov.iov_base = (void *)((vm_offset_t)pb->b_data +

--=-uwgpPeDn+da0oGAf/a9S--

From owner-freebsd-arch@FreeBSD.ORG Wed Mar 13 09:33:57 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 20442DBB; Wed, 13 Mar 2013 09:33:57 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 72EC86C; Wed, 13 Mar 2013 09:33:56 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.6/8.14.6) with ESMTP id r2D9XmCg083526; Wed, 13 Mar 2013 11:33:48 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter
v2.8.0 kib.kiev.ua r2D9XmCg083526 Received: (from kostik@localhost) by tom.home (8.14.6/8.14.6/Submit) id r2D9Xmk7083525; Wed, 13 Mar 2013 11:33:48 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 13 Mar 2013 11:33:48 +0200 From: Konstantin Belousov To: Ian Lepore Subject: Re: Unmapped buffers: to be merged in several days Message-ID: <20130313093348.GU3794@kib.kiev.ua> References: <20130311091852.GR3794@kib.kiev.ua> <1363134753.1291.287.camel@revolution.hippie.lan> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="z0bQ+oXmfTh/YqQi" Content-Disposition: inline In-Reply-To: <1363134753.1291.287.camel@revolution.hippie.lan> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Mar 2013 09:33:57 -0000 --z0bQ+oXmfTh/YqQi Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Tue, Mar 12, 2013 at 06:32:33PM -0600, Ian Lepore wrote: > I tested this for armv4 today, and it works. I had a (bogus) > used-before-init warning from gcc, and I had to add a couple lines of > code to the pmap_copy_pages() to increment some variables; patch > attached. I think the pmap-v6 routine needs the same change, but I > didn't get as far as testing v6 yet. > > I tested with both the md and ahci drivers on armv4.
Performance seemed
> to be about the same before and after based on some crude tests such as
> "time tar -cf - /mnt >/dev/null" where I had the ahci drive (a fast ssd
> with a few hundred MB of data on ufs) mounted on /mnt.
>
> I don't have a v6 board with a sata interface running yet, but I can
> test with md, hopefully I'll get to it tomorrow.
>
> -- Ian
>
> Minimal changes required to get unmapped.17 to build and run.
>
> diff -r 179fcc6b2485 -r 2f1c61450df0 sys/arm/arm/pmap.c
> --- a/sys/arm/arm/pmap.c Tue Mar 12 13:41:10 2013 -0600
> +++ b/sys/arm/arm/pmap.c Tue Mar 12 13:45:34 2013 -0600
> @@ -4458,6 +4458,9 @@ pmap_copy_pages(vm_page_t ma[], vm_offse
>                  pmap_copy_page_offs_func(VM_PAGE_TO_PHYS(a_pg), a_pg_offset,
>                      VM_PAGE_TO_PHYS(b_pg), b_pg_offset, cnt);
>  #endif
> +                xfersize -= cnt;
> +                a_offset += cnt;
> +                b_offset += cnt;
>          }
>  }
>
> diff -r 179fcc6b2485 -r 2f1c61450df0 sys/dev/md/md.c
> --- a/sys/dev/md/md.c Tue Mar 12 13:41:10 2013 -0600
> +++ b/sys/dev/md/md.c Tue Mar 12 13:45:34 2013 -0600
> @@ -753,9 +753,10 @@ mdstart_vnode(struct md_s *sc, struct bi
>
>          KASSERT(bp->bio_length <= MAXPHYS, ("bio_length %jd",
>              (uintmax_t)bp->bio_length));
> -        if ((bp->bio_flags & BIO_UNMAPPED) == 0)
> +        if ((bp->bio_flags & BIO_UNMAPPED) == 0) {
> +                pb = NULL;
>                  aiov.iov_base = bp->bio_data;
> -        else {
> +        } else {
>                  pb = getpbuf(&md_vnode_pbuf_freecnt);
>                  pmap_qenter((vm_offset_t)pb->b_data, bp->bio_ma, bp->bio_ma_n);
>                  aiov.iov_base = (void *)((vm_offset_t)pb->b_data +

Both are applied, thank you. The arm change is definitely needed for pmap-v6.c as well.

I do not expect to see much, if any, change in the system time on single-core machines, esp. on memory-starved configurations. For a targeted test, you could measure the time to calculate the sha1 of some large file which still fits into RAM but causes buffer cache thrashing.
E.g., on my workstation with 12GB of RAM and 1.2GB of buffer cache space, I would take 10GB file and measure the time to sha1 it, 4 times. First run is to load the pages into cache, ignore the first run timings. Next 3 runs give you the data for ministat(1), for patched and for stock kernels. On 4x core/HTT sandy bridge machine, I see 30% reduction of the system time in the described test. For single-core, the difference should be much less, if any. --z0bQ+oXmfTh/YqQi Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQIcBAEBAgAGBQJRQEf7AAoJEJDCuSvBvK1Bg/cQAJZCNV0EPITLyND8YMfpSzG8 WINMfTOOqVPU2Vz2Tz23EAzuIwo/qv9aLY8mp9Ty5OvXPr8v1txPpTUS1+zzFttH yFshGJyrHzFWl74eWEunJDkfA9WyJxaiOKRgOPIPqzrrhamt0GuA21kRfdx2U4Xj pcKHwkltBCsViD/eHCts1s2JG6jekDCMq29tJA/XXklafEumOWmmOH2vALK+Bobo I8O7a+iz50WFZsmPmr8TXlUUONBaCQaKEMyRZ3/UDrcMbvHv7GeiK6SxadhNWrnr 93lK2BBNUD6B4MG3gN+tOfx5qQ/EpVgmiHdtlJbgha/s/44H0Oog0X6EMVuJ4t3a x0zq9EY7O3aPpsYVSYpSPLKJ3WLYwBMpcPMWJ1fyhNptukDp4IVf8AJVWoJ6g+HD TeuTXy8TnhOodYJZh55OQhN24DCs8SH9qAOmodGbAwNo7K5bg+t24dnv28T+Ji+O FB0samkuzfCSBJAWtDJYft3KtaBtJqQdG74X4IAHpsIalZES5pLuiQv9TH3+Z/c9 aWyNJEKFBDlupUe7Dd5DgPl9/H9/54Z7n3toPwE2zk6kiIB4YvO3gAEovulBsKk+ tuZl9XLwTdzW1xsipIW2i7HuqfiZuqYpZntWo4KrOLoCE7NNheAQWiLzc8b5EgFH dlQlHa+R+zDbtkfreHfd =5hue -----END PGP SIGNATURE----- --z0bQ+oXmfTh/YqQi-- From owner-freebsd-arch@FreeBSD.ORG Wed Mar 13 11:02:33 2013 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 7F4E751D for ; Wed, 13 Mar 2013 11:02:33 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 464BB814 for ; Wed, 13 Mar 2013 11:02:33 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:8571:2d32:217f:d124]) (Authenticated sender: 
lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 02A924AC58; Wed, 13 Mar 2013 15:02:31 +0400 (MSK) Date: Wed, 13 Mar 2013 15:02:28 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <14310571813.20130313150228@serebryakov.spb.ru> To: Konstantin Belousov Subject: Re: Unmapped buffers: to be merged in several days In-Reply-To: <20130311211158.GE3794@kib.kiev.ua> References: <20130311091852.GR3794@kib.kiev.ua> <86k3pe1cl3.fsf@ds4.des.no> <20130311182454.GX3794@kib.kiev.ua> <329178079.20130312010425@serebryakov.spb.ru> <20130311211158.GE3794@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Mar 2013 11:02:33 -0000 Hello, Konstantin. You wrote on 12 March 2013, 1:11:58: >> Will GEOM class, which needs to touch data (like raid3 or my off-tree >> raid5), benefit from conversion, compare to generic mechanism, >> provided for not-converted by your patch? KB> First, what do you mean by 'benefit'. Answer would obviously depend KB> on the criteria. One more thought: it seems that raid3/raid5 software implementations could benefit on reading when strict checks are disabled. Is it possible to map downstream BIOs, but use upstream ones unmapped?
-- // Black Lion AKA Lev Serebryakov From owner-freebsd-arch@FreeBSD.ORG Wed Mar 13 11:04:05 2013 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B67185DD for ; Wed, 13 Mar 2013 11:04:05 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [46.4.40.135]) by mx1.freebsd.org (Postfix) with ESMTP id 7CBB3829 for ; Wed, 13 Mar 2013 11:04:05 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:8571:2d32:217f:d124]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id DDEB74AC57; Wed, 13 Mar 2013 15:03:57 +0400 (MSK) Date: Wed, 13 Mar 2013 15:03:55 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <1164735347.20130313150355@serebryakov.spb.ru> To: Konstantin Belousov , arch@freebsd.org Subject: Re: Unmapped buffers: to be merged in several days In-Reply-To: <14310571813.20130313150228@serebryakov.spb.ru> References: <20130311091852.GR3794@kib.kiev.ua> <86k3pe1cl3.fsf@ds4.des.no> <20130311182454.GX3794@kib.kiev.ua> <329178079.20130312010425@serebryakov.spb.ru> <20130311211158.GE3794@kib.kiev.ua> <14310571813.20130313150228@serebryakov.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Mar 2013 11:04:05 -0000 Hello, Lev.
You wrote on 13 March 2013, 15:02:28:

>>> Will GEOM class, which needs to touch data (like raid3 or my off-tree
>>> raid5), benefit from conversion, compare to generic mechanism,
>>> provided for not-converted by your patch?
KB>> First, what do you mean by 'benefit'. Answer would obviously depend
KB>> on the criteria.
LS> One more thought: it seems, that raid3/raid5 software implementations
LS> could benefit on reading when strict checks are disabled. Is it
LS> possible to map downstream BIOs, but use upstream ones unmapped?

Where can I find documentation or examples of how to:
(a) manually map an unmapped BIO
(b) create an unmapped BIO to pass downstream

--
// Black Lion AKA Lev Serebryakov

From owner-freebsd-arch@FreeBSD.ORG Thu Mar 14 14:42:58 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id DDC96325 for ; Thu, 14 Mar 2013 14:42:58 +0000 (UTC) (envelope-from ian@FreeBSD.org) Received: from mho-01-ewr.mailhop.org (mho-03-ewr.mailhop.org [204.13.248.66]) by mx1.freebsd.org (Postfix) with ESMTP id AAF3D173 for ; Thu, 14 Mar 2013 14:42:58 +0000 (UTC) Received: from c-24-8-230-52.hsd1.co.comcast.net ([24.8.230.52] helo=damnhippie.dyndns.org) by mho-01-ewr.mailhop.org with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.72) (envelope-from ) id 1UG9Mq-000462-68; Thu, 14 Mar 2013 14:42:52 +0000 Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id r2EEgmfu013936; Thu, 14 Mar 2013 08:42:48 -0600 (MDT) (envelope-from ian@FreeBSD.org) X-Mail-Handler: Dyn Standard SMTP by Dyn X-Originating-IP: 24.8.230.52 X-Report-Abuse-To: abuse@dyndns.com (see http://www.dyndns.com/services/sendlabs/outbound_abuse.html for abuse reporting information) X-MHO-User: U2FsdGVkX1/sffRmUjSjxsZqTy9batN4 Subject: Re: Unmapped buffers: to be merged in several days From: Ian Lepore To: Konstantin Belousov In-Reply-To:
<20130311091852.GR3794@kib.kiev.ua> References: <20130311091852.GR3794@kib.kiev.ua> Content-Type: text/plain; charset="us-ascii" Date: Thu, 14 Mar 2013 08:42:48 -0600 Message-ID: <1363272168.1157.22.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Mar 2013 14:42:58 -0000 On Mon, 2013-03-11 at 11:18 +0200, Konstantin Belousov wrote: > The latest version of the unmapped buffers patch is available at > http://people.freebsd.org/~kib/misc/unmapped.17.patch [...] > > I intend to commit the change as is, with the following modifications: > - the pmap_copy_pages() will be a stub for all architectures where > it was not tested. The only tested arches are i386, amd64 and powerpc64. > - For all architectures where pmap_copy_pages() is a stub, the GB_UNMAPPED > flag for the buffer allocators will be nop. > > FYI. I've now tested this on armv6 as well, with no problems. 
-- Ian From owner-freebsd-arch@FreeBSD.ORG Thu Mar 14 23:23:22 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id C13988C9; Thu, 14 Mar 2013 23:23:22 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (garage.dawidek.net [91.121.88.72]) by mx1.freebsd.org (Postfix) with ESMTP id 7EDF41C0; Thu, 14 Mar 2013 23:23:22 +0000 (UTC) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) by mail.dawidek.net (Postfix) with ESMTPSA id 4BD9BB02; Fri, 15 Mar 2013 00:20:02 +0100 (CET) Date: Fri, 15 Mar 2013 00:24:50 +0100 From: Pawel Jakub Dawidek To: Bruce Evans Subject: Re: patches to add new stat(2) file flags Message-ID: <20130314232449.GC1446@garage.freebsd.pl> References: <20130307000533.GA38950@nargothrond.kdm.org> <20130307214649.X981@besplex.bde.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="p2kqVDKq5asng8Dg" Content-Disposition: inline In-Reply-To: <20130307214649.X981@besplex.bde.org> X-OS: FreeBSD 10.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: arch@FreeBSD.org, "Kenneth D. Merry" , fs@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 14 Mar 2013 23:23:22 -0000 --p2kqVDKq5asng8Dg Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Mar 07, 2013 at 10:21:38PM +1100, Bruce Evans wrote: > On Wed, 6 Mar 2013, Kenneth D. Merry wrote: > > > I have attached diffs against head for some additional stat(2) file flags. > > > > The primary purpose of these flags is to improve compatibility with CIFS, > > both from the client and the server side. > > ...
> > UF_IMMUTABLE: Command line name: "uchg", "uimmutable" > > ZFS name: XAT_READONLY, ZFS_READONLY > > Windows: FILE_ATTRIBUTE_READONLY > > > > This flag means that the file may not be modified. > > This is not a new flag, but where applicable it is > > mapped to the Windows readonly bit. ZFS and UFS > > now both support the flag and enforce it. > > > > The behavior of this flag is compatible with MacOS X. > > This is incompatible with mapping the DOS read-only attribute to the > non-writeable file permission in msdosfs. msdosfs does this mainly to > get at least one useful file permission, but the semantics are subtly > different from all of file permissions, UF_IMMUTABLE and SF_IMMUTABLE. > I think it should be a new flag. I agree, especially that I saw some discussion recently on Illumos mailing lists to not enforce this flag in ZFS, which would be confusing to FreeBSD users if we forget to _not_ merge that change. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am!
http://tupytaj.pl --p2kqVDKq5asng8Dg Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlFCXEEACgkQForvXbEpPzSCswCeLMmHONhIZDnAFFCZD+iv2Ghq AygAn0fbIw2k8sJHl5Fv41sUqi4kIjY8 =Tb+w -----END PGP SIGNATURE----- --p2kqVDKq5asng8Dg-- From owner-freebsd-arch@FreeBSD.ORG Fri Mar 15 09:47:48 2013 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2849E385; Fri, 15 Mar 2013 09:47:48 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail28.syd.optusnet.com.au (mail28.syd.optusnet.com.au [211.29.133.169]) by mx1.freebsd.org (Postfix) with ESMTP id A3007DCA; Fri, 15 Mar 2013 09:47:46 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail28.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r2F9lYsg010340 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 15 Mar 2013 20:47:38 +1100 Date: Fri, 15 Mar 2013 20:47:34 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Pawel Jakub Dawidek Subject: Re: patches to add new stat(2) file flags In-Reply-To: <20130314232449.GC1446@garage.freebsd.pl> Message-ID: <20130315184014.A902@besplex.bde.org> References: <20130307000533.GA38950@nargothrond.kdm.org> <20130307214649.X981@besplex.bde.org> <20130314232449.GC1446@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=JMpjKL2b c=1 sm=1 a=n2O7wv11oSwA:10 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=YOiZBDKP_E4A:10 a=LVUDrmMsRTOz-s-3SHEA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: arch@FreeBSD.org, "Kenneth D. 
Merry" , fs@FreeBSD.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Mar 2013 09:47:48 -0000 On Fri, 15 Mar 2013, Pawel Jakub Dawidek wrote: > On Thu, Mar 07, 2013 at 10:21:38PM +1100, Bruce Evans wrote: >> On Wed, 6 Mar 2013, Kenneth D. Merry wrote: >> >>> I have attached diffs against head for some additional stat(2) file flags. >>> >>> The primary purpose of these flags is to improve compatibility with CIFS, >>> both from the client and the server side. >>> ... >>> UF_IMMUTABLE: Command line name: "uchg", "uimmutable" >>> ZFS name: XAT_READONLY, ZFS_READONLY >>> Windows: FILE_ATTRIBUTE_READONLY >>> >>> This flag means that the file may not be modified. >>> This is not a new flag, but where applicable it is >>> mapped to the Windows readonly bit. ZFS and UFS >>> now both support the flag and enforce it. >>> >>> The behavior of this flag is compatible with MacOS X. >> >> This is incompatible with mapping the DOS read-only attribute to the >> non-writeable file permission in msdosfs. msdosfs does this mainly to >> get at least one useful file permission, but the semantics are subtly >> different from all of file permissions, UF_IMMUTABLE and SF_IMMUTABLE. >> I think it should be a new flag. > > I agree, especially that I saw some discussion recently on Illumos > mailing lists to not enforce this flag in ZFS, which would be confusing > to FreeBSD users if we forget to _not_ merge that change. However, I now think the READONLY attribute would map well to UF_IMMUTABLE in msdosfs, better than the current mapping of the READONLY attribute to the inverse of the write permissions bits. The permissions bits are also controlled by the permissions bits of the mount point, and this is the least worst way to control them for general files. 
When this is mixed with control by the READONLY attribute (which involves back-control of the READONLY attribute according to the permissions bits), the behaviour is confusing and might lead to the READONLY bit being set for too many files (e.g., for copies of man pages, since man pages are installed with the bogus permissions r--r--r-- although the owner (root) can write them (the r--r--r-- permissions only made sense when the owner was bin)). If the READONLY attribute is instead mapped only to UF_IMMUTABLE, its impact would be smaller since there aren't so many files which have a native READONLY attribute or a native UF_IMMUTABLE attribute. The READONLY attribute would interact badly with the permissions bits in a different way -- just like UF_IMMUTABLE interacts with them. It is confusing when ls -l shows writability for non-writable files. Further testing of possible confusion from UF_IMMUTABLE on a rw-r--r-- uchg file on ffs showed that: - eaccess(2) with flag W_OK used to work correctly, although this was not documented. It used to return the documented errno EACCES, but its man page didn't say anything about immutable attributes and said that this error means that the permissions bits indicate no access (or search permission is denied). - eaccess(2) with flag W_OK now returns the undocumented errno EPERM. Its man page doesn't seem to have changed significantly. Documentation for ACLs also seems to be missing. The old and new man pages point to more details in intro(2). The fine details are missing there too. There is just the usual weaselish "appropriate privilege" used in a generic way for EPERM. This can mean anything, but what it means is not documented in either man page. Actually, eaccess() used to work correctly because I fixed it locally. It seems to have always been broken in FreeBSD. The current version is: @ /* @ * If immutable bit set, nobody gets to write it. 
"& ~VADMIN_PERMS" @ * is here, because without it, * it would be impossible for the owner @ * to remove the IMMUTABLE flag. @ */ @ if ((accmode & (VMODIFY_PERMS & ~VADMIN_PERMS)) && @ (ip->i_flags & (IMMUTABLE | SF_SNAPSHOT))) @ return (EPERM); Bugs to be fixed here: - the first sentence in the comment is banal and doesn't even echo the code (the code actually handles several immutable bits (obfuscated by the IMMUTABLE macro), and also the snapshot bit) - the second sentence in the comment has a misplaced comment delimiter '*' in the middle of it. It also doesn't fully echo the code, but is not banal. - the "write" in the first sentence also doesn't even echo the code. It used to echo the code when the code was simpler. The code used to check only (accmode & VWRITE). But immutability prevents much more than writing, and the code now handles that. - wrong errno. ext2fs still uses the old ffs code here (except it doesn't use IMMUTABLE and checks explicitly for the only immutable flag that it supports). It duplicates the SF_SNAPSHOT check, but that is nonsense because ext2fs doesn't support snapshots. nandfs copies ffs for setattr, so it has immutabilty flags checks there, but it just uses vaccess() for access(), so it it is missing the above, so the immutable flags checks are either nonsense where they are made or missing here. tmpfs uses the old ffs code here (except for mangling the style, but it does remove the banal comment). I couldn't see exactly what zfs does here, but it mostly returns EPERM for immutable flags checks. Fixing foofs_access() hopefully also fixes open(2), unlink(2), ... Unfortunately, my fix is incompatible with dubious fixes that make the man pages bug for bug compatible with the code. POSIX of course doesn't document EPERM for open(2) (except in the general weasel section about appropriate privilege). FreeBSD didn't document it either in the version in which the above was fixed. 
But now FreeBSD documents in open.2 and other man pages that immutability gives EPERM, and the code always had this bug. The changes in the man pages have some style bugs: in open.2: - a comma splice in the reference to chflags(2) - this reference is only made in 1 of the descriptions of EPERM. These style bugs were cloned to most or all man pages that are affected by immutability or nounlink flags. ACLs still seem to be unmentioned in all these man pages. I don't use them, so I don't know what happens for them. However, the core vfs function vaccess() is careful to always return EACCES and EPERM as explicitly specified by POSIX. This means EACCES for all cases except VADMIN. VADMIN/EPERM apply to chmod(), chown(), ... but shouldn't apply to open(), unlink(), rename(), ... Bruce
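
The disagreement above comes down to which errno the immutability check hands back. As a sketch only, the two behaviours can be put side by side in user space. The SK_* bit values and the sketch_access() name are hypothetical and are not the kernel's VWRITE, VADMIN or ufs_access(); only the shape of the test mirrors the snippet quoted above. The caller chooses the errno to return for a modifying request on an immutable file: EPERM models the current code, EACCES models what the text argues for.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical access-mode and flag bits; the values are not the kernel's. */
#define SK_VWRITE        0x01
#define SK_VADMIN        0x02
#define SK_VAPPEND       0x04
#define SK_VMODIFY_PERMS (SK_VWRITE | SK_VADMIN | SK_VAPPEND)
#define SK_VADMIN_PERMS  SK_VADMIN
#define SK_IMMUTABLE     0x10

/*
 * Model of the immutability check.  Returns immutable_errno when a
 * modifying (non-admin) request hits an immutable file, 0 when this
 * check does not reject the request.
 */
static int
sketch_access(int accmode, int file_flags, int immutable_errno)
{
        assert(immutable_errno == EPERM || immutable_errno == EACCES);
        /* Admin-only requests pass so the owner can still clear the flag. */
        if ((accmode & (SK_VMODIFY_PERMS & ~SK_VADMIN_PERMS)) != 0 &&
            (file_flags & SK_IMMUTABLE) != 0)
                return (immutable_errno);
        return (0);
}
```

For example, sketch_access(SK_VWRITE, SK_IMMUTABLE, EPERM) models the current behaviour, while passing EACCES as the last argument models the behaviour argued for; a request for SK_VADMIN alone is not rejected by this check in either case.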