From owner-freebsd-arch@FreeBSD.ORG Sun Jan 8 02:59:19 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EA6021065673; Sun, 8 Jan 2012 02:59:18 +0000 (UTC) (envelope-from giovanni.trematerra@gmail.com) Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id 63FA68FC0A; Sun, 8 Jan 2012 02:59:18 +0000 (UTC) Received: by qcse13 with SMTP id e13so2109379qcs.13 for ; Sat, 07 Jan 2012 18:59:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type; bh=0aufyMnADM7W7OJQtv7bhYH/1mLoNq68Z3UV6wYxFOs=; b=e8/UC1S8TdXTly+vyrT5Bi2xJzCePLDz+V7XHzNqT8oQMYFzGotmEZElLIp9e7/8H4 s6sv7SilEXqdawPd8ipWlZHNo7Tt+Ir3XiTLTAdcIRFWd06sbrLWq/aVY+pnVGJCZYz/ ymLv6dgx78IspQlSd8qiluOlpOV84mrp6hIEs= MIME-Version: 1.0 Received: by 10.224.175.2 with SMTP id v2mr13794045qaz.69.1325990147263; Sat, 07 Jan 2012 18:35:47 -0800 (PST) Sender: giovanni.trematerra@gmail.com Received: by 10.229.185.82 with HTTP; Sat, 7 Jan 2012 18:35:47 -0800 (PST) Date: Sun, 8 Jan 2012 03:35:47 +0100 X-Google-Sender-Auth: hLMtQ6zc1bPNm7zwZ4zufg7lMSY Message-ID: From: Giovanni Trematerra To: freebsd-arch@freebsd.org, jilles@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: Attilio Rao , flo@freebsd.org, Konstantin Belousov Subject: pipe/fifo code merged. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2012 02:59:19 -0000 Hi, the patch at http://www.trematerra.net/patches/pipefifo_merge2.diff is a preliminary version of the FIFO optimizations project that I picked up from the wiki. 
http://wiki.freebsd.org/IdeasPage#FIFO_optimizations_.28GSoC.29

zhaoshuai@ produced the following patch in 2009, which attempted a first merge of the interfaces:
http://www.trematerra.net/patches/fifo_soc2009.diff
However, I felt that the work was not yet complete and came up with my own final version.

Now fifos derive their structures from those of pipes, with just some special handling to support VFS operations. All operations except creation/destruction are handled by the same code for fifos and pipes.

The heart of the patch is the new struct pipeinfo. pipeinfo holds per-file-descriptor state. Basically it maintains a read end and a write end for the descriptor. As pipes are bidirectional in FreeBSD, for a pipe these two fields are always equal, while for a fifo they differ.

To let the fifo code in sys/fs/fifofs/fifo_vnops.c create/destroy the pipe, two functions (pipe_ctor/pipe_dtor) were written. pipe_ctor sets things up, like a call to kern_pipe, and returns a pipeinfo structure, while pipe_dtor releases all the resources for a given pipeinfo. Once a pipe has been set up during a fifo_open call, all subsequent operations on the fifo are handled by the same code as for a pipe, except for the cleanup code, which calls pipe_dtor.

Micro-benchmarking showed that allocating two pipeinfo structures for a pipe slowed things down. To speed up creation/destruction of pipes, the patch allocates all the needed data structures from one zone, using the umapipe struct, which packs together all the data structures that must be allocated at pipe creation. A similar umafifo structure is used for fifos.

Thanks to jilles, who privately reviewed an earlier form of the patch.
Many thanks to attilio, who answered my stupid questions and pointed me in the right direction.

TEST
The patch passed all the tests in tools/regression/pipe and tools/regression/fifo. It also survived an overnight run of pho's stress test suite and several buildworld/buildkernel runs of FreeBSD HEAD.
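To make the pipeinfo description above concrete, here is a minimal userland sketch of the idea only. It is not taken from pipefifo_merge2.diff: struct pipe_end, struct pipeinfo_sketch, the pi_rpipe/pi_wpipe field names and the helper functions are all hypothetical, chosen just to show "equal for a pipe, different for a fifo".

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct pipe; only its identity matters here. */
struct pipe_end {
	int pe_id;
};

/*
 * Hypothetical layout: the field names are made up for illustration and
 * are not taken from the patch.
 */
struct pipeinfo_sketch {
	struct pipe_end *pi_rpipe;	/* end this descriptor reads from */
	struct pipe_end *pi_wpipe;	/* end this descriptor writes to */
};

/* A pipe descriptor is bidirectional: both fields name the same end. */
static void
pipeinfo_init_pipe(struct pipeinfo_sketch *pi, struct pipe_end *end)
{
	assert(pi != NULL && end != NULL);
	pi->pi_rpipe = end;
	pi->pi_wpipe = end;
}

/* A fifo descriptor keeps distinct read and write ends. */
static void
pipeinfo_init_fifo(struct pipeinfo_sketch *pi, struct pipe_end *rd,
    struct pipe_end *wr)
{
	assert(pi != NULL && rd != NULL && wr != NULL && rd != wr);
	pi->pi_rpipe = rd;
	pi->pi_wpipe = wr;
}

/* Returns 1 when the two ends differ (the fifo case), 0 otherwise. */
static int
pipeinfo_ends_differ(const struct pipeinfo_sketch *pi)
{
	return (pi->pi_rpipe != pi->pi_wpipe);
}
```

With this shape, shared read/write code could always go through pi_rpipe and pi_wpipe without caring whether the descriptor came from pipe(2) or from opening a fifo; whether the real patch does exactly that is an assumption on my part.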
PERFORMANCE
The patch doesn't add any performance penalty to the pipe code. The test was performed by compiling a GENERIC kernel, since a lot of pipe code is exercised during a compilation. The same kind of test was used to discover a performance regression in r226042.
This is the ministat output:
http://www.trematerra.net/patches/pipeperf/pipeperf_result
Here is the raw data:
http://www.trematerra.net/patches/pipeperf/pipeperf_rawresult

The new fifo implementation gives fifos a boost of at least 15% on average. The test sent 1000 chunks of different sizes through a fifo.
Here is the ministat output:
http://www.trematerra.net/patches/fifoperf/fifoperf_result
Here are the raw results:
http://www.trematerra.net/patches/fifoperf/fifoperf_rawresult

Anyone who would like to reproduce the test can download a copy of the sources at
http://www.trematerra.net/patches/fifoperf/
The code is from zhaoshuai@.
Makefile builds benchmark.c.
benchmark.c sets up a fifo, forks, and sends a number of messages whose size depends on the input parameters.
runme.sh runs the test, launching the benchmark binary to send different numbers of messages of different sizes.

Testing and review are welcome.
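For readers who cannot fetch the sources, the general shape of such a test (set up a fifo, fork, push a number of fixed-size messages through it) can be sketched as follows. This is not zhaoshuai@'s benchmark.c: the function name fifo_transfer, its parameters and its error handling are my own assumptions, and it only counts bytes rather than timing them.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a fifo at `path`, fork a writer that sends
 * `count` chunks of `size` bytes, and drain them in the parent.
 * Returns the number of bytes received, or -1 on error.
 */
static long
fifo_transfer(const char *path, size_t size, int count)
{
	char *buf;
	long total = 0;
	ssize_t n;
	size_t off;
	pid_t pid;
	int fd, i, status;

	assert(path != NULL);
	if (size == 0 || count < 0 || (buf = malloc(size)) == NULL)
		return (-1);
	memset(buf, 'x', size);
	(void)unlink(path);
	if (mkfifo(path, 0600) == -1) {
		free(buf);
		return (-1);
	}
	pid = fork();
	if (pid == -1) {
		(void)unlink(path);
		free(buf);
		return (-1);
	}
	if (pid == 0) {
		/* Child: writer. open() blocks until the reader opens too. */
		if ((fd = open(path, O_WRONLY)) == -1)
			_exit(1);
		for (i = 0; i < count; i++) {
			/* Loop on short writes until the chunk is sent. */
			for (off = 0; off < size; off += (size_t)n) {
				n = write(fd, buf + off, size - off);
				if (n <= 0)
					_exit(1);
			}
		}
		(void)close(fd);
		_exit(0);
	}
	/* Parent: reader. Drain until the writer closes its end (EOF). */
	if ((fd = open(path, O_RDONLY)) == -1) {
		(void)kill(pid, SIGKILL);
		total = -1;
	} else {
		while ((n = read(fd, buf, size)) > 0)
			total += n;
		if (n == -1)
			total = -1;
		(void)close(fd);
	}
	if (waitpid(pid, &status, 0) == -1 ||
	    (total != -1 && (!WIFEXITED(status) || WEXITSTATUS(status) != 0)))
		total = -1;
	(void)unlink(path);
	free(buf);
	return (total);
}
```

A caller would wrap fifo_transfer("/tmp/somefifo", 512, 1000) between two clock_gettime(CLOCK_MONOTONIC, ...) calls and repeat it for several chunk sizes to get numbers suitable for ministat; the path and sizes here are placeholders.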
-- Gianni From owner-freebsd-arch@FreeBSD.ORG Sun Jan 8 13:00:42 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 353961065673 for ; Sun, 8 Jan 2012 13:00:42 +0000 (UTC) (envelope-from marius@alchemy.franken.de) Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214]) by mx1.freebsd.org (Postfix) with ESMTP id 8FAB38FC13 for ; Sun, 8 Jan 2012 13:00:41 +0000 (UTC) Received: from alchemy.franken.de (localhost [127.0.0.1]) by alchemy.franken.de (8.14.4/8.14.4/ALCHEMY.FRANKEN.DE) with ESMTP id q08D0eNI027952; Sun, 8 Jan 2012 14:00:40 +0100 (CET) (envelope-from marius@alchemy.franken.de) Received: (from marius@localhost) by alchemy.franken.de (8.14.4/8.14.4/Submit) id q08D0d4F027951; Sun, 8 Jan 2012 14:00:39 +0100 (CET) (envelope-from marius) Date: Sun, 8 Jan 2012 14:00:39 +0100 From: Marius Strobl To: Stefan Bethke Message-ID: <20120108130039.GG88161@alchemy.franken.de> References: <8D025847-4BE4-4B2C-87D7-97E72CC9D325@lassitu.de> <20120104215930.GM90831@alchemy.franken.de> <47ABA638-7E08-4350-A03C-3D4A23BF2D7E@lassitu.de> <1763C3FF-1EA0-4DC0-891D-63816EBF4A04@lassitu.de> <20120106182756.GA88161@alchemy.franken.de> <95372FB3-406F-46C2-8684-4FDB672D9FCF@lassitu.de> <20120106214741.GB88161@alchemy.franken.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: User-Agent: Mutt/1.4.2.3i Cc: freebsd-arch@freebsd.org Subject: Re: Extending sys/dev/mii X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2012 13:00:42 -0000 On Fri, Jan 06, 2012 at 10:53:14PM +0100, Stefan Bethke wrote: >=20 > Am 06.01.2012 um 22:47 schrieb Marius Strobl: >=20 > > On Fri, Jan 06, 
2012 at 09:35:40PM +0100, Stefan Bethke wrote:
> >>
> >> On 06.01.2012 at 19:27, Marius Strobl wrote:
> >>
> >>> On Fri, Jan 06, 2012 at 01:57:06PM +0100, Stefan Bethke wrote:
> >>>> On 05.01.2012 at 21:52, Stefan Bethke wrote:
> >>>>
> >>>>> The problem with this is that the miibus instance might not be a (transitive) child of the ethernet driver that has the MII that needs to be adjusted to the new PHY settings.  And since the method does not provide any parameters about which phy or miibus did issue the method, or which ifp it applies to, bubbling it up won't work (that the scenario where the PHY for arge0 is connected to the switch's MDIO, which is attached to arge1's MDIO).
> >>>>>
> >>>>>>> Since the parent will now be the mdiobus, miibus needs effectively two attachments, one to the provider of the MDIO access, the other for the ethernet interface.  I propose to associate the ethernet interface by a modified mii_attach() function that takes a device_t (of the ethernet driver) instead of the two callback function pointers.
> >>>>>>
> >>>>>> Please elaborate on why these changes are technically necessary
> >>>>>> to implement what you are trying do. Otherwise I prefer to avoid
> >>>>>> them given the rototilling they'd cause.
> >>>>>
> >>>>> Necessary is a strong word.  Right now, I'm trying to understand how a sensible change would even look like, and which combination of glue code and miibus changes make the most sense.
> >>>>>
> >>>>> Let me see if I can come up with a prototype patch the next couple of days, so we don't have to theorize about the changes that might or might not be necessary.
> >>>>
> >>>> Here's a patch that causes zero rototilling, if I'm not mistaken.
> >>>>
> >>>> The patch implements the split between the MDIO access and notifications posted to the ethernet interface device that has the MII that needs to be adjusted in accordance with the PHY autonegotiation results.  I've added a field to the ivars struct and not the softc, because the softc is included by many network drivers, while the ivars are private to mii.c  For this reason, I believe this change is API and ABI compatible, and likely can be MFCed.  (I believe MFCing is not high on the priority list because many other parts in sys/mips would need to be MFCed first for all the Atheros platforms to become fully usable, but Adrian can correct me.)
> >>>
> >>> By calling an newbus method on a device that is not the parent this
> >>> patch hacks around how newbus is intended to work.
> >>
> >> Admittedly, it adds a reference across the tree.
> >>
> >>> I also still don't see why for the scenarios you describe you can't simply use miibus(4) as-is in a clean way.
> >>
> >> [ Scenarios for which the existing model works ]
> >>
> >>> That's why I proposed the model that puc(4), scc(4) etc are
> >>> following to solve this in a clean way, which for arge(4)
> >>> would look like:
> >>>          nexus0
> >>>            |
> >>>         miimux0
> >>>         /     \
> >>>     arge0     arge1
> >>>       |         |
> >>> ethswitch0      |
> >>>       |         |
> >>>    miibus0   miibus1
> >>>       |         |
> >>>    foophy0   foophy1
> >>>
> >> [ Explanation on how things work with above setup ]
> >>
> >> Except that your diagram does not correlate with the scenario I've outlined.  I'll try to explain again: the MDIO master access for the PHY which MII lines are connected to arge1 are hosted on a separate device.  Please refer to this diagram: http://wiki.freebsd.org/StefanBethke/EtherSwitch?action=AttachFile&do=get&target=embedded-switch.png (arge0 and phy4)
> >>
> >> To make things as clear as possible, consider an RTL836x controller which is attached to the system through an I2C bus.  (Never mind that it has a switch, that's not relevant to the discussion here.)  It has one MII bus connection connecting one ethernet interface MAC to one PHY; the MDIO master that can talk to that PHY is in the RTL836x.  The common ancestor for the ethernet driver and the MDIO driver then are likely to be very near the top, meaning that the interposed driver would need support not only for the ethernet driver in question, but the I2C bus as well.  An interposed driver at nexus level that gets the phy linkchg message bubbled up to it does need to send it back down to the ethernet driver.  The message sent by miibus does contains neither source nor destination information, so that miibus needs to be attached to a unique driver instances that adds that information to the message before bubbling it up.  Of course, it also needs to get this information from somewhere, so a reference to the ethernet driver needs to be injected somehow.
> >>
> >
> > Okay, now I'm confused; I don't know the RTL836x but you seem to say
> > that essentially for this discussion this is a I2C to MDIO bridge,
> > with iicbus(4) being involved to access the MDIO of PHYs being new
> > information. Is this scenario supposed to be also covered by
> > embedded-switch.png? If it is, what purpose does the MDIO connection
> > between arge1 and the switch in there serve for?
>
> Yes, for the purpose of this discussion, the RTL836x series is an I2C to MDIO bridge; the diagram for the Atheros chip is an MDIO to MDIO bridge in that sense.
>
> Atheros switches, as well as the RTL803x series, have an MDIO slave instead of an I2C slave, but the basic layout remains the same.  The ethernet driver for arge0 cannot directly talk to PHY4 because it's hanging off some more or less remote part of the device tree.  That's what I'm trying to get across, and that's the problem I'm trying to solve in a more or less generic way.
>

Okay, this is the kind of information I was looking for as coupling
devices with newbus that have no close relation in the hierarchy is
tedious.
However, when not using newbus the question arises how do you intend to associate the device_t of say arge0 with the mdiobus0 hanging off somewhere beneath iicbus0? Marius From owner-freebsd-arch@FreeBSD.ORG Sun Jan 8 16:41:36 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 44C3D106566B; Sun, 8 Jan 2012 16:41:36 +0000 (UTC) (envelope-from lists@yamagi.org) Received: from mail.yamagi.org (unknown [IPv6:2a01:4f8:121:2102:1::7]) by mx1.freebsd.org (Postfix) with ESMTP id CE0F58FC0A; Sun, 8 Jan 2012 16:41:35 +0000 (UTC) Received: from happy.home.yamagi.org (g224129135.adsl.alicedsl.de [92.224.129.135]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.yamagi.org (Postfix) with ESMTPSA id 1FE951666334; Sun, 8 Jan 2012 17:41:33 +0100 (CET) Date: Sun, 8 Jan 2012 17:41:12 +0100 From: Yamagi Burmeister To: kostikbel@gmail.com Message-Id: <20120108174112.50e030ba.lists@yamagi.org> In-Reply-To: <20120102063700.GF50300@deviant.kiev.zoral.com.ua> References: <20111226220756.GR50300@deviant.kiev.zoral.com.ua> <20120102063700.GF50300@deviant.kiev.zoral.com.ua> X-Mailer: Sylpheed 3.1.2 (GTK+ 2.24.6; amd64-portbld-freebsd9.0) Mime-Version: 1.0 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg="PGP-SHA1"; boundary="Signature=_Sun__8_Jan_2012_17_41_12_+0100_kuU+DV/Jqkt+Aq8r" Cc: amd64@freebsd.org, arch@freebsd.org, sparc64@freebsd.org Subject: Re: AVX X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2012 16:41:36 -0000 --Signature=_Sun__8_Jan_2012_17_41_12_+0100_kuU+DV/Jqkt+Aq8r Content-Type: text/plain; charset=US-ASCII Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi, I've 
tested your patch on a Core2Duo with XSAVE but (of course) without
AVX on 10-CURRENT as of today (r229812):

CPU: Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz (2194.55-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x1067a  Family = 6  Model = 17  Stepping = 10
  Features=0xbfebfbff
  Features2=0x408e3bd
  AMD Features=0x20100800
  AMD Features2=0x1

Everything's fine:
- System is booting without a problem
- All applications are working
- AVX applications are still failing with SIGILL

But there's one problem: while a shutdown (shutdown -p now) is always
successful, a reboot (shutdown -r now) and a suspend (zzz) both result
in a double panic. The first panic is a "Fatal trap 9: general
protection fault while in kernel mode" on "cpuid = 1; apic id = 01".
The process is always "idle: cpu1".

The second panic is also "Fatal trap 9: general protection fault while
in kernel mode" but with "cpuid = 0; apic id = 00". The process is
always "init".

Since it's a dual core cpu, one panic for each processor core?

I'm unable to get a core dump and ddb is unresponsive to any keyboard
input. A serial console is unavailable, since it's a laptop...
Nevertheless I've uploaded screenshots of both panics to:

http://deponie.yamagi.org/freebsd/debug/avx

On Mon, 2 Jan 2012 08:37:00 +0200
Kostik Belousov wrote:

> The patch
> http://people.freebsd.org/~kib/misc/avx.2.patch
> is the commit candidate. Compared with avx.1.patch, it includes
> several bugfixes, some move of code around, and finishes the
> implementation of getcontextx(3) for non-x86 architectures.
>
> Please note that variant of getcontextx() is required for deferred
> signal delivery from libthr. This is the reason for Cc:ing sparc64@,
> could somebody test the patch on this architecture ? I used the
> http://people.freebsd.org/~kib/misc/defer_sig.c to test deferred
> delivery on amd64.
>
> Another missed testing point is machines capable of XSAVE but lacking
> AVX extensions.
I think most Core2 fall into this category, but my Core2 > machine is disassembled. Could anybody test the patch on non-SandyBridge > machine having XSAVE support ? You can check the capability using > ports/sysutils/x86info or looking at the early boot Features2 line, > which shall contain the XSAVE. --=20 Homepage: www.yamagi.org XMPP: yamagi@yamagi.org GnuPG/GPG: 0xEFBCCBCB --Signature=_Sun__8_Jan_2012_17_41_12_+0100_kuU+DV/Jqkt+Aq8r Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk8JxzwACgkQWTjlg++8y8t4jgCfWd8oyl3al8qrCbC2A5ZbiA7U 24UAnRQJoOVK7JWDzrXpCBGKJ6BnrNMn =kXS0 -----END PGP SIGNATURE----- --Signature=_Sun__8_Jan_2012_17_41_12_+0100_kuU+DV/Jqkt+Aq8r-- From owner-freebsd-arch@FreeBSD.ORG Sun Jan 8 19:59:18 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E7C3C1065675; Sun, 8 Jan 2012 19:59:17 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 667608FC1A; Sun, 8 Jan 2012 19:59:16 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q08JxDJY083633 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 8 Jan 2012 21:59:13 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q08JxDfc047052; Sun, 8 Jan 2012 21:59:13 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q08JxDvF047051; Sun, 8 Jan 2012 21:59:13 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 8 Jan 
2012 21:59:13 +0200 From: Kostik Belousov To: Yamagi Burmeister Message-ID: <20120108195913.GI31224@deviant.kiev.zoral.com.ua> References: <20111226220756.GR50300@deviant.kiev.zoral.com.ua> <20120102063700.GF50300@deviant.kiev.zoral.com.ua> <20120108174112.50e030ba.lists@yamagi.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="+OcHDfVcPO70+1iC" Content-Disposition: inline In-Reply-To: <20120108174112.50e030ba.lists@yamagi.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: amd64@freebsd.org, arch@freebsd.org Subject: Re: AVX X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2012 19:59:18 -0000

--+OcHDfVcPO70+1iC Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sun, Jan 08, 2012 at 05:41:12PM +0100, Yamagi Burmeister wrote:
> Hi,
> I've tested your patch on a Core2Duo with XSAVE but (of course) without
> AVX on 10-CURRENT as of today (r229812):
>
> CPU: Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz (2194.55-MHz
> K8-class CPU) Origin = "GenuineIntel" Id = 0x1067a Family = 6 Model
> = 17 Stepping = 10
> Features=0xbfebfbff MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x408e3bd PDCM,SSE4.1,XSAVE>
> AMD Features=0x20100800 AMD Features2=0x1

Is this Features excerpt from the patched kernel, or from pristine svn
sources? If the latter, please show me the Features from the patched
kernel.
>
> Everything's fine:
> - System is booting without a problem
> - All applications are working
> - AVX applications are still failing with SIGILL
>
> But there's one problem: While a shutdown (shutdown -p now) is always
> successful a reboot (shutdown -r now) and suspend (zzz) are resulting
> in a double panic. The first panic is a "Fatal trap 9: general
> protection fault while in kernel mode" on "cpuid = 1; apic id = 01".
> The process is always "idle: cpu1".
>
> The second panic is also "Fatal trap 9: general protection fault while
> in kernel mode" but with "cpuid = 0; apic id = 00". The process is
> always "init".
>
> Since it's a dual core cpu, one panic for each processor core?
>
> I'm unable to get a core dump and ddb is unresponsive to any keyboard
> input. A serial console is unavailable, since it's a laptop...
> Nevertheless I've uploaded screenshots of both panics to:
>
> http://deponie.yamagi.org/freebsd/debug/avx

I thought that I correctly handled savectx, but apparently I did not.
The issue on entering sleep could be fixed by avx.4.patch; I am not
sure about the shutdown -r panic.
http://people.freebsd.org/~kib/misc/avx.4.patch --+OcHDfVcPO70+1iC Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk8J9ZEACgkQC3+MBN1Mb4hCkACfRfaVp2kkmdloEwBMrrxK6wms QCcAoJBGCqQasa9UV9hZbSTAxS4mHVnv =oOp+ -----END PGP SIGNATURE----- --+OcHDfVcPO70+1iC-- From owner-freebsd-arch@FreeBSD.ORG Sun Jan 8 21:13:53 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 99ED7106564A for ; Sun, 8 Jan 2012 21:13:53 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 7AC438FC08 for ; Sun, 8 Jan 2012 21:13:53 +0000 (UTC) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id q08KZjL5024434 for ; Sun, 8 Jan 2012 12:35:48 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <201201082035.q08KZjL5024434@gw.catspoiler.org> Date: Sun, 8 Jan 2012 12:35:45 -0800 (PST) From: Don Lewis To: arch@FreeBSD.org MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: Subject: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2012 21:13:53 -0000 I've got a machine that is set up to dual boot both FreeBSD and Linux. It is also disk space impaired, so to make the best use possible of the available space, I have FreeBSD set up to swap to the Linux swap partition. Until now I haven't had working crash dumps because geom didn't permit crash dumps to Linux swap partitions. This patch removes that limitation. 
This could be useful for users of laptops who boot multiple operating systems.

Index: sys/geom/part/g_part_ebr.c
===================================================================
--- sys/geom/part/g_part_ebr.c	(revision 229800)
+++ sys/geom/part/g_part_ebr.c	(working copy)
@@ -333,9 +333,10 @@
 {
 	struct g_part_ebr_entry *entry;
 
-	/* Allow dumping to a FreeBSD partition only. */
+	/* Allow dumping to a FreeBSD partition or Linux swap partition only. */
 	entry = (struct g_part_ebr_entry *)baseentry;
-	return ((entry->ent.dp_typ == DOSPTYP_386BSD) ? 1 : 0);
+	return ((entry->ent.dp_typ == DOSPTYP_386BSD ||
+	    entry->ent.dp_typ == DOSPTYP_LINSWP) ? 1 : 0);
 }
 
 #if defined(GEOM_PART_EBR_COMPAT)
Index: sys/geom/part/g_part_mbr.c
===================================================================
--- sys/geom/part/g_part_mbr.c	(revision 229800)
+++ sys/geom/part/g_part_mbr.c	(working copy)
@@ -304,9 +304,10 @@
 {
 	struct g_part_mbr_entry *entry;
 
-	/* Allow dumping to a FreeBSD partition only. */
+	/* Allow dumping to a FreeBSD partition or Linux swap partition only. */
 	entry = (struct g_part_mbr_entry *)baseentry;
-	return ((entry->ent.dp_typ == DOSPTYP_386BSD) ? 1 : 0);
+	return ((entry->ent.dp_typ == DOSPTYP_386BSD ||
+	    entry->ent.dp_typ == DOSPTYP_LINSWP) ? 1 : 0);
 }
 
 static int

Is anyone else disturbed by the foot shooting potential of allowing crash dumps to be written to 386BSD partitions?
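The predicate the patch changes can be tried outside the kernel with a small sketch like the one below. The function name can_dump_to_type is hypothetical, and the two partition type values are written out from memory of sys/diskmbr.h (0xa5 for DOSPTYP_386BSD, 0x82 for DOSPTYP_LINSWP), so treat them as assumptions and check them against the header before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; in the kernel these come from <sys/diskmbr.h>. */
#define	DOSPTYP_LINSWP	0x82	/* Linux swap partition */
#define	DOSPTYP_386BSD	0xa5	/* 386BSD/FreeBSD partition */

/*
 * Hypothetical stand-in for the patched dumpto check: returns 1 if a
 * partition with MBR/EBR type `dp_typ` may receive a crash dump.
 */
static int
can_dump_to_type(uint8_t dp_typ)
{
	return ((dp_typ == DOSPTYP_386BSD ||
	    dp_typ == DOSPTYP_LINSWP) ? 1 : 0);
}
```

Note that the check looks only at the partition type byte, not at what is stored in the partition, which is what the closing question about 386BSD partitions is getting at.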
From owner-freebsd-arch@FreeBSD.ORG Sun Jan 8 22:27:27 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E809C106564A for ; Sun, 8 Jan 2012 22:27:27 +0000 (UTC) (envelope-from stb@lassitu.de) Received: from gilb.zs64.net (gilb.zs64.net [IPv6:2001:470:1f0b:105e::1ea]) by mx1.freebsd.org (Postfix) with ESMTP id A35958FC17 for ; Sun, 8 Jan 2012 22:27:27 +0000 (UTC) Received: by gilb.zs64.net (Postfix, from stb@lassitu.de) id 03F6511AA11; Sun, 8 Jan 2012 22:27:25 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1251.1) Content-Type: text/plain; charset=us-ascii From: Stefan Bethke In-Reply-To: <20120108130039.GG88161@alchemy.franken.de> Date: Sun, 8 Jan 2012 23:27:25 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <23477898-8D85-498C-8E30-192810BD68A8@lassitu.de> References: <8D025847-4BE4-4B2C-87D7-97E72CC9D325@lassitu.de> <20120104215930.GM90831@alchemy.franken.de> <47ABA638-7E08-4350-A03C-3D4A23BF2D7E@lassitu.de> <1763C3FF-1EA0-4DC0-891D-63816EBF4A04@lassitu.de> <20120106182756.GA88161@alchemy.franken.de> <95372FB3-406F-46C2-8684-4FDB672D9FCF@lassitu.de> <20120106214741.GB88161@alchemy.franken.de> <20120108130039.GG88161@alchemy.franken.de> To: Marius Strobl X-Mailer: Apple Mail (2.1251.1) Cc: freebsd-arch@freebsd.org Subject: Re: Extending sys/dev/mii X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2012 22:27:28 -0000 Am 08.01.2012 um 14:00 schrieb Marius Strobl: > Okay, this is the kind of information I was looking for as coupling > devices with newbus that have no close relation in the hierarchy is > tedious. 
However, when not using newbus the question arises how do
> you intend to associate the device_t of say arge0 with the mdiobus0
> hanging off somewhere beneath iicbus0?

In my experimental tree, I've hacked together a small function that parses a string for a devclass name and unit number, and looks that up.

I'm also trying a number of other approaches; mainly I'm trying to understand how newbus works, and what kind of driver I want at the various points, ideally auto-attached, or configured by hints, instead of by custom code.  I think I'll need another couple of days to get a good enough understanding of drivers, devclasses and their tree, and the device tree.


Stefan

-- 
Stefan Bethke   Fon +49 151 14070811

From owner-freebsd-arch@FreeBSD.ORG Sun Jan 8 23:29:29 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8E2FE106564A; Sun, 8 Jan 2012 23:29:29 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-tul01m020-f182.google.com (mail-tul01m020-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 4846A8FC0C; Sun, 8 Jan 2012 23:29:29 +0000 (UTC) Received: by obbwd18 with SMTP id wd18so4860513obb.13 for ; Sun, 08 Jan 2012 15:29:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=cFJv1wyHKk02vLmca8qp2f1QkRONIaJOn+jmAsj8wyQ=; b=eSR9zzlGtwH/8cAidQkrHtJDG2qfJ5puX5BwabCIQosKPYlXHsgv3fJAy2P7cvQ/Hs SBMOO+EQZ4AUZre2Am3NEIhMjqLunOs6mn61KJRndKHtJmvHMFBsfY0q6frDNKgKjZYt uQp7lXnbpuHpLijA7FX3ppPWrVmIXwNon3dD4= MIME-Version: 1.0 Received: by 10.182.1.8 with SMTP id 8mr12390374obi.11.1326063826243; Sun, 08 Jan 2012 15:03:46 -0800 (PST) Received: by 10.182.152.6 with HTTP; Sun, 8 Jan 2012 15:03:46 -0800 (PST) In-Reply-To: <201201082035.q08KZjL5024434@gw.catspoiler.org>
References: <201201082035.q08KZjL5024434@gw.catspoiler.org> Date: Sun, 8 Jan 2012 15:03:46 -0800 Message-ID: From: Garrett Cooper To: Don Lewis Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2012 23:29:29 -0000

On Sun, Jan 8, 2012 at 12:35 PM, Don Lewis wrote:
> I've got a machine that is set up to dual boot both FreeBSD and Linux.
> It is also disk space impaired, so to make the best use possible of the
> available space, I have FreeBSD set up to swap to the Linux swap
> partition. Until now I haven't had working crash dumps because geom
> didn't permit crash dumps to Linux swap partitions. This patch removes
> that limitation.  This could be useful for users of laptops who boot
> multiple operating systems.

Seems like a good idea, but could dumping to a Linux partition
confuse FreeBSD or vice versa?
Thanks!
-Garrett From owner-freebsd-arch@FreeBSD.ORG Sun Jan 8 23:33:08 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E6971106564A for ; Sun, 8 Jan 2012 23:33:08 +0000 (UTC) (envelope-from delphij@gmail.com) Received: from mail-tul01m020-f182.google.com (mail-tul01m020-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 9F80F8FC0C for ; Sun, 8 Jan 2012 23:33:08 +0000 (UTC) Received: by obbwd18 with SMTP id wd18so4862467obb.13 for ; Sun, 08 Jan 2012 15:33:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=Krq1SYoG856LDhI97sLZ6AbruOny8RKKsm9Ss54IYWg=; b=FqITqcnSg/cIDwzw9YmXcd7kjtP/2rjJcDaIc7MHHWL3L805K2Cyoiqcb1gjvJGq0o YapAjCiMXQY40+TKbrQSymBnslwGaHep811dVQlLT0GccAbgamrq2PsSXEEiT6fOWsn1 FV6qpEWgfau9ls4zqPN7bOetD8EyuLQqr6Mm0= MIME-Version: 1.0 Received: by 10.182.160.1 with SMTP id xg1mr12561293obb.30.1326065588070; Sun, 08 Jan 2012 15:33:08 -0800 (PST) Received: by 10.182.67.163 with HTTP; Sun, 8 Jan 2012 15:33:08 -0800 (PST) In-Reply-To: References: <201201082035.q08KZjL5024434@gw.catspoiler.org> Date: Sun, 8 Jan 2012 15:33:08 -0800 Message-ID: From: Xin LI To: Garrett Cooper Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, Don Lewis Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2012 23:33:09 -0000 On Sun, Jan 8, 2012 at 3:03 PM, Garrett Cooper wrote: > On Sun, Jan 8, 2012 at 12:35 PM, Don Lewis wrote: >> I've got a machine that is set up to dual boot both FreeBSD and 
Linux. >> It is also disk space impaired, so to make the best use possible of the >> available space, I have FreeBSD set up to swap to the Linux swap >> partition. Until now I haven't had working crash dumps because geom >> didn't permit crash dumps to Linux swap partitions. This patch removes >> that limitation. This could be useful for users of laptops who boot >> multiple operating systems. > > Seems like a good idea, but could dumping to a Linux partition > confuse FreeBSD or vice versa? Unlikely, these are scratch spaces and validated upon boot (i.e. the dump saver would "taste" before saving). Cheers, -- Xin LI https://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die From owner-freebsd-arch@FreeBSD.ORG Sun Jan 8 23:41:22 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 41B121065676; Sun, 8 Jan 2012 23:41:22 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-tul01m020-f182.google.com (mail-tul01m020-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id ED9238FC0C; Sun, 8 Jan 2012 23:41:21 +0000 (UTC) Received: by obbwd18 with SMTP id wd18so4865945obb.13 for ; Sun, 08 Jan 2012 15:41:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=FYErkzThGnRqD8OPyj0K57ENRwVArv7eR+tk4zGRSJ4=; b=E7Z1/t89jDZ8mRAw9rKGMNWvfmuqC42DW1eI0MWxfhNET1Y/nA/mGCzrtBEsM/vwCW HLHJynsDJFAGOFRW9UkSDEQ+pJIExlSsMdi/Cqv0uvwWd2EAsq7RXow9YG3bAq59FVWd i4sDhIROikiXE5S9b1/JDrrXeXzAd4AO/8nCI= MIME-Version: 1.0 Received: by 10.182.1.8 with SMTP id 8mr12457228obi.11.1326066081499; Sun, 08 Jan 2012 15:41:21 -0800 (PST) Received: by 10.182.152.6 with HTTP; Sun, 8 Jan 2012 15:41:21 -0800 (PST) In-Reply-To: References: 
<201201082035.q08KZjL5024434@gw.catspoiler.org> Date: Sun, 8 Jan 2012 15:41:21 -0800 Message-ID: From: Garrett Cooper To: Xin LI Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, Don Lewis Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2012 23:41:22 -0000 On Sun, Jan 8, 2012 at 3:33 PM, Xin LI wrote: > On Sun, Jan 8, 2012 at 3:03 PM, Garrett Cooper wrote: >> On Sun, Jan 8, 2012 at 12:35 PM, Don Lewis wrote: >>> I've got a machine that is set up to dual boot both FreeBSD and Linux. >>> It is also disk space impaired, so to make the best use possible of the >>> available space, I have FreeBSD set up to swap to the Linux swap >>> partition. Until now I haven't had working crash dumps because geom >>> didn't permit crash dumps to Linux swap partitions. This patch removes >>> that limitation. This could be useful for users of laptops who boot >>> multiple operating systems. >> >> Seems like a good idea, but could dumping to a Linux partition >> confuse FreeBSD or vice versa? > > Unlikely, these are scratch spaces and validated upon boot (i.e. the > dump saver would "taste" before saving). So the answer is: 1. No for FreeBSD 2. It's unlikely that the Linux side will be affected ? I would just be concerned with some potentially more interesting cases where the swap for a crashdump got partially overwritten, but the same issue would exist I suppose with FreeBSD if someone whacked the contents of a partition I suppose, e.g. it's not a big issue if the tools that grok the crashdump fail gracefully. Thanks! 
-Garrett From owner-freebsd-arch@FreeBSD.ORG Sun Jan 8 23:42:21 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B6DC1065672 for ; Sun, 8 Jan 2012 23:42:21 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id 075198FC08 for ; Sun, 8 Jan 2012 23:42:20 +0000 (UTC) Received: by iadj38 with SMTP id j38so8165658iad.13 for ; Sun, 08 Jan 2012 15:42:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:content-type; bh=FRP+9/OhCZNHrngHlQKxRMap5tyl2hngaNb8FiJ08I4=; b=BZZnOR8iSxyAwpYX6Rx2O5RdmewPM7lJi2hlLn7ak8Q8DJqhCv+BcHJ8JMZvX7SZSZ hAcNJwmFpYdcRziAns4fB4Z4Ut4kTuu5aRYqWJCS/5GUHMm5nm/CSn4osOszvD69Kvr5 m6WLybXbXVqTHbJ1hYhFox6Pgx7Q4gtmh8/tQ= MIME-Version: 1.0 Received: by 10.42.142.129 with SMTP id s1mr9170482icu.42.1326066140454; Sun, 08 Jan 2012 15:42:20 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.42.243.65 with HTTP; Sun, 8 Jan 2012 15:42:20 -0800 (PST) Date: Sun, 8 Jan 2012 15:42:20 -0800 X-Google-Sender-Auth: dtCKXUEoUcy4s_d1FO_6mtUaMbc Message-ID: From: Adrian Chadd To: freebsd-arch@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: Where should I put ar71xx_* modules? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 Jan 2012 23:42:21 -0000 Hi, In order to fit an lzma'd kernel in 892 kilobytes of flash (that's 892 * 1024 bytes), I've needed to break out a few things into modules. 
I'd like to commit a couple of modules - for example, ar71xx_ehci/ar71xx_ohci for USB stuff - but I don't want them built for anything other than ar71xx builds. Thus I don't see the reason for putting them in sys/modules/Makefile. They build fine if they're included in MODULES_OVERRIDE in the relevant kernel config file. So is it ok to just commit some modules in sys/modules/ which aren't in the Makefile, and instead include them in the relevant SoC kernel configs so they're built? Or is there some other tradition for doing this? Thanks, Adrian From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 00:03:16 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 907B71065670 for ; Mon, 9 Jan 2012 00:03:16 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 517418FC0C for ; Mon, 9 Jan 2012 00:03:16 +0000 (UTC) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id q09037wI024742; Sun, 8 Jan 2012 16:03:11 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <201201090003.q09037wI024742@gw.catspoiler.org> Date: Sun, 8 Jan 2012 16:03:07 -0800 (PST) From: Don Lewis To: yanegomi@gmail.com In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8BIT Cc: arch@FreeBSD.org, delphij@gmail.com Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 00:03:16 -0000 On 8 Jan, Garrett Cooper wrote: > On Sun, Jan 8, 2012 at 3:33 PM, Xin LI wrote: >> On Sun, Jan 8, 2012 at 3:03 PM, Garrett Cooper wrote: >>> On Sun, 
Jan 8, 2012 at 12:35 PM, Don Lewis wrote: >>>> I've got a machine that is set up to dual boot both FreeBSD and Linux. >>>> It is also disk space impaired, so to make the best use possible of the >>>> available space, I have FreeBSD set up to swap to the Linux swap >>>> partition. Until now I haven't had working crash dumps because geom >>>> didn't permit crash dumps to Linux swap partitions. This patch removes >>>> that limitation.  This could be useful for users of laptops who boot >>>> multiple operating systems. >>> >>>    Seems like a good idea, but could dumping to a Linux partition >>> confuse FreeBSD or vice versa? Even sharing a swap partition could potentially be an issue if the contents of swap for one OS could be interpreted as a crash dump for the other OS. I haven't seen any issues with Linux getting confused about this. Before I made this change, I didn't have a way of testing the reverse. >> Unlikely, these are scratch spaces and validated upon boot (i.e. the >> dump saver would "taste" before saving). And fortunately the dump saver runs in userland, which lessens the possibilities of general mayhem. > So the answer is: > 1. No for FreeBSD > 2. It's unlikely that the Linux side will be affected > ? > I would just be concerned with some potentially more interesting > cases where the swap for a crashdump got partially overwritten, but > the same issue would exist I suppose with FreeBSD if someone whacked > the contents of a partition I suppose, e.g. it's not a big issue if > the tools that grok the crashdump fail gracefully. It's already possible to corrupt the dump image if something consumes a bunch of swap (like fsck checking a big filesystem) before the crash saver runs. Dumping to a raw 386BSD partition has similar issues. In addition to the possibility of accidentally dumping to a partition that contains active filesystems, geom is going to want to taste the partition looking for a BSD label, so it has to be careful about handling random garbage. 
Also, if the partition formerly contained active filesystems and still has a valid BSD label, the label might not get overwritten, but the crash dump could partially overwrite a filesystem. If at some later date the sysadmin tries to mount that filesystem, the results could be undesirable. It might be a good idea to prevent dumping to a 386BSD partition if it contains a valid BSD label with partitions that have an fstype other than "unused". Swap should probably have similar restrictions. From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 06:27:07 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DAE94106564A; Mon, 9 Jan 2012 06:27:07 +0000 (UTC) (envelope-from bright@elvis.mu.org) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id CAE0E8FC08; Mon, 9 Jan 2012 06:27:07 +0000 (UTC) Received: by elvis.mu.org (Postfix, from userid 1192) id 8ECF01A3C6D; Sun, 8 Jan 2012 22:17:06 -0800 (PST) Date: Sun, 8 Jan 2012 22:17:06 -0800 From: Alfred Perlstein To: Don Lewis Message-ID: <20120109061706.GC89781@elvis.mu.org> References: <201201082035.q08KZjL5024434@gw.catspoiler.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201201082035.q08KZjL5024434@gw.catspoiler.org> User-Agent: Mutt/1.4.2.3i Cc: arch@FreeBSD.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 06:27:07 -0000 This is cool, it does seem to beg for a function instead of inlining the logic in two places in case someone wants to add even more logic to it. 
* Don Lewis [120108 13:14] wrote: > I've got a machine that is set up to dual boot both FreeBSD and Linux. > It is also disk space impaired, so to make the best use possible of the > available space, I have FreeBSD set up to swap to the Linux swap > partition. Until now I haven't had working crash dumps because geom > didn't permit crash dumps to Linux swap partitions. This patch removes > that limitation. This could be useful for users of laptops who boot > multiple operating systems. > > > Index: sys/geom/part/g_part_ebr.c > =================================================================== > --- sys/geom/part/g_part_ebr.c (revision 229800) > +++ sys/geom/part/g_part_ebr.c (working copy) > @@ -333,9 +333,10 @@ > { > struct g_part_ebr_entry *entry; > > - /* Allow dumping to a FreeBSD partition only. */ > + /* Allow dumping to a FreeBSD partition or Linux swap partition only. */ > entry = (struct g_part_ebr_entry *)baseentry; > - return ((entry->ent.dp_typ == DOSPTYP_386BSD) ? 1 : 0); > + return ((entry->ent.dp_typ == DOSPTYP_386BSD || > + entry->ent.dp_typ == DOSPTYP_LINSWP) ? 1 : 0); > } > > #if defined(GEOM_PART_EBR_COMPAT) > Index: sys/geom/part/g_part_mbr.c > =================================================================== > --- sys/geom/part/g_part_mbr.c (revision 229800) > +++ sys/geom/part/g_part_mbr.c (working copy) > @@ -304,9 +304,10 @@ > { > struct g_part_mbr_entry *entry; > > - /* Allow dumping to a FreeBSD partition only. */ > + /* Allow dumping to a FreeBSD partition or Linux swap partition only. */ > entry = (struct g_part_mbr_entry *)baseentry; > - return ((entry->ent.dp_typ == DOSPTYP_386BSD) ? 1 : 0); > + return ((entry->ent.dp_typ == DOSPTYP_386BSD || > + entry->ent.dp_typ == DOSPTYP_LINSWP) ? 1 : 0); > } > > static int > > > > Is anyone else disturbed by the foot shooting potential of allowing > crash dumps to be written to 386BSD partitions? 
> > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" -- - Alfred Perlstein .- VMOA #5191, 03 vmax, 92 gs500, 85 ch250, 07 zx10 .- FreeBSD committer From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 07:59:28 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9773E106566C; Mon, 9 Jan 2012 07:59:28 +0000 (UTC) (envelope-from stb@lassitu.de) Received: from gilb.zs64.net (gilb.zs64.net [IPv6:2001:470:1f0b:105e::1ea]) by mx1.freebsd.org (Postfix) with ESMTP id 5A9768FC08; Mon, 9 Jan 2012 07:59:28 +0000 (UTC) Received: by gilb.zs64.net (Postfix, from stb@lassitu.de) id 28C8811A8DD; Mon, 9 Jan 2012 07:59:27 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1251.1) Content-Type: text/plain; charset=iso-8859-1 From: Stefan Bethke In-Reply-To: Date: Mon, 9 Jan 2012 08:59:26 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <09C21BD6-46E1-4593-9E3C-183CF229A613@lassitu.de> References: To: Adrian Chadd X-Mailer: Apple Mail (2.1251.1) Cc: freebsd-arch@freebsd.org Subject: Re: Where should I put ar71xx_* modules? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 07:59:28 -0000 Am 09.01.2012 um 00:42 schrieb Adrian Chadd: > Hi, > > In order to fit an lzma'd kernel in 892 kilobytes of flash (that's 892 > * 1024 bytes), I've needed to break out a few things into modules. > > I'd like to commit a couple of modules - for example, > ar71xx_ehci/ar71xx_ohci for USB stuff - but I don't want them built > for anything other than ar71xx builds. 
Thus I don't see the reason for > putting them in sys/modules/Makefile. > > They build fine if they're included in MODULES_OVERRIDE in the > relevant kernel config file. > > So is it ok to just commit some modules in sys/modules/ which aren't > in the Makefile, and instead include them in the relevant SoC kernel > configs so they're built? Or is there some other tradition for doing > this? Why would it hurt to have them connected to the standard build? Do the tinderboxes build modules? Stefan -- Stefan Bethke Fon +49 151 14070811 From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 08:39:05 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 70832106564A for ; Mon, 9 Jan 2012 08:39:05 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-vw0-f54.google.com (mail-vw0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id 2CF718FC15 for ; Mon, 9 Jan 2012 08:39:05 +0000 (UTC) Received: by vbbfr13 with SMTP id fr13so4019684vbb.13 for ; Mon, 09 Jan 2012 00:39:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=9S4BG8r05CE0UH49Qi1sNtjTYnu4COhDNaaa5rN3O8k=; b=r8xyoQmL26qy2IRP60AQ+5BhEr0pnlY9o84PBUDv1QJkyuRGxIYQZxJHpLi1MOD62q p7X13fNXZlVd4Cor50tBzjCL7H4AVuwiBVX8WgMntP8gkJ4168zLVfVXaz++YcMTevdu I116SVEkb7RW5GOoMwNVMPElryRpbBZ85XIUs= MIME-Version: 1.0 Received: by 10.52.35.10 with SMTP id d10mr6956576vdj.132.1326098344563; Mon, 09 Jan 2012 00:39:04 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.52.36.5 with HTTP; Mon, 9 Jan 2012 00:39:04 -0800 (PST) In-Reply-To: <09C21BD6-46E1-4593-9E3C-183CF229A613@lassitu.de> References: <09C21BD6-46E1-4593-9E3C-183CF229A613@lassitu.de> Date: Mon, 9 Jan 2012 00:39:04 -0800 
X-Google-Sender-Auth: l1gddSjcmUABONBU-_ZGDrxHZuM Message-ID: From: Adrian Chadd To: Stefan Bethke Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-arch@freebsd.org Subject: Re: Where should I put ar71xx_* modules? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 08:39:05 -0000 Because they have to compile only for MIPS? And they'd only work for ar71xx? Adrian On 8 January 2012 23:59, Stefan Bethke wrote: > Am 09.01.2012 um 00:42 schrieb Adrian Chadd: > >> Hi, >> >> In order to fit an lzma'd kernel in 892 kilobytes of flash (that's 892 >> * 1024 bytes), I've needed to break out a few things into modules. >> >> I'd like to commit a couple of modules - for example, >> ar71xx_ehci/ar71xx_ohci for USB stuff - but I don't want them built >> for anything other than ar71xx builds. Thus I don't see the reason for >> putting them in sys/modules/Makefile. >> >> They build fine if they're included in MODULES_OVERRIDE in the >> relevant kernel config file. >> >> So is it ok to just commit some modules in sys/modules/ which aren't >> in the Makefile, and instead include them in the relevant SoC kernel >> configs so they're built? Or is there some other tradition for doing >> this? > > Why would it hurt to have them connected to the standard build? Do the tinderboxes build modules? 
> > > Stefan > > -- > Stefan Bethke Fon +49 151 14070811 > > > From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 08:52:33 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 835D5106566C; Mon, 9 Jan 2012 08:52:33 +0000 (UTC) (envelope-from stb@lassitu.de) Received: from gilb.zs64.net (gilb.zs64.net [IPv6:2001:470:1f0b:105e::1ea]) by mx1.freebsd.org (Postfix) with ESMTP id 420E18FC1D; Mon, 9 Jan 2012 08:52:33 +0000 (UTC) Received: by gilb.zs64.net (Postfix, from stb@lassitu.de) id 040A811AB41; Mon, 9 Jan 2012 08:52:31 +0000 (UTC) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Stefan Bethke In-Reply-To: Date: Mon, 9 Jan 2012 09:52:30 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: References: <09C21BD6-46E1-4593-9E3C-183CF229A613@lassitu.de> To: Adrian Chadd X-Mailer: Apple Mail (2.1084) Cc: freebsd-arch@freebsd.org Subject: Re: Where should I put ar71xx_* modules? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 08:52:33 -0000 Am 09.01.2012 um 09:39 schrieb Adrian Chadd: > On 8 January 2012 23:59, Stefan Bethke wrote: >> Am 09.01.2012 um 00:42 schrieb Adrian Chadd: >> >>> Hi, >>> >>> In order to fit an lzma'd kernel in 892 kilobytes of flash (that's 892 >>> * 1024 bytes), I've needed to break out a few things into modules. >>> >>> I'd like to commit a couple of modules - for example, >>> ar71xx_ehci/ar71xx_ohci for USB stuff - but I don't want them built >>> for anything other than ar71xx builds. Thus I don't see the reason for >>> putting them in sys/modules/Makefile. >>> >>> They build fine if they're included in MODULES_OVERRIDE in the >>> relevant kernel config file. 
>>> >>> So is it ok to just commit some modules in sys/modules/ which aren't >>> in the Makefile, and instead include them in the relevant SoC kernel >>> configs so they're built? Or is there some other tradition for doing >>> this? >> >> Why would it hurt to have them connected to the standard build? Do the tinderboxes build modules? > > Because they have to compile only for MIPS? And they'd only work for ar71xx? We have lots of modules that have specific requirements; they're still connected to the build. What warrants different handling here? You could put them under .if ${MACHINE_CPUARCH} == "mips" like a number of them are already. Stefan -- Stefan Bethke Fon +49 151 14070811 From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 09:04:16 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 029051065672 for ; Mon, 9 Jan 2012 09:04:16 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-vw0-f54.google.com (mail-vw0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id B0CC58FC15 for ; Mon, 9 Jan 2012 09:04:15 +0000 (UTC) Received: by vbbfr13 with SMTP id fr13so4034578vbb.13 for ; Mon, 09 Jan 2012 01:04:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=EExUd2PbvQlRkryVqSoaGSqyJwG+DIFAH/HpnfLviqY=; b=FP7mAIfWvS/YsD8fRGaHHW5EeK5eIa9q87GzJl6F8oqXCJn0sMu9EQDjcV1LK46Tar QmfqD4KAGrmr6y9iRy2S79RDQjNwMVfO/x1wsNSf8Nt/ee9dtshkC6qVgpeOJIG4S/00 vlrNN/IqolCy0ZjFqa8Rd6+Fq0B5GndEZr1H8= MIME-Version: 1.0 Received: by 10.52.35.10 with SMTP id d10mr6987646vdj.132.1326099854880; Mon, 09 Jan 2012 01:04:14 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.52.36.5 with HTTP; Mon, 9 Jan 2012 01:04:14 -0800 (PST) 
In-Reply-To: References: <09C21BD6-46E1-4593-9E3C-183CF229A613@lassitu.de> Date: Mon, 9 Jan 2012 01:04:14 -0800 X-Google-Sender-Auth: 3_UmmhXjfGPrqsilUJUfIsKnBYc Message-ID: From: Adrian Chadd To: Stefan Bethke Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-arch@freebsd.org Subject: Re: Where should I put ar71xx_* modules? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 09:04:16 -0000 On 9 January 2012 00:52, Stefan Bethke wrote: > We have lots of modules that have specific requirements; they're still connected to the build. What warrants different handling here? You could put them under .if ${MACHINE_CPUARCH} == "mips" like a number of them are already. Hm, I didn't want to add modules for a specific SoC that won't ever be built for any other platform. E.g., if someone does an XLR build, they shouldn't get ar71xx modules built. But sure, I can just connect them to the mips build and be done with it. 
Adrian From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 09:04:51 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5464A106566C; Mon, 9 Jan 2012 09:04:51 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-vw0-f54.google.com (mail-vw0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id CC6DD8FC18; Mon, 9 Jan 2012 09:04:50 +0000 (UTC) Received: by vbbfr13 with SMTP id fr13so4034924vbb.13 for ; Mon, 09 Jan 2012 01:04:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=Ub+tdAAF1Pm/p3Xw0FWXOBgzt8R2Ms+Pmj4z7EKHWog=; b=RekGRgcM6618wMgf49wgcqFhoELmomdacXkElHKMEm3xycmcj5WUSzFMqysm7uvcVG s99M8qeRjyYIWO0VF3WLU9zGFUER9UQkQI+8lPlYHgk1/QW9wa/Kr7xUG1tlg9DSlPvc QPD+HOcJbjt8/PUMd1Pz5UZx7nhe7bQW9SonA= MIME-Version: 1.0 Received: by 10.52.24.35 with SMTP id r3mr7090452vdf.81.1326099890248; Mon, 09 Jan 2012 01:04:50 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.52.36.5 with HTTP; Mon, 9 Jan 2012 01:04:50 -0800 (PST) In-Reply-To: <20120109061706.GC89781@elvis.mu.org> References: <201201082035.q08KZjL5024434@gw.catspoiler.org> <20120109061706.GC89781@elvis.mu.org> Date: Mon, 9 Jan 2012 01:04:50 -0800 X-Google-Sender-Auth: 2VxS6bugMi4BZDJrRenc4Wh-4IM Message-ID: From: Adrian Chadd To: Alfred Perlstein Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: arch@freebsd.org, Don Lewis Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 09:04:51 -0000 .. 
doesn't linux swap have some metadata somewhere? Adrian On 8 January 2012 22:17, Alfred Perlstein wrote: > This is cool, it does seem to beg for a function instead of inlining > the logic in two places in case someone wants to add even more logic > to it. > > * Don Lewis [120108 13:14] wrote: >> I've got a machine that is set up to dual boot both FreeBSD and Linux. >> It is also disk space impaired, so to make the best use possible of the >> available space, I have FreeBSD set up to swap to the Linux swap >> partition. Until now I haven't had working crash dumps because geom >> didn't permit crash dumps to Linux swap partitions. This patch removes >> that limitation. This could be useful for users of laptops who boot >> multiple operating systems. >> >> >> Index: sys/geom/part/g_part_ebr.c >> =================================================================== >> --- sys/geom/part/g_part_ebr.c (revision 229800) >> +++ sys/geom/part/g_part_ebr.c (working copy) >> @@ -333,9 +333,10 @@ >> { >> struct g_part_ebr_entry *entry; >> >> - /* Allow dumping to a FreeBSD partition only. */ >> + /* Allow dumping to a FreeBSD partition or Linux swap partition only. */ >> entry = (struct g_part_ebr_entry *)baseentry; >> - return ((entry->ent.dp_typ == DOSPTYP_386BSD) ? 1 : 0); >> + return ((entry->ent.dp_typ == DOSPTYP_386BSD || >> + entry->ent.dp_typ == DOSPTYP_LINSWP) ? 
1 : 0); >> } >> >> #if defined(GEOM_PART_EBR_COMPAT) >> Index: sys/geom/part/g_part_mbr.c >> =================================================================== >> --- sys/geom/part/g_part_mbr.c (revision 229800) >> +++ sys/geom/part/g_part_mbr.c (working copy) >> @@ -304,9 +304,10 @@ >> { >> struct g_part_mbr_entry *entry; >> >> - /* Allow dumping to a FreeBSD partition only. */ >> + /* Allow dumping to a FreeBSD partition or Linux swap partition only. */ >> entry = (struct g_part_mbr_entry *)baseentry; >> - return ((entry->ent.dp_typ == DOSPTYP_386BSD) ? 1 : 0); >> + return ((entry->ent.dp_typ == DOSPTYP_386BSD || >> + entry->ent.dp_typ == DOSPTYP_LINSWP) ? 1 : 0); >> } >> >> static int >> >> >> >> Is anyone else disturbed by the foot shooting potential of allowing >> crash dumps to be written to 386BSD partitions? 
>> >> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > > -- > - Alfred Perlstein > .- VMOA #5191, 03 vmax, 92 gs500, 85 ch250, 07 zx10 > .- FreeBSD committer > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 09:11:15 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 29C5C106566C; Mon, 9 Jan 2012 09:11:15 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id EAF568FC12; Mon, 9 Jan 2012 09:11:14 +0000 (UTC) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id q099B605025369; Mon, 9 Jan 2012 01:11:10 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <201201090911.q099B605025369@gw.catspoiler.org> Date: Mon, 9 Jan 2012 01:11:06 -0800 (PST) From: Don Lewis To: adrian@FreeBSD.org In-Reply-To: MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: arch@FreeBSD.org, alfred@FreeBSD.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 09:11:15 -0000 On 9 Jan, Adrian Chadd wrote: > .. doesn't linux swap have some metadata somewhere? Darned if I know, but it doesn't seem to care about FreeBSD swap data overwriting its swap partition. 
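On the metadata question above: as far as I know, Linux keeps its swap header in the first page of the partition, with a 10-byte signature ("SWAPSPACE2", or "SWAP-SPACE" for the old format) in the last 10 bytes of that page. The following is only a userland sketch of checking for that signature; the function name has_linux_swap_signature and the idea of passing in the page size are illustrative assumptions, not anything taken from the patch or from an existing FreeBSD tool.

```c
#include <stddef.h>
#include <string.h>

#define SWAP_SIG_LEN 10	/* length of "SWAPSPACE2" and "SWAP-SPACE" */

/*
 * Hypothetical helper: returns 1 if the buffer, which is assumed to hold
 * the first page (page_size bytes) of a partition, ends with one of the
 * Linux swap signatures, and 0 otherwise.
 */
static int
has_linux_swap_signature(const unsigned char *first_page, size_t page_size)
{
	const unsigned char *sig;

	if (first_page == NULL || page_size < SWAP_SIG_LEN)
		return (0);

	/* The signature sits in the last 10 bytes of the first page. */
	sig = first_page + page_size - SWAP_SIG_LEN;
	return (memcmp(sig, "SWAPSPACE2", SWAP_SIG_LEN) == 0 ||
	    memcmp(sig, "SWAP-SPACE", SWAP_SIG_LEN) == 0);
}
```

A check like this, run before and after a dump, would show whether the header survived; the page size to pass in depends on the Linux side (commonly 4096, but that is an assumption too).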
From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 09:16:47 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35]) by hub.freebsd.org (Postfix) with ESMTP id 76BAA106564A; Mon, 9 Jan 2012 09:16:47 +0000 (UTC) (envelope-from dougb@FreeBSD.org) Received: from 172-17-198-245.globalsuite.net (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 39C3914FBCE; Mon, 9 Jan 2012 09:16:46 +0000 (UTC) Message-ID: <4F0AB07D.1060208@FreeBSD.org> Date: Mon, 09 Jan 2012 01:16:45 -0800 From: Doug Barton Organization: http://SupersetSolutions.com/ User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20111222 Thunderbird/9.0 MIME-Version: 1.0 To: Don Lewis References: <201201090911.q099B605025369@gw.catspoiler.org> In-Reply-To: <201201090911.q099B605025369@gw.catspoiler.org> X-Enigmail-Version: undefined OpenPGP: id=1A1ABC84 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org, adrian@FreeBSD.org, alfred@FreeBSD.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 09:16:47 -0000 On 01/09/2012 01:11, Don Lewis wrote: > On 9 Jan, Adrian Chadd wrote: >> .. doesn't linux swap have some metadata somewhere? > > Darned if I know, but it doesn't seem to care about FreeBSD swap data > overwriting its swap partition. Have you had to do anything special for linux boot? I multi-boot myself and would love to be able to save space on my laptop by only having one universal swap partition. I started to look at doing this but found various docs that said don't unless you are able to recreate the metadata that Adrian referenced above. Doug -- You can observe a lot just by watching. 
-- Yogi Berra Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/ From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 09:21:42 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A2B791065676 for ; Mon, 9 Jan 2012 09:21:42 +0000 (UTC) (envelope-from nvass@gmx.com) Received: from mailout-eu.gmx.com (mailout-eu.gmx.com [213.165.64.42]) by mx1.freebsd.org (Postfix) with SMTP id EF1048FC14 for ; Mon, 9 Jan 2012 09:21:41 +0000 (UTC) Received: (qmail invoked by alias); 09 Jan 2012 09:21:40 -0000 Received: from adsl-32.46.190.8.tellas.gr (EHLO [192.168.73.192]) [46.190.8.32] by mail.gmx.com (mp-eu001) with SMTP; 09 Jan 2012 10:21:40 +0100 X-Authenticated: #46156728 X-Provags-ID: V01U2FsdGVkX18cjV8o0zAcIYcfJCLq7iLekz88co9X+fc6ULgm9u DFrqDaBPwwGNSF Message-ID: <4F0AB19C.6000601@gmx.com> Date: Mon, 09 Jan 2012 11:21:32 +0200 From: Nikos Vassiliadis User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Don Lewis References: <201201090911.q099B605025369@gw.catspoiler.org> In-Reply-To: <201201090911.q099B605025369@gw.catspoiler.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Y-GMX-Trusted: 0 Cc: arch@FreeBSD.org, adrian@FreeBSD.org, alfred@FreeBSD.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 09:21:42 -0000 On 1/9/2012 11:11 AM, Don Lewis wrote: > On 9 Jan, Adrian Chadd wrote: >> .. doesn't linux swap have some metadata somewhere? 
> > Darned if I know, but it doesn't seem to care about FreeBSD swap data > overwriting its swap partition. Linux will not use the swap partition without the metadata. And these metadata are located at the start of the partition, that is, dumping core there will surely destroy them. Perhaps you can add a warning in the dumpon manual page, that the swap metadata must be re-created after a coredump? Nikos From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 09:23:51 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 844CF1065672; Mon, 9 Jan 2012 09:23:51 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 530AC8FC12; Mon, 9 Jan 2012 09:23:51 +0000 (UTC) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id q099NhEp025399; Mon, 9 Jan 2012 01:23:47 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <201201090923.q099NhEp025399@gw.catspoiler.org> Date: Mon, 9 Jan 2012 01:23:43 -0800 (PST) From: Don Lewis To: alfred@FreeBSD.org In-Reply-To: <20120109061706.GC89781@elvis.mu.org> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: arch@FreeBSD.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 09:23:51 -0000 On 8 Jan, Alfred Perlstein wrote: > This is cool, it does seem to beg for a function instead of inlining > the logic in two places in case someone wants to add even more logic > to it. That would be a bit messy because the two variations of this function use different partition entry structures.
I believe that when these functions are called, geom has already tasted the partitions, so we might be able to use a common helper function to look for the presence of a BSD label. From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 09:25:33 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35]) by hub.freebsd.org (Postfix) with ESMTP id AFB7D106566B; Mon, 9 Jan 2012 09:25:33 +0000 (UTC) (envelope-from dougb@FreeBSD.org) Received: from 172-17-198-245.globalsuite.net (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 6F09914D968; Mon, 9 Jan 2012 09:25:32 +0000 (UTC) Message-ID: <4F0AB28B.9070003@FreeBSD.org> Date: Mon, 09 Jan 2012 01:25:31 -0800 From: Doug Barton Organization: http://SupersetSolutions.com/ User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20111222 Thunderbird/9.0 MIME-Version: 1.0 To: Nikos Vassiliadis References: <201201090911.q099B605025369@gw.catspoiler.org> <4F0AB19C.6000601@gmx.com> In-Reply-To: <4F0AB19C.6000601@gmx.com> X-Enigmail-Version: undefined OpenPGP: id=1A1ABC84 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: arch@FreeBSD.org, Don Lewis , alfred@FreeBSD.org, adrian@FreeBSD.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 09:25:33 -0000 On 01/09/2012 01:21, Nikos Vassiliadis wrote: > On 1/9/2012 11:11 AM, Don Lewis wrote: >> On 9 Jan, Adrian Chadd wrote: >>> .. doesn't linux swap have some metadata somewhere? >> >> Darned if I know, but it doesn't seem to care about FreeBSD swap data >> overwriting its swap partition. > > Linux will not use the swap partition without the metadata. 
And > these metadata are located to the start of the partition, that > is, dumping core there will surely destroy them. Actually I'm fairly confident that we write dumps backwards from the end of the swap partition. It's done that way on purpose in case fsck'ing causes the system to swap, it may still be possible to save the dump. Doug -- You can observe a lot just by watching. -- Yogi Berra Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/ From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 09:28:51 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C0C771065673; Mon, 9 Jan 2012 09:28:51 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 4C5F58FC08; Mon, 9 Jan 2012 09:28:51 +0000 (UTC) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id q099Sfdh025417; Mon, 9 Jan 2012 01:28:45 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <201201090928.q099Sfdh025417@gw.catspoiler.org> Date: Mon, 9 Jan 2012 01:28:41 -0800 (PST) From: Don Lewis To: nvass@gmx.com In-Reply-To: <4F0AB19C.6000601@gmx.com> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: arch@FreeBSD.org, adrian@FreeBSD.org, alfred@FreeBSD.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 09:28:51 -0000 On 9 Jan, Nikos Vassiliadis wrote: > On 1/9/2012 11:11 AM, Don Lewis wrote: >> On 9 Jan, Adrian Chadd wrote: >>> .. doesn't linux swap have some metadata somewhere? 
>> >> Darned if I know, but it doesn't seem to care about FreeBSD swap data >> overwriting its swap partition. > > Linux will not use the swap partition without the metadata. And > these metadata are located to the start of the partition, that > is, dumping core there will surely destroy them. Don't we write the crash dump at the end of the partition? I thought we did this to make it less likely that we would overwrite the crash dump by using swap before savecore had a chance to run. > Perhaps you can add a warning in the dumpon manual page, that the > swap metadata must be re-created after a coredump? From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 09:37:57 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3E9A4106566C; Mon, 9 Jan 2012 09:37:57 +0000 (UTC) (envelope-from lists@yamagi.org) Received: from mail.yamagi.org (unknown [IPv6:2a01:4f8:121:2102:1::7]) by mx1.freebsd.org (Postfix) with ESMTP id CBF288FC0A; Mon, 9 Jan 2012 09:37:56 +0000 (UTC) Received: from happy.home.yamagi.org (f054056137.adsl.alicedsl.de [78.54.56.137]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.yamagi.org (Postfix) with ESMTPSA id 7C1701666334; Mon, 9 Jan 2012 10:37:54 +0100 (CET) Date: Mon, 9 Jan 2012 10:37:47 +0100 From: Yamagi Burmeister To: kostikbel@gmail.com Message-Id: <20120109103747.578d4e44.lists@yamagi.org> In-Reply-To: <20120108195913.GI31224@deviant.kiev.zoral.com.ua> References: <20111226220756.GR50300@deviant.kiev.zoral.com.ua> <20120102063700.GF50300@deviant.kiev.zoral.com.ua> <20120108174112.50e030ba.lists@yamagi.org> <20120108195913.GI31224@deviant.kiev.zoral.com.ua> X-Mailer: Sylpheed 3.1.2 (GTK+ 2.24.6; amd64-portbld-freebsd9.0) Mime-Version: 1.0 Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg="PGP-SHA1"; 
boundary="Signature=_Mon__9_Jan_2012_10_37_47_+0100_HicVJVGVxC0j/NuV" Cc: amd64@freebsd.org, arch@freebsd.org Subject: Re: AVX X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 09:37:57 -0000 --Signature=_Mon__9_Jan_2012_10_37_47_+0100_HicVJVGVxC0j/NuV Content-Type: text/plain; charset=US-ASCII Content-Disposition: inline Content-Transfer-Encoding: quoted-printable First, thank you for working on AVX, it's much appreciated. On Sun, 8 Jan 2012 21:59:13 +0200 Kostik Belousov wrote: > > CPU: Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz (2194.55-MHz > > K8-class CPU) Origin = "GenuineIntel" Id = 0x1067a Family = 6 Model > > = 17 Stepping = 10 > > Features=0xbfebfbff > MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> > > Features2=0x408e3bd > PDCM,SSE4.1,XSAVE> > > AMD Features=0x20100800 AMD Features2=0x1 > Is this Features excerpt from the patched kernel, or from pristine svn > sources ? If the later, please show me the Features from the patched > kernel. That was the output of the patched kernel. > I thought that I correctly handled savectx, but apparently I did not. > The issue for sleep enter could be fixed by the avx.4.patch, I am not > sure about shutdown -r panic. > > http://people.freebsd.org/~kib/misc/avx.4.patch Both panics are gone. The system goes into suspend just fine and even resumes. And no more panics at reboot.
--=20 Homepage: www.yamagi.org XMPP: yamagi@yamagi.org GnuPG/GPG: 0xEFBCCBCB --Signature=_Mon__9_Jan_2012_10_37_47_+0100_HicVJVGVxC0j/NuV Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk8KtXIACgkQWTjlg++8y8vr0wCgq2Ou7DqWxfGAPqOU5psR0flm o3oAoJaZygWcC4WJQAKdCgYiDYnzpZaI =6Q6W -----END PGP SIGNATURE----- --Signature=_Mon__9_Jan_2012_10_37_47_+0100_HicVJVGVxC0j/NuV-- From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 09:41:22 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E0D52106567A for ; Mon, 9 Jan 2012 09:41:22 +0000 (UTC) (envelope-from nvass@gmx.com) Received: from mailout-eu.gmx.com (mailout-eu.gmx.com [213.165.64.42]) by mx1.freebsd.org (Postfix) with SMTP id 3A9778FC1D for ; Mon, 9 Jan 2012 09:41:22 +0000 (UTC) Received: (qmail invoked by alias); 09 Jan 2012 09:41:19 -0000 Received: from adsl-32.46.190.8.tellas.gr (EHLO [192.168.73.192]) [46.190.8.32] by mail.gmx.com (mp-eu002) with SMTP; 09 Jan 2012 10:41:19 +0100 X-Authenticated: #46156728 X-Provags-ID: V01U2FsdGVkX18wAAttakYw5D2rDH6NuuXv4y3aEEAXPW94/BGPb4 O7fzWmpb+hOsd/ Message-ID: <4F0AB63D.2040503@gmx.com> Date: Mon, 09 Jan 2012 11:41:17 +0200 From: Nikos Vassiliadis User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 MIME-Version: 1.0 To: Doug Barton References: <201201090911.q099B605025369@gw.catspoiler.org> <4F0AB19C.6000601@gmx.com> <4F0AB28B.9070003@FreeBSD.org> In-Reply-To: <4F0AB28B.9070003@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Y-GMX-Trusted: 0 Cc: arch@FreeBSD.org, Don Lewis , alfred@FreeBSD.org, adrian@FreeBSD.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD 
architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 09:41:23 -0000 On 1/9/2012 11:25 AM, Doug Barton wrote: > Actually I'm fairly confident that we write dumps backwards from the end > of the swap partition. It's done that way on purpose in case fsck'ing > causes the system to swap, it may still be possible to save the dump. So, dumping core is safe, but not sharing the swap area... It would be nice to be able to do that. Nikos From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 10:09:34 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4DF54106566C; Mon, 9 Jan 2012 10:09:34 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id EF92D8FC0C; Mon, 9 Jan 2012 10:09:33 +0000 (UTC) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id q09A9NQb025487; Mon, 9 Jan 2012 02:09:27 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <201201091009.q09A9NQb025487@gw.catspoiler.org> Date: Mon, 9 Jan 2012 02:09:23 -0800 (PST) From: Don Lewis To: nvass@gmx.com In-Reply-To: <4F0AB63D.2040503@gmx.com> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: alfred@FreeBSD.org, arch@FreeBSD.org, adrian@FreeBSD.org, dougb@FreeBSD.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 10:09:34 -0000 On 9 Jan, Nikos Vassiliadis wrote: > On 1/9/2012 11:25 AM, Doug Barton wrote: >> Actually I'm fairly confident that we write dumps backwards from the end >> of the swap 
partition. It's done that way on purpose in case fsck'ing >> causes the system to swap, it may still be possible to save the dump. > > So, dumping core is safe, but not sharing the swap area... > It would be nice to be able to do that. According to the mkswap(8) man page (which hasn't been updated since 2.2 even though the machine is running a 2.6 kernel) on a nearby Linux machine, the metadata is stored in the first page of the swap partition. It looks like we could safely coexist if we skipped the first page of the partition. Otherwise Linux will want mkswap to be run on the partition before it will swap to the partition. Dunno why it never caused problems for me ... BTW, partition type 0x82 was also used for Solaris x86 before 2005. Hopefully nobody will accidentally overwrite their old Solaris partition with a FreeBSD crash dump ;-) From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 13:48:27 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7DBD0106566B; Mon, 9 Jan 2012 13:48:27 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 52DE08FC15; Mon, 9 Jan 2012 13:48:27 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id 070D746B06; Mon, 9 Jan 2012 08:48:27 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 8A814B967; Mon, 9 Jan 2012 08:48:26 -0500 (EST) From: John Baldwin To: freebsd-arch@freebsd.org Date: Mon, 9 Jan 2012 07:57:29 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201201090757.29250.jhb@freebsd.org> X-Greylist: Sender succeeded
SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 09 Jan 2012 08:48:26 -0500 (EST) Cc: Adrian Chadd , Stefan Bethke Subject: Re: Where should I put ar71xx_* modules? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 13:48:27 -0000 On Monday, January 09, 2012 4:04:14 am Adrian Chadd wrote: > On 9 January 2012 00:52, Stefan Bethke wrote: > > We have lots of modules that have specific requirements; they're still connected to the build. What warrants different handling here? You could put them under .if ${MACHINE_CPUARCH} != "mips" like a number of them are already. > > Hm, I didn't want to add modules for a specific SoC that won't ever be > built for any other platform. Eg, if someone does a XLR build, they > shouldn't get ar71xx modules built. Were you planning on including them in the ar71xx kernel configs via MODULES_OVERRIDE or some such? If so, that would be sufficient to get 'make tinderbox' to cover them at least (and hopefully tinderbox builds). In that case I think it is fine to not have them hooked up in the main sys/modules build. Or rather, if Warner commits his KERNOPTS thing, perhaps you could make sys/modules/Makefile include them on appropriate SoC kernels only. 
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 13:48:30 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EC757106566B; Mon, 9 Jan 2012 13:48:29 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id AAF448FC08; Mon, 9 Jan 2012 13:48:29 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id 46C4B46B58; Mon, 9 Jan 2012 08:48:29 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id AA1CBB991; Mon, 9 Jan 2012 08:48:28 -0500 (EST) From: John Baldwin To: Giovanni Trematerra Date: Mon, 9 Jan 2012 08:48:25 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201201090848.25736.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 09 Jan 2012 08:48:28 -0500 (EST) Cc: flo@freebsd.org, Attilio Rao , Konstantin Belousov , freebsd-arch@freebsd.org, jilles@freebsd.org Subject: Re: pipe/fifo code merged. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 13:48:30 -0000 On Saturday, January 07, 2012 9:35:47 pm Giovanni Trematerra wrote: > Hi, > the patch at > http://www.trematerra.net/patches/pipefifo_merge2.diff > > is a preliminary version of the FIFO optimizations project that I picked up from > the wiki. 
> http://wiki.freebsd.org/IdeasPage#FIFO_optimizations_.28GSoC.29 > > zhaoshuai@ produced the following patch in the 2009 which attempted a first > merge of the interfaces: > http://www.trematerra.net/patches/fifo_soc2009.diff > > However I felt like the work was not yet completed and come up with my final > version. > Now fifoes derive their structures from pipes one with just special handling > to support VFS operations. > All the operations but the creation/destruction for fifoes and pipes are handled > by the same code. > The heart of the patch is the new struct pipeinfo. > pipeinfo is a per-file descriptor state. Basically it maintains a read end and > a write end for the descriptor. As pipes are bidirectional in FreeBSD, for a > pipe this two fields are always equal but different for a fifo. > To let fifo code in sys/fs/fifofs/fifo_vnops.c create/destroy the pipe, two > functions (pipe_ctor/pipe_dtor) were written. pipe_ctor setups things like a > call to kern_pipe and return a pipeinfo structure, while pipe_dtor releases > all the resources for a given pipeinfo. Once a pipe was setup during a fifo_open > call, all the subsequent operations on the fifo are handled by the same code > of a pipe expect for the clean up code that calls pipe_dtor. > Allocation of two pipeinfo structures for a pipe were showed to slow down > things by some micro-benchmarking. To speed up things during > creation/destruction of pipes, the patch allocates all the needed data structure > zone using the umapipe struct that packing together all the needed data > structures to be allocated at pipe creation. A similar umafifo structure is > used for fifoes. > Thanks to jilles that made a review of the patch in a previous form, privately. > Thanks a lot to attilio that answered my stupid questions and drove me in the > right direction. Thanks for taking this on. 
In general I think this looks good, but I had a few comments:

- Why did you move setting the timestamps of pipes from the UMA ctor to an init routine? This seems wrong. The init routine is only invoked when memory is first allocated to a slab to create a set of umapipe or umafifo structures. However, that umapipe/fifo may be reused multiple times, all with the same timestamp. Setting the timestamp in the ctor routine means it is set each time a pipepair or fifo is created, which seems more appropriate. Similarly with the inode value.

- I would maybe rename pipe_ctor() to fifo_ctor(), or otherwise adjust the name to note that it is only used to create a FIFO, not a normal pipe pair (which its name implies).

- s/socket/pipe/ in the FIONREAD comment in sys_pipe.c.

- Two extra blank lines in the patch that I think should be reverted:

--- a/sys/fs/fifofs/fifo_vnops.c Sun Jan 01 19:00:13 2012 +0100
+++ b/sys/fs/fifofs/fifo_vnops.c Mon Jan 09 00:13:55 2012 +0100
 int fi_wgen;
 };
+
 static vop_print_t fifo_print;
 static vop_open_t fifo_open;
....
@@ -1598,13 +1839,20 @@ pipe_kqfilter(struct file *fp, struct kn return (EPIPE); } cpipe = cpipe->pipe_peer; + break; default: PIPE_UNLOCK(cpipe); return (EINVAL); -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 14:34:31 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 54EB31065670; Mon, 9 Jan 2012 14:34:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id 9AD4A8FC0A; Mon, 9 Jan 2012 14:34:27 +0000 (UTC) Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q09EYNbv001127 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 10 Jan 2012 01:34:24 +1100 Date: Tue, 10 Jan 2012 01:34:23 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Giovanni Trematerra In-Reply-To: Message-ID: <20120110005155.S2378@besplex.bde.org> References: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: jilles@FreeBSD.org, Attilio Rao , flo@FreeBSD.org, Konstantin Belousov , freebsd-arch@FreeBSD.org Subject: Re: pipe/fifo code merged. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 14:34:31 -0000 On Sun, 8 Jan 2012, Giovanni Trematerra wrote: > Hi, > the patch at > http://www.trematerra.net/patches/pipefifo_merge2.diff > > is a preliminary version of the FIFO optimizations project that I picked up from > the wiki. > http://wiki.freebsd.org/IdeasPage#FIFO_optimizations_.28GSoC.29 I would go the other way, and pessimize pipes to be like fifos. 
Then optimize the socket layer under both.

Fifos are not important, but they are implemented on top of the socket layer which is important. Pipes are important.

In 4.4BSD, pipes were implemented on top of the socket layer too. This was much simpler than for fifos -- pipe() was just a wrapper that took a whole 44 lines, while fifofs took 602 lines. Now, fifofs still only takes 753 lines, but sys_pipe.c takes 1671 lines. pipe() is similar to socketpair(), but even simpler. socketpair() took 62 lines in 4.4BSD. It still takes only 81 lines (the extras are mainly for splitting it into sys_socketpair() and kern_socketpair()).

The pipe optimizations in FreeBSD originated in 1996. They are good locally, but may have inhibited more useful optimizations in the socket layer.

For the socket layer, there is the ZERO_COPY_SOCKETS option. This gives optimizations related to the ones for pipes. I have no experience with it. It seems to be only for hardware sockets. It is apparently not very popular or well maintained, since it isn't in any GENERIC.

The socket layer provides some fancy ioctls that might be useful and even work for anything implemented on top of sockets. The ones for controlling socket buffer sizes and watermarks are most interesting. I don't know if the fifo wrapper does anything to prevent passing these to the socket layer.

For pipes, there are no fancy ioctls. The pipe code uses heuristics and the hard-coded value PIPE_MINDIRECT to decide whether it should try to optimize for small writes or large writes. These mostly work, but don't provide as much control as the socket ioctls.

I once did a lot of benchmarking of FreeBSD pipe i/o vs Linux pipe i/o. Linux is much faster for small blocks and FreeBSD is much faster for large blocks provided they are not so large as to bust caches. This is because although the FreeBSD options for direct writes work, they have large overheads, and FreeBSD has much larger overheads generally.
If the application could control the mode, then the overheads could be reduced by switching to completely different code (and if you want socket ioctls, even to the socket code). But this would be very complicated.

Linux-2.6.10 implements fifos as a small wrapper around pipes, while FreeBSD implements them as a large wrapper around sockets. I hope the former is what you do -- share most pipe code, without making it more complicated, and with making the fifo wrapper much simpler. The Linux code is much simpler and smaller, since for pipes it doesn't implement direct mode, and for fifos it doesn't have to interact with the complicated socket layer.

Bruce

From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 15:00:32 2012 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A5A1D106564A for ; Mon, 9 Jan 2012 15:00:32 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 1C4828FC08 for ; Mon, 9 Jan 2012 15:00:31 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q09EkO0m015010 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 9 Jan 2012 16:46:24 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q09EkNeO052979; Mon, 9 Jan 2012 16:46:23 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q09EkNC4052978; Mon, 9 Jan 2012 16:46:23 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 9 Jan 2012 16:46:23 +0200 From: Kostik Belousov To: Bruce Evans
Message-ID: <20120109144623.GW31224@deviant.kiev.zoral.com.ua> References: <20120110005155.S2378@besplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6cExhHXXDEBW2NKZ" Content-Disposition: inline In-Reply-To: <20120110005155.S2378@besplex.bde.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: jilles@FreeBSD.org, Attilio Rao , flo@FreeBSD.org, Giovanni Trematerra , freebsd-arch@FreeBSD.org Subject: Re: pipe/fifo code merged. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Jan 2012 15:00:32 -0000 --6cExhHXXDEBW2NKZ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Tue, Jan 10, 2012 at 01:34:23AM +1100, Bruce Evans wrote: > For the socket layer, there is the ZERO_COPY_SOCKETS options. This > gives optimizations related to the ones for pipes. I have no experience > with it. It seems to be only for hardware sockets. It is apparently > not very popular or well maintained, since it isn't an any GENERIC. It is known to be (very) broken with regard to vnode-backed mappings. AFAIR, after the COW kicks in, the buffer code operates on the wrong page. In the best case this results in a kernel panic; in the worst, user data is corrupted.
--6cExhHXXDEBW2NKZ Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk8K/b8ACgkQC3+MBN1Mb4g28ACeJFiMeJoj0HrHf0AZVpc0xybn yG4An2BspEMgFbnA6JoSfEMU0XZbkSv7 =ek7t -----END PGP SIGNATURE----- --6cExhHXXDEBW2NKZ-- From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 17:06:02 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 18859106566B for ; Mon, 9 Jan 2012 17:06:02 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com [209.85.215.182]) by mx1.freebsd.org (Postfix) with ESMTP id 8D4E88FC15 for ; Mon, 9 Jan 2012 17:06:01 +0000 (UTC) Received: by eaaf13 with SMTP id f13so2989505eaa.13 for ; Mon, 09 Jan 2012 09:06:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=from:to:cc:subject:sender:date:message-id:user-agent:mime-version :content-type; bh=NL4w91vVixuFEW4kUO1WOHgNyCEgOjDCmcwVauwN/O8=; b=Y1AIt6c0inwKZGHpgdf2BlVaDxRgSawFHtqXrx9/d8H8N3OmsSYlUL+ftAT7JY9BTA tiosYdHdjURVhdGczKSqylSJq/GYCamQCbY6qWJXTB8drEKcPzlnchSJVAjXTsusb63C 9dQI2bM8h6L1IVsiTgqyAPPqa3+xDPeiijvDY= Received: by 10.205.121.138 with SMTP id gc10mr7509921bkc.3.1326127076484; Mon, 09 Jan 2012 08:37:56 -0800 (PST) Received: from localhost ([95.69.173.122]) by mx.google.com with ESMTPS id l20sm28667551bkv.5.2012.01.09.08.37.54 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 09 Jan 2012 08:37:55 -0800 (PST) From: Mikolaj Golub To: arch@freebsd.org Sender: Mikolaj Golub Date: Mon, 09 Jan 2012 18:37:52 +0200 Message-ID: <86sjjobzmn.fsf@kopusha.home.net> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Cc: Robert Watson , Kostik Belousov Subject: unix domain sockets on nullfs(5) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 Jan 2012 17:06:02 -0000

--=-=-=

Hi,

There is a longstanding problem with nullfs(5): unix sockets do not
work between the lower and upper layers.

See, e.g., kern/51583 and kern/159663.

When a unix socket is bound, the created socket is referenced in the
vnode field v_socket. This field is used on connect (from the vnode
returned by lookup). Unix socket functions like unp_bind/connect
set/access this field directly.

This is the issue for nullfs, which uses a two-layer vnode approach:
when binding to the upper layer, the socket reference is stored in the
upper vnode; when binding to the lower fs, the socket reference is
stored in the lower vnode and is not seen from the upper layer.

E.g., having /mnt/upper nullfs mounted on /mnt/lower:

1) if we bind to /mnt/lower/test.sock we can connect only to
   /mnt/lower/test.sock.

2) if we bind to /mnt/upper/test.sock we can connect only to
   /mnt/upper/test.sock.

The desired behavior is that one can connect to both the lower and the
upper paths regardless of whether we bind to /mnt/lower/test.sock or
/mnt/upper/test.sock.

In kern/159663 two approaches were discussed:

1) copy the socket pointer from the lower vnode to the upper vnode on
   the upper vnode get (this fixes the case when one binds to the lower
   fs and wants to connect via the upper, but does not fix the case when
   one binds to the upper and wants to connect via the lower fs);

2) make null_lookup/create return the lower vnode for VSOCK vnodes.

Both approaches have issues and look rather hackish.

kib@ suggested that the issue could be fixed if one added new VOP_*
operations for setting and accessing the vnode's v_socket field.

The attached patch implements this. It can also be found here:

http://people.freebsd.org/~trociny/nullfs.VOP_UNP.4.patch

It adds three VOP_* operations: VOP_UNPBIND, VOP_UNPCONNECT and
VOP_UNPDETACH.
Their purpose can be understood from the modifications in
uipc_usrreq.c:

- vp->v_socket = unp->unp_socket;
+ VOP_UNPBIND(vp, unp->unp_socket);

- so2 = vp->v_socket;
+ VOP_UNPCONNECT(vp, &so2);

- unp->unp_vnode->v_socket = NULL;
+ VOP_UNPDETACH(unp->unp_vnode);

The default functions just do these simple operations, while
filesystems like nullfs can do more complicated things.

The patch also implements functions for nullfs. By default the old
behavior is preserved. To get the new behavior the filesystem should
be (re)mounted with the sobypass option. Then the socket operations are
bypassed to the lower vnode, which makes the socket accessible from
both layers.

I am very interested to hear other people's opinions on this.

--
Mikolaj Golub

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=nullfs.VOP_UNP.4.patch

Index: sys/sys/vnode.h
===================================================================
--- sys/sys/vnode.h	(revision 229701)
+++ sys/sys/vnode.h	(working copy)
@@ -695,6 +695,9 @@ int vop_stdpathconf(struct vop_pathconf_args *);
 int vop_stdpoll(struct vop_poll_args *);
 int vop_stdvptocnp(struct vop_vptocnp_args *ap);
 int vop_stdvptofh(struct vop_vptofh_args *ap);
+int vop_stdunpbind(struct vop_unpbind_args *ap);
+int vop_stdunpconnect(struct vop_unpconnect_args *ap);
+int vop_stdunpdetach(struct vop_unpdetach_args *ap);
 int vop_eopnotsupp(struct vop_generic_args *ap);
 int vop_ebadf(struct vop_generic_args *ap);
 int vop_einval(struct vop_generic_args *ap);
Index: sys/kern/uipc_usrreq.c
===================================================================
--- sys/kern/uipc_usrreq.c	(revision 229701)
+++ sys/kern/uipc_usrreq.c	(working copy)
@@ -542,7 +542,7 @@ restart:
 
 	UNP_LINK_WLOCK();
 	UNP_PCB_LOCK(unp);
-	vp->v_socket = unp->unp_socket;
+	VOP_UNPBIND(vp, unp->unp_socket);
 	unp->unp_vnode = vp;
 	unp->unp_addr = soun;
 	unp->unp_flags &= ~UNP_BINDING;
@@ -638,7 +638,7 @@ uipc_detach(struct socket *so)
 	 * XXXRW: Should assert vp->v_socket == so.
 	 */
 	if ((vp = unp->unp_vnode) != NULL) {
-		unp->unp_vnode->v_socket = NULL;
+		VOP_UNPDETACH(vp);
 		unp->unp_vnode = NULL;
 	}
 	unp2 = unp->unp_conn;
@@ -1308,7 +1308,7 @@ unp_connect(struct socket *so, struct sockaddr *na
 	 * and to protect simultaneous locking of multiple pcbs.
 	 */
 	UNP_LINK_WLOCK();
-	so2 = vp->v_socket;
+	VOP_UNPCONNECT(vp, &so2);
 	if (so2 == NULL) {
 		error = ECONNREFUSED;
 		goto bad2;
Index: sys/kern/vfs_default.c
===================================================================
--- sys/kern/vfs_default.c	(revision 229701)
+++ sys/kern/vfs_default.c	(working copy)
@@ -123,6 +123,9 @@ struct vop_vector default_vnodeops = {
 	.vop_unlock = vop_stdunlock,
 	.vop_vptocnp = vop_stdvptocnp,
 	.vop_vptofh = vop_stdvptofh,
+	.vop_unpbind = vop_stdunpbind,
+	.vop_unpconnect = vop_stdunpconnect,
+	.vop_unpdetach = vop_stdunpdetach,
 };
 
 /*
@@ -1037,6 +1040,39 @@ vop_stdadvise(struct vop_advise_args *ap)
 	return (error);
 }
 
+int
+vop_stdunpbind(struct vop_unpbind_args *ap)
+{
+	struct vnode *vp;
+
+	vp = ap->a_vp;
+
+	vp->v_socket = ap->a_socket;
+	return (0);
+}
+
+int
+vop_stdunpconnect(struct vop_unpconnect_args *ap)
+{
+	struct vnode *vp;
+
+	vp = ap->a_vp;
+
+	*ap->a_socket = vp->v_socket;
+	return (0);
+}
+
+int
+vop_stdunpdetach(struct vop_unpdetach_args *ap)
+{
+	struct vnode *vp;
+
+	vp = ap->a_vp;
+
+	vp->v_socket = NULL;
+	return (0);
+}
+
 /*
  * vfs default ops
  * used to fill the vfs function table to get reasonable default return values.
Index: sys/kern/vnode_if.src
===================================================================
--- sys/kern/vnode_if.src	(revision 229701)
+++ sys/kern/vnode_if.src	(working copy)
@@ -639,3 +639,23 @@ vop_advise {
 	IN off_t end;
 	IN int advice;
 };
+
+%% unpbind vp E E E
+
+vop_unpbind {
+	IN struct vnode *vp;
+	IN struct socket *socket;
+};
+
+%% unpconnect vp E E E
+
+vop_unpconnect {
+	IN struct vnode *vp;
+	OUT struct socket **socket;
+};
+
+%% unpdetach vp E E E
+
+vop_unpdetach {
+	IN struct vnode *vp;
+};
Index: sys/fs/nullfs/null.h
===================================================================
--- sys/fs/nullfs/null.h	(revision 229701)
+++ sys/fs/nullfs/null.h	(working copy)
@@ -37,8 +37,15 @@
 struct null_mount {
 	struct mount *nullm_vfs;
 	struct vnode *nullm_rootvp;	/* Reference to root null_node */
+	uint64_t nullm_flags;		/* nullfs options specific for mount */
 };
 
+/*
+ * Flags stored in nullm_flags.
+ */
+#define NULLMNT_SOBYPASS 0x00000001	/* Bypass unix socket operations
+					   to lower vnode */
+
 #ifdef _KERNEL
 /*
  * A cache of vnode references
@@ -47,8 +54,16 @@ struct null_node {
 	LIST_ENTRY(null_node) null_hash;	/* Hash list */
 	struct vnode *null_lowervp;		/* VREFed once */
 	struct vnode *null_vnode;		/* Back pointer */
+	u_int null_flags;			/* Flags */
 };
 
+/*
+ * Flags stored in null_flags.
+ */
+
+#define NULL_SOBYPASS 0x00000001	/* Bypass unix socket operations
+					   to lower vnode */
+
 #define MOUNTTONULLMOUNT(mp) ((struct null_mount *)((mp)->mnt_data))
 #define VTONULL(vp) ((struct null_node *)(vp)->v_data)
 #define NULLTOV(xp) ((xp)->null_vnode)
Index: sys/fs/nullfs/null_vnops.c
===================================================================
--- sys/fs/nullfs/null_vnops.c	(revision 229701)
+++ sys/fs/nullfs/null_vnops.c	(working copy)
@@ -812,6 +812,52 @@ null_vptocnp(struct vop_vptocnp_args *ap)
 	return (error);
 }
 
+static int
+null_unpbind(struct vop_unpbind_args *ap)
+{
+	struct vnode *vp;
+	struct null_node *xp;
+	struct null_mount *xmp;
+
+	vp = ap->a_vp;
+	xp = VTONULL(vp);
+	xmp = MOUNTTONULLMOUNT(vp->v_mount);
+	if (xmp->nullm_flags & NULLMNT_SOBYPASS) {
+		xp->null_flags |= NULL_SOBYPASS;
+		return (null_bypass((struct vop_generic_args *)ap));
+	} else {
+		return (vop_stdunpbind(ap));
+	}
+}
+
+static int
+null_unpconnect(struct vop_unpconnect_args *ap)
+{
+	struct vnode *vp;
+	struct null_mount *xmp;
+
+	vp = ap->a_vp;
+	xmp = MOUNTTONULLMOUNT(vp->v_mount);
+	if (xmp->nullm_flags & NULLMNT_SOBYPASS)
+		return (null_bypass((struct vop_generic_args *)ap));
+	else
+		return (vop_stdunpconnect(ap));
+}
+
+static int
+null_unpdetach(struct vop_unpdetach_args *ap)
+{
+	struct vnode *vp;
+	struct null_node *xp;
+
+	vp = ap->a_vp;
+	xp = VTONULL(vp);
+	if (xp->null_flags & NULL_SOBYPASS)
+		return (null_bypass((struct vop_generic_args *)ap));
+	else
+		return (vop_stdunpdetach(ap));
+}
+
 /*
  * Global vfs data structures
  */
@@ -837,4 +883,7 @@ struct vop_vector null_vnodeops = {
 	.vop_unlock = null_unlock,
 	.vop_vptocnp = null_vptocnp,
 	.vop_vptofh = null_vptofh,
+	.vop_unpbind = null_unpbind,
+	.vop_unpconnect = null_unpconnect,
+	.vop_unpdetach = null_unpdetach,
 };
Index: sys/fs/nullfs/null_subr.c
===================================================================
--- sys/fs/nullfs/null_subr.c	(revision 229701)
+++ sys/fs/nullfs/null_subr.c	(working copy)
@@ -235,6 +235,7 @@ null_nodeget(mp, lowervp, vpp)
 
 	xp->null_vnode = vp;
 	xp->null_lowervp = lowervp;
+	xp->null_flags = 0;
 	vp->v_type = lowervp->v_type;
 	vp->v_data = xp;
 	vp->v_vnlock = lowervp->v_vnlock;
Index: sys/fs/nullfs/null_vfsops.c
===================================================================
--- sys/fs/nullfs/null_vfsops.c	(revision 229701)
+++ sys/fs/nullfs/null_vfsops.c	(working copy)
@@ -84,16 +84,26 @@ nullfs_mount(struct mount *mp)
 	if (mp->mnt_flag & MNT_ROOTFS)
 		return (EOPNOTSUPP);
 	/*
-	 * Update is a no-op
+	 * Update is supported only for some options.
 	 */
 	if (mp->mnt_flag & MNT_UPDATE) {
-		/*
-		 * Only support update mounts for NFS export.
-		 */
+		error = EOPNOTSUPP;
+		xmp = MOUNTTONULLMOUNT(mp);
+		if (vfs_flagopt(mp->mnt_optnew, "sobypass", NULL, 0)) {
+			MNT_ILOCK(mp);
+			xmp->nullm_flags |= NULLMNT_SOBYPASS;
+			MNT_IUNLOCK(mp);
+			error = 0;
+		}
+		if (vfs_flagopt(mp->mnt_optnew, "nosobypass", NULL, 0)) {
+			MNT_ILOCK(mp);
+			xmp->nullm_flags &= ~NULLMNT_SOBYPASS;
+			MNT_IUNLOCK(mp);
+			error = 0;
+		}
 		if (vfs_flagopt(mp->mnt_optnew, "export", NULL, 0))
-			return (0);
-		else
-			return (EOPNOTSUPP);
+			error = 0;
+		return (error);
 	}
 
 	/*
@@ -182,6 +192,11 @@ nullfs_mount(struct mount *mp)
 	MNT_ILOCK(mp);
 	mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag & MNTK_MPSAFE;
 	MNT_IUNLOCK(mp);
+
+	xmp->nullm_flags = 0;
+	vfs_flagopt(mp->mnt_optnew, "sobypass", &xmp->nullm_flags,
+	    NULLMNT_SOBYPASS);
+
 	mp->mnt_data = xmp;
 	vfs_getnewfsid(mp);
 
Index: sbin/mount_nullfs/mount_nullfs.c
===================================================================
--- sbin/mount_nullfs/mount_nullfs.c	(revision 229701)
+++ sbin/mount_nullfs/mount_nullfs.c	(working copy)
@@ -57,27 +57,36 @@ static const char rcsid[] =
 
 #include "mntopts.h"
 
+#define NULLOPT_SOBYPASS 0x00000001
+#define NULLOPT_MASK (NULLOPT_SOBYPASS)
+
 static struct mntopt mopts[] = {
 	MOPT_STDOPTS,
+	MOPT_UPDATE,
+	{"sobypass", 0, NULLOPT_SOBYPASS, 1},
 	MOPT_END
 };
 
+static char fstype[] = "nullfs";
+
 int subdir(const char *, const char *);
 static void usage(void) __dead2;
 
 int
 main(int argc, char *argv[])
 {
-	struct iovec iov[6];
-	int ch, mntflags;
+	struct iovec *iov;
+	int ch, iovlen, mntflags, nullflags, negflags;
 	char source[MAXPATHLEN];
 	char target[MAXPATHLEN];
 
-	mntflags = 0;
+	mntflags = nullflags = 0;
+	negflags = NULLOPT_MASK;
 	while ((ch = getopt(argc, argv, "o:")) != -1)
 		switch(ch) {
 		case 'o':
-			getmntopts(optarg, mopts, &mntflags, 0);
+			getmntopts(optarg, mopts, &mntflags, &nullflags);
+			getmntopts(optarg, mopts, &mntflags, &negflags);
 			break;
 		case '?':
 		default:
@@ -97,20 +106,18 @@ main(int argc, char *argv[])
 		errx(EX_USAGE, "%s (%s) and %s are not distinct paths",
 		    argv[0], target, argv[1]);
 
-	iov[0].iov_base = strdup("fstype");
-	iov[0].iov_len = sizeof("fstype");
-	iov[1].iov_base = strdup("nullfs");
-	iov[1].iov_len = strlen(iov[1].iov_base) + 1;
-	iov[2].iov_base = strdup("fspath");
-	iov[2].iov_len = sizeof("fspath");
-	iov[3].iov_base = source;
-	iov[3].iov_len = strlen(source) + 1;
-	iov[4].iov_base = strdup("target");
-	iov[4].iov_len = sizeof("target");
-	iov[5].iov_base = target;
-	iov[5].iov_len = strlen(target) + 1;
-
-	if (nmount(iov, 6, mntflags))
+	iov = NULL;
+	iovlen = 0;
+	build_iovec(&iov, &iovlen, "fstype", fstype, (size_t)-1);
+	build_iovec(&iov, &iovlen, "fspath", source, (size_t)-1);
+	build_iovec(&iov, &iovlen, "target", target, (size_t)-1);
+	if ((nullflags & NULLOPT_SOBYPASS) != 0)
+		build_iovec(&iov, &iovlen, "sobypass", NULL, 0);
+	if ((mntflags & MNT_UPDATE) != 0) {
+		if ((negflags & NULLOPT_SOBYPASS) == 0)
+			build_iovec(&iov, &iovlen, "nosobypass", NULL, 0);
+	}
+	if (nmount(iov, iovlen, mntflags))
 		err(1, NULL);
 	exit(0);
 }
Index: sbin/mount_nullfs/mount_nullfs.8
===================================================================
--- sbin/mount_nullfs/mount_nullfs.8	(revision 229701)
+++ sbin/mount_nullfs/mount_nullfs.8	(working copy)
@@ -79,8 +79,14 @@ Options are specified with a
 flag followed by a comma separated string of options.
 See the
 .Xr mount 8
-man page for possible options and their meanings.
+man page for standard options and their meanings.
+Options specific for
+.Nm :
+.Bl -tag -width sobypass
+.It Cm sobypass
+Bypass unix socket operations to the lower layer.
 .El
+.El
 .Pp
 The null layer has two purposes.
 First, it serves as a demonstration of layering by providing a layer

--=-=-=--

From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 17:24:17 2012
Return-Path:
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6ED92106566C; Mon, 9 Jan 2012 17:24:17 +0000 (UTC) (envelope-from bright@elvis.mu.org)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 3C6818FC08; Mon, 9 Jan 2012 17:24:16 +0000 (UTC)
Received: by elvis.mu.org (Postfix, from userid 1192) id A76F31A3C8C; Mon, 9 Jan 2012 09:24:16 -0800 (PST)
Date: Mon, 9 Jan 2012 09:24:16 -0800
From: Alfred Perlstein
To: Mikolaj Golub
Message-ID: <20120109172416.GB51558@elvis.mu.org>
References: <86sjjobzmn.fsf@kopusha.home.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <86sjjobzmn.fsf@kopusha.home.net>
User-Agent: Mutt/1.4.2.3i
Cc: Kostik Belousov , arch@freebsd.org, Robert Watson
Subject: Re: unix domain sockets on nullfs(5)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 Jan 2012 17:24:17 -0000

Adding new VOPs makes the most sense.

* Mikolaj Golub [120109 09:06] wrote:
> Hi,
>
> There is a longstanding problem with nullfs(5): unix sockets do not
> work between the lower and upper layers.
>
> See, e.g., kern/51583 and kern/159663.
>
> When a unix socket is bound, the created socket is referenced in the
> vnode field v_socket.
> This field is used on connect (from the vnode returned
> by lookup). Unix socket functions like unp_bind/connect set/access
> this field directly.
>
> This is the issue for nullfs, which uses a two-layer vnode approach:
> when binding to the upper layer, the socket reference is stored in the
> upper vnode; when binding to the lower fs, the socket reference is
> stored in the lower vnode and is not seen from the upper layer.
>
> E.g., having /mnt/upper nullfs mounted on /mnt/lower:
>
> 1) if we bind to /mnt/lower/test.sock we can connect only to
> /mnt/lower/test.sock.
>
> 2) if we bind to /mnt/upper/test.sock we can connect only to
> /mnt/upper/test.sock.
>
> The desired behavior is that one can connect to both the lower and the
> upper paths regardless of whether we bind to /mnt/lower/test.sock or
> /mnt/upper/test.sock.
>
> In kern/159663 two approaches were discussed:
>
> 1) copy the socket pointer from the lower vnode to the upper vnode on
> the upper vnode get (this fixes the case when one binds to the lower
> fs and wants to connect via the upper, but does not fix the case when
> one binds to the upper and wants to connect via the lower fs);
>
> 2) make null_lookup/create return the lower vnode for VSOCK vnodes.
>
> Both approaches have issues and look rather hackish.
>
> kib@ suggested that the issue could be fixed if one added new VOP_*
> operations for setting and accessing the vnode's v_socket field.
>
> The attached patch implements this. It can also be found here:
>
> http://people.freebsd.org/~trociny/nullfs.VOP_UNP.4.patch
>
> It adds three VOP_* operations: VOP_UNPBIND, VOP_UNPCONNECT and
> VOP_UNPDETACH. Their purpose can be understood from the modifications
> in uipc_usrreq.c:
>
> - vp->v_socket = unp->unp_socket;
> + VOP_UNPBIND(vp, unp->unp_socket);
>
> - so2 = vp->v_socket;
> + VOP_UNPCONNECT(vp, &so2);
>
> - unp->unp_vnode->v_socket = NULL;
> + VOP_UNPDETACH(unp->unp_vnode);
>
> The default functions just do these simple operations, while
> filesystems like nullfs can do more complicated things.
>
> The patch also implements functions for nullfs. By default the old
> behavior is preserved. To get the new behavior the filesystem should
> be (re)mounted with the sobypass option. Then the socket operations are
> bypassed to the lower vnode, which makes the socket accessible from
> both layers.
>
> I am very interested to hear other people's opinions on this.
>
> --
> Mikolaj Golub
>
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

--
- Alfred Perlstein
.- VMOA #5191, 03 vmax, 92 gs500, 85 ch250, 07 zx10
.- FreeBSD committer

From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 17:43:39 2012
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F043D1065679; Mon, 9 Jan 2012 17:43:39 +0000 (UTC) (envelope-from adrian.chadd@gmail.com)
Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 9378D8FC16; Mon, 9 Jan 2012 17:43:39 +0000 (UTC)
Received: by vcbfk1 with SMTP id fk1so4498342vcb.13 for ; Mon, 09 Jan 2012 09:43:39 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=/c81fBBuPovLSll81vQVN2Pwd9lwj1kfdTaFcc1VrUQ=;
 b=irF3Z8qhkKLOLHbKn47MHu3KgZUgVniZSkJnJpVYaVhd/GHaG05O+sjPoXkKxVE2dQ LXAyBLV2EFmHo6dRBrFe6V03K/9+8tUgjBPnyvb8xg1IKqIYvijEVSVaIrYMzMu5ITig RFajSrwtA2cng9Iww7Kfc8dkS5H352LiKg04g=
MIME-Version: 1.0
Received: by 10.52.35.10 with SMTP id d10mr7898790vdj.132.1326131019003; Mon, 09 Jan 2012 09:43:39 -0800 (PST)
Sender: adrian.chadd@gmail.com
Received: by 10.52.36.5 with HTTP; Mon, 9 Jan 2012 09:43:38 -0800 (PST)
In-Reply-To: <201201090757.29250.jhb@freebsd.org>
References: <201201090757.29250.jhb@freebsd.org>
Date: Mon, 9 Jan 2012 09:43:38 -0800
X-Google-Sender-Auth: m19gF6cSk9xuLPASd8x3t92Qm_0
Message-ID:
From: Adrian Chadd
To: John Baldwin
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Stefan Bethke , freebsd-arch@freebsd.org
Subject: Re: Where should I put ar71xx_* modules?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 Jan 2012 17:43:40 -0000

On 9 January 2012 04:57, John Baldwin wrote:

> Were you planning on including them in the ar71xx kernel configs via
> MODULES_OVERRIDE or some such?  If so, that would be sufficient to get
> 'make tinderbox' to cover them at least (and hopefully tinderbox builds).
> In that case I think it is fine to not have them hooked up in the main
> sys/modules build.  Or rather, if Warner commits his KERNOPTS thing, perhaps
> you could make sys/modules/Makefile include them on appropriate SoC kernels
> only.

Yup that was the plan - include them in MODULES_OVERRIDE for the
appropriate kernels.
Adrian From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 20:22:20 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E26F51065672; Mon, 9 Jan 2012 20:22:19 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 6204A8FC0C; Mon, 9 Jan 2012 20:22:18 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q09KMFNP059559 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 9 Jan 2012 22:22:15 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q09KMFR3054492; Mon, 9 Jan 2012 22:22:15 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q09KMEgo054491; Mon, 9 Jan 2012 22:22:14 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 9 Jan 2012 22:22:14 +0200 From: Kostik Belousov To: Yamagi Burmeister Message-ID: <20120109202214.GZ31224@deviant.kiev.zoral.com.ua> References: <20111226220756.GR50300@deviant.kiev.zoral.com.ua> <20120102063700.GF50300@deviant.kiev.zoral.com.ua> <20120108174112.50e030ba.lists@yamagi.org> <20120108195913.GI31224@deviant.kiev.zoral.com.ua> <20120109103747.578d4e44.lists@yamagi.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="9C8JZALFROMdO0HE" Content-Disposition: inline In-Reply-To: <20120109103747.578d4e44.lists@yamagi.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, 
 score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua
Cc: amd64@freebsd.org, arch@freebsd.org, marius@freebsd.org, flo@freebsd.org
Subject: Re: AVX
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 Jan 2012 20:22:20 -0000

--9C8JZALFROMdO0HE
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jan 09, 2012 at 10:37:47AM +0100, Yamagi Burmeister wrote:
> First, thank you for working on AVX, it's much appreciated.
>
> On Sun, 8 Jan 2012 21:59:13 +0200
> Kostik Belousov wrote:
>
> > > CPU: Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz (2194.55-MHz
> > > K8-class CPU) Origin = "GenuineIntel" Id = 0x1067a Family = 6 Model
> > > = 17 Stepping = 10
> > > Features=0xbfebfbff
> > MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> > > Features2=0x408e3bd
> > PDCM,SSE4.1,XSAVE>
> > > AMD Features=0x20100800 AMD Features2=0x1
> > Is this Features excerpt from the patched kernel, or from pristine svn
> > sources? If the latter, please show me the Features from the patched
> > kernel.
>
> That was the output of the patched kernel.

I see. The issue was that cpu_feature2, which is decoded to print the
Features2 line, is retrieved much earlier than XCR0 is updated by the
fpu initialization code. I added a kludge to reload cpu_feature2 if
XSAVE was indeed enabled. So now you should see OSXSAVE reported if
XSAVE is indeed enabled.

>
> > I thought that I correctly handled savectx, but apparently I did not.
> > The issue for sleep enter could be fixed by the avx.4.patch, I am not
> > sure about shutdown -r panic.
> >
> > http://people.freebsd.org/~kib/misc/avx.4.patch
>
> Both panics are gone. The system goes into suspend just fine and even
> resumes. And no more panics at reboot.

Very good, thank you for testing.

I decided to avoid use of __fillcontextx() for !x86 architectures for
now, since nobody seems to know what happens on sparc64. Florian, can
you, please, retest the defer_sig to see if it works now?

I hope that this is a commit candidate:
http://people.freebsd.org/~kib/misc/avx.5.patch

--9C8JZALFROMdO0HE
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk8LTHYACgkQC3+MBN1Mb4jcFgCeL5DpuUjELfEVHpjT3IS9h3VE
hC4AoM/xGLzjJD04zenoDKCWjlYwFLCy
=++UU
-----END PGP SIGNATURE-----

--9C8JZALFROMdO0HE--

From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 22:27:30 2012
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5BBF81065672; Mon, 9 Jan 2012 22:27:30 +0000 (UTC) (envelope-from giovanni.trematerra@gmail.com)
Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id B1FFA8FC08; Mon, 9 Jan 2012 22:27:29 +0000 (UTC)
Received: by qcse13 with SMTP id e13so3165542qcs.13 for ; Mon, 09 Jan 2012 14:27:29 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=13fEg+iPZAZS8/SQhk209vUfADR12PXSAqo0WXirJEE=; b=TWiBVDDcBG4f54usiiJ1reD8pu+rEZUEBISWdPuL/tk6mQNCkcHgEuV/aeEj0QwJSA SE7fzAOEQzPMQHl0h/aK+DMnWEvQ+XHrb0ZpBB7Z7ur3bbjaD/nWcJLNI2BeX1bmFmJX 4KX52vxqg+BmUWot0G7RytSTozZAu6NqU1hFQ=
MIME-Version: 1.0
Received: by 10.229.135.193 with SMTP id o1mr6575083qct.25.1326148049025; Mon, 09 Jan 2012 14:27:29 -0800 (PST)
Sender:
 giovanni.trematerra@gmail.com
Received: by 10.229.185.82 with HTTP; Mon, 9 Jan 2012 14:27:28 -0800 (PST)
In-Reply-To: <20120110005155.S2378@besplex.bde.org>
References: <20120110005155.S2378@besplex.bde.org>
Date: Mon, 9 Jan 2012 23:27:28 +0100
X-Google-Sender-Auth: m1duyA_0eOIkuM_e6i3PMqxqaOA
Message-ID:
From: Giovanni Trematerra
To: Bruce Evans
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: jilles@freebsd.org, Attilio Rao , flo@freebsd.org, Konstantin Belousov , freebsd-arch@freebsd.org
Subject: Re: pipe/fifo code merged.
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 Jan 2012 22:27:30 -0000

On Mon, Jan 9, 2012 at 3:34 PM, Bruce Evans wrote:
> On Sun, 8 Jan 2012, Giovanni Trematerra wrote:
>
>> Hi,
>> the patch at
>> http://www.trematerra.net/patches/pipefifo_merge2.diff
>>
>> is a preliminary version of the FIFO optimizations project that I picked
>> up from
>> the wiki.
>> http://wiki.freebsd.org/IdeasPage#FIFO_optimizations_.28GSoC.29
>
>
> I would go the other way, and pessimize pipes to be like fifos.  Then
> optimize the socket layer under both.  Fifos are not important, but
> they are implemented on top of the socket layer which is important.
> Pipes are important.  In 4.4BSD, pipes were implemented on top of the
> socket layer too.  This was much simpler than for fifos -- pipe() was
> just a wrapper that took a whole 44 lines, while fifofs took 602 lines.
> Now, fifofs still only takes 753 lines, but sys_pipe.c takes 1671
> lines.  pipe() is similar to socketpair(), but even simpler.  socketpair()
> took 62 lines in 4.4BSD.  It still takes only 81 lines (the extras are
> mainly for splitting it into sys_socketpair() and kern_socketpair()).
> The pipe optimizations in FreeBSD originated in 1996.  They are good
> locally, but may have inhibited more useful optimizations in the socket
> layer.
> [skip]
>
> The socket layer provides some fancy ioctls that might be useful and
> even work for anything implemented on top of sockets.  The ones for
> controlling socket buffer sizes and watermarks are most interesting.
> I don't know if the fifo wrapper does anything to prevent passing these
> to the socket layer.  For pipes, there are no fancy ioctls.  The pipe
> code uses heuristics and the hard-coded value PIPE_MINDIRECT to
> decide whether it should try to optimize for small writes or large
> writes.  These mostly work, but don't provide as much control as the
> socket ioctls.  I once did a lot of benchmarking of FreeBSD pipe i/o
> vs Linux pipe i/o.  Linux is much faster for small blocks and FreeBSD
> is much faster for large blocks provided they are not so large as to
> bust caches.  This is because although the FreeBSD options for direct
> writes work, they have large overheads, and FreeBSD has much larger
> overheads generally.  If the application could control the mode, then
> the overheads could be reduced by switching to completely different
> code (and if you want socket ioctls, even to the socket code).  But
> this would be very complicated.

Thanks a lot for your time.
I see you don't like the way pipes are implemented in FreeBSD, but that
isn't relevant to the patch.
The aim of the patch is to eliminate unnecessary code, introduce no
performance penalty into the pipe code, and make fifos faster.
I think the patch achieves all of the above goals.
Whether we should implement pipes on top of the socket layer or in a
different way is a different story.
If we come up with a better implementation for pipes, then with this
patch fifos gain the improvements for free.

>
> Linux-2.6.10 implements fifos as a small wrapper around pipes, while
> FreeBSD implements them as a large wrapper around sockets.  I hope the
> former is what you do -- share most pipe code, without making it more
> complicated, and with making the fifo wrapper much simpler.  The Linux
> code is much simpler and smaller, since for pipes it doesn't
> implement direct mode, and for sockets it doesn't have to interact with
> the complicated socket layer.

If you read the patch, as I think you didn't, you'd see that there's no
wrapper at all.
The fifo code is just fifo_open, fifo_close and a couple of helper
functions to deal with VFS; all the remaining code is shared with pipes
and no complicated code was added.

--
Gianni

From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 22:52:08 2012
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 77E98106564A; Mon, 9 Jan 2012 22:52:08 +0000 (UTC) (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id 32A208FC16; Mon, 9 Jan 2012 22:52:07 +0000 (UTC)
Received: from ds4.des.no (des.no [84.49.246.2]) by smtp.des.no (Postfix) with ESMTP id 047F9632F; Mon, 9 Jan 2012 22:52:06 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001) id B3438816B; Mon, 9 Jan 2012 23:52:06 +0100 (CET)
From: Dag-Erling Smørgrav
To: Adrian Chadd
References: <86ty4a8mc3.fsf@ds4.des.no>
Date: Mon, 09 Jan 2012 23:52:06 +0100
In-Reply-To: (Adrian Chadd's message of "Fri, 6 Jan 2012 13:30:31 -0800")
Message-ID: <86ehv8xze1.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, freebsd-current , freebsd-arch@freebsd.org
Subject: Re: Is it possible to make subr_acl_nfs4 and subr_acl_posix1e disabled?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 09 Jan 2012 22:52:08 -0000

Adrian Chadd writes:
> Dag-Erling Smørgrav writes:
> > I would be very annoyed if it were no longer possible to netboot
> > GENERIC...
> I don't want to break that. :) I just don't want to compile it in
> unless I'm using NFS/ZFS, and on my 4MB flash boards I'm not booting
> w/ NFS compiled in statically..

Sorry, I just realized that I read the text of your message but not the
subject; I thought you were proposing to remove NFS from GENERIC.

DES
--
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG Mon Jan 9 23:20:23 2012
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A9C6B1065672; Mon, 9 Jan 2012 23:20:23 +0000 (UTC) (envelope-from giovanni.trematerra@gmail.com)
Received: from mail-qw0-f47.google.com (mail-qw0-f47.google.com [209.85.216.47]) by mx1.freebsd.org (Postfix) with ESMTP id E3BDB8FC19; Mon, 9 Jan 2012 23:20:22 +0000 (UTC)
Received: by qadb17 with SMTP id b17so1401199qad.13 for ; Mon, 09 Jan 2012 15:20:22 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=GPDBTCQQcWF4euxBVIWPCz6P7qB9+a4oba+k1VMPOUg=; b=F2gplfH8YWhNpVc4ETwoVSF09+lPDNWzISczNJ7tmKlLtot1OIIU2DWuIKH0Kvr7EE LMZEZ0eLwYlRB9lucYnasC8xdiu1TvTmWbNOnavuRCxrrOvsTABCjEAY8RF8UXxVBw55 O7lQrD1lfCJ+SQy10WxJ75YYvrrX94752H95c=
MIME-Version: 1.0
Received: by 10.224.175.2 with SMTP id v2mr21725918qaz.69.1326151222095; Mon, 09 Jan 2012 15:20:22 -0800 (PST)
Sender: giovanni.trematerra@gmail.com
Received: by 10.229.185.82 with
HTTP; Mon, 9 Jan 2012 15:20:21 -0800 (PST)
In-Reply-To: <201201090848.25736.jhb@freebsd.org>
References: <201201090848.25736.jhb@freebsd.org>
Date: Tue, 10 Jan 2012 00:20:21 +0100
From: Giovanni Trematerra
To: John Baldwin
Cc: flo@freebsd.org, Attilio Rao, Konstantin Belousov, freebsd-arch@freebsd.org, jilles@freebsd.org
Subject: Re: pipe/fifo code merged.

On Mon, Jan 9, 2012 at 2:48 PM, John Baldwin wrote:
> On Saturday, January 07, 2012 9:35:47 pm Giovanni Trematerra wrote:
>> Hi,
>> the patch at
>> http://www.trematerra.net/patches/pipefifo_merge2.diff
>>
>> is a preliminary version of the FIFO optimizations project that I picked up from
>> the wiki.
>> http://wiki.freebsd.org/IdeasPage#FIFO_optimizations_.28GSoC.29
>>
>> zhaoshuai@ produced the following patch in the 2009 which attempted a first
>> merge of the interfaces:
>> http://www.trematerra.net/patches/fifo_soc2009.diff
>>
>> However I felt like the work was not yet completed and come up with my final
>> version.
>> Now fifoes derive their structures from pipes one with just special handling
>> to support VFS operations.
>> All the operations but the creation/destruction for fifoes and pipes are handled
>> by the same code.
>> The heart of the patch is the new struct pipeinfo.
>> pipeinfo is a per-file descriptor state. Basically it maintains a read end and
>> a write end for the descriptor. As pipes are bidirectional in FreeBSD, for a
>> pipe this two fields are always equal but different for a fifo.
>> To let fifo code in sys/fs/fifofs/fifo_vnops.c create/destroy the pipe, two
>> functions (pipe_ctor/pipe_dtor) were written. pipe_ctor setups things like a
>> call to kern_pipe and return a pipeinfo structure, while pipe_dtor releases
>> all the resources for a given pipeinfo. Once a pipe was setup during a fifo_open
>> call, all the subsequent operations on the fifo are handled by the same code
>> of a pipe expect for the clean up code that calls pipe_dtor.
>> Allocation of two pipeinfo structures for a pipe were showed to slow down
>> things by some micro-benchmarking. To speed up things during
>> creation/destruction of pipes, the patch allocates all the needed data structure
>> zone using the umapipe struct that packing together all the needed data
>> structures to be allocated at pipe creation. A similar umafifo structure is
>> used for fifoes.
>> Thanks to jilles that made a review of the patch in a previous form, privately.
>> Thanks a lot to attilio that answered my stupid questions and drove me in the
>> right direction.
>
> Thanks for taking this on.  In general I think this looks good, but had a few
> comments:
>
> - Why did you move setting the timestamps of pipes from the UMA ctor to an
>   init routine?  This seems wrong.  The init routine is only invoked when
>   memory is first allocated to a slab to create a set of umapipe or umafifo
>   structures.  However, that umapipe/fifo may be reused multiple times, all
>   with the same timestamp.  Setting the timestamp in the ctor routine means
>   it is set each time a pipepair or fifo is created which seems more
>   appropriate.  Similarly with the inode value.

Oops, that was by accident. Thanks for pointing it out.

> - I would maybe call pipe_ctor(), fifo_ctor() instead or otherwise adjust
>   the name to note it is only used to create a FIFO, not used to create
>   a normal pipe pair (which its name implies).
> - s/socket/pipe/ in the FIONREAD comment in sys_pipe.c.

OK.

> - Two extra blank lines in the patch that I think should be reverted:
>
> --- a/sys/fs/fifofs/fifo_vnops.c        Sun Jan 01 19:00:13 2012 +0100
> +++ b/sys/fs/fifofs/fifo_vnops.c        Mon Jan 09 00:13:55 2012 +0100
>        int             fi_wgen;
>  };
>
> +
>  static vop_print_t     fifo_print;
>  static vop_open_t      fifo_open;
> ....
> @@ -1598,13 +1839,20 @@ pipe_kqfilter(struct file *fp, struct kn
>                        return (EPIPE);
>                }
>                cpipe = cpipe->pipe_peer;
> +
>                break;
>        default:
>                PIPE_UNLOCK(cpipe);
>                return (EINVAL);

Will do.
Thank you for your review.

--
Gianni

From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 00:17:26 2012
Message-Id: <201201100017.q0A0HItk037943@gw.catspoiler.org>
Date: Mon, 9 Jan 2012 16:17:18 -0800 (PST)
From: Don Lewis
To: dougb@FreeBSD.org
In-Reply-To: <4F0AB07D.1060208@FreeBSD.org>
Cc: arch@FreeBSD.org, adrian@FreeBSD.org, alfred@FreeBSD.org
Subject: Re: [patch] allow crash dumps to Linux swap partitions
X-BeenThere:
freebsd-arch@freebsd.org

On 9 Jan, Doug Barton wrote:
> On 01/09/2012 01:11, Don Lewis wrote:
>> On 9 Jan, Adrian Chadd wrote:
>>> .. doesn't linux swap have some metadata somewhere?
>>
>> Darned if I know, but it doesn't seem to care about FreeBSD swap data
>> overwriting its swap partition.
>
> Have you had to do anything special for linux boot? I multi-boot myself
> and would love to be able to save space on my laptop by only having one
> universal swap partition. I started to look at doing this but found
> various docs that said don't unless you are able to recreate the
> metadata that Adrian referenced above.

Looks like this is safe to do. There is some code in swaponsomething()
that skips the first two page-size blocks of the swap file, to avoid
overwriting the BSD label if the swap partition starts at sector zero of
a BSD partition. Here's the confirmation that my Linux swap metadata is
unmolested:

# dd if=/dev/da0s4 bs=4k count=1 | strings
1+0 records in
1+0 records out
4096 bytes transferred in 0.012188 secs (336063 bytes/sec)
jvLI
SWAP-sda4
SWAPSPACE2

I think UFS always avoided this problem, but I seem to remember SunOS
fixing this problem for swap. Before Sun fixed this, SunOS would stomp on
the first sector of the swap partition. If you decided to use a dedicated
swap disk and started the swap partition at sector zero, the label would
get blown away as soon as you started using swap space.
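
A minimal sketch of the reservation described above, assuming a 4k page size and using made-up names (SKETCH_PAGE_SIZE, SKETCH_RESERVED, swap_usable_range, swap_page_offset); it is not the kernel's swaponsomething(), only the arithmetic behind "never hand out the first two page-size blocks":

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration; real values are per-platform. */
#define SKETCH_PAGE_SIZE	4096u	/* bytes per page (assumed) */
#define SKETCH_RESERVED		2u	/* leading pages never allocated */

/* Usable part of a swap device, counted in page-size blocks. */
struct swap_range {
	uint64_t first;		/* first page that may be handed out */
	uint64_t count;		/* number of usable pages */
};

/*
 * Given the device size in bytes, return the pages usable for swap when
 * the first SKETCH_RESERVED pages are left alone.  A device too small to
 * hold the reserved pages yields an empty range.
 */
static struct swap_range
swap_usable_range(uint64_t dev_bytes)
{
	struct swap_range r = { SKETCH_RESERVED, 0 };
	uint64_t npages = dev_bytes / SKETCH_PAGE_SIZE;	/* round down */

	if (npages > SKETCH_RESERVED)
		r.count = npages - SKETCH_RESERVED;
	return (r);
}

/* Byte offset on the device of a given swap page index. */
static uint64_t
swap_page_offset(uint64_t page)
{
	return (page * SKETCH_PAGE_SIZE);
}
```

Under these assumptions the first byte that can ever be written is at offset 8192, so whatever sits in the first 4k block (the one the dd command above reads) is never touched.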
From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 09:41:22 2012
Date: Tue, 10 Jan 2012 20:41:15 +1100 (EST)
From: Bruce Evans
To: Giovanni Trematerra
Message-ID: <20120110153807.H943@besplex.bde.org>
References: <20120110005155.S2378@besplex.bde.org>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="0-1060448873-1326188475=:943"
Cc: flo@freebsd.org, Attilio Rao, Konstantin Belousov, freebsd-arch@freebsd.org, jilles@freebsd.org
Subject: Re: pipe/fifo code merged.

This message is in MIME format. The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

--0-1060448873-1326188475=:943
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE

On Mon, 9 Jan 2012, Giovanni Trematerra wrote:

> On Mon, Jan 9, 2012 at 3:34 PM, Bruce Evans wrote:
>>
>> I would go the other way, and pessimize pipes to be like fifos.
>> Then optimize the socket layer under both.  Fifos are not important, but
>> they are implemented on top of the socket layer which is important.
>> Pipes are important. ...
>> ...
>> Linux-2.6.10 implements fifos as a small wrapper around pipes, while
>> FreeBSD implements them as a large wrapper around sockets.  I hope the
>> former is what you do -- share most pipe code, without making it more
>> complicated, and with making the fifo wrapper much simpler.  The Linux
>> code is much simpler and smaller, since for pipes it doesn't
>> implement direct mode, and for sockets it doesn't have to interact with
>> the complicated socket layer.
>
> If you read the patch, as I think you didn't, you'd see that there's no wrapper
> at all. fifo's code is just fifo_open, fifo_close and another couple of helper
> functions to deal with VFS, all the remaining code is shared with pipes and
> no complicated code was added.

I think you don't want me to read the patch, since I would see too much
detail starting with style bugs. Anyway..

% diff -r fee0771aad22 sys/fs/fifofs/fifo.h
% --- a/sys/fs/fifofs/fifo.h	Sun Jan 01 19:00:13 2012 +0100
% +++ b/sys/fs/fifofs/fifo.h	Mon Jan 09 00:13:55 2012 +0100
% @@ -35,4 +35,5 @@
% */
% int	fifo_vnoperate(struct vop_generic_args *);
% int	fifo_printinfo(struct vnode *);
% +int	fifo_iseof(struct file *);

This further unsorts an unsorted list by adding to the end of it.

% diff -r fee0771aad22 sys/fs/fifofs/fifo_vnops.c
% --- a/sys/fs/fifofs/fifo_vnops.c	Sun Jan 01 19:00:13 2012 +0100
% +++ b/sys/fs/fifofs/fifo_vnops.c	Mon Jan 09 00:13:55 2012 +0100
%
% @@ -54,74 +54,28 @@
% ...
% struct fifoinfo {
% -	struct socket	*fi_readsock;
% -	struct socket	*fi_writesock;
% -	long		fi_readers;
% -	long		fi_writers;
% +	struct pipeinfo *fi_pipeinfo;
% +	long	fi_readers;
% +	long	fi_writers;

Indentation lost.

% 	int		fi_wgen;
% };
%
% +

Extra blank line.
% static vop_print_t	fifo_print;
% ...
% @@ -186,47 +139,34 @@ fifo_open(ap)
% ...
% -		SOCKBUF_LOCK(&rso->so_rcv);
% -		rso->so_rcv.sb_state |= SBS_CANTRCVMORE;
% -		SOCKBUF_UNLOCK(&rso->so_rcv);
% -		KASSERT(vp->v_fifoinfo == NULL,
% -		    ("fifo_open: v_fifoinfo race"));
% +
% + 		KASSERT(vp->v_fifoinfo == NULL, ("fifo_open: v_fifoinfo race"));
% +

Extra blank line.

% 		vp->v_fifoinfo = fip;
% 	}
% +	pip = fip->fi_pipeinfo;
% +

Extra blank line.

% + 	KASSERT(pip != NULL, ("fifo_open: pipeinfo is NULL"));
% +

Extra blank line. No comment on 100 or so further errors of this type.
Extra blank lines near KASSERT()s are a common style bug, but this bug
was missing in the old code here.

% +	rpipe = pip->pi_rpipe;
% +	wpipe = pip->pi_wpipe;
%
% 	/*
% 	 * Use the fifo_mtx lock here, in addition to the vnode lock,
% ...
% -static int
% -fifo_poll_f(struct file *fp, int events, struct ucred *cred, struct thread *td)
% -{
% -	struct fifoinfo *fip;
% -	struct file filetmp;
% -	int levents, revents = 0;
% -
% -	fip = fp->f_data;
% -	levents = events &
% -	    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND);
% -	if ((fp->f_flag & FREAD) && levents) {
% -		filetmp.f_data = fip->fi_readsock;
% -		filetmp.f_cred = cred;
% -		mtx_lock(&fifo_mtx);
% -		if (fp->f_seqcount == fip->fi_wgen)
% -			levents |= POLLINIGNEOF;
% -		mtx_unlock(&fifo_mtx);
% -		revents |= soo_poll(&filetmp, levents, cred, td);
% -	}
% -	levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND);
% -	if ((fp->f_flag & FWRITE) && levents) {
% -		filetmp.f_data = fip->fi_writesock;
% -		filetmp.f_cred = cred;
% -		revents |= soo_poll(&filetmp, levents, cred, td);
% -	}
% -	return (revents);
% -}

In this file, I have most experience fixing this function (and open and
close so that select and poll work).
The above looks simple, but has a complex interaction with layers above
and below it. Most of the details are in the socket layer. You had to
reimplement these in the pipe layer. The most delicate point involving
fi_wgen seems to be reimplemented correctly in fifo_iseof(). Before I
fixed this for fifos, poll and select on pipes (especially for EOF) were
less broken than for fifos, partly because pipes are simpler -- they
can't be reopened. My tests in /usr/src/tools/regression/poll/ are
hopefully enough to detect any regressions. Some of the tests are
intentionally left broken and/or expected to fail, to be bug for bug
compatible with old kernel bugs.

% ...
% diff -r fee0771aad22 sys/kern/sys_pipe.c
% --- a/sys/kern/sys_pipe.c	Sun Jan 01 19:00:13 2012 +0100
% +++ b/sys/kern/sys_pipe.c	Mon Jan 09 00:13:55 2012 +0100
% ...
% @@ -164,6 +184,8 @@ static struct fileops pipeops = {
% static void	filt_pipedetach(struct knote *kn);
% static int	filt_piperead(struct knote *kn, long hint);
% static int	filt_pipewrite(struct knote *kn, long hint);
% +static void filt_pipedetach_notsup(struct knote *kn);
% +static int filt_pipenotsup(struct knote *kn, long hint);

This unsorts a sorted list by adding to the end of it. The old prototypes
are in KNF style (tab before the function name) but the new ones aren't.
% @@ -205,7 +232,7 @@ SYSCTL_INT(_kern_ipc, OID_AUTO, piperesi
% 	    &piperesizeallowed, 0, "Pipe resizing allowed");
%
% static void pipeinit(void *dummy __unused);
% -static void pipeclose(struct pipe *cpipe);
% +static void pipeclose(struct pipe *cpipe, int isfifo);
% static void pipe_free_kmem(struct pipe *cpipe);
% static int pipe_create(struct pipe *pipe, int backing);
% static __inline int pipelock(struct pipe *cpipe, int catch);
% @@ -223,8 +250,12 @@ static int pipespace_new(struct pipe *cp
% static int	pipe_zone_ctor(void *mem, int size, void *arg, int flags);
% static int	pipe_zone_init(void *mem, int size, int flags);
% static void	pipe_zone_fini(void *mem, int size);
% +static int	fifo_zone_ctor(void *mem, int size, void *arg, int flags);
% +static int	fifo_zone_init(void *mem, int size, int flags);
% +static void	fifo_zone_fini(void *mem, int size);

Further unsortings of an unsorted list (f < p). The ctor/init/fini were
sorted in "logical" (activation) order, but that becomes unmanageable for
long lists. init/close/free_kmem/create are now in random order. The
indentation style matches the nearby code but not KNF or previous code.
The style for naming parameters matches the nearby code and KNF but not
previous code.

% +static int pipe_umacreate(struct thread *td, struct umapipe **p_up, int isfifo);

As above, plus line too long.

% @@ -247,12 +282,14 @@ pipeinit(void *dummy __unused)
% static int
% pipe_zone_ctor(void *mem, int size, void *arg, int flags)
% {
% +	struct umapipe *up;
% 	struct pipepair *pp;
% 	struct pipe *rpipe, *wpipe;

Unsorting.
% @@ -295,42 +328,133 @@ pipe_zone_ctor(void *mem, int size, void
% static int
% pipe_zone_init(void *mem, int size, int flags)
% {
% -	struct pipepair *pp;
% +	struct umapipe *up;
% +	struct pipeinfo *pip;
% +	struct timespec ctime;
%
% -	KASSERT(size == sizeof(*pp), ("pipe_zone_init: wrong size"));
% +	KASSERT(size == sizeof(*up), ("pipe_zone_init: wrong size"));
%
% -	pp = (struct pipepair *)mem;
% +	up = (struct umapipe *)mem;
% +	vfs_timestamp(&ctime);
% +	pip = &up->pip[0];
% +	pip->pi_atime = pip->pi_mtime = pip->pi_ctime = ctime;
% +	pip->pi_ino = -1;
%
% -	mtx_init(&pp->pp_mtx, "pipe mutex", NULL, MTX_DEF | MTX_RECURSE);
% +	pip = &up->pip[1];
% +	pip->pi_atime = pip->pi_mtime = pip->pi_ctime = ctime;
% +	pip->pi_ino = -1;
% +
% +	mtx_init(&up->pp.pp_mtx, "pipe mutex", NULL, MTX_DEF | MTX_RECURSE);
% 	return (0);

Timestamps seem to be broken. jhb pointed out a problem in them, without
much detail (but I forget the exact detail). Here it can't be right to
have timestamps on both sides, since timestamps are a property of the
file at its lowest level, not the file descriptor or even the file at
the open file (fcntl) level. For fifos, timestamps are even more a
property of the file.

% ...
% +static int
% +fifo_zone_init(void *mem, int size, int flags)
% +{
% +	struct umafifo *up;
% +	struct pipeinfo *pip;
% +
% +	KASSERT(size == sizeof(*up), ("fifo_zone_init: wrong size"));
% +
% +	up = (struct umafifo *)mem;
% +	pip = &up->pip[0];
% +	vfs_timestamp(&pip->pi_ctime);
% +	pip->pi_atime = pip->pi_mtime = pip->pi_ctime;

For fifos, it is wrong to have even 1 timestamp at this level (except
unused ones won't hurt). Fifos and their timestamps persist as disk
files, and the timestamps for these disk files are managed by the
underlying file system. Any timestamps at this level can only give
possibilities for inconsistencies.
For example, this level uses vfs_timestamp(), but the file system level
might use a different timestamp method, either because it is buggy or
because it cannot represent timestamps with the granularity that
vfs_timestamp() gives. (It is a bug that vfs_timestamp() is global.)

% @@ -1219,7 +1403,7 @@ pipe_write(fp, uio, active_cred, flags,
% 	}
%
% 	if (error == 0)
% -		vfs_timestamp(&wpipe->pipe_mtime);
% +		vfs_timestamp(&pip->pi_mtime);
%
% 	/*
% 	 * We have something to offer,

It's doing timestamps in the same way for fifos as for pipes. There must
be a problem for stat() too. The old fifo code uses fo_stat() to get back
to the underlying file system which knows all the attributes (not just
timestamps). I can't see anything like that here. The new fifo code
seems to just use pipe_stat() which gives many fake attributes which are
likely to differ from ones in the file system.

% @@ -1492,12 +1726,12 @@ pipe_free_kmem(cpipe)
% * shutdown the pipe
% */
% static void
% -pipeclose(cpipe)
% +pipeclose(cpipe, isfifo)
% 	struct pipe *cpipe;
% +	int isfifo;
% {

I don't see any reason to have a different zone for fifos. This
complicates some interfaces ...

% @@ -1570,21 +1798,34 @@ pipeclose(cpipe)
% #ifdef MAC
% 		mac_pipe_destroy(pp);
% #endif
% -		uma_zfree(pipe_zone, cpipe->pipe_pair);
% +		uma_zfree(isfifo ? fifo_zone : pipe_zone, cpipe->pipe_pair);

The new isfifo parameter is only used here. I think the separate fifo
zone just lets you see the memory usage for fifos separately. In the old
version, this was mixed up with socket memory usage and thus even harder
to separate. But this is cosmetic. There seem to be no changes for
actual control of the memory usage. This gives some minor breakages:
- there is a limit on pipe kva. This now applies to fifos too (?). I
  sometimes find the old limit too small. It might need to be increased.
- there seems to be no limit on pipe actual memory, except implicit ones
  from limiting pipe kva
- there are too many sysctls for controlling and reporting pipe kva and
  other things. These give more and/or different detail than the pipe
  zone statistics, but they are not duplicated for fifos. Good, since
  there are already too many.
- there is a resource limit on socket real memory that used to apply to
  fifos too. It might need to be decreased, to correspond to the move to
  the pipe kva limit.

% diff -r fee0771aad22 sys/sys/pipe.h
% --- a/sys/sys/pipe.h	Sun Jan 01 19:00:13 2012 +0100
% +++ b/sys/sys/pipe.h	Mon Jan 09 00:13:55 2012 +0100
% @@ -28,6 +28,8 @@
% #error "no user-servicable parts inside"
% #endif
%
% +#include
% +

This namespace pollution was intentionally left out.

% /*
% * Pipe buffer size, keep moderate in value, pipes take kva space.
% */
% @@ -103,16 +105,12 @@ struct pipe {
% 	struct	pipebuf pipe_buffer;	/* data storage */
% 	struct	pipemapping pipe_map;	/* pipe mapping for direct I/O */
% 	struct	selinfo pipe_sel;	/* for compat with select */

was a prerequisite for this file.

% -	struct	timespec pipe_atime;	/* time of last access */
% -	struct	timespec pipe_mtime;	/* time of last modify */
% -	struct	timespec pipe_ctime;	/* time of status change */
% 	struct	sigio *pipe_sigio;	/* information for async I/O */
% 	struct	pipe *pipe_peer;	/* link with other direction */
% 	struct	pipepair *pipe_pair;	/* container structure pointer */
% 	u_int	pipe_state;		/* pipe status info */
% 	int	pipe_busy;		/* busy flag, mostly to handle rundown sanely */
% 	int	pipe_present;		/* still present? */
% -	ino_t	pipe_ino;		/* fake inode for stat(2) */

Both this and the timestamps should never have been here, since they are
per-"disk"-file but they were per-pipe-endpoint (see below).
% };
%
% /*
% @@ -138,5 +136,24 @@ struct pipepair {
% #define PIPE_UNLOCK(pipe)	mtx_unlock(PIPE_MTX(pipe))
% #define PIPE_LOCK_ASSERT(pipe, type) mtx_assert(PIPE_MTX(pipe), (type))
%
% +#define PIPE_CNT(pipe)	((pipe->pipe_state & PIPE_DIRECTW) ? \
% +		pipe->pipe_map.cnt : pipe->pipe_buffer.cnt)
% +
% +/*
% + * Per-file descriptor structure.
% + */

I was very confused by this comment. It is sort of backwards. This
structure is for the lowest level, which is the pipepair level for pipes
and the disk level for fifos. (But we have to expand the disk level,
first from dinodes/directory entries to inodes/directory blocks, then
from inodes to vnodes, then append pipepairs). The open file level is a
level or two above that, and the file descriptor level is further above.
The levels are stacked even more confusingly for pipes. Above the
pipepair level, there is the pipe level, but this is more sideways than
fully above. Open files are more at the level of pipes than pipepairs.
("pipe" in normal usage can mean either "pipe" or "pipepair" in this
implementation.) The timestamps and inode were at a wrong level. You
made a step towards fixing this by moving them down, but the comment
says that they are now at the highest level.

% +struct pipeinfo {
% + 	struct	pipe	*pi_rpipe;	/* pipe we read from */
% + 	struct	pipe	*pi_wpipe;	/* pipe we write to */
% +	struct	timespec pi_atime;	/* time of last access */
% +	struct	timespec pi_mtime;	/* time of last modify */
% +	struct	timespec pi_ctime;	/* time of status change */
% +	ino_t	pi_ino;		/* fake pipe inode for stat(2) */

Indentation error.

% +};

I was confused by the layering for this struct too. This struct seems to
be needed only to swap rpipe with wpipe for the 2 ends of a pipe. This
is confusing, but I can't see any better way at the moment. Putting the
other fields in it just gives confusion which leads to bugs and minor
resource wastage.
All the other fields must be per-file at the lowest level, so any
duplication of them gives either bugs (if they are different) or just
wastes resources (time to write them and check that they are the same,
and space to hold copies). These fields belong in the pipepair struct.
This moves them down (more sideways) another level.

POSIX is fuzzy about whether the attributes are unique for the 2 ends of
a pipe. It requires all st_ timestamps and some other attributes to be
"meaningful" unless otherwise specified and doesn't specify anything
otherwise for pipes, at least in an old draft. But for pipe() it says:

27884    Upon successful completion, pipe( ) shall mark for update the st_atime, st_ctime, and st_mtime
27885    fields of the pipe.

Assuming that this part is not fuzzy, "the ... st_atime... fields of the
pipe" in it must refer to unique fields. Similarly for pi_ino.

These fields shouldn't be used at all for fifos, as mentioned above.
File systems still maintain separate non-copies which pipe_stat() hides.
This doesn't matter much for timestamps and st_ino. It matters for modes
and permissions, etc.

After moving timestamps and pi_ino into the pipepair struct, the new info
struct is reduced to 2 pointers into the pipepair struct. Unfortunately,
the pointer to the pipeinfo struct cannot be simply replaced by a pointer
to the pipepair struct (and dereferencing the latter instead of the
former) because of complications for the separate ends of a pipe:

@ @@ -372,17 +554,17 @@ kern_pipe(struct thread *td, int fildes[
@ 	 * to avoid races against processes which manage to dup() the read
@ 	 * side while we are blocked trying to allocate the write side.
@ 	 */
@ -	finit(rf, FREAD | FWRITE, DTYPE_PIPE, rpipe, &pipeops);
@ +	finit(rf, FREAD | FWRITE, DTYPE_PIPE, &up->pip[0], &pipeops);
@ 	error = falloc(td, &wf, &fd, 0);
@ 	if (error) {
@ 		fdclose(fdp, rf, fildes[0], td);
@ 		fdrop(rf, td);
@ 		/* rpipe has been closed by fdrop().
*/
@ -		pipeclose(wpipe);
@ +		pipeclose(wpipe, 0);
@ 		return (error);
@ 	}
@ 	/* An extra reference on `wf' has been held for us by falloc(). */
@ -	finit(wf, FREAD | FWRITE, DTYPE_PIPE, wpipe, &pipeops);
@ +	finit(wf, FREAD | FWRITE, DTYPE_PIPE, &up->pip[1], &pipeops);

One end used to get rpipe and the other end wpipe. This is used mainly
by read/write to go in the correct direction. Now, the 2 ends get
pointers to different pipeinfo structs, with the only differences in the
pipeinfo structs being:
- swap rpipe and wpipe. This is used mainly by read/write as before
- different timestamps (to implement bugs and resource wastage)
- pi_ino should be the same in both (just wastes space).

For fifos, rpipe == wpipe so the 2 pipeinfo structs should be the same
and the complications are not needed.

I always get confused about the plumbing for the 4 directions (2
directions for 2 pipe fd's with bidirectional pipes), but it seems that
we have lots of complexity to support bidirectional pipes which "no one"
uses.

There are some more complications and resource wastages related to having
2 pipe ends when only 1 is needed:
- struct pipe is in struct pipepair twice. This can't quite be fixed by
  moving it to struct pipeinfo. If both pipe structs were malloc()ed,
  then the 2 pointers in struct pipeinfo would be enough for accessing
  them, but I think the space wastage is too small to fix like this.
- some code was symmetrical relative to rpipe and wpipe and it didn't
  care in which order they were in. Now rpipe can be equal to wpipe,
  some of this code is no longer symmetrical because it does things like
  free(rpipe), while the rest of it is still symmetrical because it does
  things like initializing rpipe->foo to the same value as wpipe->foo.
  You had to make some changes in this area.
Example of a change in this area:

% @@@ -349,18 +473,76 @@ kern_pipe(struct thread *td, int fildes[
% @ 	/* Only the forward direction pipe is backed by default */
% @ 	if ((error = pipe_create(rpipe, 1)) != 0 ||
% @ 	    (error = pipe_create(wpipe, 0)) != 0) {
% @-		pipeclose(rpipe);
% @-		pipeclose(wpipe);
% @+		pipeclose(rpipe, isfifo);
% @+		pipeclose(wpipe, isfifo);

Not really symmetrical. There must be 2 closes if there are 2 open fd's
(1 in each direction). But what if there is only 1 open fd for a fifo
with rpipe = wpipe? pipe_dtor() is more obviously correct, since it only
does 1 pipeclose() for fifos. The new isfifo flag is only used in
pipeclose() to select the zone.

% @ 		return (error);
% @ 	}
% @
% @ 	rpipe->pipe_state |= PIPE_DIRECTOK;
% @ 	wpipe->pipe_state |= PIPE_DIRECTOK;
% @
% @+	if (isfifo) {
% @+		up->pip[0].pi_rpipe = rpipe;
% @+		up->pip[0].pi_wpipe = wpipe;
% @+	} else {
% @+		up->pip[0].pi_rpipe = rpipe;
% @+		up->pip[0].pi_wpipe = rpipe;
% @+		up->pip[1].pi_rpipe = wpipe;
% @+		up->pip[1].pi_wpipe = wpipe;
% @+	}

This seems slightly broken. Fifos should only use 1 pipeinfo struct, but
other fifo code initializes both up->pip[0] and up->pip[1]. Having only
1 pipe end initialized would destroy any remaining symmetry but would
make it clear which 1 is actually used.

% +
% +extern struct fileops pipeops;
% +
% +int pipe_ctor(struct pipeinfo **ppip, struct thread *td);
% +void pipe_dtor(struct pipeinfo *pip);

Indentation errors.
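
The layering argued for in this review (one copy of the timestamps and inode in the pipepair, with pipeinfo reduced to two pointers) could be sketched roughly as follows. This is a userland illustration only: every sk_ name is hypothetical, the structs are cut down to what the argument needs, and the pointer assignments copy the non-fifo branch of the hunk quoted above.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical, userland-sized stand-ins for the kernel structures. */
struct sk_pipepair;

struct sk_pipe {
	struct sk_pipepair *pipe_pair;	/* back pointer to the container */
};

/* One copy of the per-file attributes, shared by both directions. */
struct sk_pipepair {
	struct sk_pipe	pp_rpipe;
	struct sk_pipe	pp_wpipe;
	struct timespec	pp_mtime;	/* time of last modify */
	unsigned long	pp_ino;		/* fake inode for stat(2) */
};

/* Per-end view: only the plumbing differs between the two ends. */
struct sk_pipeinfo {
	struct sk_pipe	*pi_rpipe;	/* pipe we read from */
	struct sk_pipe	*pi_wpipe;	/* pipe we write to */
};

static void
sk_pair_init(struct sk_pipepair *pp, struct sk_pipeinfo pip[2])
{
	pp->pp_rpipe.pipe_pair = pp;
	pp->pp_wpipe.pipe_pair = pp;
	pp->pp_mtime.tv_sec = 0;
	pp->pp_mtime.tv_nsec = 0;
	pp->pp_ino = 0;
	/* Same assignments as the non-fifo branch of the quoted hunk. */
	pip[0].pi_rpipe = pip[0].pi_wpipe = &pp->pp_rpipe;
	pip[1].pi_rpipe = pip[1].pi_wpipe = &pp->pp_wpipe;
}

/* A write through either end stamps the single shared mtime. */
static void
sk_mark_written(struct sk_pipeinfo *pip, time_t now)
{
	pip->pi_wpipe->pipe_pair->pp_mtime.tv_sec = now;
}

/* stat(2)-like lookup: both ends report the same value. */
static time_t
sk_stat_mtime(const struct sk_pipeinfo *pip)
{
	return (pip->pi_rpipe->pipe_pair->pp_mtime.tv_sec);
}
```

With the attributes stored once, the two ends cannot disagree about them; the cost is one extra pointer hop through pipe_pair on each timestamp update.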
%
% #endif /* !_SYS_PIPE_H_ */

Bruce
--0-1060448873-1326188475=:943--

From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 09:54:16 2012
Date: Tue, 10 Jan 2012 10:54:03 +0100
From: Yamagi Burmeister
To: kostikbel@gmail.com
Message-Id: <20120110105403.f5a425a4.lists@yamagi.org>
In-Reply-To: <20120109202214.GZ31224@deviant.kiev.zoral.com.ua>
References: <20111226220756.GR50300@deviant.kiev.zoral.com.ua> <20120102063700.GF50300@deviant.kiev.zoral.com.ua> <20120108174112.50e030ba.lists@yamagi.org> <20120108195913.GI31224@deviant.kiev.zoral.com.ua> <20120109103747.578d4e44.lists@yamagi.org> <20120109202214.GZ31224@deviant.kiev.zoral.com.ua>
Mime-Version: 1.0
Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg="PGP-SHA1"; boundary="Signature=_Tue__10_Jan_2012_10_54_03_+0100_L.B.l=db+vIoMNs1"
Cc: amd64@freebsd.org, arch@freebsd.org, marius@freebsd.org, flo@freebsd.org
Subject: Re: AVX

--Signature=_Tue__10_Jan_2012_10_54_03_+0100_L.B.l=db+vIoMNs1
Content-Type:
text/plain; charset=US-ASCII
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, 9 Jan 2012 22:22:14 +0200
Kostik Belousov wrote:

> > > > CPU: Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz (2194.55-MHz
> > > > K8-class CPU) Origin = "GenuineIntel" Id = 0x1067a Family = 6 Model
> > > > = 17 Stepping = 10
> > > > Features=0xbfebfbff
> > > MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> > > > Features2=0x408e3bd
> > > PDCM,SSE4.1,XSAVE>
> > > > AMD Features=0x20100800 AMD Features2=0x1
> > > Is this Features excerpt from the patched kernel, or from pristine svn
> > > sources ? If the latter, please show me the Features from the patched
> > > kernel.
> >
> > That was the output of the patched kernel.
> I see. The issue was that cpu_feature2, which is decoded to print the
> Features2 line, is retrieved much earlier than XCR0 is updated by
> fpu initialization code. I added a cludge to reload cpu_feature2 if
> XSAVE was indeed enabled. So now you should see OSXSAVE reported if
> XSAVE is indeed enabled.
Seems to be okay now: Features2=0xc08e3bd -- Homepage: www.yamagi.org XMPP: yamagi@yamagi.org GnuPG/GPG: 0xEFBCCBCB --Signature=_Tue__10_Jan_2012_10_54_03_+0100_L.B.l=db+vIoMNs1 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk8MCsQACgkQWTjlg++8y8trmgCg5P3Gu9JU342L+r2YS4S8j8lF bX0AoNp3UmN9nZnTGN47QELh2c0xdHXQ =rMlb -----END PGP SIGNATURE----- --Signature=_Tue__10_Jan_2012_10_54_03_+0100_L.B.l=db+vIoMNs1-- From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 12:04:27 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C8A4106566B; Tue, 10 Jan 2012 12:04:27 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au [211.29.132.182]) by mx1.freebsd.org (Postfix) with ESMTP id BC8958FC08; Tue, 10 Jan 2012 12:04:25 +0000 (UTC) Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q0AC4Img016496 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 10 Jan 2012 23:04:19 +1100 Date: Tue, 10 Jan 2012 23:04:18 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans In-Reply-To: <20120110153807.H943@besplex.bde.org> Message-ID: <20120110211510.T1676@besplex.bde.org> References: <20120110005155.S2378@besplex.bde.org> <20120110153807.H943@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: flo@freebsd.org, Giovanni Trematerra , Attilio Rao , Konstantin Belousov , freebsd-arch@freebsd.org, jilles@freebsd.org Subject: Re: pipe/fifo code merged. 
X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 12:04:27 -0000

On Tue, 10 Jan 2012, Bruce Evans wrote:

> I think you don't want me to read the patch, since I would see too much
> detail starting with style bugs. Anyway..
> ...

One more set of details.

% -static int
% -fifo_poll_f(struct file *fp, int events, struct ucred *cred, struct thread *td)
% -{
% -	struct fifoinfo *fip;
% -	struct file filetmp;
% -	int levents, revents = 0;
% -
% -	fip = fp->f_data;
% -	levents = events &
% -	    (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND);
% -	if ((fp->f_flag & FREAD) && levents) {
% -		filetmp.f_data = fip->fi_readsock;
% -		filetmp.f_cred = cred;
% -		mtx_lock(&fifo_mtx);
% -		if (fp->f_seqcount == fip->fi_wgen)
% -			levents |= POLLINIGNEOF;
% -		mtx_unlock(&fifo_mtx);
% -		revents |= soo_poll(&filetmp, levents, cred, td);
% -	}
% -	levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND);
% -	if ((fp->f_flag & FWRITE) && levents) {
% -		filetmp.f_data = fip->fi_writesock;
% -		filetmp.f_cred = cred;
% -		revents |= soo_poll(&filetmp, levents, cred, td);
% -	}
% -	return (revents);
% -}

This was reasonably clean. My version is cleaner:
- POLLINIGNEOF is an old mistake of mine. I tried to kill it, but kib@
  propagated it to sys_pipe.c too, where it has survived another release
  or two. In my version, I still have it in the call to soo_poll() but
  don't have it in the `levents = events & ...' mask. Thus it is a pure
  kernel flag, and acts the same as your isfifo flag -- it tells the
  socket layer to do something unusual because this is a fifo. It is not
  needed any more, since the pipe layer is close to the fifo layer so it
  can just do something unusual. It can determine whether the pipe is a
  fifo without passing around flags (the flag should be in pipe_state).
- My version is missing the FREAD and FWRITE checks. These seem to be
  necessary, but I think they don't belong at this level. Also, the error
  handling for them seems quite broken (nonexistent). I think POLLERR is
  supposed to be returned for attempts to poll for an impossible
  condition, but the FREAD and FWRITE checks give a return of 0. And
  returning 0 is much worse than returning success, since it will cause
  at least poll() to block when it should return.

Here is the commit that added these checks:

% ----------------------------
% revision 1.118
% date: 2005/09/12 10:16:18; author: rwatson; state: Exp; lines: +2 -2
% Only poll the fifo for read events if the fifo is attached to a readable
% file descriptor. Otherwise, the read end of a fifo might return that it
% is writable (which it isn't).

But it should return (with POLLERR). This is an error condition and
should be detected. POSIX is fuzzy about this. It only says that POLLERR
is for when an error occurred. It defines the POLLNVAL error clearly as
meaning that the fd is invalid. Well that is not so clear. A non-open fd
is clearly invalid. This is handled in upper layers. Polling for a
direction that can't work can be considered as an invalid fd too, unless
"invalid" has its technical meaning.

Linux-2.6.10 sets POLLERR for reading from a pipe or fifo with no
readers, and has an XXX comment saying that most Unices don't do this for
fifos. This seems wrong to me, and FreeBSD doesn't do it for any of
pipes, fifos or sockets. But for pipes, there is tricky EOF handling
associated with this condition. I can't see anywhere where Linux gives
this based on the open mode.

% % Only poll the fifo for write events if the fifo attached to a writable
% file descriptor. Otherwise, the write end of a fifo might return that
% it is readable (which it isn't).

Seems to be necessary too. I can't see anywhere where Linux returns
POLLERR for i/o errors or unwritable files.
%
% In the event that a file is FREAD|FWRITE (which is allowed by POSIX, but
% has undefined behavior), we poll for both.
%
% MFC after: 3 days
% ----------------------------

select() is interestingly different than poll(). It can't return
POLLERR. Thus, the old broken behaviour gave close to the best possible
behaviour for select() at the usual level. The POLLERR's should make it
return success, and the false successes in the kernel would have done the
same. Only cases where there were no false successes in the kernel were
broken.

% @@ -1326,58 +1549,66 @@ pipe_poll(fp, events, active_cred, td)
%  	struct ucred *active_cred;
%  	struct thread *td;
%  {
% -	struct pipe *rpipe = fp->f_data;
% +	struct pipeinfo *pip = fp->f_data;
% +	struct pipe *rpipe;
%  	struct pipe *wpipe;
%  	int revents = 0;
%  #ifdef MAC
%  	int error;
%  #endif
%
% -	wpipe = rpipe->pipe_peer;
% +	rpipe = pip->pi_rpipe;
% +	wpipe = pip->pi_wpipe->pipe_peer;
%  	PIPE_LOCK(rpipe);
%  #ifdef MAC
%  	error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair);
%  	if (error)
% -		goto locked_error;
% +		return (0);

Seems to be broken. The unlock is now missing.

I don't like defaults set by initializations in declarations
'revents = 0'. Both the default and the return of 0 here seem to be
wrong. This is an error condition, so I think POLLERR should be returned,
as above. Otherwise, poll() will probably block. And the block is not
just transient, at least in the above, since the error condition can
never go away. You will only be saved from blocking forever if there is
success on some other file descriptor or event.
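
A minimal userland sketch of the two points just made, assuming nothing about the real kernel code beyond what is quoted above: every exit path drops the lock, and the error case reports POLLERR instead of 0. The names `model_lock`, `model_unlock`, `model_pipe_poll` and the parameters `mac_error` and `readable` are hypothetical stand-ins for PIPE_LOCK(), PIPE_UNLOCK(), pipe_poll(), the result of mac_pipe_check_poll() and the buffer-count test.

```c
#include <assert.h>
#include <poll.h>

/* Hypothetical stand-in for the pipe mutex: a simple hold counter. */
static int model_lock_depth;

static void
model_lock(void)
{
	model_lock_depth++;
}

static void
model_unlock(void)
{
	assert(model_lock_depth > 0);
	model_lock_depth--;
}

/*
 * Userland model of the poll error path.  'mac_error' stands in for the
 * result of the MAC check and 'readable' for "the buffer has data".
 */
static int
model_pipe_poll(int events, int mac_error, int readable)
{
	int revents;

	revents = 0;
	model_lock();
	if (mac_error != 0) {
		/* Report the error so that a poll() caller does not block. */
		revents = POLLERR;
		goto out;
	}
	if (readable)
		revents |= events & POLLIN;
out:
	/* Single exit: the lock is released on the error path too. */
	model_unlock();
	return (revents);
}
```

Whether POLLERR is the right value for a failed MAC check is the reviewer's suggestion, not settled behaviour; the sketch only shows that a single unlocking exit makes either choice easy to get right.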
% #endif
% -	if (events & (POLLIN | POLLRDNORM))
% -		if ((rpipe->pipe_state & PIPE_DIRECTW) ||
% -		    (rpipe->pipe_buffer.cnt > 0))
% -			revents |= events & (POLLIN | POLLRDNORM);
% +	if (fp->f_flag & FREAD) {
% +		if (events & (POLLIN | POLLRDNORM))
% +			if ((rpipe->pipe_state & PIPE_DIRECTW) ||
% +			    (rpipe->pipe_buffer.cnt > 0))
% +				revents |= events & (POLLIN | POLLRDNORM);

The change in fifo_vnops.c was done cleanly by adding the FREAD check to
the events mask check. With fifos now polled here, it is needed (modulo
bugs) here too. But here it makes the important changes for fifos, if
any, unreadable by indenting everything.

% -	if (events & (POLLOUT | POLLWRNORM))
% -		if (wpipe->pipe_present != PIPE_ACTIVE ||
% -		    (wpipe->pipe_state & PIPE_EOF) ||
% -		    (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
% -		     ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= PIPE_BUF ||
% -			 wpipe->pipe_buffer.size == 0)))
% -			revents |= events & (POLLOUT | POLLWRNORM);
% +		PIPE_UNLOCK(rpipe);
% +		if (fifo_iseof(fp))
% +			events |= POLLINIGNEOF;
% +		PIPE_LOCK(rpipe);
%
% -	if ((events & POLLINIGNEOF) == 0) {
% -		if (rpipe->pipe_state & PIPE_EOF) {
% -			revents |= (events & (POLLIN | POLLRDNORM));
% -			if (wpipe->pipe_present != PIPE_ACTIVE ||
% -			    (wpipe->pipe_state & PIPE_EOF))
% -				revents |= POLLHUP;
% +		if ((events & POLLINIGNEOF) == 0) {
% +			if (rpipe->pipe_state & PIPE_EOF) {
% +				revents |= (events & (POLLIN | POLLRDNORM));
% +				if (wpipe->pipe_present != PIPE_ACTIVE ||
% +				    (wpipe->pipe_state & PIPE_EOF))
% +					revents |= POLLHUP;
% +			}
%  		}
%  	}
% +	if (fp->f_flag & FWRITE)
% +		if (events & (POLLOUT | POLLWRNORM))
% +			if (wpipe->pipe_present != PIPE_ACTIVE ||
% +			    (wpipe->pipe_state & PIPE_EOF) ||
% +			    (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
% +			     ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >=
% +			      PIPE_BUF || wpipe->pipe_buffer.size == 0)))
% +				revents |= events & (POLLOUT | POLLWRNORM);
%
%  	if (revents == 0) {
% -		if (events & (POLLIN | POLLRDNORM)) {
% -			selrecord(td, &rpipe->pipe_sel);
% -			if (SEL_WAITING(&rpipe->pipe_sel))
% -				rpipe->pipe_state |= PIPE_SEL;
% -		}
% +		if (fp->f_flag & FREAD)
% +			if (events & (POLLIN | POLLRDNORM)) {
% +				selrecord(td, &rpipe->pipe_sel);
% +				if (SEL_WAITING(&rpipe->pipe_sel))
% +					rpipe->pipe_state |= PIPE_SEL;
% +			}
%
% -		if (events & (POLLOUT | POLLWRNORM)) {
% -			selrecord(td, &wpipe->pipe_sel);
% -			if (SEL_WAITING(&wpipe->pipe_sel))
% -				wpipe->pipe_state |= PIPE_SEL;
% -		}
% +		if (fp->f_flag & FWRITE)
% +			if (events & (POLLOUT | POLLWRNORM)) {
% +				selrecord(td, &wpipe->pipe_sel);
% +				if (SEL_WAITING(&wpipe->pipe_sel))
% +					wpipe->pipe_state |= PIPE_SEL;
% +			}
%  	}
% -#ifdef MAC
% -locked_error:
% -#endif
%  	PIPE_UNLOCK(rpipe);
%
%  	return (revents);

It seems that not much really changed here. To avoid indentation and fix
bugs, the FREAD and FWRITE checks should be done up front. I think they
can be done before locking and mac checking. Something like:

 	if ((fp->f_flag & FREAD) == 0 && (events & (POLLIN | POLLRDNORM)))
 		return (POLLERR);
 	if ((fp->f_flag & FWRITE) == 0 && (events & (POLLOUT | POLLWRNORM)))
 		return (POLLERR);
 	if (events & POLLINIGNEOF)
 		return (POLLERR);	/* try to kill this too */

Since the diff for pipe_poll() was unreadable, here it is again with the
old lines removed.
A few more problems are now obvious:

% @@ -1326,58 +1549,66 @@ pipe_poll(fp, events, active_cred, td)
%  	struct ucred *active_cred;
%  	struct thread *td;
%  {
% +	struct pipeinfo *pip = fp->f_data;
% +	struct pipe *rpipe;
%  	struct pipe *wpipe;
%  	int revents = 0;
%  #ifdef MAC
%  	int error;
%  #endif
%
% +	rpipe = pip->pi_rpipe;
% +	wpipe = pip->pi_wpipe->pipe_peer;
%  	PIPE_LOCK(rpipe);
%  #ifdef MAC
%  	error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair);
%  	if (error)
% +		return (0);
%  #endif
% +	if (fp->f_flag & FREAD) {
% +		if (events & (POLLIN | POLLRDNORM))
% +			if ((rpipe->pipe_state & PIPE_DIRECTW) ||
% +			    (rpipe->pipe_buffer.cnt > 0))
% +				revents |= events & (POLLIN | POLLRDNORM);
%

This style bug (extra blank line) was common in old code. It helps make
the diffs unreadable too.

% +		PIPE_UNLOCK(rpipe);
% +		if (fifo_iseof(fp))
% +			events |= POLLINIGNEOF;
% +		PIPE_LOCK(rpipe);

This is new code (needed to force POLLINIGNEOF for fifos). It is a
layering violation to call the fifo code for non-fifos here.
fifo_iseof() handles this internally by checking
fp->vnode->v_fifoinfo. The pipe layer should know if it is dealing with
a fifo in a better way than that.

I don't like unlocking in the middle in general, and here it gives
races. We will miss setting POLLIN | POLLRDNORM for certain changes if
they weren't set earlier and the state changed while unlocked.

Why unlock anyway or lock in fifo_iseof()? Only f_seqcount == fi_wgen is
checked under the lock there. Races in that check are probably just as
harmless as races here. And locking doesn't even prevent them, since if
f_seqcount or fi_wgen can change underneath us, they can also change just
after we check them. They rarely change compared with the buffer count
raced with above.
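
The alternative hinted at above (a fifo flag kept in the pipe state, and no unlock in the middle) can be sketched in userland. This is only a model under stated assumptions: `MODEL_PIPE_FIFO` is a hypothetical flag, `struct model_pipe` and `model_poll_read` are invented for illustration, the whole function is assumed to run under a single hold of the pipe lock, and POLLHUP is set on every EOF for brevity, whereas the quoted code also checks the write side.

```c
#include <poll.h>

#define MODEL_PIPE_EOF	0x01	/* models PIPE_EOF */
#define MODEL_PIPE_FIFO	0x02	/* hypothetical "this pipe backs a fifo" flag */

struct model_pipe {
	int state;	/* MODEL_PIPE_* flags */
	int cnt;	/* bytes buffered, models pipe_buffer.cnt */
	int wgen;	/* writer generation, models fi_wgen */
};

/*
 * Poll the read side in one pass.  'seqcount' models fp->f_seqcount, the
 * writer generation recorded for this descriptor.
 */
static int
model_poll_read(const struct model_pipe *p, int events, int seqcount)
{
	int ignore_eof, revents;

	revents = 0;
	/* Decide EOF handling from the pipe state; no call into fifo code. */
	ignore_eof = (p->state & MODEL_PIPE_FIFO) != 0 && seqcount == p->wgen;
	if (p->cnt > 0)
		revents |= events & POLLIN;
	if (!ignore_eof && (p->state & MODEL_PIPE_EOF) != 0)
		revents |= (events & POLLIN) | POLLHUP;
	return (revents);
}
```

Because the buffer count and the EOF decision are evaluated in the same pass, there is no window in which the state can change between them, which is the race described above.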
%
% +		if ((events & POLLINIGNEOF) == 0) {
% +			if (rpipe->pipe_state & PIPE_EOF) {
% +				revents |= (events & (POLLIN | POLLRDNORM));
% +				if (wpipe->pipe_present != PIPE_ACTIVE ||
% +				    (wpipe->pipe_state & PIPE_EOF))
% +					revents |= POLLHUP;
% +			}

This is old code, reindented. It was not needed, since it used to just
check for the POLLINIGNEOF mistake in the user events. Now it is needed
to give the modified (POLLINIGNEOF) semantics from the kernel flag for
fifos. It is much uglier than the corresponding code in the old
fifo_poll_f(). That begins with putting the relevant user events in
levents, so that it doesn't have to repeat the long mask expressions.
Well, that's about the limits of the cleanups. Something like the above
is still needed to give the semantics change.

The socket layer still has code that corresponds exactly to the above.
It is now not needed, since it now only supports the POLLINIGNEOF mistake
in the user events. One copy of this code is bad enough.

Bruce

From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 13:19:21 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 13111106566B; Tue, 10 Jan 2012 13:19:21 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id C5F2F8FC0A; Tue, 10 Jan 2012 13:19:20 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id 69CC046B2C; Tue, 10 Jan 2012 08:19:20 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id E5432B91A; Tue, 10 Jan 2012 08:19:19 -0500 (EST) From: John Baldwin To: freebsd-arch@freebsd.org Date: Tue, 10 Jan 2012 08:19:18 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: <86sjjobzmn.fsf@kopusha.home.net> 
In-Reply-To: <86sjjobzmn.fsf@kopusha.home.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201201100819.18892.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 10 Jan 2012 08:19:20 -0500 (EST) Cc: Mikolaj Golub , arch@freebsd.org, Robert Watson , Kostik Belousov Subject: Re: unix domain sockets on nullfs(5) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 13:19:21 -0000

On Monday, January 09, 2012 11:37:52 am Mikolaj Golub wrote:
> Hi,
>
> There is a longstanding problem with nullfs(5): unix sockets do
> not work between lower and upper layers.
>
> See, e.g. kern/51583, kern/159663.
>
> On a unix socket binding the created socket is referenced in the vnode
> field v_socket. This field is used on connect (from the vnode returned
> by lookup). Unix socket functions like unp_bind/connect set/access
> this field directly.
>
> This is the issue for nullfs, which uses two-layer vnode approach:
> binding to the upper layer, the socket reference is stored in the
> upper vnode; binding to the lower fs, the socket reference is stored
> in the lower vnode and is not seen from the upper layer.
>
> E.g. having /mnt/upper nullfs mounted on /mnt/lower:
>
> 1) if we bind to /mnt/lower/test.sock we can connect only to
> /mnt/lower/test.sock.
>
> 2) if we bind to /mnt/upper/test.sock we can connect only to
> /mnt/upper/test.sock.
>
> The desired behavior is one can connect to both the lower and the
> upper paths regardless if we bind to /mnt/lower/test.sock or
> /mnt/upper/test.sock.
>
> In kern/159663 two approaches were discussed:
>
> 1) copy the socket pointer from lower vnode to upper vnode on the
> upper vnode get (fix the case when one binds to the lower fs and wants
> to connect via the upper, but does not fix the case when one binds to
> the upper and wants to connect via the lower fs);
>
> 2) make null_lookup/create return lower vnode for VSOCK vnodes.
>
> Both approaches have issues and look rather hackish.
>
> kib@ suggested that the issue could be fixed if one added new VOP_*
> operations for setting and accessing vnode's v_socket field.
>
> The attached patch implements this. It also can be found here:
>
> http://people.freebsd.org/~trociny/nullfs.VOP_UNP.4.patch
>
> It adds three VOP_* operations: VOP_UNPBIND, VOP_UNPCONNECT and
> VOP_UNPDETACH. Their purpose can be understood from the modifications
> in uipc_usrreq.c:
>
> - vp->v_socket = unp->unp_socket;
> + VOP_UNPBIND(vp, unp->unp_socket);
>
> - so2 = vp->v_socket;
> + VOP_UNPCONNECT(vp, &so2);
>
> - unp->unp_vnode->v_socket = NULL;
> + VOP_UNPDETACH(unp->unp_vnode);
>
> The default functions just do these simple operations, while
> filesystems like nullfs can do more complicated things.
>
> The patch also implements functions for nullfs. By default the old
> behavior is preserved. To get the new behaviour the filesystem should
> be (re)mounted with sobypass option. Then the socket operations are
> bypassed to a lower vnode, which makes the socket be accessible from
> both layers.
>
> I am very interested to hear other people's opinions on this.

I think this is a decent solution. Why not make the locking notes for
VOP_UNPCONNECT() be "L" instead of "E"? A read lock should be sufficient
to fetch the socket? In fact, I suspect that unp_connect() could
actually use a shared lock on the vnode by adding 'LOCKSHARED' to the
flags passed to namei() via NDINIT().
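
The behaviour described in the quoted proposal can be modelled in userland. The sketch below is not the patch: `struct vnode` here is a toy with a hypothetical `lower` pointer and `sobypass` field standing in for the nullfs layering and the mount option, and the `model_unp*` functions only mimic what VOP_UNPBIND, VOP_UNPCONNECT and VOP_UNPDETACH are described as doing. Locking, which is the subject of the comment above, is left out entirely.

```c
#include <stddef.h>

struct socket {
	int id;			/* opaque placeholder */
};

struct vnode {
	struct socket *v_socket;	/* as in the quoted description */
	struct vnode *lower;		/* hypothetical: lower vnode, or NULL */
	int sobypass;			/* hypothetical: models the mount option */
};

/* Find the vnode that actually stores the socket reference. */
static struct vnode *
model_target(struct vnode *vp)
{
	while (vp->lower != NULL && vp->sobypass)
		vp = vp->lower;
	return (vp);
}

static void
model_unpbind(struct vnode *vp, struct socket *so)
{
	model_target(vp)->v_socket = so;
}

static void
model_unpconnect(struct vnode *vp, struct socket **sop)
{
	*sop = model_target(vp)->v_socket;
}

static void
model_unpdetach(struct vnode *vp)
{
	model_target(vp)->v_socket = NULL;
}
```

With bypass enabled, a bind through the upper vnode is visible through the lower one and vice versa; with it disabled, each layer keeps its own reference, which is the old behaviour the proposal describes.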
-- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 14:02:38 2012 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4B893106564A; Tue, 10 Jan 2012 14:02:38 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id D73B08FC18; Tue, 10 Jan 2012 14:02:37 +0000 (UTC) Received: from [192.168.14.19] (unknown [62.49.66.12]) by cyrus.watson.org (Postfix) with ESMTPSA id D191446B1A; Tue, 10 Jan 2012 09:02:36 -0500 (EST) Mime-Version: 1.0 (Apple Message framework v1251.1) Content-Type: text/plain; charset=us-ascii From: "Robert N. M. 
Watson" In-Reply-To: <86sjjobzmn.fsf@kopusha.home.net> Date: Tue, 10 Jan 2012 14:02:34 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: References: <86sjjobzmn.fsf@kopusha.home.net> To: Mikolaj Golub X-Mailer: Apple Mail (2.1251.1) Cc: arch@freebsd.org, Kostik Belousov Subject: Re: unix domain sockets on nullfs(5) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Jan 2012 14:02:38 -0000

On 9 Jan 2012, at 16:37, Mikolaj Golub wrote:

> kib@ suggested that the issue could be fixed if one added new VOP_*
> operations for setting and accessing vnode's v_socket field.

I like the philosophy of the proposed approach significantly better than the previous discussed approaches. Some thoughts:

(1) I don't think the new behaviour should be optional -- it was always the intent that nullfs pass through all behaviours to the underlying layer, it's just that certain edge cases didn't appear in the original implementation. Memory mapping was fixed a few years ago using similar techniques. This will significantly reduce the complexity of your patch, and also avoid user confusion since it will now behave "as expected". Certainly, mention in future release notes would be appropriate, however.

(2) I'd like to think (as John also mentioned?) that we could use a shared vnode lock when doing read-only access (i.e., connect).

(3) With this patch, an rwlock is held over vnode operations -- required due to the interlocked synchronisation of vnode and unix domain sockets. We often try to avoid doing this for reasons of lock order (and principle). It appears that it is likely fine in this case but it makes me slightly nervous.

(4) I'm slightly puzzled by the bind(2) case and interactions with layering -- possibly there is a bug here. If I issue bind(2) against the top layer, it looks like vp->v_socket will be set in the bottom layer, but unp->unp_vnode will be assigned to the top-layer vnode? My assumption was that you would want unp_vnode to always point to the bottom (real) vnode, which suggests to me that the VOPs shouldn't just assign v_socket, but should also assign unp_vnode. This has implications elsewhere in uipc_usrreq.c as well. Could you clarify whether you think this could be an issue? It may also be worth KASSERTing that the top-level vnode never points at anything but NULL to catch bugs like this. This may mean the VOPs have to have a bit of "test-and-set" to them to get atomicity properties right when it comes to bind(2).

In general, I think this is the right thing to do, and I'm very pleased you're doing it -- but the patch requires some further work.

Robert

> The attached patch implements this. It also can be found here:
>
> http://people.freebsd.org/~trociny/nullfs.VOP_UNP.4.patch
>
> It adds three VOP_* operations: VOP_UNPBIND, VOP_UNPCONNECT and
> VOP_UNPDETACH. Their purpose can be understood from the modifications
> in uipc_usrreq.c:
>
> - vp->v_socket = unp->unp_socket;
> + VOP_UNPBIND(vp, unp->unp_socket);
>
> - so2 = vp->v_socket;
> + VOP_UNPCONNECT(vp, &so2);
>
> - unp->unp_vnode->v_socket = NULL;
> + VOP_UNPDETACH(unp->unp_vnode);
>
> The default functions just do these simple operations, while
> filesystems like nullfs can do more complicated things.
>
> The patch also implements functions for nullfs. By default the old
> behavior is preserved. To get the new behaviour the filesystem should
> be (re)mounted with sobypass option. Then the socket operations are
> bypassed to a lower vnode, which makes the socket be accessible from
> both layers.
>
> I am very interested to hear other people's opinions on this.
>
> -- 
> Mikolaj Golub
>
> Index: sys/sys/vnode.h
> ===================================================================
> --- sys/sys/vnode.h (revision 229701)
> +++ sys/sys/vnode.h (working copy)
> @@ -695,6 +695,9 @@ int vop_stdpathconf(struct vop_pathconf_args *);
>  int vop_stdpoll(struct vop_poll_args *);
>  int vop_stdvptocnp(struct vop_vptocnp_args *ap);
>  int vop_stdvptofh(struct vop_vptofh_args *ap);
> +int vop_stdunpbind(struct vop_unpbind_args *ap);
> +int vop_stdunpconnect(struct vop_unpconnect_args *ap);
> +int vop_stdunpdetach(struct vop_unpdetach_args *ap);
>  int vop_eopnotsupp(struct vop_generic_args *ap);
>  int vop_ebadf(struct vop_generic_args *ap);
>  int vop_einval(struct vop_generic_args *ap);
> Index: sys/kern/uipc_usrreq.c
> ===================================================================
> --- sys/kern/uipc_usrreq.c (revision 229701)
> +++ sys/kern/uipc_usrreq.c (working copy)
> @@ -542,7 +542,7 @@ restart:
>
>  	UNP_LINK_WLOCK();
>  	UNP_PCB_LOCK(unp);
> -	vp->v_socket = unp->unp_socket;
> +	VOP_UNPBIND(vp, unp->unp_socket);
>  	unp->unp_vnode = vp;
>  	unp->unp_addr = soun;
>  	unp->unp_flags &= ~UNP_BINDING;
> @@ -638,7 +638,7 @@ uipc_detach(struct socket *so)
>  	 * XXXRW: Should assert vp->v_socket == so.
>  	 */
>  	if ((vp = unp->unp_vnode) != NULL) {
> -		unp->unp_vnode->v_socket = NULL;
> +		VOP_UNPDETACH(vp);
>  		unp->unp_vnode = NULL;
>  	}
>  	unp2 = unp->unp_conn;
> @@ -1308,7 +1308,7 @@ unp_connect(struct socket *so, struct sockaddr *na
>  	 * and to protect simultaneous locking of multiple pcbs.
>  	 */
>  	UNP_LINK_WLOCK();
> -	so2 = vp->v_socket;
> +	VOP_UNPCONNECT(vp, &so2);
>  	if (so2 == NULL) {
>  		error = ECONNREFUSED;
>  		goto bad2;
> Index: sys/kern/vfs_default.c
> ===================================================================
> --- sys/kern/vfs_default.c (revision 229701)
> +++ sys/kern/vfs_default.c (working copy)
> @@ -123,6 +123,9 @@ struct vop_vector default_vnodeops = {
>  	.vop_unlock = vop_stdunlock,
>  	.vop_vptocnp = vop_stdvptocnp,
>  	.vop_vptofh = vop_stdvptofh,
> +	.vop_unpbind = vop_stdunpbind,
> +	.vop_unpconnect = vop_stdunpconnect,
> +	.vop_unpdetach = vop_stdunpdetach,
>  };
>
>  /*
> @@ -1037,6 +1040,39 @@ vop_stdadvise(struct vop_advise_args *ap)
>  	return (error);
>  }
>
> +int
> +vop_stdunpbind(struct vop_unpbind_args *ap)
> +{
> +	struct vnode *vp;
> +
> +	vp = ap->a_vp;
> +
> +	vp->v_socket = ap->a_socket;
> +	return (0);
> +}
> +
> +int
> +vop_stdunpconnect(struct vop_unpconnect_args *ap)
> +{
> +	struct vnode *vp;
> +
> +	vp = ap->a_vp;
> +
> +	*ap->a_socket = vp->v_socket;
> +	return (0);
> +}
> +
> +int
> +vop_stdunpdetach(struct vop_unpdetach_args *ap)
> +{
> +	struct vnode *vp;
> +
> +	vp = ap->a_vp;
> +
> +	vp->v_socket = NULL;
> +	return (0);
> +}
> +
>  /*
>   * vfs default ops
>   * used to fill the vfs function table to get reasonable default return values.
> Index: sys/kern/vnode_if.src
> ===================================================================
> --- sys/kern/vnode_if.src (revision 229701)
> +++ sys/kern/vnode_if.src (working copy)
> @@ -639,3 +639,23 @@ vop_advise {
>  	IN off_t end;
>  	IN int advice;
>  };
> +
> +%% unpbind vp E E E
> +
> +vop_unpbind {
> +	IN struct vnode *vp;
> +	IN struct socket *socket;
> +};
> +
> +%% unpconnect vp E E E
> +
> +vop_unpconnect {
> +	IN struct vnode *vp;
> +	OUT struct socket **socket;
> +};
> +
> +%% unpdetach vp E E E
> +
> +vop_unpdetach {
> +	IN struct vnode *vp;
> +};
> Index: sys/fs/nullfs/null.h
> ===================================================================
> --- sys/fs/nullfs/null.h (revision 229701)
> +++ sys/fs/nullfs/null.h (working copy)
> @@ -37,8 +37,15 @@
>  struct null_mount {
>  	struct mount *nullm_vfs;
>  	struct vnode *nullm_rootvp; /* Reference to root null_node */
> +	uint64_t nullm_flags; /* nullfs options specific for mount */
>  };
>
> +/*
> + * Flags stored in nullm_flags.
> + */
> +#define NULLMNT_SOBYPASS 0x00000001 /* Bypass unix socket operations
> to lower vnode */
> +
>  #ifdef _KERNEL
>  /*
>   * A cache of vnode references
> @@ -47,8 +54,16 @@ struct null_node {
>  	LIST_ENTRY(null_node) null_hash; /* Hash list */
>  	struct vnode *null_lowervp; /* VREFed once */
>  	struct vnode *null_vnode; /* Back pointer */
> +	u_int null_flags; /* Flags */
>  };
>
> +/*
> + * Flags stored in null_flags.
> + */
> +
> +#define NULL_SOBYPASS	0x00000001	/* Bypass unix socket operations
> +					   to lower vnode */
> +
>  #define	MOUNTTONULLMOUNT(mp) ((struct null_mount *)((mp)->mnt_data))
>  #define	VTONULL(vp) ((struct null_node *)(vp)->v_data)
>  #define	NULLTOV(xp) ((xp)->null_vnode)
> Index: sys/fs/nullfs/null_vnops.c
> ===================================================================
> --- sys/fs/nullfs/null_vnops.c	(revision 229701)
> +++ sys/fs/nullfs/null_vnops.c	(working copy)
> @@ -812,6 +812,52 @@ null_vptocnp(struct vop_vptocnp_args *ap)
>  	return (error);
>  }
>
> +static int
> +null_unpbind(struct vop_unpbind_args *ap)
> +{
> +	struct vnode *vp;
> +	struct null_node *xp;
> +	struct null_mount *xmp;
> +
> +	vp = ap->a_vp;
> +	xp = VTONULL(vp);
> +	xmp = MOUNTTONULLMOUNT(vp->v_mount);
> +	if (xmp->nullm_flags & NULLMNT_SOBYPASS) {
> +		xp->null_flags |= NULL_SOBYPASS;
> +		return (null_bypass((struct vop_generic_args *)ap));
> +	} else {
> +		return (vop_stdunpbind(ap));
> +	}
> +}
> +
> +static int
> +null_unpconnect(struct vop_unpconnect_args *ap)
> +{
> +	struct vnode *vp;
> +	struct null_mount *xmp;
> +
> +	vp = ap->a_vp;
> +	xmp = MOUNTTONULLMOUNT(vp->v_mount);
> +	if (xmp->nullm_flags & NULLMNT_SOBYPASS)
> +		return (null_bypass((struct vop_generic_args *)ap));
> +	else
> +		return (vop_stdunpconnect(ap));
> +}
> +
> +static int
> +null_unpdetach(struct vop_unpdetach_args *ap)
> +{
> +	struct vnode *vp;
> +	struct null_node *xp;
> +
> +	vp = ap->a_vp;
> +	xp = VTONULL(vp);
> +	if (xp->null_flags & NULL_SOBYPASS)
> +		return (null_bypass((struct vop_generic_args *)ap));
> +	else
> +		return (vop_stdunpdetach(ap));
> +}
> +
>  /*
>   * Global vfs data structures
>   */
> @@ -837,4 +883,7 @@ struct vop_vector null_vnodeops = {
>  	.vop_unlock =		null_unlock,
>  	.vop_vptocnp =		null_vptocnp,
>  	.vop_vptofh =		null_vptofh,
> +	.vop_unpbind =		null_unpbind,
> +	.vop_unpconnect =	null_unpconnect,
> +	.vop_unpdetach =	null_unpdetach,
>  };
> Index: sys/fs/nullfs/null_subr.c
> ===================================================================
> --- sys/fs/nullfs/null_subr.c	(revision 229701)
> +++ sys/fs/nullfs/null_subr.c	(working copy)
> @@ -235,6 +235,7 @@ null_nodeget(mp, lowervp, vpp)
>
>  	xp->null_vnode = vp;
>  	xp->null_lowervp = lowervp;
> +	xp->null_flags = 0;
>  	vp->v_type = lowervp->v_type;
>  	vp->v_data = xp;
>  	vp->v_vnlock = lowervp->v_vnlock;
> Index: sys/fs/nullfs/null_vfsops.c
> ===================================================================
> --- sys/fs/nullfs/null_vfsops.c	(revision 229701)
> +++ sys/fs/nullfs/null_vfsops.c	(working copy)
> @@ -84,16 +84,26 @@ nullfs_mount(struct mount *mp)
>  	if (mp->mnt_flag & MNT_ROOTFS)
>  		return (EOPNOTSUPP);
>  	/*
> -	 * Update is a no-op
> +	 * Update is supported only for some options.
>  	 */
>  	if (mp->mnt_flag & MNT_UPDATE) {
> -		/*
> -		 * Only support update mounts for NFS export.
> -		 */
> +		error = EOPNOTSUPP;
> +		xmp = MOUNTTONULLMOUNT(mp);
> +		if (vfs_flagopt(mp->mnt_optnew, "sobypass", NULL, 0)) {
> +			MNT_ILOCK(mp);
> +			xmp->nullm_flags |= NULLMNT_SOBYPASS;
> +			MNT_IUNLOCK(mp);
> +			error = 0;
> +		}
> +		if (vfs_flagopt(mp->mnt_optnew, "nosobypass", NULL, 0)) {
> +			MNT_ILOCK(mp);
> +			xmp->nullm_flags &= ~NULLMNT_SOBYPASS;
> +			MNT_IUNLOCK(mp);
> +			error = 0;
> +		}
>  		if (vfs_flagopt(mp->mnt_optnew, "export", NULL, 0))
> -			return (0);
> -		else
> -			return (EOPNOTSUPP);
> +			error = 0;
> +		return (error);
>  	}
>
>  	/*
> @@ -182,6 +192,11 @@ nullfs_mount(struct mount *mp)
>  	MNT_ILOCK(mp);
>  	mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag & MNTK_MPSAFE;
>  	MNT_IUNLOCK(mp);
> +
> +	xmp->nullm_flags = 0;
> +	vfs_flagopt(mp->mnt_optnew, "sobypass", &xmp->nullm_flags,
> +	    NULLMNT_SOBYPASS);
> +
>  	mp->mnt_data = xmp;
>  	vfs_getnewfsid(mp);
>
> Index: sbin/mount_nullfs/mount_nullfs.c
> ===================================================================
> --- sbin/mount_nullfs/mount_nullfs.c	(revision 229701)
> +++ sbin/mount_nullfs/mount_nullfs.c	(working copy)
> @@ -57,27 +57,36 @@ static const char rcsid[] =
>
>  #include "mntopts.h"
>
> +#define NULLOPT_SOBYPASS	0x00000001
> +#define NULLOPT_MASK		(NULLOPT_SOBYPASS)
> +
>  static struct mntopt mopts[] = {
>  	MOPT_STDOPTS,
> +	MOPT_UPDATE,
> +	{"sobypass", 0, NULLOPT_SOBYPASS, 1},
>  	MOPT_END
>  };
>
> +static char fstype[] = "nullfs";
> +
>  int	subdir(const char *, const char *);
>  static void	usage(void) __dead2;
>
>  int
>  main(int argc, char *argv[])
>  {
> -	struct iovec iov[6];
> -	int ch, mntflags;
> +	struct iovec *iov;
> +	int ch, iovlen, mntflags, nullflags, negflags;
>  	char source[MAXPATHLEN];
>  	char target[MAXPATHLEN];
>
> -	mntflags = 0;
> +	mntflags = nullflags = 0;
> +	negflags = NULLOPT_MASK;
>  	while ((ch = getopt(argc, argv, "o:")) != -1)
>  		switch(ch) {
>  		case 'o':
> -			getmntopts(optarg, mopts, &mntflags, 0);
> +			getmntopts(optarg, mopts, &mntflags, &nullflags);
> +			getmntopts(optarg, mopts, &mntflags, &negflags);
>  			break;
>  		case '?':
>  		default:
> @@ -97,20 +106,18 @@ main(int argc, char *argv[])
>  		errx(EX_USAGE, "%s (%s) and %s are not distinct paths",
>  		    argv[0], target, argv[1]);
>
> -	iov[0].iov_base = strdup("fstype");
> -	iov[0].iov_len = sizeof("fstype");
> -	iov[1].iov_base = strdup("nullfs");
> -	iov[1].iov_len = strlen(iov[1].iov_base) + 1;
> -	iov[2].iov_base = strdup("fspath");
> -	iov[2].iov_len = sizeof("fspath");
> -	iov[3].iov_base = source;
> -	iov[3].iov_len = strlen(source) + 1;
> -	iov[4].iov_base = strdup("target");
> -	iov[4].iov_len = sizeof("target");
> -	iov[5].iov_base = target;
> -	iov[5].iov_len = strlen(target) + 1;
> -
> -	if (nmount(iov, 6, mntflags))
> +	iov = NULL;
> +	iovlen = 0;
> +	build_iovec(&iov, &iovlen, "fstype", fstype, (size_t)-1);
> +	build_iovec(&iov, &iovlen, "fspath", source, (size_t)-1);
> +	build_iovec(&iov, &iovlen, "target", target, (size_t)-1);
> +	if ((nullflags & NULLOPT_SOBYPASS) != 0)
> +		build_iovec(&iov, &iovlen, "sobypass", NULL, 0);
> +	if ((mntflags & MNT_UPDATE) != 0) {
> +		if ((negflags & NULLOPT_SOBYPASS) == 0)
> +			build_iovec(&iov, &iovlen, "nosobypass", NULL, 0);
> +	}
> +	if (nmount(iov, iovlen, mntflags))
>  		err(1, NULL);
>  	exit(0);
>  }
> Index: sbin/mount_nullfs/mount_nullfs.8
> ===================================================================
> --- sbin/mount_nullfs/mount_nullfs.8	(revision 229701)
> +++ sbin/mount_nullfs/mount_nullfs.8	(working copy)
> @@ -79,8 +79,14 @@ Options are specified with a
>  flag followed by a comma separated string of options.
>  See the
>  .Xr mount 8
> -man page for possible options and their meanings.
> +man page for standard options and their meanings.
> +Options specific for
> +.Nm :
> +.Bl -tag -width sobypass
> +.It Cm sobypass
> +Bypass unix socket operations to the lower layer.
>  .El
> +.El
>  .Pp
>  The null layer has two purposes.
>  First, it serves as a demonstration of layering by providing a layer

From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 20:30:12 2012
From: Mikolaj Golub
To: "Robert N. M. Watson"
References: <86sjjobzmn.fsf@kopusha.home.net>
Date: Tue, 10 Jan 2012 22:30:06 +0200
Message-ID: <86fwfnti5t.fsf@kopusha.home.net>
Cc: arch@freebsd.org, Kostik Belousov
Subject: Re: unix domain sockets on nullfs(5)

On Tue, 10 Jan 2012 14:02:34 +0000 Robert N. M. Watson wrote:

 RNMW> On 9 Jan 2012, at 16:37, Mikolaj Golub wrote:

 >> kib@ suggested that the issue could be fixed if one added new VOP_*
 >> operations for setting and accessing vnode's v_socket field.

 RNMW> I like the philosophy of the proposed approach significantly better than
 RNMW> the previously discussed approaches. Some thoughts:

 RNMW> (1) I don't think the new behaviour should be optional -- it was always
 RNMW> the intent that nullfs pass through all behaviours to the underlying
 RNMW> layer, it's just that certain edge cases didn't appear in the original
 RNMW> implementation. Memory mapping was fixed a few years ago using similar
 RNMW> techniques. This will significantly reduce the complexity of your
 RNMW> patch, and also avoid user confusion since it will now behave "as
 RNMW> expected". Certainly, mention in future release notes would be
 RNMW> appropriate, however.

I don't mind having only the new behavior, as I can't imagine where I would
need a nullfs mounted with the nosobypass option, and I also like it when
things are simple :-). On the other hand, there might be people who relied on
the old behavior and who would be surprised if it changed. So, if other
people agree, I will remove the old behaviour to make the patch simpler.

Another option would be to have sobypass by default, with the possibility to
(re)mount the fs with nosobypass.

 RNMW> (2) I'd like to think (as John also mentioned?) that we could use a
 RNMW> shared vnode lock when doing read-only access (i.e., connect).

Thanks, trying this.

 RNMW> (3) With this patch, an rwlock is held over vnode operations --
 RNMW> required due to the interlocked synchronisation of vnode and unix
 RNMW> domain sockets. We often try to avoid doing this for reasons of lock
 RNMW> order (and principle). It appears that it is likely fine in this case
 RNMW> but it makes me slightly nervous.

Well, I have not noticed any issues, but that might be because I don't have
much experience here.

 RNMW> (4) I'm slightly puzzled by the bind(2) case and interactions with
 RNMW> layering -- possibly there is a bug here. If I issue bind(2) against
 RNMW> the top layer, it looks like vp->v_socket will be set in the bottom
 RNMW> layer, but unp->unp_vnode will be assigned to the top-layer vnode? My
 RNMW> assumption was that you would want unp_vnode to always point to the
 RNMW> bottom (real) vnode, which suggests to me that the VOPs shouldn't just
 RNMW> assign v_socket, but should also assign unp_vnode. This has
 RNMW> implications elsewhere in uipc_usrreq.c as well. Could you clarify
 RNMW> whether you think this could be an issue?

I intentionally made unp_vnode always point to the vnode returned on create
in bind(). On bind, v_usecount is increased for this vnode, and in
uipc_detach() we have to call vrele() for this same vnode. I believe that in
uipc_usrreq.c we should reference only the upper vnode via unp->unp_vnode,
accessing the lower vnode only via VOP_* operations. I don't see issues with
this approach so far. If we did it the suggested way (unp_vnode always
pointing to the vnode that holds the socket reference), I don't see how to
decrease the usecount for the upper vnode on detach (we want the usecount for
the upper vnode increased on bind so that, e.g., umount fails with EBUSY when
trying to unmount nullfs if somebody is bound to an upper path).
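
[The reference handling described in the reply above can be sketched in
userland C. This is only an illustrative model, not kernel code: the structs
below are simplified stand-ins for the kernel's vnode, socket and unpcb, the
model_* helpers are hypothetical names, and the sketch assumes the sobypass
behaviour is always on.]

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption). */
struct socket {
	int	so_id;
};

struct vnode {
	struct vnode	*v_lower;	/* non-NULL for a nullfs-style upper vnode */
	struct socket	*v_socket;	/* only ever set on the lowest vnode */
	int		 v_usecount;
};

struct unpcb {
	struct vnode	*unp_vnode;	/* always the vnode bind() was issued on */
};

/* Walk down to the vnode that really stores v_socket (models the bypass). */
static struct vnode *
model_lowest(struct vnode *vp)
{
	while (vp->v_lower != NULL)
		vp = vp->v_lower;
	return (vp);
}

/* Models VOP_UNPCONNECT(): fetch the socket through the layers. */
static struct socket *
model_unpconnect(struct vnode *vp)
{
	return (model_lowest(vp)->v_socket);
}

/* Models bind(): reference the upper vnode, store the socket below. */
static void
model_bind(struct unpcb *unp, struct vnode *vp, struct socket *so)
{
	vp->v_usecount++;			/* keeps the upper mount busy */
	model_lowest(vp)->v_socket = so;	/* models VOP_UNPBIND() */
	unp->unp_vnode = vp;
}

/* Models uipc_detach(): clear the socket below, release the upper vnode. */
static void
model_detach(struct unpcb *unp)
{
	struct vnode *vp = unp->unp_vnode;

	if (vp == NULL)
		return;
	model_lowest(vp)->v_socket = NULL;	/* models VOP_UNPDETACH() */
	vp->v_usecount--;			/* models vrele() */
	unp->unp_vnode = NULL;
}
```

[In this model the upper vnode carries the use count for the lifetime of the
binding while only the lower vnode ever holds the socket pointer, which is
the split the reply argues for.]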

 RNMW> It may also be worth KASSERTing that the top-level vnode never points at
 RNMW> anything but NULL to catch bugs like this. This may mean the VOPs have
 RNMW> to have a bit of "test-and-set" to them to get atomicity properties
 RNMW> right when it comes to bind(2).

 RNMW> In general, I think this is the right thing to do, and I'm very pleased
 RNMW> you're doing it -- but the patch requires some further work.

Thanks.

-- 
Mikolaj Golub

From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 20:36:28 2012
From: Mikolaj Golub
To: John Baldwin
References: <86sjjobzmn.fsf@kopusha.home.net> <201201100819.18892.jhb@freebsd.org>
Date: Tue, 10 Jan 2012 22:10:15 +0200
In-Reply-To: <201201100819.18892.jhb@freebsd.org> (John Baldwin's message of "Tue, 10 Jan 2012 08:19:18 -0500")
Message-ID: <86hb03tj2w.fsf@kopusha.home.net>
Cc: Kostik Belousov, Robert Watson, freebsd-arch@freebsd.org
Subject: Re: unix domain sockets on nullfs(5)

On Tue, 10 Jan 2012 08:19:18 -0500 John Baldwin wrote:

 JB> I think this is a decent solution. Why not make the locking notes for
 JB> VOP_UNPCONNECT() be "L" instead of "E"? A read lock should be sufficient
 JB> to fetch the socket? In fact, I suspect that unp_connect() could actually
 JB> use a shared lock on the vnode by adding 'LOCKSHARED' to the flags passed
 JB> to namei() via NDINIT().

This looks reasonable to me. I am going to modify the patch accordingly. Thanks.

-- 
Mikolaj Golub

From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 23:04:58 2012
In-Reply-To: <20120110153807.H943@besplex.bde.org>
References: <20120110005155.S2378@besplex.bde.org> <20120110153807.H943@besplex.bde.org>
Date: Wed, 11 Jan 2012 00:04:56 +0100
From: Giovanni Trematerra
To: Bruce Evans
Cc: jilles@freebsd.org, Attilio Rao, flo@freebsd.org, Konstantin Belousov, freebsd-arch@freebsd.org
Subject: Re: pipe/fifo code merged.

On Tue, Jan 10, 2012 at 10:41 AM, Bruce Evans wrote:
> On Mon, 9 Jan 2012, Giovanni Trematerra wrote:
>
>> On Mon, Jan 9, 2012 at 3:34 PM, Bruce Evans wrote:
>>>
>>>
>>> I would go the other way, and pessimize pipes to be like fifos.  Then
>>> optimize the socket layer under both.  Fifos are not important, but
>>> they are implemented on top of the socket layer which is important.
>>> Pipes are important. ...
>>> ...
>>>
>>> Linux-2.6.10 implements fifos as a small wrapper around pipes, while
>>> FreeBSD implements them as a large wrapper around sockets.  I hope the
>>> former is what you do -- share most pipe code, without making it more
>>> complicated, and with making the fifo wrapper much simpler.  The Linux
>>> code is much simpler and smaller, since for pipes it doesn't
>>> implement direct mode, and for sockets it doesn't have to interact with
>>> the complicated socket layer.
>>
>>
>> If you read the patch, as I think you didn't, you'd see that there's no
>> wrapper at all. fifo's code is just fifo_open, fifo_close and another
>> couple of helper functions to deal with VFS, all the remaining code is
>> shared with pipes and no complicated code was added.
>
>
> I think you don't want me to read the patch, since I would see too much
> detail starting with style bugs.  Anyway..

Thanks a lot for your review. I really appreciate it.
[skip]
I'll do my best to fix the style bugs.

>
> In this file, I have most experience fixing this function (and open
> and close so that select and poll work).  The above looks simple, but
> has a complex interaction with layers above and below it.  Most of the
> details are in the socket layer.  You had to reimplement these in the
> pipe layer.  The most delicate point involving fs_wgen seems to be
> reimplemented correctly in fifo_iseof().  Before I fixed this for
> fifos, poll and select on pipes (especially for EOF) was less broken
> than for fifos, partly because pipes are simpler -- they can't be
> reopened.  My tests in /usr/src/tools/regression/poll/ are hopefully
> enough to detect any regressions.  Some of the tests are intentionally
> left broken and/or expected to fail, to be bug for bug compatible with
> old kernel bugs.
>

OK. I'll try that regression test.

>
> % @@ -295,42 +328,133 @@ pipe_zone_ctor(void *mem, int size, void
> %  static int
> %  pipe_zone_init(void *mem, int size, int flags)
> %  {
> % -	struct pipepair *pp;
> % +	struct umapipe *up;
> % +	struct pipeinfo *pip;
> % +	struct timespec ctime;
> %
> % -	KASSERT(size == sizeof(*pp), ("pipe_zone_init: wrong size"));
> % +	KASSERT(size == sizeof(*up), ("pipe_zone_init: wrong size"));
> %
> % -	pp = (struct pipepair *)mem;
> % +	up = (struct umapipe *)mem;
> % +	vfs_timestamp(&ctime);
> % +	pip = &up->pip[0];
> % +	pip->pi_atime = pip->pi_mtime = pip->pi_ctime = ctime;
> % +	pip->pi_ino = -1;
> %
> % -	mtx_init(&pp->pp_mtx, "pipe mutex", NULL, MTX_DEF | MTX_RECURSE);
> % +	pip = &up->pip[1];
> % +	pip->pi_atime = pip->pi_mtime = pip->pi_ctime = ctime;
> % +	pip->pi_ino = -1;
> % +
> % +	mtx_init(&up->pp.pp_mtx, "pipe mutex", NULL, MTX_DEF | MTX_RECURSE);
> %  	return (0);
>
> Timestamps seem to be broken.  jhb pointed out a problem in them, without
> much detail (but I forget the exact detail).  Here it can't be right to
> have timestamps on both sides, since timestamps are a property of the
> file at its lowest level, not the file descriptor or even the file at the
> open file (fcntl) level.  For fifos, timestamps are even more a property
> of the file.
>
> % ...
> % +static int
> % +fifo_zone_init(void *mem, int size, int flags)
> % +{
> % +	struct umafifo *up;
> % +	struct pipeinfo *pip;
> % +
> % +	KASSERT(size == sizeof(*up), ("fifo_zone_init: wrong size"));
> % +
> % +	up = (struct umafifo *)mem;
> % +	pip = &up->pip[0];
> % +	vfs_timestamp(&pip->pi_ctime);
> % +	pip->pi_atime = pip->pi_mtime = pip->pi_ctime;
>
> For fifos, it is wrong to have even 1 timestamp at this level (except
> unused ones won't hurt).  Fifos and their timestamps persist as disk
> files, and the timestamps for these disk files are managed by the
> underlying file system.
> Any timestamps at this level can only give
> possibilities for inconsistencies.  For example, this level uses
> vfs_timestamp(), but the file system level might use a different
> timestamp method, either because it is buggy or because it cannot
> represent timestamps with the granularity that vfs_timestamp() gives.
> (It is a bug that vfs_timestamp() is global.)
>
> % @@ -1219,7 +1403,7 @@ pipe_write(fp, uio, active_cred, flags,
> %  	}
> %
> %  	if (error == 0)
> % -		vfs_timestamp(&wpipe->pipe_mtime);
> % +		vfs_timestamp(&pip->pi_mtime);
> %
> %  	/*
> %  	 * We have something to offer,
>
> It's doing timestamps in the same way for fifos as for pipes.  There must
> be a problem for stat() too.  The old fifo code uses fo_stat() to get
> back to the underlying file system which knows all the attributes (not
> just timestamps).  I can't see anything like that here.  The new fifo
> code seems to just use pipe_stat() which gives many fake attributes
> which are likely to differ from ones in the file system.
>
> % @@ -1492,12 +1726,12 @@ pipe_free_kmem(cpipe)
> %   * shutdown the pipe
> %   */
> %  static void
> % -pipeclose(cpipe)
> % +pipeclose(cpipe, isfifo)
> %  	struct pipe *cpipe;
> % +	int isfifo;
> %  {
>
> I don't see any reason to have a different zone for fifos.  This
> complicates some interfaces ...

It is just to avoid wasting sizeof(struct pipeinfo) bytes.

>
> % @@ -1570,21 +1798,34 @@ pipeclose(cpipe)
> %  #ifdef MAC
> %  		mac_pipe_destroy(pp);
> %  #endif
> % -		uma_zfree(pipe_zone, cpipe->pipe_pair);
> % +		uma_zfree(isfifo ? fifo_zone : pipe_zone, cpipe->pipe_pair);
>
> The new isfifo parameter is only used here.

That's the only way to know which zone I need to use.

>
> % diff -r fee0771aad22 sys/sys/pipe.h
> % --- a/sys/sys/pipe.h	Sun Jan 01 19:00:13 2012 +0100
> % +++ b/sys/sys/pipe.h	Mon Jan 09 00:13:55 2012 +0100
> % @@ -28,6 +28,8 @@
> %  #error "no user-servicable parts inside"
> %  #endif
> %
> % +#include
> % +
>
> This namespace pollution was intentionally left out.
>
> %  /*
> %   * Pipe buffer size, keep moderate in value, pipes take kva space.
> %   */
> % @@ -103,16 +105,12 @@ struct pipe {
> %  	struct	pipebuf pipe_buffer;	/* data storage */
> %  	struct	pipemapping pipe_map;	/* pipe mapping for direct I/O */
> %  	struct	selinfo pipe_sel;	/* for compat with select */
>
> was a prerequisite for this file.

I'll fix it.
>
> % -	struct	timespec pipe_atime;	/* time of last access */
> % -	struct	timespec pipe_mtime;	/* time of last modify */
> % -	struct	timespec pipe_ctime;	/* time of status change */
> %  	struct	sigio *pipe_sigio;	/* information for async I/O */
> %  	struct	pipe *pipe_peer;	/* link with other direction */
> %  	struct	pipepair *pipe_pair;	/* container structure pointer */
> %  	u_int	pipe_state;		/* pipe status info */
> %  	int	pipe_busy;		/* busy flag, mostly to handle rundown sanely */
> %  	int	pipe_present;		/* still present? */
> % -	ino_t	pipe_ino;		/* fake inode for stat(2) */
>
> Both this and the timestamps should never have been here, since they
> are per-"disk"-file but they were per-pipe-endpoint (see below).
>
> %  };
> %
> %  /*
> % @@ -138,5 +136,24 @@ struct pipepair {
> %  #define PIPE_UNLOCK(pipe)	mtx_unlock(PIPE_MTX(pipe))
> %  #define PIPE_LOCK_ASSERT(pipe, type)	mtx_assert(PIPE_MTX(pipe), (type))
> %
> % +#define PIPE_CNT(pipe)	((pipe->pipe_state & PIPE_DIRECTW) ? \
> % +		pipe->pipe_map.cnt : pipe->pipe_buffer.cnt)
> % +
> % +/*
> % + *  Per-file descriptor structure.
> % + */
>
> I was very confused by this comment.  It is sort of backwards.  This
> structure is for the lowest level, which is the pipepair level for
> pipes and the disk level for fifos.
>  (But we have to expand the disk level, first from dinodes/directory
>  entries to inodes/directory blocks, then from inodes to vnodes, then
>  append pipepairs).  The open file level is a level or two above that,
>  and the file descriptor level is further above.
>
>  The levels are stacked even more confusingly for pipes.  Above the
>  pipepair level, there is the pipe level, but this is more sideways
>  than fully above.  Open files are more at the level of pipes than
>  pipepairs.  "pipe" in normal usage can mean either "pipe" or
>  "pipepair" in this implementation.)
> The timestamps and inode were at a wrong level.  You made a step towards
> fixing this by moving them down, but the comment says that they are now
> at the highest level.

The comment is wrong; I'll change it.

>
> % +struct pipeinfo {
> % +	struct	pipe	*pi_rpipe;	/* pipe we read from */
> % +	struct	pipe	*pi_wpipe;	/* pipe we write to */
> % +	struct	timespec pi_atime;	/* time of last access */
> % +	struct	timespec pi_mtime;	/* time of last modify */
> % +	struct	timespec pi_ctime;	/* time of status change */
> % +	ino_t	pi_ino;		/* fake pipe inode for stat(2) */
>
> Indentation error.
>
> % +};
>
> I was confused by the layering for this struct too.  This struct seems
> to be needed only to swap rpipe with wpipe for the 2 ends of a pipe.
> This is confusing, but I can't see any better way at the moment.
> Putting the other fields in it just gives confusion which leads to
> bugs and minor resource wastage.  All the other fields must be per-file
> at the lowest level, so any duplication of them gives either bugs
> (if they are different) or just wastes resources (time to write them
> and check that they are the same, and space to hold copies).
> These fields belong in the pipepair struct.  This moves them down
> (more sideways) another level.
>
> POSIX is fuzzy about whether the attributes are unique for the 2 ends
> of a pipe.  It requires all st_ timestamps and some other attributes
> to be "meaningful" unless otherwise specified and doesn't specify
> anything otherwise for pipes, at least in an old draft.  But for pipe()
> it says:
>
> 27884            Upon successful completion, pipe( ) shall mark for update the st_atime, st_ctime, and st_mtime
> 27885            fields of the pipe.
>
> Assuming that this part is not fuzzy, "the ... st_atime... fields of
> the pipe" in it must refer to unique fields.
>
> Similarly for pi_ino.
>
> These fields shouldn't be used at all for fifos, as mentioned above.
> File systems still maintain separate non-copies which pipe_stat()
> hides.  This doesn't matter much for timestamps and st_ino.  It
> matters for modes and permissions, etc.

I'll fix it.
Just to be clear: the last-access and last-modify timestamps aren't saved
back into the file system for fifos. It's hard to fix this now, so I might
look at it later.

>
> After moving timestamps and pi_ino into the pipepair struct, the new
> info struct is reduced to 2 pointers into the pipepair struct.
> Unfortunately, the pointer to the pipeinfo struct cannot be simply
> replaced by a pointer to the pipepair struct (and dereferencing the
> latter instead of the former) because of complications for the
> separate ends of a pipe:
>
> @ @@ -372,17 +554,17 @@ kern_pipe(struct thread *td, int fildes[
> @  	 * to avoid races against processes which manage to dup() the read
> @  	 * side while we are blocked trying to allocate the write side.
> @  	 */
> @ -	finit(rf, FREAD | FWRITE, DTYPE_PIPE, rpipe, &pipeops);
> @ +	finit(rf, FREAD | FWRITE, DTYPE_PIPE, &up->pip[0], &pipeops);
> @  	error = falloc(td, &wf, &fd, 0);
> @  	if (error) {
> @  		fdclose(fdp, rf, fildes[0], td);
> @  		fdrop(rf, td);
> @  		/* rpipe has been closed by fdrop(). */
> @ -		pipeclose(wpipe);
> @ +		pipeclose(wpipe, 0);
> @  		return (error);
> @  	}
> @  	/* An extra reference on `wf' has been held for us by falloc(). */
> @ -	finit(wf, FREAD | FWRITE, DTYPE_PIPE, wpipe, &pipeops);
> @ +	finit(wf, FREAD | FWRITE, DTYPE_PIPE, &up->pip[1], &pipeops);
>
> One end used to get rpipe and the other end wpipe.  This is used mainly
> by read/write to go in the correct direction.  Now, the 2 ends get
> pointers to different pipeinfo structs, with the only differences in
> the pipeinfo structs being:
> - swap rpipe and wpipe.  This is used mainly by read/write as before

rpipe and wpipe aren't swapped.
For pipes:
pip[0].pi_rpipe == pip[0].pi_wpipe
pip[1].pi_rpipe == pip[1].pi_wpipe
pip[0].pi_rpipe != pip[1].pi_rpipe
pip[0].pi_wpipe != pip[1].pi_wpipe

For fifos:
pip[0].pi_rpipe != pip[0].pi_wpipe

> - different timestamps (to implement bugs and resource wastage)
> - pi_ino should be the same in both (just waste space).

Well, POSIX says:
"The st_ino and st_dev fields taken together uniquely identify
the file within the system."
So it is a matter of what a file is: the pipepair or an end of it?

>
> For fifos, rpipe == wpipe so the 2 pipeinfo structs should be the same
> and the complications are not needed.
>

Err, that's true for pipes.
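
[The pointer relations listed in the reply above can be checked with a small
userland sketch. It only models the two pointer fields of the patch's struct
pipeinfo; struct pipe is reduced to a placeholder, and pipeinfo_wire() is a
hypothetical helper name that mirrors the assignments in the quoted
kern_pipe() hunk. In this sketch the fifo case fills in pip[0] only, which is
an assumption taken from that hunk.]

```c
#include <stddef.h>

/* Placeholder for the kernel's struct pipe; only its identity matters. */
struct pipe {
	int	pipe_id;
};

/* The two pointer fields of the patch's struct pipeinfo. */
struct pipeinfo {
	struct pipe	*pi_rpipe;	/* pipe we read from */
	struct pipe	*pi_wpipe;	/* pipe we write to */
};

/*
 * Hypothetical helper: wire up the pipeinfo entries for one pipepair.
 * Returns the number of pipeinfo entries that were initialized.
 */
static int
pipeinfo_wire(struct pipeinfo pip[2], struct pipe *rpipe,
    struct pipe *wpipe, int isfifo)
{
	if (isfifo) {
		/* A fifo reads from one pipe and writes to the other. */
		pip[0].pi_rpipe = rpipe;
		pip[0].pi_wpipe = wpipe;
		return (1);
	}
	/* Each end of a bidirectional pipe reads and writes the same pipe. */
	pip[0].pi_rpipe = rpipe;
	pip[0].pi_wpipe = rpipe;
	pip[1].pi_rpipe = wpipe;
	pip[1].pi_wpipe = wpipe;
	return (2);
}
```

[So the two descriptors of a pipe() each see a single pipe for both
directions, while a fifo's pipeinfo names two different pipes.]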
>
> There are some more complications and resource wastages related to having
> 2 pipe ends when only 1 is needed:
> - struct pipe is in struct pipepair twice.  This can't quite be fixed
>   by moving it to struct pipeinfo.  If both pipe structs were malloc()ed,
>   then the 2 pointers in struct pipeinfo would be enough for accessing
>   them, but I think the space wastage is too small to fix like this.
> - some code was symmetrical relative to rpipe and wpipe and it didn't
>   care in which order they were in.  Now rpipe can be equal to wpipe,
>   some of this code is no longer symmetrical because it does things
>   like free(rpipe), while the rest of it is still symmetrical because
>   it does things like initializing rpipe->foo to the same value as
>   wpipe->foo.  You had to make some changes in this area.  Example
>   of a change in this area:
>
> % @@@ -349,18 +473,76 @@ kern_pipe(struct thread *td, int fildes[
> % @     /* Only the forward direction pipe is backed by default */
> % @     if ((error = pipe_create(rpipe, 1)) != 0 ||
> % @         (error = pipe_create(wpipe, 0)) != 0) {
> % @-            pipeclose(rpipe);
> % @-            pipeclose(wpipe);
> % @+            pipeclose(rpipe, isfifo);
> % @+            pipeclose(wpipe, isfifo);
>
>  Not really symmetrical.  There must be 2 closes if there are 2 open
>  fd's (1 in each direction).  But what if there is only 1 open fd for
>  a fifo with rpipe = wpipe?  pipe_dtor() is more obviously correct,
>  since it only does 1 pipeclose() for fifos.  The new isfifo flag is
>  only used in pipeclose() to select the zone.

That's needed to free UMA allocated memory.
>
> % @             return (error);
> % @     }
> % @
> % @     rpipe->pipe_state |= PIPE_DIRECTOK;
> % @     wpipe->pipe_state |= PIPE_DIRECTOK;
> % @
> % @+    if (isfifo) {
> % @+            up->pip[0].pi_rpipe = rpipe;
> % @+            up->pip[0].pi_wpipe = wpipe;
> % @+    } else {
> % @+            up->pip[0].pi_rpipe = rpipe;
> % @+            up->pip[0].pi_wpipe = rpipe;
> % @+            up->pip[1].pi_rpipe = wpipe;
> % @+            up->pip[1].pi_wpipe = wpipe;
> % @+    }
>
>  This seems slightly broken.  Fifos should only use 1 pipeinfo struct,
>  but other fifo code initializes both up->pip[0] and up->pip[1].

Your interpretation seems wrong.

> Having only 1 pipe end initialized would destroy any remaining symmetry
> but would make it clear which 1 is actually used.
>
> % +
> % +extern struct fileops pipeops;
> % +
> % +int pipe_ctor(struct pipeinfo **ppip, struct thread *td);
> % +void pipe_dtor(struct pipeinfo *pip);
>
> Indentation errors.
>
> %
> %  #endif /* !_SYS_PIPE_H_ */
>
> Bruce

From owner-freebsd-arch@FreeBSD.ORG Tue Jan 10 23:33:55 2012
References: <20120110005155.S2378@besplex.bde.org> <20120110153807.H943@besplex.bde.org>
Date: Wed, 11 Jan 2012 00:33:54 +0100
From: Giovanni Trematerra
To: Bruce Evans, jilles@freebsd.org, Attilio Rao, flo@freebsd.org, Konstantin Belousov, freebsd-arch@freebsd.org, Peter Holm
Subject: Re: pipe/fifo code merged.

Hi all,
this new patch at
http://www.trematerra.net/patches/pipefifo_merge2.4.diff
tries to address the issues raised by jhb and some of those raised by bde.
I hope I fixed all the style bugs at least.

Thank you for your time.

--
Gianni

From owner-freebsd-arch@FreeBSD.ORG Wed Jan 11 00:14:47 2012
In-Reply-To: <20120110211510.T1676@besplex.bde.org>
References: <20120110005155.S2378@besplex.bde.org> <20120110153807.H943@besplex.bde.org> <20120110211510.T1676@besplex.bde.org>
Date: Wed, 11 Jan 2012 01:14:44 +0100
From: Giovanni Trematerra
To: Bruce Evans
Cc: jilles@freebsd.org, Attilio Rao, flo@freebsd.org, Konstantin Belousov, freebsd-arch@freebsd.org
Subject: Re: pipe/fifo code merged.

On Tue, Jan 10, 2012 at 1:04 PM, Bruce Evans wrote:
> On Tue, 10 Jan 2012, Bruce Evans wrote:
>
>> I think you don't want me to read the patch, since I would see too much
>> detail starting with style bugs.  Anyway..
>> ...
>
>
> One more set of details.
>
> % +             PIPE_UNLOCK(rpipe);
> % +             if (fifo_iseof(fp))
> % +                     events |= POLLINIGNEOF;
> % +             PIPE_LOCK(rpipe);
>
> This is new code (needed to force POLLINIGNEOF for fifos).
>
> It is a layering violation to call the fifo code for non-fifos here.
> fifo_iseof() handles this internally by checking fp->vnode->v_fifoinfo.
> The pipe layer should know if it is dealing with a fifo in a better way
> than that.

I fixed this in
http://www.trematerra.net/patches/pipefifo_merge2.4.diff

>
> I don't like unlocking in the middle in general, and here it gives
> races.  We will miss setting POLLIN | POLLRDNORM for certain changes
> if they weren't set earlier and the state changed while unlocked.  Why
> unlock anyway or lock in fifo_iseof()?  Only fi_seqcount == fi_wgen
> is checked under the lock there.  Races in that check are probably
> just as harmless as races here.  And locking doesn't even prevent them,
> since if fi_seqcount or fi_wgen can change underneath us, they can also
> change just after we check them.
> They rarely change compared with the
> buffer count raced with above.

Fixed that too, as you suggest.

> % @@ -1326,58 +1549,66 @@ pipe_poll(fp, events, active_cred, td)
> %       struct ucred *active_cred;
> %       struct thread *td;
> %  {
> % -     struct pipe *rpipe = fp->f_data;
> % +     struct pipeinfo *pip = fp->f_data;
> % +     struct pipe *rpipe;
> %       struct pipe *wpipe;
> %       int revents = 0;
> %  #ifdef MAC
> %       int error;
> %  #endif
> %
> % -     wpipe = rpipe->pipe_peer;
>
> % +     rpipe = pip->pi_rpipe;
> % +     wpipe = pip->pi_wpipe->pipe_peer;
> %       PIPE_LOCK(rpipe);
> %  #ifdef MAC
> %       error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair);
> %       if (error)
> % -             goto locked_error;
> % +             return (0);
>
> Seems to be broken.  The unlock is now missing.

Fixed.

>
> % -static int
> % -fifo_poll_f(struct file *fp, int events, struct ucred *cred, struct thread *td)
> % -{
> % -     struct fifoinfo *fip;
> % -     struct file filetmp;
> % -     int levents, revents = 0;
> % -
> % -     fip = fp->f_data;
> % -     levents = events &
> % -         (POLLIN | POLLINIGNEOF | POLLPRI | POLLRDNORM | POLLRDBAND);
> % -     if ((fp->f_flag & FREAD) && levents) {
> % -             filetmp.f_data = fip->fi_readsock;
> % -             filetmp.f_cred = cred;
> % -             mtx_lock(&fifo_mtx);
> % -             if (fp->f_seqcount == fip->fi_wgen)
> % -                     levents |= POLLINIGNEOF;
> % -             mtx_unlock(&fifo_mtx);
> % -             revents |= soo_poll(&filetmp, levents, cred, td);
> % -     }
> % -     levents = events & (POLLOUT | POLLWRNORM | POLLWRBAND);
> % -     if ((fp->f_flag & FWRITE) && levents) {
> % -             filetmp.f_data = fip->fi_writesock;
> % -             filetmp.f_cred = cred;
> % -             revents |= soo_poll(&filetmp, levents, cred, td);
> % -     }
> % -     return (revents);
> % -}
>
> This was reasonably clean.  My version is cleaner:
> - POLLINIGNEOF is an old mistake of mine.  I tried to kill it, but kib@
>   propagated it to sys_pipe.c too, where it has survived another release
>   or two.  In my version, I still have it in the call to soo_poll() but
>   don't have it in the `levents = events & ...' mask.  Thus it is a
>   pure kernel flag, and acts the same as your isfifo flag -- it tells
>   the socket layer to do something unusual because this is a fifo.  It
>   is not needed any more, since the pipe layer is close to the fifo
>   layer so it can just do something unusual.  It can determine whether
>   the pipe is a fifo without passing around flags (the flag should be
>   in pipe_state).
> - My version is missing the FREAD and FWRITE checks.  These seem to be
>   necessary, but I think they don't belong at this level.  Also, the
>   error handling for them seems quite broken (nonexistent).  I think
>   POLLERR is supposed to be returned for attempts to poll for an
>   impossible condition, but the FREAD and FWRITE checks give a return
>   of 0.  And returning 0 is much worse than returning success, since
>   it will cause at least poll() to block when it should return.
>   Here is the commit that added these checks:
>
> % ----------------------------
> % revision 1.118
> % date: 2005/09/12 10:16:18;  author: rwatson;  state: Exp;  lines: +2 -2
> % Only poll the fifo for read events if the fifo is attached to a readable
> % file descriptor.  Otherwise, the read end of a fifo might return that it
> % is writable (which it isn't).
>
> But it should return (with POLLERR).  This is an error condition and
> should be detected.
>
> POSIX is fuzzy about this.  It only says that POLLERR is for when an error
> occurred.  It defines the POLLNVAL error clearly as meaning that the fd
> is invalid.  Well that is not so clear.  A non-open fd is clearly invalid.
> This is handled in upper layers.  Polling for a direction that can't work
> can be considered as an invalid fd too, unless "invalid" has its technical
> meaning.  Linux-2.6.10 sets POLLERR for reading from a pipe or fifo with
> no readers, and has an XXX comment saying that most Unices don't do this
> for fifos.  This seems wrong to me, and FreeBSD doesn't do it for any of
> pipes, fifos or sockets.  But for pipes, there is tricky EOF handling
> associated with this condition.  I can't see anywhere where Linux gives
> this based on the open mode.
>
> %
> % Only poll the fifo for write events if the fifo attached to a writable
> % file descriptor.  Otherwise, the write end of a fifo might return that
> % it is readable (which it isn't).
>
> Seems to be necessary too.  I can't see anywhere where Linux returns
> POLLERR for i/o errors or unwritable files.
>
> %
> % In the event that a file is FREAD|FWRITE (which is allowed by POSIX, but
> % has undefined behavior), we poll for both.
> %
> % MFC after:  3 days
> % ----------------------------
>
> select() is interestingly different than poll().  It can't return POLLERR.
> Thus, the old broken behaviour gave the best close to possible behaviour
> for select() at the usual level.  The POLLERR's should make it return
> success, and the false successes in the kernel would have done the same.
> Only cases where there were no false successes in the kernel were broken.
>
>
> I don't like defaults set by initializations in declarations 'revents
> = 0'.  Both the default and the return of 0 here seem to be wrong.
> This is an error condition, so I think POLLERR should be returned, as
> above.  Otherwise, poll() will probably block.
> And the block is not
> just transient, at least in the above since the error condition can
> never go away.  You will only be saved from blocking forever if there
> is success on some other file descriptor or event.
>
> %  #endif
> % -     if (events & (POLLIN | POLLRDNORM))
> % -             if ((rpipe->pipe_state & PIPE_DIRECTW) ||
> % -                 (rpipe->pipe_buffer.cnt > 0))
> % -                     revents |= events & (POLLIN | POLLRDNORM);
> % +     if (fp->f_flag & FREAD) {
> % +             if (events & (POLLIN | POLLRDNORM))
> % +                     if ((rpipe->pipe_state & PIPE_DIRECTW) ||
> % +                         (rpipe->pipe_buffer.cnt > 0))
> % +                             revents |= events & (POLLIN | POLLRDNORM);
>
> The change in fifo_vnops.c was done cleanly by adding the FREAD check
> to the events mask check.  With fifos now polled here, it is needed
> (modulo bugs) here too.  But here it makes the important changes for
> fifos, if any, unreadable by indenting everything.
>
> % -     if (events & (POLLOUT | POLLWRNORM))
> % -             if (wpipe->pipe_present != PIPE_ACTIVE ||
> % -                 (wpipe->pipe_state & PIPE_EOF) ||
> % -                 (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
> % -                  ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >= PIPE_BUF ||
> % -                      wpipe->pipe_buffer.size == 0)))
> % -                     revents |= events & (POLLOUT | POLLWRNORM);
> % +             PIPE_UNLOCK(rpipe);
> % +             if (fifo_iseof(fp))
> % +                     events |= POLLINIGNEOF;
> % +             PIPE_LOCK(rpipe);
> %
> % -     if ((events & POLLINIGNEOF) == 0) {
> % -             if (rpipe->pipe_state & PIPE_EOF) {
> % -                     revents |= (events & (POLLIN | POLLRDNORM));
> % -                     if (wpipe->pipe_present != PIPE_ACTIVE ||
> % -                         (wpipe->pipe_state & PIPE_EOF))
> % -                             revents |= POLLHUP;
> % +             if ((events & POLLINIGNEOF) == 0) {
> % +                     if (rpipe->pipe_state & PIPE_EOF) {
> % +                             revents |= (events & (POLLIN | POLLRDNORM));
> % +                             if (wpipe->pipe_present != PIPE_ACTIVE ||
> % +                                 (wpipe->pipe_state & PIPE_EOF))
> % +                                     revents |= POLLHUP;
> % +                     }
> %               }
> %       }
> % +     if (fp->f_flag &
> % +         FWRITE)
> % +             if (events & (POLLOUT | POLLWRNORM))
> % +                     if (wpipe->pipe_present != PIPE_ACTIVE ||
> % +                         (wpipe->pipe_state & PIPE_EOF) ||
> % +                         (((wpipe->pipe_state & PIPE_DIRECTW) == 0) &&
> % +                         ((wpipe->pipe_buffer.size - wpipe->pipe_buffer.cnt) >=
> % +                             PIPE_BUF || wpipe->pipe_buffer.size == 0)))
> % +                             revents |= events & (POLLOUT | POLLWRNORM);
> %
> %       if (revents == 0) {
> % -             if (events & (POLLIN | POLLRDNORM)) {
> % -                     selrecord(td, &rpipe->pipe_sel);
> % -                     if (SEL_WAITING(&rpipe->pipe_sel))
> % -                             rpipe->pipe_state |= PIPE_SEL;
> % -             }
> % +             if (fp->f_flag & FREAD)
> % +                     if (events & (POLLIN | POLLRDNORM)) {
> % +                             selrecord(td, &rpipe->pipe_sel);
> % +                             if (SEL_WAITING(&rpipe->pipe_sel))
> % +                                     rpipe->pipe_state |= PIPE_SEL;
> % +                     }
> %
> % -             if (events & (POLLOUT | POLLWRNORM)) {
> % -                     selrecord(td, &wpipe->pipe_sel);
> % -                     if (SEL_WAITING(&wpipe->pipe_sel))
> % -                             wpipe->pipe_state |= PIPE_SEL;
> % -             }
> % +             if (fp->f_flag & FWRITE)
> % +                     if (events & (POLLOUT | POLLWRNORM)) {
> % +                             selrecord(td, &wpipe->pipe_sel);
> % +                             if (SEL_WAITING(&wpipe->pipe_sel))
> % +                                     wpipe->pipe_state |= PIPE_SEL;
> % +                     }
> %       }
> % -#ifdef MAC
> % -locked_error:
> % -#endif
> %       PIPE_UNLOCK(rpipe);
> %
> %       return (revents);
>
> It seems that not much really changed here.  To avoid indentation and
> fix bugs, the FREAD and FWRITE checks should be done up front.  I think
> they can be done before locking and mac checking.  Something like:
>
>        if ((fp->f_flag & FREAD) == 0 && (events & (POLLIN | POLLRDNORM)))
>                return (POLLERR);
>        if ((fp->f_flag & FWRITE) == 0 && (events & (POLLOUT | POLLWRNORM)))
>                return (POLLERR);
>        if (events & POLLINIGNEOF)
>                return (POLLERR);       /* try to kill this too */
>
> Since the diff for pipe_poll() was unreadable, here it is again with
> the old lines removed.
> A few more problems are now obvious:
>
> % @@ -1326,58 +1549,66 @@ pipe_poll(fp, events, active_cred, td)
> %       struct ucred *active_cred;
> %       struct thread *td;
> %  {
> % +     struct pipeinfo *pip = fp->f_data;
> % +     struct pipe *rpipe;
> %       struct pipe *wpipe;
> %       int revents = 0;
> %  #ifdef MAC
> %       int error;
> %  #endif
> %
> % +     rpipe = pip->pi_rpipe;
> % +     wpipe = pip->pi_wpipe->pipe_peer;
> %       PIPE_LOCK(rpipe);
> %  #ifdef MAC
> %       error = mac_pipe_check_poll(active_cred, rpipe->pipe_pair);
> %       if (error)
> % +             return (0);
> %  #endif
> % +     if (fp->f_flag & FREAD) {
> % +             if (events & (POLLIN | POLLRDNORM))
> % +                     if ((rpipe->pipe_state & PIPE_DIRECTW) ||
> % +                         (rpipe->pipe_buffer.cnt > 0))
> % +                             revents |= events & (POLLIN | POLLRDNORM);
> %
>
> This style bug (extra blank line) was common in old code.  It helps make
> the diffs unreadable too.
>
>
> %
> % +             if ((events & POLLINIGNEOF) == 0) {
> % +                     if (rpipe->pipe_state & PIPE_EOF) {
> % +                             revents |= (events & (POLLIN | POLLRDNORM));
> % +                             if (wpipe->pipe_present != PIPE_ACTIVE ||
> % +                                 (wpipe->pipe_state & PIPE_EOF))
> % +                                     revents |= POLLHUP;
> % +                     }
>
> This is old code, reindented.  It was not needed, since it used to
> just check for the POLLINIGNEOF mistake in the user events.
> Now it
> is needed to give the modified (POLLINIGNEOF) semantics from the kernel
> flag for fifos.  It is much uglier than the corresponding code in the
> old fifo_poll_f().  That begins with putting the relevant user events
> in levents.  So that it doesn't have to repeat the long mask expressions.
> Well, that's about the limits of the cleanups.  Something like the
> above is still needed to give the semantics change.
>
> The socket layer still has code that corresponds exactly to the above.
> It is now not needed, since it now only supports the POLLINIGNEOF
> mistake in the user events.  One copy of this code is bad enough.
>

It seems to me that your concerns aren't related to the patch.
I'll try to address them once the patch is in the tree.

--
Gianni

From owner-freebsd-arch@FreeBSD.ORG Wed Jan 11 01:18:49 2012
In-Reply-To: <23477898-8D85-498C-8E30-192810BD68A8@lassitu.de>
References: <8D025847-4BE4-4B2C-87D7-97E72CC9D325@lassitu.de> <20120104215930.GM90831@alchemy.franken.de> <47ABA638-7E08-4350-A03C-3D4A23BF2D7E@lassitu.de> <1763C3FF-1EA0-4DC0-891D-63816EBF4A04@lassitu.de> <20120106182756.GA88161@alchemy.franken.de> <95372FB3-406F-46C2-8684-4FDB672D9FCF@lassitu.de> <20120106214741.GB88161@alchemy.franken.de> <20120108130039.GG88161@alchemy.franken.de> <23477898-8D85-498C-8E30-192810BD68A8@lassitu.de>
Date: Tue, 10 Jan 2012 17:18:48 -0800
From: Adrian Chadd
To: Stefan Bethke
Cc: freebsd-arch@freebsd.org, Marius Strobl
Subject: Re: Extending sys/dev/mii

On 8 January 2012 14:27, Stefan Bethke wrote:
>> Okay, this is the kind of information I was looking for as coupling
>> devices with newbus that have no close relation in the hierarchy is
>> tedious. However, when not using newbus the question arises how do
>> you intend to associate the device_t of say arge0 with the mdiobus0
>> hanging off somewhere beneath iicbus0?
>
> In my experimental tree, I've hacked together a small function that parses a string for a devclass name and unit number, and looks that up.
>
> I'm also trying a number of other approaches; mainly I'm trying to understand how newbus works, and what kind of driver I want at the various points, ideally auto-attached, or configured by hints, instead of by custom code.  I think I'll need another couple of days to get a good enough understanding of drivers, devclasses and their tree, and the device tree.

Hi guys,

Has there been any further traction on this? I'd like to try and
figure out a way to get all this switchphy stuff into -HEAD as soon as
possible.

Thanks,

Adrian

From owner-freebsd-arch@FreeBSD.ORG Wed Jan 11 19:37:40 2012
Date: Wed, 11 Jan 2012 20:37:38 +0100
From: Marius Strobl
To: Stefan Bethke
Message-ID: <20120111193738.GB44286@alchemy.franken.de>
References: <8D025847-4BE4-4B2C-87D7-97E72CC9D325@lassitu.de> <20120104215930.GM90831@alchemy.franken.de> <47ABA638-7E08-4350-A03C-3D4A23BF2D7E@lassitu.de> <1763C3FF-1EA0-4DC0-891D-63816EBF4A04@lassitu.de> <20120106182756.GA88161@alchemy.franken.de> <95372FB3-406F-46C2-8684-4FDB672D9FCF@lassitu.de> <20120106214741.GB88161@alchemy.franken.de> <20120108130039.GG88161@alchemy.franken.de> <23477898-8D85-498C-8E30-192810BD68A8@lassitu.de>
In-Reply-To: <23477898-8D85-498C-8E30-192810BD68A8@lassitu.de>
Cc: freebsd-arch@freebsd.org
Subject: Re: Extending sys/dev/mii

On Sun, Jan 08, 2012 at 11:27:25PM +0100, Stefan Bethke wrote:
> On 08.01.2012 at 14:00, Marius Strobl wrote:
>
> > Okay, this is the kind of information I was looking for as coupling
> > devices with newbus that have no close relation in the hierarchy is
> > tedious. However, when not using newbus the question arises how do
> > you intend to associate the device_t of say arge0 with the mdiobus0
> > hanging off somewhere beneath iicbus0?
>
> In my experimental tree, I've hacked together a small function that parses a string for a devclass name and unit number, and looks that up.
>
> I'm also trying a number of other approaches; mainly I'm trying to understand how newbus works, and what kind of driver I want at the various points, ideally auto-attached, or configured by hints, instead of by custom code.  I think I'll need another couple of days to get a good enough understanding of drivers, devclasses and their tree, and the device tree.
>

Okay, I suggest postponing this discussion until then. For the scenario
where mdiobus is the parent of miibus I see no technical need to change
miibus to support what you want to do; just implement the miibus_if in
mdiobus and redirect it to the device_t of the MAC there. Moreover, that
way the hack to sidestep newbus is contained in the layer that actually
needs it and not scattered over multiple frameworks.
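
The quoted "small function that parses a string for a devclass name and unit number" is not shown anywhere in the thread. The following is only a guess at what the parsing half might look like, written as plain user-space C; split_devname() and its return convention are hypothetical and are not taken from Stefan's tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split a device name such as "arge0" into a
 * devclass name ("arge") and a unit number (0).
 * Returns 0 on success and -1 if the string is malformed or the
 * name does not fit into the caller's buffer.
 */
static int
split_devname(const char *spec, char *name, size_t namelen, int *unit)
{
    size_t len = strlen(spec);
    size_t ndigits = 0;

    /* The trailing run of decimal digits is the unit number. */
    while (ndigits < len && isdigit((unsigned char)spec[len - 1 - ndigits]))
        ndigits++;

    /* Need a non-empty name, at least one digit, and a sane unit. */
    if (ndigits == 0 || ndigits == len || ndigits > 9)
        return (-1);
    if (len - ndigits >= namelen)
        return (-1);

    memcpy(name, spec, len - ndigits);
    name[len - ndigits] = '\0';
    *unit = (int)strtol(spec + (len - ndigits), NULL, 10);
    return (0);
}
```

In the kernel, the resulting pair could presumably be handed to devclass_find() and devclass_get_device() to obtain the device_t, which would be the "looks that up" step; whether the experimental tree does it that way is an assumption.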

Marius

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 12 01:33:43 2012
Date: Wed, 11 Jan 2012 17:13:02 -0800
From: John-Mark Gurney
To: Don Lewis
Message-ID: <20120112011301.GI52468@funkthat.com>
References: <4F0AB63D.2040503@gmx.com> <201201091009.q09A9NQb025487@gw.catspoiler.org>
In-Reply-To: <201201091009.q09A9NQb025487@gw.catspoiler.org>
Cc: arch@freebsd.org, adrian@freebsd.org, nvass@gmx.com, alfred@freebsd.org, dougb@freebsd.org
Subject: Re: [patch] allow crash dumps to Linux swap partitions

Don Lewis wrote this message on Mon, Jan 09, 2012 at 02:09 -0800:
> On 9 Jan, Nikos Vassiliadis wrote:
> > On 1/9/2012 11:25 AM, Doug Barton wrote:
> >> Actually I'm fairly confident that we write dumps backwards from the end
> >> of the swap partition. It's done that way on purpose in case fsck'ing
> >> causes the system to swap, it may still be possible to save the dump.
> >
> > So, dumping core is safe, but not sharing the swap area...
> > It would be nice to be able to do that.
>
> According to the mkswap(8) man page (which hasn't been updated
> since 2.2 even though the machine is running a 2.6 kernel) on a nearby
> Linux machine, the metadata is stored in the first page of the swap
> partition. It looks like we could safely coexist if we skipped the first
> page of the partition. Otherwise Linux will want mkswap to be run on the
> partition before it will swap to the partition.

Don't we already skip the first 8k of the swap partition? Back in the
day, bsdlabel's partition sector 0 was the same as the slice sector 0,
so if any FS or swap wrote to the first 8k, it would overwrite the
bsdlabel.

--
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
From owner-freebsd-arch@FreeBSD.ORG Thu Jan 12 01:36:31 2012 Return-Path: Delivered-To: arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8BC30106566B; Thu, 12 Jan 2012 01:36:31 +0000 (UTC) (envelope-from truckman@FreeBSD.org) Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242]) by mx1.freebsd.org (Postfix) with ESMTP id 6DFE48FC16; Thu, 12 Jan 2012 01:36:31 +0000 (UTC) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id q0C1aI7i047108; Wed, 11 Jan 2012 17:36:22 -0800 (PST) (envelope-from truckman@FreeBSD.org) Message-Id: <201201120136.q0C1aI7i047108@gw.catspoiler.org> Date: Wed, 11 Jan 2012 17:36:18 -0800 (PST) From: Don Lewis To: jmg@funkthat.com In-Reply-To: <20120112011301.GI52468@funkthat.com> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Cc: alfred@FreeBSD.org, arch@FreeBSD.org, adrian@FreeBSD.org, nvass@gmx.com, dougb@FreeBSD.org Subject: Re: [patch] allow crash dumps to Linux swap partitions X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jan 2012 01:36:31 -0000 On 11 Jan, John-Mark Gurney wrote: > Don Lewis wrote this message on Mon, Jan 09, 2012 at 02:09 -0800: >> On 9 Jan, Nikos Vassiliadis wrote: >> > On 1/9/2012 11:25 AM, Doug Barton wrote: >> >> Actually I'm fairly confident that we write dumps backwards from the end >> >> of the swap partition. It's done that way on purpose in case fsck'ing >> >> causes the system to swap, it may still be possible to save the dump. >> > >> > So, dumping core is safe, but not sharing the swap area... >> > It would be nice to be able to do that. 
>> >> According to the mkswap(8) man page (which hasn't been updated >> since 2.2 even though the machine is running a 2.6 kernel) on a nearby >> Linux machine, the metadata stored in the first page of the swap >> partition. It looks like we could safely coexist if we skipped the first >> page of the partition. Otherwise Linux will want mkswap to be run on the >> partition before it will swap to the partition. > > Don't we already skip the first 8k of the swap partition because back > in the day when bsdlabel's partition sector 0 was the same as the > slice sector 0, and so if any FS or swap wrote to the first 8k, it > would overwrite the bsdlabel? Yes, I mentioned this in a later message on this thread. From owner-freebsd-arch@FreeBSD.ORG Thu Jan 12 08:55:46 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 733871065670; Thu, 12 Jan 2012 08:55:46 +0000 (UTC) (envelope-from giovanni.trematerra@gmail.com) Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id C3FFE8FC0C; Thu, 12 Jan 2012 08:55:45 +0000 (UTC) Received: by qcse13 with SMTP id e13so1277302qcs.13 for ; Thu, 12 Jan 2012 00:55:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=n+OQo2VNsIkQy/CcF9k3a/VosCP5iFbu6Uhh5fA+6NE=; b=FsC4phZxbjPthv3RkeSKW3hCDqvTmuU8RBrzxS81QNOKbomTC3x9YnQfRCYAS+J9p1 W0ARv+lO2izWdeCpTTGQfdy4V/r4APmRP6fbYT7XXmpYZEVObxiLG32jT6qEKKtEOrbu OyptcPza5M1JIl/afo6C8C9YwTlsE+croLqpI= MIME-Version: 1.0 Received: by 10.224.186.130 with SMTP id cs2mr4189855qab.82.1326358544906; Thu, 12 Jan 2012 00:55:44 -0800 (PST) Sender: giovanni.trematerra@gmail.com Received: by 10.229.237.130 with HTTP; Thu, 12 Jan 2012 00:55:44 -0800 
(PST) In-Reply-To: References: <20120110005155.S2378@besplex.bde.org> <20120110153807.H943@besplex.bde.org> Date: Thu, 12 Jan 2012 09:55:44 +0100 X-Google-Sender-Auth: xmwzNoA13NMrqV68bBGV1Pe7vLw Message-ID: From: Giovanni Trematerra To: Bruce Evans Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: jilles@freebsd.org, Attilio Rao , flo@freebsd.org, Konstantin Belousov , freebsd-arch@freebsd.org Subject: Re: pipe/fifo code merged. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jan 2012 08:55:46 -0000

On Wed, Jan 11, 2012 at 12:04 AM, Giovanni Trematerra wrote:
> On Tue, Jan 10, 2012 at 10:41 AM, Bruce Evans wrote:
>> On Mon, 9 Jan 2012, Giovanni Trematerra wrote:
>>
>>> On Mon, Jan 9, 2012 at 3:34 PM, Bruce Evans wrote:
>>>>
>>>> [skip]
>
>>
>> In this file, I have most experience fixing this function (and open
>> and close so that select and poll work).  The above looks simple, but
>> has a complex interaction with layers above and below it.  Most of the
>> details are in the socket layer.  You had to reimplement these in the
>> pipe layer.  The most delicate point involving fs_wgen seems to be
>> reimplemented correctly in fifo_iseof().  Before I fixed this for
>> fifos, poll and select on pipes (especially for EOF) was less broken
>> than for fifos, partly because pipes are simpler -- they can't be
>> reopened.  My tests in /usr/src/tools/regression/poll/ are hopefully
>> enough to detect any regressions.  Some of the tests are intentionally
>> left broken and/or expected to fail, to be bug for bug compatible with
>> old kernel bugs.
>>
>
> ok. I'll try that regression test
>

Hi Bruce,
thanks again for pointing out those regression tests, which I missed in the first place.
I ran those tests and the results were identical on the patched and unpatched
kernels. So at least in that regard the patch doesn't introduce new regressions.
Some tests fail. If you and others think it's worth fixing them, I can
take this on.

--
Gianni

10.0-CURRENT PATCHED

[gianni@devbox: poll]% ./pipepoll
1..20
ok 1 Pipe state 4: expected 0; got 0
ok 2 Pipe state 5: expected POLLIN; got POLLIN
ok 3 Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
not ok 4 Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
ok 5 Sock state 4: expected 0; got 0
ok 6 Sock state 5: expected POLLIN; got POLLIN
ok 7 Sock state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
not ok 8 Sock state 6a: expected POLLHUP; got POLLIN | POLLHUP
ok 9 FIFO state 0: expected 0; got 0
ok 10 FIFO state 1: expected 0; got 0
ok 11 FIFO state 2: expected POLLIN; got POLLIN
ok 12 FIFO state 2a: expected 0; got 0
not ok 13 FIFO state 3: expected POLLHUP; got POLLIN | POLLHUP
ok 14 FIFO state 4: expected 0; got 0
ok 15 FIFO state 5: expected POLLIN; got POLLIN
ok 16 FIFO state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
not ok 17 FIFO state 6a: expected POLLHUP; got POLLIN | POLLHUP
not ok 18 FIFO state 6b: poll result 0 expected 1. expected POLLHUP; got 0
not ok 19 FIFO state 6c: expected POLLHUP; got POLLIN | POLLHUP
not ok 20 FIFO state 6d: expected POLLHUP; got POLLIN | POLLHUP

[gianni@devbox: poll]% ./pipeselect
1..20
ok 1 Pipe state 4: expected clear; got clear
ok 2 Pipe state 5: expected set; got set
ok 3 Pipe state 6: expected set; got set
ok 4 Pipe state 6a: expected set; got set
ok 5 Sock state 4: expected clear; got clear
ok 6 Sock state 5: expected set; got set
ok 7 Sock state 6: expected set; got set
ok 8 Sock state 6a: expected set; got set
not ok 9 FIFO state 0: expected set; got clear
ok 10 FIFO state 1: expected clear; got clear
ok 11 FIFO state 2: expected set; got set
ok 12 FIFO state 2a: expected clear; got clear
ok 13 FIFO state 3: expected set; got set
ok 14 FIFO state 4: expected clear; got clear
ok 15 FIFO state 5: expected set; got set
ok 16 FIFO state 6: expected set; got set
ok 17 FIFO state 6a: expected set; got set
not ok 18 FIFO state 6b: expected set; got clear
ok 19 FIFO state 6c: expected set; got set
ok 20 FIFO state 6d: expected set; got set

STOCK KERNEL 10.0-CURRENT

[gianni@devbox: poll]% ./pipepoll
1..20
ok 1 Pipe state 4: expected 0; got 0
ok 2 Pipe state 5: expected POLLIN; got POLLIN
ok 3 Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
not ok 4 Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
ok 5 Sock state 4: expected 0; got 0
ok 6 Sock state 5: expected POLLIN; got POLLIN
ok 7 Sock state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
not ok 8 Sock state 6a: expected POLLHUP; got POLLIN | POLLHUP
ok 9 FIFO state 0: expected 0; got 0
ok 10 FIFO state 1: expected 0; got 0
ok 11 FIFO state 2: expected POLLIN; got POLLIN
ok 12 FIFO state 2a: expected 0; got 0
not ok 13 FIFO state 3: expected POLLHUP; got POLLIN | POLLHUP
ok 14 FIFO state 4: expected 0; got 0
ok 15 FIFO state 5: expected POLLIN; got POLLIN
ok 16 FIFO state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
not ok 17 FIFO state 6a: expected POLLHUP; got POLLIN | POLLHUP
not ok 18 FIFO state 6b: poll result 0 expected 1. expected POLLHUP; got 0
not ok 19 FIFO state 6c: expected POLLHUP; got POLLIN | POLLHUP
not ok 20 FIFO state 6d: expected POLLHUP; got POLLIN | POLLHUP

[gianni@devbox: poll]% ./pipeselect
1..20
ok 1 Pipe state 4: expected clear; got clear
ok 2 Pipe state 5: expected set; got set
ok 3 Pipe state 6: expected set; got set
ok 4 Pipe state 6a: expected set; got set
ok 5 Sock state 4: expected clear; got clear
ok 6 Sock state 5: expected set; got set
ok 7 Sock state 6: expected set; got set
ok 8 Sock state 6a: expected set; got set
not ok 9 FIFO state 0: expected set; got clear
ok 10 FIFO state 1: expected clear; got clear
ok 11 FIFO state 2: expected set; got set
ok 12 FIFO state 2a: expected clear; got clear
ok 13 FIFO state 3: expected set; got set
ok 14 FIFO state 4: expected clear; got clear
ok 15 FIFO state 5: expected set; got set
ok 16 FIFO state 6: expected set; got set
ok 17 FIFO state 6a: expected set; got set
not ok 18 FIFO state 6b: expected set; got clear
ok 19 FIFO state 6c: expected set; got set
ok 20 FIFO state 6d: expected set; got set

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 12 12:52:04 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9F9F7106566B; Thu, 12 Jan 2012 12:52:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail02.syd.optusnet.com.au (mail02.syd.optusnet.com.au [211.29.132.183]) by mx1.freebsd.org (Postfix) with ESMTP id 183328FC0C; Thu, 12 Jan 2012 12:52:03 +0000 (UTC) Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail02.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q0CCq0Ju011672 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 12 Jan 2012 23:52:01 +1100 Date: Thu, 12 Jan 2012 23:52:00 +1100 (EST) From: Bruce Evans X-X-Sender:
bde@besplex.bde.org To: Giovanni Trematerra In-Reply-To: Message-ID: <20120112221422.V1340@besplex.bde.org> References: <20120110005155.S2378@besplex.bde.org> <20120110153807.H943@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: flo@freebsd.org, Attilio Rao , Konstantin Belousov , freebsd-arch@freebsd.org, jilles@freebsd.org Subject: Re: pipe/fifo code merged. X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jan 2012 12:52:04 -0000 On Thu, 12 Jan 2012, Giovanni Trematerra wrote: > thanks again to point me out those regression tests I missed in first place. > I ran those tests and results were identical with patched and non > patched kernel. > So at least in that regard the patch doesn't introduce more regressions. > There are some tests that fail. If you and others think it's worth to > fix them I can > take this on. Matching the old behaviour is good enough for now. > ... 
> STOCK KERNEL 10.0-CURRENT
>
> [gianni@devbox: poll]% ./pipepoll
> 1..20
> ok 1 Pipe state 4: expected 0; got 0
> ok 2 Pipe state 5: expected POLLIN; got POLLIN
> ok 3 Pipe state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
> not ok 4 Pipe state 6a: expected POLLHUP; got POLLIN | POLLHUP
> ok 5 Sock state 4: expected 0; got 0
> ok 6 Sock state 5: expected POLLIN; got POLLIN
> ok 7 Sock state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
> not ok 8 Sock state 6a: expected POLLHUP; got POLLIN | POLLHUP
> ok 9 FIFO state 0: expected 0; got 0
> ok 10 FIFO state 1: expected 0; got 0
> ok 11 FIFO state 2: expected POLLIN; got POLLIN
> ok 12 FIFO state 2a: expected 0; got 0
> not ok 13 FIFO state 3: expected POLLHUP; got POLLIN | POLLHUP
> ok 14 FIFO state 4: expected 0; got 0
> ok 15 FIFO state 5: expected POLLIN; got POLLIN
> ok 16 FIFO state 6: expected POLLIN | POLLHUP; got POLLIN | POLLHUP
> not ok 17 FIFO state 6a: expected POLLHUP; got POLLIN | POLLHUP

Returning spurious POLLIN with POLLHUP (when there is no input
available) is not too serious.  It means that applications can't trust
POLLIN alone.  They must also check POLLHUP (which they should check
anyway) and then if both are set they are reduced to read()ing the file
to see if there is any input, much like they have to do if they use
select() instead of poll() (since select() cannot distinguish these
conditions).  Applications that have been converted from poll() to
select() should still do the right thing by still using read().
However, gdb was broken by this conversion:

    echo 'p 0' | gdb /bin/cat

This prints "Hangup detected on fd 0\n", then "error detected on
stdin", then exits with status 0.  It never sees its input.  It should
do the same thing as for normal input of 'p 0\n' -- that is see the
input and execute it, then see the EOF and not complain about it
(twice), then exit with status 0.  This is because it thinks that
POLLHUP implies that there is no more input.  It doesn't even check
POLLIN when it sees POLLHUP.  If it checked, then it would have to
worry about spurious POLLIN, but here the problem is the opposite --
there is non-spurious POLLIN, and non-spurious POLLHUP.  POLLHUP must
be acted on immediately when the kernel detected it, to unblock poll()
or select(), even when there is unread input, since otherwise any
unread input would leave the poll blocked forever.

> not ok 18 FIFO state 6b: poll result 0 expected 1. expected POLLHUP; got 0

This one is more interesting and delicate.  Linux fails in the same way
as FreeBSD here.  6b through 6d test that if there is a hangup
condition, then new and old readers all see it....

> not ok 19 FIFO state 6c: expected POLLHUP; got POLLIN | POLLHUP
> not ok 20 FIFO state 6d: expected POLLHUP; got POLLIN | POLLHUP

Back to spurious POLLIN.

... 6b tests a new reader and fails because the new reader doesn't see
the hangup condition.  6c tests that the old reader still sees the
hangup condition after the open for the new reader has (possibly and
erroneously) cleared it.  This passes.  6d tests that the old reader
still sees the hangup condition after the new reader has gone away.
This passes too.

Note that it is only possible to get a new reader by opening with
O_NONBLOCK.  Otherwise, the open for the new reader blocks so the
difference between never having had a "connection" and having a hangup
condition for a previous connection cannot be seen.  Some users want
the new open to not see the hangup so that poll on it blocks waiting
for a new connection.  I want it to see the hangup condition so that
the condition only depends on the "file" state and not on the timing of
the opens.  Note that the hangup condition is cleared when the last
reader that can see it goes away.  Thus there is only a difference in
unusual cases.  There are 2 main types of unusual cases:
- when there are races between herds of readers and writers.  Now it
  seems best to not have the hangup condition depend so much on the
  timing.
- when some readers intentionally don't go away on hangup, but wait for
  a new connection.  Such waiting is difficult either way.  Sticky
  hangup prevents blocking in poll on the old fd's.  Now it seems best
  for new readers to not see the old hangups.  My preferred behaviour
  prevents this when the old readers don't go away.  They might stay
  either because you want them to, or because you can't control them
  and their owner wants them to.

But very complicated setups that intentionally go near the unusual
cases probably need external synchronization to avoid races and access
control to prevent uncontrolled accesses changing the fifo state and
eating the i/o.  It is not just hangup that involves device state.

> [gianni@devbox: poll]% ./pipeselect
> 1..20
> ok 1 Pipe state 4: expected clear; got clear
> ok 2 Pipe state 5: expected set; got set
> ok 3 Pipe state 6: expected set; got set
> ok 4 Pipe state 6a: expected set; got set
> ok 5 Sock state 4: expected clear; got clear
> ok 6 Sock state 5: expected set; got set
> ok 7 Sock state 6: expected set; got set
> ok 8 Sock state 6a: expected set; got set
> not ok 9 FIFO state 0: expected set; got clear

Everything except #9 for select() gives the expected results in my
version.  State 0 is just the state after initial open in O_RDONLY
mode.  See the large comment about this in pipeselect.c (it says that
select() must see POLLIN although poll() must not).  Perhaps the
comment is wrong or out of date.
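
As a hedged illustration of the POLLIN/POLLHUP point made earlier in
this message (a sketch only, not code from pipepoll.c or from gdb): a
reader can use poll() purely to wait and let read() decide whether
input remains, so that neither a spurious POLLIN nor a POLLHUP with
unread data misleads it.  The helper name read_until_hangup() is
hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (assumption, not part of the regression tests):
 * collect input from fd until EOF or until buf is full.  poll() is used
 * only to wait; POLLIN and POLLHUP are not trusted on their own, and
 * read() returning 0 is the only thing treated as end of input.
 * Returns the number of bytes stored in buf, or -1 on error.
 */
ssize_t
read_until_hangup(int fd, char *buf, size_t cap)
{
    struct pollfd pfd;
    size_t total = 0;
    ssize_t n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    while (total < cap) {
        pfd.revents = 0;
        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR)
                continue;
            return (-1);
        }
        if (pfd.revents & POLLNVAL)
            return (-1);
        /* POLLIN, POLLHUP or both: ask read() what is really there. */
        n = read(fd, buf + total, cap - total);
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return (-1);
        }
        if (n == 0)
            break;      /* hangup with no data left: real EOF */
        total += (size_t)n;
    }
    return ((ssize_t)total);
}
```

With a pipe whose writer has written 'p 0\n' and then closed, a loop of
this shape would be expected to return the 4 bytes first and only then
see EOF, which is the behaviour the gdb example above lacks.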
> ok 10 FIFO state 1: expected clear; got clear
> ok 11 FIFO state 2: expected set; got set
> ok 12 FIFO state 2a: expected clear; got clear
> ok 13 FIFO state 3: expected set; got set
> ok 14 FIFO state 4: expected clear; got clear
> ok 15 FIFO state 5: expected set; got set
> ok 16 FIFO state 6: expected set; got set
> ok 17 FIFO state 6a: expected set; got set
> not ok 18 FIFO state 6b: expected set; got clear
> ok 19 FIFO state 6c: expected set; got set
> ok 20 FIFO state 6d: expected set; got set

select() passes all except #9 and #18 because it can't distinguish the
spurious POLLIN from POLLHUP.

I added some tests, mainly for POLLOUT.  Unfortunately, this patch
won't apply cleanly, because -current has some changes that I haven't
merged.  I forget if the kernel needs any changes to pass these.  Not
many anyway.  Output tests and the changes in -current are missing for
pipeselect.c.

% Index: pipepoll.c
% ===================================================================
% RCS file: /home/ncvs/src/tools/regression/poll/pipepoll.c,v
% retrieving revision 1.1
% diff -u -2 -r1.1 pipepoll.c
% --- pipepoll.c 12 Jul 2009 12:50:43 -0000 1.1
% +++ pipepoll.c 25 Aug 2009 13:58:28 -0000
% @@ -30,4 +30,7 @@
%  		result = "POLLIN";
%  		break;
% +	case POLLOUT:
% +		result = "POLLOUT";
% +		break;
%  	case POLLHUP:
%  		result = "POLLHUP";
% @@ -36,4 +39,13 @@
%  		result = "POLLIN | POLLHUP";
%  		break;
% +	case POLLIN | POLLOUT:
% +		result = "POLLIN | POLLOUT";
% +		break;
% +	case POLLIN | POLLOUT | POLLHUP:
% +		result = "POLLIN | POLLOUT | POLLHUP";
% +		break;
% +	case POLLOUT | POLLHUP:
% +		result = "POLLOUT | POLLHUP";
% +		break;
%  	default:
%  		asprintf(&ncresult, "%#x", events);
% @@ -81,10 +93,10 @@
%  	}
%  	pfd.fd = fd;
% -	pfd.events = POLLIN;
% +	pfd.events = POLLIN | POLLOUT;
%
%  	if (filetype == FT_FIFO) {
%  		if (poll(&pfd, 1, 0) < 0)
%  			err(1, "poll");
% -		report(num++, "0", 0, pfd.revents);
% +		report(num++, "0", POLLOUT, pfd.revents);
%  	}
%  	kill(ppid, SIGUSR1);

Note that this is buggy.  The fifo is still opened O_RDONLY, so in
-current poll() returns 0, though it should probably return POLLERR
(see previous mail).  IIRC, this test for POLLOUT passes under Linux.
This verifies that Linux doesn't check the open mode for output.  The
tests could be expanded to check intentionally that silly combinations
of flags and open modes don't work.  Except for the above, they
apparently avoid the silly combinations, else the change to check the
open mode would have caused more failures.

% @@ -104,5 +116,5 @@
%  	if (poll(&pfd, 1, 0) < 0)
%  		err(1, "poll");
% -	report(num++, "1", 0, pfd.revents);
% +	report(num++, "1", POLLOUT, pfd.revents);
%  	kill(ppid, SIGUSR1);
%
% @@ -112,10 +124,10 @@
%  	if (poll(&pfd, 1, 0) < 0)
%  		err(1, "poll");
% -	report(num++, "2", POLLIN, pfd.revents);
% +	report(num++, "2", POLLIN | POLLOUT, pfd.revents);
%  	if (read(fd, buf, sizeof buf) != 1)
%  		err(1, "read");
%  	if (poll(&pfd, 1, 0) < 0)
%  		err(1, "poll");
% -	report(num++, "2a", 0, pfd.revents);
% +	report(num++, "2a", POLLOUT, pfd.revents);
%  	kill(ppid, SIGUSR1);
%
% @@ -140,5 +152,5 @@
%  	if (poll(&pfd, 1, 0) < 0)
%  		err(1, "poll");
% -	report(num++, "4", 0, pfd.revents);
% +	report(num++, "4", POLLOUT, pfd.revents);
%  	kill(ppid, SIGUSR1);
%
% @@ -148,5 +160,5 @@
%  	if (poll(&pfd, 1, 0) < 0)
%  		err(1, "poll");
% -	report(num++, "5", POLLIN, pfd.revents);
% +	report(num++, "5", POLLIN | POLLOUT, pfd.revents);
%  	kill(ppid, SIGUSR1);
%
% @@ -193,7 +205,16 @@
%  		report(num++, "6d", POLLHUP, pfd.revents);
%  	}
% -	close(fd);
%  	kill(ppid, SIGUSR1);
%
% +	if (filetype != FT_FIFO)
% +		close (fd);
% +	else {
% +		usleep(1);
% +		while (state != 7)
% +			;
% +		close(fd);
% +		kill(ppid, SIGUSR1);
% +	}
% +
%  	exit(0);
%  }
% @@ -202,4 +223,6 @@
%  parent(int fd)
%  {
% +	struct pollfd pfd;
% +
%  	usleep(1);
%  	while (state != 1)
% @@ -210,4 +233,10 @@
%  			err(1, "open for write");
%  	}
% +	pfd.fd = fd;
% +	pfd.events = POLLIN | POLLOUT;
% +
% +	if (poll(&pfd, 1, 0) < 0)
% +		err(1, "poll");
% +	report(-1, "1p", POLLOUT, pfd.revents);
%  	kill(cpid, SIGUSR1);
%
% @@ -253,4 +282,19 @@
%  	while (state != 7)
%  		;
% +	fd = open(FIFONAME, O_WRONLY | O_NONBLOCK);
% +	if (fd < 0)
% +		err(1, "open for write");
% +	pfd.fd = fd;
% +	if (poll(&pfd, 1, 0) < 0)
% +		err(1, "poll");
% +	report(-1, "7", POLLOUT, pfd.revents);
% +	kill(cpid, SIGUSR1);
% +
% +	usleep(1);
% +	while (state != 8)
% +		;
% +	if (poll(&pfd, 1, 0) < 0)
% +		err(1, "poll");
% +	report(-1, "8", POLLHUP, pfd.revents);
%  }
%

Bruce

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 12 21:17:27 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B1A231065678; Thu, 12 Jan 2012 21:17:27 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id 332608FC12; Thu, 12 Jan 2012 21:17:26 +0000 (UTC) Received: by ggki1 with SMTP id i1so1684841ggk.13 for ; Thu, 12 Jan 2012 13:17:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=L8sCeyGwQhnfQn5/Hb9RQm+YGZKkARq5zBnEhGMSXQQ=; b=TBku7DnQmJ3j8MveNCKytvW3I1DjDPHMxprkwgUNWuBGEjvC4ZN9oSjqKFvV1ZfJne 6fBIdl18PH44x+ACP+wyx0FFzoOoKP3CeNgp67PxpdG7Dh5yxJb3N4DoPI+OgyhgWIPz 6hlSUBVCGLV5EUcTcfsAt4lKLW59FckgTg4K8= MIME-Version: 1.0 Received: by 10.50.77.195 with SMTP id u3mr2172199igw.29.1326403046295; Thu, 12 Jan 2012 13:17:26 -0800 (PST) Sender: to.my.trociny@gmail.com Received: by 10.231.143.141 with HTTP; Thu, 12 Jan 2012 13:17:26 -0800 (PST) In-Reply-To: <86fwfnti5t.fsf@kopusha.home.net> References: <86sjjobzmn.fsf@kopusha.home.net> <86fwfnti5t.fsf@kopusha.home.net> Date: Thu, 12 Jan 2012 23:17:26 +0200 X-Google-Sender-Auth: uEvabdR2S0e_8uRHgjuMHFX3zDk Message-ID: From: Mikolaj Golub
To: "Robert N. M. Watson" Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Kostik Belousov , freebsd-arch@freebsd.org Subject: Re: unix domain sockets on nullfs(5) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jan 2012 21:17:27 -0000

On Tue, Jan 10, 2012 at 10:30 PM, Mikolaj Golub wrote:
>
> On Tue, 10 Jan 2012 14:02:34 +0000 Robert N. M. Watson wrote:
>
>  RNMW> (1) I don't think the new behaviour should be optional -- it was always
>  RNMW> the intent that nullfs pass through all behaviours to the underlying
>  RNMW> layer, it's just that certain edge cases didn't appear in the original
>  RNMW> implementation. Memory mapping was fixed a few years ago using similar
>  RNMW> techniques. This will significantly reduce the complexity of your
>  RNMW> patch, and also avoid user confusion since it will now behave "as
>  RNMW> expected". Certainly, mention in future release notes would be
>  RNMW> appropriate, however.
>
> I don't mind having only the new behavior, as I can't imagine where I would
> need a nullfs with nosobypass option mounted and I also like when things are
> simple :-).
>
> On the other hand there might be people who relied on the old behavior and who
> would be surprised if it had changed.
>
> So, if other people agree I will remove the old behaviour to make the patch
> simpler. Another option would be to have sobypass by default with possibility
> to (re)mount fs with nosobypass.
>

If we agree to have only the new behavior then nullfs won't need modification
at all, it will work as expected automatically. The patch could be (with updated
locking for the connect case):

http://people.freebsd.org/~trociny/VOP_UNP.1.patch

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 12 21:29:43 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 731F11065670; Thu, 12 Jan 2012 21:29:43 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id E95EE8FC1B; Thu, 12 Jan 2012 21:29:41 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q0CLTb76062112 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 12 Jan 2012 23:29:37 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q0CLTb0G087012; Thu, 12 Jan 2012 23:29:37 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q0CLTaRV087011; Thu, 12 Jan 2012 23:29:36 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 12 Jan 2012 23:29:36 +0200 From: Kostik Belousov To: Mikolaj Golub Message-ID: <20120112212936.GB31224@deviant.kiev.zoral.com.ua> References: <86sjjobzmn.fsf@kopusha.home.net> <86fwfnti5t.fsf@kopusha.home.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="w0Yn8slUAuN8hGpX" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: "Robert N. M. Watson" , freebsd-arch@freebsd.org Subject: Re: unix domain sockets on nullfs(5) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Jan 2012 21:29:43 -0000

--w0Yn8slUAuN8hGpX
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jan 12, 2012 at 11:17:26PM +0200, Mikolaj Golub wrote:
> On Tue, Jan 10, 2012 at 10:30 PM, Mikolaj Golub wrote:
> >
> > On Tue, 10 Jan 2012 14:02:34 +0000 Robert N. M. Watson wrote:
> >
> >  RNMW> (1) I don't think the new behaviour should be optional -- it was always
> >  RNMW> the intent that nullfs pass through all behaviours to the underlying
> >  RNMW> layer, it's just that certain edge cases didn't appear in the original
> >  RNMW> implementation. Memory mapping was fixed a few years ago using similar
> >  RNMW> techniques. This will significantly reduce the complexity of your
> >  RNMW> patch, and also avoid user confusion since it will now behave "as
> >  RNMW> expected". Certainly, mention in future release notes would be
> >  RNMW> appropriate, however.
> >
> > I don't mind having only the new behavior, as I can't imagine where I would
> > need a nullfs with nosobypass option mounted and I also like when things are
> > simple :-).
> >
> > On the other hand there might be people who relied on the old behavior and who
> > would be surprised if it had changed.
> >
> > So, if other people agree I will remove the old behaviour to make the patch
> > simpler. Another option would be to have sobypass by default with possibility
> > to (re)mount fs with nosobypass.
> >
>
> If we agree to have only the new behavior then nullfs won't need modification
> at all, it will work as expected automatically. The patch could be (with updated
> locking for the connect case):
>
> http://people.freebsd.org/~trociny/VOP_UNP.1.patch

I suggest splitting the exclusive->shared locking change into a separate
patch, to be committed either before or after the VOP_UNP (better to do
it before, so as not to change the interface of the VOP).

You do not need the local variable vp in the default implementations of
the vops at all; use ap->a_vp directly.

--w0Yn8slUAuN8hGpX
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk8PUMAACgkQC3+MBN1Mb4jA3wCgwDHY51VQBqrpGtLS27vVgq0i
E6QAmwZ73vFa1/UmLcExvGm6Fo5A2TGk
=ok8J
-----END PGP SIGNATURE-----

--w0Yn8slUAuN8hGpX--

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 12 21:39:57 2012 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 27B9A106564A; Thu, 12 Jan 2012 21:39:57 +0000 (UTC) (envelope-from rwatson@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 0011A8FC0C; Thu, 12 Jan 2012 21:39:56 +0000 (UTC) Received: from [192.168.2.105] (host86-161-238-124.range86-161.btcentralplus.com [86.161.238.124]) by cyrus.watson.org (Postfix) with ESMTPSA id B0CD346B2E; Thu, 12 Jan 2012 16:39:55 -0500 (EST) Mime-Version: 1.0 (Apple Message framework v1251.1) Content-Type: text/plain; charset=iso-8859-1 From: "Robert N. M.
Watson"
In-Reply-To:
Date: Thu, 12 Jan 2012 21:39:53 +0000
Content-Transfer-Encoding: quoted-printable
Message-Id:
References: <86sjjobzmn.fsf@kopusha.home.net> <86fwfnti5t.fsf@kopusha.home.net>
To: Mikolaj Golub
X-Mailer: Apple Mail (2.1251.1)
Cc: Kostik Belousov , freebsd-arch@freebsd.org
Subject: Re: unix domain sockets on nullfs(5)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 12 Jan 2012 21:39:57 -0000

On 12 Jan 2012, at 21:17, Mikolaj Golub wrote:

> If we agree to have only the new behavior then nullfs won't need modification
> at all, it will work as expected automatically. The patch could be (with updated
> locking for the connect case):
>
> http://people.freebsd.org/~trociny/VOP_UNP.1.patch

Greatly simplified.

> --- sys/kern/uipc_usrreq.c (revision 229979)
> +++ sys/kern/uipc_usrreq.c (working copy)
> @@ -542,7 +542,7 @@
>  
>  	UNP_LINK_WLOCK();
>  	UNP_PCB_LOCK(unp);
> -	vp->v_socket = unp->unp_socket;
> +	VOP_UNPBIND(vp, unp->unp_socket);
>  	unp->unp_vnode = vp;
>  	unp->unp_addr = soun;
>  	unp->unp_flags &= ~UNP_BINDING;

I still find myself worried by the fact that unp->unp_vnode points at the nullfs vnode rather than the underlying vnode, but haven't yet managed to identify any actual bugs that would result. I'll continue pondering it over the weekend :-).
Robert

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 12 21:51:11 2012
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 51C9D1065677; Thu, 12 Jan 2012 21:51:11 +0000 (UTC) (envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id BC08D8FC13; Thu, 12 Jan 2012 21:51:10 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q0CLp7Ak065365 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 12 Jan 2012 23:51:07 +0200 (EET) (envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q0CLp6Fs087118; Thu, 12 Jan 2012 23:51:06 +0200 (EET) (envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q0CLp6Wu087117; Thu, 12 Jan 2012 23:51:06 +0200 (EET) (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f
Date: Thu, 12 Jan 2012 23:51:06 +0200
From: Kostik Belousov
To: "Robert N. M.
Watson"
Message-ID: <20120112215106.GC31224@deviant.kiev.zoral.com.ua>
References: <86sjjobzmn.fsf@kopusha.home.net> <86fwfnti5t.fsf@kopusha.home.net>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="hS91mLTIjizZlFCb"
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua
Cc: Mikolaj Golub , freebsd-arch@freebsd.org
Subject: Re: unix domain sockets on nullfs(5)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 12 Jan 2012 21:51:11 -0000

--hS91mLTIjizZlFCb
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Jan 12, 2012 at 09:39:53PM +0000, Robert N. M. Watson wrote:
>
> On 12 Jan 2012, at 21:17, Mikolaj Golub wrote:
>
> > If we agree to have only the new behavior then nullfs won't need modification
> > at all, it will work as expected automatically. The patch could be (with updated
> > locking for the connect case):
> >
> > http://people.freebsd.org/~trociny/VOP_UNP.1.patch
>
> Greatly simplified.
>
> > --- sys/kern/uipc_usrreq.c (revision 229979)
> > +++ sys/kern/uipc_usrreq.c (working copy)
> > @@ -542,7 +542,7 @@
> >  
> >  	UNP_LINK_WLOCK();
> >  	UNP_PCB_LOCK(unp);
> > -	vp->v_socket = unp->unp_socket;
> > +	VOP_UNPBIND(vp, unp->unp_socket);
> >  	unp->unp_vnode = vp;
> >  	unp->unp_addr = soun;
> >  	unp->unp_flags &= ~UNP_BINDING;
>
>
> I still find myself worried by the fact that unp->unp_vnode points at the nullfs vnode rather than the underlying vnode, but haven't yet managed to identify any actual bugs that would result. I'll continue pondering it over the weekend :-).

I think I know what could go wrong there, but due to another bug, the
problem cannot be triggered now. The issue is that on a forced unmount
the unp_vnode is reclaimed, so the unix domain socket code references
freed memory after the reclaim.

Probably, some helper should be provided by uipc_usrreq, to be called from
the VOP_RECLAIM() implementations for vnodes of type VSOCK.

--hS91mLTIjizZlFCb
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk8PVcoACgkQC3+MBN1Mb4jTlACgr2ba8j+s+1oezEf3Azb44vo4
I2wAoN++39PDxWynxcWOH9bktOstdrTv
=DDZw
-----END PGP SIGNATURE-----

--hS91mLTIjizZlFCb--

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 12 21:58:42 2012
Return-Path:
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2AF8D1065677 for ; Thu, 12 Jan 2012 21:58:42 +0000 (UTC) (envelope-from kris@pcbsd.org)
Received: from mail.iXsystems.com (newknight.ixsystems.com [206.40.55.70]) by mx1.freebsd.org (Postfix) with ESMTP id 0575A8FC1F for ; Thu, 12 Jan 2012 21:58:41 +0000 (UTC)
Received: from mail.ixsystems.com (localhost [127.0.0.1]) by mail.iXsystems.com (Postfix) with ESMTP id C28945E7 for ; Thu, 12 Jan 2012 13:43:09 -0800 (PST)
Received: from mail.iXsystems.com ([127.0.0.1]) by mail.ixsystems.com
(mail.ixsystems.com [127.0.0.1]) (amavisd-maia, port 10024) with ESMTP id 17949-05 for ; Thu, 12 Jan 2012 13:43:09 -0800 (PST)
Received: from [192.168.0.186] (75-130-56-30.static.kgpt.tn.charter.com [75.130.56.30]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.iXsystems.com (Postfix) with ESMTPSA id 2ACBE5E2 for ; Thu, 12 Jan 2012 13:43:09 -0800 (PST)
Message-ID: <4F0F53EC.1020608@pcbsd.org>
Date: Thu, 12 Jan 2012 16:43:08 -0500
From: Kris Moore
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20111227 Thunderbird/9.0
MIME-Version: 1.0
To: freebsd-arch@freebsd.org
References: <86sjjobzmn.fsf@kopusha.home.net> <86fwfnti5t.fsf@kopusha.home.net>
In-Reply-To:
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Re: unix domain sockets on nullfs(5)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 12 Jan 2012 21:58:42 -0000

On 01/12/2012 16:39, Robert N. M. Watson wrote:
> On 12 Jan 2012, at 21:17, Mikolaj Golub wrote:
>
>> If we agree to have only the new behavior then nullfs won't need modification
>> at all, it will work as expected automatically. The patch could be (with updated
>> locking for the connect case):
>>
>> http://people.freebsd.org/~trociny/VOP_UNP.1.patch
> Greatly simplified.
>
>> --- sys/kern/uipc_usrreq.c (revision 229979)
>> +++ sys/kern/uipc_usrreq.c (working copy)
>> @@ -542,7 +542,7 @@
>>
>>  	UNP_LINK_WLOCK();
>>  	UNP_PCB_LOCK(unp);
>> -	vp->v_socket = unp->unp_socket;
>> +	VOP_UNPBIND(vp, unp->unp_socket);
>>  	unp->unp_vnode = vp;
>>  	unp->unp_addr = soun;
>>  	unp->unp_flags &= ~UNP_BINDING;
>
> I still find myself worried by the fact that unp->unp_vnode points at the nullfs vnode rather than the underlying vnode, but haven't yet managed to identify any actual bugs that would result. I'll continue pondering it over the weekend :-).
>
> Robert
> _______________________________________________
> freebsd-arch@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"

FYI - Not sure if this helps, but we've been using this patch to expose
sockets in the lower layer for 2+ years now and haven't run into any
issues yet.

http://trac.pcbsd.org/browser/pcbsd/current/build-files/src-patches/nullfs-patch

--
Kris Moore
PC-BSD Software
iXsystems

From owner-freebsd-arch@FreeBSD.ORG Sat Jan 14 08:46:25 2012
Return-Path:
Delivered-To: arch@FreeBSD.org
Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35]) by hub.freebsd.org (Postfix) with ESMTP id 26B55106566C; Sat, 14 Jan 2012 08:46:25 +0000 (UTC) (envelope-from dougb@FreeBSD.org)
Received: from 172-17-198-245.globalsuite.net (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id C0E0C15F342; Sat, 14 Jan 2012 08:46:23 +0000 (UTC)
Message-ID: <4F1140DF.5070809@FreeBSD.org>
Date: Sat, 14 Jan 2012 00:46:23 -0800
From: Doug Barton
Organization: http://SupersetSolutions.com/
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20111222 Thunderbird/9.0
MIME-Version: 1.0
To: Don Lewis
References: <201201100017.q0A0HItk037943@gw.catspoiler.org>
In-Reply-To: <201201100017.q0A0HItk037943@gw.catspoiler.org>
X-Enigmail-Version: undefined
OpenPGP: id=1A1ABC84
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: arch@FreeBSD.org, adrian@FreeBSD.org, alfred@FreeBSD.org
Subject: Re: [patch] allow crash dumps to Linux swap partitions
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 14 Jan 2012 08:46:25 -0000

On 01/09/2012 16:17, Don Lewis wrote:

> Looks like this is safe to do. There is some code in swaponsomething()
> to avoid the first two page-size blocks of the swap file to avoid
> overwriting the BSD label if the swap partition starts at sector zero of
> a BSD partition.

Confirmed. I switched my BSD swap partitions to the same one my Linux
install is using, and I've booted back and forth several times now.

Doug

--
You can observe a lot just by watching. -- Yogi Berra

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price. :) http://SupersetSolutions.com/
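
For readers following the swap thread, the skip that Don describes in
swaponsomething() can be sketched in user-space C. This is only an
illustration under stated assumptions, not the kernel code: the names
swap_usable_range, swap_first_byte, struct swap_range and SWAP_SKIP_PAGES
are hypothetical, and the 4096-byte page size is an assumed value. It
shows the arithmetic only: the first two page-size blocks are never handed
out, so whatever is stored there is left alone.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration only; the real page size is platform-specific. */
#define PAGE_SIZE_BYTES	4096u
/* Leading page-size blocks left untouched, per the description quoted above. */
#define SWAP_SKIP_PAGES	2u

/* Hypothetical type: the part of a swap device an allocator may use. */
struct swap_range {
	uint64_t first_blk;	/* first allocatable page-size block */
	uint64_t nblks;		/* number of allocatable blocks */
};

/*
 * Hypothetical helper: given the device size in page-size blocks, return
 * the usable range when the first SWAP_SKIP_PAGES blocks are reserved.
 * A device too small to hold anything past the reserved blocks yields an
 * empty range.
 */
static struct swap_range
swap_usable_range(uint64_t dev_nblks)
{
	struct swap_range r = { 0, 0 };

	if (dev_nblks > SWAP_SKIP_PAGES) {
		r.first_blk = SWAP_SKIP_PAGES;
		r.nblks = dev_nblks - SWAP_SKIP_PAGES;
	}
	return (r);
}

/* Byte offset on the device of the first block that may be written. */
static uint64_t
swap_first_byte(struct swap_range r)
{
	/* Guard against overflow in the multiplication below. */
	assert(r.first_blk <= UINT64_MAX / PAGE_SIZE_BYTES);
	return (r.first_blk * PAGE_SIZE_BYTES);
}
```

With a 1024-block device, blocks 0 and 1 are never allocated, so under the
assumed page size the lowest byte offset that can be overwritten is 8192.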