From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 19 08:21:30 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 77570106566C; Sun, 19 Sep 2010 08:21:30 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 262B88FC18; Sun, 19 Sep 2010 08:21:28 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA19366; Sun, 19 Sep 2010 11:21:26 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1OxF9J-000GUo-OK; Sun, 19 Sep 2010 11:21:25 +0300 Message-ID: <4C95C804.1010701@freebsd.org> Date: Sun, 19 Sep 2010 11:21:24 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeff Roberson References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Andre Oppermann , Jeff Roberson , Robert Watson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Sep 2010 08:21:30 -0000 on 19/09/2010 01:16 Jeff Roberson said the following: > Not specifically in reaction to Robert's comment but I would like to add my > thoughts to this notion of resource balancing in buckets. I really prefer not > to do any specific per-zone tuning except in extreme cases. 
This is because > quite often the decisions we make don't apply to some class of machines or > workloads. I would instead prefer to keep the algorithm adaptable. Agree. > I like the idea of weighting the bucket decisions by the size of the item. > Obviously this has some flaws with compound objects but in the general case it > is good. We should consider increasing the cost of bucket expansion based on > the size of the item. Right now buckets are expanded fairly readily. > > We could also consider decreasing the default bucket size for a zone based on vm > pressure and use. Right now there is no downward pressure on bucket size, only > upward based on trips to the slab layer. > > Additionally we could make a last ditch flush mechanism that runs on each cpu in > turn and flushes some or all of the buckets in per-cpu caches. Presently that is > not done due to synchronization issues. It can't be done from a central place. > It could be done with a callout mechanism or a for loop that binds to each core > in succession. I like all of the three above approaches. The last one is a bit hard to implement, the first two seem easier. > I believe the combination of these approaches would significantly solve the > problem and should be relatively little new code. It should also preserve the > adaptable nature of the system without penalizing resource heavy systems. I > would be happy to review patches from anyone who wishes to undertake it. FWIW, the approach of simply limiting maximum bucket size based on item size seems to work rather well too, as my testing with zfs+uma shows. I will also try to add code to completely bypass the per-cpu cache for "really huge" items. 
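As a rough illustration of limiting the maximum bucket size by item size, the sketch below models one possible policy in Python. It is not taken from the UMA source or from any patch in this thread: the function name, the 128-item ceiling and the 128 KB per-bucket byte budget are all hypothetical placeholders chosen only to show the shape of the calculation.

```python
# Hypothetical model of a size-weighted per-CPU bucket limit.
# None of these constants come from the UMA source; they are placeholders.
DEFAULT_MAX_ITEMS = 128          # assumed upper bound on items per bucket
BUCKET_BYTE_BUDGET = 128 * 1024  # assumed cap on bytes cached per bucket


def bucket_item_limit(item_size, max_items=DEFAULT_MAX_ITEMS,
                      byte_budget=BUCKET_BYTE_BUDGET):
    """Return how many items of item_size bytes one bucket may hold."""
    if item_size <= 0:
        raise ValueError("item_size must be positive")
    # Large items exhaust the byte budget after only a few entries.
    by_bytes = byte_budget // item_size
    # Keep at least one item so the per-CPU cache is never bypassed entirely.
    return max(1, min(max_items, by_bytes))
```

With these assumed numbers, small items keep the full item ceiling while large items are held to a handful per bucket, which bounds the memory parked in each per-CPU cache without any per-zone tuning.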
-- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 19 08:32:53 2010 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 05A0C106566B; Sun, 19 Sep 2010 08:32:53 +0000 (UTC) (envelope-from jeremie@le-hen.org) Received: from smtpfb2-g21.free.fr (smtpfb2-g21.free.fr [212.27.42.10]) by mx1.freebsd.org (Postfix) with ESMTP id A7B3E8FC12; Sun, 19 Sep 2010 08:32:49 +0000 (UTC) Received: from smtp5-g21.free.fr (smtp5-g21.free.fr [212.27.42.5]) by smtpfb2-g21.free.fr (Postfix) with ESMTP id 5C499D1AA9A; Sun, 19 Sep 2010 10:14:17 +0200 (CEST) Received: from endor.tataz.chchile.org (unknown [82.233.239.98]) by smtp5-g21.free.fr (Postfix) with ESMTP id 37946D4808E; Sun, 19 Sep 2010 10:14:08 +0200 (CEST) Received: from felucia.tataz.chchile.org (felucia.tataz.chchile.org [192.168.1.9]) by endor.tataz.chchile.org (Postfix) with ESMTP id C480533D77; Sun, 19 Sep 2010 08:14:06 +0000 (UTC) Received: by felucia.tataz.chchile.org (Postfix, from userid 1000) id B19B5A1247; Sun, 19 Sep 2010 08:14:06 +0000 (UTC) Date: Sun, 19 Sep 2010 10:14:06 +0200 From: Jeremie Le Hen To: Alexander Kabaev Message-ID: <20100919081406.GH6864@felucia.tataz.chchile.org> References: <20100803150545.GH14016@felucia.tataz.chchile.org> <20100803114651.651e0ea4@kan.dnsalias.net> <20100805191446.GJ14016@felucia.tataz.chchile.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="qD3brAgIG4LbUq6d" Content-Disposition: inline In-Reply-To: <20100805191446.GJ14016@felucia.tataz.chchile.org> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: kan@FreeBSD.org, freebsd-hackers@FreeBSD.org, Jeremie Le Hen Subject: Re: [PATCH] Add -lssp_nonshared to GCC's LIB_SPEC unconditionally X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Sun, 19 Sep 2010 08:32:53 -0000 --qD3brAgIG4LbUq6d Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi Alexander, On Thu, Aug 05, 2010 at 09:14:46PM +0200, Jeremie Le Hen wrote: > On Tue, Aug 03, 2010 at 11:46:51AM -0400, Alexander Kabaev wrote: > > > > I have no objection, but think we should cave in and investigate the > > possibility of using linker script wrapping libc.so in FreeBSD-9.0: > > > > Below is Linux' counterpart: > > > > /* GNU ld script > > Use the shared library, but some functions are only in > > the static library, so try that secondarily. */ > > OUTPUT_FORMAT(elf32-i386) > > GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a AS_NEEDED > > ( /lib/ld-linux.so.2 ) ) > > Ok. For now can you commit the proposed modification. I'll try to make > a patch with your proposal. The attached patch does two things: It modifies bsd.lib.mk to support ld scripts for shared libraries and adds such a script to replace the /usr/lib/libc.so symlink to /lib/libc.so.X. Basically, SHLIB_LDSCRIPT is defined in lib/libc/Makefile and points to the file containing the script itself: GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a ) During make install, @@SHLIB@@ will be replaced by the real path of the shared library. Thanks. Regards, -- Jeremie Le Hen Humans are born free and equal. But some are more equal than others. 
Coluche --qD3brAgIG4LbUq6d Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="ld_ssp_nonshared.diff" diff -urNp src.orig/Makefile.inc1 src/Makefile.inc1 --- src.orig/Makefile.inc1 2010-07-15 13:21:25.000000000 +0000 +++ src/Makefile.inc1 2010-08-19 17:27:30.000000000 +0000 @@ -256,6 +256,7 @@ WMAKEENV= ${CROSSENV} \ _SHLIBDIRPREFIX=${WORLDTMP} \ VERSION="${VERSION}" \ INSTALL="sh ${.CURDIR}/tools/install.sh" \ + NO_LDSCRIPT_INSTALL=1 \ PATH=${TMPPATH} .if ${MK_CDDL} == "no" WMAKEENV+= NO_CTF=1 diff -urNp src.orig/lib/libc/Makefile src/lib/libc/Makefile --- src.orig/lib/libc/Makefile 2010-08-01 12:35:01.000000000 +0000 +++ src/lib/libc/Makefile 2010-08-11 17:36:15.000000000 +0000 @@ -20,6 +20,7 @@ CFLAGS+=-DNLS CLEANFILES+=tags INSTALL_PIC_ARCHIVE= PRECIOUSLIB= +SHLIB_LDSCRIPT=libc.ldscript # # Only link with static libgcc.a (no libgcc_eh.a). diff -urNp src.orig/lib/libc/libc.ldscript src/lib/libc/libc.ldscript --- src.orig/lib/libc/libc.ldscript 1970-01-01 00:00:00.000000000 +0000 +++ src/lib/libc/libc.ldscript 2010-08-09 11:12:13.000000000 +0000 @@ -0,0 +1 @@ +GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a ) diff -urNp src.orig/share/mk/bsd.lib.mk src/share/mk/bsd.lib.mk --- src.orig/share/mk/bsd.lib.mk 2010-07-30 15:25:57.000000000 +0000 +++ src/share/mk/bsd.lib.mk 2010-08-22 13:00:15.000000000 +0000 @@ -216,6 +216,14 @@ ${SHLIB_NAME}: ${SOBJS} @[ -z "${CTFMERGE}" -o -n "${NO_CTF}" ] || \ (${ECHO} ${CTFMERGE} ${CTFFLAGS} -o ${.TARGET} ${SOBJS} && \ ${CTFMERGE} ${CTFFLAGS} -o ${.TARGET} ${SOBJS}) + +.if defined(SHLIB_LINK) && defined(SHLIB_LDSCRIPT) && !empty(SHLIB_LDSCRIPT) && exists(${.CURDIR}/${SHLIB_LDSCRIPT}) +_LIBS+= lib${LIB}.ld + +lib${LIB}.ld: ${.CURDIR}/${SHLIB_LDSCRIPT} + sed 's,@@SHLIB@@,${SHLIBDIR}/${SHLIB_NAME},g' \ + ${.CURDIR}/${SHLIB_LDSCRIPT} > lib${LIB}.ld +.endif .endif .if defined(INSTALL_PIC_ARCHIVE) && defined(LIB) && !empty(LIB) && ${MK_TOOLCHAIN} != "no" @@ -293,9 +301,17 @@ _libinstall: ${_INSTALLFLAGS} 
${_SHLINSTALLFLAGS} \ ${SHLIB_NAME} ${DESTDIR}${SHLIBDIR} .if defined(SHLIB_LINK) +.if defined(SHLIB_LDSCRIPT) && !empty(SHLIB_LDSCRIPT) && exists(${.CURDIR}/${SHLIB_LDSCRIPT}) && empty(NO_LDSCRIPT_INSTALL) + @echo "DEBUG: install lib${LIB}.ld to ${DESTDIR}${LIBDIR}/${SHLIB_LINK}" + ${INSTALL} -S -C -o ${LIBOWN} -g ${LIBGRP} -m ${LIBMODE} \ + ${_INSTALLFLAGS} lib${LIB}.ld ${DESTDIR}${LIBDIR} + ln -fs lib${LIB}.ld ${DESTDIR}${LIBDIR}/${SHLIB_LINK} +.else .if ${SHLIBDIR} == ${LIBDIR} + @echo "DEBUG: symlink (1) ${DESTDIR}${LIBDIR}/${SHLIB_LINK} to ${SHLIB_NAME}" ln -fs ${SHLIB_NAME} ${DESTDIR}${LIBDIR}/${SHLIB_LINK} .else + @echo "DEBUG: symlink (2) ${DESTDIR}${LIBDIR}/${SHLIB_LINK} to ${_SHLIBDIRPREFIX}${SHLIBDIR}/${SHLIB_NAME}" ln -fs ${_SHLIBDIRPREFIX}${SHLIBDIR}/${SHLIB_NAME} \ ${DESTDIR}${LIBDIR}/${SHLIB_LINK} .if exists(${DESTDIR}${LIBDIR}/${SHLIB_NAME}) @@ -303,8 +319,9 @@ _libinstall: rm -f ${DESTDIR}${LIBDIR}/${SHLIB_NAME} .endif .endif -.endif -.endif +.endif # SHLIB_LDSCRIPT +.endif # SHLIB_LINK +.endif # SHIB_NAME .if defined(INSTALL_PIC_ARCHIVE) && defined(LIB) && !empty(LIB) && ${MK_TOOLCHAIN} != "no" ${INSTALL} -o ${LIBOWN} -g ${LIBGRP} -m ${LIBMODE} \ ${_INSTALLFLAGS} lib${LIB}_pic.a ${DESTDIR}${LIBDIR} @@ -372,6 +389,9 @@ clean: .endif .if defined(SHLIB_NAME) .if defined(SHLIB_LINK) +.if defined(SHLIB_LDSCRIPT) && exists(${.CURDIR}/${SHLIB_LDSCRIPT}) + rm -f lib${LIB}.ld +.endif rm -f ${SHLIB_LINK} .endif .if defined(LIB) && !empty(LIB) --qD3brAgIG4LbUq6d-- From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 19 08:42:07 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 988AA106566C; Sun, 19 Sep 2010 08:42:07 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 426FF8FC0A; Sun, 19 Sep 2010 08:42:05 +0000 (UTC) Received: from 
porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA19605; Sun, 19 Sep 2010 11:42:04 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1OxFTH-000GWC-Dm; Sun, 19 Sep 2010 11:42:03 +0300 Message-ID: <4C95CCDA.7010007@freebsd.org> Date: Sun, 19 Sep 2010 11:42:02 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeff Roberson References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C95C804.1010701@freebsd.org> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Andre Oppermann , Jeff Roberson , Robert Watson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Sep 2010 08:42:07 -0000 on 19/09/2010 11:27 Jeff Roberson said the following: > I don't like this because even with very large buffers you can still have high > enough turnover to require per-cpu caching. Kip specifically added UMA support > to address this issue in zfs. If you have allocations which don't require > per-cpu caching and are very large why even use UMA? Good point. Right now I am running with 4 items/bucket limit for items larger than 32KB. 
-- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 19 08:26:37 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EADAB1065670; Sun, 19 Sep 2010 08:26:36 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from mail-pz0-f54.google.com (mail-pz0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id ACDC28FC14; Sun, 19 Sep 2010 08:26:36 +0000 (UTC) Received: by pzk7 with SMTP id 7so1182700pzk.13 for ; Sun, 19 Sep 2010 01:26:36 -0700 (PDT) Received: by 10.142.132.11 with SMTP id f11mr6297097wfd.35.1284884796189; Sun, 19 Sep 2010 01:26:36 -0700 (PDT) Received: from [10.0.1.198] (udp022762uds.hawaiiantel.net [72.234.79.107]) by mx.google.com with ESMTPS id l42sm3725264wfa.9.2010.09.19.01.26.33 (version=SSLv3 cipher=RC4-MD5); Sun, 19 Sep 2010 01:26:35 -0700 (PDT) Date: Sat, 18 Sep 2010 22:27:42 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Andriy Gapon In-Reply-To: <4C95C804.1010701@freebsd.org> Message-ID: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C95C804.1010701@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Sun, 19 Sep 2010 11:01:51 +0000 Cc: Andre Oppermann , Jeff Roberson , Robert Watson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Sep 2010 08:26:37 -0000 On Sun, 19 Sep 2010, Andriy Gapon wrote: > on 19/09/2010 01:16 Jeff Roberson said the following: >> Not specifically in reaction to Robert's comment but I would like to add my >> thoughts to this notion of resource balancing in buckets. 
I really prefer not >> to do any specific per-zone tuning except in extreme cases. This is because >> quite often the decisions we make don't apply to some class of machines or >> workloads. I would instead prefer to keep the algorithm adaptable. > > Agree. > >> I like the idea of weighting the bucket decisions by the size of the item. >> Obviously this has some flaws with compound objects but in the general case it >> is good. We should consider increasing the cost of bucket expansion based on >> the size of the item. Right now buckets are expanded fairly readily. >> >> We could also consider decreasing the default bucket size for a zone based on vm >> pressure and use. Right now there is no downward pressure on bucket size, only >> upward based on trips to the slab layer. >> >> Additionally we could make a last ditch flush mechanism that runs on each cpu in >> turn and flushes some or all of the buckets in per-cpu caches. Presently that is >> not done due to synchronization issues. It can't be done from a central place. >> It could be done with a callout mechanism or a for loop that binds to each core >> in succession. > > I like all of the three above approaches. > The last one is a bit hard to implement, the first two seem easier. All the last one requires is a loop calling sched_bind() on each available cpu. > >> I believe the combination of these approaches would significantly solve the >> problem and should be relatively little new code. It should also preserve the >> adaptable nature of the system without penalizing resource heavy systems. I >> would be happy to review patches from anyone who wishes to undertake it. > > FWIW, the approach of simply limiting maximum bucket size based on item size > seems to work rather well too, as my testing with zfs+uma shows. > I will also try to add code to completely bypass the per-cpu cache for "really > huge" items. 
I don't like this because even with very large buffers you can still have high enough turnover to require per-cpu caching. Kip specifically added UMA support to address this issue in zfs. If you have allocations which don't require per-cpu caching and are very large why even use UMA? One thing that would be nice if we are frequently using page size allocations is to eliminate the requirement for a slab header for each page. It seems unnecessary for any zone where the items per slab is 1 but it would require careful modification to support properly. Thanks, Jeff > > -- > Andriy Gapon > From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 19 11:41:19 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C6941065672; Sun, 19 Sep 2010 11:41:19 +0000 (UTC) (envelope-from rwatson@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 738DA8FC15; Sun, 19 Sep 2010 11:41:19 +0000 (UTC) Received: from [127.0.0.1] (rhee.cl.cam.ac.uk [128.232.1.202]) by cyrus.watson.org (Postfix) with ESMTPSA id 3536746B5C; Sun, 19 Sep 2010 07:41:18 -0400 (EDT) Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: "Robert N. M. 
Watson" In-Reply-To: <4C95C804.1010701@freebsd.org> Date: Sun, 19 Sep 2010 12:41:16 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <8D2A1836-CA85-4F1B-A5A5-9B75A8E2DA51@freebsd.org> References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C95C804.1010701@freebsd.org> To: Andriy Gapon X-Mailer: Apple Mail (2.1081) Cc: Andre Oppermann , Jeff Roberson , Jeff Roberson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Sep 2010 11:41:19 -0000 On 19 Sep 2010, at 09:21, Andriy Gapon wrote: >> I believe the combination of these approaches would significantly solve the >> problem and should be relatively little new code. It should also preserve the >> adaptable nature of the system without penalizing resource heavy systems. I >> would be happy to review patches from anyone who wishes to undertake it. > > FWIW, the approach of simply limiting maximum bucket size based on item size > seems to work rather well too, as my testing with zfs+uma shows. > I will also try to add code to completely bypass the per-cpu cache for "really > huge" items. This is basically what malloc(9) does already: for small items, it allocates from a series of fixed-size buckets (which could probably use tuning), but maintains its own stats with respect to the types it maps into the buckets. This is why there's double-counting between vmstat -z and vmstat -m, since the former shows the buckets used to allocate the latter. For large items, malloc(9) goes through UMA, but it's basically a pass-through to VM, which directly provides pages. This means that for small malloc types, you get per-CPU caches, and for large malloc types, you don't. 
malloc(9) doesn't require fixed-size allocations, but also can't provide the ctor/dtor partial tear-down caching, nor different effective working sets of memory for different types. UMA should really only be used directly for memory types where tight packing, per-CPU caching, and possibly partial tear-down, have benefits. mbufs are a great example, because we allocate tons and tons of them continuously in operation. More stable types allocated in smaller quantities make very little sense, since we waste lots of memory overhead in allocating buckets that won't be used, etc. Robert From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 19 11:42:35 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3417C1065670; Sun, 19 Sep 2010 11:42:35 +0000 (UTC) (envelope-from rwatson@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 0C8748FC21; Sun, 19 Sep 2010 11:42:35 +0000 (UTC) Received: from [127.0.0.1] (rhee.cl.cam.ac.uk [128.232.1.202]) by cyrus.watson.org (Postfix) with ESMTPSA id 9D1B546B91; Sun, 19 Sep 2010 07:42:33 -0400 (EDT) Mime-Version: 1.0 (Apple Message framework v1081) Content-Type: text/plain; charset=us-ascii From: "Robert N. M. 
Watson" In-Reply-To: <4C95CCDA.7010007@freebsd.org> Date: Sun, 19 Sep 2010 12:42:31 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C95C804.1010701@freebsd.org> <4C95CCDA.7010007@freebsd.org> To: Andriy Gapon X-Mailer: Apple Mail (2.1081) Cc: Andre Oppermann , Jeff Roberson , Jeff Roberson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Sep 2010 11:42:35 -0000 On 19 Sep 2010, at 09:42, Andriy Gapon wrote: > on 19/09/2010 11:27 Jeff Roberson said the following: >> I don't like this because even with very large buffers you can still have high >> enough turnover to require per-cpu caching. Kip specifically added UMA support >> to address this issue in zfs. If you have allocations which don't require >> per-cpu caching and are very large why even use UMA? > > Good point. > Right now I am running with 4 items/bucket limit for items larger than 32KB. If allocation turnover is low, I'd think that malloc(9) would do better here. How many allocs/frees per second are there in peak operation? 
Robert From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 19 12:41:47 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ABE601065674 for ; Sun, 19 Sep 2010 12:41:47 +0000 (UTC) (envelope-from joerg@britannica.bec.de) Received: from www.sonnenberger.org (www.sonnenberger.org [92.79.50.50]) by mx1.freebsd.org (Postfix) with ESMTP id 6C0318FC16 for ; Sun, 19 Sep 2010 12:41:47 +0000 (UTC) Received: from britannica.bec.de (www.sonnenberger.org [192.168.1.10]) by www.sonnenberger.org (Postfix) with ESMTP id 9061566663 for ; Sun, 19 Sep 2010 14:22:44 +0200 (CEST) Received: by britannica.bec.de (Postfix, from userid 1000) id CC23C117B97; Sun, 19 Sep 2010 14:23:02 +0200 (CEST) Date: Sun, 19 Sep 2010 14:23:02 +0200 From: Joerg Sonnenberger To: freebsd-hackers@freebsd.org Message-ID: <20100919122302.GA11190@britannica.bec.de> Mail-Followup-To: freebsd-hackers@freebsd.org References: <20100829201050.GA60715@stack.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) Subject: Re: ar(1) format_decimal failure is fatal? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Sep 2010 12:41:47 -0000 On Sat, Sep 18, 2010 at 12:01:04AM -0400, Benjamin Kaduk wrote: > GNU binutils has recently (well, March 2009) added a -D > ("deterministic") argument to ar(1) which sets the timestamp, uid, > and gid to zero, and the mode to 644. That argument was added based on discussions on NetBSD about doing bit-identical release builds. It was made optional for the possible users of the data, not that we are really aware of anyone using it. 
The ar(1) support in make basically goes back to a time when replacing the content was a major speed up for incremental builds and it is pretty much useless nowadays. Similarly, the timestamp doesn't tell that much about the content either. I don't think the backend should do silent truncation, that would be very bad. It might be needed to have a flag for backends to allow it though. Joerg From owner-freebsd-hackers@FreeBSD.ORG Sun Sep 19 18:41:53 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D1F34106564A; Sun, 19 Sep 2010 18:41:53 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 6DDB48FC18; Sun, 19 Sep 2010 18:41:52 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o8JIfnHN084086 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 19 Sep 2010 21:41:49 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o8JIfngB045607; Sun, 19 Sep 2010 21:41:49 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o8JIfkbq045606; Sun, 19 Sep 2010 21:41:46 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 19 Sep 2010 21:41:46 +0300 From: Kostik Belousov To: Jeremie Le Hen Message-ID: <20100919184146.GE2389@deviant.kiev.zoral.com.ua> References: <20100803150545.GH14016@felucia.tataz.chchile.org> <20100803114651.651e0ea4@kan.dnsalias.net> <20100805191446.GJ14016@felucia.tataz.chchile.org> 
<20100919081406.GH6864@felucia.tataz.chchile.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="qsoMWdMv/ifdm7CC" Content-Disposition: inline In-Reply-To: <20100919081406.GH6864@felucia.tataz.chchile.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.2 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_40, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: kan@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: [PATCH] Add -lssp_nonshared to GCC's LIB_SPEC unconditionally X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 19 Sep 2010 18:41:54 -0000 --qsoMWdMv/ifdm7CC Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Sep 19, 2010 at 10:14:06AM +0200, Jeremie Le Hen wrote: > Hi Alexander, > > On Thu, Aug 05, 2010 at 09:14:46PM +0200, Jeremie Le Hen wrote: > > On Tue, Aug 03, 2010 at 11:46:51AM -0400, Alexander Kabaev wrote: > > > > > > I have no objection, but think we should cave in and investigate the > > > possibility of using linker script wrapping libc.so in FreeBSD-9.0: > > > > > > Below is Linux' counterpart: > > > > > > /* GNU ld script > > > Use the shared library, but some functions are only in > > > the static library, so try that secondarily. */ > > > OUTPUT_FORMAT(elf32-i386) > > > GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a AS_NEEDED > > > ( /lib/ld-linux.so.2 ) ) > > > > Ok. For now can you commit the proposed modification. I'll try to make > > a patch with your proposal. 
> > The attached patch does two things: It modifies bsd.lib.mk to support ld > scripts for shared libraries and adds such a script to replace the > /usr/lib/libc.so symlink to /lib/libc.so.X. > > Basically, SHLIB_LDSCRIPT is defined in lib/libc/Makefile and points to > the file containing the script itself: > GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a ) > > During make install, @@SHLIB@@ will be replaced by the real path of the > shared library. You did not include the $FreeBSD$ tag in the libc.so script. I think it would be useful to have. Could you please comment on why the script is not installed during the world build stage? My question is, would buildworld use the script for linkage? --qsoMWdMv/ifdm7CC Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkyWWWoACgkQC3+MBN1Mb4iKZQCgj6tfKlGBmlP9RX1Q3lwLBK7M 7jsAnjhys9Nn5gGMrnCV0gHnxWjSFPll =xRK1 -----END PGP SIGNATURE----- --qsoMWdMv/ifdm7CC-- From owner-freebsd-hackers@FreeBSD.ORG Mon Sep 20 14:49:11 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 549A01065698 for ; Mon, 20 Sep 2010 14:49:11 +0000 (UTC) (envelope-from bounces@nabble.com) Received: from kuber.nabble.com (kuber.nabble.com [216.139.236.158]) by mx1.freebsd.org (Postfix) with ESMTP id 36D2A8FC28 for ; Mon, 20 Sep 2010 14:49:10 +0000 (UTC) Received: from isper.nabble.com ([192.168.236.156]) by kuber.nabble.com with esmtp (Exim 4.63) (envelope-from ) id 1OxhQL-00038H-0b for freebsd-hackers@freebsd.org; Mon, 20 Sep 2010 07:32:53 -0700 Message-ID: <29760054.post@talk.nabble.com> Date: Mon, 20 Sep 2010 07:32:53 -0700 (PDT) From: Svatopluk Kraus To: freebsd-hackers@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Nabble-From: onwahe@gmail.com Subject: page table fault, 
which should map kernel virtual address space X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Sep 2010 14:49:11 -0000 Hello, this is about the 'NKPT' definition, 'kernel_map' submaps, and the 'vm_map_findspace' function. The variable 'kernel_map' is used to manage the kernel virtual address space. When the 'vm_map_findspace' function deals with 'kernel_map', the 'pmap_growkernel' function is called. At least on the 'i386' architecture, the pmap implementation uses the 'pmap_growkernel' function to allocate missing page tables. Missing page tables are a problem, because no one checks the 'pte' pointer for validity after using the 'vtopte' macro. The 'NKPT' definition sets the number of page tables preallocated during system boot. Besides 'kernel_map', some submaps of 'kernel_map' (buffer_map, pager_map,...) exist as a result of 'kmem_suballoc' function calls. When these submaps are used (for example by the 'kmem_alloc_nofault' function) and their virtual address subspace is at the end of the kernel virtual address space in use at that moment (and above the 'NKPT' preallocation), then the missing page tables are not allocated and a double fault can happen. I have hit this scenario and worked around it by increasing the page table preallocation count (the 'NKPT' definition). It's a temporary solution which works for now. Can someone more experienced with the virtual memory module solve it (in the 'vm_map_findspace' function for example)? Or tell me that the problem is elsewhere... Thanks, Svata -- View this message in context: http://old.nabble.com/page-table-fault%2C-which-should-map-kernel-virtual-address-space-tp29760054p29760054.html Sent from the freebsd-hackers mailing list archive at Nabble.com. 
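To make the NKPT arithmetic in the report above concrete, here is a small Python sketch of how much kernel virtual address space a given number of preallocated page tables can map on i386. The page and PTE sizes are the standard i386 values (4 KB pages, 4-byte PTEs, or 8-byte PTEs with PAE); the helper names and the NKPT value used in the usage note are hypothetical, and treating the preallocated tables as one contiguous range is a simplifying assumption rather than a description of the actual pmap code.

```python
PAGE_SIZE = 4096  # i386 base page size in bytes


def ptes_per_table(pae=False):
    """Entries in one page table: 4-byte PTEs, or 8-byte PTEs with PAE."""
    return PAGE_SIZE // (8 if pae else 4)


def kva_covered(nkpt, pae=False):
    """Bytes of kernel virtual address space mapped by nkpt page tables."""
    return nkpt * ptes_per_table(pae) * PAGE_SIZE


def beyond_preallocation(offset, nkpt, pae=False):
    """True if a byte offset from the start of the preallocated range
    falls outside what the nkpt page tables can map."""
    return offset >= kva_covered(nkpt, pae)
```

For example, under the assumption of 30 preallocated tables without PAE, kva_covered(30) gives 120 MB, so a submap address past that offset would have no page table behind it unless something like pmap_growkernel had been called first.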
From owner-freebsd-hackers@FreeBSD.ORG Mon Sep 20 16:29:36 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 62F3B10656C5; Mon, 20 Sep 2010 16:29:36 +0000 (UTC) (envelope-from jeremie@le-hen.org) Received: from smtp5-g21.free.fr (smtp5-g21.free.fr [212.27.42.5]) by mx1.freebsd.org (Postfix) with ESMTP id 32B348FC15; Mon, 20 Sep 2010 16:29:33 +0000 (UTC) Received: from endor.tataz.chchile.org (unknown [82.233.239.98]) by smtp5-g21.free.fr (Postfix) with ESMTP id 4CCD5D48096; Mon, 20 Sep 2010 18:29:27 +0200 (CEST) Received: from felucia.tataz.chchile.org (felucia.tataz.chchile.org [192.168.1.9]) by endor.tataz.chchile.org (Postfix) with ESMTP id 3609933D77; Mon, 20 Sep 2010 16:29:26 +0000 (UTC) Received: by felucia.tataz.chchile.org (Postfix, from userid 1000) id 14F23A11ED; Mon, 20 Sep 2010 16:29:26 +0000 (UTC) Date: Mon, 20 Sep 2010 18:29:25 +0200 From: Jeremie Le Hen To: Kostik Belousov Message-ID: <20100920162925.GL6864@felucia.tataz.chchile.org> References: <20100803150545.GH14016@felucia.tataz.chchile.org> <20100803114651.651e0ea4@kan.dnsalias.net> <20100805191446.GJ14016@felucia.tataz.chchile.org> <20100919081406.GH6864@felucia.tataz.chchile.org> <20100919184146.GE2389@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20100919184146.GE2389@deviant.kiev.zoral.com.ua> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: kan@freebsd.org, freebsd-hackers@freebsd.org, Jeremie Le Hen Subject: Re: [PATCH] Add -lssp_nonshared to GCC's LIB_SPEC unconditionally X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Sep 2010 16:29:36 -0000 Kostik, On Sun, Sep 19, 2010 at 09:41:46PM +0300, Kostik 
Belousov wrote: > On Sun, Sep 19, 2010 at 10:14:06AM +0200, Jeremie Le Hen wrote: > > Hi Alexander, > > > > On Thu, Aug 05, 2010 at 09:14:46PM +0200, Jeremie Le Hen wrote: > > > On Tue, Aug 03, 2010 at 11:46:51AM -0400, Alexander Kabaev wrote: > > > > > > > > I have no objection, but think we should cave in and investigate the > > > > possibility of using linker script wrapping libc.so in FreeBSD-9.0: > > > > > > > > Below is Linux' counterpart: > > > > > > > > /* GNU ld script > > > > Use the shared library, but some functions are only in > > > > the static library, so try that secondarily. */ > > > > OUTPUT_FORMAT(elf32-i386) > > > > GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a AS_NEEDED > > > > ( /lib/ld-linux.so.2 ) ) > > > > > > Ok. For now can you commit the proposed modification. I'll try to make > > > a patch with your proposal. > > > > The attached patch does two things: It modifies bsd.lib.mk to support ld > > scripts for shared libraries and adds such a script to replace the > > /usr/lib/libc.so symlink to /lib/libc.so.X. > > > > Basically, SHLIB_LDSCRIPT is defined in lib/libc/Makefile and points to > > the file containing the script itself: > > GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a ) > > > > During make install, @@SHLIB@@ will be replaced by the real path of the > > shared library. > > You did not included $FreeBSD$ tag into libc.so script. I think it would be > useful to have. Sure. I will send an updated patch a little later. > Could you, please, comment why the script is not installed during the > world build stage ? My question is, would the buildworld use the script > for linkage ? libc.ld, the generated ldscript in ${.OBJDIR}, is built along with libc.so.7 which is built only once (stage 4.2 of buildworld). In order to get buildworld use the ld script, it would require to generate it twice: once during stage 4.2 using /usr/obj/usr/src/tmp/lib/libc.so.7 and another one afterward using /lib/libc.so.7. 
Besides I didn't see an advantage to do this because when compiling the base system, CFLAGS and LDFLAGS are well controlled so -fstack-protector will be provided when linking the program. On the other hand, the patch I propose is required for the numerous ports for which we do not control linking flags; lang/perl comes into my mind. If you want to compile it with SSP, you have to patch its build infrastructure (see ports/138228). Regards, -- Jeremie Le Hen Humans are born free and equal. But some are more equal than others. Coluche From owner-freebsd-hackers@FreeBSD.ORG Mon Sep 20 18:32:02 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 460A61065672; Mon, 20 Sep 2010 18:32:02 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EBC1B8FC1B; Mon, 20 Sep 2010 18:31:58 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id VAA17080; Mon, 20 Sep 2010 21:31:56 +0300 (EEST) (envelope-from avg@freebsd.org) Message-ID: <4C97A89B.9070806@freebsd.org> Date: Mon, 20 Sep 2010 21:31:55 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100909 Lightning/1.0b2 Thunderbird/3.1.3 MIME-Version: 1.0 To: Jeff Roberson References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C95C804.1010701@freebsd.org> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Andre Oppermann , Jeff Roberson , Robert Watson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Sep 2010 18:32:02 -0000 on 19/09/2010 11:27 Jeff Roberson said the following: > On Sun, 19 Sep 2010, Andriy Gapon wrote: > >> on 19/09/2010 01:16 Jeff Roberson said the following: >>> Additionally we could make a last ditch flush mechanism that runs on each cpu in >>> turn and flushes some or all of the buckets in per-cpu caches. Presently that is >>> not done due to synchronization issues. It can't be done from a central place. >>> It could be done with a callout mechanism or a for loop that binds to each core >>> in succession. >> >> I like all of the tree above approaches. >> The last one is a bit hard to implement, the first two seem easier. > > All the last one requires is a loop calling sched_bind() on each available cpu. Something like cache_drain() but with sched_bind() in the loop? critical_enter() would be probably also needed to avoid preemption and conflict while acting on cache buckets? -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Mon Sep 20 19:27:14 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F0E4A1065673; Mon, 20 Sep 2010 19:27:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 5E0188FC16; Mon, 20 Sep 2010 19:27:13 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o8KJR8kh091815 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 20 Sep 2010 22:27:08 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o8KJR8AZ095846; Mon, 20 Sep 2010 22:27:08 +0300 (EEST) (envelope-from 
kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o8KJR836095845; Mon, 20 Sep 2010 22:27:08 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 20 Sep 2010 22:27:08 +0300 From: Kostik Belousov To: Jeremie Le Hen Message-ID: <20100920192708.GK2389@deviant.kiev.zoral.com.ua> References: <20100803150545.GH14016@felucia.tataz.chchile.org> <20100803114651.651e0ea4@kan.dnsalias.net> <20100805191446.GJ14016@felucia.tataz.chchile.org> <20100919081406.GH6864@felucia.tataz.chchile.org> <20100919184146.GE2389@deviant.kiev.zoral.com.ua> <20100920162925.GL6864@felucia.tataz.chchile.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="dyTp/pkqtoagvozp" Content-Disposition: inline In-Reply-To: <20100920162925.GL6864@felucia.tataz.chchile.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: kan@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: [PATCH] Add -lssp_nonshared to GCC's LIB_SPEC unconditionally X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Sep 2010 19:27:15 -0000 --dyTp/pkqtoagvozp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Sep 20, 2010 at 06:29:25PM +0200, Jeremie Le Hen wrote: > Kostik, >=20 > On Sun, Sep 19, 2010 at 09:41:46PM +0300, Kostik Belousov wrote: > > On Sun, Sep 19, 2010 at 10:14:06AM 
+0200, Jeremie Le Hen wrote:
> > > Hi Alexander,
> > >
> > > On Thu, Aug 05, 2010 at 09:14:46PM +0200, Jeremie Le Hen wrote:
> > > > On Tue, Aug 03, 2010 at 11:46:51AM -0400, Alexander Kabaev wrote:
> > > > >
> > > > > I have no objection, but think we should cave in and investigate the
> > > > > possibility of using linker script wrapping libc.so in FreeBSD-9.0:
> > > > >
> > > > > Below is Linux' counterpart:
> > > > >
> > > > > /* GNU ld script
> > > > > Use the shared library, but some functions are only in
> > > > > the static library, so try that secondarily. */
> > > > > OUTPUT_FORMAT(elf32-i386)
> > > > > GROUP ( /lib/libc.so.6 /usr/lib/libc_nonshared.a AS_NEEDED
> > > > > ( /lib/ld-linux.so.2 ) )
> > > >
> > > > Ok. For now can you commit the proposed modification. I'll try to make
> > > > a patch with your proposal.
> > >
> > > The attached patch does two things: It modifies bsd.lib.mk to support ld
> > > scripts for shared libraries and adds such a script to replace the
> > > /usr/lib/libc.so symlink to /lib/libc.so.X.
> > >
> > > Basically, SHLIB_LDSCRIPT is defined in lib/libc/Makefile and points to
> > > the file containing the script itself:
> > > GROUP ( @@SHLIB@@ /usr/lib/libssp_nonshared.a )
> > >
> > > During make install, @@SHLIB@@ will be replaced by the real path of the
> > > shared library.
> >
> > You did not included $FreeBSD$ tag into libc.so script. I think it would be
> > useful to have.
>
> Sure. I will send an updated patch a little later.
>
> > Could you, please, comment why the script is not installed during the
> > world build stage ? My question is, would the buildworld use the script
> > for linkage ?
>
> libc.ld, the generated ldscript in ${.OBJDIR}, is built along with
> libc.so.7 which is built only once (stage 4.2 of buildworld).
>
> In order to get buildworld use the ld script, it would require to
> generate it twice: once during stage 4.2 using
> /usr/obj/usr/src/tmp/lib/libc.so.7 and another one afterward using
> /lib/libc.so.7.
>
> Besides I didn't see an advantage to do this because when compiling the
> base system, CFLAGS and LDFLAGS are well controlled so -fstack-protector
> will be provided when linking the program. On the other hand, the patch
> I propose is required for the numerous ports for which we do not control
> linking flags; lang/perl comes into my mind. If you want to compile it
> with SSP, you have to patch its build infrastructure (see ports/138228).

You make the script only useful for the stack protection. If build process
does not use libc.so script, but installed system does, you
- require to maintain two places where (not much) hypothetical libc
changes should go;
- make it very puzzling to debug the issues with the build of the
usermode.

Please, do this in the consistent manner, so that the script can be
adopted for other uses.
--dyTp/pkqtoagvozp Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkyXtYwACgkQC3+MBN1Mb4hIUwCfenawEo+oOW3yd1zCt1wImfEf qPUAoNy5G06i0ZBp8tdLeaLdWl6ywivp =qYu3 -----END PGP SIGNATURE----- --dyTp/pkqtoagvozp-- From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 06:20:05 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 094601065679; Tue, 21 Sep 2010 06:20:05 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id AD66F8FC1F; Tue, 21 Sep 2010 06:20:03 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id JAA26183; Tue, 21 Sep 2010 09:20:01 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1OxwCv-000Nmu-0W; Tue, 21 Sep 2010 09:20:01 +0300 Message-ID: <4C984E90.60507@freebsd.org> Date: Tue, 21 Sep 2010 09:20:00 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeff Roberson References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C95C804.1010701@freebsd.org> <4C95CCDA.7010007@freebsd.org> In-Reply-To: <4C95CCDA.7010007@freebsd.org> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Andre Oppermann , Jeff Roberson , Robert Watson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2010 06:20:05 -0000 on 19/09/2010 11:42 Andriy Gapon said the following: > on 19/09/2010 11:27 Jeff Roberson said the following: >> I don't like this because even with very large buffers you can still have high >> enough turnover to require per-cpu caching. Kip specifically added UMA support >> to address this issue in zfs. If you have allocations which don't require >> per-cpu caching and are very large why even use UMA? > > Good point. > Right now I am running with 4 items/bucket limit for items larger than 32KB. But I also have two counter-points actually :) 1. Uniformity. E.g. you can handle all ZFS I/O buffers via the same mechanism regardless of buffer size. 2. (Open)Solaris does that for a while and it seems to suit them well. Not saying that they are perfect, or the best, or an example to follow, but still that means quite a bit (for me). -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 06:26:26 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5FDC71065674; Tue, 21 Sep 2010 06:26:26 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 059F48FC12; Tue, 21 Sep 2010 06:26:24 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id JAA26291; Tue, 21 Sep 2010 09:26:22 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1OxwJ4-000Nn6-FI; Tue, 21 Sep 2010 09:26:22 +0300 Message-ID: <4C98500D.5040109@freebsd.org> Date: Tue, 21 Sep 2010 09:26:21 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 
Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeff Roberson References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Andre Oppermann , Jeff Roberson , Robert Watson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2010 06:26:26 -0000 on 19/09/2010 01:16 Jeff Roberson said the following: > Additionally we could make a last ditch flush mechanism that runs on each cpu in How would you qualify a "last ditch" trigger? Would this be called from "standard" vm_lowmem look or would there be some extra check for even more severe memory condition? > turn and flushes some or all of the buckets in per-cpu caches. Presently that is > not done due to synchronization issues. It can't be done from a central place. > It could be done with a callout mechanism or a for loop that binds to each core > in succession. 
-- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 07:09:33 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0CFB51065670; Tue, 21 Sep 2010 07:09:33 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id A3D918FC0C; Tue, 21 Sep 2010 07:09:31 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA26948; Tue, 21 Sep 2010 10:09:29 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Oxwym-000Np8-Of; Tue, 21 Sep 2010 10:09:28 +0300 Message-ID: <4C985A28.6050706@freebsd.org> Date: Tue, 21 Sep 2010 10:09:28 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeff Roberson References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C95C804.1010701@freebsd.org> <4C95CCDA.7010007@freebsd.org> <4C984E90.60507@freebsd.org> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Andre Oppermann , Jeff Roberson , Robert Watson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2010 07:09:33 -0000 on 21/09/2010 09:39 Jeff Roberson said the following: > I'm afraid there is not enough context here for me to know what 'the same > mechanism' is or what solaris does. Can you elaborate? 
This was in my first post:
[[[
There is this good book:
http://books.google.com/books?id=r_cecYD4AKkC&printsec=frontcover
Please see section 6.2.4.5 on page 225 and table 6-11 on page 226.
And also this code:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/kmem.c#971
]]]

> I prefer not to take the weight of specific examples too heavily when
> considering the allocator as it must handle many cases and many types of
> systems. I believe there are cases where you want large allocations to be
> handled by per-cpu caches, regardless of whether ZFS is one such case. If ZFS
> does not need them, then it should simply allocate directly from the VM.
> However, I don't want to introduce some maximum constraint unless it can be
> shown that adequate behavior is not generated from some more adaptable algorithm.

Yes, I agree in general. But sometimes simplicity has its benefits too, as opposed to complex dynamic behavior that _might_ result from adaptive algorithms.

Anyway, I have some early patches to implement the first two of your suggestions and I am testing them now. Looks good to me so far. Parameters in the adaptations would probably need some additional tuning.
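To make the two ideas under discussion concrete (weighting bucket growth by item size, and adding downward pressure on bucket size), here is a minimal user-space sketch. It is not the patch mentioned above and not UMA code: the function names and all three constants are assumptions chosen for illustration. The byte budget is picked so that a 32 KB item gets a limit of 4 entries, in line with the "4 items/bucket" experiment mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunables, for illustration only. */
#define BUCKET_MAX_ENTRIES 128            /* assumed cap for small items */
#define BUCKET_MIN_ENTRIES 1
#define BUCKET_BYTE_BUDGET (128 * 1024)   /* assumed bytes one bucket may hold */

/* Largest bucket allowed for a given item size: big items get small buckets. */
size_t
bucket_entry_limit(size_t item_size)
{
	size_t limit;

	if (item_size == 0)
		return (BUCKET_MAX_ENTRIES);
	limit = BUCKET_BYTE_BUDGET / item_size;
	if (limit > BUCKET_MAX_ENTRIES)
		limit = BUCKET_MAX_ENTRIES;
	if (limit < BUCKET_MIN_ENTRIES)
		limit = BUCKET_MIN_ENTRIES;
	return (limit);
}

/* Upward pressure: called after a trip to the slab layer. */
size_t
bucket_grow(size_t cur_entries, size_t item_size)
{
	size_t limit = bucket_entry_limit(item_size);

	assert(cur_entries >= BUCKET_MIN_ENTRIES);
	return (cur_entries < limit ? cur_entries + 1 : limit);
}

/* Downward pressure: called when the VM reports memory pressure. */
size_t
bucket_shrink(size_t cur_entries)
{
	size_t next = cur_entries / 2;

	return (next < BUCKET_MIN_ENTRIES ? BUCKET_MIN_ENTRIES : next);
}
```

A fixed byte budget is the simplest possible weighting. An adaptive variant could instead scale the number of slab-layer trips required before `bucket_grow()` is allowed to take effect.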
-- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 07:12:10 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EBFCD106564A; Tue, 21 Sep 2010 07:12:10 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 9AA478FC23; Tue, 21 Sep 2010 07:12:09 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA26990; Tue, 21 Sep 2010 10:12:07 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Oxx1L-000NpQ-1w; Tue, 21 Sep 2010 10:12:07 +0300 Message-ID: <4C985AC6.60906@freebsd.org> Date: Tue, 21 Sep 2010 10:12:06 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: Jeff Roberson References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C98500D.5040109@freebsd.org> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Andre Oppermann , Jeff Roberson , Robert Watson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2010 07:12:11 -0000 on 21/09/2010 09:35 Jeff Roberson said the following: > On Tue, 21 Sep 2010, Andriy Gapon wrote: > >> on 19/09/2010 01:16 Jeff Roberson said the following: >>> Additionally we could make a last ditch flush mechanism that runs on each cpu in >> >> How would you qualify a 
"last ditch" trigger? >> Would this be called from "standard" vm_lowmem look or would there be some extra >> check for even more severe memory condition? > > If lowmem does not make enough progress to improve the condition. Do we have a good way to detect that? I see that currently vm_lowmem is always invoked with argument value of zero. -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 06:34:31 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EF9A11065674; Tue, 21 Sep 2010 06:34:31 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 611C88FC13; Tue, 21 Sep 2010 06:34:31 +0000 (UTC) Received: by yxn35 with SMTP id 35so1934146yxn.13 for ; Mon, 20 Sep 2010 23:34:30 -0700 (PDT) Received: by 10.151.106.12 with SMTP id i12mr9953197ybm.106.1285050869234; Mon, 20 Sep 2010 23:34:29 -0700 (PDT) Received: from [10.0.1.198] (udp022762uds.hawaiiantel.net [72.234.79.107]) by mx.google.com with ESMTPS id u24sm9751004yba.9.2010.09.20.23.34.25 (version=SSLv3 cipher=RC4-MD5); Mon, 20 Sep 2010 23:34:28 -0700 (PDT) Date: Mon, 20 Sep 2010 20:35:33 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Andriy Gapon In-Reply-To: <4C98500D.5040109@freebsd.org> Message-ID: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C98500D.5040109@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Tue, 21 Sep 2010 10:37:46 +0000 Cc: Andre Oppermann , Jeff Roberson , Robert Watson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2010 06:34:32 -0000 On Tue, 21 Sep 2010, Andriy Gapon wrote: > on 19/09/2010 01:16 Jeff Roberson said the following: >> Additionally we could make a last ditch flush mechanism that runs on each cpu in > > How would you qualify a "last ditch" trigger? > Would this be called from "standard" vm_lowmem look or would there be some extra > check for even more severe memory condition? If lowmem does not make enough progress to improve the condition. Jeff > >> turn and flushes some or all of the buckets in per-cpu caches. Presently that is >> not done due to synchronization issues. It can't be done from a central place. >> It could be done with a callout mechanism or a for loop that binds to each core >> in succession. > > -- > Andriy Gapon > From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 06:38:55 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1C0421065696; Tue, 21 Sep 2010 06:38:55 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com [209.85.161.182]) by mx1.freebsd.org (Postfix) with ESMTP id 818658FC17; Tue, 21 Sep 2010 06:38:54 +0000 (UTC) Received: by gxk8 with SMTP id 8so1932020gxk.13 for ; Mon, 20 Sep 2010 23:38:54 -0700 (PDT) Received: by 10.101.136.30 with SMTP id o30mr10502522ann.224.1285051133708; Mon, 20 Sep 2010 23:38:53 -0700 (PDT) Received: from [10.0.1.198] (udp022762uds.hawaiiantel.net [72.234.79.107]) by mx.google.com with ESMTPS id d4sm13839209and.39.2010.09.20.23.38.50 (version=SSLv3 cipher=RC4-MD5); Mon, 20 Sep 2010 23:38:52 -0700 (PDT) Date: Mon, 20 Sep 2010 20:39:58 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Andriy Gapon In-Reply-To: <4C984E90.60507@freebsd.org> Message-ID: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> 
<4C95C804.1010701@freebsd.org> <4C95CCDA.7010007@freebsd.org> <4C984E90.60507@freebsd.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Mailman-Approved-At: Tue, 21 Sep 2010 10:38:19 +0000 Cc: Andre Oppermann , Jeff Roberson , Robert Watson , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2010 06:38:55 -0000 On Tue, 21 Sep 2010, Andriy Gapon wrote: > on 19/09/2010 11:42 Andriy Gapon said the following: >> on 19/09/2010 11:27 Jeff Roberson said the following: >>> I don't like this because even with very large buffers you can still have high >>> enough turnover to require per-cpu caching. Kip specifically added UMA support >>> to address this issue in zfs. If you have allocations which don't require >>> per-cpu caching and are very large why even use UMA? >> >> Good point. >> Right now I am running with 4 items/bucket limit for items larger than 32KB. > > But I also have two counter-points actually :) > 1. Uniformity. E.g. you can handle all ZFS I/O buffers via the same mechanism > regardless of buffer size. > 2. (Open)Solaris does that for a while and it seems to suit them well. Not > saying that they are perfect, or the best, or an example to follow, but still > that means quite a bit (for me). I'm afraid there is not enough context here for me to know what 'the same mechanism' is or what solaris does. Can you elaborate? I prefer not to take the weight of specific examples too heavily when considering the allocator as it must handle many cases and many types of systems. I believe there are cases where you want large allocations to be handled by per-cpu caches, regardless of whether ZFS is one such case. 
If ZFS does not need them, then it should simply allocate directly from the VM. However, I don't want to introduce some maximum constraint unless it can be shown that adequate behavior is not generated from some more adaptable algorithm. Thanks, Jeff > > -- > Andriy Gapon > From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 16:16:09 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5C77B106566B for ; Tue, 21 Sep 2010 16:16:09 +0000 (UTC) (envelope-from alan.l.cox@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id 9143A8FC1A for ; Tue, 21 Sep 2010 16:16:08 +0000 (UTC) Received: by qwg5 with SMTP id 5so4985938qwg.13 for ; Tue, 21 Sep 2010 09:16:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:reply-to :in-reply-to:references:date:message-id:subject:from:to:cc :content-type; bh=NKUtT+HBidC3RbyT64+qU0jrci+Ce6CNOSpqazWAw3k=; b=LEqjht7K27SKPkF5XM2E7yHJrH46c9Nk5xmeDjhpR2pBgASdydZMeyZM+eY4EPjlxP kDvQSDhd1iAWMVGmcRuty+DJDw7OX3SdJkD4MDhTZjA7qoP/+eojEpaERSHv1C0m09qG 4MiYYGHTvGum4XFukLWqH6D4yETO5Ua3RGRQk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; b=nW2x4U3hkVi2IY4wxLIAt6s6Cb8Gc7V0zDTJezCPGKMq78C892RGNvC5uiaKFD6Vsp m6Eeaa5/ppLjnwtaaWjS3ScvHlMprvoiWWCSvgNJVxpxXUS/9J41dKZhBsfBmAhTZSkj BGp8+qj+l/Y1nRCGcXgCdaEcnmUBgkIAqPx8Q= MIME-Version: 1.0 Received: by 10.224.79.28 with SMTP id n28mr6991262qak.175.1285085767738; Tue, 21 Sep 2010 09:16:07 -0700 (PDT) Received: by 10.229.37.85 with HTTP; Tue, 21 Sep 2010 09:16:07 -0700 (PDT) In-Reply-To: References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C95C804.1010701@freebsd.org> 
<4C95CCDA.7010007@freebsd.org> <4C984E90.60507@freebsd.org> Date: Tue, 21 Sep 2010 11:16:07 -0500 Message-ID: From: Alan Cox To: Jeff Roberson Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Robert Watson , Jeff Roberson , Andre Oppermann , Andriy Gapon , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: alc@freebsd.org List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2010 16:16:09 -0000 On Tue, Sep 21, 2010 at 1:39 AM, Jeff Roberson wrote: > On Tue, 21 Sep 2010, Andriy Gapon wrote: > > on 19/09/2010 11:42 Andriy Gapon said the following: >> >>> on 19/09/2010 11:27 Jeff Roberson said the following: >>> >>>> I don't like this because even with very large buffers you can still >>>> have high >>>> enough turnover to require per-cpu caching. Kip specifically added UMA >>>> support >>>> to address this issue in zfs. If you have allocations which don't >>>> require >>>> per-cpu caching and are very large why even use UMA? >>>> >>> >>> Good point. >>> Right now I am running with 4 items/bucket limit for items larger than >>> 32KB. >>> >> >> But I also have two counter-points actually :) >> 1. Uniformity. E.g. you can handle all ZFS I/O buffers via the same >> mechanism >> regardless of buffer size. >> 2. (Open)Solaris does that for a while and it seems to suit them well. >> Not >> saying that they are perfect, or the best, or an example to follow, but >> still >> that means quite a bit (for me). >> > > I'm afraid there is not enough context here for me to know what 'the same > mechanism' is or what solaris does. Can you elaborate? > > I prefer not to take the weight of specific examples too heavily when > considering the allocator as it must handle many cases and many types of > systems. 
I believe there are cases where you want large allocations to be > handled by per-cpu caches, regardless of whether ZFS is one such case. If > ZFS does not need them, then it should simply allocate directly from the VM. > However, I don't want to introduce some maximum constraint unless it can be > shown that adequate behavior is not generated from some more adaptable > algorithm. > > Actually, I think that there is a middle ground between "per-cpu caches" and "directly from the VM" that we are missing. When I've looked at the default configuration of ZFS (without the extra UMA zones enabled), there is an incredible amount of churn on the kmem map caused by the implementation of uma_large_malloc() and uma_large_free() going directly to the kmem map. Not only are the obvious things happening, like allocating and freeing kernel virtual addresses and underlying physical pages on every call, but also system-wide TLB shootdowns and sometimes superpage demotions are occurring. I have some trouble believing that the large allocations being performed by ZFS really need per-CPU caching, but I can certainly believe that they could benefit from not going directly to the kmem map on every uma_large_malloc() and uma_large_free(). In other words, I think it would make a lot of sense to have a thin layer between UMA and the kmem map that caches allocated but unused ranges of pages. 
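To make the idea of such a thin layer concrete, here is a minimal userland sketch, not kernel code. It assumes a small fixed-size cache keyed on exact range size; malloc()/free() stand in for the kmem map, and every name in it (range_cache_alloc, range_cache_free, range_cache_drain, backend_alloc, backend_free) is hypothetical and not part of UMA or the VM. A real implementation would also need locking and a policy for draining under memory pressure.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size, for illustration only */
#define CACHE_SLOTS      8      /* assumed small, fixed cache capacity */

/* One cached, currently unused range of pages. */
struct cached_range {
    void   *addr;               /* NULL means the slot is empty */
    size_t  npages;
};

static struct cached_range cache[CACHE_SLOTS];

/* Counters showing how often the expensive backend is actually hit. */
static unsigned long backend_allocs, backend_frees;

/* Stand-ins for the kmem map; hypothetical, malloc/free only simulate it. */
static void *backend_alloc(size_t npages)
{
    backend_allocs++;
    return malloc(npages * SKETCH_PAGE_SIZE);
}

static void backend_free(void *addr)
{
    backend_frees++;
    free(addr);
}

/* Reuse a cached range of exactly npages if one exists, else go to the backend. */
void *range_cache_alloc(size_t npages)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].addr != NULL && cache[i].npages == npages) {
            void *addr = cache[i].addr;
            cache[i].addr = NULL;
            return addr;
        }
    }
    return backend_alloc(npages);
}

/* Park the range in an empty slot; release to the backend only if the cache is full. */
void range_cache_free(void *addr, size_t npages)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].addr == NULL) {
            cache[i].addr = addr;
            cache[i].npages = npages;
            return;
        }
    }
    backend_free(addr);
}

/* Give every cached range back to the backend, e.g. under memory pressure. */
void range_cache_drain(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].addr != NULL) {
            backend_free(cache[i].addr);
            cache[i].addr = NULL;
        }
    }
}
```

With this shape, a free followed by an allocation of the same size never reaches the backend, which is the churn the layer is meant to avoid; the drain routine is the hook where VM pressure would reclaim the cached ranges.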
Regards, Alan From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 16:51:15 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2CB5E106566B for ; Tue, 21 Sep 2010 16:51:15 +0000 (UTC) (envelope-from ozkan.kirik@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id DBB498FC25 for ; Tue, 21 Sep 2010 16:51:14 +0000 (UTC) Received: by qwg5 with SMTP id 5so5016819qwg.13 for ; Tue, 21 Sep 2010 09:51:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:date:message-id :subject:from:to:content-type; bh=5BCveOt79RqRp3ivO/xbk1bkfxos3Of73yZbYWFDhtM=; b=AxDjGzgVpeVMthJSXbHafE8544PdSMeZ/cX0LYAgn4nU2XH1qxbocEw8oIa6j1LE99 uu5VY42WsOgWNsVJhZhW4bKhkFG3IpkbEIZnrnXHGaOpcMVTLMG0BNUF8zRzTrdfNakI 8Exhj70qm8mtxvEjN9gtXqDUu1vJWthiUwcSg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=co8cthDVkT1rHRQqFMTJ+4BeLvkv8hWAxbQgdGUmtyNt87HPoThS2hiT67PDXLyHAU BVpf0iVlM3l+gwQS2hkNPyEE0n7K+iNhZNr8uYE/F3mAdWhSB2jQkF5TpD7TTIgw3EzU qDcIy302ODcP0n9fMJQOngxehPOluIM8R66ug= MIME-Version: 1.0 Received: by 10.229.251.79 with SMTP id mr15mr7530655qcb.37.1285086198944; Tue, 21 Sep 2010 09:23:18 -0700 (PDT) Received: by 10.229.192.204 with HTTP; Tue, 21 Sep 2010 09:23:18 -0700 (PDT) Date: Tue, 21 Sep 2010 19:23:18 +0300 Message-ID: From: =?ISO-8859-1?Q?=D6zkan_KIRIK?= To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: Kernel side buffer overflow issue X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2010 16:51:15 -0000 
Hi, I am using the FreeBSD 8.1-STABLE-201008 snapshot. The system behaves strangely. Unexpected and meaningless messages are seen on the consoles. You can download the screenshot from: http://193.255.128.30/~ryland/syslogd.jpg Additionally, the default router changes unexpectedly. I tried FreeBSD 7.1, 7.2, 7.3 and 8.1-STABLE-201008 ( both i386 and amd64 ). All these versions are affected. I inspected the logs to see whether someone had logged in or changed the route (with the route -n monitor command). When the default route changed, there were no messages in the "route -n monitor" command output. I think there may be a buffer overflow in kernel code. When dummynet is enabled, the problem is seen more frequently. The problem repeats about once every 10 minutes. I wrote a shell script which monitors the default router. I saw that sometimes netstat -rn shows the default router changed to 10.0.16.251 or 10.6.10.240 etc., which are client IP addresses, but packets are still routed to the right router 193.X.Y.Z . After a while, routing really fails. You can download the tcpdump capture file from http://193.255.128.30/~ryland/flowdata_10_0_16_251 . This file was captured while the default router changed. The capture covers the IP address that was shown as the default router (10.0.16.251). The tcpdump command: tcpdump -w /home/flowdata_10_0_16_251 -ni bce0.116 host 10.0.16.251 ---------------------------------------------------------------------- dummynet rules are: 30000 pipe 3 tcp from 10.0.0.0/8,192.168.0.0/16,172.16.0.0/12 to any dst-port 8000,80,22,25,88,110,443,1720,1863,1521,3389,4489 via em0 // Upload 30000 pipe 3 udp from 10.0.0.0/8,192.168.0.0/16,172.16.0.0/12 to any dst-port 53 via em0 // Upload 30000 pipe 4 tcp from 10.0.0.0/8,192.168.0.0/16,172.16.0.0/12 to any via em0 // Upload 30000 pipe 4 udp from 10.0.0.0/8,192.168.0.0/16,172.16.0.0/12 to any via em0 // Upload ....
LOTS OF NAT RULES HERE (in kernel nat) 60000 pipe 1 tcp from any 8000,80,22,25,88,110,443,1720,1863,1521,3389,4489 to 10.0.0.0/8,192.168.0.0/16,172.16.0.0/12 via bce0* // Download 60000 pipe 1 udp from any 53 to 10.0.0.0/8,192.168.0.0/16,172.16.0.0/12 via bce0* // Download 60000 pipe 2 tcp from any to 10.0.0.0/8,192.168.0.0/16,172.16.0.0/12 via bce0* // Download 60000 pipe 2 udp from any to 10.0.0.0/8,192.168.0.0/16,172.16.0.0/12 via bce0* // Download /sbin/ipfw pipe 1 config bw 8192Kbit/s mask dst-ip 0xffffffff /sbin/ipfw pipe 3 config bw 1024Kbit/s mask src-ip 0xffffffff /sbin/ipfw pipe 2 config bw 4096Kbit/s mask dst-ip 0xffffffff /sbin/ipfw pipe 4 config bw 1024Kbit/s mask src-ip 0xffffffff ---------------------------------------------------------------------- sysctl vars: net.inet.ip.fw.dyn_max=65535 net.inet.ip.fw.dyn_ack_lifetime=100 net.inet.ip.fw.dyn_short_lifetime=10 net.inet.ip.fw.one_pass=0 kern.maxfiles=65000 kern.ipc.somaxconn=1024 net.inet.ip.process_options=0 net.inet.ip.fastforwarding=1 net.link.ether.ipfw=1 net.inet.ip.fw.dyn_buckets=65536 kern.maxvnodes=400000 net.inet.ip.dummynet.hash_size=256 ( also tried with 8192 ) net.inet.ip.dummynet.pipe_slot_limit=500 net.inet.ip.dummynet.io_fast=1 ---------------------------------------------------------------------- /boot/loader.conf: autoboot_delay="1" beastie_disable="YES" kern.ipc.nmbclusters=98304 vm.kmem_size="2048M" vm.kmem_size_max="2048M" splash_bmp_load="YES" vesa_load="YES" bitmap_load="YES" bitmap_name="/boot/splash.bmp" hw.ata.ata_dma=0 kern.hz="10000" ---------------------------------------------------------------------- kernel config ( additionally to GENERIC ): device tap device if_bridge device vlan device carp options GEOM_BDE options IPFIREWALL options IPFIREWALL_VERBOSE options HZ=4000 options IPFIREWALL_VERBOSE_LIMIT=4000 options IPFIREWALL_FORWARD options IPFIREWALL_DEFAULT_TO_ACCEPT options IPFIREWALL_NAT options DUMMYNET options IPDIVERT options IPSTEALTH options NETGRAPH options 
NETGRAPH_IPFW options LIBALIAS options NETGRAPH_NAT options NETGRAPH_PPPOE options NETGRAPH_SOCKET options NETGRAPH_ETHER options DEVICE_POLLING device crypto options IPSEC ---------------------------------------------------------------------- Some Information about network: System has 3 NICS as WAN, LAN, DMZ. There are VLANs on WAN and LAN interfaces Throuput between 20Mbps and 100Mbps. Any ideas? Regards, Ozkan KIRIK Mersin University @ Turkey From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 17:38:05 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3EDCE106566C for ; Tue, 21 Sep 2010 17:38:05 +0000 (UTC) (envelope-from alan.l.cox@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id E2D2B8FC0A for ; Tue, 21 Sep 2010 17:38:04 +0000 (UTC) Received: by qwg5 with SMTP id 5so5056479qwg.13 for ; Tue, 21 Sep 2010 10:38:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:reply-to :in-reply-to:references:date:message-id:subject:from:to:cc :content-type; bh=5YcA9YnxRLob+EkNSsDAfiFOqL0XGqBpCUn90J2fFzo=; b=pUN0vbd+f81WarS7gKbP2B8I051VTfs2aF0GV2kgibv4shb0H5KeppQ4EApPkiGaP6 F3KMe4p8Arrk4qC3FXxQt5rTbdfxOcFscM0VDZtHXBqkRYzQM6bbMeSFaiWM0dwt3pLI uV39nljmtj5CeZ5cOuugDEz8Im8gwY0oJOqSM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; b=nmyFUGSta3JhT4rt96AkAf2YGcjloUK1K0pMizEvJz7XUaszYojKeonoUPTeAcGrWp 4bO180Dsgyah8P8RpNCz2iBcjzOCV+gk3NJW2aXEbnqdlp1z6b/mkNKrdu4CMUP0Da/o 73+/nXqadvnnazltTTdhSPwkCFQgF7utz5awY= MIME-Version: 1.0 Received: by 10.229.191.135 with SMTP id dm7mr7722109qcb.29.1285090683991; Tue, 21 Sep 2010 10:38:03 -0700 (PDT) Received: by 10.229.37.85 with HTTP; Tue, 
21 Sep 2010 10:38:03 -0700 (PDT) In-Reply-To: <29760054.post@talk.nabble.com> References: <29760054.post@talk.nabble.com> Date: Tue, 21 Sep 2010 12:38:03 -0500 Message-ID: From: Alan Cox To: Svatopluk Kraus Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-hackers@freebsd.org Subject: Re: page table fault, which should map kernel virtual address space X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: alc@freebsd.org List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2010 17:38:05 -0000 On Mon, Sep 20, 2010 at 9:32 AM, Svatopluk Kraus wrote: > > Hallo, > > this is about 'NKPT' definition, 'kernel_map' submaps, > and 'vm_map_findspace' function. > > Variable 'kernel_map' is used to manage kernel virtual address > space. When 'vm_map_findspace' function deals with 'kernel_map' > then 'pmap_growkernel' function is called. > > At least in 'i386' architecture, pmap implementation uses > 'pmap_growkernel' function to allocate missing page tables. > Missing page tables are problem, because no one checks > 'pte' pointer for validity after use of 'vtopte' macro. > > 'NKPT' definition defines a number of preallocated > page tables during system boot. > > Beyond 'kernel_map', some submaps of 'kernel_map' (buffer_map, > pager_map,...) exist as result of 'kmem_suballoc' function call. > When this submaps are used (for example 'kmem_alloc_nofault' > function) and its virtual address subspace is at the end of > used kernel virtual address space at the moment (and above 'NKPT' > preallocation), then missing page tables are not allocated > and double fault can happen. > > No, the page tables are allocated. If you create a submap X of the kernel map using kmem_suballoc(), then a vm_map_findspace() is performed by vm_map_find() on the kernel map to find space for the submap X. 
As you note above, the call to vm_map_findspace() on the kernel map will call pmap_growkernel() if needed to extend the kernel page table. If you create another submap X' of X, then that submap X' can only map addresses that fall within the range for X. So, any necessary page table pages were allocated when X was created. That said, there may actually be a problem with the implementation of the superpage_align parameter to kmem_suballoc(). If a submap is created with superpage_align equal to TRUE, but the submap's size is not a multiple of the superpage size, then vm_map_find() may not allocate a page table page for the last megabyte or so of the submap. There are only a few places where kmem_suballoc() is called with superpage_align set to TRUE. If you changed them to FALSE, that is an easy way to test this hypothesis. Regards, Alan From owner-freebsd-hackers@FreeBSD.ORG Tue Sep 21 22:28:36 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 96CD2106564A; Tue, 21 Sep 2010 22:28:36 +0000 (UTC) (envelope-from sepotvin@FreeBSD.org) Received: from relais.videotron.ca (relais.videotron.ca [24.201.245.36]) by mx1.freebsd.org (Postfix) with ESMTP id 645808FC15; Tue, 21 Sep 2010 22:28:36 +0000 (UTC) MIME-version: 1.0 Content-type: multipart/mixed; boundary="Boundary_(ID_aBuVVhu/JWJqoEVI3WdyTA)" Received: from leia.telcobridges.lan ([208.94.105.59]) by VL-MR-MRZ20.ip.videotron.ca (Sun Java(tm) System Messaging Server 6.3-8.01 (built Dec 16 2008; 32bit)) with ESMTPA id <0L94000AN8ZFST80@VL-MR-MRZ20.ip.videotron.ca>; Tue, 21 Sep 2010 17:28:30 -0400 (EDT) Message-id: <4C992380.3040700@FreeBSD.org> Date: Tue, 21 Sep 2010 17:28:32 -0400 From: "Stephane E. 
Potvin" Organization: FreeBSD Project User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100921 Thunderbird/3.1.4 To: Tim Kientzle References: <20100829201050.GA60715@stack.nl> In-reply-to: X-Enigmail-Version: 1.1.2 Cc: freebsd-hackers@freebsd.org, Benjamin Kaduk , Jilles Tjoelker , kaiw@freebsd.org Subject: Re: ar(1) format_decimal failure is fatal? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Sep 2010 22:28:36 -0000 This is a multi-part message in MIME format. --Boundary_(ID_aBuVVhu/JWJqoEVI3WdyTA) Content-type: text/plain; charset=ISO-8859-1 Content-transfer-encoding: 7BIT -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 09/18/10 03:24, Tim Kientzle wrote: > > On Sep 17, 2010, at 9:01 PM, Benjamin Kaduk wrote: > >> On Sun, 29 Aug 2010, Jilles Tjoelker wrote: >> >>> On Sat, Aug 28, 2010 at 07:08:34PM -0400, Benjamin Kaduk wrote: >>>> [...] >>>> building static egacy library >>>> ar: fatal: Numeric user ID too large >>>> *** Error code 70 >>> >>>> This error appears to be coming from >>>> lib/libarchive/archive_write_set_format_ar.c , which seems to only have >>>> provisions for outputting a user ID in AR_uid_size = 6 columns. >> [...] >>>> It looks like this macro was so defined in version 1.1 of that file, with >>>> commit message "'ar' format support for libarchive, contributed by Kai >>>> Wang.". This doesn't make it terribly clear whether the 'ar' format >>>> mandates this length, or if it is an implementation decision... > > There's no official standard for the ar format, only old > conventions and compatibility with legacy implementations. > >>> I wonder if the uid/gid fields are useful at all for ar archives. Ar >>> archives are usually not extracted, and when they are, the current >>> user's values seem good enough. 
The uid/gid also prevent exactly >>> reproducible builds (together with the timestamp). >> >> GNU binutils has recently (well, March 2009) added a -D ("deterministic") argument to ar(1) which sets the timestamp, uid, and gid to zero, and the mode to 644. If that argument is not given, linux's ar(1) happily uses my 8-digit uid as-is; the manual page seems to imply that it will handle 15 or 16 digits in that field. > > Please send me a small example file... I don't think I've seen > this format variant. Maybe we can extend our ar(1) to support > this variant. > > Personally, I wonder if it wouldn't make sense to just always > force the timestamp, uid, and gid to zero. I find it hard > to believe anyone is using ar(1) as a general-purpose archiving > tool. Of course, it should be trivial to add -D support to our ar(1). > >> I propose that format_{decimal,octal}() return ARCHIVE_FAILED for negative input, and ARCHIVE_WARN for overflow. archive_write_ar_header() can then catch ARCHIVE_WARN from the format_foo functions and continue on, propagating the ARCHIVE_WARN return value at the end of its execution ... > > This sounds entirely reasonable to me. I personally don't see much > advantage to distinguishing negative versus overflow, but certainly > have no objections to that part. Definitely ar(1) should not abort on > a simple ARCHIVE_WARN. > >> Would (one of) you be willing to review a patch to that effect? > > Happy to do so. > Hi, I've been using the attached patch for quite some time now. It basically replace the offending gid/uid with nobody's id when necessary. If I remember correctly, Tim was supposed to add them to the upstream version of libarchive and then import them back in fbsd. Tim, do you remember what happened with those? 
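The fallback the patch implements can be sketched in isolation as follows. This is a simplified stand-in, not the libarchive code: format_decimal_field and write_id_field are hypothetical names, the field is assumed to be left-justified and space-padded, and a negative fallback is taken to mean that no nobody id was configured.

```c
#include <stdint.h>
#include <string.h>

/*
 * Write v into field as a left-justified, space-padded decimal number.
 * Returns 0 on success, -1 if v is negative or needs more than width digits.
 * The field is left untouched on failure.
 */
static int format_decimal_field(int64_t v, char *field, size_t width)
{
    char digits[24];    /* enough for any int64_t value */
    size_t n = 0;

    if (v < 0)
        return -1;
    do {
        digits[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v > 0);
    if (n > width)
        return -1;

    memset(field, ' ', width);
    for (size_t i = 0; i < n; i++)
        field[i] = digits[n - 1 - i];   /* digits were produced in reverse */
    return 0;
}

/*
 * Try the real id first; if it does not fit, fall back to fallback_id.
 * Returns 0 if the real id was written, 1 if the fallback was written,
 * and -1 if neither fits (the caller would then report the error).
 */
int write_id_field(int64_t id, int64_t fallback_id, char *field, size_t width)
{
    if (format_decimal_field(id, field, width) == 0)
        return 0;
    if (fallback_id >= 0 &&
        format_decimal_field(fallback_id, field, width) == 0)
        return 1;
    return -1;
}
```

Returning a distinct value for the fallback case lets the caller emit a warning rather than fail, which matches the earlier suggestion that an overflow should not be fatal for ar(1).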
Regards, Steph -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (FreeBSD) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkyZI38ACgkQmdOXtTCX/nt2WwCgqvd4GIyE5zRvL5kkHCWTGoAA yA0AoJ/8Dx2QrLXAJHkOrd1YqW+QR03h =KxCW -----END PGP SIGNATURE----- --Boundary_(ID_aBuVVhu/JWJqoEVI3WdyTA) Content-type: text/plain; CHARSET=US-ASCII; name=libarchive_bigids.diff Content-transfer-encoding: 7BIT Content-disposition: attachment; filename=libarchive_bigids.diff Index: usr.bin/tar/write.c =================================================================== --- usr.bin/tar/write.c (revision 212556) +++ usr.bin/tar/write.c (working copy) @@ -439,7 +439,30 @@ { const char *arg; struct archive_entry *entry, *sparse_entry; + struct passwd nobody_pw, *nobody_ppw; + struct group nobody_gr, *nobody_pgr; + char id_buffer[128]; + /* + * Some formats (like ustar) have a limit on the size of the uids/gids + * supported. Tell libarchive to use the uid/gid of nobody in this case + * instead of failing. + */ + getpwnam_r("nobody", &nobody_pw, id_buffer, sizeof (id_buffer), + &nobody_ppw); + if (nobody_ppw) + archive_write_set_nobody_uid(a, nobody_ppw->pw_uid); + else + bsdtar_warnc(0, + "nobody's uid not found, large uids won't be supported."); + getgrnam_r("nobody", &nobody_gr, id_buffer, sizeof (id_buffer), + &nobody_pgr); + if (nobody_pgr) + archive_write_set_nobody_gid(a, nobody_pgr->gr_gid); + else + bsdtar_warnc(0, + "nobody's gid not found, large gids won't be supported."); + /* Allocate a buffer for file data. */ if ((bsdtar->buff = malloc(FILEDATABUFLEN)) == NULL) bsdtar_errc(1, 0, "cannot allocate memory"); Index: usr.bin/tar/bsdtar.1 =================================================================== --- usr.bin/tar/bsdtar.1 (revision 212556) +++ usr.bin/tar/bsdtar.1 (working copy) @@ -1027,3 +1027,6 @@ convention can cause hard link information to be lost. 
(This is a consequence of the incompatible ways that different archive formats store hardlink information.) +.Pp +Owner and group of the files added to the archive will be replaced by +"nobody" if they are larger than 6 digits and the ustar format is used. Index: usr.bin/ar/ar.1 =================================================================== --- usr.bin/ar/ar.1 (revision 212556) +++ usr.bin/ar/ar.1 (working copy) @@ -402,3 +402,6 @@ .Lb libarchive and the .Lb libelf . +.Sh BUGS +Owner and group of the files added to the archive will be replaced +by "nobody" if they are larger than 6 digits. Index: usr.bin/ar/write.c =================================================================== --- usr.bin/ar/write.c (revision 212556) +++ usr.bin/ar/write.c (working copy) @@ -41,6 +41,8 @@ #include #include #include +#include +#include #include "ar.h" @@ -554,6 +556,9 @@ size_t s_sz; /* size of archive symbol table. */ size_t pm_sz; /* size of pseudo members */ int i, nr; + struct passwd nobody_pw, *nobody_ppw; + struct group nobody_gr, *nobody_pgr; + char id_buffer[128]; if (elf_version(EV_CURRENT) == EV_NONE) bsdar_errc(bsdar, EX_SOFTWARE, 0, @@ -610,6 +615,27 @@ archive_write_set_format_ar_svr4(a); archive_write_set_compression_none(a); + /* + * The archive format doesn't support ids larger than 6 char. + * Try to tell libarchive to use uid/gid of nobody in case the uid/gid + * of the file being added is too large. 
+ */ + getpwnam_r("nobody", &nobody_pw, id_buffer, sizeof (id_buffer), + &nobody_ppw); + if (nobody_ppw) + archive_write_set_nobody_uid(a, nobody_ppw->pw_uid); + else + bsdar_warnc(bsdar, 0, + "nobody's uid not found, large uids won't be supported."); + + getgrnam_r("nobody", &nobody_gr, id_buffer, sizeof (id_buffer), + &nobody_pgr); + if (nobody_pgr) + archive_write_set_nobody_gid(a, nobody_pgr->gr_gid); + else + bsdar_warnc(bsdar, 0, + "nobody's gid not found, large gids won't be supported."); + AC(archive_write_open_filename(a, bsdar->filename)); /* Index: lib/libarchive/archive_write.c =================================================================== --- lib/libarchive/archive_write.c (revision 212556) +++ lib/libarchive/archive_write.c (working copy) @@ -114,6 +114,11 @@ } memset(nulls, 0, a->null_length); a->nulls = nulls; + + /* Initialize the nobody ids */ + a->nobody_uid = -1; + a->nobody_gid = -1; + /* * Set default compression, but don't set a default format. * Were we to set a default format here, we would force every @@ -284,8 +289,59 @@ return (a->bytes_in_last_block); } +/* + * Set the uid to use when the uid is too large to fit into the archive. + * Usually set to 'nobody' + */ +int +archive_write_set_nobody_uid(struct archive *_a, id_t uid) +{ + struct archive_write *a = (struct archive_write *)_a; + __archive_check_magic(&a->archive, ARCHIVE_WRITE_MAGIC, + ARCHIVE_STATE_ANY, "archive_write_set_nobody_uid"); + a->nobody_uid = uid; + return (ARCHIVE_OK); +} /* + * Return the value set above. -1 indicates it has not been set. + */ +id_t +archive_write_get_nobody_uid(struct archive *_a) +{ + struct archive_write *a = (struct archive_write *)_a; + __archive_check_magic(&a->archive, ARCHIVE_WRITE_MAGIC, + ARCHIVE_STATE_ANY, "archive_write_get_nobody_uid"); + return (a->nobody_uid); +} + +/* + * Set the gid to use when the gid is too large to fit into the archive. 
+ * Usually set to 'nobody' + */ +int +archive_write_set_nobody_gid(struct archive *_a, id_t gid) +{ + struct archive_write *a = (struct archive_write *)_a; + __archive_check_magic(&a->archive, ARCHIVE_WRITE_MAGIC, + ARCHIVE_STATE_ANY, "archive_write_set_nobody_gid"); + a->nobody_gid = gid; + return (ARCHIVE_OK); +} + +/* + * Return the value set above. -1 indicates it has not been set. + */ +id_t +archive_write_get_nobody_gid(struct archive *_a) +{ + struct archive_write *a = (struct archive_write *)_a; + __archive_check_magic(&a->archive, ARCHIVE_WRITE_MAGIC, + ARCHIVE_STATE_ANY, "archive_write_get_nobody_gid"); + return (a->nobody_gid); +} + +/* * dev/ino of a file to be rejected. Used to prevent adding * an archive to itself recursively. */ Index: lib/libarchive/archive_write_set_format_ustar.c =================================================================== --- lib/libarchive/archive_write_set_format_ustar.c (revision 212556) +++ lib/libarchive/archive_write_set_format_ustar.c (working copy) @@ -365,13 +365,17 @@ } if (format_number(archive_entry_uid(entry), h + USTAR_uid_offset, USTAR_uid_size, USTAR_uid_max_size, strict)) { - archive_set_error(&a->archive, ERANGE, "Numeric user ID too large"); - ret = ARCHIVE_FAILED; + if (a->nobody_uid == -1 || format_number(a->nobody_uid, h + USTAR_uid_offset, USTAR_uid_size, USTAR_uid_max_size, strict)) { + archive_set_error(&a->archive, ERANGE, "Numeric user ID too large"); + ret = ARCHIVE_FAILED; + } } if (format_number(archive_entry_gid(entry), h + USTAR_gid_offset, USTAR_gid_size, USTAR_gid_max_size, strict)) { - archive_set_error(&a->archive, ERANGE, "Numeric group ID too large"); - ret = ARCHIVE_FAILED; + if (a->nobody_gid == -1 || format_number(a->nobody_gid, h + USTAR_gid_offset, USTAR_gid_size, USTAR_gid_max_size, strict)) { + archive_set_error(&a->archive, ERANGE, "Numeric group ID too large"); + ret = ARCHIVE_FAILED; + } } if (format_number(archive_entry_size(entry), h + USTAR_size_offset, USTAR_size_size,
USTAR_size_max_size, strict)) { Index: lib/libarchive/archive.h =================================================================== --- lib/libarchive/archive.h (revision 212556) +++ lib/libarchive/archive.h (working copy) @@ -514,6 +514,15 @@ int bytes_in_last_block); __LA_DECL int archive_write_get_bytes_in_last_block(struct archive *); +/* The uid/gid to use when the uid/gid of the file that is to be archived + * is too large to be expressed in the archive format selected. */ +__LA_DECL int archive_write_set_nobody_uid(struct archive *, + id_t nobody_uid); +__LA_DECL id_t archive_write_get_nobody_uid(struct archive *); +__LA_DECL int archive_write_set_nobody_gid(struct archive *, + id_t nobody_gid); +__LA_DECL id_t archive_write_get_nobody_gid(struct archive *); + /* The dev/ino of a file that won't be archived. This is used * to avoid recursively adding an archive to itself. */ __LA_DECL int archive_write_set_skip_file(struct archive *, dev_t, ino_t); Index: lib/libarchive/archive_write_private.h =================================================================== --- lib/libarchive/archive_write_private.h (revision 212556) +++ lib/libarchive/archive_write_private.h (working copy) @@ -103,6 +103,13 @@ struct archive_entry *); ssize_t (*format_write_data)(struct archive_write *, const void *buff, size_t); + + /* + * Uid/Gid that should be used when the file uid/gid is too large + * to be adequately expressed in the archive format (usually nobody). 
+ */ + id_t nobody_uid; + id_t nobody_gid; }; /* Index: lib/libarchive/archive_write_set_format_ar.c =================================================================== --- lib/libarchive/archive_write_set_format_ar.c (revision 212556) +++ lib/libarchive/archive_write_set_format_ar.c (working copy) @@ -299,14 +299,18 @@ return (ARCHIVE_WARN); } if (format_decimal(archive_entry_uid(entry), buff + AR_uid_offset, AR_uid_size)) { - archive_set_error(&a->archive, ERANGE, - "Numeric user ID too large"); - return (ARCHIVE_WARN); + if (a->nobody_uid == -1 || format_decimal(a->nobody_uid, buff + AR_uid_offset, AR_uid_size)) { + archive_set_error(&a->archive, ERANGE, + "Numeric user ID too large"); + return (ARCHIVE_WARN); + } } if (format_decimal(archive_entry_gid(entry), buff + AR_gid_offset, AR_gid_size)) { - archive_set_error(&a->archive, ERANGE, - "Numeric group ID too large"); - return (ARCHIVE_WARN); + if (a->nobody_gid == -1 || format_decimal(a->nobody_gid, buff + AR_gid_offset, AR_gid_size)) { + archive_set_error(&a->archive, ERANGE, + "Numeric group ID too large"); + return (ARCHIVE_WARN); + } } if (format_octal(archive_entry_mode(entry), buff + AR_mode_offset, AR_mode_size)) { archive_set_error(&a->archive, ERANGE, Index: lib/libarchive/archive_write.3 =================================================================== --- lib/libarchive/archive_write.3 (revision 212556) +++ lib/libarchive/archive_write.3 (working copy) @@ -38,6 +38,10 @@ .Nm archive_write_get_bytes_per_block , .Nm archive_write_set_bytes_per_block , .Nm archive_write_set_bytes_in_last_block , +.Nm archive_write_set_nobody_uid , +.Nm archive_write_get_nobody_uid , +.Nm archive_write_set_nobody_gid , +.Nm archive_write_get_nobody_gid , .Nm archive_write_set_compression_bzip2 , .Nm archive_write_set_compression_compress , .Nm archive_write_set_compression_gzip , @@ -68,6 +72,14 @@ .Ft int .Fn archive_write_set_bytes_in_last_block "struct archive *" "int" .Ft int +.Fn archive_write_set_nobody_uid 
"struct archive *" "id_t" +.Ft id_t +.Fn archive_write_get_nobody_uid "struct archive *" +.Ft int +.Fn archive_write_set_nobody_gid "struct archive *" "id_t" +.Ft id_t +.Fn archive_write_get_nobody_gid "struct archive *" +.Ft int .Fn archive_write_set_compression_bzip2 "struct archive *" .Ft int .Fn archive_write_set_compression_compress "struct archive *" @@ -174,6 +186,19 @@ Retrieve the currently-set value for last block size. A value of -1 here indicates that the library should use default values. .It Xo +.Fn archive_write_set_nobody_uid , +.Fn archive_write_set_nobody_gid +.Xc +The uid/gid to use when the uid/gid of the file that is to be written +is too large for the underlying archive format. +.It Xo +.Fn archive_write_get_nobody_uid , +.Fn archive_write_get_nobody_gid +.Xc +Retrieve the currently-set value for the nobody uid/gid. +A value of -1 here indicates that the library should use default +behavior (usually failing with an error). +.It Xo .Fn archive_write_set_format_cpio , .Fn archive_write_set_format_pax , .Fn archive_write_set_format_pax_restricted , --Boundary_(ID_aBuVVhu/JWJqoEVI3WdyTA) Content-type: application/octet-stream; name=libarchive_bigids.diff.sig Content-transfer-encoding: base64 Content-disposition: attachment; filename=libarchive_bigids.diff.sig iEYEABECAAYFAkyZI4AACgkQmdOXtTCX/nuawwCgiUl5FgTQKlfLNdWtFsl0PXrenoIAn1yK rmwVn1kFg2Hp0/1/5EV7c8wJ --Boundary_(ID_aBuVVhu/JWJqoEVI3WdyTA)-- From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 22 07:25:28 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DFF43106566C; Wed, 22 Sep 2010 07:25:28 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 5B8868FC1E; Wed, 22 Sep 2010 07:25:26 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua 
[212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA24079; Wed, 22 Sep 2010 10:25:24 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1OyJhk-0001A1-Fu; Wed, 22 Sep 2010 10:25:24 +0300 Message-ID: <4C99AF63.3000900@freebsd.org> Date: Wed, 22 Sep 2010 10:25:23 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.9) Gecko/20100918 Lightning/1.0b2 Thunderbird/3.1.4 MIME-Version: 1.0 To: alc@freebsd.org, Jeff Roberson References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C95C804.1010701@freebsd.org> <4C95CCDA.7010007@freebsd.org> <4C984E90.60507@freebsd.org> In-Reply-To: X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Robert Watson , Jeff Roberson , Alan Cox , Andre Oppermann , freebsd-hackers@freebsd.org Subject: Re: zfs + uma X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Sep 2010 07:25:29 -0000 on 21/09/2010 19:16 Alan Cox said the following: > Actually, I think that there is a middle ground between "per-cpu caches" and > "directly from the VM" that we are missing. When I've looked at the default > configuration of ZFS (without the extra UMA zones enabled), there is an > incredible amount of churn on the kmem map caused by the implementation of > uma_large_malloc() and uma_large_free() going directly to the kmem map. Not > only are the obvious things happening, like allocating and freeing kernel > virtual addresses and underlying physical pages on every call, but also > system-wide TLB shootdowns and sometimes superpage demotions are occurring. 
> > I have some trouble believing that the large allocations being performed by ZFS > really need per-CPU caching, but I can certainly believe that they could benefit > from not going directly to the kmem map on every uma_large_malloc() and > uma_large_free(). In other words, I think it would make a lot of sense to have > a thin layer between UMA and the kmem map that caches allocated but unused > ranges of pages. Alan, thank you very much for the testing and analysis. These are very good points. So, for the reference, here are two patches that I came up with: 1. original patch that attempts to implement Solaris-like behavior but doesn't go all the way to disabling per-CPU caches: http://people.freebsd.org/~avg/uma-1.diff 2. patch that attempts to implement Jeff's three suggestions; I've tested per-CPU cache size adaptive behavior, works well, but haven't tested per-CPU cache draining yet: http://people.freebsd.org/~avg/uma-2.diff -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 22 10:55:50 2010 Return-Path: Delivered-To: freebsd-hackers@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BFA9A1065674; Wed, 22 Sep 2010 10:55:50 +0000 (UTC) (envelope-from sobomax@FreeBSD.org) Received: from sippysoft.com (gk1.360sip.com [72.236.70.240]) by mx1.freebsd.org (Postfix) with ESMTP id 882BD8FC1A; Wed, 22 Sep 2010 10:55:50 +0000 (UTC) Received: from [192.168.1.38] (S0106005004e13421.vs.shawcable.net [70.71.175.212]) (authenticated bits=0) by sippysoft.com (8.14.3/8.14.3) with ESMTP id o8MAavGC042791 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO); Wed, 22 Sep 2010 03:36:58 -0700 (PDT) (envelope-from sobomax@FreeBSD.org) Message-ID: <4C99DC48.1020208@FreeBSD.org> Date: Wed, 22 Sep 2010 03:36:56 -0700 From: Maxim Sobolev Organization: Sippy Software, Inc. 
To: "current@freebsd.org", FreeBSD Hackers
Cc: Jeff Roberson
Subject: Bumping MAXCPU on amd64?

Hi,

Is there any reason to keep MAXCPU at 16 in the default kernel config?
There are quite a few servers on the market today that have 24 or even 32
physical cores. With hyper-threading this can even go as high as 48 or
64 virtual cpus. People who buy such hardware might get very
disappointed finding out that FreeBSD is not going to use such
hardware to its full potential.

Does anybody object if I'd bump MAXCPU to 32, which is still low but
might be a more reasonable default these days, or at least make it a
kernel configuration option documented in the NOTES?

Thanks!
-Maxim

From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 22 11:16:47 2010
In-Reply-To: <4C99DC48.1020208@FreeBSD.org>
References: <4C99DC48.1020208@FreeBSD.org>
Date: Wed, 22 Sep 2010 15:16:45 +0400
From: pluknet
To: Maxim Sobolev
Cc: FreeBSD Hackers, Jeff Roberson, "current@freebsd.org"
Subject: Re: Bumping MAXCPU on amd64?

2010/9/22 Maxim Sobolev:
> Hi,
>
> Is there any reason to keep MAXCPU at 16 in the default kernel config? There
> are quite a few servers on the market today that have 24 or even 32 physical
> cores. With hyper-threading this can even go as high as 48 or 64 virtual
> cpus. People who buy such hardware might get very disappointed finding out
> that FreeBSD is not going to use such hardware to its full potential.
>
> Does anybody object if I'd bump MAXCPU to 32, which is still low but might
> be a more reasonable default these days, or at least make it a kernel
> configuration option documented in the NOTES?

Please correct me if I'm talking about something different, but isn't it already?

/sys/amd64/include/param.h:#define MAXCPU 32
/sys/arm/include/param.h:#define MAXCPU 2
/sys/i386/include/param.h:#define MAXCPU 32
/sys/ia64/include/param.h:#define MAXCPU 32
/sys/mips/include/param.h:#define MAXCPU MAXSMPCPU
/sys/powerpc/include/param.h:#define MAXCPU 2
/sys/sparc64/include/param.h:#define MAXCPU 16
/sys/sun4v/include/param.h:#define MAXCPU 32

(almost 2 years ago for x86)

-- 
wbr,
pluknet

From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 22 13:41:18 2010
From: John Baldwin
To: freebsd-current@freebsd.org
Date: Wed, 22 Sep 2010 09:37:12 -0400
References: <4C99DC48.1020208@FreeBSD.org>
In-Reply-To: <4C99DC48.1020208@FreeBSD.org>
Message-Id: <201009220937.13155.jhb@freebsd.org>
Cc: Maxim Sobolev, Jeff Roberson, "current@freebsd.org", FreeBSD Hackers
Subject: Re: Bumping MAXCPU on amd64?

On Wednesday, September 22, 2010 6:36:56 am Maxim Sobolev wrote:
> Hi,
>
> Is there any reason to keep MAXCPU at 16 in the default kernel config?
> There are quite a few servers on the market today that have 24 or even 32
> physical cores. With hyper-threading this can even go as high as 48 or
> 64 virtual cpus. People who buy such hardware might get very
> disappointed finding out that FreeBSD is not going to use such
> hardware to its full potential.
>
> Does anybody object if I'd bump MAXCPU to 32, which is still low but
> might be a more reasonable default these days, or at least make it a
> kernel configuration option documented in the NOTES?

?

% grep MAXCPU ~/work/freebsd/svn/head/sys/amd64/include/param.h
#define MAXCPU 32
#define MAXCPU 1

In fact:

% grep MAXCPU ~/work/freebsd/svn/stable/8/sys/amd64/include/param.h
#define MAXCPU 32
#define MAXCPU 1

Unfortunately this can't be MFC'd to 7 as it would destroy the ABI for
existing klds.

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 22 17:33:46 2010
Message-ID: <4C9A380E.7070807@gmail.com>
Date: Wed, 22 Sep 2010 10:08:30 -0700
From: Curtis Penner
To: freebsd-hackers@freebsd.org
References: <4C99DC48.1020208@FreeBSD.org> <201009220937.13155.jhb@freebsd.org>
In-Reply-To: <201009220937.13155.jhb@freebsd.org>
Subject: Re: Bumping MAXCPU on amd64?

MAXCPU at 32 has been good in the 32-bit days. Soon there will be (if
not already) systems that will have 16 cores/socket or more, and
motherboards that have 4 sockets or more. Combining this with
hyper-threading, you have gone significantly beyond the limits of
a feasible server.

Bumping the number now is not feasible. But in release 9+ or 10, this
number could be bumped to something on the order of 1024 or more. This
will not be easy as there are considerable performance and compatibility
problems. But with Moore's law it will happen, and FreeBSD will need to
adapt to stay relevant.

Curtis Penner

On 09/22/2010 06:37 AM, John Baldwin wrote:
> On Wednesday, September 22, 2010 6:36:56 am Maxim Sobolev wrote:
>
>> Hi,
>>
>> Is there any reason to keep MAXCPU at 16 in the default kernel config?
>> There are quite a few servers on the market today that have 24 or even 32
>> physical cores. With hyper-threading this can even go as high as 48 or
>> 64 virtual cpus.
>> People who buy such hardware might get very
>> disappointed finding out that FreeBSD is not going to use such
>> hardware to its full potential.
>>
>> Does anybody object if I'd bump MAXCPU to 32, which is still low but
>> might be a more reasonable default these days, or at least make it a
>> kernel configuration option documented in the NOTES?
>>
> ?
>
> % grep MAXCPU ~/work/freebsd/svn/head/sys/amd64/include/param.h
> #define MAXCPU 32
> #define MAXCPU 1
>
> In fact:
>
> % grep MAXCPU ~/work/freebsd/svn/stable/8/sys/amd64/include/param.h
> #define MAXCPU 32
> #define MAXCPU 1
>
> Unfortunately this can't be MFC'd to 7 as it would destroy the ABI for
> existing klds.
>
>

From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 22 18:34:08 2010
In-Reply-To: <4C9A380E.7070807@gmail.com>
References: <4C99DC48.1020208@FreeBSD.org> <201009220937.13155.jhb@freebsd.org> <4C9A380E.7070807@gmail.com>
Date: Wed, 22 Sep 2010 22:34:07 +0400
From: pluknet
To: Curtis Penner
Cc: freebsd-hackers@freebsd.org
Subject: Re: Bumping MAXCPU on amd64?

2010/9/22 Curtis Penner:
> MAXCPU at 32 has been good in the 32-bit days. Soon there will be (if not
> already) systems that will have 16 cores/socket or more, and motherboards
> that have 4 sockets or more. Combining this with hyper-threading, you have
> gone significantly beyond the limits of a feasible server.

There is already one (16 cores per socket, up to 4 sockets, 512-way):
http://www.oracle.com/us/corporate/press/173536

-- 
wbr,
pluknet

From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 22 21:06:19 2010
From: John Baldwin
To: freebsd-hackers@freebsd.org
Date: Wed, 22 Sep 2010 17:04:56 -0400
References: <4C99DC48.1020208@FreeBSD.org> <201009220937.13155.jhb@freebsd.org> <4C9A380E.7070807@gmail.com>
In-Reply-To: <4C9A380E.7070807@gmail.com>
Message-Id: <201009221704.56514.jhb@freebsd.org>
Cc: Curtis Penner
Subject: Re: Bumping MAXCPU on amd64?

On Wednesday, September 22, 2010 1:08:30 pm Curtis Penner wrote:
> MAXCPU at 32 has been good in the 32-bit days. Soon there will be (if
> not already) systems that will have 16 cores/socket or more, and
> motherboards that have 4 sockets or more. Combining this with
> hyper-threading, you have gone significantly beyond the limits of
> a feasible server.

My point was in response to Maxim's mail about bumping it from 16. Going higher
than 32 is a bigger project (but in progress-ish) as it involves transitioning
away from a simple int to hold CPU ID bitmasks (cpumask_t) and using cpuset_t
instead.

-- 
John Baldwin

From owner-freebsd-hackers@FreeBSD.ORG Wed Sep 22 21:25:50 2010
Message-ID: <4C9A6CB8.3010400@sippysoft.com>
Date: Wed, 22 Sep 2010 13:53:12 -0700
From: Maxim Sobolev
Organization: Sippy Software, Inc.
To: John Baldwin
References: <4C99DC48.1020208@FreeBSD.org> <201009220937.13155.jhb@freebsd.org>
In-Reply-To: <201009220937.13155.jhb@freebsd.org>
Cc: FreeBSD Hackers, Jeff Roberson, freebsd-current@FreeBSD.ORG, "current@freebsd.org"
Subject: Re: Bumping MAXCPU on amd64?

On 9/22/2010 6:37 AM, John Baldwin wrote:
> Unfortunately this can't be MFC'd to 7 as it would destroy the ABI for
> existing klds.

Ah, ok, sorry, I did only check RELENG_7. Can we make it a kernel option
then?

Regards,
-- 
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
T/F: +1-646-651-1110
Web: http://www.sippysoft.com
MSN: sales@sippysoft.com
Skype: SippySoft

From owner-freebsd-hackers@FreeBSD.ORG Thu Sep 23 10:32:32 2010
Date: Thu, 23 Sep 2010 11:32:31 +0100 (BST)
From: Robert Watson
To: Maxim Sobolev
In-Reply-To: <4C9A6CB8.3010400@sippysoft.com>
References: <4C99DC48.1020208@FreeBSD.org> <201009220937.13155.jhb@freebsd.org> <4C9A6CB8.3010400@sippysoft.com>
Cc: FreeBSD Hackers, Jeff Roberson, freebsd-current@FreeBSD.ORG, "current@freebsd.org", John Baldwin
Subject: Re: Bumping MAXCPU on amd64?

On Wed, 22 Sep 2010, Maxim Sobolev wrote:

> On 9/22/2010 6:37 AM, John Baldwin wrote:
>> Unfortunately this can't be MFC'd to 7 as it would destroy the ABI for
>> existing klds.
>
> Ah, ok, sorry, I did only check RELENG_7. Can we make it a kernel option
> then?

In principle, yes, but MAXCPU is used to size various kernel data structures
inspected by userspace crash post-mortem tools, etc.
I've done a bit of work to teach some of those tools (in particular, vmstat -z
and vmstat -m) to extract the version of maxcpu compiled into the kernel
instead of just relying on the version of MAXCPU present when the command line
tool was compiled. However, I think a better long-term approach here is to
generally eliminate sizing based on MAXCPU and instead size based on the number
of CPUs present. Certain kernel subsystems already do this (UMA, netisr, ...)
but others don't (malloc(9), ...). Additional hands on this project would
probably help :-).

As John mentioned, the other issue is the use of fixed-width types instead of
variable-length CPU bitmasks to name cores for IPIs, etc. There are people
actively working on this, but it's a non-trivial project as kernel code likes
to do things like cpumask & othermask. My expectation is that this problem
will be solved in 9.0 but I don't see any obvious MFC paths for 8.x due to KBI
issues. It could be that this forces our hand in terms of breaking the KBI at
some point in the 8.x series, unclear...
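
To illustrate why expressions like cpumask & othermask make the conversion
non-trivial, here is a minimal user-space sketch, assuming a hypothetical
multi-word set type. The toy_* names and TOY_MAXCPU are made up for this
example and are not the FreeBSD cpuset_t API; the point is only that a
struct of words cannot be combined with a bare '&', so every such call site
has to be rewritten to use a helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAXCPU	1024	/* hypothetical bound, not FreeBSD's value */
#define TOY_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define TOY_WORDS	((TOY_MAXCPU + TOY_BITS - 1) / TOY_BITS)

/* Hypothetical multi-word CPU set; wider than any single integer type. */
struct toy_cpuset {
	unsigned long bits[TOY_WORDS];
};

/* Clear every CPU in the set. */
static void
toy_cpu_zero(struct toy_cpuset *s)
{
	memset(s, 0, sizeof(*s));
}

/* Add one CPU to the set. */
static void
toy_cpu_set(struct toy_cpuset *s, unsigned cpu)
{
	assert(cpu < TOY_MAXCPU);
	s->bits[cpu / TOY_BITS] |= 1UL << (cpu % TOY_BITS);
}

/* Test whether one CPU is in the set. */
static bool
toy_cpu_isset(const struct toy_cpuset *s, unsigned cpu)
{
	assert(cpu < TOY_MAXCPU);
	return ((s->bits[cpu / TOY_BITS] >> (cpu % TOY_BITS)) & 1UL);
}

/* Stand-in for "cpumask & othermask": dst = a & b, one word at a time. */
static void
toy_cpu_and(struct toy_cpuset *dst, const struct toy_cpuset *a,
    const struct toy_cpuset *b)
{
	for (size_t i = 0; i < TOY_WORDS; i++)
		dst->bits[i] = a->bits[i] & b->bits[i];
}
```

With a fixed-width integer mask the intersection is a single operator; with
the sketch above it becomes toy_cpu_and(&result, &a, &b), and CPU numbers
above the width of an int (500 or 700, say) become representable.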
Robert

From owner-freebsd-hackers@FreeBSD.ORG Fri Sep 24 12:17:50 2010
Message-ID: <4C9C96EA.9060100@freebsd.org>
Date: Fri, 24 Sep 2010 15:17:46 +0300
From: Andriy Gapon
To: Jeff Roberson
References: <4C93236B.4050906@freebsd.org> <4C935F56.4030903@freebsd.org> <4C95C804.1010701@freebsd.org> <4C95CCDA.7010007@freebsd.org> <4C984E90.60507@freebsd.org> <4C99AF63.3000900@freebsd.org>
In-Reply-To: <4C99AF63.3000900@freebsd.org>
Cc: freebsd-hackers@freebsd.org, Jeff Roberson
Subject: Re: zfs + uma

on 22/09/2010 10:25 Andriy Gapon said the following:
> 2. patch that attempts to implement Jeff's three suggestions; I've tested
> per-CPU cache size adaptive behavior, works well, but haven't tested per-CPU
> cache draining yet:
> http://people.freebsd.org/~avg/uma-2.diff

Now I've fully tested this change, found out that it is a very bad idea to call
cache_drain/cache_drain2 on UMA_ZFLAG_INTERNAL zones, and updated the patch.
Everything seems to work as expected.

-- 
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG Sat Sep 25 01:22:57 2010
Date: Fri, 24 Sep 2010 18:00:44 -0700
From: Neel Natu
To: freebsd-hackers@freebsd.org
Subject: PATCH: fix bogus error message "bus_dmamem_alloc failed to align memory properly"

Hi,

This patch fixes the bogus error message from bus_dmamem_alloc() about
the buffer not being aligned properly.

The problem is that the check is against a virtual address as opposed
to the physical address. contigmalloc() makes guarantees about
the alignment of physical addresses but not the virtual address
mapping it.

Any objections if I commit this patch?

best
Neel

Index: sys/powerpc/powerpc/busdma_machdep.c
===================================================================
--- sys/powerpc/powerpc/busdma_machdep.c	(revision 213113)
+++ sys/powerpc/powerpc/busdma_machdep.c	(working copy)
@@ -529,7 +529,7 @@
 		CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x error %d",
 		    __func__, dmat, dmat->flags, ENOMEM);
 		return (ENOMEM);
-	} else if ((uintptr_t)*vaddr & (dmat->alignment - 1)) {
+	} else if (vtophys(*vaddr) & (dmat->alignment - 1)) {
 		printf("bus_dmamem_alloc failed to align memory properly.\n");
 	}
 #ifdef NOTYET
Index: sys/sparc64/sparc64/bus_machdep.c
===================================================================
--- sys/sparc64/sparc64/bus_machdep.c	(revision 213113)
+++ sys/sparc64/sparc64/bus_machdep.c	(working copy)
@@ -652,7 +652,7 @@
 	}
 	if (*vaddr == NULL)
 		return (ENOMEM);
-	if ((uintptr_t)*vaddr % dmat->dt_alignment)
+	if (vtophys(*vaddr) % dmat->dt_alignment)
 		printf("%s: failed to align memory properly.\n", __func__);
 	return (0);
 }
Index: sys/ia64/ia64/busdma_machdep.c
===================================================================
--- sys/ia64/ia64/busdma_machdep.c	(revision 213113)
+++ sys/ia64/ia64/busdma_machdep.c	(working copy)
@@ -455,7 +455,7 @@
 	}
 	if (*vaddr == NULL)
 		return (ENOMEM);
-	else if ((uintptr_t)*vaddr & (dmat->alignment - 1))
+	else if (vtophys(*vaddr) & (dmat->alignment - 1))
 		printf("bus_dmamem_alloc failed to align memory properly.\n");
 	return (0);
 }
Index: sys/i386/i386/busdma_machdep.c
===================================================================
--- sys/i386/i386/busdma_machdep.c	(revision 213113)
+++ sys/i386/i386/busdma_machdep.c	(working copy)
@@ -540,7 +540,7 @@
 		CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x error %d",
 		    __func__, dmat, dmat->flags, ENOMEM);
 		return (ENOMEM);
-	} else if ((uintptr_t)*vaddr & (dmat->alignment - 1)) {
+	} else if (vtophys(*vaddr) & (dmat->alignment - 1)) {
 		printf("bus_dmamem_alloc failed to align memory properly.\n");
 	}
 	if (flags & BUS_DMA_NOCACHE)
Index: sys/amd64/amd64/busdma_machdep.c
===================================================================
--- sys/amd64/amd64/busdma_machdep.c	(revision 213113)
+++ sys/amd64/amd64/busdma_machdep.c	(working copy)
@@ -526,7 +526,7 @@
 		CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x error %d",
 		    __func__, dmat, dmat->flags, ENOMEM);
 		return (ENOMEM);
-	} else if ((uintptr_t)*vaddr & (dmat->alignment - 1)) {
+	} else if (vtophys(*vaddr) & (dmat->alignment - 1)) {
 		printf("bus_dmamem_alloc failed to align memory properly.\n");
 	}
 	if (flags & BUS_DMA_NOCACHE)
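
For illustration only (this is not part of the patch): a minimal user-space
sketch of why checking the virtual address can report a bogus failure. It
assumes a toy one-page mapping in place of the kernel's real vtophys(); the
toy_* names, the 4096-byte page size and the example addresses are all
hypothetical. Since the offset within a page is the same for the virtual
and the physical address, the two checks can only disagree when the
requested alignment is larger than the page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u	/* assumed page size for this sketch */

/* Hypothetical one-entry "page table": one virtual page -> one physical page. */
struct toy_mapping {
	uintptr_t	va_page;	/* page-aligned virtual base */
	uint64_t	pa_page;	/* page-aligned physical base */
};

/* Toy stand-in for vtophys(): translate an address inside the mapped page. */
static uint64_t
toy_vtophys(const struct toy_mapping *m, uintptr_t va)
{
	assert(va >= m->va_page && va - m->va_page < TOY_PAGE_SIZE);
	return (m->pa_page + (va - m->va_page));
}

/* True if addr is a multiple of alignment; alignment must be a power of two. */
static bool
is_aligned(uint64_t addr, uint64_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return ((addr & (alignment - 1)) == 0);
}
```

With a mapping of virtual 0x7f3000 to physical 0x120000 and a 64 KB
alignment request, is_aligned() on the virtual address is false (the old
check would print the warning) while is_aligned() on toy_vtophys() of the
same address is true, which is the property the allocator guarantees.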