From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 25 10:07:27 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 320CE1065673 for ; Sun, 25 Jul 2010 10:07:27 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 86A978FC21 for ; Sun, 25 Jul 2010 10:07:26 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id NAA03028; Sun, 25 Jul 2010 13:07:24 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Ocy79-000CxD-Ou; Sun, 25 Jul 2010 13:07:23 +0300 Message-ID: <4C4C0CD9.6000002@freebsd.org> Date: Sun, 25 Jul 2010 13:07:21 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100603) MIME-Version: 1.0 To: RW References: <4C4B4BAB.3000005@freebsd.org> <20100725003144.3cfead39@gumby.homeunix.com> In-Reply-To: <20100725003144.3cfead39@gumby.homeunix.com> X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: pageout question X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Jul 2010 10:07:27 -0000 on 25/07/2010 02:31 RW said the following: > On Sat, 24 Jul 2010 23:23:07 +0300 > Andriy Gapon wrote: > >> There is a good deal of comments in the vm_pageout.c code that imply >> that we use a hysteresis approach to deal with low available pages >> condition. >> >> >> In general, the hysteresis, the comments and the code make sense. 
>> My doubt, though, is about the block of code that is right below the
>> comment quoted above:
>>         if (vm_pages_needed && !vm_page_count_min()) {
>>                 if (!vm_paging_needed())
>>                         vm_pages_needed = 0;
>>                 wakeup(&cnt.v_free_count);
>>         }
>
> As I understand it the hysteresis is done inside vm_pageout_scan, and
> the expectation is that one pass will typically satisfy this because the
> design aims to keep enough clean pages in the inactive queue.

I have seen these lines in vm_pageout_scan:
        /*
         * Calculate the number of pages we want to either free or move
         * to the cache.
         */
        page_shortage = vm_paging_target() + addl_page_shortage_init;
...
        /*
         * Compute the number of pages we want to try to move from the
         * active queue to the inactive queue.
         */
        page_shortage = vm_paging_target() +
                cnt.v_inactive_target - cnt.v_inactive_count;
        page_shortage += addl_page_shortage;

But I am not sure about the "clean pages in the inactive queue" part.
From what I can see in the code, pagedaemon only tries to maintain a certain
number of pages on the inactive queue - I am speaking about
vm_pageout_page_stats(). But I do not see any code ensuring a level of _clean_
inactive pages. And, if I am not mistaken, there is not even a guarantee that
those pages will not be re-activated when pagedaemon actually scans them.

> I'm not sure if the vm_paging_needed() call is correct or not, but it
> may be that the intent is to avoid immediately going back to a
> depleted inactive queue when cache+free is within normal bounds,
> because it could result in avoidable paging to swap.

Well, OTOH, if the current pass results in many pages being re-activated and
many pages still left on the inactive queue because they are dirty (see
maxlaunder in vm_pageout_scan), then it is premature to quit paging when we have
only reached the bare minimum of available pages (see pass and maxlaunder again).
IMHO, of course.

As a side discussion, I wonder if the current setting of v_inactive_target is
adequate. It "feels" that it should be bigger.
-- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 25 10:20:24 2010 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0C64C1065670 for ; Sun, 25 Jul 2010 10:20:24 +0000 (UTC) (envelope-from culot@0xd0.org) Received: from 0xd0.org (ks28346.kimsufi.com [91.121.92.146]) by mx1.freebsd.org (Postfix) with ESMTP id BA48C8FC14 for ; Sun, 25 Jul 2010 10:20:23 +0000 (UTC) Received: from 0xd0.org (doudou.0xd0.org [172.16.0.254]) by 0xd0.org (8.14.4/8.14.4) with ESMTP id o6P9davv015960; Sun, 25 Jul 2010 11:39:36 +0200 (CEST) (envelope-from culot@0xd0.org) Received: (from culot@localhost) by 0xd0.org (8.14.4/8.14.4/Submit) id o6P9datc015959; Sun, 25 Jul 2010 11:39:36 +0200 (CEST) (envelope-from culot) Date: Sun, 25 Jul 2010 11:39:35 +0200 From: Frederic Culot To: hackers@freebsd.org Message-ID: <20100725093935.GC1917@culot.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline X-PGP-Key: http://culot.org/public/pgp-key.txt User-Agent: Mutt/1.5.20 (2009-06-14) X-Mailman-Approved-At: Sun, 25 Jul 2010 11:49:29 +0000 Cc: Subject: lint(1) improvements from OpenBSD X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Jul 2010 10:20:24 -0000 Hi, I noticed on the the FreeBSD list of projects and ideas an item related to lint(1) and the port of improvements from the OpenBSD project: http://www.freebsd.org/projects/ideas/ideas.html#p-lint I would like to know more about this project but unfortunately no technical contact was specified on the web page, hence I write to the hackers list. Does someone have more information related to this project (what improvements does the text refer to)? Has someone started working on it? 
Thanks, Frederic -- mail: frederic@culot.org web: http://culot.org From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 25 13:41:49 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 331B7106566B for ; Sun, 25 Jul 2010 13:41:49 +0000 (UTC) (envelope-from rwmaillists@googlemail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id BEA2E8FC17 for ; Sun, 25 Jul 2010 13:41:46 +0000 (UTC) Received: by wwe15 with SMTP id 15so5732688wwe.31 for ; Sun, 25 Jul 2010 06:41:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:subject :message-id:in-reply-to:references:x-mailer:mime-version :content-type:content-transfer-encoding; bh=+MN1Bej/vBcNquCWcQC9tvuSvpsR5gBKGfI67C4dp1o=; b=HxNUBxrR69lewU4aZokBlpvh48g0nerwzOzWM6OxcqaHgYV/CMRRR0ozOkdTVZAfgr px+8LtyQUIP6sekxM3EQGKgxaVibhf2B/0kSa9HbjH8eY33vATftNOLclFsrpYrRDSip LInGu6PTLcyjOsEdo8fR/kAXHhas3TCuXgfAs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=date:from:to:subject:message-id:in-reply-to:references:x-mailer :mime-version:content-type:content-transfer-encoding; b=uPxzYdcLnSXb17lQdgSa8FpqgRGjIXvLtjhYCGOGDzqN/bkq8RtfGmgyN97uNHWe/u K2NHvcB3GyNYu5dLW2O3T2BydQBCFTJhocrpRcJRN/XHejeeECIP81E6WVmrGmYbLq1O O2KT6h1vxA61DxkTv2sLWliL+/oXyXJFoElVw= Received: by 10.227.140.154 with SMTP id i26mr5950534wbu.199.1280065305903; Sun, 25 Jul 2010 06:41:45 -0700 (PDT) Received: from gumby.homeunix.com (bb-87-81-140-128.ukonline.co.uk [87.81.140.128]) by mx.google.com with ESMTPS id e31sm2151183wbe.5.2010.07.25.06.41.44 (version=SSLv3 cipher=RC4-MD5); Sun, 25 Jul 2010 06:41:45 -0700 (PDT) Date: Sun, 25 Jul 2010 14:41:41 +0100 From: RW To: freebsd-hackers@freebsd.org Message-ID: <20100725144141.6f1f33cc@gumby.homeunix.com> In-Reply-To: 
<4C4C0CD9.6000002@freebsd.org> References: <4C4B4BAB.3000005@freebsd.org> <20100725003144.3cfead39@gumby.homeunix.com> <4C4C0CD9.6000002@freebsd.org> X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; i386-portbld-freebsd8.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: pageout question X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Jul 2010 13:41:49 -0000

On Sun, 25 Jul 2010 13:07:21 +0300
Andriy Gapon wrote:

> on 25/07/2010 02:31 RW said the following:

> > As I understand it the hysteresis is done inside vm_pageout_scan,
> > and the expectation is that one pass will typically satisfy this
> > because the design aims to keep enough clean pages in the inactive
> > queue.
>
> But I am not sure about "clean pages in the inactive queue" ... But I
> do not see any code ensuring level of _clean_ inactive pages.

In FreeBSD the inactive queue contains disk cache pages which normally
provide most of the clean pages needed. In addition pages are dribbled
out to swap, and the resulting clean pages are placed at the back of
the inactive queue to make another pass.

>
> > I'm not sure if the vm_paging_needed() call is correct or not, but
> > it may be that the intent is to avoid immediately going back
> > to a depleted inactive queue when cache+free is within normal
> > bounds, because it could result in avoidable paging to swap.
>
> Well, OTOH, if the current pass results in many pages being
> re-activated and many pages still left on the inactive queue because
> they are dirty (see maxlaunder in vm_pageout_scan),

Dirty pages make three passes through the inactive queue: twice dirty,
once clean. They are paged out at the end of the second pass, so it's
unlikely that they are reactivated except under very heavy thrashing.
> then it is > premature to quit paging when we only reached bare minimum of > available pages (see pass and maxlaunder again). IMHO, of course. It's not the bare minimum, that's another level that vm_page_count_min() tests for. From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 25 14:19:46 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E3F29106567C for ; Sun, 25 Jul 2010 14:19:46 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 5A1E38FC17 for ; Sun, 25 Jul 2010 14:19:45 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA05221; Sun, 25 Jul 2010 17:19:43 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Od23L-000DCN-8E; Sun, 25 Jul 2010 17:19:43 +0300 Message-ID: <4C4C47FD.6080802@freebsd.org> Date: Sun, 25 Jul 2010 17:19:41 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100603) MIME-Version: 1.0 To: RW References: <4C4B4BAB.3000005@freebsd.org> <20100725003144.3cfead39@gumby.homeunix.com> <4C4C0CD9.6000002@freebsd.org> <20100725144141.6f1f33cc@gumby.homeunix.com> In-Reply-To: <20100725144141.6f1f33cc@gumby.homeunix.com> X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: pageout question X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Jul 2010 14:19:47 -0000 on 25/07/2010 16:41 RW said the following: > On Sun, 25 Jul 2010 
13:07:21 +0300
> Andriy Gapon wrote:
>
>> on 25/07/2010 02:31 RW said the following:
>
>>> As I understand it the hysteresis is done inside vm_pageout_scan,
>>> and the expectation is that one pass will typically satisfy this
>>> because the design aims to keep enough clean pages in the inactive
>>> queue.
>
>> But I am not sure about "clean pages in the inactive queue" ... But I
>> do not see any code ensuring level of _clean_ inactive pages.
>
> In FreeBSD the inactive queue contains disk cache pages which normally
> provide most of the clean pages needed. In addition pages are dribbled
> out to swap, and the resulting clean pages are placed at the back of
> the inactive queue to make another pass.

Well, "normally" and "most" are not quite quantitative.
Personally, I do not see any guarantee that the inactive queue would contain
enough clean pages to reach the paging target in a single pass.

>>> I'm not sure if the vm_paging_needed() call is correct or not, but
>>> it may be that the intent is to avoid immediately going back
>>> to a depleted inactive queue when cache+free is within normal
>>> bounds, because it could result in avoidable paging to swap.
>> Well, OTOH, if the current pass results in many pages being
>> re-activated and many pages still left on the inactive queue because
>> they are dirty (see maxlaunder in vm_pageout_scan),
>
> Dirty pages make three passes through the inactive queue: twice dirty,
> once clean. They are paged out at the end of the second pass, so it's
> unlikely that they are reactivated except under very heavy thrashing.

I didn't mean to say that dirty pages would get re-activated.
Clean pages can perfectly well be re-activated if they were referenced since
their de-activation time.

>> then it is
>> premature to quit paging when we only reached bare minimum of
>> available pages (see pass and maxlaunder again). IMHO, of course.
>
> It's not the bare minimum, that's another level that vm_page_count_min()
> tests for.
I meant bare minimum to stop paging, that is, going above lower watermark of the paging hysteresis. -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 25 20:28:54 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 839071065677 for ; Sun, 25 Jul 2010 20:28:54 +0000 (UTC) (envelope-from rwmaillists@googlemail.com) Received: from mail-ww0-f42.google.com (mail-ww0-f42.google.com [74.125.82.42]) by mx1.freebsd.org (Postfix) with ESMTP id 450648FC13 for ; Sun, 25 Jul 2010 20:28:53 +0000 (UTC) Received: by wwf26 with SMTP id 26so2268976wwf.1 for ; Sun, 25 Jul 2010 13:28:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:subject :message-id:in-reply-to:references:x-mailer:mime-version :content-type:content-transfer-encoding; bh=xiKsSYWPKBjVqgiw15IxND+oktrVcFIGgbQe6f3FOgE=; b=ngAELpadb3V2LUS/r/D64X4MXF02oWR2EGOEhT5bGDQtC9tP6eVnZf9OVfTb6qC7CL +aUCZC9OjnT77vJGONK/g+hdkkF3Dchu+MZmjD3zVqnCSTuz9PNK++SsSO2b6NMlDeq7 d9oWX4klKdQPOPcOhL4NdMLLKvsKz5daf14JE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=date:from:to:subject:message-id:in-reply-to:references:x-mailer :mime-version:content-type:content-transfer-encoding; b=cMlivFKKM3Xl9B98y2bkG+t0sRMTrJWhbPAKEFqV7ncgFiTAGW6vZWuXHzafxgb+3W nfaLFCWkhS1QKy48i1wkW7aCjy1EsOtMWlj3EnrGN2SQ/jDNXcMn6iNfS+Bvhr+IRDHQ W1N1my0v7jwzoWV8nOJ5fw2m4ZHBFQ5Nb6q3M= Received: by 10.227.146.147 with SMTP id h19mr6344713wbv.222.1280089732914; Sun, 25 Jul 2010 13:28:52 -0700 (PDT) Received: from gumby.homeunix.com (bb-87-81-140-128.ukonline.co.uk [87.81.140.128]) by mx.google.com with ESMTPS id e31sm2397228wbe.23.2010.07.25.13.28.51 (version=SSLv3 cipher=RC4-MD5); Sun, 25 Jul 2010 13:28:52 -0700 (PDT) Date: Sun, 25 Jul 2010 21:28:49 +0100 From: RW To: freebsd-hackers@freebsd.org Message-ID: 
<20100725212849.1e07f40c@gumby.homeunix.com> In-Reply-To: <4C4C47FD.6080802@freebsd.org> References: <4C4B4BAB.3000005@freebsd.org> <20100725003144.3cfead39@gumby.homeunix.com> <4C4C0CD9.6000002@freebsd.org> <20100725144141.6f1f33cc@gumby.homeunix.com> <4C4C47FD.6080802@freebsd.org> X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; i386-portbld-freebsd8.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: pageout question X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Jul 2010 20:28:54 -0000

On Sun, 25 Jul 2010 17:19:41 +0300
Andriy Gapon wrote:

> on 25/07/2010 16:41 RW said the following:

> > In FreeBSD the inactive queue contains disk cache pages which
> > normally provide most of the clean pages needed. In addition pages
> > are dribbled out to swap, and the resulting clean pages are placed
> > at the back of the inactive queue to make another pass.
>
> Well, "normally" and "most" are not quite quantitative.
> Personally, I do not see any guarantees that inactive queue would
> contain enough clean pages to reach paging target on a single pass.

I didn't say it was guaranteed. I just think the scenario where a
first pass ends up between the watermarks is rare. And when it happens
I don't see a compelling reason to do extra paging to reach an
arbitrary target.

I think the comment about not clearing vm_pages_needed is referring to
clearing it below the low watermark, because the daemon would then get
woken up almost immediately.

> I meant bare minimum to stop paging, that is, going above lower
> watermark of the paging hysteresis.
From owner-freebsd-hackers@FreeBSD.ORG Sun Jul 25 20:43:13 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DE979106564A for ; Sun, 25 Jul 2010 20:43:13 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 4F1038FC0A for ; Sun, 25 Jul 2010 20:43:12 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA08883; Sun, 25 Jul 2010 23:43:10 +0300 (EEST) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1Od82P-000Day-NR; Sun, 25 Jul 2010 23:43:09 +0300 Message-ID: <4C4CA1DC.2050902@freebsd.org> Date: Sun, 25 Jul 2010 23:43:08 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100603) MIME-Version: 1.0 To: RW References: <4C4B4BAB.3000005@freebsd.org> <20100725003144.3cfead39@gumby.homeunix.com> <4C4C0CD9.6000002@freebsd.org> <20100725144141.6f1f33cc@gumby.homeunix.com> <4C4C47FD.6080802@freebsd.org> <20100725212849.1e07f40c@gumby.homeunix.com> In-Reply-To: <20100725212849.1e07f40c@gumby.homeunix.com> X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: pageout question X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Jul 2010 20:43:13 -0000 on 25/07/2010 23:28 RW said the following: > On Sun, 25 Jul 2010 17:19:41 +0300 > Andriy Gapon wrote: > >> on 25/07/2010 16:41 RW said the following: > >>> In FreeBSD the inactive queue contains disk cache pages which 
>>> normally provide most of the clean pages needed. In addition pages >>> are dribbled out to swap, and the resulting clean pages are placed >>> at the back of the inactive queue to make another pass. >> Well, "normally" and "most" are not quite quantitative. >> Personally, I do not see any guarantees that inactive queue would >> contain enough clean pages to reach paging target on a single pass. > > I didn't say it say it was guaranteed. I just think the scenario where > a first pass ends up between the watermarks is rare. And when it > happens I don't see a compelling reason to do extra paging to reach an > arbitrary target. Well, it seems neither I nor you have data to show whether it's rare or not (and it would greatly depend on workload too). As to "arbitrary target" - well, that's the whole point of hysteresis-like behavior. We start paging also at an "arbitrary" point. -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Mon Jul 26 12:19:37 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7088B1065675; Mon, 26 Jul 2010 12:19:37 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 4FBFD8FC0C; Mon, 26 Jul 2010 12:19:37 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id CADC946B3B; Mon, 26 Jul 2010 08:19:36 -0400 (EDT) Date: Mon, 26 Jul 2010 13:19:36 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Alexander Motin In-Reply-To: <4C4B720A.6020802@FreeBSD.org> Message-ID: References: <4C4AF046.40507@FreeBSD.org> <4C4B720A.6020802@FreeBSD.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org, Rui Paulo 
Subject: Re: Intel TurboBoost in practice X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Jul 2010 12:19:37 -0000 On Sun, 25 Jul 2010, Alexander Motin wrote: >> The numbers that you are showing doesn't show much difference. Have you >> tried buildworld? > > If you mean relative difference -- as I have told, it's mostly because of my > CPU. It's maximal boost is 266MHz (8.3%), but 133MHz of them is enabled most > of time if CPU is not overheated. It probably doesn't, as it works on clear > table under air conditioner. So maximal effect I can expect on is 4.2%. In > such situation 2.8% probably not so bad to illustrate that feature works and > there is space for further improvements. If I had Core i5-750S I would > expect 33% boost. Can I recommend the use of ministat(1) and sample sizes of at least 8 runs per configuration? Robert > > If you mean absolute difference, here are results or four buildworld runs: > hw.acpi.cpu.cx_lowest=C1: 4654.23 sec > hw.acpi.cpu.cx_lowest=C2: 4556.37 sec > hw.acpi.cpu.cx_lowest=C2: 4570.85 sec > hw.acpi.cpu.cx_lowest=C1: 4679.83 sec > Benefit is about 2.1%. Each time results were erased and sources > pre-cached into RAM. Storage was SSD, so disk should not be an issue. 
> > -- > Alexander Motin > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > From owner-freebsd-hackers@FreeBSD.ORG Mon Jul 26 14:12:26 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BB2C11065673; Mon, 26 Jul 2010 14:12:26 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 1BD528FC12; Mon, 26 Jul 2010 14:12:25 +0000 (UTC) Received: by fxm13 with SMTP id 13so119931fxm.13 for ; Mon, 26 Jul 2010 07:12:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :x-enigmail-version:content-type:content-transfer-encoding; bh=G7oTVotFxTUAVpJGOpyCve/ZLseePrYacSwaTC6aNno=; b=QbPZspZFmIbc++j9Ib4gbVezU1T2mNbNj99c/AOMgOMOH0K5BjnWEnBYVBPJ7VxOJ+ wLtBsL+xUntTjL0rTnDQKh3qRjIDkwGLs8MDru4PAGmPBrPm6K15ejm5lT1IMPkKnVAs K/mgn4DrCMuHg4SGSiLX8s0hn/V1ZqfEyUGhw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding; b=Gzu6YNHAKGwISQKktLl7d04TmFpIszqfBmLdg+Y296sFpSZ0tbiMk6RHP+ZymPEhJL Jmrs/uPjH364K4syI3ULulkkGR57spgefllJNdIzPZaWDFZ0jJHd09UCEANPl3YLNmK/ PdhOu3tRT3+711/SFFJhM0HlVxJrwam4Jgc7o= Received: by 10.223.109.140 with SMTP id j12mr6505414fap.22.1280153544092; Mon, 26 Jul 2010 07:12:24 -0700 (PDT) Received: from mavbook2.mavhome.dp.ua (pc.mavhome.dp.ua [212.86.226.226]) by mx.google.com with ESMTPS id w11sm1401657fao.13.2010.07.26.07.12.21 
(version=SSLv3 cipher=RC4-MD5); Mon, 26 Jul 2010 07:12:22 -0700 (PDT) Sender: Alexander Motin Message-ID: <4C4D9779.8080505@FreeBSD.org> Date: Mon, 26 Jul 2010 17:11:05 +0300 From: Alexander Motin User-Agent: Thunderbird 2.0.0.23 (X11/20091212) MIME-Version: 1.0 To: Robert Watson References: <4C4AF046.40507@FreeBSD.org> <4C4B720A.6020802@FreeBSD.org> In-Reply-To: X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org, Rui Paulo Subject: Re: Intel TurboBoost in practice X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Jul 2010 14:12:26 -0000 Robert Watson wrote: > On Sun, 25 Jul 2010, Alexander Motin wrote: >>> The numbers that you are showing doesn't show much difference. Have >>> you tried buildworld? >> >> If you mean relative difference -- as I have told, it's mostly because >> of my CPU. It's maximal boost is 266MHz (8.3%), but 133MHz of them is >> enabled most of time if CPU is not overheated. It probably doesn't, as >> it works on clear table under air conditioner. So maximal effect I can >> expect on is 4.2%. In such situation 2.8% probably not so bad to >> illustrate that feature works and there is space for further >> improvements. If I had Core i5-750S I would expect 33% boost. > > Can I recommend the use of ministat(1) and sample sizes of at least 8 > runs per configuration? Thanks for pushing me to do it right. :) Here is 3*15 runs with fresh kernel with disabled debug. Results are quite close to original: -2.73% and -2.19% of time. 
x C1
+ C2
* C3
[ministat ASCII histogram for the three data sets; layout not recoverable]
    N           Min           Max        Median           Avg        Stddev
x  15         12.68         12.84         12.69     12.698667   0.039254966
+  15         12.35         12.36         12.35     12.351333  0.0035186578
Difference at 95.0% confidence
        -0.347333 +/- 0.0208409
        -2.7352% +/- 0.164119%
        (Student's t, pooled s = 0.0278687)
*  15         12.41         12.44         12.42         12.42  0.0075592895
Difference at 95.0% confidence
        -0.278667 +/- 0.0211391
        -2.19446% +/- 0.166467%
        (Student's t, pooled s = 0.0282674)

I also checked one more aspect -- TurboBoost works only when the CPU runs at
the highest EIST frequency (P0 state). I've reduced dev.cpu.0.freq from 3201 to
3067 and repeated the test:

x C1
+ C2
* C3
[ministat ASCII histogram for the three data sets; layout not recoverable]
    N           Min           Max        Median           Avg        Stddev
x  15         13.72         13.73         13.72     13.723333  0.0048795004
+  15         13.79         13.82          13.8     13.803333  0.0072374686
Difference at 95.0% confidence
        0.08 +/- 0.00461567
        0.582949% +/- 0.0336337%
        (Student's t, pooled s = 0.00617213)
*  15         13.89          13.9         13.89        13.894  0.0050709255
Difference at 95.0% confidence
        0.170667 +/- 0.00372127
        1.24362% +/- 0.0271164%
        (Student's t, pooled s = 0.00497613)

In that case using C2 or C3 predictably caused a small performance reduction,
since after falling asleep the CPU needs time to wake up. Even if the tested
CPU0 never sleeps during the test, its TLB shootdown IPIs to other cores
probably still suffer from waiting for those cores to wake up. Obviously, in the
first test these 0.58% and 1.24% were subtracted from TurboBoost's maximal
benefit of 4.3% on this CPU.
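
For readers who want to re-derive the percentages, the relative figures that
ministat prints follow from the averages alone. The sketch below is only an
illustration of that arithmetic; the function names pct_diff() and boost_pct()
are made up for this note, and the 3200 MHz nominal clock is an assumption
inferred from the "266MHz (8.3%)" figure quoted earlier in the thread.

```c
#include <assert.h>

/*
 * Relative change of `other` with respect to `base`, in percent.
 * Hypothetical helper, not part of ministat.
 */
double
pct_diff(double base, double other)
{
	assert(base != 0.0);
	return ((other - base) / base * 100.0);
}

/*
 * Frequency headroom in percent: extra MHz relative to the nominal clock.
 * The nominal clock value passed in is an assumption of this sketch.
 */
double
boost_pct(double nominal_mhz, double extra_mhz)
{
	assert(nominal_mhz > 0.0);
	return (extra_mhz / nominal_mhz * 100.0);
}
```

With the averages above, pct_diff(12.698667, 12.351333) gives about -2.735
(C1 versus C2 in the first test), and boost_pct(3200, 133) gives about 4.16,
close to the one-bin ceiling mentioned in the thread.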
-- Alexander Motin From owner-freebsd-hackers@FreeBSD.ORG Mon Jul 26 17:53:56 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4B91A1065674 for ; Mon, 26 Jul 2010 17:53:56 +0000 (UTC) (envelope-from rwmaillists@googlemail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id D09BF8FC17 for ; Mon, 26 Jul 2010 17:53:55 +0000 (UTC) Received: by wyj26 with SMTP id 26so2847735wyj.13 for ; Mon, 26 Jul 2010 10:53:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:subject :message-id:in-reply-to:references:x-mailer:mime-version :content-type:content-transfer-encoding; bh=GiYkcUGb5QlxxMiNTgWsntXXs7CQ8p5NUnU9HoLq+2o=; b=RHVgfEkrQsr3A4fGGFwt49QAgfYcX47e4LAPnlJcrPqJlC/zKpzoUfJfaaRZiaZamu Z6jJoPMzx5iVd7kErH/J9D9BOhkU27CMBdVKSObcUBOTLglvvv0WH2NDp6djEkkCn4hd 9MpLs+eXKJTz37o3kaVWIQm3K0EKW5Z5QdVTU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=date:from:to:subject:message-id:in-reply-to:references:x-mailer :mime-version:content-type:content-transfer-encoding; b=e/bBCOjwvFxXJd1ehmvNCCm1Uwzu30OjldZhi8wILA2QgpZTW7RpzfvegyPKqZlIgf wWNVvUc0UbuA+M6Z2ckk4wnR6im14MGGRTxWNSpMoALfTv+pLeVI7o6JC88f4Dmpg0aJ 68Yp2tINY30cVh62vFnSF2uvvYQDmbwITyy3k= Received: by 10.227.144.129 with SMTP id z1mr7691304wbu.85.1280166834752; Mon, 26 Jul 2010 10:53:54 -0700 (PDT) Received: from gumby.homeunix.com (bb-87-81-140-128.ukonline.co.uk [87.81.140.128]) by mx.google.com with ESMTPS id l6sm2105804wed.25.2010.07.26.10.53.49 (version=SSLv3 cipher=RC4-MD5); Mon, 26 Jul 2010 10:53:51 -0700 (PDT) Date: Mon, 26 Jul 2010 18:53:48 +0100 From: RW To: freebsd-hackers@freebsd.org Message-ID: <20100726185348.63ebf916@gumby.homeunix.com> In-Reply-To: <4C4CA1DC.2050902@freebsd.org> References: 
<4C4B4BAB.3000005@freebsd.org> <20100725003144.3cfead39@gumby.homeunix.com> <4C4C0CD9.6000002@freebsd.org> <20100725144141.6f1f33cc@gumby.homeunix.com> <4C4C47FD.6080802@freebsd.org> <20100725212849.1e07f40c@gumby.homeunix.com> <4C4CA1DC.2050902@freebsd.org> X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; i386-portbld-freebsd8.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Subject: Re: pageout question X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Jul 2010 17:53:56 -0000

On Sun, 25 Jul 2010 23:43:08 +0300
Andriy Gapon wrote:

> on 25/07/2010 23:28 RW said the following:

> > I didn't say it was guaranteed. I just think the scenario
> > where a first pass ends up between the watermarks is rare. And when
> > it happens I don't see a compelling reason to do extra paging to
> > reach an arbitrary target.
>
> Well, it seems neither I nor you have data to show whether it's rare
> or not (and it would greatly depend on workload too).
> As to "arbitrary target" - well, that's the whole point of
> hysteresis-like behavior. We start paging also at an "arbitrary"
> point.

If after the first pass with light-paging the high watermark isn't
reached then the choices are

1) loop and immediately do a heavy-paging pass.

2) wait and let the daemon get woken-up for another light-paging pass -
only go to heavy-paging when this strategy isn't keeping up with demand.

To me (2) is doing the right thing. It's trying to satisfy demand from
existing clean pages, and only paging heavily as a last resort.
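
The two positions in this exchange can be contrasted with a toy model. This is
a sketch under stated assumptions, not the kernel code: struct watermarks and
both function names are hypothetical, the thresholds are arbitrary numbers,
and the model ignores the fact that each scan pass already aims for the paging
target internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thresholds, in pages; not the kernel's actual tunables. */
struct watermarks {
	int low;	/* paging is requested below this level */
	int high;	/* paging target */
};

/*
 * Two-threshold hysteresis: once paging has started it stays requested
 * until the high watermark is reached.
 */
bool
hysteresis_step(const struct watermarks *w, bool paging, int avail)
{
	assert(w->low <= w->high);
	if (!paging && avail < w->low)
		return (true);
	if (paging && avail >= w->high)
		return (false);
	return (paging);
}

/*
 * Single-threshold behaviour: the request is dropped as soon as the
 * available count is no longer below the low watermark.
 */
bool
single_threshold_step(const struct watermarks *w, bool paging, int avail)
{
	(void)paging;	/* previous state does not matter here */
	return (avail < w->low);
}
```

With low = 100 and high = 200, a pass that ends at 150 available pages keeps
the request set in the first model and clears it in the second, which is the
"between the watermarks" case being argued about.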

From owner-freebsd-hackers@FreeBSD.ORG Mon Jul 26 19:00:33 2010
Message-ID: <4C4DDB4B.9000307@freebsd.org>
Date: Mon, 26 Jul 2010 22:00:27 +0300
From: Andriy Gapon
To: RW, freebsd-hackers@freebsd.org
In-Reply-To: <4C4CA1DC.2050902@freebsd.org>
Subject: Re: pageout question

on 25/07/2010 23:43 Andriy Gapon said the following:
> on 25/07/2010 23:28 RW said the following:
>> I didn't say it was guaranteed. I just think the scenario where
>> a first pass ends up between the watermarks is rare. And when it
>> happens I don't see a compelling reason to do extra paging to reach an
>> arbitrary target.
>
> Well, it seems neither I nor you have data to show whether it's rare or not (and
> it would greatly depend on workload too).
> As to "arbitrary target" - well, that's the whole point of hysteresis-like
> behavior. We start paging also at an "arbitrary" point.

Well, it seems that you are right (at least to a certain degree) - with
"moderately high" memory load (starting lots of memory hungry "real"
applications and not letting them sit idle) a single pass was always sufficient.
Even with my suggested change! :-) I.e. that single pass was always able to
shoot to or over the high watermark. So, in fact, there is not much (any?)
difference between current code and patched code in this case.

But not quite so with the stress2 swap test. In that case more than one pass was
needed in almost all the cases. Again, this is with patched vm_pageout().

Which brings up another interesting point which was overlooked initially.
The vm_pageout() loop can make at most two passes back-to-back; after that it
slows down to make one additional pass every 1/2 second:

		if (vm_pages_needed) {
			/*
			 * Still not done, take a second pass without waiting
			 * (unlimited dirty cleaning), otherwise sleep a bit
			 * and try again.
			 */
			++pass;
			if (pass > 1)
				msleep(&vm_pages_needed,
				    &vm_page_queue_free_mtx, PVM, "psleep",
				    hz / 2);
		} else {

With the patched code and stress2 I indeed observed pagedaemon spending time in
this sleep.

On the other hand, current unpatched code is more optimistic about calling it
done. So even if only a handful of pages is freed and available memory goes
just above the low watermark, pagedaemon would decide that it had a successful
pass and would reset the pass count to zero. Those freed pages would, of course,
get consumed immediately and a new pass would be requested. Since the history
is lost at this point, there would be no rate limit for the new pass.

So my _theory_ is that in very harsh conditions doing true hysteresis would
result in many _accounted_ passes and thus a throttled-down pagedaemon. On the
other hand, the current code would still do many passes because of the constant
memory pressure, but they will be (mostly) unaccounted and thus pagedaemon would
be scanning pages 'like crazy'.
In other words: with current code the available page count would rapidly
oscillate around the low watermark, while with patched code the available page
count would mostly stay low.
Not sure which one is better. But for me, in such extreme conditions, slowing
things down sounds better than spinning pagedaemon.

P.S.
Just in case, I would like to point out that the patch doesn't change the
condition under which the waiters are notified about available memory - it is
still !vm_page_count_min(). The patch only changes when vm_pages_needed is
reset. This is kind of obvious, but I decided to make it explicit.

--
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG Mon Jul 26 19:59:35 2010
Message-ID: <4C4DE923.5030307@freebsd.org>
Date: Mon, 26 Jul 2010 22:59:31 +0300
From: Andriy Gapon
To: RW
Cc: freebsd-hackers@freebsd.org
In-Reply-To: <20100726185348.63ebf916@gumby.homeunix.com>
Subject: Re: pageout question

on 26/07/2010 20:53 RW said the following:
> If after the first pass with light-paging the high watermark isn't
> reached then the choices are
>
> 1) loop and immediately do a heavy-paging pass.
>
> 2) wait and let the daemon get woken-up for another light-paging pass -
> only go to heavy-paging when this strategy isn't keeping up with demand.
>
> To me (2) is doing the right thing. It's trying to satisfy demand from
> existing clean pages, and only paging heavily as a last resort.

Well, based on my observations, if the first pass doesn't reach the high
watermark, then we are in a high pressure situation and so we would have to do
some heavy-lifting anyways. In my opinion, it's better to start doing more work
at once than trying to pretend that situation would somehow resolve itself.

--
Andriy Gapon

From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 27 00:53:34 2010
Message-ID: <4C4E2DFF.1010203@delphij.net>
Date: Mon, 26 Jul 2010 17:53:19 -0700
From: Xin LI
Organization: The Geek China Organization
Reply-To: d@delphij.net
To: Frederic Culot
Cc: hackers@freebsd.org
In-Reply-To: <20100725093935.GC1917@culot.org>
Subject: Re: lint(1) improvements from OpenBSD

On 2010/07/25 02:39, Frederic Culot wrote:
> Hi,
>
> I noticed on the FreeBSD list of projects and ideas an item related to
> lint(1) and the port of improvements from the OpenBSD project:
>
> http://www.freebsd.org/projects/ideas/ideas.html#p-lint
>
> I would like to know more about this project but unfortunately no technical
> contact was specified on the web page, hence I write to the hackers list.
>
> Does someone have more information related to this project (what improvements
> does the text refer to)?
> Has someone started working on it?

I think it's talking about OpenBSD's xlint (src/usr.bin/xlint). No, I am
not aware of anyone working on this.

Cheers,
--
Xin LI http://www.delphij.net/
FreeBSD - The Power to Serve!
Live free or die

From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 27 15:54:07 2010
Date: Tue, 27 Jul 2010 16:29:20 +0100
From: krad
To: freebsd-hackers@freebsd.org, FreeBSD Questions
Subject: possible NFS lockups

I have a production mail system with an nfs backend. Every now and again we
see the nfs die on a particular head end. However, it doesn't die across all
the nodes. This suggests to me there isn't an issue with the filer itself, and
the stats from the filer concur with that.

The symptoms are lines like this appearing in dmesg:

nfs server 10.44.17.138:/vol/vol1/mail: not responding
nfs server 10.44.17.138:/vol/vol1/mail: is alive again

Trussing df, it seems to hang on getfsstat; this is presumably when it tries
the nfs mounts, e.g.

__sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0)
mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1746583552 (0x681ac000)
mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1747632128 (0x682ac000)
munmap(0x681ac000,344064) = 0 (0x0)
getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9)

I have played with mount options a fair bit but they don't make much
difference. This is what they are set to at present:

10.44.17.138:/vol/vol1/mail /mail/0 nfs rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0 0

When this locking is occurring I find that if I do a show mount or mount
10.44.17.138:/vol/vol1/mail again under another mount point I can access it
fine.

One thing I have just noticed is that lockd and statd always seem to have
died when this happens. Restarting does not help.

I find all this a bit perplexing. Can anyone offer any help as to why this
might be happening? I have dtrace compiled into the kernel if that could help
with debugging.

From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 27 17:17:47 2010
In-Reply-To: <4C4D9779.8080505@FreeBSD.org>
Date: Tue, 27 Jul 2010 12:17:46 -0500
From: Alan Cox
To: Alexander Motin
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org, Robert Watson, Rui Paulo
Subject: Re: Intel TurboBoost in practice
Reply-To: alc@freebsd.org

On Mon, Jul 26, 2010 at 9:11 AM, Alexander Motin wrote:

> Robert Watson wrote:
> > On Sun, 25 Jul 2010, Alexander Motin wrote:
> >>> The numbers that you are showing doesn't show much difference. Have
> >>> you tried buildworld?
> >>
> >> If you mean relative difference -- as I have told, it's mostly because
> >> of my CPU. It's maximal boost is 266MHz (8.3%), but 133MHz of them is
> >> enabled most of time if CPU is not overheated. It probably doesn't, as
> >> it works on clear table under air conditioner. So maximal effect I can
> >> expect on is 4.2%. In such situation 2.8% probably not so bad to
> >> illustrate that feature works and there is space for further
> >> improvements. If I had Core i5-750S I would expect 33% boost.
> >
> > Can I recommend the use of ministat(1) and sample sizes of at least 8
> > runs per configuration?
>
> Thanks for pushing me to do it right. :) Here is 3*15 runs with fresh
> kernel with disabled debug. Results are quite close to original: -2.73%
> and -2.19% of time.
> x C1
> + C2
> * C3
> [ministat ASCII distribution plot; column positions not recoverable]
>     N           Min           Max        Median           Avg        Stddev
> x  15         12.68         12.84         12.69     12.698667   0.039254966
> +  15         12.35         12.36         12.35     12.351333  0.0035186578
> Difference at 95.0% confidence
>         -0.347333 +/- 0.0208409
>         -2.7352% +/- 0.164119%
>         (Student's t, pooled s = 0.0278687)
> *  15         12.41         12.44         12.42         12.42  0.0075592895
> Difference at 95.0% confidence
>         -0.278667 +/- 0.0211391
>         -2.19446% +/- 0.166467%
>         (Student's t, pooled s = 0.0282674)
>
> I also checked one more aspect -- TurboBoost works only when CPU runs at
> highest EIST frequency (P0 state). I've reduced dev.cpu.0.freq from 3201
> to 3067 and repeated the test:
> x C1
> + C2
> * C3
> [ministat ASCII distribution plot; column positions not recoverable]
>     N           Min           Max        Median           Avg        Stddev
> x  15         13.72         13.73         13.72     13.723333  0.0048795004
> +  15         13.79         13.82          13.8     13.803333  0.0072374686
> Difference at 95.0% confidence
>         0.08 +/- 0.00461567
>         0.582949% +/- 0.0336337%
>         (Student's t, pooled s = 0.00617213)
> *  15         13.89          13.9         13.89        13.894  0.0050709255
> Difference at 95.0% confidence
>         0.170667 +/- 0.00372127
>         1.24362% +/- 0.0271164%
>         (Student's t, pooled s = 0.00497613)
>
> In that case using C2 or C3 predictably caused small performance reduce,
> as after falling to sleep, CPU needs time to wakeup. Even if tested CPU0
> won't ever sleep during test, it's TLB shutdown IPIs to other cores
> still probably could suffer from waiting other cores' wakeup.

In the deeper sleep states, are the TLB contents actually maintained while
the processor sleeps? (I notice that in some configurations, we actually
flush dirty data from the cache before sleeping.)

Alan

From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 27 17:59:26 2010
From: Doug Ambrisko
Message-Id: <201007271742.o6RHgB6M070017@ambrisko.com>
To: Garrett Cooper
Date: Tue, 27 Jul 2010 10:42:11 -0700 (PDT)
Cc: FreeBSD-Hackers
Subject: Re: Set default pxeboot vfs.root.mountfrom to nfs?

Garrett Cooper writes:
| Hi Hackers,
|     I realize this is a trivial patch, but it's a minor item that I
| found kind of fascinating (and not thoroughly documented elsewhere
| because many examples are booting mfsroots instead of directly booting
| off nfs roots), but I'm proposing that pxeboot default to
| vfs.root.mountfrom="nfs" to reduce the need for special case
| loader.conf files just for pxe booting (and thus, enable
| out-of-the-box netbooting ^o^!!!).
| Thoughts?
|
| Index: boot/i386/libi386/pxe.c
| ===================================================================
| --- boot/i386/libi386/pxe.c     (revision 209563)
| +++ boot/i386/libi386/pxe.c     (working copy)
| @@ -308,6 +308,7 @@
|                 }
|                 setenv("boot.nfsroot.server", inet_ntoa(rootip), 1);
|                 setenv("boot.nfsroot.path", rootpath, 1);
| +               setenv("vfs.root.mountfrom", "nfs", 0);
|                 setenv("dhcp.host-name", hostname, 1);
|             }
|      }

Interesting, are you looking at my patch from work, or did you come up with the
same thing? We have had this patch here for years. I haven't checked it in
because I wanted to track down why it wasn't done in the first place, so that I
didn't break any assumptions.

FWIW, I have seen no issues with the patch in either NFS boots or MFS roots.

Doug A.

From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 27 18:44:55 2010
Message-ID: <4C4F2921.5030604@FreeBSD.org>
Date: Tue, 27 Jul 2010 21:44:49 +0300
From: Alexander Motin
To: alc@freebsd.org
Cc: freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: Intel TurboBoost in practice

Alan Cox wrote:
> On Mon, Jul 26, 2010 at 9:11 AM, Alexander Motin wrote:
>
>     In that case using C2 or C3 predictably caused small performance reduce,
>     as after falling to sleep, CPU needs time to wakeup. Even if tested CPU0
>     won't ever sleep during test, it's TLB shutdown IPIs to other cores
>     still probably could suffer from waiting other cores' wakeup.
>
> In the deeper sleep states, are the TLB contents actually maintained
> while the processor sleeps? (I notice that in some configurations, we
> actually flush dirty data from the cache before sleeping.)

As I understand it, we flush caches only as a last resort, if the platform does
not support special techniques such as disabling arbitration or making the CPU
wake up on bus mastering. But the same ACPI C-states could map onto different
CPU C-states. Some of these CPU states (like C6) could imply cache
invalidation, though I am not sure it can be seen from outside. The ACPI 3.0
specification says nothing about TLBs, so I am not sure we can count on their
invalidation unless we do it ourselves, as is done for caches when the CPU
can't keep them coherent while sleeping.
-- Alexander Motin From owner-freebsd-hackers@FreeBSD.ORG Tue Jul 27 19:55:44 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1EFEC1065679; Tue, 27 Jul 2010 19:55:44 +0000 (UTC) (envelope-from kraduk@googlemail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 7B6CC8FC0C; Tue, 27 Jul 2010 19:55:43 +0000 (UTC) Received: by fxm13 with SMTP id 13so902905fxm.13 for ; Tue, 27 Jul 2010 12:55:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:content-type; bh=rWOz8nNTQrOgYW0+uBnP1vVr1kQ+FGW9soiIlD9YH1Y=; b=jWpt2IsMr8ji1teqnvJMuAVOqaBF7/d+bJD7f2HzYLpofzScxxytBTAvRG2JZSzLrZ jUZk4im/Y8knMz0DQ4l4rQYVMHgw+p4fSf2etlXhF0y5Xii8g3cupBcOUz5SIk5xfvAq D7yph8fjDKhHXPHfJ5oC426HAks6OtqcS7Qlw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=o2GC10U3LVUpaZERgwNIdmsH+Af35E+SBF3pqnLamLfTK0Qm/mJAiZbG+PCzsu9YKX XPGs+S3i2us5fG3GIGrm5KzaSBHrjCM+TTptYU63MpRLdY8DyDxV1FerGAs8J04pbF8m pVhQB3TjaBhzapkqgWA5QvULQWHgPKwaDI91I= MIME-Version: 1.0 Received: by 10.239.154.204 with SMTP id f12mr585988hbc.143.1280260542150; Tue, 27 Jul 2010 12:55:42 -0700 (PDT) Received: by 10.239.160.201 with HTTP; Tue, 27 Jul 2010 12:55:42 -0700 (PDT) In-Reply-To: References: Date: Tue, 27 Jul 2010 20:55:42 +0100 Message-ID: From: krad To: freebsd-hackers@freebsd.org, FreeBSD Questions X-Mailman-Approved-At: Tue, 27 Jul 2010 21:01:04 +0000 Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: Re: possible NFS lockups X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: 
Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Jul 2010 19:55:44 -0000 On 27 July 2010 16:29, krad wrote: > I have a production mail system with an nfs backend. Every now and again we > see the nfs die on a particular head end. However it doesn't die across all > the nodes. This suggests to me there isnt an issue with the filer itself and > the stats from the filer concur with that. > > The symptoms are lines like this appearing in dmesg > > nfs server 10.44.17.138:/vol/vol1/mail: not responding > nfs server 10.44.17.138:/vol/vol1/mail: is alive again > > trussing df it seems to hang on getfsstat, this is presumably when it tries > the nfs mounts > > eg > > __sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0) > mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = > 1746583552 (0x681ac000) > mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = > 1747632128 (0x682ac000) > munmap(0x681ac000,344064) = 0 (0x0) > getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9) > > > I have played with mount options a fair bit but they dont make much > difference. This is what they are set to at present > > 10.44.17.138:/vol/vol1/mail /mail/0 nfs > rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0 0 > > When this locking is occuring I find that if I do a show mount or mount > 10.44.17.138:/vol/vol1/mail again under another mount point I can access > it fine. > > One thing I have just noticed is that lockd and statd always seem to have > died when this happens. Restarting does not help > > > I find all this a bit perplexing. Can anyone offer any help into why this > might be happening. 
I have dtrace compliled into the kernel if that could > help with debugging > sorry i missed a bit of critical info # uname -a FreeBSD X 8.1-STABLE FreeBSD 8.1-STABLE #2: Mon Jul 26 16:10:19 BST 2010 root@mk-pimap-7.b2b.uk.tiscali.com:/usr/obj/usr/src/sys/DTRACE i From owner-freebsd-hackers@FreeBSD.ORG Wed Jul 28 07:18:44 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8D7031065672 for ; Wed, 28 Jul 2010 07:18:44 +0000 (UTC) (envelope-from asimex@gmx.net) Received: from mail.gmx.net (mailout-de.gmx.net [213.165.64.22]) by mx1.freebsd.org (Postfix) with SMTP id EB3478FC14 for ; Wed, 28 Jul 2010 07:18:43 +0000 (UTC) Received: (qmail 2007 invoked by uid 0); 28 Jul 2010 06:52:02 -0000 Received: from 212.118.142.74 by www022.gmx.net with HTTP; Wed, 28 Jul 2010 08:52:02 +0200 (CEST) Content-Type: text/plain; charset="utf-8" Date: Wed, 28 Jul 2010 08:52:01 +0200 From: "Andreas Feid" In-Reply-To: Message-ID: <20100728065201.234030@gmx.net> MIME-Version: 1.0 References: To: krad , freebsd-questions@freebsd.org, freebsd-hackers@freebsd.org X-Authenticated: #138425 X-Flags: 0001 X-Mailer: WWW-Mail 6100 (Global Message Exchange) X-Priority: 3 X-Provags-ID: V01U2FsdGVkX18dDfFVGfYdia+o8sWzdtp/V94DshEznloD4fKIZ3 fUPI8aJEZZlEwjwBUvkc9dkDi35aL7ebKYmg== Content-Transfer-Encoding: 8bit X-GMX-UID: put6eCARRkkNbs1mcWRqSIdudWkvKNM6 X-FuHaFi: 0.51000000000000001 Cc: Subject: Re: possible NFS lockups X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Jul 2010 07:18:44 -0000 I have a few remarks and questions; what happens when the system is in this state? 
Is your access to the mount restored after a while, or do you need to remount? Under normal conditions the access should be restored automatically. The error message by itself indicates a busy server and should clear up after a while, as you have seen. How frequently do you see the error: once per hour, once per day? When you say filer, I assume you are talking about a NetApp filer; it might be worth taking a perfstat when the error happens and while the condition persists. I do not think dtrace will really help, since this looks like a server issue to me. As the filer is used to store mail, I assume we are talking about qmail or a similar environment with a huge number of small files, so I would like to know what the directory structure looks like on the filer. If possible, get a perfstat and send it to me off-list together with the directory structure, and I will have a look.

-Andreas

-------- Original Message --------
> Date: Tue, 27 Jul 2010 20:55:42 +0100
> From: krad
> To: freebsd-hackers@freebsd.org, FreeBSD Questions
> Subject: Re: possible NFS lockups
> On 27 July 2010 16:29, krad wrote:
>
> > I have a production mail system with an nfs backend. Every now and again we
> > see the nfs die on a particular head end. However it doesn't die across all
> > the nodes. This suggests to me there isn't an issue with the filer itself and
> > the stats from the filer concur with that.
> > > > The symptoms are lines like this appearing in dmesg > > > > nfs server 10.44.17.138:/vol/vol1/mail: not responding > > nfs server 10.44.17.138:/vol/vol1/mail: is alive again > > > > trussing df it seems to hang on getfsstat, this is presumably when it > tries > > the nfs mounts > > > > eg > > > > __sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0) > > mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = > > 1746583552 (0x681ac000) > > mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) > = > > 1747632128 (0x682ac000) > > munmap(0x681ac000,344064) = 0 (0x0) > > getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9) > > > > > > I have played with mount options a fair bit but they dont make much > > difference. This is what they are set to at present > > > > 10.44.17.138:/vol/vol1/mail /mail/0 nfs > > rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0 > 0 > > > > When this locking is occuring I find that if I do a show mount or mount > > 10.44.17.138:/vol/vol1/mail again under another mount point I can access > > it fine. > > > > One thing I have just noticed is that lockd and statd always seem to > have > > died when this happens. Restarting does not help > > > > > > I find all this a bit perplexing. Can anyone offer any help into why > this > > might be happening. I have dtrace compliled into the kernel if that > could > > help with debugging > > > > sorry i missed a bit of critical info > > # uname -a > FreeBSD X 8.1-STABLE FreeBSD 8.1-STABLE #2: Mon Jul 26 16:10:19 BST 2010 > root@mk-pimap-7.b2b.uk.tiscali.com:/usr/obj/usr/src/sys/DTRACE i > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" -- GRATIS für alle GMX-Mitglieder: Die maxdome Movie-FLAT! 
Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 06:01:11 2010 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A34C6106564A for ; Thu, 29 Jul 2010 06:01:11 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 700E38FC08 for ; Thu, 29 Jul 2010 06:01:11 +0000 (UTC) Received: by iwn35 with SMTP id 35so300960iwn.13 for ; Wed, 28 Jul 2010 23:01:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:date:message-id :subject:from:to:content-type; bh=ylIhYcZE9IGSSLxwskrRAXRMkjbc9szOSqHUOoj3veE=; b=haUdb5OR5IQY/16R9uFEF6Nhxq8Qw1CulRZJYPA0n/Va2dgZHtS/EuKYSbLrEfIDZl X4nXClzqsI/edHxdrIaFVhZcx24JuXW+h12j35UE1s5zdsmsmFealQXUjOJvfXmDnO6n aVvCoL6XngSmVwIgmWTzChZBeMGETarKWjmyk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=PSKZGLdLIs6OOs0C/PHXPRRhRs7FUmOaLPSBYLeF1S0Nku8HT7GUgX9FywtpO9Ezuf aUD3dqt0kJz3GuM4Zr/hqtTi+gydkhDOStDX2bKY+e38MTmHD3+1VbzcC6OAcUKqaHmo qhtPcAoanQEc6s8GVVIy0pK+MKphBGPy49Xww= MIME-Version: 1.0 Received: by 10.231.59.13 with SMTP id j13mr13391175ibh.77.1280383270838; Wed, 28 Jul 2010 23:01:10 -0700 (PDT) Received: by 10.231.169.18 with HTTP; Wed, 28 Jul 2010 23:01:10 -0700 (PDT) Date: Wed, 28 Jul 2010 23:01:10 -0700 Message-ID: From: Garrett Cooper To: hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: Subject: nanosleep - does it make sense with tv_sec < 0? 
X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 06:01:11 -0000

Hi Hackers,
    I ran into an oddity with the POSIX spec that seems a bit unrealistic:

    [EINVAL]
        The rqtp argument specified a nanosecond value less than zero or
        greater than or equal to 1000 million.

    It seems like this should also apply to seconds < 0. We currently let that case pass silently in kern/kern_time.c:kern_nanosleep:

int
kern_nanosleep(struct thread *td, struct timespec *rqt, struct timespec *rmt)
{
	struct timespec ts, ts2, ts3;
	struct timeval tv;
	int error;

	if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000)
		return (EINVAL);
	if (rqt->tv_sec < 0 || (rqt->tv_sec == 0 && rqt->tv_nsec == 0)) // <-- first clause here
		return (0);

but I'm wondering whether it makes logical sense for us to do this (sleep for a negative amount of time?)... FWIW, Linux returns -1 and sets errno to EINVAL in this case, which makes more sense to me.
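To make the two policies easier to compare, here is a minimal userland sketch. It only mirrors the validation logic quoted from kern_nanosleep and does not call into the kernel. The names rqt_action, check_rqt_current() and check_rqt_strict() are made up for illustration and do not exist in the tree; the "strict" variant is just one possible reading of the behaviour described for Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Outcome of validating a requested sleep interval (hypothetical type). */
enum rqt_action {
	RQT_SLEEP,		/* request is valid, caller should sleep */
	RQT_RETURN_NOW,		/* nothing to wait for, return 0 at once */
	RQT_INVALID		/* reject the request with EINVAL */
};

/*
 * Hypothetical helper: same checks, in the same order, as the excerpt
 * quoted from kern_nanosleep. A negative tv_sec is accepted silently.
 */
enum rqt_action
check_rqt_current(const struct timespec *rqt)
{
	assert(rqt != NULL);
	if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000L)
		return (RQT_INVALID);
	if (rqt->tv_sec < 0 || (rqt->tv_sec == 0 && rqt->tv_nsec == 0))
		return (RQT_RETURN_NOW);
	return (RQT_SLEEP);
}

/*
 * Hypothetical helper: stricter variant in which a negative tv_sec is
 * treated like an out-of-range tv_nsec.
 */
enum rqt_action
check_rqt_strict(const struct timespec *rqt)
{
	assert(rqt != NULL);
	if (rqt->tv_sec < 0 ||
	    rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000L)
		return (RQT_INVALID);
	if (rqt->tv_sec == 0 && rqt->tv_nsec == 0)
		return (RQT_RETURN_NOW);
	return (RQT_SLEEP);
}
```

The only input on which the two helpers disagree is a timespec with tv_sec < 0 and a valid tv_nsec, which is exactly the case in question.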
Thanks, -Garrett From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 06:26:54 2010 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2CADC1065672; Thu, 29 Jul 2010 06:26:54 +0000 (UTC) (envelope-from yanegomi@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id CF8308FC14; Thu, 29 Jul 2010 06:26:53 +0000 (UTC) Received: by iwn35 with SMTP id 35so326475iwn.13 for ; Wed, 28 Jul 2010 23:26:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:date:message-id :subject:from:to:cc:content-type; bh=PEU5dOx2jKlgYHOSv1Mr0Zmf6a772ZRXY75on/0Ca/k=; b=XrIsrEoBzk5X2AmErw87EpDuTe40lxcEb0XpFl5eotJ/SQdpPD2LmnvZS/S3SvxUhu quyRr7QbR0zVx16WySH0Vkadnxj6mGNrKKYuwW1AA07lHtxYwEBVAvJZAbfI8jqDdbNT Kweja7rSC2jFwnP1IDBg6UebD1NDq+NQE2xuw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:cc:content-type; b=P1Ymru1q0qwHX0rTZbN3hNpqrf3q52txZwW/jxHJ+cKhSx4aTJGN+W4bcChlkYNHWj nuPZyCsfa0UAXIHjxlapurBH1DwnelSLobRMkJF6dzdhthJUB3LHo575gT07CBv6q+dp XaT+o9UHrkm4N5gpn3wlgO9i/Z3C5IzQ3BFd8= MIME-Version: 1.0 Received: by 10.231.184.68 with SMTP id cj4mr13562211ibb.93.1280384812889; Wed, 28 Jul 2010 23:26:52 -0700 (PDT) Received: by 10.231.169.18 with HTTP; Wed, 28 Jul 2010 23:26:52 -0700 (PDT) Date: Wed, 28 Jul 2010 23:26:52 -0700 Message-ID: From: Garrett Cooper To: hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: standards@freebsd.org Subject: Deterministic failure to meet sysconf(_SC_TIMER_MAX) for CLOCK_REALTIME X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Thu, 29 Jul 2010 06:26:54 -0000 Hi, Running the following noted test [1], I always run into issues on the 29th iteration and EAGAIN: $ conformance/behavior/timers/1-1.run-test timer_create() did not return success for iteration 29: Resource temporarily unavailable $ conformance/behavior/timers/1-1.run-test timer_create() did not return success for iteration 29: Resource temporarily unavailable $ conformance/behavior/timers/1-1.run-test timer_create() did not return success for iteration 29: Resource temporarily unavailable $ conformance/behavior/timers/1-1.run-test timer_create() did not return success for iteration 29: Resource temporarily unavailable Interestingly enough, sysconf(_SC_TIMER_MAX) returns 54; this is the requirement that the test is attempting to validate (that at least _SC_TIMER_MAX timers can be created via timer_create). The timers kernel code is capped to 25 by default, by a preprocessor define in .../sys/sysctl.h: /sys/sys/sysctl.h:#define CTL_P1003_1B_TIMER_MAX 25 /* int */ Doesn't make sense why an additional 4 timers were created. Oh, and the sysctl reports something else entirely: p1003_1b.timers: 200112 p1003_1b.delaytimer_max: 2147483647 p1003_1b.timer_max: 32 So, what number is the source of truth and why don't they all match? Thanks! -Garrett PS I'm still running a CURRENT kernel based off of r206173... 
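For comparing the numbers above, here is a small sketch of how the run-time value can be read so that "no limit reported" is not confused with a failure. query_timer_max() is a made-up helper name; the only real interface used is sysconf(3). It does not explain why the three numbers disagree, it just pins down which one a program actually sees.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the run-time timer limit via sysconf(3).
 * Returns  1 and stores the limit in *out if the system reports one,
 *          0 if the limit is indeterminate (sysconf returned -1 and
 *            left errno untouched),
 *         -1 if sysconf failed and set errno.
 */
int
query_timer_max(long *out)
{
	long v;

	assert(out != NULL);
	errno = 0;		/* needed to tell "no limit" from an error */
	v = sysconf(_SC_TIMER_MAX);
	if (v == -1)
		return (errno == 0 ? 0 : -1);
	*out = v;
	return (1);
}
```

A conformance test would presumably only loop up to the stored value when the helper returns 1, and skip the check otherwise.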
[1] http://ltp.git.sourceforge.net/git/gitweb.cgi?p=ltp/ltp-dev.git;a=blob;f=testcases/open_posix_testsuite/conformance/behavior/timers/1-1.c;h=ac043b0913e93f8db93cc74e249316f5ff82bdc8;hb=HEAD From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 09:57:21 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BFE021065670 for ; Thu, 29 Jul 2010 09:57:21 +0000 (UTC) (envelope-from rhfb@akira.stdio.com) Received: from akira.stdio.com (akira.stdio.com [204.152.114.29]) by mx1.freebsd.org (Postfix) with SMTP id 8C5848FC18 for ; Thu, 29 Jul 2010 09:57:11 +0000 (UTC) Received: from akira (localhost [127.0.0.1]) by akira.stdio.com (Postfix) with SMTP id AD3F3C2 for ; Thu, 29 Jul 2010 05:39:58 -0400 (EDT) Date: Thu Jul 29 05:41:06 EDT 2010 In-Reply-To: From: To: References: Message-Id: <20100729094046.AD3F3C2@akira.stdio.com> Subject: (no subject) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 09:57:21 -0000 I have a similar problem. I have a NFS server (8.0 upgraded a couple times since Feb 2010) that locks up and requires a reboot. The clients are busy vm's from VMWare ESXi using the NFS server for vmdk virtual disk storage. The ESXi reports nfs server inactive and all the vm's post disk write errors when trying to write to their disk. /etc/rc.d/nfsd restart fails to work (it can not kill the nfsd process) The nfsd process runs at 100% cpu at rc_lo state in top. reboot is the only fix. It has only happened under two circumstances. 1) Installation of a VM using Windows 2008. 2) Migrating 16 million mail messages from a physical server to a VM running FreeBSD with ZFS file system as a VM on the ESXi box that uses NFS to store the VM's ZFS disk. 
The NFS server uses ZFS also.
From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 14:43:30 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ABC0A106566B for ; Thu, 29 Jul 2010 14:43:29 +0000 (UTC) (envelope-from pebu3op@googlemail.com) Received: from mail.net.t-labs.tu-berlin.de (mail.net.t-labs.tu-berlin.de [130.149.220.252]) by mx1.freebsd.org (Postfix) with ESMTP id 253028FC17 for ; Thu, 29 Jul 2010 14:43:28 +0000 (UTC) Received: from raven.net.t-labs.tu-berlin.de (raven.net.t-labs.tu-berlin.de [130.149.220.18]) by mail.net.t-labs.tu-berlin.de (Postfix) with ESMTP id B680570015BA for ; Thu, 29 Jul 2010 16:13:16 +0200 (CEST) From: Alexander Fiveg Organization: Google To: freebsd-hackers@freebsd.org Date: Thu, 29 Jul 2010 16:13:15 +0200 User-Agent: KMail/1.9.10 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <201007291613.15719.pebu3op@googlemail.com> Subject: coherence-problem on the mapped memory buffer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: pebu3op@googlemail.com List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 14:43:30 -0000

Hello hackers,

while working on the "ringmap" project I've run into a lack of coherency in memory regions mapped from the kernel into user space.

Details:
While integrating ringmap with the ixgbe driver, I've made some changes to ixgbe:
1. The mbufs for received packets are allocated only once.
2. The allocated mbufs are reused one after the other, as in a ring buffer (no new mbufs are allocated).
3. The packet buffers (mbuf->m_data) are mapped into user space.
So, the user-space process has access to the packets after their DMA transfer from the network adapter into RAM.

Problem:
Sometimes the user-space process does not see the newly DMAed data in the mapped packet buffer, but the OLD data that was previously stored in the same buffer. If I monitor the received data in the kernel, the kernel sees the data correctly. But sometimes it is the other way round: the user-space process sees the correct new data and the kernel sees the old data in the buffer. It seems that the memory buffer for packets is not synchronized with all CPUs' caches. Probably the [user|kernel] thread sometimes reads old, stale data from the cache of the CPU it is running on, while at the same time the other thread sees the new data in the same mapped buffer.

Can you please point me to any information that would help me avoid this unexpected coherence problem?

Alex

P.S. Details about hardware and used software:
1. /var/run/dmesg.boot :
...
CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0x20f10 Family = f Model = 21 Stepping = 0
  Features=0x178bfbff
  Features2=0x1
  AMD Features=0xe2500800
  AMD Features2=0x3
real memory = 3758030848 (3583 MB)
avail memory = 3677495296 (3507 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 4 package(s) x 2 core(s)
...

2. uname -v
FreeBSD 9.0-CURRENT #3

3. sysctl kern.osreldate
kern.osreldate: 900014

4.
//depot/projects/soc2010/ringmap/ From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 16:13:27 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5B71B1065675 for ; Thu, 29 Jul 2010 16:13:27 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id A95B58FC1D for ; Thu, 29 Jul 2010 16:13:26 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA26736; Thu, 29 Jul 2010 19:13:24 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4C51A8A3.7080808@icyb.net.ua> Date: Thu, 29 Jul 2010 19:13:23 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100517) MIME-Version: 1.0 To: pebu3op@googlemail.com References: <201007291613.15719.pebu3op@googlemail.com> In-Reply-To: <201007291613.15719.pebu3op@googlemail.com> X-Enigmail-Version: 0.95.7 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: coherence-problem on the mapped memory buffer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 16:13:27 -0000 on 29/07/2010 17:13 Alexander Fiveg said the following: > P.S. Details about hardware and used software: > 1. /var/run/dmesg.boot : > ... 
> CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU) > Origin = "AuthenticAMD" Id = 0x20f10 Family = f Model = 21 Stepping = 0 > > Features=0x178bfbff > Features2=0x1 > AMD Features=0xe2500800 > AMD Features2=0x3 > real memory = 3758030848 (3583 MB) > avail memory = 3677495296 (3507 MB) > ACPI APIC Table: > FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs > FreeBSD/SMP: 4 package(s) x 2 core(s) > ... > > 2. uname -v > FreeBSD 9.0-CURRENT #3 > > 3. sysctl kern.osreldate > kern.osreldate: 900014 > > 4. //depot/projects/soc2010/ringmap/ No help, but just curious - do use amd64 variant? If yes, can you reproduce the problem with i386? -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 16:45:48 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DDC101065746 for ; Thu, 29 Jul 2010 16:45:48 +0000 (UTC) (envelope-from pebu3op@googlemail.com) Received: from mail.net.t-labs.tu-berlin.de (mail.net.t-labs.tu-berlin.de [130.149.220.252]) by mx1.freebsd.org (Postfix) with ESMTP id A08E08FC08 for ; Thu, 29 Jul 2010 16:45:48 +0000 (UTC) Received: from raven.net.t-labs.tu-berlin.de (raven.net.t-labs.tu-berlin.de [130.149.220.18]) by mail.net.t-labs.tu-berlin.de (Postfix) with ESMTP id 3A92870015BA; Thu, 29 Jul 2010 18:45:47 +0200 (CEST) From: Alexander Fiveg Organization: Google To: Andriy Gapon Date: Thu, 29 Jul 2010 18:45:45 +0200 User-Agent: KMail/1.9.10 References: <201007291613.15719.pebu3op@googlemail.com> <4C51A8A3.7080808@icyb.net.ua> In-Reply-To: <4C51A8A3.7080808@icyb.net.ua> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <201007291845.46015.pebu3op@googlemail.com> Cc: freebsd-hackers@freebsd.org Subject: Re: coherence-problem on the mapped memory buffer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list Reply-To: pebu3op@googlemail.com List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 16:45:49 -0000 On Thursday 29 July 2010 18:13:23 Andriy Gapon wrote: > on 29/07/2010 17:13 Alexander Fiveg said the following: > > P.S. Details about hardware and used software: > > 1. /var/run/dmesg.boot : > > ... > > CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU) > > Origin = "AuthenticAMD" Id = 0x20f10 Family = f Model = 21 Stepping > > = 0 > > > > Features=0x178bfbff >MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT> Features2=0x1 > > AMD Features=0xe2500800 > > AMD Features2=0x3 > > real memory = 3758030848 (3583 MB) > > avail memory = 3677495296 (3507 MB) > > ACPI APIC Table: > > FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs > > FreeBSD/SMP: 4 package(s) x 2 core(s) > > ... > > > > 2. uname -v > > FreeBSD 9.0-CURRENT #3 > > > > 3. sysctl kern.osreldate > > kern.osreldate: 900014 > > > > 4. //depot/projects/soc2010/ringmap/ > > No help, but just curious - do use amd64 variant? > If yes, can you reproduce the problem with i386? No, my kernel is i386, but I will try test it with amd64. 
Thanks Alex From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 16:57:36 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CD0521065675 for ; Thu, 29 Jul 2010 16:57:36 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 26D548FC1B for ; Thu, 29 Jul 2010 16:57:35 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA27313; Thu, 29 Jul 2010 19:57:33 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4C51B2FD.6070702@icyb.net.ua> Date: Thu, 29 Jul 2010 19:57:33 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100517) MIME-Version: 1.0 To: pebu3op@googlemail.com References: <201007291613.15719.pebu3op@googlemail.com> <4C51A8A3.7080808@icyb.net.ua> In-Reply-To: <4C51A8A3.7080808@icyb.net.ua> X-Enigmail-Version: 0.95.7 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: freebsd-hackers@freebsd.org Subject: Re: coherence-problem on the mapped memory buffer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 16:57:36 -0000 on 29/07/2010 19:13 Andriy Gapon said the following: > on 29/07/2010 17:13 Alexander Fiveg said the following: >> P.S. Details about hardware and used software: >> 1. /var/run/dmesg.boot : >> ... 
>> CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU) >> Origin = "AuthenticAMD" Id = 0x20f10 Family = f Model = 21 Stepping = 0 >> >> Features=0x178bfbff >> Features2=0x1 >> AMD Features=0xe2500800 >> AMD Features2=0x3 >> real memory = 3758030848 (3583 MB) >> avail memory = 3677495296 (3507 MB) >> ACPI APIC Table: >> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs >> FreeBSD/SMP: 4 package(s) x 2 core(s) >> ... >> >> 2. uname -v >> FreeBSD 9.0-CURRENT #3 >> >> 3. sysctl kern.osreldate >> kern.osreldate: 900014 >> >> 4. //depot/projects/soc2010/ringmap/ In fact I have a suspicion that the problem might have to do with multiple mappings of the shared pages, but far from sure... Take a look at Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4 Programming the PAT; starting at the following words: «The PAT allows any memory type to be specified in the page tables, and therefore it is possible to have a single physical page mapped to two or more different linear addresses, each with different memory types. 
Intel does not support this practice...» -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 17:02:41 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8CCE01065680 for ; Thu, 29 Jul 2010 17:02:41 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id D9A4C8FC14 for ; Thu, 29 Jul 2010 17:02:40 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA27387; Thu, 29 Jul 2010 20:02:38 +0300 (EEST) (envelope-from avg@icyb.net.ua) Message-ID: <4C51B42D.1060402@icyb.net.ua> Date: Thu, 29 Jul 2010 20:02:37 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100517) MIME-Version: 1.0 To: pebu3op@googlemail.com References: <201007291613.15719.pebu3op@googlemail.com> <4C51A8A3.7080808@icyb.net.ua> <201007291845.46015.pebu3op@googlemail.com> In-Reply-To: <201007291845.46015.pebu3op@googlemail.com> X-Enigmail-Version: 0.95.7 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org Subject: Re: coherence-problem on the mapped memory buffer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 17:02:41 -0000 on 29/07/2010 19:45 Alexander Fiveg said the following: > On Thursday 29 July 2010 18:13:23 Andriy Gapon wrote: >> on 29/07/2010 17:13 Alexander Fiveg said the following: >>> P.S. Details about hardware and used software: >>> 1. /var/run/dmesg.boot : >>> ... 
>>> CPU: Dual Core AMD Opteron(tm) Processor 865 (1800.01-MHz 686-class CPU) >>> Origin = "AuthenticAMD" Id = 0x20f10 Family = f Model = 21 Stepping >>> = 0 >>> >>> Features=0x178bfbff>> MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT> Features2=0x1 >>> AMD Features=0xe2500800 >>> AMD Features2=0x3 >>> real memory = 3758030848 (3583 MB) >>> avail memory = 3677495296 (3507 MB) >>> ACPI APIC Table: >>> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs >>> FreeBSD/SMP: 4 package(s) x 2 core(s) >>> ... >>> >>> 2. uname -v >>> FreeBSD 9.0-CURRENT #3 >>> >>> 3. sysctl kern.osreldate >>> kern.osreldate: 900014 >>> >>> 4. //depot/projects/soc2010/ringmap/ >> No help, but just curious - do use amd64 variant? >> If yes, can you reproduce the problem with i386? > > No, my kernel is i386, but I will try test it with amd64. Oh, nevermind actually. -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 20:03:03 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 64BA71065789 for ; Thu, 29 Jul 2010 20:03:03 +0000 (UTC) (envelope-from babkin@verizon.net) Received: from vms173019pub.verizon.net (vms173019pub.verizon.net [206.46.173.19]) by mx1.freebsd.org (Postfix) with ESMTP id 3B4DD8FC08 for ; Thu, 29 Jul 2010 20:03:03 +0000 (UTC) Received: from vms170009.mailsrvcs.net ([unknown] [172.18.12.132]) by vms173019.mailsrvcs.net (Sun Java(tm) System Messaging Server 7u2-7.02 32bit (built Apr 16 2009)) with ESMTPA id <0L6C00J6E50W3RW1@vms173019.mailsrvcs.net> for freebsd-hackers@freebsd.org; Thu, 29 Jul 2010 15:02:58 -0500 (CDT) Received: from 130.214.17.1 ([130.214.17.1]) by vms170009.mailsrvcs.net (Verizon Webmail) with HTTP; Thu, 29 Jul 2010 15:02:56 -0500 (CDT) Date: Thu, 29 Jul 2010 15:02:56 -0500 (CDT) From: Sergey Babkin To: avg@icyb.net.ua Message-id: <382607918.1356296.1280433776963.JavaMail.root@vms170009.mailsrvcs.net> MIME-version: 1.0 
Content-type: text/plain; charset=UTF-8 Content-transfer-encoding: quoted-printable X-Originating-IP: [130.214.17.1] X-Mailman-Approved-At: Thu, 29 Jul 2010 20:13:59 +0000 Cc: freebsd-hackers@freebsd.org, pebu3op@googlemail.com Subject: Re: Re: coherence-problem on the mapped memory buffer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 20:03:03 -0000

Jul 29, 2010 12:58:07 PM, avg@icyb.net.ua wrote:

>on 29/07/2010 19:13 Andriy Gapon said the following:
>> on 29/07/2010 17:13 Alexander Fiveg said the following:
>In fact I have a suspicion that the problem might have to do with multiple
>mappings of the shared pages, but far from sure...
>Take a look at Intel® 64 and IA-32 Architectures Software Developer’s Manual
>Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4 Programming the PAT;
>starting at the following words:
>«The PAT allows any memory type to be specified in the page tables, and therefore
>it is possible to have a single physical page mapped to two or more different
>linear addresses, each with different memory types. Intel does not support this
>practice...»

My guess would be that the memory type is not marked as DMA-capable. AFAIK the Intel CPUs do the hardware snooping on the physical addresses, so they have no coherency issues between themselves. However if a DMA writer changes the memory, this I think does not get normally propagated to the front-side bus, and the CPUs would not see it. You may need to either explicitly flush the CPU cache before accessing these pages or mark them as non-cacheable.
-SB From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 20:16:42 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EC9491065672 for ; Thu, 29 Jul 2010 20:16:42 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 4254A8FC08 for ; Thu, 29 Jul 2010 20:16:41 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id XAA29684; Thu, 29 Jul 2010 23:16:32 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1OeZWq-0002Kr-4u; Thu, 29 Jul 2010 23:16:32 +0300 Message-ID: <4C51E198.8060800@icyb.net.ua> Date: Thu, 29 Jul 2010 23:16:24 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100603) MIME-Version: 1.0 To: Sergey Babkin References: <382607918.1356296.1280433776963.JavaMail.root@vms170009.mailsrvcs.net> In-Reply-To: <382607918.1356296.1280433776963.JavaMail.root@vms170009.mailsrvcs.net> X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: freebsd-hackers@freebsd.org, pebu3op@googlemail.com Subject: Re: coherence-problem on the mapped memory buffer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 20:16:43 -0000 on 29/07/2010 23:02 Sergey Babkin said the following: > Jul 29, 2010 12:58:07 PM, avg@icyb.net.ua wrote: > >> on 29/07/2010 19:13 Andriy Gapon said the following: >>> on 29/07/2010 17:13 Alexander Fiveg said the following: >> In fact I have a suspicion that the problem might have to 
do with multiple
>> mappings of the shared pages, but far from sure...
>> Take a look at Intel® 64 and IA-32 Architectures Software Developer’s Manual
>> Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4 Programming the PAT;
>> starting at the following words:
>> «The PAT allows any memory type to be specified in the page tables, and therefore
>> it is possible to have a single physical page mapped to two or more different
>> linear addresses, each with different memory types. Intel does not support this
>> practice...»
>
> My guess would be that the memory type is not marked as DMA-capable. AFAIK the Intel CPUs
> do the hardware snooping on the physical addresses, so they have no coherency issues between
> themselves. However if a DMA writer changes the memory, this I think does not get normally
> propagated to the front-side bus, and the CPUs would not see it. You may need to either
> explicitly flush the CPU cache before accessing these pages or mark them as non-cacheable.

My guess was approximately the same - if one mapping is done in the kernel for DMA
purposes, then the memory type is, most likely, set to uncached. But the userland
mapping of the same pages most likely marks the same pages (via different virtual
addresses) as cached. Depending on the hardware and on what mappings were used on a
particular CPU (core) to access that memory, there could be differences in
interaction with DMA.
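
To make the "explicitly flush the CPU cache" option concrete, here is a minimal userland sketch. It is an illustration under assumptions, not the kernel-side fix: it assumes an x86 CPU with SSE2 (CLFLUSH through the `_mm_clflush` intrinsic), assumes a 64-byte cache line, and `flush_range` is a hypothetical helper name. Without SSE2 the loop only counts lines and flushes nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_clflush(), _mm_mfence() */
#endif

#define	CACHE_LINE	64	/* assumption: 64-byte cache lines */

/*
 * Hypothetical helper: flush every cache line that overlaps
 * [buf, buf + len).  Returns the number of lines visited.
 */
static size_t
flush_range(const void *buf, size_t len)
{
	uintptr_t p, end;
	size_t n = 0;

	assert(buf != NULL);
	if (len == 0)
		return (0);

	/* Round the start down to a cache-line boundary. */
	p = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
	end = (uintptr_t)buf + len;

#if defined(__SSE2__)
	_mm_mfence();		/* order earlier stores before the flushes */
#endif
	for (; p < end; p += CACHE_LINE, n++) {
#if defined(__SSE2__)
		_mm_clflush((const void *)p);
#endif
	}
#if defined(__SSE2__)
	_mm_mfence();		/* make sure the flushes have completed */
#endif
	return (n);
}
```

A reader of a DMA-written buffer would call something like `flush_range(buf, len)` before looking at the data. Whether that is enough, or whether the mapping itself has to be made uncacheable, depends on the hardware and on the memory types of all the mappings involved.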
--
Andriy Gapon
From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 20:29:31 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E9CB11065677 for ; Thu, 29 Jul 2010 20:29:31 +0000 (UTC) (envelope-from ligregni@unixmexico.org) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id BC7D08FC12 for ; Thu, 29 Jul 2010 20:29:31 +0000 (UTC) Received: by iwn35 with SMTP id 35so733647iwn.13 for ; Thu, 29 Jul 2010 13:29:31 -0700 (PDT) MIME-Version: 1.0 Received: by 10.231.146.135 with SMTP id h7mr401546ibv.149.1280433620838; Thu, 29 Jul 2010 13:00:20 -0700 (PDT) Received: by 10.231.192.65 with HTTP; Thu, 29 Jul 2010 13:00:20 -0700 (PDT) Date: Thu, 29 Jul 2010 15:00:20 -0500 Message-ID: From: Sergio Ligregni To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Improvement for Distributed Audit Project X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 20:29:32 -0000

I am Sergio Ligregni, from Mexico. I am currently working on the Distributed
Audit Project at GSoC 2010, and I want to ask for your help with these things:

HELP NEEDED:

/*++++++++++++++++++++++*/

- which code should I base my development on for getting parameters from a file?
(I've searched some audit.c, auditd_fbsd.c, auditd.c but did not find the
function that does that, maybe I missed something). Currently I have files like:

/var/audit /var2/audit
1000
yes
53686

and I get the parameters with sscanf, but the right way (the one for which I
want to know which code to take as a baseline) would be:

dir:/var/audit /var2/audit
time: 1000
slave_dir: yes
port: 53686

and not to use sscanf (avoiding that function is a security concern raised by
my mentor).

I think I can write an algorithm to implement that, but maybe there is a
better/safer way to do it that keeps to the standard.

/*++++++++++++++++++++++*/

Currently I have this function to verify whether a file is a trail, given its
name. It is very poor and needs to be improved, any ideas?

/*
 * When exploring /var/audit/ (or the directory where the trails are), not
 * all files are trails so we must ensure we will only deal with the ones
 * that are trails.
 */
static int
is_audit_trail(const char *path)
{
	/*
	 * We have these possibilities, only the first one is allowed
	 * 20100619223115.20100619223131
	 * 20100619223131.not_terminated
	 * current
	 */
	/* isdigit() needs an unsigned char value to avoid undefined behavior. */
	if (strlen(path) == 29 && path[14] == '.' &&
	    isdigit((unsigned char)path[15])) {
		/* XXX To improve this checking later */
		return 1;
	}
	return 0;
}

/*++++++++++++++++++++++*/

By the way the Wiki and the Perforce Repository for this project are:

http://wiki.freebsd.org/SOC2010SergioLigregni
http://p4db.freebsd.org/depotTreeBrowser.cgi?FSPC=//depot/projects/soc2010/disaudit&HIDEDEL=NO

Thanks!

--
-----------------------------------------------------------
Sergio Andrés Ligregni Arredondo
Computer Systems Engineering student, ITQ.
Is UNIX Hot Enough for You?
| FreeBSD From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 21:17:13 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CAC111065672 for ; Thu, 29 Jul 2010 21:17:13 +0000 (UTC) (envelope-from emaste@freebsd.org) Received: from mail1.sandvine.com (Mail1.sandvine.com [64.7.137.162]) by mx1.freebsd.org (Postfix) with ESMTP id 51F878FC18 for ; Thu, 29 Jul 2010 21:17:13 +0000 (UTC) Received: from labgw2.phaedrus.sandvine.com (192.168.222.22) by WTL-EXCH-1.sandvine.com (192.168.196.31) with Microsoft SMTP Server id 14.0.694.0; Thu, 29 Jul 2010 17:06:10 -0400 Received: by labgw2.phaedrus.sandvine.com (Postfix, from userid 10332) id D6E0A33C00; Thu, 29 Jul 2010 17:06:22 -0400 (EDT) Date: Thu, 29 Jul 2010 17:06:22 -0400 From: Ed Maste To: Message-ID: <20100729210622.GA84094@sandvine.com> References: <201007281510.o6SFAV5J052045@svn.freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <201007281510.o6SFAV5J052045@svn.freebsd.org> User-Agent: Mutt/1.4.2.1i Subject: Re: svn commit: r210561 - projects/sv/sys/net X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 21:17:13 -0000 On Wed, Jul 28, 2010 at 03:10:31PM +0000, Attilio Rao wrote: > Log: > Initial import of the netdump files. > They still need a lot of polishing and cleanup so they might not be > considered definitive at all. This code is a port to recent FreeBSD of Darrell Anderson's network crashdump support, which was done in the 4.x days. 
I can't find a current website with the original versions but archive.org has a cache of course: http://web.archive.org/web/20041204223729/http://www.cs.duke.edu/~anderson/freebsd/netdump/ Quoting from the old readme: Netdump provides FreeBSD kernel crash dumping over the network. Netdump is a FreeBSD kernel module client and user-level server. A normal kernel crash writes a raw dump of memory to a dedicated partition (usually the swap partition) using a low-level disk routine, and then copies that raw dump into a file (via savecore) during the following boot process. Netdump replaces the standard dump routine. During a crash, a netdump client broadcasts to locate a netdump server, then sends the dump as UDP/IP packets (with retransmission after loss). The netdump server creates a dump file suitable for gdb. If netdump fails (for example, no netdump server is located), a normal disk dump is performed. There is cleanup work to be done still, but we plan to have this in shape for 9.0. -Ed From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 21:41:23 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A391106567B for ; Thu, 29 Jul 2010 21:41:23 +0000 (UTC) (envelope-from pebu3op@googlemail.com) Received: from mail.net.t-labs.tu-berlin.de (mail.net.t-labs.tu-berlin.de [130.149.220.252]) by mx1.freebsd.org (Postfix) with ESMTP id 0F9AF8FC0C for ; Thu, 29 Jul 2010 21:41:22 +0000 (UTC) Received: from raven.net.t-labs.tu-berlin.de (raven.net.t-labs.tu-berlin.de [130.149.220.18]) by mail.net.t-labs.tu-berlin.de (Postfix) with ESMTP id 0DEF5700D29E; Thu, 29 Jul 2010 23:41:22 +0200 (CEST) From: Alexander Fiveg Organization: Google To: Andriy Gapon Date: Thu, 29 Jul 2010 23:41:20 +0200 User-Agent: KMail/1.9.10 References: <382607918.1356296.1280433776963.JavaMail.root@vms170009.mailsrvcs.net> <4C51E198.8060800@icyb.net.ua> In-Reply-To: 
<4C51E198.8060800@icyb.net.ua> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <201007292341.21123.pebu3op@googlemail.com> Cc: freebsd-hackers@freebsd.org, Sergey Babkin Subject: Re: coherence-problem on the mapped memory buffer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: pebu3op@googlemail.com List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 21:41:23 -0000

On Thursday 29 July 2010 22:16:24 Andriy Gapon wrote:
> on 29/07/2010 23:02 Sergey Babkin said the following:
> > Jul 29, 2010 12:58:07 PM, avg@icyb.net.ua wrote:
> >> on 29/07/2010 19:13 Andriy Gapon said the following:
> >>> on 29/07/2010 17:13 Alexander Fiveg said the following:
> >>
> >> In fact I have a suspicion that the problem might have to do with
> >> multiple mappings of the shared pages, but far from sure...
> >> Take a look at Intel® 64 and IA-32 Architectures Software Developer’s
> >> Manual Volume 3A - System Programming Guide, Part 1; Chapter 11.12.4
> >> Programming the PAT; starting at the following words:
> >> «The PAT allows any memory type to be specified in the page tables, and
> >> therefore it is possible to have a single physical page mapped to two or
> >> more different linear addresses, each with different memory types. Intel
> >> does not support this practice...»
> >
> > My guess would be that the memory type is not marked as DMA-capable.
> > AFAIK the Intel CPUs do the hardware snooping on the physical addresses,
> > so they have no coherency issues between themselves. However if a DMA
> > writer changes the memory, this I think does not get normally propagated
> > to the front-side bus, and the CPUs would not see it.
You may need to
> > either explicitly flush the CPU cache before accessing these pages or
> > mark them as non-cacheable.
>
> My guess was approximately the same - if one mapping is done in kernel for
> DMA purposes, then the memory type is, most likely, set to uncached. But
> the userland mapping of the same pages most likely marks the same pages
> (via different virtual addresses) as cached. Depending on the hardware and
> on what mappings were used on a particular CPU (core) to access that
> memory, there could be differences in interaction with DMA.

Thanks a lot for your answers. But I am afraid I do not have enough
experience to solve these tasks. Could you please provide me with helpful
information on how to:
- get access to the pages associated with a certain memory buffer?
I mean, I want to get the structures that describe the page properties I
should change (for instance, in order to make the page non-cacheable).

If you are aware of any good papers or examples in the system code where
these topics are covered, I would appreciate it if you gave me the
references.

Alex
From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 22:09:25 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2074A1065693 for ; Thu, 29 Jul 2010 22:09:25 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 6A0F88FC0C for ; Thu, 29 Jul 2010 22:09:24 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id BAA01097; Fri, 30 Jul 2010 01:09:19 +0300 (EEST) (envelope-from avg@icyb.net.ua) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1OebHz-0002Sb-GZ; Fri, 30 Jul 2010 01:09:19
+0300 Message-ID: <4C51FC0E.9050204@icyb.net.ua> Date: Fri, 30 Jul 2010 01:09:18 +0300 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100603) MIME-Version: 1.0 To: pebu3op@googlemail.com References: <382607918.1356296.1280433776963.JavaMail.root@vms170009.mailsrvcs.net> <4C51E198.8060800@icyb.net.ua> <201007292341.21123.pebu3op@googlemail.com> In-Reply-To: <201007292341.21123.pebu3op@googlemail.com> X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-hackers@freebsd.org, Sergey Babkin Subject: Re: coherence-problem on the mapped memory buffer X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 22:09:25 -0000

on 30/07/2010 00:41 Alexander Fiveg said the following:
> Thanks a lot for your answers. But I am afraid I do not have enough
> experience to solve these tasks. Could you please provide me with helpful
> information how to:
> - get access to the pages associated with a certain memory-buffer ?
> I mean, I want to get the structures, that describe the page properties I
> should change (for instance, in order to make the page non-cacheable).
>
> if you are aware of any good papers or examples in the system code, where
> these topics are covered, I would appreciate it if you gave me the
> references.

I don't have a recipe, but some pointers to get you started:
1. investigate BUS_DMA_NOCACHE, see bus_dma(9)
2. check sys/dev/sound/pci/hda/hdac.c for HDAC_F_DMA_NOCACHE and the comment
about PCIe snoop - this might be relevant
3. see pmap_change_attr for a way to change the caching type for a memory mapping
4.
hope that more knowledgeable people (experts) provide their advice, keep nudging them via mailing list(s) :-) -- Andriy Gapon From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 23:39:03 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C90D106566B for ; Thu, 29 Jul 2010 23:39:03 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 167308FC17 for ; Thu, 29 Jul 2010 23:39:02 +0000 (UTC) Received: by iwn35 with SMTP id 35so939082iwn.13 for ; Thu, 29 Jul 2010 16:39:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received:date :x-google-sender-auth:message-id:subject:from:to:content-type; bh=srbXnpSkwnZyWReN2hEzf3ACyGvYQNjoyKDQ2bOvOmM=; b=Lw8PqhJT7jzmhrrw0mb4DelDo6R7ni5ImEh+khweawC/vLa0g+0k5HjLBnDalhAaYe w0FC3vfL+HVMtlcuNOLWLRHWk10WZQj/OQGm4zVNOAGaLhCRulckdh28TKgrz0mm3Q+1 i8o92pn23drjfNBcUWXpGNQcf82H+tO9O2aUY= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:content-type; b=GuBzI/LH20/o4wBarwm2qn/9hv8ddPKnIgsvQc8mTr8oDYqJIStJxBA7p183x82J0q vcE1Euz0vS5vJ8Hv4yQz5G9bbumHQfktJUvASXGLGD47sxFDeDESJ7A8qELfvfFgmWdo ZbGwwFum7dRihK/0FN58fddWLCLiy6p+9pvXk= MIME-Version: 1.0 Received: by 10.42.9.69 with SMTP id l5mr186837icl.80.1280446742146; Thu, 29 Jul 2010 16:39:02 -0700 (PDT) Sender: mdf356@gmail.com Received: by 10.42.6.85 with HTTP; Thu, 29 Jul 2010 16:39:02 -0700 (PDT) Date: Thu, 29 Jul 2010 16:39:02 -0700 X-Google-Sender-Auth: 4ouVY9hWjuzZ2dhMKYwYcWwNxKs Message-ID: From: mdf@FreeBSD.org To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: sched_pin() versus PCPU_GET X-BeenThere: freebsd-hackers@freebsd.org 
X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 23:39:03 -0000

We've seen a few instances at work where witness_warn() in ast()
indicates the sched lock is still held, but the place it claims it was
held by is in fact sometimes not possible to keep the lock, like:

        thread_lock(td);
        td->td_flags &= ~TDF_SELECT;
        thread_unlock(td);

What I was wondering is, even though the assembly I see in objdump -S
for witness_warn has the increment of td_pinned before the PCPU_GET:

ffffffff802db210:       65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
ffffffff802db217:       00 00
ffffffff802db219:       ff 83 04 01 00 00       incl   0x104(%rbx)
         * Pin the thread in order to avoid problems with thread migration.
         * Once that all verifies are passed about spinlocks ownership,
         * the thread is in a safe path and it can be unpinned.
         */
        sched_pin();
        lock_list = PCPU_GET(spinlocks);
ffffffff802db21f:       65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
ffffffff802db226:       00 00
        if (lock_list != NULL && lock_list->ll_count != 0) {
ffffffff802db228:       48 85 c0                test   %rax,%rax
         * Pin the thread in order to avoid problems with thread migration.
         * Once that all verifies are passed about spinlocks ownership,
         * the thread is in a safe path and it can be unpinned.
         */
        sched_pin();
        lock_list = PCPU_GET(spinlocks);
ffffffff802db22b:       48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
ffffffff802db232:       48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
        if (lock_list != NULL && lock_list->ll_count != 0) {
ffffffff802db239:       0f 84 ff 00 00 00       je     ffffffff802db33e
ffffffff802db23f:       44 8b 60 50             mov    0x50(%rax),%r12d

is it possible for the hardware to do any re-ordering here?
The reason I'm suspicious is not just that the code doesn't have a lock leak at the indicated point, but in one instance I can see in the dump that the lock_list local from witness_warn is from the pcpu structure for CPU 0 (and I was warned about sched lock 0), but the thread id in panic_cpu is 2. So clearly the thread was being migrated right around panic time. This is the amd64 kernel on stable/7. I'm not sure exactly what kind of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC. So... do we need some kind of barrier in the code for sched_pin() for it to really do what it claims? Could the hardware have re-ordered the "mov %gs:0x48,%rax" PCPU_GET to before the sched_pin() increment? Thanks, matthew From owner-freebsd-hackers@FreeBSD.ORG Thu Jul 29 23:57:26 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9D5671065673 for ; Thu, 29 Jul 2010 23:57:26 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 60E658FC20 for ; Thu, 29 Jul 2010 23:57:26 +0000 (UTC) Received: by iwn35 with SMTP id 35so960151iwn.13 for ; Thu, 29 Jul 2010 16:57:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:content-type:content-transfer-encoding; bh=JofS1w1C5pEvTz/bYg2iJafvnkpVzgN2GwEf13Wku84=; b=MVX6SDjnXVf1N0n/VKO2BCCcHcoN1toO2ZPkgJj40gbHUqVrhCFLEHQsMSQ02VdJDH e0F4DUnZpc3/uebcojSTxlUCHA0I2a23SiufSZKyTCBMZf+984aDHpU/qq7QnrE4a8Ps 25RGtXOAEAHba0vLNX9Wq3i4Wk091uQVw0Jb4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type 
:content-transfer-encoding; b=rKfRVw8WxuSy1fzHecX4/IQI7LLO1CRL//1b8O/Jnhp9EbPp1WwI8uSk/DB6rcjxn8 b/umbfzEVKdqYXsKW8x+AZykjxNfy47hwNlZJ5M5kQI9g1VOsqOWFN4xvrvsZl1uab8F 4mP3qgSB2i0bNAtGqlF83QGUDGaSvJgWxNNxA= MIME-Version: 1.0 Received: by 10.42.9.4 with SMTP id k4mr194785ick.72.1280447845206; Thu, 29 Jul 2010 16:57:25 -0700 (PDT) Sender: mdf356@gmail.com Received: by 10.42.6.85 with HTTP; Thu, 29 Jul 2010 16:57:25 -0700 (PDT) In-Reply-To: References: Date: Thu, 29 Jul 2010 16:57:25 -0700 X-Google-Sender-Auth: E1Ba7yahsoiKgm4UgP7rPIrjA_w Message-ID: From: mdf@FreeBSD.org To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: sched_pin() versus PCPU_GET X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 23:57:26 -0000

On Thu, Jul 29, 2010 at 4:39 PM, wrote:
> We've seen a few instances at work where witness_warn() in ast()
> indicates the sched lock is still held, but the place it claims it was
> held by is in fact sometimes not possible to keep the lock, like:
>
>         thread_lock(td);
>         td->td_flags &= ~TDF_SELECT;
>         thread_unlock(td);
>
> What I was wondering is, even though the assembly I see in objdump -S
> for witness_warn has the increment of td_pinned before the PCPU_GET:
>
> ffffffff802db210:       65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
> ffffffff802db217:       00 00
> ffffffff802db219:       ff 83 04 01 00 00       incl   0x104(%rbx)
>          * Pin the thread in order to avoid problems with thread migration.
>          * Once that all verifies are passed about spinlocks ownership,
>          * the thread is in a safe path and it can be unpinned.
>          */
>         sched_pin();
>         lock_list = PCPU_GET(spinlocks);
> ffffffff802db21f:       65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
> ffffffff802db226:       00 00
>         if (lock_list != NULL && lock_list->ll_count != 0) {
> ffffffff802db228:       48 85 c0                test   %rax,%rax
>          * Pin the thread in order to avoid problems with thread migration.
>          * Once that all verifies are passed about spinlocks ownership,
>          * the thread is in a safe path and it can be unpinned.
>          */
>         sched_pin();
>         lock_list = PCPU_GET(spinlocks);
> ffffffff802db22b:       48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
> ffffffff802db232:       48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
>         if (lock_list != NULL && lock_list->ll_count != 0) {
> ffffffff802db239:       0f 84 ff 00 00 00       je     ffffffff802db33e
>
> ffffffff802db23f:       44 8b 60 50             mov    0x50(%rax),%r12d
>
> is it possible for the hardware to do any re-ordering here?
>
> The reason I'm suspicious is not just that the code doesn't have a
> lock leak at the indicated point, but in one instance I can see in the
> dump that the lock_list local from witness_warn is from the pcpu
> structure for CPU 0 (and I was warned about sched lock 0), but the
> thread id in panic_cpu is 2. So clearly the thread was being migrated
> right around panic time.
>
> This is the amd64 kernel on stable/7. I'm not sure exactly what kind
> of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.
>
> So... do we need some kind of barrier in the code for sched_pin() for
> it to really do what it claims? Could the hardware have re-ordered
> the "mov    %gs:0x48,%rax" PCPU_GET to before the sched_pin()
> increment?

So after some research, the answer I'm getting is "maybe". What I'm
concerned about is whether the h/w reordered the read of PCPU_GET in
front of the previous store to increment td_pinned. While not an
ultimate authority,
http://en.wikipedia.org/wiki/Memory_ordering#In_SMP_microprocessor_systems
implies that stores can be reordered after loads for both Intel and
amd64 chips, which would I believe account for the behavior seen here.

Thanks,
matthew
From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 30 09:44:18 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E61C01065675; Fri, 30 Jul 2010 09:44:18 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 7EC948FC20; Fri, 30 Jul 2010 09:44:17 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o6U9iDmO001589 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 30 Jul 2010 12:44:13 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o6U9iD0M029019; Fri, 30 Jul 2010 12:44:13 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o6U9iD0r029018; Fri, 30 Jul 2010 12:44:13 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 30 Jul 2010 12:44:13 +0300 From: Kostik Belousov To: mdf@freebsd.org Message-ID:
<20100730094413.GJ22295@deviant.kiev.zoral.com.ua> References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="OdQvBiqfLsaeimeB" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.2 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-hackers@freebsd.org Subject: Re: sched_pin() versus PCPU_GET X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 09:44:19 -0000

--OdQvBiqfLsaeimeB
Content-Type: text/plain; charset=koi8-r Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Thu, Jul 29, 2010 at 04:57:25PM -0700, mdf@freebsd.org wrote:
> On Thu, Jul 29, 2010 at 4:39 PM, wrote:
> > We've seen a few instances at work where witness_warn() in ast()
> > indicates the sched lock is still held, but the place it claims it was
> > held by is in fact sometimes not possible to keep the lock, like:
> >
> >         thread_lock(td);
> >         td->td_flags &= ~TDF_SELECT;
> >         thread_unlock(td);
> >
> > What I was wondering is, even though the assembly I see in objdump -S
> > for witness_warn has the increment of td_pinned before the PCPU_GET:
> >
> > ffffffff802db210:       65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
> > ffffffff802db217:       00 00
> > ffffffff802db219:       ff 83 04 01 00 00       incl   0x104(%rbx)
> >          * Pin the thread in order to avoid problems with thread migration.
> >          * Once that all verifies are passed about spinlocks ownership,
> >          * the thread is in a safe path and it can be unpinned.
> >          */
> >         sched_pin();
> >         lock_list = PCPU_GET(spinlocks);
> > ffffffff802db21f:       65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
> > ffffffff802db226:       00 00
> >         if (lock_list != NULL && lock_list->ll_count != 0) {
> > ffffffff802db228:       48 85 c0                test   %rax,%rax
> >          * Pin the thread in order to avoid problems with thread migration.
> >          * Once that all verifies are passed about spinlocks ownership,
> >          * the thread is in a safe path and it can be unpinned.
> >          */
> >         sched_pin();
> >         lock_list = PCPU_GET(spinlocks);
> > ffffffff802db22b:       48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
> > ffffffff802db232:       48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
> >         if (lock_list != NULL && lock_list->ll_count != 0) {
> > ffffffff802db239:       0f 84 ff 00 00 00       je     ffffffff802db33e
> >
> > ffffffff802db23f:       44 8b 60 50             mov    0x50(%rax),%r12d
> >
> > is it possible for the hardware to do any re-ordering here?
> >
> > The reason I'm suspicious is not just that the code doesn't have a
> > lock leak at the indicated point, but in one instance I can see in the
> > dump that the lock_list local from witness_warn is from the pcpu
> > structure for CPU 0 (and I was warned about sched lock 0), but the
> > thread id in panic_cpu is 2. So clearly the thread was being migrated
> > right around panic time.
> >
> > This is the amd64 kernel on stable/7. I'm not sure exactly what kind
> > of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.
> >
> > So... do we need some kind of barrier in the code for sched_pin() for
> > it to really do what it claims? Could the hardware have re-ordered
> > the "mov    %gs:0x48,%rax" PCPU_GET to before the sched_pin()
> > increment?
>
> So after some research, the answer I'm getting is "maybe". What I'm
> concerned about is whether the h/w reordered the read of PCPU_GET in
> front of the previous store to increment td_pinned. While not an
> ultimate authority,
> http://en.wikipedia.org/wiki/Memory_ordering#In_SMP_microprocessor_systems
> implies that stores can be reordered after loads for both Intel and
> amd64 chips, which would I believe account for the behavior seen here.
>

Am I right that you suggest that in the sequence
        mov     %gs:0x0,%rbx      [1]
        incl    0x104(%rbx)       [2]
        mov     %gs:0x48,%rax     [3]
interrupt and preemption happen between points [2] and [3]?
And that the %rax value, after the thread was put back onto the (different)
new CPU and executed [3], was still from the old CPU's pcpu area?

I do not believe this is possible.
A CPU is always self-consistent. A context switch away from the thread can
only occur on the return from an interrupt handler, in critical_exit() or
such. That code is executing on the same processor, and thus should already
see the effect of [2], which would prevent the context switch.

If the interrupt happens between [1] and [2], then the context saving code
should still see a consistent view of the register file state, regardless of
the processor issuing a speculative read of *%gs:0x48. Return from the
interrupt is a serialization point due to iret, causing the read in [3] to be
reissued.
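
The argument above can be modelled in a few lines of userland C. This is a toy, single-threaded model and not kernel code: every `toy_*` name is hypothetical, and the only rule taken from the discussion is that the scheduler refuses to migrate a thread whose pin count is non-zero. Because the preemption decision runs on the same CPU that executed the increment in [2], it necessarily observes that increment, so the per-CPU read in [3] cannot land on a different CPU.

```c
#include <assert.h>
#include <stddef.h>

#define	TOY_NCPU	4

/* Toy stand-ins for struct pcpu and struct thread. */
struct toy_pcpu {
	int		pc_cpuid;
	const char	*pc_spinlocks;	/* stands in for the spinlock list */
};

struct toy_thread {
	int	td_pinned;	/* pin count, as incremented at [2] */
	int	td_cpu;		/* CPU the thread currently runs on */
};

static struct toy_pcpu toy_cpus[TOY_NCPU] = {
	{ 0, "spinlocks of cpu0" },
	{ 1, "spinlocks of cpu1" },
	{ 2, "spinlocks of cpu2" },
	{ 3, "spinlocks of cpu3" },
};

/* Step [2]: bump the pin count. */
static void
toy_sched_pin(struct toy_thread *td)
{
	td->td_pinned++;
}

static void
toy_sched_unpin(struct toy_thread *td)
{
	assert(td->td_pinned > 0);
	td->td_pinned--;
}

/*
 * Models the scheduler's choice at interrupt return: a pinned thread
 * stays where it is.  Returns 1 if the thread was moved, 0 otherwise.
 */
static int
toy_try_migrate(struct toy_thread *td, int newcpu)
{
	assert(newcpu >= 0 && newcpu < TOY_NCPU);
	if (td->td_pinned != 0)
		return (0);
	td->td_cpu = newcpu;
	return (1);
}

/* Step [3]: a per-CPU read resolves against the CPU we run on now. */
static const char *
toy_pcpu_get_spinlocks(const struct toy_thread *td)
{
	return (toy_cpus[td->td_cpu].pc_spinlocks);
}
```

In this model a preemption attempt placed between `toy_sched_pin()` and `toy_pcpu_get_spinlocks()` always fails to move the thread, which is the self-consistency property described above. The model deliberately says nothing about what other CPUs observe.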
--OdQvBiqfLsaeimeB Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkxSnuwACgkQC3+MBN1Mb4hAcwCgpwr8EgJm76cM3HJSlDyM9MaF 8UcAn2570On4CnWqPKpIDR70UoY+AVg9 =EFO7 -----END PGP SIGNATURE----- --OdQvBiqfLsaeimeB-- From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 30 13:44:01 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 723BE1065676 for ; Fri, 30 Jul 2010 13:44:01 +0000 (UTC) (envelope-from mdf356@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id 293DB8FC13 for ; Fri, 30 Jul 2010 13:44:00 +0000 (UTC) Received: by gyg4 with SMTP id 4so774573gyg.13 for ; Fri, 30 Jul 2010 06:44:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=QAbQg1mb5WxeZymaeMAqZd+gWN0GbJv2/gXq1MqK65o=; b=Gy4VgkG1lW/041bGAE/bjFmi1QiopcKcGSDjO73OA1ZPRRr2IOzQ6c4WlddhcY+IRV X5Wzsmks5FceYKmzMHKsgi6wvhVXsBXxHAAjPLFVX/dEoF8WbDuwJ2mymvzB2tk/naJ5 vjeWidWNZusYWXF1kgQ7xMy9xGobtTRU9E13I= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=HNtKjcFCzj0tB0m/6Q7I52/KA/vKT63Y41ZTZCYqLmPgi4JEyejYGZBAe+dQfPA77n cwh7k8OgefknLYkTpXtvj7ORWTxkiPbX9hqKKpSs9+qptylCbwwUHx1X5f9OE58iwaZC RDq4lbAyUDCPgz2aWjOnklA4YxfmFLKc0B87c= MIME-Version: 1.0 Received: by 10.151.63.18 with SMTP id q18mr3310765ybk.100.1280497440267; Fri, 30 Jul 2010 06:44:00 -0700 (PDT) Sender: mdf356@gmail.com Received: by 10.42.6.85 with HTTP; Fri, 30 Jul 2010 06:44:00 -0700 (PDT) 
In-Reply-To: <20100730094413.GJ22295@deviant.kiev.zoral.com.ua> References: <20100730094413.GJ22295@deviant.kiev.zoral.com.ua> Date: Fri, 30 Jul 2010 06:44:00 -0700 X-Google-Sender-Auth: VCKwH3Z8JSv6YA9bSy2UlTXLAfE Message-ID: From: mdf@FreeBSD.org To: Kostik Belousov Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-hackers@freebsd.org Subject: Re: sched_pin() versus PCPU_GET X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 13:44:01 -0000 2010/7/30 Kostik Belousov : > On Thu, Jul 29, 2010 at 04:57:25PM -0700, mdf@freebsd.org wrote: >> On Thu, Jul 29, 2010 at 4:39 PM, =A0 wrote: >> > We've seen a few instances at work where witness_warn() in ast() >> > indicates the sched lock is still held, but the place it claims it was >> > held by is in fact sometimes not possible to keep the lock, like: >> > >> > =A0 =A0 =A0 =A0thread_lock(td); >> > =A0 =A0 =A0 =A0td->td_flags &=3D ~TDF_SELECT; >> > =A0 =A0 =A0 =A0thread_unlock(td); >> > >> > What I was wondering is, even though the assembly I see in objdump -S >> > for witness_warn has the increment of td_pinned before the PCPU_GET: >> > >> > ffffffff802db210: =A0 =A0 =A0 65 48 8b 1c 25 00 00 =A0 =A0mov =A0 =A0%= gs:0x0,%rbx >> > ffffffff802db217: =A0 =A0 =A0 00 00 >> > ffffffff802db219: =A0 =A0 =A0 ff 83 04 01 00 00 =A0 =A0 =A0 incl =A0 0= x104(%rbx) >> > =A0 =A0 =A0 =A0 * Pin the thread in order to avoid problems with threa= d migration. >> > =A0 =A0 =A0 =A0 * Once that all verifies are passed about spinlocks ow= nership, >> > =A0 =A0 =A0 =A0 * the thread is in a safe path and it can be unpinned. 
>> > =A0 =A0 =A0 =A0 */ >> > =A0 =A0 =A0 =A0sched_pin(); >> > =A0 =A0 =A0 =A0lock_list =3D PCPU_GET(spinlocks); >> > ffffffff802db21f: =A0 =A0 =A0 65 48 8b 04 25 48 00 =A0 =A0mov =A0 =A0%= gs:0x48,%rax >> > ffffffff802db226: =A0 =A0 =A0 00 00 >> > =A0 =A0 =A0 =A0if (lock_list !=3D NULL && lock_list->ll_count !=3D 0) = { >> > ffffffff802db228: =A0 =A0 =A0 48 85 c0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= test =A0 %rax,%rax >> > =A0 =A0 =A0 =A0 * Pin the thread in order to avoid problems with threa= d migration. >> > =A0 =A0 =A0 =A0 * Once that all verifies are passed about spinlocks ow= nership, >> > =A0 =A0 =A0 =A0 * the thread is in a safe path and it can be unpinned. >> > =A0 =A0 =A0 =A0 */ >> > =A0 =A0 =A0 =A0sched_pin(); >> > =A0 =A0 =A0 =A0lock_list =3D PCPU_GET(spinlocks); >> > ffffffff802db22b: =A0 =A0 =A0 48 89 85 f0 fe ff ff =A0 =A0mov =A0 =A0%= rax,-0x110(%rbp) >> > ffffffff802db232: =A0 =A0 =A0 48 89 85 f8 fe ff ff =A0 =A0mov =A0 =A0%= rax,-0x108(%rbp) >> > =A0 =A0 =A0 =A0if (lock_list !=3D NULL && lock_list->ll_count !=3D 0) = { >> > ffffffff802db239: =A0 =A0 =A0 0f 84 ff 00 00 00 =A0 =A0 =A0 je =A0 =A0= ffffffff802db33e >> > >> > ffffffff802db23f: =A0 =A0 =A0 44 8b 60 50 =A0 =A0 =A0 =A0 =A0 =A0 mov = =A0 =A00x50(%rax),%r12d >> > >> > is it possible for the hardware to do any re-ordering here? >> > >> > The reason I'm suspicious is not just that the code doesn't have a >> > lock leak at the indicated point, but in one instance I can see in the >> > dump that the lock_list local from witness_warn is from the pcpu >> > structure for CPU 0 (and I was warned about sched lock 0), but the >> > thread id in panic_cpu is 2. =A0So clearly the thread was being migrat= ed >> > right around panic time. >> > >> > This is the amd64 kernel on stable/7. =A0I'm not sure exactly what kin= d >> > of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC. >> > >> > So... 
do we need some kind of barrier in the code for sched_pin() for >> > it to really do what it claims? Could the hardware have re-ordered >> > the "mov %gs:0x48,%rax" PCPU_GET to before the sched_pin() >> > increment? >> >> So after some research, the answer I'm getting is "maybe". What I'm >> concerned about is whether the h/w reordered the read of PCPU_GET in >> front of the previous store to increment td_pinned. While not an >> ultimate authority, >> http://en.wikipedia.org/wiki/Memory_ordering#In_SMP_microprocessor_systems >> implies that stores can be reordered after loads for both Intel and >> amd64 chips, which would I believe account for the behavior seen here. > > Am I right that you suggest that in the sequence > mov %gs:0x0,%rbx [1] > incl 0x104(%rbx) [2] > mov %gs:0x48,%rax [3] > interrupt and preemption happen between points [2] and [3] ? > And the %rax value after the thread was put back onto the (different) new > CPU and executed [3] was still from the old cpu' pcpu area ? Right, but I'm also asking if it's possible the hardware executed the instructions as: mov %gs:0x0,%rbx [1] mov %gs:0x48,%rax [3] incl 0x104(%rbx) [2] On PowerPC this is definitely possible and I'd use an isync to prevent the re-ordering. I haven't been able to confirm that Intel/AMD present such a strict ordering that no barrier is needed. It's admittedly a very tight window, and we've only seen it twice, but I have no other way to explain the symptom. Unfortunately in the dump gdb shows both %rax and %gs as 0, so I can't confirm that they had a value I'd expect from another CPU. The only thing I do have is panic_cpu being different than the CPU at the time of PCPU_GET(spinlock), but of course there's definitely a window there. > I do not believe this is possible.
CPU is always self-consistent. Context > switch from the thread can only occur on the return from interrupt > handler, in critical_exit() or such. This code is executing on the > same processor, and thus should already see the effect of [2], that > would prevent context switch. Right, but if the hardware allowed reads to pass writes, then %rax would have an incorrect value which would be saved at interrupt time, and restored on another processor. I can add a few sanity asserts to try to prove this one way or another and hope they don't mess with the timing; this has only shown up when testing with a hugely multi-threaded CIFS server. The only reason I'm hammering at OOO execution being the explanation is that it seems like the only way to explain the symptoms... unless I prefer to believe that PCPU_GET is completely busted, which seems less likely. Thanks, matthew From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 30 14:10:04 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 974D71065672; Fri, 30 Jul 2010 14:10:04 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 4F9E48FC0C; Fri, 30 Jul 2010 14:10:04 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id BCBEA46B2C; Fri, 30 Jul 2010 10:10:03 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id BC5438A03C; Fri, 30 Jul 2010 10:10:02 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Date: Fri, 30 Jul 2010 10:08:22 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100217; KDE/4.4.5; amd64; ; ) References: In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable 
Message-Id: <201007301008.22501.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Fri, 30 Jul 2010 10:10:02 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: mdf@freebsd.org Subject: Re: sched_pin() versus PCPU_GET X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 14:10:04 -0000 On Thursday, July 29, 2010 7:39:02 pm mdf@freebsd.org wrote: > We've seen a few instances at work where witness_warn() in ast() > indicates the sched lock is still held, but the place it claims it was > held by is in fact sometimes not possible to keep the lock, like: >=20 > thread_lock(td); > td->td_flags &=3D ~TDF_SELECT; > thread_unlock(td); >=20 > What I was wondering is, even though the assembly I see in objdump -S > for witness_warn has the increment of td_pinned before the PCPU_GET: >=20 > ffffffff802db210: 65 48 8b 1c 25 00 00 mov %gs:0x0,%rbx > ffffffff802db217: 00 00 > ffffffff802db219: ff 83 04 01 00 00 incl 0x104(%rbx) > * Pin the thread in order to avoid problems with thread migration. > * Once that all verifies are passed about spinlocks ownership, > * the thread is in a safe path and it can be unpinned. > */ > sched_pin(); > lock_list =3D PCPU_GET(spinlocks); > ffffffff802db21f: 65 48 8b 04 25 48 00 mov %gs:0x48,%rax > ffffffff802db226: 00 00 > if (lock_list !=3D NULL && lock_list->ll_count !=3D 0) { > ffffffff802db228: 48 85 c0 test %rax,%rax > * Pin the thread in order to avoid problems with thread migration. 
> * Once that all verifies are passed about spinlocks ownership, > * the thread is in a safe path and it can be unpinned. > */ > sched_pin(); > lock_list =3D PCPU_GET(spinlocks); > ffffffff802db22b: 48 89 85 f0 fe ff ff mov %rax,-0x110(%rbp) > ffffffff802db232: 48 89 85 f8 fe ff ff mov %rax,-0x108(%rbp) > if (lock_list !=3D NULL && lock_list->ll_count !=3D 0) { > ffffffff802db239: 0f 84 ff 00 00 00 je ffffffff802db33e > > ffffffff802db23f: 44 8b 60 50 mov 0x50(%rax),%r12d >=20 > is it possible for the hardware to do any re-ordering here? >=20 > The reason I'm suspicious is not just that the code doesn't have a > lock leak at the indicated point, but in one instance I can see in the > dump that the lock_list local from witness_warn is from the pcpu > structure for CPU 0 (and I was warned about sched lock 0), but the > thread id in panic_cpu is 2. So clearly the thread was being migrated > right around panic time. >=20 > This is the amd64 kernel on stable/7. I'm not sure exactly what kind > of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC. >=20 > So... do we need some kind of barrier in the code for sched_pin() for > it to really do what it claims? Could the hardware have re-ordered > the "mov %gs:0x48,%rax" PCPU_GET to before the sched_pin() > increment? Hmmm, I think it might be able to because they refer to different locations. Note this rule in section 8.2.2 of Volume 3A: =E2=80=A2 Reads may be reordered with older writes to different locations= but not with older writes to the same location. It is certainly true that sparc64 could reorder with RMO. I believe ia64=20 could reorder as well. 
Since sched_pin/unpin are frequently used to provide this sort of synchronization, we could use memory barriers in pin/unpin like so: sched_pin() { td->td_pinned = atomic_load_acq_int(&td->td_pinned) + 1; } sched_unpin() { atomic_store_rel_int(&td->td_pinned, td->td_pinned - 1); } We could also just use atomic_add_acq_int() and atomic_sub_rel_int(), but they are slightly more heavyweight, though it would be more clear what is happening I think. -- John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 30 14:33:52 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 90A301065740; Fri, 30 Jul 2010 14:33:52 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 620228FC18; Fri, 30 Jul 2010 14:33:52 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id ED26346B38; Fri, 30 Jul 2010 10:33:51 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 1981B8A03C; Fri, 30 Jul 2010 10:33:51 -0400 (EDT) From: John Baldwin To: freebsd-hackers@freebsd.org Date: Fri, 30 Jul 2010 10:31:34 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100217; KDE/4.4.5; amd64; ; ) References: <201007301008.22501.jhb@freebsd.org> In-Reply-To: <201007301008.22501.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <201007301031.34266.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Fri, 30 Jul 2010 10:33:51 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2
tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: mdf@freebsd.org Subject: Re: sched_pin() versus PCPU_GET X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 14:33:52 -0000 On Friday, July 30, 2010 10:08:22 am John Baldwin wrote: > On Thursday, July 29, 2010 7:39:02 pm mdf@freebsd.org wrote: > > We've seen a few instances at work where witness_warn() in ast() > > indicates the sched lock is still held, but the place it claims it was > > held by is in fact sometimes not possible to keep the lock, like: > >=20 > > thread_lock(td); > > td->td_flags &=3D ~TDF_SELECT; > > thread_unlock(td); > >=20 > > What I was wondering is, even though the assembly I see in objdump -S > > for witness_warn has the increment of td_pinned before the PCPU_GET: > >=20 > > ffffffff802db210: 65 48 8b 1c 25 00 00 mov %gs:0x0,%rbx > > ffffffff802db217: 00 00 > > ffffffff802db219: ff 83 04 01 00 00 incl 0x104(%rbx) > > * Pin the thread in order to avoid problems with thread migration. > > * Once that all verifies are passed about spinlocks ownership, > > * the thread is in a safe path and it can be unpinned. > > */ > > sched_pin(); > > lock_list =3D PCPU_GET(spinlocks); > > ffffffff802db21f: 65 48 8b 04 25 48 00 mov %gs:0x48,%rax > > ffffffff802db226: 00 00 > > if (lock_list !=3D NULL && lock_list->ll_count !=3D 0) { > > ffffffff802db228: 48 85 c0 test %rax,%rax > > * Pin the thread in order to avoid problems with thread migration. > > * Once that all verifies are passed about spinlocks ownership, > > * the thread is in a safe path and it can be unpinned. 
> > */ > > sched_pin(); > > lock_list =3D PCPU_GET(spinlocks); > > ffffffff802db22b: 48 89 85 f0 fe ff ff mov %rax,-0x110(%rbp) > > ffffffff802db232: 48 89 85 f8 fe ff ff mov %rax,-0x108(%rbp) > > if (lock_list !=3D NULL && lock_list->ll_count !=3D 0) { > > ffffffff802db239: 0f 84 ff 00 00 00 je ffffffff802db33e > > > > ffffffff802db23f: 44 8b 60 50 mov 0x50(%rax),%r12d > >=20 > > is it possible for the hardware to do any re-ordering here? > >=20 > > The reason I'm suspicious is not just that the code doesn't have a > > lock leak at the indicated point, but in one instance I can see in the > > dump that the lock_list local from witness_warn is from the pcpu > > structure for CPU 0 (and I was warned about sched lock 0), but the > > thread id in panic_cpu is 2. So clearly the thread was being migrated > > right around panic time. > >=20 > > This is the amd64 kernel on stable/7. I'm not sure exactly what kind > > of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC. > >=20 > > So... do we need some kind of barrier in the code for sched_pin() for > > it to really do what it claims? Could the hardware have re-ordered > > the "mov %gs:0x48,%rax" PCPU_GET to before the sched_pin() > > increment? >=20 > Hmmm, I think it might be able to because they refer to different locatio= ns. >=20 > Note this rule in section 8.2.2 of Volume 3A: >=20 > =E2=80=A2 Reads may be reordered with older writes to different locatio= ns but not > with older writes to the same location. >=20 > It is certainly true that sparc64 could reorder with RMO. I believe ia64= =20 > could reorder as well. 
Since sched_pin/unpin are frequently used to provide > this sort of synchronization, we could use memory barriers in pin/unpin > like so: > > sched_pin() > { > td->td_pinned = atomic_load_acq_int(&td->td_pinned) + 1; > } > > sched_unpin() > { > atomic_store_rel_int(&td->td_pinned, td->td_pinned - 1); > } > > We could also just use atomic_add_acq_int() and atomic_sub_rel_int(), but they > are slightly more heavyweight, though it would be more clear what is happening > I think. However, to actually get a race you'd have to have an interrupt fire and migrate you so that the speculative read was from the other CPU. However, I don't think the speculative read would be preserved in that case. The CPU has to return to a specific PC when it returns from the interrupt and it has no way of storing the state for what speculative reordering it might be doing, so presumably it is thrown away? I suppose it is possible that it actually retires both instructions (but reordered) and then returns to the PC value after the read of listlocks after the interrupt. However, in that case the scheduler would not migrate as it would see td_pinned != 0. To get the race you have to have the interrupt take effect prior to modifying td_pinned, so I think the processor would have to discard the reordered read of listlocks so it could safely resume execution at the 'incl' instruction. The other nit there on x86 at least is that the incl instruction is doing both a read and a write and another rule in the section 8.2.2 is this: • Reads are not reordered with other reads. That would seem to prevent the read of listlocks from passing the read of td_pinned in the incl instruction on x86.
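For readers who want to see the "pin, then read per-CPU data" ordering written out with explicit barriers, here is a minimal user-space sketch using C11 atomics. It is an illustration only, not the FreeBSD implementation: fake_thread, fake_pcpu_value, fake_pin, fake_unpin and the two helper functions are hypothetical names, and the kernel's own atomic(9) primitives are not used. The sketch assumes the goal is simply to keep the later load from being performed ahead of the counter update on architectures that permit such reordering.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-ins; none of these names are FreeBSD kernel API. */
struct fake_thread {
    atomic_int pinned;                      /* models td_pinned */
};

static atomic_int fake_pcpu_value = 42;     /* models a per-CPU field */

/* Pin: bump the counter, then fence so later loads are ordered after the store. */
static void fake_pin(struct fake_thread *td)
{
    atomic_fetch_add_explicit(&td->pinned, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Unpin: release ordering keeps earlier accesses before the decrement. */
static void fake_unpin(struct fake_thread *td)
{
    /* Sanity check that pin/unpin calls are balanced. */
    assert(atomic_load_explicit(&td->pinned, memory_order_relaxed) > 0);
    atomic_fetch_sub_explicit(&td->pinned, 1, memory_order_release);
}

/* Pin, read the "per-CPU" value, unpin; returns the value read. */
int fake_read_pinned(struct fake_thread *td)
{
    fake_pin(td);
    int v = atomic_load_explicit(&fake_pcpu_value, memory_order_relaxed);
    fake_unpin(td);
    return v;
}

/* Returns the value seen by one pinned read on a fresh thread object. */
int fake_demo_value(void)
{
    struct fake_thread td;
    atomic_init(&td.pinned, 0);
    return fake_read_pinned(&td);
}

/* Returns the pin count left over after one pinned read (0 when balanced). */
int fake_pin_balance(void)
{
    struct fake_thread td;
    atomic_init(&td.pinned, 0);
    (void)fake_read_pinned(&td);
    return atomic_load(&td.pinned);
}
```

On x86 an atomic read-modify-write already acts as a full barrier, so the explicit fence there mostly documents intent; whether any barrier is needed at all on x86 is exactly what the message above questions.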
=2D-=20 John Baldwin From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 30 15:00:26 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EC8D6106564A for ; Fri, 30 Jul 2010 15:00:26 +0000 (UTC) (envelope-from rank1seeker@gmail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 86CCD8FC22 for ; Fri, 30 Jul 2010 15:00:26 +0000 (UTC) Received: by wwc33 with SMTP id 33so1423720wwc.31 for ; Fri, 30 Jul 2010 08:00:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:date:message-id :subject:from:to:content-type; bh=rkfvYqgwLmAiQ7cGdZZlemSdDrTdBemOqWhoRD2Pido=; b=ImMxtW8J+9+6IHLiurEkxMFr2KIwpaPTby4A+pe0NCECuSD1k+JZVu1uRdnMbT+wRp Ay58ULqevM9IhU9j2hEPsSkizic0h1uepunj3m0Ed0biJMjo5t625Ad3y53WjO/tsTHo w4q7lXiFi4S1J3+8Os6y3g/L1IQ5Z0gxz5lJM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=iGYneZboAw6gKnbncEk+WUq47hYJ3+o24KXZ3uVBl/F+hjodjDnHiyEER7EXmrgvLs LnsE2e5yVkJGN1hagfjeZJrAdlzLTaJoUfQF0e1uYE3KCh67PvW4pWlVQW8p+FEXVqp1 rdD3AmWB4GMu7Tn73l+WcjMqenMoAem3j/c/w= MIME-Version: 1.0 Received: by 10.216.90.3 with SMTP id d3mr1716348wef.99.1280500170993; Fri, 30 Jul 2010 07:29:30 -0700 (PDT) Received: by 10.216.181.13 with HTTP; Fri, 30 Jul 2010 07:29:30 -0700 (PDT) Date: Fri, 30 Jul 2010 16:29:30 +0200 Message-ID: From: "Domagoj S." 
To: freebsd-hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Subject: ls, mount point aware X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 15:00:27 -0000 As I can see, more and more base apps, are aware of mount points. I.e; In 8.1, chgrp(1), chown(8) and cp(1) now have an -x flag. And what about human users? 'ls' command, should in it's long list of directories, show something like: Hey, this directory, is also a mount point. One letter flag? From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 30 16:41:31 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1E56F1065675; Fri, 30 Jul 2010 16:41:31 +0000 (UTC) (envelope-from sfourman@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id ACAA78FC23; Fri, 30 Jul 2010 16:41:30 +0000 (UTC) Received: by yxe42 with SMTP id 42so863530yxe.13 for ; Fri, 30 Jul 2010 09:41:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=g5889z5QNz8+5jVBzBgrGBGu2dUAHQAXbw3EtWvYYOs=; b=SuIYxp/hGWcIkXj/y+PgaBM5nQHLeKky6lBbxmRlPd/OH+Xv+vk4bPvWKenAJgVsXK 5BNOfLeN/kMaxJbfQjxkIRTkX6rfL2oNIJ/kRRWUJX9Wce7uIMeKz4qvZ14Dx0UM8cIb TMuA6H/RDfiAHtJNIEh+zYpH1VoU1WUjL+Bo8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=Z98yd0BNkpgzKimBDixv1Va2Y7O6wPXU4rWHXr+yUcC17SntWkWw+XJ332/VFvIdRL Dp+sy52qWS/n2EPonpZ+FhH28eDCQOZR/DG96skDNl1OdHS22TliigUmI++k5rVfIro4 
nII0yn6ImVpLA8QMAlLpGus7grORLVwMOC5OM= MIME-Version: 1.0 Received: by 10.150.11.12 with SMTP id 12mr3566707ybk.309.1280508089855; Fri, 30 Jul 2010 09:41:29 -0700 (PDT) Received: by 10.231.28.130 with HTTP; Fri, 30 Jul 2010 09:41:29 -0700 (PDT) In-Reply-To: References: Date: Fri, 30 Jul 2010 11:41:29 -0500 Message-ID: From: "Sam Fourman Jr." To: krad Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-hackers@freebsd.org, FreeBSD Questions Subject: Re: possible NFS lockups X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 16:41:31 -0000 On Tue, Jul 27, 2010 at 10:29 AM, krad wrote: > I have a production mail system with an nfs backend. Every now and again we > see the nfs die on a particular head end. However it doesn't die across all > the nodes. This suggests to me there isnt an issue with the filer itself and > the stats from the filer concur with that. > > The symptoms are lines like this appearing in dmesg > > nfs server 10.44.17.138:/vol/vol1/mail: not responding > nfs server 10.44.17.138:/vol/vol1/mail: is alive again > > trussing df it seems to hang on getfsstat, this is presumably when it tries > the nfs mounts > I also have this problem, where nfs locks up on a FreeBSD 9 server and a FreeBSD RELENG_8 client -- Sam Fourman Jr. 
Fourman Networks http://www.fourmannetworks.com From owner-freebsd-hackers@FreeBSD.ORG Fri Jul 30 17:41:40 2010 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2795F106568A for ; Fri, 30 Jul 2010 17:41:40 +0000 (UTC) (envelope-from fabiokaminski@gmail.com) Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id CEE698FC1F for ; Fri, 30 Jul 2010 17:41:39 +0000 (UTC) Received: by gwj23 with SMTP id 23so900694gwj.13 for ; Fri, 30 Jul 2010 10:41:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:date:message-id :subject:from:to:content-type; bh=w6F5fb9I2Yyazc5w7EH0ymhlc3UzzSd3Z9W3RmIViQQ=; b=fVStMXhgG/iQ/pZY52A4wQeTvP74q9XTsVCsJ+j++232EGEFmg5sK33xkHlcSwkkR0 IjYoDRo22UJk26hF0n6i60I8xqmd6KTzos3s1tYXa10D04mdQk+jVr0H0wPyGENLSqQB EuLhe0nlx4eNOi18MMmCXNX4vhzuIAyZqnHS0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=BTuvpAUkGdjvCZXC3yj83zcavudb6csw9OZ1tcxM71GkcISgPv4q1L9ljQR+4NSSSs zEw4j3QkwWZH4nlfzlzvBIsl6TZZrcDrpwz82x773xlySvv6erWOKV71dN/3A4w7pXT7 5q2IU5jfLgFsfPdekK2RS6vA3fXM6HXZMj2+g= MIME-Version: 1.0 Received: by 10.90.119.18 with SMTP id r18mr2251414agc.92.1280510167727; Fri, 30 Jul 2010 10:16:07 -0700 (PDT) Received: by 10.231.207.15 with HTTP; Fri, 30 Jul 2010 10:16:07 -0700 (PDT) Date: Fri, 30 Jul 2010 14:16:07 -0300 Message-ID: From: Fabio Kaminski To: hackers@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: freebsd exokernel X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Fri, 30 Jul 2010 17:41:40 -0000 Hi folks, I know it's kind of off topic, but I think this is the perfect list for this.. does anyone here think a little bit like me, and like the exokernel idea? the primary idea is to leave only things like scheduling and drivers in the kernel ring.. and move things like VFS and MM down to userland rings as libraries.. so an application could optionally use those as libs abstracting things generically.. (like a bicycle with wheels)... and when you really need or want to.. you can go down to the bare metal and create your own application abstraction.. imagine what it could mean for performance, since the layers get optimized and are not stacked on top of other layers... and without the context switch between user and kernel ring?? for an application virtual machine like Java (with its own scheduling, MM and FS layers), or a database (FS and memory layers), or virtualization software.. if we write a database, for instance, and want to outperform disks, the current scenario is: either you invade the kernel of the OS and implement your abstraction there (you have to know all of its source) with part of your code in userland :s , or you don't mess with the kernel at all (it's too impractical) and stay in userland.. and everybody has to be ruled by only one homogeneous way to "see" things.. your application may be lucky.. this kernel abstraction is good for you.. but it may not be.. and even if you can see the gold.. you can't advance any further.. the MIT guys created one based on a '98 (I think) OpenBSD, and they created a web server where the (now optional) TCP protocol was persisted on disk, so it's protocol agnostic and can change its communication wall at runtime.. sometimes I'm looking at where everything is going in technology, and we are kind of stepping back.. putting more layers on top of other layers... and slowing everything down.. instead of making it as fast as it can be.. I would like to share experience and hear what you think about this..
would it be a feasible project to borrow things from freebsd, and start a project like this? anyone like this idea ?? anyway, just some thoughts for now.. From owner-freebsd-hackers@FreeBSD.ORG Sat Jul 31 12:07:15 2010 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C7091106564A for ; Sat, 31 Jul 2010 12:07:15 +0000 (UTC) (envelope-from jhs@berklix.com) Received: from tower.berklix.org (tower.berklix.org [83.236.223.114]) by mx1.freebsd.org (Postfix) with ESMTP id 535E58FC13 for ; Sat, 31 Jul 2010 12:07:14 +0000 (UTC) Received: from park.js.berklix.net (p549A7B73.dip.t-dialin.net [84.154.123.115]) (authenticated bits=0) by tower.berklix.org (8.14.2/8.14.2) with ESMTP id o6VC7Buw046647; Sat, 31 Jul 2010 12:07:13 GMT (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (fire.js.berklix.net [192.168.91.41]) by park.js.berklix.net (8.13.8/8.13.8) with ESMTP id o6VC71sW065852; Sat, 31 Jul 2010 14:07:01 +0200 (CEST) (envelope-from jhs@berklix.com) Received: from fire.js.berklix.net (localhost [127.0.0.1]) by fire.js.berklix.net (8.14.3/8.14.3) with ESMTP id o6VC6rdn023424; Sat, 31 Jul 2010 14:06:58 +0200 (CEST) (envelope-from jhs@fire.js.berklix.net) Message-Id: <201007311206.o6VC6rdn023424@fire.js.berklix.net> To: Fabio Kaminski From: "Julian H. Stacey" Organization: http://www.berklix.com BSD Unix Linux Consultancy, Munich Germany User-agent: EXMH on FreeBSD http://www.berklix.com/free/ X-URL: http://www.berklix.com In-reply-to: Your message "Fri, 30 Jul 2010 14:16:07 -0300." 
Date: Sat, 31 Jul 2010 14:06:53 +0200 Sender: jhs@berklix.com Cc: hackers@freebsd.org Subject: Re: freebsd exokernel X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Jul 2010 12:07:15 -0000 > would it be a feasible project to borrow things from freebsd, and start a > project like this? anyone like this idea ?? The code is free to use :-) > anyway, just some thoughts for now.. See also eg Mach. http://en.wikipedia.org/wiki/Mach http://en.wikipedia.org/wiki/Mach_%28kernel%29 Cheers, Julian -- Julian Stacey: BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com Mail plain text. Not HTML, Not quoted-printable, Not Base64. From owner-freebsd-hackers@FreeBSD.ORG Sat Jul 31 12:48:40 2010 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A8CB61065673 for ; Sat, 31 Jul 2010 12:48:40 +0000 (UTC) (envelope-from dr.clau@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 343998FC1A for ; Sat, 31 Jul 2010 12:48:39 +0000 (UTC) Received: by fxm13 with SMTP id 13so1342200fxm.13 for ; Sat, 31 Jul 2010 05:48:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:message-id:date:from :user-agent:mime-version:to:cc:subject:references:in-reply-to :content-type:content-transfer-encoding; bh=AicojR9GZY5luGwPmOikwm7xttxPz36iIRAJ9Qxz8UM=; b=UrOaw25OKpL966o+xq3iQfQlST4T8r0HKJ9AncGGAtlJUfV+kwW63avo+0X46t5K2Z +DN3crFnEOFeuEpycdkbSSBILUV/6bSNyD/dNj6+ZYMzSA+vjfthNUJsUYagOBtH0+EC j5sPGgRO2/R4BpplgTGfhDQcWtsGO5Xu6ewBk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; 
h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; b=aRpZ0E/lpeml7Pae4Xj2Q4D1D2aoLMQ2zsrxKCyPT7pgiekJ/Zk5+SD0VfGDl2htPf QArY7OWSRdmRz3WeUc57v/ehUY3L4qecDhvymq6wjpubAtX7mtAh2hwnanibjEH4/bZj D5cPcvxacx/9hQeNGJXVQMZb+dbQmgBL4RuKs= Received: by 10.223.112.10 with SMTP id u10mr3370312fap.50.1280578889058; Sat, 31 Jul 2010 05:21:29 -0700 (PDT) Received: from [127.0.0.103] ([89.47.225.20]) by mx.google.com with ESMTPS id r27sm1197488faa.0.2010.07.31.05.21.26 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 31 Jul 2010 05:21:27 -0700 (PDT) Message-ID: <4C54154A.9040306@gmail.com> Date: Sat, 31 Jul 2010 15:21:30 +0300 From: CDP User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.11) Gecko/20100722 Thunderbird/3.0.6 MIME-Version: 1.0 To: "Julian H. Stacey" References: <201007311206.o6VC6rdn023424@fire.js.berklix.net> In-Reply-To: <201007311206.o6VC6rdn023424@fire.js.berklix.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Fabio Kaminski , hackers@freebsd.org Subject: Re: freebsd exokernel X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Jul 2010 12:48:40 -0000 On 07/31/10 15:06, Julian H. Stacey wrote: >> would it be a feasible project to borrow things from freebsd, and start a >> project like this? anyone like this idea ?? > > The code is free to use :-) > >> anyway, just some thoughts for now.. > > See also eg Mach. > http://en.wikipedia.org/wiki/Mach > http://en.wikipedia.org/wiki/Mach_%28kernel%29 Add this to the list (have a look at the external links too): http://en.wikipedia.org/wiki/L4_microkernel_family You might also want to look at this: http://os.inf.tu-dresden.de/L4/LinuxOnL4/overview.shtml Regards, Claudiu. 
From owner-freebsd-hackers@FreeBSD.ORG Sat Jul 31 18:32:44 2010 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DA54D1065674 for ; Sat, 31 Jul 2010 18:32:44 +0000 (UTC) (envelope-from fabiokaminski@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 996258FC14 for ; Sat, 31 Jul 2010 18:32:44 +0000 (UTC) Received: by iwn35 with SMTP id 35so3278092iwn.13 for ; Sat, 31 Jul 2010 11:32:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=MinJBvy6g6YN3JY6JI1tQ2Ianso9weBxOHBKZwAmiYs=; b=JuiVkj1LiMOnmaKmcflweJRgvkdwGsh5/CgL7Jdvwc5eAcG64gSzYMiHDfhUoiKmSL TXiZuGyDykJJPIacHY70wwQ/ggZTR83EC8I1UpvL1nZrun7G2pwPtPGMBBZmNtu9X2Bw kg2neLfzEHQovMOPnAIUTRRONyCuJ6j/uEJJk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=ixoyFaPNLR0JQOPNy4MvjmCqSLHsHuZzeG+w5/wcMC4TbZdg2E/wx9wpUVuQkmKqBy RckF99hZI/9O12PBa/iBKXDfypnM1YDY4K5PuW9v4dmenUKoTYGRA3nf7ZxMwU470Qjl OQUbNLv9F1Ye8wl2fUJoK7uaaATxBQSOrGkAM= MIME-Version: 1.0 Received: by 10.231.193.135 with SMTP id du7mr3703714ibb.176.1280601163772; Sat, 31 Jul 2010 11:32:43 -0700 (PDT) Received: by 10.231.207.15 with HTTP; Sat, 31 Jul 2010 11:32:43 -0700 (PDT) In-Reply-To: <4C54154A.9040306@gmail.com> References: <201007311206.o6VC6rdn023424@fire.js.berklix.net> <4C54154A.9040306@gmail.com> Date: Sat, 31 Jul 2010 15:32:43 -0300 Message-ID: From: Fabio Kaminski To: CDP Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: "Julian H. 
Stacey" , hackers@freebsd.org Subject: Re: freebsd exokernel X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Jul 2010 18:32:44 -0000 yes, I have sniffed around Mach.. but I don't like the message-passing idea.. it's from the microkernel species, and there's even a nouveau reincarnation called Barrelfish http://www.barrelfish.org .. which is a sort of microkernel, but running one kernel nucleus on each core, with the kernels passing messages to each other.. (this is very promising for virtualization.. but monolithic is still the fastest) it's more like this L4 kernel.. good link indeed.... but with security included.. in fact the original MIT exokernel is more like a resource policy system... http://en.wikipedia.org/wiki/Exokernel and I think they solve the problem that L4 has, that you are left alone.. and the applications are obligated to implement the tough parts by themselves.. putting the abstractions in userland as libraries.. so if you want user ZFS, BSD VMM, Btrfs, or to create your own abstraction or mix some, you just link with the proper .so file.. without needing to create a half kernel/half app application.. thanks for the links On Sat, Jul 31, 2010 at 9:21 AM, CDP wrote: > On 07/31/10 15:06, Julian H. Stacey wrote: > >> would it be a feasible project to borrow things from freebsd, and start a >>> project like this? anyone like this idea ?? >>> >> >> The code is free to use :-) >> >> anyway, just some thoughts for now.. >>> >> >> See also eg Mach. >> http://en.wikipedia.org/wiki/Mach >> http://en.wikipedia.org/wiki/Mach_%28kernel%29 >> > > Add this to the list (have a look at the external links too): > http://en.wikipedia.org/wiki/L4_microkernel_family > > You might also want to look at this: > http://os.inf.tu-dresden.de/L4/LinuxOnL4/overview.shtml > > Regards, > Claudiu.
> From owner-freebsd-hackers@FreeBSD.ORG Sat Jul 31 21:21:53 2010 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7664C1065673 for ; Sat, 31 Jul 2010 21:21:53 +0000 (UTC) (envelope-from mashtizadeh@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 3AB998FC15 for ; Sat, 31 Jul 2010 21:21:52 +0000 (UTC) Received: by iwn35 with SMTP id 35so3414270iwn.13 for ; Sat, 31 Jul 2010 14:21:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=PeQU3hWstFS/5yyb3zr+OfYjTSqBFv20hwS0CP+NzqQ=; b=VxzmbHRd4k3Iu7f8n62Nzu5TtqnfZpS+BRpFfp/tAnqgClolCZSZaNlp7gsiDDydGA /ZroIQSZuINo64ClI4kBKbvy4upfW4ztWWdco4RPiFfXoEo0MGIzPbKy+e2t1KfCCgLf wost9UrNFk/EfmEUtk/uNfafxQr0tLa2UGym4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=VXo+foLpEGC8y1Yexqw6czaJQWTwlkVq/sCBrQlUS7Ik2Ej4+QT729L2mR7A0HcBlr UI34fKEoEgcYWz7gsv8J5gPxvYINZdOaUDgxoASvkUI1dsiCEf0PRxWTgU1A0diw5tt1 ujWZI23nZ0KST2T/SiBJK4AxHi/Mei5QT8N74= MIME-Version: 1.0 Received: by 10.231.174.84 with SMTP id s20mr4303990ibz.94.1280609419350; Sat, 31 Jul 2010 13:50:19 -0700 (PDT) Received: by 10.231.205.201 with HTTP; Sat, 31 Jul 2010 13:50:19 -0700 (PDT) In-Reply-To: References: <201007311206.o6VC6rdn023424@fire.js.berklix.net> <4C54154A.9040306@gmail.com> Date: Sat, 31 Jul 2010 13:50:19 -0700 Message-ID: From: Ali Mashtizadeh To: Fabio Kaminski Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: "Julian H. 
Stacey" , hackers@freebsd.org Subject: Re: freebsd exokernel X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Jul 2010 21:21:53 -0000 Hi Fabio, Exokernels are great operating systems for prototyping or learning. You obviously incur a lot more performance hits when you implement such an architecture. I haven't looked into the details of DragonflyBSD too much but they have enough infrastructure to run a userlevel kernel that is sort of paravirtualized. From what I've read it seems it has enough infrastructure for you to use the platform as an exokernel without too much modification. Might be a good starting point for you. In addition to the original exokernel work from MIT you might want to check out corey which has some interesting work on multicore scalability. http://pdos.csail.mit.edu/corey/ Thanks, ~ Ali On Sat, Jul 31, 2010 at 11:32 AM, Fabio Kaminski wrote: > yes , i have snifed the mach.. but i dont like the message passing idea.. > its from the microkernel species > and theres even a nouveau reincarnation called barrelfish > http://www.barrelfish.org .. wich is a sort of microkernel but running one > kernel core nucleus for each core and message passing each other.. (this is > very promissing for virtualization.. but monolitic still be the fastest) > > its more like this L4 kernel.. good link indeed.... but with security > included.. in fact the original mit exokernel its more like a resource > policy system... http://en.wikipedia.org/wiki/Exokernel > > and i think they solve the problem that L4 has, that you are left alone.. > and the applications are obligated to implement thought parts by > themselfs.. putting the abstractions in the userland as libraries..
so if > you want user ZFS ,Bsd VMM, Btrfs or create your own abstraction or mix > some, its just link with the proper .so file.. without needing to create a > half kernel/half app application.. > > thanks for the links > > On Sat, Jul 31, 2010 at 9:21 AM, CDP wrote: > >> On 07/31/10 15:06, Julian H. Stacey wrote: >> >>> would it be a feasible project to borrow things from freebsd, and start a >>>> project like this? anyone like this idea ?? >>>> >>> >>> The code is free to use :-) >>> >>> anyway, just some thoughts for now.. >>>> >>> >>> See also eg Mach. >>> http://en.wikipedia.org/wiki/Mach >>> http://en.wikipedia.org/wiki/Mach_%28kernel%29 >>> >> >> Add this to the list (have a look at the external links too): >> http://en.wikipedia.org/wiki/L4_microkernel_family >> >> You might also want to look at this: >> http://os.inf.tu-dresden.de/L4/LinuxOnL4/overview.shtml >> >> Regards, >> Claudiu. >> > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > -- Ali Mashtizadeh علی مشتی زاده