From: "Kristof Provost"
To: "Lev Serebryakov"
Cc: freebsd-hackers@FreeBSD.org
Subject: Re: What are ck_queue.h guarantees?
Date: Sat, 13 Oct 2018 23:14:31 -0700

On 13 Oct 2018, at 8:40, Lev Serebryakov wrote:

> Hello Freebsd-hackers,
>
> Concurrency Kit documentation says:
>
> ====
> ck_queue is a queue.h-compatible implementation of
> many-reader-single-writer queues. It allows for safe concurrent
> iteration, peeking and read-side access in the presence of a single
> concurrent writer without any usage of locks.
> ====
>
> But in all places at kernel I peeked, CK_XXXX macros are protected by
> locks. Yes, even read ones.

Note that the implementation of if_maddr_rlock() doesn’t actually take
a lock. Instead it calls epoch_enter_preempt().

> Why is it so? Why do we bother to use CK_XXX API (which adds all
> needed barriers and uses CASes) if all accesses are protected by
> locks, anyway?

ck_queues are safe to use, even when elements are being added or
removed. Missing new elements is usually fine, but what happens if an
element we’re looking at right now is being removed by a different
thread? We might still be using it when the removing thread frees it.

That’s what the epoch code protects against. It allows the removing
thread to know when no other thread is using the removed item any more
(and thus when it’s safe to actually delete it).

Hence the ‘lock’ and ‘unlock’ calls. They don’t actually take a lock,
and there’s no contention.
Many threads can enter the section between lock and unlock at the same
time.

I suspect the ‘lock’/‘unlock’ naming is mostly historical here: i.e. it
used to be a real lock, and when it was replaced by the epoch-based
approach the functions were not renamed.

Best regards,
Kristof

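To make the reader/writer split described above concrete, here is a minimal userland sketch of the idea, using C11 atomics. It is a toy and not the kernel's epoch(9) implementation: every name in it (toy_epoch_enter, toy_epoch_exit, toy_epoch_retire, toy_epoch_quiesced, struct toy_reader) is hypothetical, and it assumes a small fixed set of reader slots. Readers only announce themselves and never block; the writer unlinks an element, bumps the epoch, and frees the element only once no earlier reader is still active.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_MAX_READERS 4

/* One record per reader thread (hypothetical layout). */
struct toy_reader {
	atomic_bool active;	/* currently inside a read section? */
	atomic_uint epoch;	/* global epoch seen when the section began */
};

static atomic_uint toy_global_epoch = 1;
static struct toy_reader toy_readers[TOY_MAX_READERS];

/* Reader "lock": announces the reader; it never blocks or contends. */
static void
toy_epoch_enter(struct toy_reader *r)
{
	atomic_store(&r->active, true);
	atomic_store(&r->epoch, atomic_load(&toy_global_epoch));
}

/* Reader "unlock": the reader no longer holds any list element. */
static void
toy_epoch_exit(struct toy_reader *r)
{
	assert(atomic_load(&r->active));
	atomic_store(&r->active, false);
}

/*
 * Writer: call after unlinking an element. Returns the epoch that
 * every still-active reader must have reached before the element
 * may be freed.
 */
static unsigned
toy_epoch_retire(void)
{
	return atomic_fetch_add(&toy_global_epoch, 1) + 1;
}

/* Writer: true once no active reader predates `target`. */
static bool
toy_epoch_quiesced(unsigned target)
{
	for (int i = 0; i < TOY_MAX_READERS; i++) {
		struct toy_reader *r = &toy_readers[i];

		if (atomic_load(&r->active) && atomic_load(&r->epoch) < target)
			return false;
	}
	return true;
}
```

A writer in this sketch would unlink the element, call toy_epoch_retire(), and poll toy_epoch_quiesced() before calling free(). Readers that enter after the retire cannot reach the unlinked element, so they do not delay the free.
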
From: Thomas Munro
Date: Sun, 14 Oct 2018 22:58:08 +1300
Subject: Re: PostgresSQL vs super pages
To: Konstantin Belousov
Cc: freebsd-hackers@freebsd.org, alc@freebsd.org, mjg@freebsd.org

On Sun, 14 Oct 2018 at 12:50, Konstantin Belousov wrote:
> On Thu, Oct 11, 2018 at 02:01:20PM +1300, Thomas Munro wrote:
> > On Thu, 11 Oct 2018 at 13:20, Konstantin Belousov wrote:
> > > On Thu, Oct 11, 2018 at 12:59:41PM +1300, Thomas Munro wrote:
> > > > shm_open("/PostgreSQL.1721888107",O_RDWR|O_CREAT|O_EXCL,0600) = 46 (0x2e)
> > > > ftruncate(46,0x400000) = 0 (0x0)
> > > Try to write zeroes instead of truncating.
> > > This should activate the fast path in the fault handler, and if the
> > > pages allocated for backing store of the shm object were from reservation,
> > > you should get superpage mapping on the first fault without promotion.
> >
> > If you just write() to a newly shm_open()'d fd you get a return code
> > of 0 so I assume that doesn't work. If you ftruncate() to the desired
> > size first, then loop writing 8192 bytes of zeroes at a time, it
> > works. But still no super pages. I tried also with a write buffer of
> > 2MB of zeroes, but still no super pages. I tried abandoning
> > shm_open() and instead using a mapped file, and still no super pages.
>
> I did not quite scientific experiment, but you would need to try to find
> the differences between what I did and what you observe. Below is the
> naive test program that directly implements my suggestion, and the
> output from the procstat -v for it after all things were set up.
> ...
> 98579 0x800e00000 0x801200000 rw- 1024 1030 3 0 --S- df

Huh. Your program doesn't result in an S mapping on my laptop, but I
tried on an EC2 t2.2xlarge machine and there it promotes to S, even if
I comment out the write() loop (the loop that assigns to every byte is
enough).

The difference might be the amount of memory on the system: on my 4GB
laptop, it is very reluctant to use super pages (but I have seen it do
it, so I know it can). On a 32GB system, it does it immediately, and it
works nicely for PostgreSQL too. So perhaps my problem is testing on a
small-RAM system, though I don't understand why.
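For anyone wanting to repeat the experiment, the sequence discussed above (size the object, write zeroes through the descriptor in 8192-byte chunks, map it, then touch every byte) can be sketched roughly as follows. This is not Konstantin's test program, which is elided above; the helper name zero_fill_and_map() is hypothetical, and the descriptor is assumed to come from shm_open() or open() in the caller. Whether the resulting mapping shows the S flag in procstat -v depends on the system, as the two machines above demonstrate.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: size the object behind `fd`, write zeroes
 * through the descriptor, map it shared and touch every byte.
 * Returns the mapping, or NULL on any failure.
 */
static void *
zero_fill_and_map(int fd, size_t size)
{
	static const char zeroes[8192];
	size_t done = 0;
	void *p;

	/* Size first: a bare write() to a new shm fd returned 0 above. */
	if (ftruncate(fd, (off_t)size) != 0)
		return NULL;
	if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
		return NULL;

	/* Write zeroes in 8192-byte chunks until the object is covered. */
	while (done < size) {
		size_t want = size - done;
		ssize_t n;

		if (want > sizeof(zeroes))
			want = sizeof(zeroes);
		n = write(fd, zeroes, want);
		if (n <= 0)
			return NULL;
		done += (size_t)n;
	}

	p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* Assign to every byte so each page is faulted in. */
	memset(p, 0, size);
	return p;
}
```

A caller would pass the descriptor and 0x400000 (the size from the truss output quoted above), keep the process alive, and then inspect it with procstat -v to see whether the region carries the S flag.
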