Date: Sun, 14 Oct 2018 18:33:49 -0400
From: Mark Johnston
To: Konstantin Belousov
Cc: Thomas Munro, alc@freebsd.org, freebsd-hackers@freebsd.org, mjg@freebsd.org
Subject: Re: PostgresSQL vs super pages
Message-ID: <20181014223349.GA9022@raichu>
In-Reply-To: <20181014114544.GA5335@kib.kiev.ua>

On Sun, Oct 14, 2018 at 02:45:44PM +0300, Konstantin Belousov wrote:
> On Sun, Oct 14, 2018 at 10:58:08PM +1300, Thomas Munro wrote:
> > On Sun, 14 Oct 2018 at 12:50, Konstantin Belousov wrote:
> > > On Thu, Oct 11, 2018 at 02:01:20PM +1300, Thomas Munro wrote:
> > > > On Thu, 11 Oct 2018 at 13:20, Konstantin Belousov wrote:
> > > > > On Thu, Oct 11, 2018 at 12:59:41PM +1300, Thomas Munro wrote:
> > > > > > shm_open("/PostgreSQL.1721888107",O_RDWR|O_CREAT|O_EXCL,0600) = 46 (0x2e)
> > > > > > ftruncate(46,0x400000) = 0 (0x0)
> > > > > Try to write zeroes instead of truncating.
> > > > > This should activate the fast path in the fault handler, and if the
> > > > > pages allocated for backing store of the shm object were from reservation,
> > > > > you should get superpage mapping on the first fault without promotion.
> > > >
> > > > If you just write() to a newly shm_open()'d fd you get a return code
> > > > of 0 so I assume that doesn't work. If you ftruncate() to the desired
> > > > size first, then loop writing 8192 bytes of zeroes at a time, it
> > > > works. But still no super pages. I tried also with a write buffer of
> > > > 2MB of zeroes, but still no super pages. I tried abandoning
> > > > shm_open() and instead using a mapped file, and still no super pages.
> > >
> > > I did not quite scientific experiment, but you would need to try to find
> > > the differences between what I did and what you observe. Below is the
> > > naive test program that directly implements my suggestion, and the
> > > output from the procstat -v for it after all things were set up.
> > >
> > ...
> > > 98579 0x800e00000 0x801200000 rw- 1024 1030 3 0 --S- df
> >
> > Huh. Your program doesn't result in an S mapping on my laptop, but I
> > tried on an EC2 t2.2xlarge machine and there it promotes to S, even if
> > I comment out the write() loop (the loop that assigned to every byte
> > is enough). The difference might be the amount of memory on the
> > system: on my 4GB laptop, it is very reluctant to use super pages (but
> > I have seen it do it, so I know it can). On a 32GB system, it does it
> > immediately, and it works nicely for PostgreSQL too. So perhaps my
> > problem is testing on a small RAM system, though I don't understand
> > why.
> How many free memory does your system have ? Free as reported by top.
> If the free memory is low and fragmented, and I suppose it is on 4G laptop
> which you use with X, browser and other memory-consuming applications,
> system would have troubles filling the reverve, i.e reserving 2M of
> 2M-aligned physical pages.

BTW, this can be verified explicitly with the vm.phys_free sysctl.
Superpage promotion requires free 2MB chunks from freelist 0, pool 0.

>
> You can try the test programs right after booting into single user mode.
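
For reference, here is a rough sketch of the arithmetic behind the 2MB
figures in this thread. It assumes amd64 with 4KB base pages and 2MB
superpages; the helper names buddy_order() and describe_mapping() are
made up for illustration and are not part of any FreeBSD tool. It works
out which buddy order a 2MB chunk corresponds to, and checks the size
and alignment of the address range from the quoted procstat -v row.

```python
# Sketch only: page sizes below are assumptions for amd64.
PAGE_SIZE = 4 * 1024                # assumed base page size (4KB)
SUPERPAGE_SIZE = 2 * 1024 * 1024    # assumed superpage size (2MB)


def buddy_order(chunk_bytes, page_bytes=PAGE_SIZE):
    """Return n such that chunk_bytes == page_bytes * 2**n."""
    pages, rem = divmod(chunk_bytes, page_bytes)
    # Reject sizes that are not a power-of-two number of whole pages.
    if rem or pages <= 0 or pages & (pages - 1):
        raise ValueError("chunk is not a power-of-two number of pages")
    return pages.bit_length() - 1


def describe_mapping(start, end):
    """Summarize a [start, end) address range, e.g. one procstat -v row."""
    size = end - start
    return {
        "bytes": size,
        "base_pages": size // PAGE_SIZE,
        "superpages": size // SUPERPAGE_SIZE,
        # Both ends must sit on a 2MB boundary for full superpage coverage.
        "aligned": start % SUPERPAGE_SIZE == 0 and end % SUPERPAGE_SIZE == 0,
    }


if __name__ == "__main__":
    # Start and end addresses taken from the procstat -v row quoted above.
    print(buddy_order(SUPERPAGE_SIZE))
    print(describe_mapping(0x800e00000, 0x801200000))
```

Under those assumptions a 2MB chunk is an order-9 block (512 base
pages), so the rows of interest in vm.phys_free would be order 9 and
above. The quoted mapping is 0x400000 bytes, i.e. 1024 base pages or
two 2MB-aligned superpages, which agrees with the 1024 in that row if
that column is the resident page count.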