From: Andriy Gapon
To: Palle Girgensohn
Cc: Konstantin Belousov, "freebsd-fs@freebsd.org"
Subject: Re: fsync and latest PostgreSQL
Date: Mon, 18 Feb 2019 13:03:22 +0200

On 15/02/2019 14:41, Konstantin Belousov wrote:
> On Fri, Feb 15, 2019 at 01:09:08PM +0100, Palle Girgensohn wrote:
>> Hi!
>>
>> I'm packaging postgresql ports for FreeBSD. I need your advice about a change to the PostgreSQL backend that seems to be aimed at working around a problem in Linux where the OS "lies" about fsync.
>>
>> There's a description here [1]:
>>
>>> data_sync_retry (boolean)
>>>
>>> When set to false, which is the default, PostgreSQL will raise a PANIC-level error on failure to flush modified data files to the filesystem.
>>> This causes the database server to crash.
>>>
>>> On some operating systems, the status of data in the kernel's page cache is unknown after a write-back failure. In some cases it might have been entirely forgotten, making it unsafe to retry; the second attempt may be reported as successful, when in fact the data has been lost. In these circumstances, the only way to avoid data loss is to recover from the WAL after any failure is reported, preferably after investigating the root cause of the failure and replacing any faulty hardware.
>>>
>>> If set to true, PostgreSQL will instead report an error but continue to run so that the data flushing operation can be retried in a later checkpoint. Only set it to true after investigating the operating system's treatment of buffered data in case of write-back failure.
>>
>> An email by the committer [2] indicates that it is safe to set data_sync_retry = true for "all file systems on FreeBSD" but makes no recommendations:
>>
>>> I personally believe it is safe to run with data_sync_retry = on on
>>> any file system on FreeBSD, and ZFS on any operating system... but I
>>> see no need to make recommendations about that in the documentation,
>>> other than that you should investigate the behaviour of your operating
>>> system if you really want to turn it on.
>>
>> I'm pondering setting this knob to default true in the FreeBSD ports. Any thoughts or comments about that?
>>
>> Cheers,
>> Palle
>>
>> [1] https://www.postgresql.org/docs/11/runtime-config-error-handling.html#GUC-DATA-SYNC-RETRY
>>
>> [2] https://www.postgresql.org/message-id/CAEepm%3D16aauN3LMHrVZ-uoqU8-k7aoSdGC3t7PghewVVsjUwtQ%40mail.gmail.com
>>
>
> At least for UFS, fsync(2) and fdatasync(2) wait for the write to finish
> and do not throw away dirty buffers which happen to get a write error.
> We are also careful to re-dirty such buffers when async write fails with
> any error except ENXIO.
> So the error from fsync(2) does not invalidate
> non-written data, and the next fsync(2) call would retry the write.
>
> Practically this means that the dirty buffers for the device with the
> failing writes are accumulated in the system.
>
> In principle, this is also true for filesystems that correctly use
> buffer cache, e.g. msdosfs. So it might be relevant for other writeable
> filesystems, but I did not look.
>
> I cannot comment about ZFS.

ZFS also never silently loses dirty data.
A pool will enter a mode specified by its failmode property upon a final write error.

-- 
Andriy Gapon
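
To make the retry semantics discussed above concrete, here is a minimal C sketch of what "retrying fsync" means from the application side. It is an illustration only, not PostgreSQL's code: the helper name sync_with_retry and its max_attempts parameter are hypothetical, and the whole approach is only sound under the assumption stated in the thread, namely that the file system keeps failed buffers dirty (as described for UFS) so that a later fsync(2) really rewrites them.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL or FreeBSD): flush fd with
 * fsync(2), retrying on failure up to max_attempts times.
 *
 * ASSUMPTION: the file system keeps buffers dirty after a failed write,
 * so a later fsync(2) retries the same data. On a system that drops the
 * dirty pages after an error, a later "success" proves nothing and this
 * loop would be unsafe.
 *
 * Returns 0 on success, -1 with errno set if no attempt succeeded.
 */
int
sync_with_retry(int fd, int max_attempts)
{
	int saved_errno = 0;

	assert(max_attempts > 0);

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (fsync(fd) == 0)
			return (0);

		saved_errno = errno;

		/* A bad descriptor is a caller bug, not a write-back failure. */
		if (saved_errno == EBADF || saved_errno == EINVAL)
			break;

		fprintf(stderr, "fsync attempt %d failed: %s\n",
		    attempt, strerror(saved_errno));
	}

	errno = saved_errno;
	return (-1);
}
```

A caller would write its data, call sync_with_retry(fd, 3), and treat -1 as "data is still not on stable storage". Real code would normally wait between attempts (PostgreSQL with data_sync_retry = on defers the retry to a later checkpoint); the sketch retries immediately only to stay short.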