From: Kevin Day <toasty@dragondata.com>
To: Alan Somers
Cc: FreeBSD Hackers, Mark Johnston
Subject: Re: The pagedaemon evicts ARC before scanning the inactive page list
Date: Tue, 18 May 2021 16:37:22 -0500
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

I'm not sure if this is the exact same thing, but I believe I'm seeing something similar on 12.2-RELEASE as well.
Mem: 5628M Active, 4043M Inact, 8879M Laundry, 12G Wired, 1152M Buf, 948M Free
ARC: 8229M Total, 1010M MFU, 6846M MRU, 26M Anon, 32M Header, 315M Other
     7350M Compressed, 9988M Uncompressed, 1.36:1 Ratio
Swap: 2689M Total, 2337M Used, 352M Free, 86% Inuse

Inact keeps growing until it exhausts all swap, to the point that the kernel complains (swap_pager_getswapspace(xx): failed), and it never recovers until a reboot. ARC keeps shrinking and growing, but Inact grows forever. While it hasn't broken anything since the last reboot here, on a bigger server (below) I can watch Inact slowly grow and never be freed until the machine is swapping so badly I have to reboot.

Mem: 9648M Active, 604G Inact, 22G Laundry, 934G Wired, 1503M Buf, 415G Free

> On May 18, 2021, at 4:07 PM, Alan Somers <asomers@freebsd.org> wrote:
>
> I'm using ZFS on servers with tons of RAM and running FreeBSD 12.2-RELEASE.  Sometimes they get into a pathological situation where most of that RAM sits unused.  For example, right now one of them has:
>
> 2 GB   Active
> 529 GB Inactive
> 16 GB  Free
> 99 GB  ARC total
> 469 GB ARC max
> 86 GB  ARC target
>
> When a server gets into this situation, it stays there for days, with the ARC target barely budging.  All that inactive memory never gets reclaimed and put to good use.  Frequently the server never recovers until a reboot.
>
> I have a theory for what's going on.  Ever since r334508^ the pagedaemon sends the vm_lowmem event _before_ it scans the inactive page list.  If the ARC frees enough memory, then vm_pageout_scan_inactive won't need to free any.  Is that order really correct?
> For reference, here's the relevant code, from vm_pageout_worker:
>
> shortage = pidctrl_daemon(&vmd->vmd_pid, vmd->vmd_free_count);
> if (shortage > 0) {
>         ofree = vmd->vmd_free_count;
>         if (vm_pageout_lowmem() && vmd->vmd_free_count > ofree)
>                 shortage -= min(vmd->vmd_free_count - ofree,
>                     (u_int)shortage);
>         target_met = vm_pageout_scan_inactive(vmd, shortage,
>             &addl_shortage);
> } else
>         addl_shortage = 0;
>
> Raising vfs.zfs.arc_min seems to work around the problem.  But ideally that wouldn't be necessary.
>
> -Alan
>
> ^ https://svnweb.freebsd.org/base?view=revision&revision=334508

Mem: 5628M Active, 4043M Inact, 8879M = Laundry, 12G Wired, 1152M Buf, 948M Free
ARC: 8229M = Total, 1010M MFU, 6846M MRU, 26M Anon, 32M Header, 315M Other
     7350M Compressed, 9988M Uncompressed, = 1.36:1 Ratio
Swap: 2689M Total, 2337M Used, 352M = Free, 86% Inuse

Inact will keep growing, then it will exhaust all swap to the = point it's complaining (swap_pager_getswapspace(xx): failed), and never = recover until it reboots. ARC will keep shrinking and growing, but = inactive grows forever. While it hasn't hit a point it's breaking things = since the last reboot, on a bigger server (below) I can watch Inactive = slowly grow and never free until it's swapping so badly I have to = reboot.

Mem: = 9648M Active, 604G Inact, 22G Laundry, 934G Wired, 1503M Buf, 415G = Free




On May = 18, 2021, at 4:07 PM, Alan Somers <asomers@freebsd.org>= wrote:

I'm using ZFS on servers with = tons of RAM and running FreeBSD 12.2-RELEASE.  Sometimes they get = into a pathological situation where most of that RAM sits unused.  = For example, right now one of them has:

2 GB   Active
529 GB = Inactive
16 GB  Free
99 GB  ARC total
469 GB ARC = max
86 GB  ARC target

When a server gets into = this situation, it stays there for days, with the ARC target barely = budging.  All that inactive memory never gets reclaimed and put to = a good use.  Frequently the server never recovers until a = reboot.

I have a theory for what's going on.  Ever since = r334508^ the pagedaemon sends the vm_lowmem event _before_ it scans the = inactive page list.  If the ARC frees enough memory, then = vm_pageout_scan_inactive won't need to free any.  Is that order = really correct?  For reference, here's the relevant code, from = vm_pageout_worker:

shortage =3D pidctrl_daemon(&vmd->vmd_pid, = vmd->vmd_free_count);
if (shortage > 0) = {
        ofree =3D = vmd->vmd_free_count;
        if = (vm_pageout_lowmem() && vmd->vmd_free_count > ofree)
          &nb= sp;     shortage -=3D min(vmd->vmd_free_count - = ofree,
              =       (u_int)shortage);
        target_met =3D = vm_pageout_scan_inactive(vmd, shortage,
    =         &addl_shortage);
= } else
        = addl_shortage =3D 0

Raising vfs.zfs.arc_min seems to workaround the = problem.  But ideally that wouldn't be necessary.

-Alan


= --Apple-Mail=_797C748B-D663-4CAA-9C40-1EEB701A5B17--