From nobody Tue Sep 28 13:17:04 2021 X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 8A7C717CD921 for ; Tue, 28 Sep 2021 13:17:09 +0000 (UTC) (envelope-from aurelien.cazuc@stormshield.eu) Received: from work.stormshield.eu (gwlille.netasq.com [91.212.116.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4HJg5c54Ksz3DRW for ; Tue, 28 Sep 2021 13:17:08 +0000 (UTC) (envelope-from aurelien.cazuc@stormshield.eu) Received: from work.stormshield.eu (localhost.localdomain [127.0.0.1]) by work.stormshield.eu (Postfix) with ESMTPS id 8AFBD3762ACB for ; Tue, 28 Sep 2021 15:17:05 +0200 (CEST) Received: from localhost (localhost.localdomain [127.0.0.1]) by work.stormshield.eu (Postfix) with ESMTP id 638493762ADD for ; Tue, 28 Sep 2021 15:17:05 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 work.stormshield.eu 638493762ADD DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=stormshield.eu; s=BBC9ECEA-016A-11EA-9CC1-8A393C11FBBB; t=1632835025; bh=0Apu7R67X+xcMjjD2q2CLDuPmajrdf9/wDTs92pLuCI=; h=Date:From:To:Message-ID:MIME-Version; b=AJafTTyzK6O5fLaQPU7ozHjbXudIdk+xrpVTTkSyNV3q1gyOC9OrXlYYW++kmtgva qcJ8kvHPiklAdccmyZOBW/vSe127dRCX8NkZeC4lBiRG6UtQMGeYThK2K09Z67K2Ei txKf0Py6/HcDXz9Z98Eo89yMUtXuN/14Rz9B0ZrLFG6XLDkhfCIAcDvAeY0mUIaeSQ hj8JkwDJ1hpSqAzzCvyJaAF/iYmH34kWWBwdnJ1e0Z+43PSepd5+D3PYMMSbH1VsYt kihGfnNvYSx48YQNTR5at83GCH607y1bTZoYbOB42VwQEKJifgq7z0iIZ56/9XKU4w uXbBL69gopLzA== Received: from work.stormshield.eu ([127.0.0.1]) by localhost (work.stormshield.eu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id BX3hwELY_d-g for ; Tue, 28 Sep 2021 15:17:05 +0200 (CEST) Received: from work.stormshield.eu (localhost.localdomain [127.0.0.1]) by work.stormshield.eu 
(Postfix) with ESMTP id 3C9A83762ACB for ; Tue, 28 Sep 2021 15:17:05 +0200 (CEST) Date: Tue, 28 Sep 2021 15:17:04 +0200 (CEST) From: =?utf-8?Q?Aur=C3=A9lien?= CAZUC To: freebsd-hackers@FreeBSD.org Message-ID: <1166364799.1554411.1632835024070.JavaMail.zimbra@stormshield.eu> Subject: Software interrupt preemption problems List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="=_63cd6aa6-a231-4592-9326-c2ff52755868" Thread-Index: fqM1vugHlXBFf/IsMo6QDCp8+1502A== Thread-Topic: Software interrupt preemption problems X-Rspamd-Queue-Id: 4HJg5c54Ksz3DRW X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=stormshield.eu header.s=BBC9ECEA-016A-11EA-9CC1-8A393C11FBBB header.b=AJafTTyz; dmarc=none; spf=pass (mx1.freebsd.org: domain of aurelien.cazuc@stormshield.eu designates 91.212.116.1 as permitted sender) smtp.mailfrom=aurelien.cazuc@stormshield.eu X-Spamd-Result: default: False [-2.67 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; RCVD_COUNT_FIVE(0.00)[5]; R_DKIM_ALLOW(-0.20)[stormshield.eu:s=BBC9ECEA-016A-11EA-9CC1-8A393C11FBBB]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:91.212.116.1]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-hackers@freebsd.org]; TO_DN_NONE(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; NEURAL_HAM_LONG(-1.00)[-1.000]; DMARC_NA(0.00)[stormshield.eu]; DKIM_TRACE(0.00)[stormshield.eu:+]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:+,2:~]; R_MIXED_CHARSET(0.83)[subject]; ASN(0.00)[asn:49068, ipnet:91.212.116.0/24, country:FR]; RCVD_TLS_LAST(0.00)[]; MID_RHS_MATCH_FROM(0.00)[] X-ThisMailContainsUnwantedMimeParts: Y --=_63cd6aa6-a231-4592-9326-c2ff52755868 Content-Type: text/plain; charset=utf-8 
Content-Transfer-Encoding: 7bit

Hi,

We have encountered a problem with the netisr system.

Context:
The scheduler in use is ULE.
We use the netisr to poll packets on network cards: at each swi tick, we poll a given number of packets from the card (16 for the moment) and process them.
We run network traffic of 400k packets per second through the machine, and the network card holds 4096 packets in its buffer (around 10 ms of traffic).
The sysctl kern.hz is set to 8000.
The sysctl kern.sched.preemption is set to 1.
The PREEMPTION option is enabled.

Problem:
When the netisr has no work to do, the idle thread wakes up and takes over, allowing any userland process to execute.
Then a userland process takes over, but because of the priority lending mechanism, that process might have taken the netisr priority when it was running on another CPU while the netisr was running (i.e. the userland process might have sent some packets, causing concurrent lock access with the netisr).
The problem appears on the next hz tick: the netisr asks to be scheduled, but because the current process "stole" the priority, the netisr can't preempt it. Because preemption isn't reevaluated until the userland process does a sysctl or the scheduler time slice has ended, the userland process might keep the CPU for a long time (a few tens of thousands of milliseconds until the time slice ends), causing buffer exhaustion on the network card.
We added some debugging with KTR and saw that the userland process loses its priority escalation after a while (a few hz ticks), making the netisr the highest-priority thread in the run queue, but because the scheduler doesn't reevaluate preemption, the netisr doesn't preempt the userland process.

Fix proposal:
We patched kern_intr to force the scheduler to reevaluate preemption when the swi should have asked to be scheduled but is already in the run queue:

static int
intr_event_schedule_thread(struct intr_event *ie, struct trapframe *frame)
{
	struct intr_entropy entropy;
	struct intr_thread *it;
	struct thread *td;
	struct thread *ctd;

	/*
	 * If no ithread or no handlers, then we have a stray interrupt.
	 */
	if (ie == NULL || CK_SLIST_EMPTY(&ie->ie_handlers) ||
	    ie->ie_thread == NULL)
		return (EINVAL);

	ctd = curthread;
	it = ie->ie_thread;
	td = it->it_thread;

	/*
	 * If any of the handlers for this ithread claim to be good
	 * sources of entropy, then gather some.
	 */
	if (ie->ie_hflags & IH_ENTROPY) {
		entropy.event = (uintptr_t)ie;
		entropy.td = ctd;
		random_harvest_queue(&entropy, sizeof(entropy), RANDOM_INTERRUPT);
	}

	KASSERT(td->td_proc != NULL, ("ithread %s has no process", ie->ie_name));

	/*
	 * Set it_need to tell the thread to keep running if it is already
	 * running. Then, lock the thread and see if we actually need to
	 * put it on the runqueue.
	 *
	 * Use store_rel to arrange that the store to ih_need in
	 * swi_sched() is before the store to it_need and prepare for
	 * transfer of this order to loads in the ithread.
	 */
	atomic_store_rel_int(&it->it_need, 1);
	thread_lock(td);
	if (TD_AWAITING_INTR(td)) {
#ifdef HWPMC_HOOKS
		it->it_waiting = 0;
		if (PMC_HOOK_INSTALLED_ANY())
			PMC_SOFT_CALL_INTR_HLPR(schedule, frame);
#endif
		CTR3(KTR_INTR, "%s: schedule pid %d (%s)", __func__,
		    td->td_proc->p_pid, td->td_name);
		TD_CLR_IWAIT(td);
		sched_add(td, SRQ_INTR);
	} else {
#ifdef HWPMC_HOOKS
		it->it_waiting++;
		if (PMC_HOOK_INSTALLED_ANY() &&
		    (it->it_waiting >= intr_hwpmc_waiting_report_threshold))
			PMC_SOFT_CALL_INTR_HLPR(waiting, frame);
#endif
		CTR5(KTR_INTR, "%s: pid %d (%s): it_need %d, state %d",
		    __func__, td->td_proc->p_pid, td->td_name,
		    it->it_need, TD_GET_STATE(td));
+		thread_lock(ctd);
+		sched_setpreempt(td);
+		thread_unlock(ctd);
		thread_unlock(td);
	}

	return (0);
}

We would like to know whether this patch looks correct to you and, if so, whether we should open a PR.

Thanks

--=_63cd6aa6-a231-4592-9326-c2ff52755868--

From nobody Tue Sep 28 16:43:09 2021
X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org
(mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id D72E417E542F for ; Tue, 28 Sep 2021 16:43:21 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: from mail-pl1-x636.google.com (mail-pl1-x636.google.com [IPv6:2607:f8b0:4864:20::636]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HJlgY5Y9tz3vMF for ; Tue, 28 Sep 2021 16:43:21 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: by mail-pl1-x636.google.com with SMTP id j15so13159627plh.7 for ; Tue, 28 Sep 2021 09:43:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=ln0iurLoQ7Ub8ny18jGgbYkjQrxg3pdBnHiphmpsmiU=; b=A11e2BYb+Yrzx3Ld3aPxW9Zk3EMYx5KBahlumLu7jqhd3oddj1enum9kWotq8JalPD o5HOdz4pcx0xRfhAb8NuxqyfdLJgUi3Dsl6OUeQrZco3KMp+TbawW0U3dV+tsUZvd9rt RcbIPXXhjz2ojWlrk8OUXJT+qYk78QJ6e207ezba8+eFq18+C8CH5mXQ1fNgb6np4RW4 DHtzyepXPiZNsetxgazEtYlbq6KmKXYxMkZimXL/bPwUQhfKDIgO5xwhR2spkN6geHCC e6sB79H+UTUjzaRI7d0M2rh0E8XWuzk7O39Su6agYlLQIeODEC4ME97/zoq8VxznaImC d2Pw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ln0iurLoQ7Ub8ny18jGgbYkjQrxg3pdBnHiphmpsmiU=; b=qADLQpAkDaf/Iw6UEo0dimWus+qk7S29p9RdyeS1BtKca1uqZ/KysxFUwpTrmJjKwT UfEgD4j2bytc/sDvgU3fTtUL6ahbIJ09WBDwwnZWISWmNx5EbQqOaJKVALIUOXB6YfEt kssMEpqQqijGI081QzUh1Yh1Gjf2zrm2ekG19KdMAR4KhYAuiJHVUoiVD61kmZVneAja JvHMNfmvK+D2p0ou8tD9vVhg4FnGx0tBjji1zcr0nZ9xY0ETZ8XEdgyhCjCZ2gJsfYyK aP6LSqN4B+eNEZhmkbyuAqmm+y8hFdyLz6qhTur4xKFLHDL5LFvZfXAbcCxeimAnIefB aiWA== X-Gm-Message-State: 
	AOAM531K9ZHyPYtTO5jC7E8wWbMXrW1C9ZjbIqD4PELsQlwUKrvB9fuK
	A3cFqVk+40HD5RXmRSBcsOb92GwiA4ijJ1MOCYU=
X-Google-Smtp-Source: ABdhPJzrTJaB6Q3uPdLVylnbRvPBk5QLDU3gNrw00BmWzXoGLW3dtmH/3ezQnMweSov+56k8kvD714x53ruAqpgWZHE=
X-Received: by 2002:a17:902:c406:b0:13b:7b40:9c51 with SMTP id
	k6-20020a170902c40600b0013b7b409c51mr5907634plk.89.1632847400701;
	Tue, 28 Sep 2021 09:43:20 -0700 (PDT)
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
Sender: owner-freebsd-hackers@freebsd.org
MIME-Version: 1.0
References: <1166364799.1554411.1632835024070.JavaMail.zimbra@stormshield.eu>
In-Reply-To: <1166364799.1554411.1632835024070.JavaMail.zimbra@stormshield.eu>
From: Ryan Stone
Date: Tue, 28 Sep 2021 12:43:09 -0400
Message-ID:
Subject: Re: Software interrupt preemption problems
To: =?UTF-8?Q?Aur=C3=A9lien_CAZUC?=
Cc: FreeBSD Hackers
Content-Type: text/plain; charset="UTF-8"
X-Rspamd-Queue-Id: 4HJlgY5Y9tz3vMF
X-Spamd-Bar: ----
Authentication-Results: mx1.freebsd.org; none
X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]
X-ThisMailContainsUnwantedMimeParts: N

This reminds me of a patch I've had in my queue for quite a long time.
Can you confirm whether this patch resolves the issue for you:
https://people.freebsd.org/~rstone/patches/ule_unlend.diff

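As a side note on the figures quoted in the original report (a 4096-packet ring, 400k packets per second, kern.hz set to 8000), the arithmetic can be sketched as below. This is only a back-of-the-envelope model: the function names are hypothetical, and it assumes a constant arrival rate with nothing draining the ring.

```c
#include <assert.h>

/*
 * Milliseconds of traffic that a receive ring of `slots` packets can
 * absorb at `pps` packets per second, assuming nothing drains it.
 * (Hypothetical helper, not a FreeBSD API.)
 */
static double
ring_depth_ms(unsigned slots, unsigned pps)
{
	assert(pps > 0);
	return (1000.0 * slots / pps);
}

/*
 * The same window expressed in scheduler ticks for a given kern.hz.
 */
static double
ring_depth_ticks(unsigned slots, unsigned pps, unsigned hz)
{
	return (ring_depth_ms(slots, pps) * hz / 1000.0);
}
```

With the reported values, ring_depth_ms(4096, 400000) gives 10.24 ms, which matches the "around 10 ms" in the report, and ring_depth_ticks(4096, 400000, 8000) gives roughly 82 ticks. Under these assumptions, a polling thread that stays off the CPU for longer than that window would be expected to see the ring overflow.
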
From nobody Thu Sep 30 12:47:21 2021 X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 7AC121758046 for ; Thu, 30 Sep 2021 12:47:31 +0000 (UTC) (envelope-from aurelien.cazuc@stormshield.eu) Received: from work.stormshield.eu (gwlille.netasq.com [91.212.116.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4HKtLV4VbQz3Nk2 for ; Thu, 30 Sep 2021 12:47:30 +0000 (UTC) (envelope-from aurelien.cazuc@stormshield.eu) Received: from work.stormshield.eu (localhost.localdomain [127.0.0.1]) by work.stormshield.eu (Postfix) with ESMTPS id B29233761015; Thu, 30 Sep 2021 14:47:22 +0200 (CEST) Received: from localhost (localhost.localdomain [127.0.0.1]) by work.stormshield.eu (Postfix) with ESMTP id 913713762398; Thu, 30 Sep 2021 14:47:22 +0200 (CEST) DKIM-Filter: OpenDKIM Filter v2.10.3 work.stormshield.eu 913713762398 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=stormshield.eu; s=BBC9ECEA-016A-11EA-9CC1-8A393C11FBBB; t=1633006042; bh=hEpPajRN/ABbqM0gWF43m1owtsGZpvy/DamrU5noJqA=; h=Date:From:To:Message-ID:MIME-Version; b=orWtW7uVCoE7I/iJ5YxQWYJ+AwpJRTKNzo8vOxMn4lZqWWk4FDwW6oprVxRF5wsFj HabZaS+Ztixa7oECEng/AApNfY+p1OwewqmY4MVaHTWfPSlxT7RQE5lu7EAhGnXga1 cGfR5AvK0s2jywq8E8I4seGiBNmZp0RDDQjEQsQPDMv/6RqEnyr342NeYp9kjaDmVr 2Mm4wQiwh6tUcRgA3s8a37HUPzzDykJCPbTEqy5Vam5UjNwnrRXC5lb/GijA3gJdBA Exd49U5Pj0qqeYqfCSQmE9MbNKGCSCS4adZ5bY34cvzOxZv5GfZZIseZdsXueCIHgK JHv9ETkMEGlNg== Received: from work.stormshield.eu ([127.0.0.1]) by localhost (work.stormshield.eu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 9NZ04EaFOTpi; Thu, 30 Sep 2021 14:47:22 +0200 (CEST) Received: from work.stormshield.eu (localhost.localdomain [127.0.0.1]) by work.stormshield.eu (Postfix) with ESMTP 
id 707E63761015; Thu, 30 Sep 2021 14:47:22 +0200 (CEST) Date: Thu, 30 Sep 2021 14:47:21 +0200 (CEST) From: =?utf-8?Q?Aur=C3=A9lien?= CAZUC To: Ryan Stone Cc: freebsd-hackers Message-ID: <851080363.2371279.1633006041273.JavaMail.zimbra@stormshield.eu> In-Reply-To: References: <1166364799.1554411.1632835024070.JavaMail.zimbra@stormshield.eu> Subject: Re: Software interrupt preemption problems List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="=_ba0f8bea-c020-4be6-9af3-594802caa0c6" Thread-Topic: Software interrupt preemption problems Thread-Index: iuXmYFlylv1i20mGe2yyHyKVX1WhTA== X-Rspamd-Queue-Id: 4HKtLV4VbQz3Nk2 X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=stormshield.eu header.s=BBC9ECEA-016A-11EA-9CC1-8A393C11FBBB header.b=orWtW7uV; dmarc=none; spf=pass (mx1.freebsd.org: domain of aurelien.cazuc@stormshield.eu designates 91.212.116.1 as permitted sender) smtp.mailfrom=aurelien.cazuc@stormshield.eu X-Spamd-Result: default: False [-2.79 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; RCVD_COUNT_FIVE(0.00)[5]; R_DKIM_ALLOW(-0.20)[stormshield.eu:s=BBC9ECEA-016A-11EA-9CC1-8A393C11FBBB]; FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:91.212.116.1]; NEURAL_HAM_LONG(-1.00)[-1.000]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; DMARC_NA(0.00)[stormshield.eu]; TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[stormshield.eu:+]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FREEMAIL_TO(0.00)[gmail.com]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:+,2:~]; R_MIXED_CHARSET(0.71)[subject]; ASN(0.00)[asn:49068, ipnet:91.212.116.0/24, country:FR]; RCVD_TLS_LAST(0.00)[]; MID_RHS_MATCH_FROM(0.00)[] X-ThisMailContainsUnwantedMimeParts: Y --=_ba0f8bea-c020-4be6-9af3-594802caa0c6 
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

I've done a few tests, it looks great!
Is there any reason not to commit it?

From: "Ryan Stone"
To: "Aurélien CAZUC"
Cc: "freebsd-hackers"
Sent: Tuesday, September 28, 2021 6:43:09 PM
Subject: Re: Software interrupt preemption problems

This reminds me of a patch I've had in my queue for quite a long
time. Can you confirm whether this patch resolves the issue for you:
https://people.freebsd.org/~rstone/patches/ule_unlend.diff

--=_ba0f8bea-c020-4be6-9af3-594802caa0c6--

From nobody Thu Sep 30 13:01:07 2021
X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 7BF85175A75E
	for ; Thu, 30 Sep 2021 13:01:11 +0000 (UTC)
	(envelope-from manu@bidouilliste.com)
Received: from mx.blih.net (mail.blih.net [212.83.155.74])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
	client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mx.blih.net", Issuer "R3" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4HKtfG0hhfz3R0y;
	Thu, 30 Sep 2021 13:01:09 +0000 (UTC)
	(envelope-from manu@bidouilliste.com)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bidouilliste.com;
	s=mx; t=1633006868;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	content-transfer-encoding:content-transfer-encoding:
	in-reply-to:in-reply-to:references:references;
	bh=TpMxlKtIMQ9BJwFjCH5LjM5PwiU2KyVvD+Rw8TQ1HKQ=;
	b=Qnbe8vsZJ6ieh+LZjvMtM4uh9OWCBMw9qr58iBzcnTe9dXmWgNFR8S4XAHOSQQRLMF5p+B
	6dMmQSniCrlIHGGIgRH8/2YZMmfkgYTP3b8dQibuI49Yn+BUCrqMVpKZ0stizeHeZzsWIA
	ptAsT5QtNJER6ORNeUjrp/3WRgBu82U=
Received: from amy (lfbn-idf2-1-644-191.w86-247.abo.wanadoo.fr [86.247.100.191]) by
mx.blih.net (OpenSMTPD) with ESMTPSA id 26090024 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 30 Sep 2021 13:01:08 +0000 (UTC) Date: Thu, 30 Sep 2021 15:01:07 +0200 From: Emmanuel Vadot To: Ed Maste Cc: FreeBSD Hackers Subject: Re: Heads-up: importing libcbor and libfido2 into the base system Message-Id: <20210930150107.fa784d3d6d465c458bdd3d0f@bidouilliste.com> In-Reply-To: References: X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.33; amd64-portbld-freebsd14.0) List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 4HKtfG0hhfz3R0y X-Spamd-Bar: --- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=bidouilliste.com header.s=mx header.b=Qnbe8vsZ; dmarc=pass (policy=none) header.from=bidouilliste.com; spf=pass (mx1.freebsd.org: domain of manu@bidouilliste.com designates 212.83.155.74 as permitted sender) smtp.mailfrom=manu@bidouilliste.com X-Spamd-Result: default: False [-3.50 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[bidouilliste.com:s=mx]; FREEFALL_USER(0.00)[manu]; FROM_HAS_DN(0.00)[]; MV_CASE(0.50)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain]; R_SPF_ALLOW(-0.20)[+ip4:212.83.155.74/32]; NEURAL_HAM_LONG(-1.00)[-1.000]; MID_RHS_MATCH_FROM(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[bidouilliste.com:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[bidouilliste.com,none]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:12876, ipnet:212.83.128.0/19, country:FR]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_ALL(0.00)[] X-ThisMailContainsUnwantedMimeParts: N On Sat, 18 Sep 2021 11:59:30 -0400 Ed Maste wrote: > To enable FIDO/U2F support in OpenSSH I intend to import two > 
dependencies into the base system:
>
> Name: libcbor
> URL: https://github.com/PJK/libcbor
> License: MIT
>
> Name: libfido2
> URL: https://github.com/Yubico/libfido2
> License: BSD-2-Clause
>
> I currently expect to make them PRIVATELIBs. This means they will be
> available for use only by the base system, and the import will have no
> impact on the ports tree.

Plan looks good.
To have something that works out of the box we will need some devd
config file like we have in the security/u2f-devd port.
Then it's just a matter of adding the user to the u2f group to be able
to use fido keys with ssh (or firefox).

--
Emmanuel Vadot

From nobody Thu Sep 30 21:11:02 2021
X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id D93BE17DF7ED
	for ; Thu, 30 Sep 2021 21:11:15 +0000 (UTC)
	(envelope-from asomers@gmail.com)
Received: from mail-ot1-f54.google.com (mail-ot1-f54.google.com [209.85.210.54])
	(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
	key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
	client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4HL5Wk4HJNz4sFp
	for ; Thu, 30 Sep 2021 21:11:14 +0000 (UTC)
	(envelope-from asomers@gmail.com)
Received: by mail-ot1-f54.google.com with SMTP id x33-20020a9d37a4000000b0054733a85462so8999768otb.10
	for ; Thu, 30 Sep 2021 14:11:14 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20210112;
	h=x-gm-message-state:mime-version:from:date:message-id:subject:to;
	bh=vmi1mhRmt1Mab08jZ/FpUt0srQ85iAJ6ssg4UQbpXLU=;
	b=SxffaNsPzxpdqTGDTMBNA6mQ4W/2EhS24m49RJBA475N6uGZfTNgxYGDqO/Nx+S8sh
	6Ak8WykLgom7PSCKf0tEXJIwXE4gcuiw2MmguG5lhrn4cZRXbhpRpupkfmMrSUk8rzdn
	acoBshf3qa8WAJLGZF1VnCPnvoL3T/rOgm4599EBASlxUT5YdI1We1f/EJ673+vew4qN
YoCo1oaFiBV3omoXCqYpm1GghpcrQ2h2FMdthCbgNduZPNV8Q6EzpGPtFlcKTcG7tgvH iTWsqRFCvevqJpQR7iE/3DUCyaHQneTXlaC4q9PWcQNKq1ss9PbSWV9sozWlyjmkvVwI b/qg== X-Gm-Message-State: AOAM533a2XYj1DsrrUa82Uil+ublhHUaabtenrlQ9ZQNfNxB9m7otegq xyobh8Se9sHV356xTFPNeqrQf7lXPPlllKC9dU8S+vM2A5c= X-Google-Smtp-Source: ABdhPJwgfsy8Y0nDQLIrJPHKeDcMG9RM7kGpIjpC0ICHngRrIwQhMf4we7vEiIfZwC0xFfvWGJcAiVsMB3JBokIBvBg= X-Received: by 2002:a05:6830:2783:: with SMTP id x3mr7067940otu.371.1633036273449; Thu, 30 Sep 2021 14:11:13 -0700 (PDT) List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 From: Alan Somers Date: Thu, 30 Sep 2021 15:11:02 -0600 Message-ID: Subject: livelock in vfs_bio_getpages with vn_io_fault_uiomove To: FreeBSD Hackers Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 4HL5Wk4HJNz4sFp X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none; spf=pass (mx1.freebsd.org: domain of asomers@gmail.com designates 209.85.210.54 as permitted sender) smtp.mailfrom=asomers@gmail.com X-Spamd-Result: default: False [-2.93 / 15.00]; RCVD_TLS_ALL(0.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com]; FREEFALL_USER(0.00)[asomers]; FROM_HAS_DN(0.00)[]; RWL_MAILSPIKE_GOOD(0.00)[209.85.210.54:from]; TO_MATCH_ENVRCPT_ALL(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:209.85.128.0/17]; MIME_GOOD(-0.10)[text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-hackers@freebsd.org]; ARC_NA(0.00)[]; RCPT_COUNT_ONE(0.00)[1]; NEURAL_HAM_LONG(-1.00)[-1.000]; DMARC_NA(0.00)[freebsd.org]; TO_DN_ALL(0.00)[]; NEURAL_HAM_SHORT(-0.93)[-0.932]; RCVD_IN_DNSWL_NONE(0.00)[209.85.210.54:from]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; FORGED_SENDER(0.30)[asomers@freebsd.org,asomers@gmail.com]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; ASN(0.00)[asn:15169, ipnet:209.85.128.0/17, country:US]; 
	FROM_NEQ_ENVFROM(0.00)[asomers@freebsd.org,asomers@gmail.com];
	TO_DOM_EQ_FROM_DOM(0.00)[]
X-ThisMailContainsUnwantedMimeParts: N

I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the
deadlock described in the comments above vn_io_fault_doio [^1]. I can
reproduce the deadlock readily enough, and the fix seems simple enough
in other filesystems [^2][^3]. But when I try to apply the same fix
to fusefs, the deadlock changes into a livelock. vfs_bio_getpages
loops infinitely because it reaches the "redo = true" state. But on
looping, it never attempts to read from fusefs again. Instead,
breadn_flags returns 0 without ever calling bufstrategy, from which I
infer that getblkx returned a block with B_CACHE set. Despite that,
at least one of the requested pages in vfs_bio_getpages fails the
vm_page_all_valid(ma[i]) check. Debugging further is wandering
outside my areas of expertise. Could somebody please give me a tip?
What is supposed to mark those pages as valid? Are there any other
undocumented conditions needed to use vn_io_fault_uiomove that msdosfs
and nfscl just happened to already meet?

Grateful for any help,
-Alan

[^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340
[^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2
[^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b

From nobody Fri Oct 1 03:07:47 2021
X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 5A29117D053F
	for ; Fri, 1 Oct 2021 03:08:02 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(Client did not present a certificate)
	by mx1.freebsd.org (Postfix) with ESMTPS id 4HLFRQ0KZzz3H4q;
	Fri, 1 Oct 2021 03:08:01 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from tom.home (kib@localhost [127.0.0.1])
	by kib.kiev.ua (8.16.1/8.16.1) with ESMTPS id 19137lJ8091480
	(version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO);
	Fri, 1 Oct 2021 06:07:50 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua 19137lJ8091480
Received: (from kostik@localhost)
	by tom.home (8.16.1/8.16.1/Submit) id 19137l69091479;
	Fri, 1 Oct 2021 06:07:47 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f
Date: Fri, 1 Oct 2021 06:07:47 +0300
From: Konstantin Belousov
To: Alan Somers
Cc: FreeBSD Hackers
Subject: Re: livelock in vfs_bio_getpages with vn_io_fault_uiomove
Message-ID:
References:
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
Sender: owner-freebsd-hackers@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
	DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM,
	NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.5
X-Spam-Checker-Version: SpamAssassin 3.4.5 (2021-03-20) on tom.home
X-Rspamd-Queue-Id: 4HLFRQ0KZzz3H4q
X-Spamd-Bar: ----
Authentication-Results: mx1.freebsd.org; none
X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]
X-ThisMailContainsUnwantedMimeParts: N

On Thu, Sep 30, 2021 at 03:11:02PM -0600, Alan Somers wrote:
> I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the
> deadlock described in the comments above vn_io_fault_doio [^1]. I can
> reproduce the deadlock readily enough, and the fix seems simple enough
> in other filesystems [^2][^3]. But when I try to apply the same fix
> to fusefs, the deadlock changes into a livelock. vfs_bio_getpages
> loops infinitely because it reaches the "redo = true" state. But on
> looping, it never attempts to read from fusefs again. Instead,
> breadn_flags returns 0 without ever calling bufstrategy, from which I
> infer that getblkx returned a block with B_CACHE set. Despite that,
> at least one of the requested pages in vfs_bio_getpages fails the
> vm_page_all_valid(ma[i]) check. Debugging further is wandering
> outside my areas of expertise. Could somebody please give me a tip?
> What is supposed to mark those pages as valid? Are there any other
> undocumented conditions needed to use vn_io_fault_uiomove that msdosfs
> and nfscl just happened to already meet?
>
> Grateful for any help,
> -Alan
>
> [^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340
> [^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2
> [^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b

Perhaps first you need to confirm that vfs_bio_getpages() sees a buffer
with B_CACHE set but some pages still not fully valid (except the last
page, which is allowed to have an invalid tail at EOF).

When the buffer strategy read returns, vfs_vmio_iodone() updates the pages'
validity bitmap according to the b_resid field of the buffer. Look there,
maybe you did not set it properly. Practically, both b_resid and b_bcount
must be correctly set after I/O.

In fact, after writing out all that, I realized that I am confused
by your question. vn_io_fault_uiomove() needs to be used from
VOP_READ/VOP_WRITE. If the filesystem uses the buffer cache, then there is
a VOP_STRATEGY() implementation that fulfills the buffer cache requests
for buffer reads and writes. VOP_READ and VOP_STRATEGY simply occur
at very different layers of the I/O stack. Typically, VOP_READ() does
bread(), which might trigger VOP_STRATEGY() to get the buffer, and then
it performs vn_io_fault() to move data from the locked buffer to userspace.

The fact that your addition of vn_io_fault breaks something in VOP_STRATEGY()
does not make sense.

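To make the b_resid remark above concrete, here is a toy user-space model in C. It is not the kernel's vfs_vmio_iodone() logic, which is more involved. The struct, the function name, and the 4096-byte page size are all hypothetical, and the model assumes that the bytes actually transferred equal b_bcount minus b_resid and that a page counts as fully valid only when every byte of it was transferred.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size, for the model only */

/* Hypothetical stand-in for the two struct buf fields discussed above. */
struct model_buf {
	size_t b_bcount;	/* bytes requested by the read */
	size_t b_resid;		/* bytes NOT transferred when I/O finished */
};

/*
 * Number of leading pages entirely covered by the transferred bytes.
 * Pages beyond this count would stay (at least partly) invalid in
 * this simplified model.
 */
static size_t
model_fully_valid_pages(const struct model_buf *bp)
{
	assert(bp->b_resid <= bp->b_bcount);
	return ((bp->b_bcount - bp->b_resid) / MODEL_PAGE_SIZE);
}
```

In this model, a 16384-byte read that completes with b_resid left at 0 covers four pages, while the same read with b_resid mistakenly left equal to b_bcount covers none. That is one way a buffer could look finished while its pages still fail a validity check, which is the kind of mismatch worth ruling out first.
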
From nobody Fri Oct 1 03:33:24 2021 X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id C3A5617D3196 for ; Fri, 1 Oct 2021 03:33:36 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-ot1-f48.google.com (mail-ot1-f48.google.com [209.85.210.48]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HLG0w543Dz3KGW for ; Fri, 1 Oct 2021 03:33:36 +0000 (UTC) (envelope-from asomers@gmail.com) Received: by mail-ot1-f48.google.com with SMTP id e66-20020a9d2ac8000000b0054da8bdf2aeso7709620otb.12 for ; Thu, 30 Sep 2021 20:33:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=/O/MC23dySj5feOmuRT6ARQ24TPKIKnvvpoO0hvnmCg=; b=ChzDoo9sNB/PoX5/JQ0Drjj498/SX2bcqgjA2IghqJsaIC34ndr57yPuj2Xy2UwizJ uZSrsRAuXLTuc6jne/nNYNNyb2VARjsaoWWu1uO7Abhi2KRmrAf3C6SN4xlsBaDys6I+ pAg5qxFoYnJwf7T5K1iuPfxkuI5puHrerdoC5EfxG8rOk7ONzWF9UdgeBTAWoZbNkHtI G0cRkzkUu0OET0qpoaCiLwy4GG3XV0BumBdpVxPYae69nf3pbjWr2JHswMTymGB6WTJS dIwBzZoB7gbN3djsgsC3bWhXexu/exaUrvNnDgFw21g4qSJHtASnKU7EiMSlQzuguDOr iSmg== X-Gm-Message-State: AOAM532CHhCQA/nuRuRr0Bs5O3nKl3JfAZsgjgw/TiytehSxScnCyqJF GySy78eyYIDfWzU38RnH77f+1iybyDo/A8a7NWg= X-Google-Smtp-Source: ABdhPJyNXRXczyLkICb7zLEs/vTWRa3EX9bE6lSF5T0i2GtRwvWT2rCJb0vTjXg8SIGeZcktcqTWCQImMHzTrh3mXRw= X-Received: by 2002:a05:6830:1509:: with SMTP id k9mr8308457otp.111.1633059215894; Thu, 30 Sep 2021 20:33:35 -0700 (PDT) List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: 
List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 References: In-Reply-To: From: Alan Somers Date: Thu, 30 Sep 2021 21:33:24 -0600 Message-ID: Subject: Re: livelock in vfs_bio_getpages with vn_io_fault_uiomove To: Konstantin Belousov Cc: FreeBSD Hackers Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 4HLG0w543Dz3KGW X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Thu, Sep 30, 2021 at 9:08 PM Konstantin Belousov wrote: > > On Thu, Sep 30, 2021 at 03:11:02PM -0600, Alan Somers wrote: > > I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the > > deadlock described in the comments above vn_io_fault_doio [^1]. I can > > reproduce the deadlock readily enough, and the fix seems simple enough > > in other filesystems [^2][^3]. But when I try to apply the same fix > > to fusefs, the deadlock changes into a livelock. vfs_bio_getpages > > loops infinitely because it reaches the "redo = true" state. But on > > looping, it never attempts to read from fusefs again. Instead, > > breadn_flags returns 0 without ever calling bufstrategy, from which I > > infer that getblkx returned a block with B_CACHE set. Despite that, > > at least one of the requested pages in vfs_bio_getpages fails the > > vm_page_all_valid(ma[i]) check. Debugging further is wandering > > outside my areas of expertise. Could somebody please give me a tip? > > What is supposed to mark those pages as valid? Are there any other > > undocumented conditions needed to use vn_io_fault_uiomove that msdosfs > > and nfscl just happened to already meet? 
> > > > Grateful for any help, > > -Alan > > > > [^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340 > > [^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2 > > [^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b > > Perhaps first you need to confirm that vfs_bio_getpages() sees a buffer > with B_CACHE set but some pages still not fully valid (except the last > page that is allowed to have invalid tail at EOF). > > When buffer strategy read returns, vfs_vmio_iodone() updates the pages > validity bitmap according to the b_resid field of the buffer. Look there, > might be you did not set it properly. Practically, both b_resid and b_bcount > must be correctly set after io. > > In fact, after writing out all that, I realized that I am confused > by your question. vn_io_fault_uiomove() needs to be used from > VOP_READ/VOP_WRITE. If filesystem utilizes buffer cache, then there is > a VOP_STRATEGY() implementation that fullfils the buffer cache requests > for buffers reads and writes. VOP_READ and VOP_STRATEGY simply occurs > at very different layers of the io stack. Typically, VOP_READ() does > bread() which might trigger VOP_STRATEGY() to get the buffer, and then > it performs vn_io_fault() to move data from locked buffer to userspace. > > The fact that your addition of vn_io_fault breaks something in VOP_STRATEGY() > does not make sense. Ahh, that last piece of information is useful. In fusefs, both VOP_READ and VOP_STRATEGY can end up going through the same path, depending on cache settings, O_DIRECT, etc. And during a VOP_STRATEGY read, fusefs needs to use uiomove in order to move data not into the user's buffer, but from the fuse daemon into the kernel. But that's not the cause of the livelock, because whether I use uiomove or vn_io_fault_uiomove during VOP_STRATEGY I still get the same livelock. I'll check b_bcount and b_resid tomorrow. 
-Alan From nobody Fri Oct 1 03:53:18 2021 X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id C9AC617D5479 for ; Fri, 1 Oct 2021 03:53:26 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4HLGRp3s9yz3MgF; Fri, 1 Oct 2021 03:53:26 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.16.1/8.16.1) with ESMTPS id 1913rJNh002743 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Fri, 1 Oct 2021 06:53:22 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua 1913rJNh002743 Received: (from kostik@localhost) by tom.home (8.16.1/8.16.1/Submit) id 1913rJhg002742; Fri, 1 Oct 2021 06:53:19 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 1 Oct 2021 06:53:18 +0300 From: Konstantin Belousov To: Alan Somers Cc: FreeBSD Hackers Subject: Re: livelock in vfs_bio_getpages with vn_io_fault_uiomove Message-ID: References: List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.5 X-Spam-Checker-Version: SpamAssassin 
3.4.5 (2021-03-20) on tom.home X-Rspamd-Queue-Id: 4HLGRp3s9yz3MgF X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Thu, Sep 30, 2021 at 09:33:24PM -0600, Alan Somers wrote: > On Thu, Sep 30, 2021 at 9:08 PM Konstantin Belousov wrote: > > > > On Thu, Sep 30, 2021 at 03:11:02PM -0600, Alan Somers wrote: > > > I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the > > > deadlock described in the comments above vn_io_fault_doio [^1]. I can > > > reproduce the deadlock readily enough, and the fix seems simple enough > > > in other filesystems [^2][^3]. But when I try to apply the same fix > > > to fusefs, the deadlock changes into a livelock. vfs_bio_getpages > > > loops infinitely because it reaches the "redo = true" state. But on > > > looping, it never attempts to read from fusefs again. Instead, > > > breadn_flags returns 0 without ever calling bufstrategy, from which I > > > infer that getblkx returned a block with B_CACHE set. Despite that, > > > at least one of the requested pages in vfs_bio_getpages fails the > > > vm_page_all_valid(ma[i]) check. Debugging further is wandering > > > outside my areas of expertise. Could somebody please give me a tip? > > > What is supposed to mark those pages as valid? Are there any other > > > undocumented conditions needed to use vn_io_fault_uiomove that msdosfs > > > and nfscl just happened to already meet? 
> > > > > > Grateful for any help, > > > -Alan > > > > > > [^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340 > > > [^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2 > > > [^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b > > > > Perhaps first you need to confirm that vfs_bio_getpages() sees a buffer > > with B_CACHE set but some pages still not fully valid (except the last > > page that is allowed to have invalid tail at EOF). > > > > When buffer strategy read returns, vfs_vmio_iodone() updates the pages > > validity bitmap according to the b_resid field of the buffer. Look there, > > might be you did not set it properly. Practically, both b_resid and b_bcount > > must be correctly set after io. > > > > In fact, after writing out all that, I realized that I am confused > > by your question. vn_io_fault_uiomove() needs to be used from > > VOP_READ/VOP_WRITE. If filesystem utilizes buffer cache, then there is > > a VOP_STRATEGY() implementation that fullfils the buffer cache requests > > for buffers reads and writes. VOP_READ and VOP_STRATEGY simply occurs > > at very different layers of the io stack. Typically, VOP_READ() does > > bread() which might trigger VOP_STRATEGY() to get the buffer, and then > > it performs vn_io_fault() to move data from locked buffer to userspace. > > > > The fact that your addition of vn_io_fault breaks something in VOP_STRATEGY() > > does not make sense. > > Ahh, that last piece of information is useful. In fusefs, both > VOP_READ and VOP_STRATEGY can end up going through the same path, > depending on cache settings, O_DIRECT, etc. And during a VOP_STRATEGY > read, fusefs needs to use uiomove in order to move data not into the > user's buffer, but from the fuse daemon into the kernel. But that's > not the cause of the livelock, because whether I use uiomove or > vn_io_fault_uiomove during VOP_STRATEGY I still get the same livelock. 
> I'll check b_bcount and b_resid tomorrow.

But the uiomove call from VOP_STRATEGY() that copies user data into the kernel buffer is not prepared for vn_io_fault. I suspect that what happens there is the following:
- The top level of the syscall, like read(2), calls vn_read().
- vn_read() checks conditions and goes through vn_io_fault, which prepares prefaulted held pages _for the userspace buffer from read(2)_ and calls VOP_READ().
- Your VOP_READ() calls into VOP_STRATEGY(), which tries to copy data into the kernel. If you use vn_io_fault_uiomove() at this point, it wrongly consumes the held pages, which belong to an unrelated userspace buffer.

Even if there is some additional bug with b_resid, I suspect that the end result is a mess anyway. You should not use vn_io_fault for recursive accesses to userspace, only for VOP_READ/VOP_WRITE level accesses.

From nobody Fri Oct 1 03:59:11 2021 X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id BE51917D691F for ; Fri, 1 Oct 2021 03:59:29 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-ot1-f53.google.com (mail-ot1-f53.google.com [209.85.210.53]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HLGZn4zqqz3Nc1 for ; Fri, 1 Oct 2021 03:59:29 +0000 (UTC) (envelope-from asomers@gmail.com) Received: by mail-ot1-f53.google.com with SMTP id 5-20020a9d0685000000b0054706d7b8e5so9988263otx.3 for ; Thu, 30 Sep 2021 20:59:29 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=7uCSxBuBCFbkgHrEYg+g6dNlO5KJRZKdxFGvm4NWbHw=;
b=WwjhoZTBT0eE3+Y8OCH3tXPpgdIH48hSF3GvCulNF+C0rnyQYpN7klIKWFMgWOQ8py RIqUhV1JkP1+xVF5e4mJYHDdm0qkeFnRKhDjM6m0rKJ78pfDHYEHnWtGgY0mYXW6uQjv gLOKJlhv6j7LlTIWgq3ww65r5aumeqGsXi6ReXMo0jEyMRbRceCZMJr75hPQbYNyFfxG pJilwf0d/ort1VeJ6itbHJRtqJL0YDKxrLshzrKPH02SAsvn4+FgylqfTQCbqaloakcc KYVzxbrEWhIqQ7SPDQj98VWRh7x/kHoweTlkrUzOK7UpQFKhVp2jRYPStIZZ+TrKQxLT iYjg== X-Gm-Message-State: AOAM533s1Js1h3+jCX6b98AmqV0Ha1z014jaOLLzJGkd3r7cAQtORVPb uuVmXVGYSPnD0zfrXo5rU7FgKnXTVu3DGK53b/YygoX9 X-Google-Smtp-Source: ABdhPJyJ6Oo3oeA4KFYHiP600F+JADfL1+wZ0FJVX0rDrGEhOmHDVZR9//EhTMMS0PILHKPg8aD4RvrkKfUT3qYzH4o= X-Received: by 2002:a9d:6c91:: with SMTP id c17mr8469643otr.114.1633060762729; Thu, 30 Sep 2021 20:59:22 -0700 (PDT) List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 References: In-Reply-To: From: Alan Somers Date: Thu, 30 Sep 2021 21:59:11 -0600 Message-ID: Subject: Re: livelock in vfs_bio_getpages with vn_io_fault_uiomove To: Konstantin Belousov Cc: FreeBSD Hackers Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 4HLGZn4zqqz3Nc1 X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Thu, Sep 30, 2021 at 9:53 PM Konstantin Belousov wrote: > > On Thu, Sep 30, 2021 at 09:33:24PM -0600, Alan Somers wrote: > > On Thu, Sep 30, 2021 at 9:08 PM Konstantin Belousov wrote: > > > > > > On Thu, Sep 30, 2021 at 03:11:02PM -0600, Alan Somers wrote: > > > > I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the > > > > deadlock described in the comments above vn_io_fault_doio [^1]. I can > > > > reproduce the deadlock readily enough, and the fix seems simple enough > > > > in other filesystems [^2][^3]. 
But when I try to apply the same fix > > > > to fusefs, the deadlock changes into a livelock. vfs_bio_getpages > > > > loops infinitely because it reaches the "redo = true" state. But on > > > > looping, it never attempts to read from fusefs again. Instead, > > > > breadn_flags returns 0 without ever calling bufstrategy, from which I > > > > infer that getblkx returned a block with B_CACHE set. Despite that, > > > > at least one of the requested pages in vfs_bio_getpages fails the > > > > vm_page_all_valid(ma[i]) check. Debugging further is wandering > > > > outside my areas of expertise. Could somebody please give me a tip? > > > > What is supposed to mark those pages as valid? Are there any other > > > > undocumented conditions needed to use vn_io_fault_uiomove that msdosfs > > > > and nfscl just happened to already meet? > > > > > > > > Grateful for any help, > > > > -Alan > > > > > > > > [^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340 > > > > [^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2 > > > > [^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b > > > > > > Perhaps first you need to confirm that vfs_bio_getpages() sees a buffer > > > with B_CACHE set but some pages still not fully valid (except the last > > > page that is allowed to have invalid tail at EOF). > > > > > > When buffer strategy read returns, vfs_vmio_iodone() updates the pages > > > validity bitmap according to the b_resid field of the buffer. Look there, > > > might be you did not set it properly. Practically, both b_resid and b_bcount > > > must be correctly set after io. > > > > > > In fact, after writing out all that, I realized that I am confused > > > by your question. vn_io_fault_uiomove() needs to be used from > > > VOP_READ/VOP_WRITE. 
If filesystem utilizes buffer cache, then there is > > > a VOP_STRATEGY() implementation that fullfils the buffer cache requests > > > for buffers reads and writes. VOP_READ and VOP_STRATEGY simply occurs > > > at very different layers of the io stack. Typically, VOP_READ() does > > > bread() which might trigger VOP_STRATEGY() to get the buffer, and then > > > it performs vn_io_fault() to move data from locked buffer to userspace. > > > > > > The fact that your addition of vn_io_fault breaks something in VOP_STRATEGY() > > > does not make sense. > > > > Ahh, that last piece of information is useful. In fusefs, both > > VOP_READ and VOP_STRATEGY can end up going through the same path, > > depending on cache settings, O_DIRECT, etc. And during a VOP_STRATEGY > > read, fusefs needs to use uiomove in order to move data not into the > > user's buffer, but from the fuse daemon into the kernel. But that's > > not the cause of the livelock, because whether I use uiomove or > > vn_io_fault_uiomove during VOP_STRATEGY I still get the same livelock. > > I'll check b_bcount and b_resid tomorrow. > > But uiomove call from VOP_STRATEGY() to copy user data into kernel buffer > is not prepared for vn_io_fault. I suspect that what happens there is > the following: > - top level of syscall, like read(2), does vn_read() > - vn_read() checks conditions and goes through vn_io_fault, calling VOP_READ() > there it prepares prefaulted held pages _for userspace buffer from read(2)_ > - your VOP_READ() calls into VOP_STRATEGY() that tries to copy data into > kernel. If you use vn_io_fault_uiomove() at this point, it wrongly > consumes held pages for unrelated userspace buffer > > Even if there is some additional bug with b_resid, I suspect that the > end result is the mess anyway. You should not use vn_io_fault for recursive > accesses to userspace, only for VOP_READ/VOP_WRITE level accesses. How would vn_io_fault_uiomove "wrong consumes held pages"? Are they attached to the uio? 
Because the call that copies from userspace (/dev/fuse) into the kernel doesn't have access to the "struct buf". From nobody Fri Oct 1 04:04:06 2021 X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 49B0617D759D for ; Fri, 1 Oct 2021 04:04:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4HLGhG0LTKz3Pr9; Fri, 1 Oct 2021 04:04:13 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.16.1/8.16.1) with ESMTPS id 191446dU005238 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Fri, 1 Oct 2021 07:04:09 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua 191446dU005238 Received: (from kostik@localhost) by tom.home (8.16.1/8.16.1/Submit) id 191446lU005237; Fri, 1 Oct 2021 07:04:06 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 1 Oct 2021 07:04:06 +0300 From: Konstantin Belousov To: Alan Somers Cc: FreeBSD Hackers Subject: Re: livelock in vfs_bio_getpages with vn_io_fault_uiomove Message-ID: References: List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, 
DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.5 X-Spam-Checker-Version: SpamAssassin 3.4.5 (2021-03-20) on tom.home X-Rspamd-Queue-Id: 4HLGhG0LTKz3Pr9 X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Thu, Sep 30, 2021 at 09:59:11PM -0600, Alan Somers wrote: > On Thu, Sep 30, 2021 at 9:53 PM Konstantin Belousov wrote: > > > > On Thu, Sep 30, 2021 at 09:33:24PM -0600, Alan Somers wrote: > > > On Thu, Sep 30, 2021 at 9:08 PM Konstantin Belousov wrote: > > > > > > > > On Thu, Sep 30, 2021 at 03:11:02PM -0600, Alan Somers wrote: > > > > > I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the > > > > > deadlock described in the comments above vn_io_fault_doio [^1]. I can > > > > > reproduce the deadlock readily enough, and the fix seems simple enough > > > > > in other filesystems [^2][^3]. But when I try to apply the same fix > > > > > to fusefs, the deadlock changes into a livelock. vfs_bio_getpages > > > > > loops infinitely because it reaches the "redo = true" state. But on > > > > > looping, it never attempts to read from fusefs again. Instead, > > > > > breadn_flags returns 0 without ever calling bufstrategy, from which I > > > > > infer that getblkx returned a block with B_CACHE set. Despite that, > > > > > at least one of the requested pages in vfs_bio_getpages fails the > > > > > vm_page_all_valid(ma[i]) check. Debugging further is wandering > > > > > outside my areas of expertise. Could somebody please give me a tip? > > > > > What is supposed to mark those pages as valid? Are there any other > > > > > undocumented conditions needed to use vn_io_fault_uiomove that msdosfs > > > > > and nfscl just happened to already meet? 
> > > > > > > > > > Grateful for any help, > > > > > -Alan > > > > > > > > > > [^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340 > > > > > [^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2 > > > > > [^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b > > > > > > > > Perhaps first you need to confirm that vfs_bio_getpages() sees a buffer > > > > with B_CACHE set but some pages still not fully valid (except the last > > > > page that is allowed to have invalid tail at EOF). > > > > > > > > When buffer strategy read returns, vfs_vmio_iodone() updates the pages > > > > validity bitmap according to the b_resid field of the buffer. Look there, > > > > might be you did not set it properly. Practically, both b_resid and b_bcount > > > > must be correctly set after io. > > > > > > > > In fact, after writing out all that, I realized that I am confused > > > > by your question. vn_io_fault_uiomove() needs to be used from > > > > VOP_READ/VOP_WRITE. If filesystem utilizes buffer cache, then there is > > > > a VOP_STRATEGY() implementation that fullfils the buffer cache requests > > > > for buffers reads and writes. VOP_READ and VOP_STRATEGY simply occurs > > > > at very different layers of the io stack. Typically, VOP_READ() does > > > > bread() which might trigger VOP_STRATEGY() to get the buffer, and then > > > > it performs vn_io_fault() to move data from locked buffer to userspace. > > > > > > > > The fact that your addition of vn_io_fault breaks something in VOP_STRATEGY() > > > > does not make sense. > > > > > > Ahh, that last piece of information is useful. In fusefs, both > > > VOP_READ and VOP_STRATEGY can end up going through the same path, > > > depending on cache settings, O_DIRECT, etc. And during a VOP_STRATEGY > > > read, fusefs needs to use uiomove in order to move data not into the > > > user's buffer, but from the fuse daemon into the kernel. 
But that's > > > not the cause of the livelock, because whether I use uiomove or > > > vn_io_fault_uiomove during VOP_STRATEGY I still get the same livelock. > > > I'll check b_bcount and b_resid tomorrow. > > > > But uiomove call from VOP_STRATEGY() to copy user data into kernel buffer > > is not prepared for vn_io_fault. I suspect that what happens there is > > the following: > > - top level of syscall, like read(2), does vn_read() > > - vn_read() checks conditions and goes through vn_io_fault, calling VOP_READ() > > there it prepares prefaulted held pages _for userspace buffer from read(2)_ > > - your VOP_READ() calls into VOP_STRATEGY() that tries to copy data into > > kernel. If you use vn_io_fault_uiomove() at this point, it wrongly > > consumes held pages for unrelated userspace buffer > > > > Even if there is some additional bug with b_resid, I suspect that the > > end result is the mess anyway. You should not use vn_io_fault for recursive > > accesses to userspace, only for VOP_READ/VOP_WRITE level accesses. > > How would vn_io_fault_uiomove "wrong consumes held pages"? Are they > attached to the uio? Because the call that copies from userspace > (/dev/fuse) into the kernel doesn't have access to the "struct buf". It is explained in the comment you referenced. Pages from the userspace buffer passed to read(2) or write(2) are held before calling into VOP_READ/VOP_WRITE. They are stored in the array of pages pointed to by curthread->td_ma. vn_io_fault_uiomove() uses that array instead of uio if TDP_UIOHELD is set. 
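
To make the held-pages mechanism above concrete, here is a small userland sketch in C. It is only a toy model under stated assumptions, not the kernel code: every name in it (toy_thread, toy_uio, TOY_UIOHELD, toy_uiomove, toy_io_fault_uiomove, toy_nested_copy_is_misdirected) is hypothetical. A plain char buffer stands in for the page array behind curthread->td_ma, and a single flag stands in for TDP_UIOHELD. It tries to show why a nested copy made while the flag is set could end up in the buffer held for read(2) rather than in the destination described by its own uio.

```c
/*
 * Toy userland model of the held-pages idea. NOT FreeBSD kernel code.
 * All names here are hypothetical stand-ins chosen for illustration.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_UIOHELD 0x1         /* stands in for TDP_UIOHELD */

struct toy_uio {                /* stands in for struct uio */
	char	*base;          /* destination described by this uio */
	size_t	 resid;         /* bytes still to transfer */
};

struct toy_thread {             /* stands in for struct thread */
	int	 pflags;        /* stands in for td_pflags */
	char	*held;          /* stands in for the pages behind td_ma */
	size_t	 held_off;      /* bytes of the held buffer consumed so far */
};

/* Plain copy: always writes to the destination named by the uio. */
static void
toy_uiomove(const char *data, size_t n, struct toy_uio *uio)
{
	if (n > uio->resid)
		n = uio->resid;
	memcpy(uio->base, data, n);
	uio->base += n;
	uio->resid -= n;
}

/*
 * Held-page aware copy: if the thread flag is set, the data goes to the
 * per-thread held buffer and the uio is only used for accounting.
 */
static void
toy_io_fault_uiomove(struct toy_thread *td, const char *data, size_t n,
    struct toy_uio *uio)
{
	if ((td->pflags & TOY_UIOHELD) == 0) {
		toy_uiomove(data, n, uio);
		return;
	}
	if (n > uio->resid)
		n = uio->resid;
	memcpy(td->held + td->held_off, data, n);
	td->held_off += n;
	uio->base += n;
	uio->resid -= n;
}

/*
 * Simulates a nested copy aimed at an unrelated buffer while pages for
 * the read(2) buffer are held. Returns 1 if the data landed in the
 * read(2) buffer and the intended destination stayed untouched.
 */
static int
toy_nested_copy_is_misdirected(void)
{
	char read_buf[8] = "";  /* buffer passed to read(2), held up front */
	char other_buf[8] = ""; /* unrelated destination of the nested copy */
	struct toy_thread td = { TOY_UIOHELD, read_buf, 0 };
	struct toy_uio nested = { other_buf, sizeof(other_buf) };

	toy_io_fault_uiomove(&td, "fuse", 5, &nested);
	return (strcmp(read_buf, "fuse") == 0 && other_buf[0] == '\0');
}
```

Calling toy_nested_copy_is_misdirected() from a main() reports whether the nested copy was redirected. In this model the redirection disappears as soon as the flag is clear, which mirrors the advice to keep vn_io_fault_uiomove() at the VOP_READ/VOP_WRITE level and use plain uiomove for recursive accesses.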
From nobody Fri Oct 1 23:13:10 2021 X-Original-To: freebsd-hackers@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id AFA9B17D3EAF for ; Fri, 1 Oct 2021 23:13:28 +0000 (UTC) (envelope-from asomers@gmail.com) Received: from mail-ot1-f43.google.com (mail-ot1-f43.google.com [209.85.210.43]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HLmBJ4WFtz3hJK for ; Fri, 1 Oct 2021 23:13:28 +0000 (UTC) (envelope-from asomers@gmail.com) Received: by mail-ot1-f43.google.com with SMTP id o59-20020a9d2241000000b0054745f28c69so13342518ota.13 for ; Fri, 01 Oct 2021 16:13:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ssTP6kWyca/2ZWVZclQTG2yHle5HCQIlvKADi/6gfek=; b=calQARxCzu2KLDi5se7JEEfTwRWDtu+/eF0kI/i+OT3sQoM6LoDU4KZayqZ6XK5bCZ coKxGIwyIf0tADXQbTSY6yDOwzecbJM01e1Bz1UnfS70iZKWAdFqBOCwWFtUsK3kZkDS qZGcSnQIbGsrj1JdWNv4J/P2ezztE9qTtgefv2JSQ/pt80106k4T4FpKA7Ama4SoT8Uu ug1lmdOG9Uz1f3tkfY/lFbiPK39fX4rUnxG/b5K0DfDYlAPTWKVf12jjgim7RaY9DhV9 LogZIqGq9+QZAEIOnBtgO0Kpj/cK7sG0mXji42GfdipizwSK41zZZ0Qv6wV4d1MWkBNe RpOA== X-Gm-Message-State: AOAM532FP+st1EfdaIUBGr1iTWU+mIHz3kX9d3sagWwnzL2KnT0ne//V pA5adXTJN9w2PeZzOzy+6MvRnAk/JoZn7AN76Es9dYsY X-Google-Smtp-Source: ABdhPJyCb9qh5xgEt1IuRK+1lFKvzeAJL6SA7I7civ1fW3/cufnP1iWzSt/6CPsXml3MGRJCymUUEvF+nS0eQ3jBbjA= X-Received: by 2002:a05:6830:1509:: with SMTP id k9mr396086otp.111.1633130002135; Fri, 01 Oct 2021 16:13:22 -0700 (PDT) List-Id: Technical discussions relating to FreeBSD List-Archive: https://lists.freebsd.org/archives/freebsd-hackers 
List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-hackers@freebsd.org MIME-Version: 1.0 References: In-Reply-To: From: Alan Somers Date: Fri, 1 Oct 2021 17:13:10 -0600 Message-ID: Subject: Re: livelock in vfs_bio_getpages with vn_io_fault_uiomove To: Konstantin Belousov Cc: FreeBSD Hackers Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 4HLmBJ4WFtz3hJK X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: N On Thu, Sep 30, 2021 at 10:04 PM Konstantin Belousov wrote: > > On Thu, Sep 30, 2021 at 09:59:11PM -0600, Alan Somers wrote: > > On Thu, Sep 30, 2021 at 9:53 PM Konstantin Belousov wrote: > > > > > > On Thu, Sep 30, 2021 at 09:33:24PM -0600, Alan Somers wrote: > > > > On Thu, Sep 30, 2021 at 9:08 PM Konstantin Belousov wrote: > > > > > > > > > > On Thu, Sep 30, 2021 at 03:11:02PM -0600, Alan Somers wrote: > > > > > > I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the > > > > > > deadlock described in the comments above vn_io_fault_doio [^1]. I can > > > > > > reproduce the deadlock readily enough, and the fix seems simple enough > > > > > > in other filesystems [^2][^3]. But when I try to apply the same fix > > > > > > to fusefs, the deadlock changes into a livelock. vfs_bio_getpages > > > > > > loops infinitely because it reaches the "redo = true" state. But on > > > > > > looping, it never attempts to read from fusefs again. Instead, > > > > > > breadn_flags returns 0 without ever calling bufstrategy, from which I > > > > > > infer that getblkx returned a block with B_CACHE set. Despite that, > > > > > > at least one of the requested pages in vfs_bio_getpages fails the > > > > > > vm_page_all_valid(ma[i]) check. Debugging further is wandering > > > > > > outside my areas of expertise. Could somebody please give me a tip? > > > > > > What is supposed to mark those pages as valid? 
Are there any other > > > > > > undocumented conditions needed to use vn_io_fault_uiomove that msdosfs > > > > > > and nfscl just happened to already meet? > > > > > > > > > > > > Grateful for any help, > > > > > > -Alan > > > > > > > > > > > > [^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340 > > > > > > [^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2 > > > > > > [^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b > > > > > > > > > > Perhaps first you need to confirm that vfs_bio_getpages() sees a buffer > > > > > with B_CACHE set but some pages still not fully valid (except the last > > > > > page that is allowed to have invalid tail at EOF). > > > > > > > > > > When buffer strategy read returns, vfs_vmio_iodone() updates the pages > > > > > validity bitmap according to the b_resid field of the buffer. Look there, > > > > > might be you did not set it properly. Practically, both b_resid and b_bcount > > > > > must be correctly set after io. > > > > > > > > > > In fact, after writing out all that, I realized that I am confused > > > > > by your question. vn_io_fault_uiomove() needs to be used from > > > > > VOP_READ/VOP_WRITE. If filesystem utilizes buffer cache, then there is > > > > > a VOP_STRATEGY() implementation that fullfils the buffer cache requests > > > > > for buffers reads and writes. VOP_READ and VOP_STRATEGY simply occurs > > > > > at very different layers of the io stack. Typically, VOP_READ() does > > > > > bread() which might trigger VOP_STRATEGY() to get the buffer, and then > > > > > it performs vn_io_fault() to move data from locked buffer to userspace. > > > > > > > > > > The fact that your addition of vn_io_fault breaks something in VOP_STRATEGY() > > > > > does not make sense. > > > > > > > > Ahh, that last piece of information is useful. 
In fusefs, both > > > > VOP_READ and VOP_STRATEGY can end up going through the same path, > > > > depending on cache settings, O_DIRECT, etc. And during a VOP_STRATEGY > > > > read, fusefs needs to use uiomove in order to move data not into the > > > > user's buffer, but from the fuse daemon into the kernel. But that's > > > > not the cause of the livelock, because whether I use uiomove or > > > > vn_io_fault_uiomove during VOP_STRATEGY I still get the same livelock. > > > > I'll check b_bcount and b_resid tomorrow. > > > > > > But uiomove call from VOP_STRATEGY() to copy user data into kernel buffer > > > is not prepared for vn_io_fault. I suspect that what happens there is > > > the following: > > > - top level of syscall, like read(2), does vn_read() > > > - vn_read() checks conditions and goes through vn_io_fault, calling VOP_READ() > > > there it prepares prefaulted held pages _for userspace buffer from read(2)_ > > > - your VOP_READ() calls into VOP_STRATEGY() that tries to copy data into > > > kernel. If you use vn_io_fault_uiomove() at this point, it wrongly > > > consumes held pages for unrelated userspace buffer > > > > > > Even if there is some additional bug with b_resid, I suspect that the > > > end result is the mess anyway. You should not use vn_io_fault for recursive > > > accesses to userspace, only for VOP_READ/VOP_WRITE level accesses. > > > > How would vn_io_fault_uiomove "wrong consumes held pages"? Are they > > attached to the uio? Because the call that copies from userspace > > (/dev/fuse) into the kernel doesn't have access to the "struct buf". > It is explained in the comment you referenced. > > Pages from the userspace buffer passed to read(2) or write(2) are held > before calling into VOP_READ/VOP_WRITE. They are stored in the array > of pages pointed to by curthread->td_ma. vn_io_fault_uiomove() uses that > array instead of uio if TDP_UIOHELD is set. 
So I've been looking at the read operations all of this time, but the
problematic buffer was actually the BIO_WRITE one. It has B_CACHE but
none of its pages are valid. Apparently, VOP_WRITE is supposed to call
vfs_bio_clrbuf() after allocating a new buffer to set those valid
bits? But it never mattered before.

From nobody Sat Oct 2 00:43:47 2021
Date: Sat, 2 Oct 2021 03:43:47 +0300
From: Konstantin Belousov
To: Alan Somers
Cc: FreeBSD Hackers
Subject: Re: livelock in vfs_bio_getpages with vn_io_fault_uiomove

On Fri, Oct 01, 2021 at 05:13:10PM -0600, Alan Somers wrote:
> On Thu, Sep 30, 2021 at 10:04 PM Konstantin Belousov wrote:
> >
> > On Thu, Sep 30, 2021 at 09:59:11PM -0600, Alan Somers wrote:
> > > On Thu, Sep 30, 2021 at 9:53 PM Konstantin Belousov wrote:
> > > >
> > > > On Thu, Sep 30, 2021 at 09:33:24PM -0600, Alan Somers wrote:
> > > > > On Thu, Sep 30, 2021 at 9:08 PM Konstantin Belousov wrote:
> > > > > >
> > > > > > On Thu, Sep 30, 2021 at 03:11:02PM -0600, Alan Somers wrote:
> > > > > > > I'm trying to adapt fusefs to use vn_io_fault_uiomove to fix the
> > > > > > > deadlock described in the comments above vn_io_fault_doio [^1]. I can
> > > > > > > reproduce the deadlock readily enough, and the fix seems simple enough
> > > > > > > in other filesystems [^2][^3]. But when I try to apply the same fix
> > > > > > > to fusefs, the deadlock changes into a livelock. vfs_bio_getpages
> > > > > > > loops infinitely because it reaches the "redo = true" state. But on
> > > > > > > looping, it never attempts to read from fusefs again. Instead,
> > > > > > > breadn_flags returns 0 without ever calling bufstrategy, from which I
> > > > > > > infer that getblkx returned a block with B_CACHE set. Despite that,
> > > > > > > at least one of the requested pages in vfs_bio_getpages fails the
> > > > > > > vm_page_all_valid(ma[i]) check. Debugging further is wandering
> > > > > > > outside my areas of expertise.
> > > > > > > Could somebody please give me a tip?
> > > > > > > What is supposed to mark those pages as valid? Are there any other
> > > > > > > undocumented conditions needed to use vn_io_fault_uiomove that msdosfs
> > > > > > > and nfscl just happened to already meet?
> > > > > > >
> > > > > > > Grateful for any help,
> > > > > > > -Alan
> > > > > > >
> > > > > > > [^1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238340
> > > > > > > [^2] https://github.com/freebsd/freebsd-src/commit/2aa3944510b50cbe6999344985a5a9c3208063b2
> > > > > > > [^3] https://github.com/freebsd/freebsd-src/commit/ddfc47fdc98460b757c6d1dbe4562a0a339f228b
> > > > > >
> > > > > > Perhaps first you need to confirm that vfs_bio_getpages() sees a buffer
> > > > > > with B_CACHE set but some pages still not fully valid (except the last
> > > > > > page that is allowed to have invalid tail at EOF).
> > > > > >
> > > > > > When buffer strategy read returns, vfs_vmio_iodone() updates the pages
> > > > > > validity bitmap according to the b_resid field of the buffer. Look there,
> > > > > > might be you did not set it properly. Practically, both b_resid and b_bcount
> > > > > > must be correctly set after io.
> > > > > >
> > > > > > In fact, after writing out all that, I realized that I am confused
> > > > > > by your question. vn_io_fault_uiomove() needs to be used from
> > > > > > VOP_READ/VOP_WRITE. If filesystem utilizes buffer cache, then there is
> > > > > > a VOP_STRATEGY() implementation that fulfills the buffer cache requests
> > > > > > for buffer reads and writes. VOP_READ and VOP_STRATEGY simply occur
> > > > > > at very different layers of the io stack. Typically, VOP_READ() does
> > > > > > bread() which might trigger VOP_STRATEGY() to get the buffer, and then
> > > > > > it performs vn_io_fault() to move data from locked buffer to userspace.
> > > > > >
> > > > > > The fact that your addition of vn_io_fault breaks something in VOP_STRATEGY()
> > > > > > does not make sense.
> > > > >
> > > > > Ahh, that last piece of information is useful. In fusefs, both
> > > > > VOP_READ and VOP_STRATEGY can end up going through the same path,
> > > > > depending on cache settings, O_DIRECT, etc. And during a VOP_STRATEGY
> > > > > read, fusefs needs to use uiomove in order to move data not into the
> > > > > user's buffer, but from the fuse daemon into the kernel. But that's
> > > > > not the cause of the livelock, because whether I use uiomove or
> > > > > vn_io_fault_uiomove during VOP_STRATEGY I still get the same livelock.
> > > > > I'll check b_bcount and b_resid tomorrow.
> > > >
> > > > But uiomove call from VOP_STRATEGY() to copy user data into kernel buffer
> > > > is not prepared for vn_io_fault. I suspect that what happens there is
> > > > the following:
> > > > - top level of syscall, like read(2), does vn_read()
> > > > - vn_read() checks conditions and goes through vn_io_fault, calling VOP_READ()
> > > >   there it prepares prefaulted held pages _for userspace buffer from read(2)_
> > > > - your VOP_READ() calls into VOP_STRATEGY() that tries to copy data into
> > > >   kernel. If you use vn_io_fault_uiomove() at this point, it wrongly
> > > >   consumes held pages for unrelated userspace buffer
> > > >
> > > > Even if there is some additional bug with b_resid, I suspect that the
> > > > end result is the mess anyway. You should not use vn_io_fault for recursive
> > > > accesses to userspace, only for VOP_READ/VOP_WRITE level accesses.
> > >
> > > How would vn_io_fault_uiomove "wrong consumes held pages"? Are they
> > > attached to the uio? Because the call that copies from userspace
> > > (/dev/fuse) into the kernel doesn't have access to the "struct buf".
> > It is explained in the comment you referenced.
> >
> > Pages from the userspace buffer passed to read(2) or write(2) are held
> > before calling into VOP_READ/VOP_WRITE. They are stored in the array
> > of pages pointed to by curthread->td_ma.
> > vn_io_fault_uiomove() uses that
> > array instead of uio if TDP_UIOHELD is set.
>
> So I've been looking at the read operations all of this time, but the
> problematic buffer was actually the BIO_WRITE one. It has B_CACHE but
> none of its pages are valid. Apparently, VOP_WRITE is supposed to call
> vfs_bio_clrbuf() after allocating a new buffer to set those valid
> bits? But it never mattered before.

I do not understand your question. If B_CACHE is set, all pages in the
buffer should be valid, i.e. the m->valid bitset should be all 1's _and_
the actual page content should be valid. Just populating the bitset
without populating the page content means that the user data is corrupted.

VOP_WRITE() needs to instantiate the buffer which is written to, and there
are two possibilities:
- the write request covers the whole buffer. In this case there is no
  need to read the old content. If the write fails for any reason, e.g.
  due to uiomove() reporting EFAULT, and B_CACHE was not set, the buffer
  is typically invalidated by setting B_INVAL | B_RELBUF | B_NOCACHE and
  then brelse()ing it. If the buffer had valid content before uiomove(),
  it is typically worth trying to save the data, but everything should
  be valid unless it is invalidated.
- the write request does not cover the whole buffer. In this case the
  buffer must be read first. For instance, FFS specifies BA_CLRBUF to
  UFS_BALLOC() for a partial write, indicating that the returned buffer
  must be read. On a uiomove() fault, FFS zeroes the non-validated parts
  of the buffer, de facto treating the unreadable user buffer as
  zero-filled.
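
The two cases can be modelled with a small userspace sketch. This is
not kernel code and not the fusefs or FFS implementation: struct toy_buf,
toy_copyin(), toy_read_backing() and toy_write() are hypothetical names
invented for illustration, "cached" stands in for B_CACHE, "invalidated"
stands in for B_INVAL | B_RELBUF | B_NOCACHE followed by brelse(), and a
NULL source pointer plays the role of uiomove() returning EFAULT. The
copy is all-or-nothing, which is simpler than a real faulting uiomove(),
and on a fault the sketch keeps previously valid data rather than
zeroing as FFS does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_BSIZE 16	/* hypothetical buffer size, not a kernel constant */

/* Hypothetical stand-in for struct buf. */
struct toy_buf {
	unsigned char data[TOY_BSIZE];
	bool cached;		/* models B_CACHE: whole content is valid */
	bool invalidated;	/* models B_INVAL|B_RELBUF|B_NOCACHE + brelse() */
};

/* Models uiomove(): a NULL source stands for an EFAULT from userspace. */
static int
toy_copyin(unsigned char *dst, const unsigned char *src, size_t len)
{
	if (src == NULL)
		return (-1);
	memcpy(dst, src, len);
	return (0);
}

/* Models reading the old block content from backing store. */
static void
toy_read_backing(struct toy_buf *bp, const unsigned char *disk)
{
	memcpy(bp->data, disk, TOY_BSIZE);
	bp->cached = true;
}

/* Returns 0 on success, -1 on a modelled EFAULT, -2 on a bad range. */
int
toy_write(struct toy_buf *bp, const unsigned char *disk,
    const unsigned char *src, size_t off, size_t len)
{
	bool full;

	if (off > TOY_BSIZE || len > TOY_BSIZE - off)
		return (-2);
	full = (off == 0 && len == TOY_BSIZE);

	/* Partial write: the old content must be valid before copying. */
	if (!full && !bp->cached)
		toy_read_backing(bp, disk);

	if (toy_copyin(bp->data + off, src, len) != 0) {
		/* Nothing valid to save: throw the buffer away. */
		if (!bp->cached)
			bp->invalidated = true;
		return (-1);
	}
	bp->cached = true;	/* content and validity now agree */
	return (0);
}
```

The point of the model is only that the validity flag is never set
unless the whole buffer content is actually populated, either by the
write itself or by a prior read.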