List-Id: Networking and TCP/IP with FreeBSD
List-Archive:
https://lists.freebsd.org/archives/freebsd-net
From: Nuno Teixeira <eduardo@freebsd.org>
Date: Wed, 10 Apr 2024 13:44:59 +0100
Subject: Re: Request for Testing: TCP RACK
To: tuexen@freebsd.org
Cc: Drew Gallatin, Konstantin Belousov, rrs, Mike Karels, garyj@gmx.de, current@freebsd.org, net@freebsd.org, Randall Stewart

(...)

Backup server is https://www.rsync.net/ (free 500 GB for FreeBSD developers).

Nuno Teixeira wrote (Wednesday, 10 Apr 2024 at 13:39):

> With the base stack I can complete a restic check successfully,
> downloading, reading, and checking all files from a "big" remote
> compressed backup. Switching to the RACK stack, it fails.
>
> I run this command often because compression corruption has occurred in
> the past, and this is the equivalent of restoring the backup to verify
> its integrity.
>
> Maybe someone could run a restic test to check whether this is
> reproducible.
>
> Thanks,
>
> <tuexen@freebsd.org> wrote (Wednesday, 10 Apr 2024 at 13:12):
>
>> > On 10. Apr 2024, at 13:40, Nuno Teixeira wrote:
>> >
>> > Hello all,
>> >
>> > @ current 1500018, fetching torrents with net-p2p/qbittorrent
>> > finished a ~2 GB download with the connection UP until the end:
>> >
>> > ---
>> > Apr 10 11:26:46 leg kernel: re0: watchdog timeout
>> > Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
>> > Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
>> > Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0): 255.255.255.0
>> > Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0): 192.168.1.255
>> > Apr 10 11:26:49 leg kernel: re0: link state changed to UP
>> > Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
>> > ---
>> >
>> > In past tests I got more watchdog timeouts; the connection went down
>> > and a reboot was needed to bring it back (`service netif restart`
>> > didn't work).
>> >
>> > Another way to reproduce this is using sysutils/restic (a backup
>> > program) to read/check all files from a remote server via sftp:
>> >
>> > `restic -r sftp:user@remote:restic-repo check --read-data` from a
>> > 60 GB compressed backup.
>> >
>> > ---
>> > watchdog timeout x3 as above
>> > ---
>> >
>> > restic check fail log @ 15% progress:
>> > ---
>> > <snip>
>> > Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 1.7670599s: connection lost
>> > Load(<data/d27a0abe0f>, 17456892, 0) returned error, retrying after 4.619104908s: connection lost
>> > Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 5.477648517s: connection lost
>> > List(lock) returned error, retrying after 293.057766ms: connection lost
>> > List(lock) returned error, retrying after 385.206693ms: connection lost
>> > List(lock) returned error, retrying after 1.577594281s: connection lost
>> > <snip>
>> >
>> > Connection continues UP.
>> Hi,
>>
>> I'm not sure what the issue is that you are reporting. Could you state
>> what behavior you are experiencing with the base stack and with the
>> RACK stack? In particular, what is the difference?
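For anyone trying to reproduce this, the cycle described above can be sketched as the following commands. This is a sketch, not part of the original report: it assumes root on a kernel where the tcp_rack module is available, and `user@remote:restic-repo` is the placeholder repository path from the report; the module and sysctl names are the standard FreeBSD ones.

```shell
# Load the RACK stack and make it the default for new TCP connections.
kldload tcp_rack
sysctl net.inet.tcp.functions_default=rack

# Verify the available stacks; the default is marked with '*'.
sysctl net.inet.tcp.functions_available

# Re-run the failing integrity check over sftp (placeholder repo path).
restic -r sftp:user@remote:restic-repo check --read-data

# Switch back to the base stack for comparison.
sysctl net.inet.tcp.functions_default=freebsd
```

Only connections created after the sysctl change use the new stack, so restarting the client between runs keeps the comparison clean.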
>>
>> Best regards
>> Michael
>>
>> >
>> > Cheers,
>> >
>> > <tuexen@freebsd.org> wrote (Thursday, 28 Mar 2024 at 15:53):
>> >> On 28. Mar 2024, at 15:00, Nuno Teixeira wrote:
>> >>
>> >> Hello all!
>> >>
>> >> Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop (amd64)!
>> >>
>> >> Thanks all!
>> > Thanks for the feedback!
>> >
>> > Best regards
>> > Michael
>> >>
>> >> Drew Gallatin wrote (Thursday, 21 Mar 2024 at 12:58):
>> >> The entire point is to *NOT* go through the overhead of scheduling
>> >> something asynchronously, but to take advantage of the fact that a
>> >> user/kernel transition is going to trash the cache anyway.
>> >>
>> >> In the common case of a system with fewer than the threshold number
>> >> of connections, we access the tcp_hpts_softclock function pointer,
>> >> make one function call, and access hpts_that_need_softclock, and
>> >> then return. So that's two variables and a function call.
>> >>
>> >> I think it would be preferable to avoid that call, and to move the
>> >> declarations of tcp_hpts_softclock and hpts_that_need_softclock so
>> >> that they are in the same cacheline. Then we'd be hitting just a
>> >> single line in the common case. (I've made comments on the review
>> >> to that effect.)
>> >>
>> >> Also, I wonder if the threshold could be raised by default, so that
>> >> hpts is never called in this context unless we're to the point where
>> >> we're scheduling thousands of runs of the hpts thread (and taking
>> >> all those clock interrupts).
>> >>
>> >> Drew
>> >>
>> >> On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
>> >>> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
>> >>>> Ok I have created
>> >>>>
>> >>>> https://reviews.freebsd.org/D44420
>> >>>>
>> >>>> to address the issue. I also attach a short version of the patch
>> >>>> that Nuno can try and validate that it works. Drew, you may want
>> >>>> to try this and validate that the optimization does kick in,
>> >>>> since I can only test that it does not on my local box :)
>> >>> The patch still causes access to all CPUs' cachelines on each
>> >>> userret. It would be much better to inc/check the threshold and
>> >>> only schedule the call when it is exceeded. Then the call can occur
>> >>> in some dedicated context, like a per-CPU thread, instead of
>> >>> userret.
>> >>>
>> >>>>
>> >>>> R
>> >>>>
>> >>>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
>> >>>>> No. The goal is to run on every return to userspace for every
>> >>>>> thread.
>> >>>>>
>> >>>>> Drew
>> >>>>>
>> >>>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
>> >>>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
>> >>>>>>> I got the idea from
>> >>>>>>> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
>> >>>>>>> The gist is that the TCP pacing stuff needs to run frequently,
>> >>>>>>> and rather than run it out of a clock interrupt, it's more
>> >>>>>>> efficient to run it out of a system call context at just the
>> >>>>>>> point where we return to userspace and the cache is trashed
>> >>>>>>> anyway. The current implementation is fine for our workload,
>> >>>>>>> but probably not ideal for a generic system, especially one
>> >>>>>>> where something is banging on system calls.
>> >>>>>>>
>> >>>>>>> ASTs could be the right tool for this, but I'm super unfamiliar
>> >>>>>>> with them, and I can't find any docs on them.
>> >>>>>>>
>> >>>>>>> Would ast_register(0, ASTR_UNCOND, 0, func) be roughly
>> >>>>>>> equivalent to what's happening here?
>> >>>>>> This call would need some AST number added, and then it
>> >>>>>> registers the ast to run on the next return to userspace, for
>> >>>>>> the current thread.
>> >>>>>>
>> >>>>>> Is it enough?
>> >>>>>>>
>> >>>>>>> Drew
>> >>>>>>
>> >>>>>>>
>> >>>>>>> On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
>> >>>>>>>> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
>> >>>>>>>>> On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
>> >>>>>>>>>
>> >>>>>>>>>>> On 18. Mar 2024, at 12:42, Nuno Teixeira
>> >>>>>> <eduardo@freebsd.org> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Hello all!
>> >>>>>>>>>>>
>> >>>>>>>>>>> It works just fine!
>> >>>>>>>>>>> System performance is OK.
>> >>>>>>>>>>> Using patch on main-n268841-b0aaf8beb126(-dirty).
>> >>>>>>>>>>>
>> >>>>>>>>>>> ---
>> >>>>>>>>>>> net.inet.tcp.functions_available:
>> >>>>>>>>>>> Stack     D  Alias     PCB count
>> >>>>>>>>>>> freebsd      freebsd           0
>> >>>>>>>>>>> rack      *  rack             38
>> >>>>>>>>>>> ---
>> >>>>>>>>>>>
>> >>>>>>>>>>> It would be so nice to have a sysctl tunable for this patch
>> >>>>>>>>>>> so we could do more tests without recompiling the kernel.
>> >>>>>>>>>> Thanks for testing!
>> >>>>>>>>>>
>> >>>>>>>>>> @gallatin: can you come up with a patch that is acceptable
>> >>>>>>>>>> for Netflix and allows mitigating the performance
>> >>>>>>>>>> regression?
>> >>>>>>>>>
>> >>>>>>>>> Ideally, tcphpts could enable this automatically when it
>> >>>>>>>>> starts to be used (enough?), but a sysctl could select
>> >>>>>>>>> auto/on/off.
>> >>>>>>>> There is already a well-known mechanism to request execution
>> >>>>>>>> of a specific function on return to userspace, namely AST. The
>> >>>>>>>> difference with the current hack is that the execution is
>> >>>>>>>> requested for one callback in the context of a specific
>> >>>>>>>> thread.
>> >>>>>>>>
>> >>>>>>>> Still, it might be worth a try to use it; what is the reason
>> >>>>>>>> to hit a thread that does not do networking with TCP
>> >>>>>>>> processing?
>> >>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Mike
>> >>>>>>>>>
>> >>>>>>>>>> Best regards
>> >>>>>>>>>> Michael
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks all!
>> >>>>>>>>>>> Really happy here :)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Cheers,
>> >>>>>>>>>>>
>> >>>>>>>>>>> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday,
>> >>>>>> 17 Mar 2024 at 20:26):
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Hello,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> I don't have the full context, but it seems like the
>> >>>>>>>>>>>>> complaint is a performance regression in bonnie++ and
>> >>>>>>>>>>>>> perhaps other things when tcp_hpts is loaded, even when
>> >>>>>>>>>>>>> it is not used. Is that correct?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> If so, I suspect it's because we drive the
>> >>>>>>>>>>>>> tcp_hpts_softclock() routine from userret(), in order to
>> >>>>>>>>>>>>> avoid tons of timer interrupts and context switches. To
>> >>>>>>>>>>>>> test this theory, you could apply a patch like:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> It's affecting overall system performance; bonnie was
>> >>>>>>>>>>>> just a way to get some numbers to compare.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Tomorrow I will test the patch.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Thanks!
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> --
>> >>>>>>>>>>>> Nuno Teixeira
>> >>>>>>>>>>>> FreeBSD Committer (ports)
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> --
>> >>>>>>>>>>> Nuno Teixeira
>> >>>>>>>>>>> FreeBSD Committer (ports)
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>>>>
>> >>>
>> >>>> diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c
>> >>>> index 8c4d2d41a3eb..eadbee19f69c 100644
>> >>>> --- a/sys/netinet/tcp_hpts.c
>> >>>> +++ b/sys/netinet/tcp_hpts.c
>> >>>> @@ -216,6 +216,7 @@ struct tcp_hpts_entry {
>> >>>>  	void *ie_cookie;
>> >>>>  	uint16_t p_num;	/* The hpts number one per cpu */
>> >>>>  	uint16_t p_cpu;	/* The hpts CPU */
>> >>>> +	uint8_t hit_callout_thresh;
>> >>>>  	/* There is extra space in here */
>> >>>>  	/* Cache line 0x100 */
>> >>>>  	struct callout co __aligned(CACHE_LINE_SIZE);
>> >>>> @@ -269,6 +270,11 @@ static struct hpts_domain_info {
>> >>>>  	int cpu[MAXCPU];
>> >>>>  } hpts_domains[MAXMEMDOM];
>> >>>>
>> >>>> +counter_u64_t hpts_that_need_softclock;
>> >>>> +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock, CTLFLAG_RD,
>> >>>> +    &hpts_that_need_softclock,
>> >>>> +    "Number of hpts threads that need softclock");
>> >>>> +
>> >>>>  counter_u64_t hpts_hopelessly_behind;
>> >>>>
>> >>>>  SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless, CTLFLAG_RD,
>> >>>> @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, precision, CTLFLAG_RW,
>> >>>>      &tcp_hpts_precision, 120,
>> >>>>      "Value for PRE() precision of callout");
>> >>>>  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW,
>> >>>> -    &conn_cnt_thresh, 0,
>> >>>> +    &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD,
>> >>>>      "How many connections (below) make us use the callout based mechanism");
>> >>>>  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW,
>> >>>>      &hpts_does_tp_logging, 0,
>> >>>> @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void)
>> >>>>  	struct tcp_hpts_entry *hpts;
>> >>>>  	int ticks_ran;
>> >>>>
>> >>>> +	if (counter_u64_fetch(hpts_that_need_softclock) == 0)
>> >>>> +		return;
>> >>>> +
>> >>>>  	hpts = tcp_choose_hpts_to_run();
>> >>>>
>> >>>>  	if (hpts->p_hpts_active) {
>> >>>> @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx)
>> >>>>  	ticks_ran = tcp_hptsi(hpts, 1);
>> >>>>  	tv.tv_sec = 0;
>> >>>>  	tv.tv_usec = hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT;
>> >>>> +	if ((hpts->p_on_queue_cnt > conn_cnt_thresh) && (hpts->hit_callout_thresh == 0)) {
>> >>>> +		hpts->hit_callout_thresh = 1;
>> >>>> +		counter_u64_add(hpts_that_need_softclock, 1);
>> >>>> +	} else if ((hpts->p_on_queue_cnt <= conn_cnt_thresh) && (hpts->hit_callout_thresh == 1)) {
>> >>>> +		hpts->hit_callout_thresh = 0;
>> >>>> +		counter_u64_add(hpts_that_need_softclock, -1);
>> >>>> +	}
>> >>>>  	if (hpts->p_on_queue_cnt >= conn_cnt_thresh) {
>> >>>>  		if (hpts->p_direct_wake == 0) {
>> >>>>  			/*
>> >>>> @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void)
>> >>>>  	cpu_top = NULL;
>> >>>>  #endif
>> >>>>  	tcp_pace.rp_num_hptss = ncpus;
>> >>>> +	hpts_that_need_softclock = counter_u64_alloc(M_WAITOK);
>> >>>>  	hpts_hopelessly_behind = counter_u64_alloc(M_WAITOK);
>> >>>>  	hpts_loops = counter_u64_alloc(M_WAITOK);
>> >>>>  	back_tosleep = counter_u64_alloc(M_WAITOK);
>> >>>> @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void)
>> >>>>  	free(tcp_pace.grps, M_TCPHPTS);
>> >>>>  #endif
>> >>>>
>> >>>> +	counter_u64_free(hpts_that_need_softclock);
>> >>>>  	counter_u64_free(hpts_hopelessly_behind);
>> >>>>  	counter_u64_free(hpts_loops);
>> >>>>  	counter_u64_free(back_tosleep);
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> Nuno Teixeira
>> >> FreeBSD Committer (ports)
>> >
>> >
>> > --
>> > Nuno Teixeira
>> > FreeBSD Committer (ports)
>>
>
> --
> Nuno Teixeira
> FreeBSD Committer (ports)

--
Nuno Teixeira
FreeBSD Committer (ports)