Date: Wed, 24 Apr 2024 12:01:26 +0200 From: Alexander Leidinger <Alexander@Leidinger.net> To: Gleb Smirnoff <glebius@freebsd.org> Cc: Current <current@freebsd.org> Subject: Re: Strange network/socket anomalies since about a month Message-ID: <d915e7b730f59a531d03b027215f81e0@Leidinger.net> In-Reply-To: <ZiaMgRH-8vecPfSt@cell.glebi.us> References: <1fe609f252e7fae6d746530d5035ec0e@Leidinger.net> <ZiaMgRH-8vecPfSt@cell.glebi.us>
next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --=_6b985b0e8816937697b536ffdd21ad4a Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII; format=flowed Am 2024-04-22 18:12, schrieb Gleb Smirnoff: > There were several preparatory commits that were not reverted and one > of them > had a bug. The bug manifested itself as failure to send(2) zero bytes > over > unix/stream. It was fixed with > e6a4b57239dafc6c944473326891d46d966c0264. Can > you please check you have this revision? Other than that there are no > known > bugs left. Yes, I have this fix in my running kernel. > A> Any ideas how to track this down more easily than running the entire > A> poudriere in ktrace (e.g. a hint/script which dtrace probes to use)? > > I don't have any better idea than ktrace over failing application. > Yep, I > understand that poudriere will produce a lot. But first we need to > determine Yes, it does. 4.4 GB just for the start of poudriere until the first package build fails due to a failed sccache start (luckily in the first builder, but I had at least 2 builders automatically spin up by poudriere at the time when I validated the failure in the logs and disabled the tracing). > what syscall fails and on what type of socket. After that we can scope > down to > using dtrace on very particular functions. I'm not sure I manage to find the cause of the failure... the only thing which remotely looks like an issue is "Resource temporarily unavailable", but this is from the process which waits for the server to have started: ---snip--- 58406 sccache 1713947887.504834367 RET __sysctl 0 58406 sccache 1713947887.505521884 CALL rfork(0x80000000<><invalid>2147483648) 58406 sccache 1713947887.505757777 CAP system call not allowed: rfork 58406 sccache 1713947887.505774176 RET rfork 58426/0xe43a 58406 sccache 1713947887.507304865 CALL compat11.kevent(0x3,0x371d360f89e8,0x2,0x371d360f89e8,0x2,0) 58406 sccache 1713947887.507657906 STRU struct freebsd11_kevent[] = { { ident=11, filter=EVFILT_READ, flags=0x61<EV_ADD|EV_CLEAR|EV_RECEIPT>, fflags=0, data=0, udata=0x0 } { ident=11, filter=EVFILT_WRITE, flags=0x61<EV_ADD|EV_CLEAR|EV_RECEIPT>, fflags=0, data=0, udata=0x0 } } 58406 sccache 1713947887.507689980 STRU struct freebsd11_kevent[] = { { ident=11, filter=EVFILT_READ, flags=0x4000<EV_ERROR>, fflags=0, data=0, udata=0x0 } { ident=11, filter=EVFILT_WRITE, flags=0x4000<EV_ERROR>, fflags=0, data=0, udata=0x0 } } 58406 sccache 1713947887.507977155 RET compat11.kevent 2 58406 sccache 1713947887.508015751 CALL write(0x5,0x371515685bcc,0x1) 58406 sccache 1713947887.508086434 GIO fd 5 wrote 1 byte 0x0000 01 |.| 58406 sccache 1713947887.508145930 RET write 1 58406 sccache 1713947887.508183140 CALL compat11.kevent(0x7,0,0,0x5a5689ab0c40,0x400,0) 58406 sccache 1713947887.508396614 STRU struct freebsd11_kevent[] = { } 58406 sccache 1713947887.508156537 STRU struct freebsd11_kevent[] = { { ident=4, filter=EVFILT_READ, flags=0x60<EV_CLEAR|EV_RECEIPT>, fflags=0, data=0x1, udata=0xffffffffffffffff } } 58406 sccache 1713947887.508530888 RET compat11.kevent 1 58406 sccache 1713947887.508563736 CALL read(0x4,0x371d3a2887c0,0x80) 58406 sccache 1713947887.508729102 GIO fd 4 read 1 byte 0x0000 01 |.| 58406 sccache 1713947887.508967661 RET read 1 58406 sccache 1713947887.508996753 CALL read(0x4,0x371d3a2887c0,0x80) 58406 sccache 1713947887.509028311 RET read -1 errno 35 Resource temporarily unavailable 58406 sccache 1713947887.509068838 CALL compat11.kevent(0x3,0,0,0x5a5689a97540,0x400,0x371d3a2887c8) ... 58406 sccache 1713947897.514352552 CALL _umtx_op(0x5a5689a3d290,0x10<UMTX_OP_WAKE_PRIVATE>,0x7fffffff,0,0) 58406 sccache 1713947897.514383653 RET _umtx_op 0 58406 sccache 1713947897.514421273 CALL write(0x5,0x371515685bcc,0x1) 58406 sccache 1713947897.515050967 STRU struct freebsd11_kevent[] = { { ident=4, filter=EVFILT_READ, flags=0x60<EV_CLEAR|EV_RECEIPT>, fflags=0, data=0x1, udata=0xffffffffffffffff } } 58406 sccache 1713947897.515146151 RET compat11.kevent 1 58406 sccache 1713947897.515178978 CALL read(0x4,0x371d3a2887c0,0x80) 58406 sccache 1713947897.515368070 GIO fd 4 read 1 byte 0x0000 01 |.| 58406 sccache 1713947897.515396600 RET read 1 58406 sccache 1713947897.515426523 CALL read(0x4,0x371d3a2887c0,0x80) 58406 sccache 1713947897.515457073 RET read -1 errno 35 Resource temporarily unavailable 58406 sccache 1713947897.515004494 GIO fd 5 wrote 1 byte 0x0000 01 |.| ---snip--- https://www.leidinger.net/test/sccache.tar.bz2 contains the parts of the trace of the sccache processes (in case someone wants to have a look). The poudriere run had several builders in parallel, at least 2 were running at that point in time. What the overlay does is to startup (sccache --start-server) the sccache server process (forks to return back on the command line) which creates a file system socket, and then it queries the stats (sccache --show-stats). So some of the traces in the tarball are the server start (those with "Timed out waiting for server startup" are maybe the processes which fork to start the server and wait for it to be started), some are the stat-query, and some seem to be a successful start in another poudriere-builder (those with a successful /root/.ccache/sccache/5/4/<hashvalue> access look like from a successful start in another jail). Maybe there is also a --stop-server from poudriere somewhere. What I noticed (except that printing the new CAP stuff for non-CAP enabled processes by default is disturbing) is, that compat11 stuff is called (seems the rust ecosystem is not keeping up with our speed of development...). Not sure if it matters here that some compat stuff is called. Bye, Alexander. -- http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netchild@FreeBSD.org : PGP 0x8F31830F9F2772BF --=_6b985b0e8816937697b536ffdd21ad4a Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc; size=833 Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEER9UlYXp1PSd08nWXEg2wmwP42IYFAmYo2IQACgkQEg2wmwP4 2IbLtRAAnQduZ2y4IiSCZmbfWHVePBv4ZX8tAjezWkMig6tpM3qPmpqBBEDrpsNJ l2mNMNPEeUpx7ThU4pNqBvVkYUkHGrXpI/w/i4xHO5cj/tOHANkRlQnf9WZWEqKM muvYsCz5ACvduiqYBkP2Z/Y+BD2ZoYljt7ALVUenlqGvmdWyCtDPXVP7/4ZqCCAS gveuWUY6ClX9AKxDTh46HjY1Nx3fiM8iNtP4ZTciHrXt++wmTJoPGODMxJeP0/Vd DvRTXM/zqeDKzQAhOcaa9CFoVXGq81WGgTSLU8NBHCS5Z/jClp3eiCfTtunL2Cn3 TQpvQEg2Vqz/eLWEn16d1OuMtd4PcY7aQSkMnyoVRPcg2Vu3I6rO/G74AGYdpKER btfIgpovbtBMz/TOF4gzBni+TsPpmWGjzXppbyRjKB7/Yw4vT3fUvtjMBMe7Iflt CERjAM9DVwaAx22xLPYhxaSoKe/dW7efYRVzshXTHRfr6rpt6w97OxG82C8DMJcd RAnqt2Gz62tVpjKrvyYtzeOpiOPV4u3MYmttebSjWPiNR7MAGPQISeq1pF9a7zK3 IVFVGru0ptCOKtdAEk0MMa647WbRCX1V2+fpsFiuk7qxyIZ6HvqGhNxU7LmmMeI/ q0dqRQm3952Rm5SwllDouPMC77Gmnd5rJYrR/Mh3vQakIS2SbY0= =Ql0c -----END PGP SIGNATURE----- --=_6b985b0e8816937697b536ffdd21ad4a--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?d915e7b730f59a531d03b027215f81e0>