From: Steve Kargl
Reply-To: sgk@troutmask.apl.washington.edu
To: Mateusz Guzik
Cc: Mark Johnston, freebsd-current@freebsd.org
Date: Thu, 22 Sep 2022 13:40:08 -0700
Subject: Re: A panic a day

On Thu, Sep 22, 2022 at 09:07:08PM +0200, Mateusz Guzik wrote:
> On 9/22/22, Steve Kargl wrote:
> > On Thu, Sep 22, 2022 at 03:00:53PM -0400, Mark Johnston wrote:
> >> On Thu, Sep 22, 2022 at 11:31:40AM -0700, Steve Kargl wrote:
> >> > All,
> >> >
> >> > I updated my kernel/world/all ports on Sept 19 2022.
> >> > Since then, I have had daily panics and hard lock-ups
> >> > (no panic, keyboard, mouse, network, ...). The one
> >> > panic I did witness sent text scrolling off the screen.
> >> > There is no dump, or at least, I haven't figured out
> >> > a way to get a dump.
> >> >
> >> > Using ports/graphics/tesseract and then hand editing
> >> > the OCR result, the last visible portion is
> >> >
> >
> > (panic messages removed).
> >
> >> It looks like you use the 4BSD scheduler? I think there's a bug in
> >> kick_other_cpu() in that it doesn't make sure that the remote CPU's
> >> curthread lock is held when modifying thread state. Because 4BSD has a
> >> global scheduler lock, this is often true in practice, but doesn't have
> >> to be.
> >
> > Yes, I use 4BSD. ULE has very poor performance for HPC type work with
> > OpenMPI.
>
> Is there an easy way to set it up for testing purposes?

I reported this years ago. One instance is here
https://lists.freebsd.org/pipermail/freebsd-hackers/2008-October/026375.html
and I've tested ULE a few times since.

An HPC program, compiled with OpenMPI, can spawn multiple images. The
gist of the problem is that under ULE, if one gets into an over-subscribed
situation (e.g., N+1 images and only N CPUs), then ULE's CPU affinity
will place two images on one CPU. Those two images ping-pong, taking
turns on that CPU, while the other N-1 images run happily. An image that
completes its task then has to wait for the ping-pong pair to finish
before getting its next quantum of work. Under 4BSD, the N+1 images
simply share the N CPUs, each getting a CPU slice.

Note, you don't need an OpenMPI program to get this situation. Simply
use a numerically intensive code that takes 5 or so minutes to complete
and start N+1 jobs. You'll get 2 jobs competing for 1 CPU.

-- 
Steve