From: Brian Buhrow
Date: Mon, 15 Mar 2021 14:24:40 -0700
To: freebsd-xen@freebsd.org
Subject: Re: Corruption in xenstored tdb file?
Hello. Following up on this thread, I'm still having a problem starting
domains under FreeBSD-12.1 as dom0, even after ensuring that
/var/lib/xenstoredb/tdb is deleted on startup. The first domU, an old
NetBSD-5.2 domain, starts just fine. The second one, a NetBSD-current as of
January 19 or so, also boots, but it has no network interface by the time it
reaches multiuser mode.
The errors on the back end look like:

xnb(xnb_probe:1129): Claiming device 1, xnb
xnb(xnb_attach:1273): Attaching to backend/vif/10/0
xnb(xnb_frontend_changed:1397): frontend_state=Initialising, xnb_state=InitWait
xnb10.0: link state changed to DOWN
xnb10.0: link state changed to UP
xnb10.0: link state changed to DOWN
xnb10.0: promiscuous mode enabled
xnb10.0: link state changed to UP
nd6_dad_timer: cancel DAD on xnb10.0 because of ND6_IFF_IFDISABLED.
xnb(xnb_frontend_changed:1397): frontend_state=Initialised, xnb_state=InitWait
xnb1: Error 2 Unable to retrieve ring information from frontend /local/domain/10/device/vif/0. Unable to connect.
xnb1: Fatal error. Transitioning to Closing State
xnb(xnb_frontend_changed:1397): frontend_state=Connected, xnb_state=Closing
xnb(xnb_connect_comms:793): rings connected!
xnb(xnb_frontend_changed:1397): frontend_state=Closed, xnb_state=Connected

Looking at the code, this appears to be failing somewhere in xs_gather() in
sys/dev/xen/xenstore/xenstore.c. At first I thought it was some kind of race
condition, because I could stop the domains that came up without a network
interface, wait a bit, restart them, and find they worked. Now, however,
having upgraded to 12.1-P13, I get this failure consistently, no matter how
often I destroy and create the domain.

Any ideas on what might be going on?

-thanks
-Brian