From: Mateusz Guzik
Date: Tue, 9 Jan 2018 01:31:19 +0100
Subject: Re: witness_lock_list_get: witness exhausted
To: Michael Jung
Cc: John Baldwin, FreeBSD Current, owner-freebsd-current@freebsd.org

On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung wrote:

> On 2018-01-08 13:39, John Baldwin wrote:
>
>> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
>>
>>> Hi!
>>>
>>> I've recently up'd my processor count on our poudriere box and have
>>> started noticing the error "witness_lock_list_get: witness exhausted"
>>> on the console.  The kernel *DOES NOT* crash but I thought the report
>>> may be useful to someone.
>>>
>>> $ uname -a
>>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 19 18:41:20 EST 2017
>>> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
>>>
>>> The machine is pretty busy running four poudriere build instances.
>>>
>>> last pid: 76584;  load averages: 115.07, 115.96, 98.30  up 6+07:32:59  14:44:03
>>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
>>> CPU: 59.0% user,  0.0% nice, 40.7% system,  0.1% interrupt,  0.1% idle
>>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
>>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
>>>      25G Compressed, 32G Uncompressed, 1.24:1 Ratio
>>>
>>> Let me know what additional information I might supply.
>>>
>>
>> This just means that WITNESS stopped working because it ran out of
>> pre-allocated objects.  In particular the objects used to track how
>> many locks are held by how many threads:
>>
>> /*
>>  * XXX: This is somewhat bogus, as we assume here that at most 2048 threads
>>  * will hold LOCK_NCHILDREN locks.  We handle failure ok, and we should
>>  * probably be safe for the most part, but it's still a SWAG.
>>  */
>> #define LOCK_NCHILDREN 5
>> #define LOCK_CHILDCOUNT 2048
>>
>> Probably the '2048' (max number of concurrent threads) needs to scale with
>> MAXCPU.  2048 threads is probably a bit low on big x86 boxes.
>>
>
> Thank you for your explanation.  We are expanding our ESXi cluster and even
> though with standard edition I can only assign 64 vCPUs to a guest and as
> much RAM as I want, I do like to help with edge cases if I can make them
> occur, pushing boundaries as I can towards additional improvements in
> FreeBSD.
>

Can you apply this and re-run the test?

https://people.freebsd.org/~mjg/witness.diff

It bumps the counters to be "high enough" but also starts tracking usage.

If you get the message again, bump the values even higher.

Once you get a complete poudriere run which did not result in the problem,
do:

$ sysctl debug.witness.list_used debug.witness.list_max_used

to dump the actual usage.

-- 
Mateusz Guzik
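
For readers unfamiliar with the pattern discussed in this thread, here is a
minimal userland sketch of a fixed, pre-allocated pool that reports
exhaustion and tracks both current and peak usage.  It is not the contents
of witness.diff and not the kernel's code: POOL_SIZE, struct pool, and the
pool_* functions are hypothetical names chosen for illustration.  Only the
idea of "used" and "max used" counters comes from the sysctl names above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool size, kept tiny so exhaustion is easy to trigger. */
#define POOL_SIZE 4

struct entry {
	struct entry *next;		/* free-list linkage */
};

struct pool {
	struct entry slots[POOL_SIZE];	/* storage reserved up front */
	struct entry *free;		/* head of the free list */
	unsigned used;			/* entries currently handed out */
	unsigned max_used;		/* high-water mark of 'used' */
	int exhausted;			/* set once a request could not be met */
};

/* Thread every slot onto the free list and reset the counters. */
static void
pool_init(struct pool *p)
{
	p->free = NULL;
	for (size_t i = 0; i < POOL_SIZE; i++) {
		p->slots[i].next = p->free;
		p->free = &p->slots[i];
	}
	p->used = 0;
	p->max_used = 0;
	p->exhausted = 0;
}

/* Hand out one entry, or return NULL and record exhaustion. */
static struct entry *
pool_get(struct pool *p)
{
	struct entry *e = p->free;

	if (e == NULL) {
		p->exhausted = 1;
		return (NULL);
	}
	p->free = e->next;
	e->next = NULL;
	if (++p->used > p->max_used)
		p->max_used = p->used;
	return (e);
}

/* Return an entry to the pool; the high-water mark is left untouched. */
static void
pool_put(struct pool *p, struct entry *e)
{
	assert(p->used > 0);
	e->next = p->free;
	p->free = e;
	p->used--;
}
```

After a full workload, a peak counter like max_used shows how large the
pool actually needed to be, which is more useful for sizing than only
knowing that the limit was hit.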