From owner-freebsd-current@freebsd.org Tue Oct 22 11:49:01 2019 Return-Path: Delivered-To: freebsd-current@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id B4E5D1583E7 for ; Tue, 22 Oct 2019 11:49:01 +0000 (UTC) (envelope-from agapon@gmail.com) Received: from mail-lj1-f195.google.com (mail-lj1-f195.google.com [209.85.208.195]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 46yBcD4v42z4fXF for ; Tue, 22 Oct 2019 11:49:00 +0000 (UTC) (envelope-from agapon@gmail.com) Received: by mail-lj1-f195.google.com with SMTP id l21so16857397lje.4 for ; Tue, 22 Oct 2019 04:49:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:openpgp:autocrypt :message-id:date:user-agent:mime-version:in-reply-to :content-language:content-transfer-encoding; bh=MkJo7JRuh7GKl6RmqsqOdnvCop3hnQAmrpzbadz3DlU=; b=IJIANqmlZTXZAMGl6Dd3XCOtHD3YYne1gy/Ifcs3K3F71ViSCbSZF3WtpjmyMqyX7+ BSPxDTMWMa9aUW0OVEKsGHocl8HzDzXxJrZbnNzxUYTgfly9sPfVyt/h6nOA0OYqf+05 DB74GdoW5aVUuuU+bDil+IKKYQU8icNDSJ6W0LVRP9HXYEeaiUMj1ji5jeGLnbcDdp/D 03+kj3Jp1/P0KZ/SgOr9/RxrpDjLfjLR69EeS3azBrz9lS5fxOWfj5mxq1jaTqCWtd5M NWIQAekuhr2F+j1LQBD3MCKANfD+IX//v7aWVIcLO8rLzFgYjgdULy6AfXCWGhzQwgHb YKMg== X-Gm-Message-State: APjAAAV13xOsKVnBGhitJ3tFKgt9DzAhJHOUQEZAevtBCM5zOL7H5YM9 JguB65vCTTg0VU1Ah7uT/onYYnU3fHM= X-Google-Smtp-Source: APXvYqwFvr1K4v/tSOpsVHwOS4mBFvLrvmjVk7Sp7hDVAoN+3ppDfPWXKrZNtkIRbMjV/KUSwZTFdQ== X-Received: by 2002:a2e:85c1:: with SMTP id h1mr18462259ljj.169.1571744938419; Tue, 22 Oct 2019 04:48:58 -0700 (PDT) Received: from [192.168.0.88] (east.meadow.volia.net. [93.72.151.96]) by smtp.googlemail.com with ESMTPSA id g3sm7691843ljj.59.2019.10.22.04.48.57 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 22 Oct 2019 04:48:57 -0700 (PDT) Subject: Re: thread on sleepqueue does not wake up after timeout To: Konstantin Belousov Cc: FreeBSD Current References: <20191022104434.GM73312@kib.kiev.ua> From: Andriy Gapon Openpgp: preference=signencrypt Autocrypt: addr=avg@FreeBSD.org; prefer-encrypt=mutual; keydata= mQINBFm4LIgBEADNB/3lT7f15UKeQ52xCFQx/GqHkSxEdVyLFZTmY3KyNPQGBtyvVyBfprJ7 mAeXZWfhat6cKNRAGZcL5EmewdQuUfQfBdYmKjbw3a9GFDsDNuhDA2QwFt8BmkiVMRYyvI7l N0eVzszWCUgdc3qqM6qqcgBaqsVmJluwpvwp4ZBXmch5BgDDDb1MPO8AZ2QZfIQmplkj8Y6Z AiNMknkmgaekIINSJX8IzRzKD5WwMsin70psE8dpL/iBsA2cpJGzWMObVTtCxeDKlBCNqM1i gTXta1ukdUT7JgLEFZk9ceYQQMJJtUwzWu1UHfZn0Fs29HTqawfWPSZVbulbrnu5q55R4PlQ /xURkWQUTyDpqUvb4JK371zhepXiXDwrrpnyyZABm3SFLkk2bHlheeKU6Yql4pcmSVym1AS4 dV8y0oHAfdlSCF6tpOPf2+K9nW1CFA8b/tw4oJBTtfZ1kxXOMdyZU5fiG7xb1qDgpQKgHUX8 7Rd2T1UVLVeuhYlXNw2F+a2ucY+cMoqz3LtpksUiBppJhw099gEXehcN2JbUZ2TueJdt1FdS ztnZmsHUXLxrRBtGwqnFL7GSd6snpGIKuuL305iaOGODbb9c7ne1JqBbkw1wh8ci6vvwGlzx rexzimRaBzJxlkjNfMx8WpCvYebGMydNoeEtkWldtjTNVsUAtQARAQABtB5BbmRyaXkgR2Fw b24gPGF2Z0BGcmVlQlNELm9yZz6JAlQEEwEIAD4WIQS+LEO7ngQnXA4Bjr538m7TUc1yjwUC WbgsiAIbIwUJBaOagAULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgAAKCRB38m7TUc1yj+JAEACV l9AK/nOWAt/9cufV2fRj0hdOqB1aCshtSrwHk/exXsDa4/FkmegxXQGY+3GWX3deIyesbVRL rYdtdK0dqJyT1SBqXK1h3/at9rxr9GQA6KWOxTjUFURsU7ok/6SIlm8uLRPNKO+yq0GDjgaO LzN+xykuBA0FlhQAXJnpZLcVfPJdWv7sSHGedL5ln8P8rxR+XnmsA5TUaaPcbhTB+mG+iKFj GghASDSfGqLWFPBlX/fpXikBDZ1gvOr8nyMY9nXhgfXpq3B6QCRYKPy58ChrZ5weeJZ29b7/ QdEO8NFNWHjSD9meiLdWQaqo9Y7uUxN3wySc/YUZxtS0bhAd8zJdNPsJYG8sXgKjeBQMVGuT eCAJFEYJqbwWvIXMfVWop4+O4xB+z2YE3jAbG/9tB/GSnQdVSj3G8MS80iLS58frnt+RSEw/ psahrfh0dh6SFHttE049xYiC+cM8J27Aaf0i9RflyITq57NuJm+AHJoU9SQUkIF0nc6lfA+o JRiyRlHZHKoRQkIg4aiKaZSWjQYRl5Txl0IZUP1dSWMX4s3XTMurC/pnja45dge/4ESOtJ9R 8XuIWg45Oq6MeIWdjKddGhRj3OohsltKgkEU3eLKYtB6qRTQypHHUawCXz88uYt5e3w4V16H lCpSTZV/EVHnNe45FVBlvK7k7HFfDDkryLkCDQRZuCyIARAAlq0slcsVboY/+IUJdcbEiJRW be9HKVz4SUchq0z9MZPX/0dcnvz/gkyYA+OuM78dNS7Mbby5dTvOqfpLJfCuhaNYOhlE0wY+ 1T6Tf1f4c/uA3U/YiadukQ3+6TJuYGAdRZD5EqYFIkreARTVWg87N9g0fT9BEqLw9lJtEGDY EWUE7L++B8o4uu3LQFEYxcrb4K/WKmgtmFcm77s0IKDrfcX4doV92QTIpLiRxcOmCC/OCYuO jB1oaaqXQzZrCutXRK0L5XN1Y1PYjIrEzHMIXmCDlLYnpFkK+itlXwlE2ZQxkfMruCWdQXye syl2fynAe8hvp7Mms9qU2r2K9EcJiR5N1t1C2/kTKNUhcRv7Yd/vwusK7BqJbhlng5ZgRx0m WxdntU/JLEntz3QBsBsWM9Y9wf2V4tLv6/DuDBta781RsCB/UrU2zNuOEkSixlUiHxw1dccI 6CVlaWkkJBxmHX22GdDFrcjvwMNIbbyfQLuBq6IOh8nvu9vuItup7qemDG3Ms6TVwA7BD3j+ 3fGprtyW8Fd/RR2bW2+LWkMrqHffAr6Y6V3h5kd2G9Q8ZWpEJk+LG6Mk3fhZhmCnHhDu6CwN MeUvxXDVO+fqc3JjFm5OxhmfVeJKrbCEUJyM8ESWLoNHLqjywdZga4Q7P12g8DUQ1mRxYg/L HgZY3zfKOqcAEQEAAYkCPAQYAQgAJhYhBL4sQ7ueBCdcDgGOvnfybtNRzXKPBQJZuCyIAhsM BQkFo5qAAAoJEHfybtNRzXKPBVwQAKfFy9P7N3OsLDMB56A4Kf+ZT+d5cIx0Yiaf4n6w7m3i ImHHHk9FIetI4Xe54a2IXh4Bq5UkAGY0667eIs+Z1Ea6I2i27Sdo7DxGwq09Qnm/Y65ADvXs 3aBvokCcm7FsM1wky395m8xUos1681oV5oxgqeRI8/76qy0hD9WR65UW+HQgZRIcIjSel9vR XDaD2HLGPTTGr7u4v00UeTMs6qvPsa2PJagogrKY8RXdFtXvweQFz78NbXhluwix2Tb9ETPk LIpDrtzV73CaE2aqBG/KrboXT2C67BgFtnk7T7Y7iKq4/XvEdDWscz2wws91BOXuMMd4c/c4 OmGW9m3RBLufFrOag1q5yUS9QbFfyqL6dftJP3Zq/xe+mr7sbWbhPVCQFrH3r26mpmy841ym dwQnNcsbIGiBASBSKksOvIDYKa2Wy8htPmWFTEOPRpFXdGQ27awcjjnB42nngyCK5ukZDHi6 w0qK5DNQQCkiweevCIC6wc3p67jl1EMFY5+z+zdTPb3h7LeVnGqW0qBQl99vVFgzLxchKcl0 R/paSFgwqXCZhAKMuUHncJuynDOP7z5LirUeFI8qsBAJi1rXpQoLJTVcW72swZ42IdPiboqx NbTMiNOiE36GqMcTPfKylCbF45JNX4nF9ElM0E+Y8gi4cizJYBRr2FBJgay0b9Cp Message-ID: <3a67f9a9-31cf-5814-4a68-8bdd6063b21e@FreeBSD.org> Date: Tue, 22 Oct 2019 14:48:56 +0300 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: <20191022104434.GM73312@kib.kiev.ua> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 46yBcD4v42z4fXF X-Spamd-Bar: --- Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none; spf=pass (mx1.freebsd.org: domain of agapon@gmail.com designates 209.85.208.195 as permitted sender) smtp.mailfrom=agapon@gmail.com X-Spamd-Result: default: False [-3.11 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:209.85.128.0/17]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; FORGED_SENDER(0.30)[avg@FreeBSD.org,agapon@gmail.com]; FREEMAIL_TO(0.00)[gmail.com]; RECEIVED_SPAMHAUS_PBL(0.00)[96.151.72.93.khpj7ygk5idzvmvt5x4ziurxhy.zen.dq.spamhaus.net : 127.0.0.10]; MIME_TRACE(0.00)[0:+]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:209.85.128.0/17, country:US]; FROM_NEQ_ENVFROM(0.00)[avg@FreeBSD.org,agapon@gmail.com]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; MID_RHS_MATCH_FROM(0.00)[]; FROM_HAS_DN(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-current@freebsd.org]; DMARC_NA(0.00)[FreeBSD.org]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[195.208.85.209.list.dnswl.org : 127.0.5.0]; IP_SCORE(-1.11)[ip: (-0.21), ipnet: 209.85.128.0/17(-3.23), asn: 15169(-2.07), country: US(-0.05)]; RWL_MAILSPIKE_POSSIBLE(0.00)[195.208.85.209.rep.mailspike.net : 127.0.0.17]; RCVD_TLS_ALL(0.00)[] X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Oct 2019 11:49:01 -0000 On 22/10/2019 13:44, Konstantin Belousov wrote: > On Tue, Oct 22, 2019 at 01:08:59PM +0300, Andriy Gapon wrote: >> >> We observe a problem that happens very rarely (about once a month across many >> test machines). The problem is that a thread remain in sleepq_timedwait() even >> after its timeout expires. The thread's td_slpcallout looks like the callout >> has fired. But the thread's state looks like it was never notified. >> E.g.: >> (kgdb) p td->td_slpcallout >> $1 = {c_links = {le = {le_next = 0xfffff800108e6470, le_prev = >> 0xfffffe0000be6ea8}, sle = {sle_next = 0xfffff800108e6470}, tqe = {tqe_next = >> 0xfffff800108e6470, tqe_prev = 0xfffffe0000be6ea8}}, c_time = 160957479343159, >> c_precision = 268435450, c_arg = 0xfffff80184602000, c_func = >> 0xffffffff807481d0 , c_lock = 0x0, c_flags = 2, c_iflags = 272, >> c_cpu = 6, c_exec_time = 160957506517070} [*] >> (kgdb) p/x td->td_flags >> $5 = 0x80000004 > What is the bit 31 in your flags ? FreeBSD does not use the bit. It's TDF_NOSWAP, a local addition. We use it to prohibit full process swapout (I guess that means kernel stacks). >> (kgdb) p td->td_sqqueue >> $8 = 0 >> (kgdb) p td->td_sleepqueue >> $9 = (struct sleepqueue *) 0x0 >> (kgdb) p td->td_wchan >> $10 = (void *) 0xfffff802b990df38 >> >> >> Has anyone seen anything like this problem? > Yes, but it was very long time ago. See r303426. Yeah, we are based off r329000 plus a bunch of merges for various fixes. One thing I forgot to mention is that it seems to happen only on VMware guests, but maybe it's only because we have many more virtual test boxes than we have physical ones. One thing I suspected was that binuptime() could somehow jump backwards... >> Any advice on how to diagnose it? >> >> Thanks! >> >> P.S. >> c_exec_time is our addition, we set this field right before firing a callback >> and we reset it to zero when a callout is (re-)scheduled. -- Andriy Gapon