From owner-freebsd-fs@FreeBSD.ORG Tue May 31 16:48:27 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BAAC2106564A; Tue, 31 May 2011 16:48:27 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 5F0488FC08; Tue, 31 May 2011 16:48:27 +0000 (UTC) Received: by yxl31 with SMTP id 31so2743280yxl.13 for ; Tue, 31 May 2011 09:48:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=B2Wc9BffubrUM4MQFd0HEdOa1KFPzOZZJXR+YqgmzGo=; b=l93dipYSrSKqeNcpG2+SYR+OBI7uITfY/O2H9Onhv0rc2pODHjRve1WYnme7CvHPl7 faH2dR/EChd9HDI7/PyKlMrEZBrwiPGc7+a3bRah2qt1ZnrWbGEss6JaLTm4QjYj8ycy Act5BSJjC/jAXwkjDxqhZoTPP7mcRKqwpoqoA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=cddq9fr1V9taFJMNmhcNdihL7sUNsqWyEQPI4cqQdqx0zBwEVVBV8QjhiNk4UyHmST 698zYCTYqn0/Uhe0hftuEBuq2Bh5iFSR2sSII35Ai3kdUBTxncMgLJlDbBP5spZcf4mJ a4ugoqt9r0JTQ+rSDyWUdLPhZb676qAqdD504= MIME-Version: 1.0 Received: by 10.236.138.161 with SMTP id a21mr2248741yhj.49.1306860506591; Tue, 31 May 2011 09:48:26 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.236.208.70 with HTTP; Tue, 31 May 2011 09:48:26 -0700 (PDT) In-Reply-To: <4DE50811.5060606@FreeBSD.org> References: <0EFD28CD-F2E9-4AE2-B927-1D327EC99DB9@bitgravity.com> <4DE50811.5060606@FreeBSD.org> Date: Tue, 31 May 2011 09:48:26 -0700 X-Google-Sender-Auth: 4NHHWxA5PfZLQuw5wMsJflfr69w Message-ID: From: Artem Belevich To: Andriy Gapon Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: arc_reclaim_thread running 100%, 8.1-RELEASE, LBOLT related X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 31 May 2011 16:48:27 -0000 On Tue, May 31, 2011 at 8:24 AM, Andriy Gapon wrote: >>> =A0 =A0 =A0 =A0 65 =A0 =A0 =A0 =A0 nsec =3D (hrtime_t)ts.tv_sec * NANOS= EC + ts.tv_nsec; >> >> Yup. This would indeed overflow in ~106.75 days. > > Have you referred to the LBOLT above? > gethrtime() should have several hundred years before overflowing. hrtime_t is 64-bit. NANOSEC=3D1000000000. When it's time to use LBOLT, we further multiply number of seconds by HZ: >> 41 #define LBOLT ((gethrtime() * hz) / NANOSEC) In the end we want 64-bit scalar to hold number of seconds times 10e12. 0x7fffffffffffffff/1000000000000 =3D 9223372 # number of seconds before signed overflow 9223372/(24*60*60) --> 106 # .. or about 106 days >> The side effect is that it limits bolt resolution to hz units. With >> HZ=3D100, that will be 10ms. Whether it's good enough or too coarse I > > Nope, I think you did your your math wrong here. > As shown above it limits the resolution to hz ticks, i.e. hz * 1/hz secon= ds :) Point taken. > >> have no idea. Perhaps we can compromise and update lbolt in >> microseconds. That should give us few hundred years until the >> overflow. > > Well, we can either use the ticks variable, since we are not switching to= tickless > mode yet. =A0But we would need to make it 64-bit to avoid early overflows= . > Or, perhaps, to be somewhat future-friendly we could do approximately wha= t > OpenSolaris [upstream :-)] code does: > > gethrtime() / (NANOSEC / hz) > > Or, given that NANOSEC is constant and hz is invariant, we could apply ex= tended > invariant division by multiplication approach to get precise and fast res= ult > without overflowing. =A0But likely that's an overkill here. =A0Though we = definitely > should pre-calculate, store and re-use (NANOSEC / hz) value just like Ope= nSolaris > does it. This should work. > >>> clock_t will still need the typedef'ed to 64bit to still address the l2= arc usage of LBOLT. > > Hm, I see that OpenSolaris defines clock_t to long, while we use int32_t. > So, I think that it means two things: > - i386 OpenSolaris (if it exists) should be affected by the problem as we= ll > - we should not use our native clock_t definition for ported OpenSolaris = code > > Maybe we should fix our clock_t to be something wider at least for 64-bit > platforms. =A0But I am not prepared to discuss this. > > To summarize: > 1. We must fix ddi_get_lbolt*() to avoid unnecessary overflow in multipli= cation Agreed. > 2. We could fix 31-bit clock_t overflow by using Solaris-compatible defin= ition of > it (long), but that still won't help on i386. =A0Maybe we should bring up= this issue > to the attention of upstream ZFS developers. =A0Universally using ddi_get= _lbolt64() > seems like a safer bet. FYI, we've already changed clock_t for opensolaris code to int64_t in r218169 regardless of $MACHINE. --Artem