Date: Mon, 24 Feb 2014 18:02:19 +0000 (GMT)
From: Robert Watson
To: Warner Losh
Cc: "freebsd-mips@freebsd.org"
Subject: Re: [RFC] Enable use of UserLocal Register (ULRI) if detected (patches)

On Wed, 19 Feb 2014, Warner Losh wrote:

>> I would note, BTW, that the current use of TLS in malloc()/free() and
>> today's MIPS exception handler for TLS implementation do introduce a very
>> measurable overhead.  I'm left wondering if there is something we can do
>> for unthreaded processes to avoid taking kernel traps on every memory
>> allocation and free for MIPSes without ULRI.  (Note that that problem is
>> present before Stacey's patch: the reason we added ULRI support is that our
>> hardware does support ULRI, and we can therefore avoid that nasty overhead
>> ...)  I understand there's work on a new MIPS ABI that specifies a TLS
>> register not requiring a trap to read on non-ULRI hardware, but I'm not
>> sure how far that is from being available.  Certainly it will require
>> compiler/OS/etc work before it becomes useful to us.
>
> One could easily have a global, static TLS value that gets set at startup,
> and cleared when the first thread is forked.  The gettls calls then become
> something akin to
>
> 	if (global_tls) return global_tls; else return _get_tls();
>
> without changes to the ABI at all...

Our measurements suggest that instruction emulation here adds significant
overhead, because the cost is paid on every malloc()/free() call in
userspace.  However, our platform is CPU-poor relative to memory speed
because it is an FPGA-based research processor, so the emulation cost might
be a less significant factor on conventional silicon.  It might be
interesting for someone developing on a more conventional system to run a
quick, informal experiment and see whether it makes a difference.

Robert
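
For readers following the thread, the cached fast path Warner describes could be sketched roughly as below. This is only an illustration under stated assumptions, not code from the patches under discussion: `tls_cache_init()`, `tls_cache_disable()`, `slow_get_tls()` and `slow_path_calls` are hypothetical names, and `slow_get_tls()` merely stands in for the `_get_tls()` slow path, which on non-ULRI MIPS would be the trapping read that the kernel emulates.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the slow path (_get_tls() in the mail).
 * On non-ULRI hardware the real version would trap into the kernel;
 * here it only counts how often it is reached.
 */
static void *slow_tls_base;
static unsigned long slow_path_calls;

static void *
slow_get_tls(void)
{
	slow_path_calls++;
	return (slow_tls_base);
}

/* Cached TLS pointer; non-NULL only while the process is single-threaded. */
static void *global_tls;

/* Assumed to be called once at process startup. */
void
tls_cache_init(void *base)
{
	slow_tls_base = base;
	global_tls = base;
}

/* Assumed to be called just before the first additional thread is created. */
void
tls_cache_disable(void)
{
	global_tls = NULL;
}

/* Fast path: return the cached value if set, otherwise take the slow path. */
void *
get_tls(void)
{
	if (global_tls != NULL)
		return (global_tls);
	return (slow_get_tls());
}
```

While the process stays single-threaded, every `get_tls()` call returns the cached pointer without reaching the slow path; once `tls_cache_disable()` has run, each call falls through to it again. Whether this is a net win on a given CPU is exactly what the informal experiment suggested above would need to show.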