From owner-freebsd-java@FreeBSD.ORG Thu Feb 22 23:10:57 2007 Return-Path: X-Original-To: freebsd-java@FreeBSD.org Delivered-To: freebsd-java@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4432816A401 for ; Thu, 22 Feb 2007 23:10:57 +0000 (UTC) (envelope-from arnej@pvv.ntnu.no) Received: from decibel.pvv.ntnu.no (decibel.pvv.ntnu.no [129.241.210.179]) by mx1.freebsd.org (Postfix) with ESMTP id A89AB13C441 for ; Thu, 22 Feb 2007 23:10:56 +0000 (UTC) (envelope-from arnej@pvv.ntnu.no) Received: from arnej by decibel.pvv.ntnu.no with local (Exim 4.60) (envelope-from ) id 1HKN5T-0006WC-FF for freebsd-java@FreeBSD.org; Fri, 23 Feb 2007 00:10:55 +0100 Date: Fri, 23 Feb 2007 00:10:55 +0100 (CET) From: "Arne H. Juul" To: freebsd-java@FreeBSD.org Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Subject: patch: fix and re-enable curthread hash lookup X-BeenThere: freebsd-java@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Porting Java to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 22 Feb 2007 23:10:57 -0000 I've analyzed the currently disabled code that implements a faster method to find the current (Java) thread object by getting hold of the stack pointer and doing a lookup in a hash table. This used to fail on thread exit sometimes because the invalidation wasn't done properly; I've also changed some of the parameters for the hash code and upped the size of the hash table so it should be more optimal. Finally I've added a "near hit" feature that should make the lookup faster when a thread is crossing back and forth over a stack page boundary; earlier this would always trigger the slow path, but now it compares the current stack pointer with the low and high stack boundaries and gets a hit if the hash table entry still points at the right thread object. This patch is still experimental, so if people can take a look at it and tell me about any problems they can spot that would be much appreciated. - Arne H. J. diff -rc jdk-1_5_0-scsl.b4/hotspot/src/os/bsd/vm/os_bsd.cpp jdk-1_5_0-scsl.b4.fct2/hotspot/src/os/bsd/vm/os_bsd.cpp *** jdk-1_5_0-scsl.b4/hotspot/src/os/bsd/vm/os_bsd.cpp Wed Jan 24 11:12:39 2007 --- jdk-1_5_0-scsl.b4.fct2/hotspot/src/os/bsd/vm/os_bsd.cpp Thu Feb 22 22:28:21 2007 *************** *** 138,152 **** // thread value in thread local storage. #endif // Store the new value before updating the cache to prevent a race // between get_thread_via_cache_slowly() and this store operation. os::thread_local_storage_at_put(ThreadLocalStorage::thread_index(), thread); ! // Update thread cache with new thread if setting on thread create, ! // or NO_CACHED_THREAD (zeroed) thread if resetting thread on exit. ! uintptr_t raw = pd_raw_thread_id(); ! int ix = pd_cache_index(raw); ! _get_thread_cache[ix] = thread == NULL ? NO_CACHED_THREAD : thread; } void ThreadLocalStorage::pd_init() { --- 138,176 ---- // thread value in thread local storage. #endif + uintptr_t raw = pd_raw_thread_id(); + int ix = pd_cache_index(raw); + + if (thread != NULL) { + // first make sure that nobody gets a cache collision pointing to + // this thread by updating its raw id + thread->_self_raw_id = raw; + // Store the new value before updating the cache to prevent a race // between get_thread_via_cache_slowly() and this store operation. os::thread_local_storage_at_put(ThreadLocalStorage::thread_index(), thread); ! // Update thread cache with new thread if setting on thread create ! _get_thread_cache[ix] = thread; ! } else { ! // deleting current thread, must get old value first ! Thread* was_thread = Thread::current(); ! ! // first make sure that nobody gets a cache collision pointing to ! // the old thread data (soon to be free'd) by updating the raw id ! was_thread->_self_raw_id = raw; ! ! // again, update the "real" value first: ! os::thread_local_storage_at_put(ThreadLocalStorage::thread_index(), thread); ! ! // then invalidate all cache elements that pointed to this thread ! // by setting them to NO_CACHED_THREAD (zeroed thread) ! for (int ix = 0; ix < _pd_cache_size; ++ix) { ! if (_get_thread_cache[ix] == was_thread) { ! _get_thread_cache[ix] = NO_CACHED_THREAD; ! } ! } ! } } void ThreadLocalStorage::pd_init() { *************** *** 1190,1196 **** // XXXBSD: hmm... really do not need? void os::free_thread_local_storage(int index) { // %%% don't think we need anything here ! // if ( pthread_key_delete((pthread_key_t) tk) ) // fatal("os::free_thread_local_storage: pthread_key_delete failed"); } --- 1214,1220 ---- // XXXBSD: hmm... really do not need? void os::free_thread_local_storage(int index) { // %%% don't think we need anything here ! // if ( pthread_key_delete((pthread_key_t) index) ) // fatal("os::free_thread_local_storage: pthread_key_delete failed"); } diff -rc jdk-1_5_0-scsl.b4/hotspot/src/os/bsd/vm/thread_bsd.inline.hpp jdk-1_5_0-scsl.b4.fct2/hotspot/src/os/bsd/vm/thread_bsd.inline.hpp *** jdk-1_5_0-scsl.b4/hotspot/src/os/bsd/vm/thread_bsd.inline.hpp Wed Jan 24 11:12:39 2007 --- jdk-1_5_0-scsl.b4.fct2/hotspot/src/os/bsd/vm/thread_bsd.inline.hpp Thu Feb 22 22:28:21 2007 *************** *** 19,36 **** uintptr_t raw = pd_raw_thread_id(); int ix = pd_cache_index(raw); ! // XXXBSD: disable fast case. there is a race condition where the ! // fast case returns a different thread from the slow case and has ! // been seen on both OpenBSD and FreeBSD. #if 1 - return ThreadLocalStorage::get_thread_via_cache_slowly(raw, ix); - #else Thread *Candidate = ThreadLocalStorage::_get_thread_cache[ix]; if (Candidate->_self_raw_id == raw) { ! // hit return Candidate; - } else { - return ThreadLocalStorage::get_thread_via_cache_slowly(raw, ix); } #endif } --- 19,49 ---- uintptr_t raw = pd_raw_thread_id(); int ix = pd_cache_index(raw); ! // XXXBSD: fast case. ! // there was a race condition where the fast case returned a different ! // thread from the slow case and has been seen on both OpenBSD and ! // FreeBSD, but I think it's fixed, so enable it for now: #if 1 Thread *Candidate = ThreadLocalStorage::_get_thread_cache[ix]; + if (Candidate->_self_raw_id == raw) { ! // direct hit return Candidate; } + + address stacktop = Candidate->_stack_base; + address stackbot = Candidate->_stack_base - Candidate->_stack_size; + + address sp = pd_sp_address(); + + // is this still the right thread? Check if current stack pointer + // is within the thread's stack, if ok update it with current raw id. + if (stacktop > sp && stackbot <= sp ) { + Candidate->_self_raw_id = raw; + // indirect hit + return Candidate; + } + ix = pd_cache_index(raw); #endif + return ThreadLocalStorage::get_thread_via_cache_slowly(raw, ix); } diff -rc jdk-1_5_0-scsl.b4/hotspot/src/os_cpu/bsd_amd64/vm/threadLS_bsd_amd64.hpp jdk-1_5_0-scsl.b4.fct2/hotspot/src/os_cpu/bsd_amd64/vm/threadLS_bsd_amd64.hpp *** jdk-1_5_0-scsl.b4/hotspot/src/os_cpu/bsd_amd64/vm/threadLS_bsd_amd64.hpp Wed Jan 24 11:12:39 2007 --- jdk-1_5_0-scsl.b4.fct2/hotspot/src/os_cpu/bsd_amd64/vm/threadLS_bsd_amd64.hpp Thu Feb 22 22:28:21 2007 *************** *** 9,15 **** // Processor dependent parts of ThreadLocalStorage private: ! static Thread* _get_thread_cache[]; // index by [(raw_id>>9)^(raw_id>>20) % _pd_cache_size] static Thread* get_thread_via_cache_slowly(uintptr_t raw_id, int index); NOT_PRODUCT(static int _tcacheHit;) --- 9,15 ---- // Processor dependent parts of ThreadLocalStorage private: ! static Thread* _get_thread_cache[]; // index by [(raw_id^(raw_id>>10)) % _pd_cache_size] static Thread* get_thread_via_cache_slowly(uintptr_t raw_id, int index); NOT_PRODUCT(static int _tcacheHit;) *************** *** 20,27 **** static void print_statistics() PRODUCT_RETURN; enum Constants { ! _pd_cache_size = 128*2, // projected typical # of threads * 2 ! _pd_min_page_size = 4*K, _pd_typical_stack_size = 512*K }; --- 20,27 ---- static void print_statistics() PRODUCT_RETURN; enum Constants { ! // projected typical # of threads * 2 * typical active stack pages ! _pd_cache_size = 128*2*4, _pd_min_page_size = 4*K, _pd_typical_stack_size = 512*K }; *************** *** 37,45 **** } static int pd_cache_index(uintptr_t sp_page) { ! return ((sp_page / 2) /* pages tend to come in pairs */ ! ^ (sp_page / (_pd_typical_stack_size/_pd_min_page_size))) ! % _pd_cache_size; } // Java Thread --- 37,44 ---- } static int pd_cache_index(uintptr_t sp_page) { ! // _pd_cache_size == 1<<10 ! return (sp_page ^ (sp_page >> 10)) % _pd_cache_size; } // Java Thread diff -rc jdk-1_5_0-scsl.b4/hotspot/src/os_cpu/bsd_i486/vm/threadLS_bsd_i486.hpp jdk-1_5_0-scsl.b4.fct2/hotspot/src/os_cpu/bsd_i486/vm/threadLS_bsd_i486.hpp *** jdk-1_5_0-scsl.b4/hotspot/src/os_cpu/bsd_i486/vm/threadLS_bsd_i486.hpp Wed Jan 24 11:12:39 2007 --- jdk-1_5_0-scsl.b4.fct2/hotspot/src/os_cpu/bsd_i486/vm/threadLS_bsd_i486.hpp Thu Feb 22 22:28:21 2007 *************** *** 9,15 **** // Processor dependent parts of ThreadLocalStorage private: ! static Thread* _get_thread_cache[]; // index by [(raw_id>>9)^(raw_id>>20) % _pd_cache_size] static Thread* get_thread_via_cache_slowly(uintptr_t raw_id, int index); NOT_PRODUCT(static int _tcacheHit;) --- 9,15 ---- // Processor dependent parts of ThreadLocalStorage private: ! static Thread* _get_thread_cache[]; // index by [(raw_id^(raw_id>>10)) % _pd_cache_size] static Thread* get_thread_via_cache_slowly(uintptr_t raw_id, int index); NOT_PRODUCT(static int _tcacheHit;) *************** *** 20,45 **** static void print_statistics() PRODUCT_RETURN; enum Constants { ! _pd_cache_size = 128*2, // projected typical # of threads * 2 ! _pd_min_page_size = 4*K, _pd_typical_stack_size = 512*K }; static address pd_sp_address() { ! int junk; ! return (address)&junk; } static uintptr_t pd_raw_thread_id() { ! address sp = pd_sp_address(); ! return (unsigned int)sp / _pd_min_page_size; } static int pd_cache_index(uintptr_t sp_page) { ! return ((sp_page / 2) /* pages tend to come in pairs */ ! ^ (sp_page / (_pd_typical_stack_size/_pd_min_page_size))) ! % _pd_cache_size; } // Java Thread --- 20,45 ---- static void print_statistics() PRODUCT_RETURN; enum Constants { ! // projected typical # of threads * typical active stack pages * 2 ! _pd_cache_size = 128*2*4, _pd_min_page_size = 4*K, _pd_typical_stack_size = 512*K }; static address pd_sp_address() { ! address sp; ! __asm__ volatile ("movl %%esp, %0" : "=r" (sp)); ! return sp; } static uintptr_t pd_raw_thread_id() { ! // _pd_min_page_size == 1 << 12 ! return ((unsigned int)pd_sp_address()) >> 12; } static int pd_cache_index(uintptr_t sp_page) { ! // _pd_cache_size == 1<<10 ! return (sp_page ^ (sp_page >> 10)) % _pd_cache_size; } // Java Thread diff -rc jdk-1_5_0-scsl.b4/hotspot/src/share/vm/runtime/thread.cpp jdk-1_5_0-scsl.b4.fct2/hotspot/src/share/vm/runtime/thread.cpp *** jdk-1_5_0-scsl.b4/hotspot/src/share/vm/runtime/thread.cpp Wed Jan 24 10:40:33 2007 --- jdk-1_5_0-scsl.b4.fct2/hotspot/src/share/vm/runtime/thread.cpp Thu Feb 22 23:33:51 2007 *************** *** 104,109 **** --- 104,114 ---- delete _SR_lock; + // to make sure nobody finds the deleted thread as a current thread + // in the cache (when comparing stack pointers) + set_stack_base(0); + set_stack_size(0); + // clear thread local storage if the Thread is deleting itself if (this == Thread::current()) { ThreadLocalStorage::set_thread(NULL);