From owner-freebsd-arch@FreeBSD.ORG Sun Dec 28 03:07:33 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EA171997; Sun, 28 Dec 2014 03:07:32 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 95C8F642F0; Sun, 28 Dec 2014 03:07:32 +0000 (UTC) Received: from AlfredMacbookAir.local (c-76-21-10-192.hsd1.ca.comcast.net [76.21.10.192]) by elvis.mu.org (Postfix) with ESMTPSA id C81C4341F853; Sat, 27 Dec 2014 19:07:25 -0800 (PST) Message-ID: <549F742D.3000802@mu.org> Date: Sat, 27 Dec 2014 19:08:29 -0800 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: "Simon J. Gerraty" Subject: Re: Libxo bugs and fixes. References: <201408141640.s7EGe422096656@idle.juniper.net> <53ED57F2.5020808@mu.org> <20140815053604.9E40B580A2@chaos.jnpr.net> <53EDB0EF.6090902@mu.org> <20140815173830.93832580A2@chaos.jnpr.net> <53EEA74B.9070107@mu.org> <20140816045254.5F47E580A2@chaos.jnpr.net> <549BA675.9070107@freebsd.org> <29784.1419492726@chaos> <26E73A9E-6EA4-4698-885D-BD91906D32D2@mu.org> <18276.1419528679@chaos> <549C50FB.6050908@mu.org> <23687.1419546234@chaos> In-Reply-To: <23687.1419546234@chaos> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcel Moolenaar , Phil Shafer , John-Mark Gurney , Alfred Perlstein , "arch@freebsd.org" , Poul-Henning Kamp , freebsd-arch , Konstantin Belousov X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 28 Dec 2014 03:07:33 -0000 On 12/25/14 2:23 PM, Simon J. Gerraty wrote: > Alfred Perlstein wrote: >> I don't think my patch adds an xo_flush to xo_emit... ? > No, you added a test around it - I only looked at the RHS. > OK, so I've updated https://reviews.freebsd.org/D1379 to include code to handle this. Solaris and glibc have a private libc function to check if a stream is line buffered called int __flbf (FILE *stream) which returns true/false if the stream if line buffered or not. We can use this in libxo now. This should avoid the superfluous flushes. One other topic, I talked to some webdevs and they suggested that we have a "jsonLD" which stands for "json line delimited" that would be useful for periodic output. This would be halfway between a "pretty format" and a compact stream format. meaning that each line would be a record, example: Currently: /usr/src/contrib/libxo/libxo % netstat --libxo json,pretty 1 { "statistics": { "interface-statistics": [ { "received-packets": 2, "received-errors": 0, "dropped-packets": 0, "received-bytes": 132, "sent-packets": 4, "send-errors": 0, "sent-bytes": 1080, "collisions": 0 }, { "received-packets": 2, "received-errors": 0, "dropped-packets": 0, "received-bytes": 132, "sent-packets": 6, "send-errors": 0, "sent-bytes": 1964, "collisions": 0 }, { "received-packets": 1, "received-errors": 0, "dropped-packets": 0, "received-bytes": 66, "sent-packets": 4, "send-errors": 0, "sent-bytes": 1048, "collisions": 0 Later we might want to support something like this: /usr/src/contrib/libxo/libxo % netstat --libxo json,ld 1 { "statistics": { "interface-statistics": [ { "received-packets": 2, "received-errors": 0, "dropped-packets": 0, "received-bytes": 132, "sent-packets": 4, "send-errors": 0, "sent-bytes": 1080, "collisions": 0 }, { "received-packets": 2, "received-errors": 0, "dropped-packets": 0, "received-bytes": 132, "sent-packets": 6, "send-errors": 0, "sent-bytes": 1964, "collisions": 0 }, { "received-packets": 1, "received-errors": 0, "dropped-packets": 0, "received-bytes": 66, "sent-packets": 4, "send-errors": 0, "sent-bytes": 1048, "collisions": 0 } Let me know what you think. -Alfred From owner-freebsd-arch@FreeBSD.ORG Mon Dec 29 20:08:03 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 674CAD96; Mon, 29 Dec 2014 20:08:03 +0000 (UTC) Received: from mail.turbocat.net (mail.turbocat.net [IPv6:2a01:4f8:d16:4514::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id DAF113671; Mon, 29 Dec 2014 20:08:02 +0000 (UTC) Received: from laptop015.home.selasky.org (31.89-11-148.nextgentel.com [89.11.148.31]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id 80EC71FE023; Mon, 29 Dec 2014 21:07:59 +0100 (CET) Message-ID: <54A1B4C9.7030704@selasky.org> Date: Mon, 29 Dec 2014 21:08:41 +0100 From: Hans Petter Selasky User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: freebsd-arch@freebsd.org, John Baldwin Subject: [RFC] kern/kern_timeout.c rewrite in progress References: <54A1B38C.1000709@selasky.org> In-Reply-To: <54A1B38C.1000709@selasky.org> X-Forwarded-Message-Id: <54A1B38C.1000709@selasky.org> Content-Type: multipart/mixed; boundary="------------080805010304000705000900" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Dec 2014 20:08:03 -0000 This is a multi-part message in MIME format. --------------080805010304000705000900 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit -------- Forwarded Message -------- Subject: [RFC] kern/kern_timeout.c rewrite in progress Date: Mon, 29 Dec 2014 21:03:24 +0100 From: Hans Petter Selasky To: FreeBSD Current Hi, I recently came across a class of errors which lead me into investigating the "kern/kern_timeout.c" and its subsystem. From what I can see new features like the SMP awareness has been "added" instead of fully "integrated". When going into the cornercases I've uncovered that the internal callout statemachine can sometimes report wrong values via its callout_active() and callout_pending() bits to its clients, which in turn can make the clients behave badly. I further did an investigation on how the safety of callout migration between CPU's is maintained. When I looked into the code and found stuff like "volatile" and "while()" loops to figure which CPU a callout belongs I understood that such logic completely undermines the cleverness found in the turnstiles of mutexes and decided to go through all of the logic inside "kern_timeout.c". Also static code analysis is harder when we don't use the basic mutexes and condition variables available in the kernel. First of all we need to make some driving rules for everyone: 1) A new feature called direct callbacks which execute the timer callbacks from the fast interrupt handler was added. All these callbacks _must_ be associated with a regular spinlocks, to maintain a safe callout_drain(). Else they should only be executed on CPU0. 2) All Giant locked callbacks should only execute on CPU0 to avoid congestion. 3) Callbacks using read-only locks for its callback should also only execute on CPU0 to avoid multiple instances pending for completion on multiple CPU's, because read-only locks can be entered multiple times. From what I can see, there are currently no consumers of this feature in the kernel. Secondly, we need a way to drain callbacks asynchronously, for example like in the TCP stack. Currently the TCP shutdown sequence for a connection looks like this: ... void pcb_teardown(pcb) { lock(pcb); callout_stop(&pcb->c1); callout_stop(&pcb->c2); callout_stop(&pcb->c3); unlock(pcb); free(pcb); } I guess some of you see what is wrong: Use after free. ... With a new function I've added, scenarios like this can be solved more elegantly and without having to fear use after free situations! static void pcb_callback(void *pcb) { lock(pcb); do_free = (--(pcb->refcount) == 0); unlock(pcb); if (do_free == 0) return; destroy_mtx(pcb); free(pcb); } void pcb_teardown(pcb) { lock(pcb); pcb->refcount = 4; if (callout_drain_async(&pcb->c1, pcb_callback, pcb) == 0) pcb->refcount--; if (callout_drain_async(&pcb->c2, pcb_callback, pcb) == 0) pcb->refcount--; if (callout_drain_async(&pcb->c3, pcb_callback, pcb) == 0) pcb->refcount--; unlock(pcb); pcb_callback(pcb); } Please find attached a patch for 11-current as of today. Comments and feedback is appreciated. BTW: Happy New Year everyone! --HPS --------------080805010304000705000900 Content-Type: text/x-patch; name="kern_timeout_wip.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="kern_timeout_wip.diff" Index: sys/kern/kern_timeout.c =================================================================== --- sys/kern/kern_timeout.c (revision 276240) +++ sys/kern/kern_timeout.c (working copy) @@ -125,36 +125,56 @@ u_int callwheelsize, callwheelmask; /* - * The callout cpu exec entities represent informations necessary for - * describing the state of callouts currently running on the CPU and the ones - * necessary for migrating callouts to the new callout cpu. In particular, - * the first entry of the array cc_exec_entity holds informations for callout - * running in SWI thread context, while the second one holds informations - * for callout running directly from hardware interrupt context. - * The cached informations are very important for deferring migration when - * the migrating callout is already running. + * The callout CPU exec structure represent information necessary for + * describing the state of callouts currently running on the CPU and + * for handling deferred callout restarts. + * + * In particular, the first entry of the array cc_exec_entity holds + * information for callouts running from the SWI thread context, while + * the second one holds information for callouts running directly from + * the hardware interrupt context. */ struct cc_exec { - struct callout *cc_next; + /* + * The "cc_curr" points to the currently executing callout and + * is write protected by the "cc_lock" spinlock. If no + * callback is currently executing it is equal to "NULL". + */ struct callout *cc_curr; -#ifdef SMP - void (*ce_migration_func)(void *); - void *ce_migration_arg; - int ce_migration_cpu; - sbintime_t ce_migration_time; - sbintime_t ce_migration_prec; -#endif - bool cc_cancel; - bool cc_waiting; + /* + * The "cc_restart_args" structure holds the argument for a + * deferred callback restart and is write protected by the + * "cc_lock" spinlock. The structure is only valid if + * "cc_restart" is "true". If "cc_restart" is "false" the + * information in the "cc_restart_args" structure shall be + * ignored. + */ + struct callout_args cc_restart_args; + bool cc_restart; + /* + * The "cc_cancel" variable allows the currently pending + * callback to be atomically cancelled. This field is write + * protected by the "cc_lock" spinlock. + */ + bool cc_cancel; + /* + * The "cc_drain_fn" points to a function which shall be + * called with the argument stored in "cc_drain_arg" when an + * asynchronous drain is performed. This field is write + * protected by the "cc_lock" spinlock. + */ + callout_func_t *cc_drain_fn; + void *cc_drain_arg; }; /* - * There is one struct callout_cpu per cpu, holding all relevant + * There is one "struct callout_cpu" per CPU, holding all relevant * state for the callout processing thread on the individual CPU. */ struct callout_cpu { struct mtx_padalign cc_lock; struct cc_exec cc_exec_entity[2]; + struct callout *cc_exec_next_dir; struct callout *cc_callout; struct callout_list *cc_callwheel; struct callout_tailq cc_expireq; @@ -166,27 +186,23 @@ char cc_ktr_event_name[20]; }; -#define cc_exec_curr cc_exec_entity[0].cc_curr -#define cc_exec_next cc_exec_entity[0].cc_next -#define cc_exec_cancel cc_exec_entity[0].cc_cancel -#define cc_exec_waiting cc_exec_entity[0].cc_waiting -#define cc_exec_curr_dir cc_exec_entity[1].cc_curr -#define cc_exec_next_dir cc_exec_entity[1].cc_next -#define cc_exec_cancel_dir cc_exec_entity[1].cc_cancel -#define cc_exec_waiting_dir cc_exec_entity[1].cc_waiting +/* + * When "CALLOUT_RDONLY_C_CPU(c)" returns "true" it indicates the + * given callback doesn't have any locks associated with it or are + * using the Giant lock or a shared kind of lock. For these kind of + * callbacks the "c_cpu" field must remain constant, hence there is no + * lock to protect updates to this field. + * + * When "CALLOUT_RDONLY_C_CPU(c)" returns "false" it indicates that + * "c->c_lock" is protecting any changes to "c->c_cpu", for the + * specified callback. + */ +#define CALLOUT_RDONLY_C_CPU(c) \ + ((c)->c_lock == NULL || (c)->c_lock == &Giant.lock_object || \ + ((c)->c_flags & (CALLOUT_LOCAL_ALLOC | CALLOUT_RETURNUNLOCKED | \ + CALLOUT_SHAREDLOCK)) != 0) #ifdef SMP -#define cc_migration_func cc_exec_entity[0].ce_migration_func -#define cc_migration_arg cc_exec_entity[0].ce_migration_arg -#define cc_migration_cpu cc_exec_entity[0].ce_migration_cpu -#define cc_migration_time cc_exec_entity[0].ce_migration_time -#define cc_migration_prec cc_exec_entity[0].ce_migration_prec -#define cc_migration_func_dir cc_exec_entity[1].ce_migration_func -#define cc_migration_arg_dir cc_exec_entity[1].ce_migration_arg -#define cc_migration_cpu_dir cc_exec_entity[1].ce_migration_cpu -#define cc_migration_time_dir cc_exec_entity[1].ce_migration_time -#define cc_migration_prec_dir cc_exec_entity[1].ce_migration_prec - struct callout_cpu cc_cpu[MAXCPU]; #define CPUBLOCK MAXCPU #define CC_CPU(cpu) (&cc_cpu[(cpu)]) @@ -211,62 +227,11 @@ static MALLOC_DEFINE(M_CALLOUT, "callout", "Callout datastructures"); -/** - * Locked by cc_lock: - * cc_curr - If a callout is in progress, it is cc_curr. - * If cc_curr is non-NULL, threads waiting in - * callout_drain() will be woken up as soon as the - * relevant callout completes. - * cc_cancel - Changing to 1 with both callout_lock and cc_lock held - * guarantees that the current callout will not run. - * The softclock() function sets this to 0 before it - * drops callout_lock to acquire c_lock, and it calls - * the handler only if curr_cancelled is still 0 after - * cc_lock is successfully acquired. - * cc_waiting - If a thread is waiting in callout_drain(), then - * callout_wait is nonzero. Set only when - * cc_curr is non-NULL. - */ - /* - * Resets the execution entity tied to a specific callout cpu. + * Kernel low level callwheel initialization called from cpu0 during + * kernel startup: */ static void -cc_cce_cleanup(struct callout_cpu *cc, int direct) -{ - - cc->cc_exec_entity[direct].cc_curr = NULL; - cc->cc_exec_entity[direct].cc_next = NULL; - cc->cc_exec_entity[direct].cc_cancel = false; - cc->cc_exec_entity[direct].cc_waiting = false; -#ifdef SMP - cc->cc_exec_entity[direct].ce_migration_cpu = CPUBLOCK; - cc->cc_exec_entity[direct].ce_migration_time = 0; - cc->cc_exec_entity[direct].ce_migration_prec = 0; - cc->cc_exec_entity[direct].ce_migration_func = NULL; - cc->cc_exec_entity[direct].ce_migration_arg = NULL; -#endif -} - -/* - * Checks if migration is requested by a specific callout cpu. - */ -static int -cc_cce_migrating(struct callout_cpu *cc, int direct) -{ - -#ifdef SMP - return (cc->cc_exec_entity[direct].ce_migration_cpu != CPUBLOCK); -#else - return (0); -#endif -} - -/* - * Kernel low level callwheel initialization - * called on cpu0 during kernel startup. - */ -static void callout_callwheel_init(void *dummy) { struct callout_cpu *cc; @@ -324,8 +289,6 @@ LIST_INIT(&cc->cc_callwheel[i]); TAILQ_INIT(&cc->cc_expireq); cc->cc_firstevent = SBT_MAX; - for (i = 0; i < 2; i++) - cc_cce_cleanup(cc, i); snprintf(cc->cc_ktr_event_name, sizeof(cc->cc_ktr_event_name), "callwheel cpu %d", cpu); if (cc->cc_callout == NULL) /* Only cpu0 handles timeout(9) */ @@ -338,37 +301,7 @@ } } -#ifdef SMP /* - * Switches the cpu tied to a specific callout. - * The function expects a locked incoming callout cpu and returns with - * locked outcoming callout cpu. - */ -static struct callout_cpu * -callout_cpu_switch(struct callout *c, struct callout_cpu *cc, int new_cpu) -{ - struct callout_cpu *new_cc; - - MPASS(c != NULL && cc != NULL); - CC_LOCK_ASSERT(cc); - - /* - * Avoid interrupts and preemption firing after the callout cpu - * is blocked in order to avoid deadlocks as the new thread - * may be willing to acquire the callout cpu lock. - */ - c->c_cpu = CPUBLOCK; - spinlock_enter(); - CC_UNLOCK(cc); - new_cc = CC_CPU(new_cpu); - CC_LOCK(new_cc); - spinlock_exit(); - c->c_cpu = new_cpu; - return (new_cc); -} -#endif - -/* * Start standard softclock thread. */ static void @@ -562,49 +495,54 @@ callout_lock(struct callout *c) { struct callout_cpu *cc; - int cpu; - - for (;;) { - cpu = c->c_cpu; -#ifdef SMP - if (cpu == CPUBLOCK) { - while (c->c_cpu == CPUBLOCK) - cpu_spinwait(); - continue; - } -#endif - cc = CC_CPU(cpu); - CC_LOCK(cc); - if (cpu == c->c_cpu) - break; - CC_UNLOCK(cc); - } + cc = CC_CPU(c->c_cpu); + CC_LOCK(cc); return (cc); } -static void -callout_cc_add(struct callout *c, struct callout_cpu *cc, - sbintime_t sbt, sbintime_t precision, void (*func)(void *), - void *arg, int cpu, int flags) +static struct callout_cpu * +callout_cc_add_locked(struct callout *c, struct callout_cpu *cc, + struct callout_args *coa) { +#ifndef NO_EVENTTIMERS + sbintime_t sbt; +#endif int bucket; CC_LOCK_ASSERT(cc); - if (sbt < cc->cc_lastscan) - sbt = cc->cc_lastscan; - c->c_arg = arg; - c->c_flags |= (CALLOUT_ACTIVE | CALLOUT_PENDING); - if (flags & C_DIRECT_EXEC) - c->c_flags |= CALLOUT_DIRECT; - c->c_flags &= ~CALLOUT_PROCESSED; - c->c_func = func; - c->c_time = sbt; - c->c_precision = precision; + + /* update flags before swapping locks, if any */ + c->c_flags &= ~(CALLOUT_PROCESSED | CALLOUT_DIRECT | CALLOUT_DEFRESTART); + if (coa->flags & C_DIRECT_EXEC) + c->c_flags |= (CALLOUT_ACTIVE | CALLOUT_PENDING | CALLOUT_DIRECT); + else + c->c_flags |= (CALLOUT_ACTIVE | CALLOUT_PENDING); + +#ifdef SMP + /* + * Check if we are changing the CPU on which the callback + * should be executed: + */ + if (coa->cpu != c->c_cpu) { + CC_UNLOCK(cc); + c->c_cpu = coa->cpu; + cc = callout_lock(c); + } +#endif + if (coa->time < cc->cc_lastscan) + coa->time = cc->cc_lastscan; + c->c_arg = coa->arg; + c->c_func = coa->func; + c->c_time = coa->time; + c->c_precision = coa->precision; + bucket = callout_get_bucket(c->c_time); CTR3(KTR_CALLOUT, "precision set for %p: %d.%08x", c, (int)(c->c_precision >> 32), (u_int)(c->c_precision & 0xffffffff)); LIST_INSERT_HEAD(&cc->cc_callwheel[bucket], c, c_links.le); + + /* Ensure we are first to be scanned, if called via a callback */ if (cc->cc_bucket == bucket) cc->cc_exec_next_dir = c; #ifndef NO_EVENTTIMERS @@ -617,9 +555,10 @@ sbt = c->c_time + c->c_precision; if (sbt < cc->cc_firstevent) { cc->cc_firstevent = sbt; - cpu_new_callout(cpu, sbt, c->c_time); + cpu_new_callout(coa->cpu, sbt, c->c_time); } #endif + return (cc); } static void @@ -626,8 +565,6 @@ callout_cc_del(struct callout *c, struct callout_cpu *cc) { - if ((c->c_flags & CALLOUT_LOCAL_ALLOC) == 0) - return; c->c_func = NULL; SLIST_INSERT_HEAD(&cc->cc_callfree, c, c_links.sle); } @@ -640,19 +577,13 @@ int direct) { struct rm_priotracker tracker; - void (*c_func)(void *); + struct callout_cpu *cc_orig = cc; + callout_func_t *c_func; void *c_arg; struct lock_class *class; struct lock_object *c_lock; uintptr_t lock_status; int c_flags; -#ifdef SMP - struct callout_cpu *new_cc; - void (*new_func)(void *); - void *new_arg; - int flags, new_cpu; - sbintime_t new_prec, new_time; -#endif #if defined(DIAGNOSTIC) || defined(CALLOUT_PROFILING) sbintime_t sbt1, sbt2; struct timespec ts2; @@ -675,25 +606,36 @@ c_func = c->c_func; c_arg = c->c_arg; c_flags = c->c_flags; - if (c->c_flags & CALLOUT_LOCAL_ALLOC) + if (c_flags & CALLOUT_LOCAL_ALLOC) c->c_flags = CALLOUT_LOCAL_ALLOC; else c->c_flags &= ~CALLOUT_PENDING; + + /* reset our local state */ cc->cc_exec_entity[direct].cc_curr = c; cc->cc_exec_entity[direct].cc_cancel = false; + cc->cc_exec_entity[direct].cc_restart = false; + cc->cc_exec_entity[direct].cc_drain_fn = NULL; + cc->cc_exec_entity[direct].cc_drain_arg = NULL; CC_UNLOCK(cc); + + /* unlocked region for switching locks, if any */ + if (c_lock != NULL) { class->lc_lock(c_lock, lock_status); /* - * The callout may have been cancelled - * while we switched locks. + * Check if the callout may have been cancelled while + * we were switching locks: */ if (cc->cc_exec_entity[direct].cc_cancel) { - class->lc_unlock(c_lock); - goto skip; + if ((c_flags & CALLOUT_RETURNUNLOCKED) != 0) + class->lc_unlock(c_lock); + goto skip_callback; } - /* The callout cannot be stopped now. */ + + /* The callout cannot be stopped now! */ cc->cc_exec_entity[direct].cc_cancel = true; + if (c_lock == &Giant.lock_object) { #ifdef CALLOUT_PROFILING (*gcalls)++; @@ -740,85 +682,49 @@ #endif KTR_STATE0(KTR_SCHED, "callout", cc->cc_ktr_event_name, "idle"); CTR1(KTR_CALLOUT, "callout %p finished", c); - if ((c_flags & CALLOUT_RETURNUNLOCKED) == 0) - class->lc_unlock(c_lock); -skip: + +skip_callback: CC_LOCK(cc); KASSERT(cc->cc_exec_entity[direct].cc_curr == c, ("mishandled cc_curr")); cc->cc_exec_entity[direct].cc_curr = NULL; - if (cc->cc_exec_entity[direct].cc_waiting) { - /* - * There is someone waiting for the - * callout to complete. - * If the callout was scheduled for - * migration just cancel it. - */ - if (cc_cce_migrating(cc, direct)) { - cc_cce_cleanup(cc, direct); - /* - * It should be assert here that the callout is not - * destroyed but that is not easy. - */ - c->c_flags &= ~CALLOUT_DFRMIGRATION; - } - cc->cc_exec_entity[direct].cc_waiting = false; - CC_UNLOCK(cc); - wakeup(&cc->cc_exec_entity[direct].cc_waiting); - CC_LOCK(cc); - } else if (cc_cce_migrating(cc, direct)) { - KASSERT((c_flags & CALLOUT_LOCAL_ALLOC) == 0, - ("Migrating legacy callout %p", c)); -#ifdef SMP - /* - * If the callout was scheduled for - * migration just perform it now. - */ - new_cpu = cc->cc_exec_entity[direct].ce_migration_cpu; - new_time = cc->cc_exec_entity[direct].ce_migration_time; - new_prec = cc->cc_exec_entity[direct].ce_migration_prec; - new_func = cc->cc_exec_entity[direct].ce_migration_func; - new_arg = cc->cc_exec_entity[direct].ce_migration_arg; - cc_cce_cleanup(cc, direct); + /* + * At this point the callback structure might have been freed, + * so we need to check the copied "c->c_flags" to figure out + * if the callback is preallocated or not! + */ + if (c_flags & CALLOUT_LOCAL_ALLOC) { + /* return callout back to freelist */ + callout_cc_del(c, cc); + } else if (cc->cc_exec_entity[direct].cc_restart) { + /* [re-]schedule callout, if any */ + cc = callout_cc_add_locked(c, cc, + &cc->cc_exec_entity[direct].cc_restart_args); + } - /* - * It should be assert here that the callout is not destroyed - * but that is not easy. - * - * As first thing, handle deferred callout stops. - */ - if ((c->c_flags & CALLOUT_DFRMIGRATION) == 0) { - CTR3(KTR_CALLOUT, - "deferred cancelled %p func %p arg %p", - c, new_func, new_arg); - callout_cc_del(c, cc); - return; - } - c->c_flags &= ~CALLOUT_DFRMIGRATION; + /* + * Unlock the callback lock at the end, if any, so that the + * "c_cpu" field in the callback structure gets write + * protected by this lock. + */ + if ((c_flags & CALLOUT_RETURNUNLOCKED) == 0) + class->lc_unlock(c_lock); - new_cc = callout_cpu_switch(c, cc, new_cpu); - flags = (direct) ? C_DIRECT_EXEC : 0; - callout_cc_add(c, new_cc, new_time, new_prec, new_func, - new_arg, new_cpu, flags); - CC_UNLOCK(new_cc); - CC_LOCK(cc); -#else - panic("migration should not happen"); -#endif - } /* - * If the current callout is locally allocated (from - * timeout(9)) then put it on the freelist. - * - * Note: we need to check the cached copy of c_flags because - * if it was not local, then it's not safe to deref the - * callout pointer. + * Check if there is anything which needs draining. There is + * no need to lock any locks here, hence the calling thread is + * the only one writing these fields! */ - KASSERT((c_flags & CALLOUT_LOCAL_ALLOC) == 0 || - c->c_flags == CALLOUT_LOCAL_ALLOC, - ("corrupted callout")); - if (c_flags & CALLOUT_LOCAL_ALLOC) - callout_cc_del(c, cc); + if (cc->cc_exec_entity[direct].cc_drain_fn != NULL) { + CC_UNLOCK(cc); + cc->cc_exec_entity[direct].cc_drain_fn( + cc->cc_exec_entity[direct].cc_drain_arg); + CC_LOCK(cc_orig); + } else if (cc_orig != cc) { + /* switch back locks again, must return locked */ + CC_UNLOCK(cc); + CC_LOCK(cc_orig); + } } /* @@ -899,10 +805,11 @@ /* XXX Attempt to malloc first */ panic("timeout table full"); SLIST_REMOVE_HEAD(&cc->cc_callfree, c_links.sle); - callout_reset(new, to_ticks, ftn, arg); handle.callout = new; CC_UNLOCK(cc); + callout_reset(new, to_ticks, ftn, arg); + return (handle); } @@ -910,6 +817,7 @@ untimeout(timeout_t *ftn, void *arg, struct callout_handle handle) { struct callout_cpu *cc; + bool match; /* * Check for a handle that was initialized @@ -920,9 +828,11 @@ return; cc = callout_lock(handle.callout); - if (handle.callout->c_func == ftn && handle.callout->c_arg == arg) + match = (handle.callout->c_func == ftn && handle.callout->c_arg == arg); + CC_UNLOCK(cc); + + if (match) callout_stop(handle.callout); - CC_UNLOCK(cc); } void @@ -931,6 +841,143 @@ handle->callout = NULL; } +#if defined(DIAGNOSTIC) +static void +callout_assert_locks(struct callout *c) +{ + struct lock_class *class; + /* + * Some old subsystems don't hold Giant while stopping and + * starting callouts. Discard any asserts if the Giant lock is + * specified. + */ + if (c->c_lock == NULL || c->c_lock == &Giant.lock_object) + return; + class = LOCK_CLASS(c->c_lock); + class->lc_assert(c->c_lock, LA_XLOCKED); +} +#endif + +static int +callout_restart_async(struct callout *c, struct callout_args *coa, + callout_func_t *drain_fn, void *drain_arg) +{ + struct callout_cpu *cc; + int cancelled; + int direct; + +#if defined(DIAGNOSTIC) + callout_assert_locks(c); +#endif + cc = callout_lock(c); + +#ifdef SMP + /* don't change CPU, if any */ + if (coa != NULL && CALLOUT_RDONLY_C_CPU(c)) + coa->cpu = c->c_cpu; +#endif + /* Figure out if the callout is direct or not */ + direct = ((c->c_flags & CALLOUT_DIRECT) != 0); + + /* + * Check if the callback is currently scheduled for + * completion: + */ + if (cc->cc_exec_entity[direct].cc_curr == c) { + /* + * Try to prevent the callback from running by setting + * the "cc_cancel" variable to "true". Also check if + * the callout was previously subject to a deferred + * callout restart: + */ + if (cc->cc_exec_entity[direct].cc_cancel == false || + (c->c_flags & CALLOUT_DEFRESTART) != 0) { + cc->cc_exec_entity[direct].cc_cancel = true; + cancelled = 1; + } else { + cancelled = 0; + } + + /* + * Prevent callback restart if "callout_drain_xxx()" + * is being called or we are stopping the callout or + * the callback was preallocated by us: + */ + if (cc->cc_exec_entity[direct].cc_drain_fn != NULL || + coa == NULL || (c->c_flags & CALLOUT_LOCAL_ALLOC) != 0) { + CTR4(KTR_CALLOUT, "%s %p func %p arg %p", + cancelled ? "cancelled and draining" : "draining", + c, c->c_func, c->c_arg); + + /* set drain function, if any */ + if (drain_fn != NULL) { + cc->cc_exec_entity[direct].cc_drain_fn = drain_fn; + cc->cc_exec_entity[direct].cc_drain_arg = drain_arg; + cancelled |= 2; /* XXX define the value */ + } + /* clear old flags, if any */ + c->c_flags &= ~(CALLOUT_ACTIVE | CALLOUT_PENDING | + CALLOUT_DEFRESTART | CALLOUT_PROCESSED); + + /* clear restart flag, if any */ + cc->cc_exec_entity[direct].cc_restart = false; + } else { + CTR4(KTR_CALLOUT, "%s %p func %p arg %p", + cancelled ? "cancelled and restarting" : "restarting", + c, c->c_func, c->c_arg); + + /* get us back into the game */ + c->c_flags |= (CALLOUT_ACTIVE | CALLOUT_PENDING | + CALLOUT_DEFRESTART); + c->c_flags &= ~CALLOUT_PROCESSED; + + /* enable deferred restart */ + cc->cc_exec_entity[direct].cc_restart = true; + + /* store arguments for the deferred restart, if any */ + cc->cc_exec_entity[direct].cc_restart_args = *coa; + } + } else { + /* stop callout */ + if (c->c_flags & CALLOUT_PENDING) { + /* + * The callback has not yet been executed, and + * we simply just need to unlink it: + */ + if ((c->c_flags & CALLOUT_PROCESSED) == 0) { + if (cc->cc_exec_next_dir == c) + cc->cc_exec_next_dir = LIST_NEXT(c, c_links.le); + LIST_REMOVE(c, c_links.le); + } else { + TAILQ_REMOVE(&cc->cc_expireq, c, c_links.tqe); + } + cancelled = 1; + } else { + cancelled = 0; + } + + /* [re-]schedule callout, if any */ + if (coa != NULL) { + cc = callout_cc_add_locked(c, cc, coa); + } else { + /* clear old flags, if any */ + c->c_flags &= ~(CALLOUT_ACTIVE | CALLOUT_PENDING | + CALLOUT_DEFRESTART | CALLOUT_PROCESSED); + + /* return callback to pre-allocated list, if any */ + if ((c->c_flags & CALLOUT_LOCAL_ALLOC) && cancelled != 0) { + callout_cc_del(c, cc); + } + } + + CTR4(KTR_CALLOUT, "%s %p func %p arg %p", + cancelled ? "rescheduled" : "scheduled", + c, c->c_func, c->c_arg); + } + CC_UNLOCK(cc); + return (cancelled); +} + /* * New interface; clients allocate their own callout structures. * @@ -949,25 +996,32 @@ */ int callout_reset_sbt_on(struct callout *c, sbintime_t sbt, sbintime_t precision, - void (*ftn)(void *), void *arg, int cpu, int flags) + callout_func_t *ftn, void *arg, int cpu, int flags) { - sbintime_t to_sbt, pr; - struct callout_cpu *cc; - int cancelled, direct; + struct callout_args coa; - cancelled = 0; - if (flags & C_ABSOLUTE) { - to_sbt = sbt; + /* store arguments for callout add function */ + coa.func = ftn; + coa.arg = arg; + coa.precision = precision; + coa.flags = flags; + coa.cpu = cpu; + + /* compute the rest of the arguments needed */ + if (coa.flags & C_ABSOLUTE) { + coa.time = sbt; } else { - if ((flags & C_HARDCLOCK) && (sbt < tick_sbt)) + sbintime_t pr; + + if ((coa.flags & C_HARDCLOCK) && (sbt < tick_sbt)) sbt = tick_sbt; - if ((flags & C_HARDCLOCK) || + if ((coa.flags & C_HARDCLOCK) || #ifdef NO_EVENTTIMERS sbt >= sbt_timethreshold) { - to_sbt = getsbinuptime(); + coa.time = getsbinuptime(); /* Add safety belt for the case of hz > 1000. */ - to_sbt += tc_tick_sbt - tick_sbt; + coa.time += tc_tick_sbt - tick_sbt; #else sbt >= sbt_tickthreshold) { /* @@ -977,101 +1031,29 @@ * active ones. */ #ifdef __LP64__ - to_sbt = DPCPU_GET(hardclocktime); + coa.time = DPCPU_GET(hardclocktime); #else spinlock_enter(); - to_sbt = DPCPU_GET(hardclocktime); + coa.time = DPCPU_GET(hardclocktime); spinlock_exit(); #endif #endif - if ((flags & C_HARDCLOCK) == 0) - to_sbt += tick_sbt; + if ((coa.flags & C_HARDCLOCK) == 0) + coa.time += tick_sbt; } else - to_sbt = sbinuptime(); - if (SBT_MAX - to_sbt < sbt) - to_sbt = SBT_MAX; + coa.time = sbinuptime(); + if (SBT_MAX - coa.time < sbt) + coa.time = SBT_MAX; else - to_sbt += sbt; - pr = ((C_PRELGET(flags) < 0) ? sbt >> tc_precexp : - sbt >> C_PRELGET(flags)); - if (pr > precision) - precision = pr; + coa.time += sbt; + pr = ((C_PRELGET(coa.flags) < 0) ? sbt >> tc_precexp : + sbt >> C_PRELGET(coa.flags)); + if (pr > coa.precision) + coa.precision = pr; } - /* - * Don't allow migration of pre-allocated callouts lest they - * become unbalanced. - */ - if (c->c_flags & CALLOUT_LOCAL_ALLOC) - cpu = c->c_cpu; - direct = (c->c_flags & CALLOUT_DIRECT) != 0; - KASSERT(!direct || c->c_lock == NULL, - ("%s: direct callout %p has lock", __func__, c)); - cc = callout_lock(c); - if (cc->cc_exec_entity[direct].cc_curr == c) { - /* - * We're being asked to reschedule a callout which is - * currently in progress. If there is a lock then we - * can cancel the callout if it has not really started. - */ - if (c->c_lock != NULL && !cc->cc_exec_entity[direct].cc_cancel) - cancelled = cc->cc_exec_entity[direct].cc_cancel = true; - if (cc->cc_exec_entity[direct].cc_waiting) { - /* - * Someone has called callout_drain to kill this - * callout. Don't reschedule. - */ - CTR4(KTR_CALLOUT, "%s %p func %p arg %p", - cancelled ? "cancelled" : "failed to cancel", - c, c->c_func, c->c_arg); - CC_UNLOCK(cc); - return (cancelled); - } - } - if (c->c_flags & CALLOUT_PENDING) { - if ((c->c_flags & CALLOUT_PROCESSED) == 0) { - if (cc->cc_exec_next_dir == c) - cc->cc_exec_next_dir = LIST_NEXT(c, c_links.le); - LIST_REMOVE(c, c_links.le); - } else - TAILQ_REMOVE(&cc->cc_expireq, c, c_links.tqe); - cancelled = 1; - c->c_flags &= ~(CALLOUT_ACTIVE | CALLOUT_PENDING); - } -#ifdef SMP - /* - * If the callout must migrate try to perform it immediately. - * If the callout is currently running, just defer the migration - * to a more appropriate moment. - */ - if (c->c_cpu != cpu) { - if (cc->cc_exec_entity[direct].cc_curr == c) { - cc->cc_exec_entity[direct].ce_migration_cpu = cpu; - cc->cc_exec_entity[direct].ce_migration_time - = to_sbt; - cc->cc_exec_entity[direct].ce_migration_prec - = precision; - cc->cc_exec_entity[direct].ce_migration_func = ftn; - cc->cc_exec_entity[direct].ce_migration_arg = arg; - c->c_flags |= CALLOUT_DFRMIGRATION; - CTR6(KTR_CALLOUT, - "migration of %p func %p arg %p in %d.%08x to %u deferred", - c, c->c_func, c->c_arg, (int)(to_sbt >> 32), - (u_int)(to_sbt & 0xffffffff), cpu); - CC_UNLOCK(cc); - return (cancelled); - } - cc = callout_cpu_switch(c, cc, cpu); - } -#endif - - callout_cc_add(c, cc, to_sbt, precision, ftn, arg, cpu, flags); - CTR6(KTR_CALLOUT, "%sscheduled %p func %p arg %p in %d.%08x", - cancelled ? "re" : "", c, c->c_func, c->c_arg, (int)(to_sbt >> 32), - (u_int)(to_sbt & 0xffffffff)); - CC_UNLOCK(cc); - - return (cancelled); + /* get callback started, if any */ + return (callout_restart_async(c, &coa, NULL, NULL)); } /* @@ -1090,175 +1072,74 @@ } int -_callout_stop_safe(struct callout *c, int safe) +callout_stop(struct callout *c) { - struct callout_cpu *cc, *old_cc; + /* get callback stopped, if any */ + return (callout_restart_async(c, NULL, NULL, NULL)); +} + +static void +callout_drain_function(void *arg) +{ + wakeup(arg); +} + +int +callout_drain_async(struct callout *c, callout_func_t *fn, void *arg) +{ + /* get callback stopped, if any */ + return (callout_restart_async(c, NULL, fn, arg) & 2); +} + +int +callout_drain(struct callout *c) +{ struct lock_class *class; - int direct, sq_locked, use_lock; + int cancelled; - /* - * Some old subsystems don't hold Giant while running a callout_stop(), - * so just discard this check for the moment. - */ - if (!safe && c->c_lock != NULL) { - if (c->c_lock == &Giant.lock_object) - use_lock = mtx_owned(&Giant); - else { - use_lock = 1; - class = LOCK_CLASS(c->c_lock); - class->lc_assert(c->c_lock, LA_XLOCKED); - } - } else - use_lock = 0; - direct = (c->c_flags & CALLOUT_DIRECT) != 0; - sq_locked = 0; - old_cc = NULL; -again: - cc = callout_lock(c); - - /* - * If the callout was migrating while the callout cpu lock was - * dropped, just drop the sleepqueue lock and check the states - * again. - */ - if (sq_locked != 0 && cc != old_cc) { -#ifdef SMP - CC_UNLOCK(cc); - sleepq_release(&old_cc->cc_exec_entity[direct].cc_waiting); - sq_locked = 0; - old_cc = NULL; - goto again; -#else - panic("migration should not happen"); -#endif + if (c->c_lock != NULL && c->c_lock != &Giant.lock_object) { + class = LOCK_CLASS(c->c_lock); + class->lc_lock(c->c_lock, 0); + } else { + class = NULL; } - /* - * If the callout isn't pending, it's not on the queue, so - * don't attempt to remove it from the queue. We can try to - * stop it by other means however. - */ - if (!(c->c_flags & CALLOUT_PENDING)) { - c->c_flags &= ~CALLOUT_ACTIVE; + /* at this point the "c->c_cpu" field is not changing */ + cancelled = callout_drain_async(c, &callout_drain_function, c); + + if (cancelled & 2) { + struct callout_cpu *cc; + int direct; + + CTR3(KTR_CALLOUT, "need to drain %p func %p arg %p", + c, c->c_func, c->c_arg); + + cc = callout_lock(c); + direct = ((c->c_flags & CALLOUT_DIRECT) != 0); + /* - * If it wasn't on the queue and it isn't the current - * callout, then we can't stop it, so just bail. + * We've gotten our callout CPU lock, it is safe to + * drop the initial lock: */ - if (cc->cc_exec_entity[direct].cc_curr != c) { - CTR3(KTR_CALLOUT, "failed to stop %p func %p arg %p", - c, c->c_func, c->c_arg); - CC_UNLOCK(cc); - if (sq_locked) - sleepq_release( - &cc->cc_exec_entity[direct].cc_waiting); - return (0); - } + if (class != NULL) + class->lc_unlock(c->c_lock); - if (safe) { - /* - * The current callout is running (or just - * about to run) and blocking is allowed, so - * just wait for the current invocation to - * finish. - */ - while (cc->cc_exec_entity[direct].cc_curr == c) { - /* - * Use direct calls to sleepqueue interface - * instead of cv/msleep in order to avoid - * a LOR between cc_lock and sleepqueue - * chain spinlocks. This piece of code - * emulates a msleep_spin() call actually. - * - * If we already have the sleepqueue chain - * locked, then we can safely block. If we - * don't already have it locked, however, - * we have to drop the cc_lock to lock - * it. This opens several races, so we - * restart at the beginning once we have - * both locks. If nothing has changed, then - * we will end up back here with sq_locked - * set. - */ - if (!sq_locked) { - CC_UNLOCK(cc); - sleepq_lock( - &cc->cc_exec_entity[direct].cc_waiting); - sq_locked = 1; - old_cc = cc; - goto again; - } + /* Wait for drain to complete */ - /* - * Migration could be cancelled here, but - * as long as it is still not sure when it - * will be packed up, just let softclock() - * take care of it. - */ - cc->cc_exec_entity[direct].cc_waiting = true; - DROP_GIANT(); - CC_UNLOCK(cc); - sleepq_add( - &cc->cc_exec_entity[direct].cc_waiting, - &cc->cc_lock.lock_object, "codrain", - SLEEPQ_SLEEP, 0); - sleepq_wait( - &cc->cc_exec_entity[direct].cc_waiting, - 0); - sq_locked = 0; - old_cc = NULL; + while (cc->cc_exec_entity[direct].cc_curr == c) + msleep_spin(c, (struct mtx *)&cc->cc_lock, "codrain", 0); - /* Reacquire locks previously released. */ - PICKUP_GIANT(); - CC_LOCK(cc); - } - } else if (use_lock && - !cc->cc_exec_entity[direct].cc_cancel) { - /* - * The current callout is waiting for its - * lock which we hold. Cancel the callout - * and return. After our caller drops the - * lock, the callout will be skipped in - * softclock(). - */ - cc->cc_exec_entity[direct].cc_cancel = true; - CTR3(KTR_CALLOUT, "cancelled %p func %p arg %p", - c, c->c_func, c->c_arg); - KASSERT(!cc_cce_migrating(cc, direct), - ("callout wrongly scheduled for migration")); - CC_UNLOCK(cc); - KASSERT(!sq_locked, ("sleepqueue chain locked")); - return (1); - } else if ((c->c_flags & CALLOUT_DFRMIGRATION) != 0) { - c->c_flags &= ~CALLOUT_DFRMIGRATION; - CTR3(KTR_CALLOUT, "postponing stop %p func %p arg %p", - c, c->c_func, c->c_arg); - CC_UNLOCK(cc); - return (1); - } - CTR3(KTR_CALLOUT, "failed to stop %p func %p arg %p", - c, c->c_func, c->c_arg); CC_UNLOCK(cc); - KASSERT(!sq_locked, ("sleepqueue chain still locked")); - return (0); + } else { + if (class != NULL) + class->lc_unlock(c->c_lock); } - if (sq_locked) - sleepq_release(&cc->cc_exec_entity[direct].cc_waiting); - c->c_flags &= ~(CALLOUT_ACTIVE | CALLOUT_PENDING); - CTR3(KTR_CALLOUT, "cancelled %p func %p arg %p", c, c->c_func, c->c_arg); - if ((c->c_flags & CALLOUT_PROCESSED) == 0) { - if (cc->cc_exec_next_dir == c) - cc->cc_exec_next_dir = LIST_NEXT(c, c_links.le); - LIST_REMOVE(c, c_links.le); - } else - TAILQ_REMOVE(&cc->cc_expireq, c, c_links.tqe); - callout_cc_del(c, cc); - CC_UNLOCK(cc); - return (1); + return (cancelled & 1); } void Index: sys/sys/_callout.h =================================================================== --- sys/sys/_callout.h (revision 276240) +++ sys/sys/_callout.h (working copy) @@ -46,6 +46,17 @@ SLIST_HEAD(callout_slist, callout); TAILQ_HEAD(callout_tailq, callout); +typedef void callout_func_t(void *); + +struct callout_args { + sbintime_t time; /* absolute time for the event */ + sbintime_t precision; /* delta allowed wrt opt */ + void *arg; /* function argument */ + callout_func_t *func; /* function to call */ + int flags; /* flags passed to callout_reset() */ + int cpu; /* CPU we're scheduled on */ +}; + struct callout { union { LIST_ENTRY(callout) le; @@ -52,13 +63,13 @@ SLIST_ENTRY(callout) sle; TAILQ_ENTRY(callout) tqe; } c_links; - sbintime_t c_time; /* ticks to the event */ + sbintime_t c_time; /* absolute time for the event */ sbintime_t c_precision; /* delta allowed wrt opt */ void *c_arg; /* function argument */ - void (*c_func)(void *); /* function to call */ - struct lock_object *c_lock; /* lock to handle */ + callout_func_t *c_func; /* function to call */ + struct lock_object *c_lock; /* callback lock */ int c_flags; /* state of this entry */ - volatile int c_cpu; /* CPU we're scheduled on */ + int c_cpu; /* CPU we're scheduled on */ }; #endif Index: sys/sys/callout.h =================================================================== --- sys/sys/callout.h (revision 276240) +++ sys/sys/callout.h (working copy) @@ -46,7 +46,7 @@ #define CALLOUT_MPSAFE 0x0008 /* callout handler is mp safe */ #define CALLOUT_RETURNUNLOCKED 0x0010 /* handler returns with mtx unlocked */ #define CALLOUT_SHAREDLOCK 0x0020 /* callout lock held in shared mode */ -#define CALLOUT_DFRMIGRATION 0x0040 /* callout in deferred migration mode */ +#define CALLOUT_DEFRESTART 0x0040 /* callout restart is deferred */ #define CALLOUT_PROCESSED 0x0080 /* callout in wheel or processing list? */ #define CALLOUT_DIRECT 0x0100 /* allow exec from hw int context */ @@ -65,7 +65,8 @@ #ifdef _KERNEL #define callout_active(c) ((c)->c_flags & CALLOUT_ACTIVE) #define callout_deactivate(c) ((c)->c_flags &= ~CALLOUT_ACTIVE) -#define callout_drain(c) _callout_stop_safe(c, 1) +int callout_drain(struct callout *); +int callout_drain_async(struct callout *, callout_func_t *, void *); void callout_init(struct callout *, int); void _callout_init_lock(struct callout *, struct lock_object *, int); #define callout_init_mtx(c, mtx, flags) \ @@ -79,7 +80,7 @@ NULL, (flags)) #define callout_pending(c) ((c)->c_flags & CALLOUT_PENDING) int callout_reset_sbt_on(struct callout *, sbintime_t, sbintime_t, - void (*)(void *), void *, int, int); + callout_func_t *, void *, int, int); #define callout_reset_sbt(c, sbt, pr, fn, arg, flags) \ callout_reset_sbt_on((c), (sbt), (pr), (fn), (arg), (c)->c_cpu, (flags)) #define callout_reset_sbt_curcpu(c, sbt, pr, fn, arg, flags) \ @@ -103,8 +104,7 @@ int callout_schedule_on(struct callout *, int, int); #define callout_schedule_curcpu(c, on_tick) \ callout_schedule_on((c), (on_tick), PCPU_GET(cpuid)) -#define callout_stop(c) _callout_stop_safe(c, 0) -int _callout_stop_safe(struct callout *, int); +int callout_stop(struct callout *); void callout_process(sbintime_t now); #endif --------------080805010304000705000900-- From owner-freebsd-arch@FreeBSD.ORG Tue Dec 30 11:19:53 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 73F8231B for ; Tue, 30 Dec 2014 11:19:53 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0018D11FC for ; Tue, 30 Dec 2014 11:19:52 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sBUBJgi8045655 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Tue, 30 Dec 2014 13:19:42 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sBUBJgi8045655 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sBUBJfcX045654 for arch@freebsd.org; Tue, 30 Dec 2014 13:19:41 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 30 Dec 2014 13:19:41 +0200 From: Konstantin Belousov To: arch@freebsd.org Subject: Disabling ptrace Message-ID: <20141230111941.GE42409@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Dec 2014 11:19:53 -0000 The question about a facility to disable introspection functionality (ptrace etc) for a process was asked several times. The latest query made me actually code the feature. Note that other systems, e.g. Linux and OSX, do have similar facilities. Patch is below, it provides two new procctl(2) requests. PROC_TRACE_ENABLE enables or disables tracing. It includes core dumping, ptrace, ktrace, debugging sysctls and hwpmc. PROC_TRACE_STATUS allows to get the tracing state. Most interesting question is how should disabling of trace behave with regard of fork and exec. IMO, the right model is to protect access to the _program_ address space, which translates to inheritance of the attribute for fork, and reenabling the tracing on exec. On the other hand, I understand that some users want to inherit the tracing disable on exec, so there are PROC_TRACE_SET_DISABLED and PROC_TRACE_SET_DISABLED_EXEC, the later makes disable to be kept after exec. Note that it is trivial for root on the host to circumvent the feature. diff --git a/lib/libc/sys/procctl.2 b/lib/libc/sys/procctl.2 index 649e0ad..4e7e5b8 100644 --- a/lib/libc/sys/procctl.2 +++ b/lib/libc/sys/procctl.2 @@ -29,7 +29,7 @@ .\" .\" $FreeBSD$ .\" -.Dd December 16, 2014 +.Dd December 29, 2014 .Dt PROCCTL 2 .Os .Sh NAME @@ -275,6 +275,51 @@ delivery failed, e.g. due to the permission problems. If no such process exist, the .Fa rk_fpid field is set to -1. +.It Dv PROC_TRACE_ENABLE +Enable or disable tracing of the specified process(es), according to the +value of the integer argument. +Tracing includes attachment to the process using +.Xr ptrace 2 +and +.Xr ktrace 2 , +debugging sysctls, +.Xr hwpmc 4 +and core dumping. +Possible values for the +.Fa data +argument are: +.Bl -tag -width "Dv PROC_TRACE_SET_DISABLED_EXEC" +.It Dv PROC_TRACE_SET_ENABLED +Enable the tracing, after it was disabled by +.Dv PROC_TRACE_SET_DISABLED . +Only allowed for self. +.It Dv PROC_TRACE_SET_DISABLED +Disable tracing for the specified process. +The tracing is re-enabled when the process changes the executing +program with +.Xr execve 2 +syscall. +A child inherits the trace settings from the parent. +.It Dv PROC_TRACE_SET_DISABLED_EXEC +Same as +.Dv PROC_TRACE_SET_DISABLED , +but setting persist for the process even after +.Xr execve 2 . +.It Dv PROC_TRACE_STATUS +Returns the current status of tracing for the specified process into +the integer variable pointed to by +.Fa data . +If tracing is disabled, +.Fa data +is set to -1. +If tracing is enabled, but no debugger is attached by +.Xr ptrace 2 +syscall, the +.Fa data +is set to 0. +If a debugger is attached, +.Fa data +is set to the pid of the debugger process. .El .Sh RETURN VALUES If an error occurs, a value of -1 is returned and @@ -343,9 +388,28 @@ The .Dv PROC_REAP_ACQUIRE request was issued by a process that had already acquired reaper status and has not yet released it. +.It Bq Er EBUSY +The +.Dv PROC_TRACE_ENABLE +request was issued for the process already traced. +.It Bq Er EPERM +The +.Dv PROC_TRACE_ENABLE +request to re-enable tracing of the process ( +.Dv PROC_TRACE_SET_ENABLED ) , +or to disable persistence of the +.Dv PROC_TRACE_SET_DISABLED +on +.Xr execve 2 +was issued for the non-current process. +.It Bq Er EINVAL +The value of integer parameter for the +.Dv PROC_TRACE_ENABLE +request is invalid. .El .Sh SEE ALSO .Xr kill 2 , +.Xr ktrace 2 , .Xr ptrace 2 , .Xr wait 2 , .Xr init 8 diff --git a/sys/compat/freebsd32/freebsd32_misc.c b/sys/compat/freebsd32/freebsd32_misc.c index 1457f57..f96e78e 100644 --- a/sys/compat/freebsd32/freebsd32_misc.c +++ b/sys/compat/freebsd32/freebsd32_misc.c @@ -2969,6 +2969,7 @@ freebsd32_procctl(struct thread *td, struct freebsd32_procctl_args *uap) switch (uap->com) { case PROC_SPROTECT: + case PROC_TRACE_ENABLE: error = copyin(PTRIN(uap->data), &flags, sizeof(flags)); if (error != 0) return (error); @@ -2997,6 +2998,9 @@ freebsd32_procctl(struct thread *td, struct freebsd32_procctl_args *uap) return (error); data = &x.rk; break; + case PROT_TRACE_STATUS: + data = &flags; + break; default: return (EINVAL); } @@ -3012,6 +3016,10 @@ freebsd32_procctl(struct thread *td, struct freebsd32_procctl_args *uap) if (error == 0) error = error1; break; + case PROC_TRACE_STATUS: + if (error == 0) + error = copyout(&flags, uap->data, sizeof(flags)); + break; } return (error); } diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c index 19c33b6..5b80f5c 100644 --- a/sys/kern/kern_exec.c +++ b/sys/kern/kern_exec.c @@ -634,6 +634,8 @@ interpret: * it that it now has its own resources back */ p->p_flag |= P_EXEC; + if ((p->p_flag2 & P2_NOTRACE_EXEC) == 0) + p->p_flag2 &= ~P2_NOTRACE; if (p->p_flag & P_PPWAIT) { p->p_flag &= ~(P_PPWAIT | P_PPTRACE); cv_broadcast(&p->p_pwait); diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c index f469db6..2c83422 100644 --- a/sys/kern/kern_fork.c +++ b/sys/kern/kern_fork.c @@ -494,7 +494,7 @@ do_fork(struct thread *td, int flags, struct proc *p2, struct thread *td2, * Increase reference counts on shared objects. */ p2->p_flag = P_INMEM; - p2->p_flag2 = 0; + p2->p_flag2 = p1->p_flag2 & (P2_NOTRACE | P2_NOTRACE_EXEC); p2->p_swtick = ticks; if (p1->p_flag & P_PROFIL) startprofclock(p2); diff --git a/sys/kern/kern_procctl.c b/sys/kern/kern_procctl.c index 9b0d14a..e265ed5 100644 --- a/sys/kern/kern_procctl.c +++ b/sys/kern/kern_procctl.c @@ -280,6 +280,62 @@ reap_kill(struct thread *td, struct proc *p, struct procctl_reaper_kill *rk) return (error); } +static int +trace_enable(struct thread *td, struct proc *p, int state) +{ + + PROC_LOCK_ASSERT(p, MA_OWNED); + + /* + * Ktrace changes p_traceflag from or to zero under the + * process lock, so the test does not need to acquire ktrace + * mutex. + */ + if ((p->p_flag & P_TRACED) != 0 || p->p_traceflag != 0) + return (EBUSY); + + switch (state) { + case PROC_TRACE_SET_ENABLED: + if (td->td_proc != p) + return (EPERM); + p->p_flag2 &= ~(P2_NOTRACE | P2_NOTRACE_EXEC); + break; + case PROC_TRACE_SET_DISABLED_EXEC: + p->p_flag2 |= P2_NOTRACE_EXEC | P2_NOTRACE; + break; + case PROC_TRACE_SET_DISABLED: + if ((p->p_flag2 & P2_NOTRACE_EXEC) != 0) { + KASSERT((p->p_flag2 & P2_NOTRACE) != 0, + ("dandling P2_NOTRACE_EXEC")); + if (td->td_proc != p) + return (EPERM); + p->p_flag2 &= ~P2_NOTRACE_EXEC; + } else { + p->p_flag2 |= P2_NOTRACE; + } + break; + default: + return (EINVAL); + } + return (0); +} + +static int +trace_status(struct thread *td, struct proc *p, int *data) +{ + + if ((p->p_flag2 & P2_NOTRACE) != 0) { + KASSERT((p->p_flag & P_TRACED) == 0, + ("%d traced but tracing disabled", p->p_pid)); + *data = -1; + } else if ((p->p_flag & P_TRACED) != 0) { + *data = p->p_pptr->p_pid; + } else { + *data = 0; + } + return (0); +} + #ifndef _SYS_SYSPROTO_H_ struct procctl_args { idtype_t idtype; @@ -302,6 +358,7 @@ sys_procctl(struct thread *td, struct procctl_args *uap) switch (uap->com) { case PROC_SPROTECT: + case PROC_TRACE_ENABLE: error = copyin(uap->data, &flags, sizeof(flags)); if (error != 0) return (error); @@ -328,6 +385,9 @@ sys_procctl(struct thread *td, struct procctl_args *uap) return (error); data = &x.rk; break; + case PROC_TRACE_STATUS: + data = &flags; + break; default: return (EINVAL); } @@ -342,6 +402,10 @@ sys_procctl(struct thread *td, struct procctl_args *uap) if (error == 0) error = error1; break; + case PROC_TRACE_STATUS: + if (error == 0) + error = copyout(&flags, uap->data, sizeof(flags)); + break; } return (error); } @@ -364,6 +428,10 @@ kern_procctl_single(struct thread *td, struct proc *p, int com, void *data) return (reap_getpids(td, p, data)); case PROC_REAP_KILL: return (reap_kill(td, p, data)); + case PROC_TRACE_ENABLE: + return (trace_enable(td, p, *(int *)data)); + case PROC_TRACE_STATUS: + return (trace_status(td, p, data)); default: return (EINVAL); } @@ -375,6 +443,7 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data) struct pgrp *pg; struct proc *p; int error, first_error, ok; + bool tree_locked; switch (com) { case PROC_REAP_ACQUIRE: @@ -382,6 +451,7 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data) case PROC_REAP_STATUS: case PROC_REAP_GETPIDS: case PROC_REAP_KILL: + case PROC_TRACE_STATUS: if (idtype != P_PID) return (EINVAL); } @@ -391,11 +461,17 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data) case PROC_REAP_STATUS: case PROC_REAP_GETPIDS: case PROC_REAP_KILL: + case PROC_TRACE_ENABLE: sx_slock(&proctree_lock); + tree_locked = true; break; case PROC_REAP_ACQUIRE: case PROC_REAP_RELEASE: sx_xlock(&proctree_lock); + tree_locked = true; + break; + case PROC_TRACE_STATUS: + tree_locked = false; break; default: return (EINVAL); @@ -456,6 +532,7 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data) error = EINVAL; break; } - sx_unlock(&proctree_lock); + if (tree_locked) + sx_unlock(&proctree_lock); return (error); } diff --git a/sys/kern/kern_prot.c b/sys/kern/kern_prot.c index 2a667b5..a5d9596 100644 --- a/sys/kern/kern_prot.c +++ b/sys/kern/kern_prot.c @@ -1714,6 +1714,10 @@ p_candebug(struct thread *td, struct proc *p) if ((p->p_flag & P_INEXEC) != 0) return (EBUSY); + /* Denied explicitely */ + if ((p->p_flag2 & P2_NOTRACE) != 0) + return (EPERM); + return (0); } diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c index a4f0f88..c4025ba 100644 --- a/sys/kern/kern_sig.c +++ b/sys/kern/kern_sig.c @@ -3243,7 +3243,8 @@ coredump(struct thread *td) MPASS((p->p_flag & P_HADTHREADS) == 0 || p->p_singlethread == td); _STOPEVENT(p, S_CORE, 0); - if (!do_coredump || (!sugid_coredump && (p->p_flag & P_SUGID) != 0)) { + if (!do_coredump || (!sugid_coredump && (p->p_flag & P_SUGID) != 0) || + (p->p_flag2 & P2_NOTRACE) != 0) { PROC_UNLOCK(p); return (EFAULT); } diff --git a/sys/sys/proc.h b/sys/sys/proc.h index d7a45e9..4b660d5 100644 --- a/sys/sys/proc.h +++ b/sys/sys/proc.h @@ -674,6 +674,8 @@ struct proc { /* These flags are kept in p_flag2. */ #define P2_INHERIT_PROTECTED 0x00000001 /* New children get P_PROTECTED. */ +#define P2_NOTRACE 0x00000002 /* No ptrace(2) attach or coredumps. */ +#define P2_NOTRACE_EXEC 0x00000004 /* Keep P2_NOPTRACE on exec(2). */ /* Flags protected by proctree_lock, kept in p_treeflags. */ #define P_TREE_ORPHANED 0x00000001 /* Reparented, on orphan list */ diff --git a/sys/sys/procctl.h b/sys/sys/procctl.h index d11b2b2..503c09b 100644 --- a/sys/sys/procctl.h +++ b/sys/sys/procctl.h @@ -41,6 +41,8 @@ #define PROC_REAP_STATUS 4 /* reaping status */ #define PROC_REAP_GETPIDS 5 /* get descendants */ #define PROC_REAP_KILL 6 /* kill descendants */ +#define PROC_TRACE_ENABLE 7 /* en/dis ptrace and coredumps */ +#define PROC_TRACE_STATUS 8 /* query tracing status */ /* Operations for PROC_SPROTECT (passed in integer arg). */ #define PPROT_OP(x) ((x) & 0xf) @@ -96,6 +98,10 @@ struct procctl_reaper_kill { #define REAPER_KILL_CHILDREN 0x00000001 #define REAPER_KILL_SUBTREE 0x00000002 +#define PROC_TRACE_SET_ENABLED 1 +#define PROC_TRACE_SET_DISABLED 2 +#define PROC_TRACE_SET_DISABLED_EXEC 3 + #ifndef _KERNEL __BEGIN_DECLS int procctl(idtype_t, id_t, int, void *); From owner-freebsd-arch@FreeBSD.ORG Tue Dec 30 14:07:14 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7344C353 for ; Tue, 30 Dec 2014 14:07:14 +0000 (UTC) Received: from mx1.stack.nl (relay04.stack.nl [IPv6:2001:610:1108:5010::107]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailhost.stack.nl", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 3ACD93ECB for ; Tue, 30 Dec 2014 14:07:14 +0000 (UTC) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id 20AA0B8412; Tue, 30 Dec 2014 15:07:10 +0100 (CET) Received: by snail.stack.nl (Postfix, from userid 1677) id 0F06528494; Tue, 30 Dec 2014 15:07:10 +0100 (CET) Date: Tue, 30 Dec 2014 15:07:10 +0100 From: Jilles Tjoelker To: Konstantin Belousov Subject: Re: Disabling ptrace Message-ID: <20141230140709.GA96469@stack.nl> References: <20141230111941.GE42409@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141230111941.GE42409@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Dec 2014 14:07:14 -0000 On Tue, Dec 30, 2014 at 01:19:41PM +0200, Konstantin Belousov wrote: > The question about a facility to disable introspection functionality > (ptrace etc) for a process was asked several times. The latest query > made me actually code the feature. Note that other systems, e.g. Linux > and OSX, do have similar facilities. > Patch is below, it provides two new procctl(2) requests. > PROC_TRACE_ENABLE enables or disables tracing. It includes core > dumping, ptrace, ktrace, debugging sysctls and hwpmc. > PROC_TRACE_STATUS allows to get the tracing state. > Most interesting question is how should disabling of trace behave > with regard of fork and exec. IMO, the right model is to protect > access to the _program_ address space, which translates to inheritance > of the attribute for fork, and reenabling the tracing on exec. I agree. I imagine this will be useful for programs like ssh-agent, to protect their unlocked key material. This is also what Linux provides, and it is simpler than this patch: prctl(PR_SET_DUMPABLE) lets a process make their issetugid() equivalent return true, including preventing tracing by unprivileged users. You could call that unification a hack. > On the other hand, I understand that some users want to inherit the > tracing disable on exec, so there are PROC_TRACE_SET_DISABLED and > PROC_TRACE_SET_DISABLED_EXEC, the later makes disable to be kept after > exec. This is apparently meant to protect a whole process tree as a hardening measure, or instead of PROC_TRACE_SET_DISABLED if it is undesirable to modify the program with key material. > Note that it is trivial for root on the host to circumvent the feature. I'd prefer if root can still trace normally, without needing any hacks. Philosophically, FreeBSD should serve the system administrator first and only then the application programmer. Also, the debugging facilities may be needed to debug FreeBSD itself (e.g. procstat -k), not just the application. -- Jilles Tjoelker From owner-freebsd-arch@FreeBSD.ORG Tue Dec 30 15:39:07 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B507183D for ; Tue, 30 Dec 2014 15:39:07 +0000 (UTC) Received: from mail-qc0-x229.google.com (mail-qc0-x229.google.com [IPv6:2607:f8b0:400d:c01::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 664AC1E41 for ; Tue, 30 Dec 2014 15:39:07 +0000 (UTC) Received: by mail-qc0-f169.google.com with SMTP id w7so10685334qcr.28 for ; Tue, 30 Dec 2014 07:39:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:user-agent:in-reply-to :references:mime-version:content-type; bh=YdV8fxBib6dNpqEkh5USML09Q4q0tqt7H7saymH10S8=; b=jvBMnmDIuax1S1Z+gphFfmWUnKI4kKGc0I1tt5zepbf1+MdN8gYgRmRZ+UYwIL7SIv 3IztpX9v6SNl51PCMt0p3Mfex2hiYLJmIb01BzO5a9tfLjXACmlPNy9yyU9bQfaHGjuO RAMgAddkZtHbJDVgW540Fk89abBIEjFjJGohpqbmot8dakJUH3cyuhOzdK5+F8BG5jv/ laDrFmPpWrzD4Y/te7tX0VevLvvY+N5axO55C7TGP6o2wVqfiWq8FCwKjrfWH2aVR3Qm N+BVObY1VUVyLXS9eF78j7sO8HJRjmqMyVS8kW/ezQgPuMk9iN5lY16Ob07YQNGGYACL dnCw== X-Received: by 10.140.92.202 with SMTP id b68mr95875023qge.10.1419953946590; Tue, 30 Dec 2014 07:39:06 -0800 (PST) Received: from shawnwebb-laptop.localnet ([73.173.99.185]) by mx.google.com with ESMTPSA id k88sm36195660qge.13.2014.12.30.07.39.05 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 Dec 2014 07:39:05 -0800 (PST) From: Shawn Webb To: freebsd-arch@freebsd.org Subject: Re: Disabling ptrace Date: Tue, 30 Dec 2014 10:38:56 -0500 Message-ID: <3368390.qHnOScdmzK@shawnwebb-laptop> User-Agent: KMail/4.14.2 (FreeBSD/11.0-CURRENT; KDE/4.14.2; amd64; ; ) In-Reply-To: <20141230140709.GA96469@stack.nl> References: <20141230111941.GE42409@kib.kiev.ua> <20141230140709.GA96469@stack.nl> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart12681731.czNAJhFhVz"; micalg="pgp-sha256"; protocol="application/pgp-signature" Cc: Konstantin Belousov , Jilles Tjoelker X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Dec 2014 15:39:07 -0000 --nextPart12681731.czNAJhFhVz Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" On Tuesday, December 30, 2014 03:07:10 PM Jilles Tjoelker wrote: > On Tue, Dec 30, 2014 at 01:19:41PM +0200, Konstantin Belousov wrote: > > The question about a facility to disable introspection functionality > > (ptrace etc) for a process was asked several times. The latest query > > made me actually code the feature. Note that other systems, e.g. Linux > > and OSX, do have similar facilities. > > > > Patch is below, it provides two new procctl(2) requests. > > PROC_TRACE_ENABLE enables or disables tracing. It includes core > > dumping, ptrace, ktrace, debugging sysctls and hwpmc. > > PROC_TRACE_STATUS allows to get the tracing state. > > > > Most interesting question is how should disabling of trace behave > > with regard of fork and exec. IMO, the right model is to protect > > access to the _program_ address space, which translates to inheritance > > of the attribute for fork, and reenabling the tracing on exec. > > I agree. I imagine this will be useful for programs like ssh-agent, to > protect their unlocked key material. > > This is also what Linux provides, and it is simpler than this patch: > prctl(PR_SET_DUMPABLE) lets a process make their issetugid() equivalent > return true, including preventing tracing by unprivileged users. You > could call that unification a hack. > > > On the other hand, I understand that some users want to inherit the > > tracing disable on exec, so there are PROC_TRACE_SET_DISABLED and > > PROC_TRACE_SET_DISABLED_EXEC, the later makes disable to be kept after > > exec. > > This is apparently meant to protect a whole process tree as a hardening > measure, or instead of PROC_TRACE_SET_DISABLED if it is undesirable to > modify the program with key material. > > > Note that it is trivial for root on the host to circumvent the feature. > > I'd prefer if root can still trace normally, without needing any hacks. > Philosophically, FreeBSD should serve the system administrator first and > only then the application programmer. Also, the debugging facilities may > be needed to debug FreeBSD itself (e.g. procstat -k), not just the > application. It's easy even for non-root to disable or work around ptrace disabling. LD_PRELOAD, nopping out the instructions, dtrace, etc. Note that for SUID applications, such tricks don't work. The point is that such protections are very easily disabled, even by non-root users for non-SUID applications. I'm curious what the use case was that brought this up. And why the requester thinks it's actually useful. We at HardenedBSD have introduced a ptrace hardening patch that limits those who can use ptrace to a certain group. We've also added hardening around [lin]procfs. I believe those to be effective against ptrace abuse to a greater extent. It doesn't, though, handle dtrace, something we still need to research. Thanks, Shawn --nextPart12681731.czNAJhFhVz Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part. Content-Transfer-Encoding: 7Bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJUoscQAAoJEGqEZY9SRW7ukGAP/AqEpfrkmjxZCjykuTs66Nxx CvSKtToVU3Rh3ymcS0QrOZAp8DlDFkFl00ZY+5/hdvEpLeMQ02yRiPikzbEpTjQ9 fcsPW2pRRR5GLwPWBxfeXPE5KfIxN8f22lFbCO4Xaf9PIUR4jzxwM9JpvjO3ZtJB zwfTinR3PsOnYf5zvROp/QmdYgjbI1BXft9Yhwyn6MblIG7WL2HfWYO6NpDOz1R/ KyUlw/GI+KNyXSIwhe9zm+eD/ASx1rlh5vQyZlyDevGSJdgCgpbwylPE2rjp+ikx YrVMZhEgUSTOia/cOQoq6QsLJiq0FU/YQZgPg39OcyA2YS9t/u+Di0Ut2/AJ7qtv TqxYq7ylr+QDIfreYqJwPzMQBnFPY67cReDq2P5m2jgychvmWmYqK5SbW+SPm740 NSlGg/wbpfVbJ84hxXIz+KTpcftzxheuatFDVW38FBsxAIjz40OGoWafA6jdtAH1 Xj/lLW2OjJMm+hVgFOFmJjlFJIcDifKq6SPyH/Gi00ZUlQGukmqsj/TzLbVa/WPX 0Omcfye9yTFAafMZqszlrS8i5qU8pf0dVUQy6Po46W14CqZKa7YVhFTP7R5W0ZOS gk53U6itNTURFlXixblMMYLgCdpkoTREVWO9iUl4pKbPIdpggXkCFg+LRgutJgkI cTa1vQdKG4sgLiS6UbV3 =VgnI -----END PGP SIGNATURE----- --nextPart12681731.czNAJhFhVz-- From owner-freebsd-arch@FreeBSD.ORG Tue Dec 30 15:58:46 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AB3BEDFC for ; Tue, 30 Dec 2014 15:58:46 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4C1C221BD for ; Tue, 30 Dec 2014 15:58:46 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sBUFwe8S015251 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 30 Dec 2014 17:58:40 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sBUFwe8S015251 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sBUFwe08015250; Tue, 30 Dec 2014 17:58:40 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 30 Dec 2014 17:58:40 +0200 From: Konstantin Belousov To: Shawn Webb Subject: Re: Disabling ptrace Message-ID: <20141230155840.GI42409@kib.kiev.ua> References: <20141230111941.GE42409@kib.kiev.ua> <20141230140709.GA96469@stack.nl> <3368390.qHnOScdmzK@shawnwebb-laptop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3368390.qHnOScdmzK@shawnwebb-laptop> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: Jilles Tjoelker , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Dec 2014 15:58:46 -0000 On Tue, Dec 30, 2014 at 10:38:56AM -0500, Shawn Webb wrote: > On Tuesday, December 30, 2014 03:07:10 PM Jilles Tjoelker wrote: > > On Tue, Dec 30, 2014 at 01:19:41PM +0200, Konstantin Belousov wrote: > > > The question about a facility to disable introspection functionality > > > (ptrace etc) for a process was asked several times. The latest query > > > made me actually code the feature. Note that other systems, e.g. Linux > > > and OSX, do have similar facilities. > > > > > > Patch is below, it provides two new procctl(2) requests. > > > PROC_TRACE_ENABLE enables or disables tracing. It includes core > > > dumping, ptrace, ktrace, debugging sysctls and hwpmc. > > > PROC_TRACE_STATUS allows to get the tracing state. > > > > > > Most interesting question is how should disabling of trace behave > > > with regard of fork and exec. IMO, the right model is to protect > > > access to the _program_ address space, which translates to inheritance > > > of the attribute for fork, and reenabling the tracing on exec. > > > > I agree. I imagine this will be useful for programs like ssh-agent, to > > protect their unlocked key material. > > > > This is also what Linux provides, and it is simpler than this patch: > > prctl(PR_SET_DUMPABLE) lets a process make their issetugid() equivalent > > return true, including preventing tracing by unprivileged users. You > > could call that unification a hack. > > > > > On the other hand, I understand that some users want to inherit the > > > tracing disable on exec, so there are PROC_TRACE_SET_DISABLED and > > > PROC_TRACE_SET_DISABLED_EXEC, the later makes disable to be kept after > > > exec. > > > > This is apparently meant to protect a whole process tree as a hardening > > measure, or instead of PROC_TRACE_SET_DISABLED if it is undesirable to > > modify the program with key material. > > > > > Note that it is trivial for root on the host to circumvent the feature. > > > > I'd prefer if root can still trace normally, without needing any hacks. > > Philosophically, FreeBSD should serve the system administrator first and > > only then the application programmer. Also, the debugging facilities may > > be needed to debug FreeBSD itself (e.g. procstat -k), not just the > > application. > > It's easy even for non-root to disable or work around ptrace disabling. > LD_PRELOAD, nopping out the instructions, dtrace, etc. Note that for SUID > applications, such tricks don't work. The point is that such protections are > very easily disabled, even by non-root users for non-SUID applications. Google for 'It rather involved being on the other side of this airtight hatchway'. > > I'm curious what the use case was that brought this up. And why the requester > thinks it's actually useful. > > We at HardenedBSD have introduced a ptrace hardening patch that limits those > who can use ptrace to a certain group. We've also added hardening around > [lin]procfs. I believe those to be effective against ptrace abuse to a greater > extent. It doesn't, though, handle dtrace, something we still need to > research. > > Thanks, > > Shawn From owner-freebsd-arch@FreeBSD.ORG Tue Dec 30 16:51:42 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1118E7EA for ; Tue, 30 Dec 2014 16:51:42 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A5B322B61 for ; Tue, 30 Dec 2014 16:51:41 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sBUGpa8d027433 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 30 Dec 2014 18:51:36 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sBUGpa8d027433 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sBUGpa0I027432; Tue, 30 Dec 2014 18:51:36 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 30 Dec 2014 18:51:36 +0200 From: Konstantin Belousov To: Jilles Tjoelker Subject: Re: Disabling ptrace Message-ID: <20141230165136.GK42409@kib.kiev.ua> References: <20141230111941.GE42409@kib.kiev.ua> <20141230140709.GA96469@stack.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141230140709.GA96469@stack.nl> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Dec 2014 16:51:42 -0000 On Tue, Dec 30, 2014 at 03:07:10PM +0100, Jilles Tjoelker wrote: > On Tue, Dec 30, 2014 at 01:19:41PM +0200, Konstantin Belousov wrote: > > The question about a facility to disable introspection functionality > > (ptrace etc) for a process was asked several times. The latest query > > made me actually code the feature. Note that other systems, e.g. Linux > > and OSX, do have similar facilities. > > > Patch is below, it provides two new procctl(2) requests. > > PROC_TRACE_ENABLE enables or disables tracing. It includes core > > dumping, ptrace, ktrace, debugging sysctls and hwpmc. > > PROC_TRACE_STATUS allows to get the tracing state. > > > Most interesting question is how should disabling of trace behave > > with regard of fork and exec. IMO, the right model is to protect > > access to the _program_ address space, which translates to inheritance > > of the attribute for fork, and reenabling the tracing on exec. > > I agree. I imagine this will be useful for programs like ssh-agent, to > protect their unlocked key material. > > This is also what Linux provides, and it is simpler than this patch: > prctl(PR_SET_DUMPABLE) lets a process make their issetugid() equivalent > return true, including preventing tracing by unprivileged users. You > could call that unification a hack. Yes, I do not like this. We have nice and proper p_candebug(9) KPI. > > > On the other hand, I understand that some users want to inherit the > > tracing disable on exec, so there are PROC_TRACE_SET_DISABLED and > > PROC_TRACE_SET_DISABLED_EXEC, the later makes disable to be kept after > > exec. > > This is apparently meant to protect a whole process tree as a hardening > measure, or instead of PROC_TRACE_SET_DISABLED if it is undesirable to > modify the program with key material. Agreed, it could be reinterpreted this way. Do you suggest to change name for PROC_TRACE_SET_DISABLED_EXEC ? E.g. PROC_TRACE_SET_DISABLED_TREE ? > > > Note that it is trivial for root on the host to circumvent the feature. > > I'd prefer if root can still trace normally, without needing any hacks. > Philosophically, FreeBSD should serve the system administrator first and > only then the application programmer. Also, the debugging facilities may > be needed to debug FreeBSD itself (e.g. procstat -k), not just the > application. This is reasonable. It seems that the only way to enable host root to use tracing without allowing jail' roots to do the same, is to introduce new privilege. I changed the p_candebug() chunk to the following: /* Denied explicitely */ if ((p->p_flag2 & P2_NOTRACE) != 0) { error = priv_check(td, PRIV_DEBUG_DENIED); if (error != 0) return (error); } From owner-freebsd-arch@FreeBSD.ORG Tue Dec 30 18:26:26 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9F1D8620 for ; Tue, 30 Dec 2014 18:26:26 +0000 (UTC) Received: from mail-ie0-x22c.google.com (mail-ie0-x22c.google.com [IPv6:2607:f8b0:4001:c03::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 660E32B5A for ; Tue, 30 Dec 2014 18:26:26 +0000 (UTC) Received: by mail-ie0-f172.google.com with SMTP id tr6so13969002ieb.3 for ; Tue, 30 Dec 2014 10:26:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=6FaEKr/c9ODDzyUYTJNUJH+3nlIQ47efYULO5AElpao=; b=ri/rszMPudggBbq0IBdvJNT3D6eoVnWobO8uYwFV1+0pO2sv67FEB7fxyI8fcYMmUT nOI/FyVNI15LNaz8X5v8QFlqBrSIi6x3OrWkY4jLp2rzTj13LkT57y1Ru1zTh3SsPXOj oUV6iTCcRxerBL8NGCa3UVe2jnxHmJ3pplAInOQkny/IaQ5pyn0xUdsJpVrkAZ4f6fWQ Ja/1+QPxrXhhEd1G+SSOg/Cv06DhxXJTDJ2nlwNxoHwJRqWY3E2hQoyDRxFubCTzZmbX LWOX7JEp50H/xyX1enL/7zXjC8QsgNdlY3N6tTopFnOdUeJ78iZFc3C6CyV18BpLW2CX ljXw== X-Received: by 10.42.41.146 with SMTP id p18mr46579433ice.52.1419963985781; Tue, 30 Dec 2014 10:26:25 -0800 (PST) MIME-Version: 1.0 Sender: carpeddiem@gmail.com Received: by 10.107.0.85 with HTTP; Tue, 30 Dec 2014 10:26:04 -0800 (PST) In-Reply-To: <3368390.qHnOScdmzK@shawnwebb-laptop> References: <20141230111941.GE42409@kib.kiev.ua> <20141230140709.GA96469@stack.nl> <3368390.qHnOScdmzK@shawnwebb-laptop> From: Ed Maste Date: Tue, 30 Dec 2014 13:26:04 -0500 X-Google-Sender-Auth: 7--Aau4jyJgVhJrOIvw5R1nK1oM Message-ID: Subject: Re: Disabling ptrace To: Shawn Webb Content-Type: text/plain; charset=UTF-8 Cc: "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Dec 2014 18:26:26 -0000 On 30 December 2014 at 10:38, Shawn Webb wrote: > > I'm curious what the use case was that brought this up. And why the requester > thinks it's actually useful. I had one use case for this: LLDB's test suite includes a test for the debugger handling failure from ptrace(PT_ATTACH, ...), and the test used the Linux/OS X version of the change under discussion. As it turns out that case can be easily tested by just having another ptrace consumer already attached, and the author of the test in LLDB switched to that approach instead. From owner-freebsd-arch@FreeBSD.ORG Tue Dec 30 20:44:51 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CC26CAA0 for ; Tue, 30 Dec 2014 20:44:51 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3E4F22242 for ; Tue, 30 Dec 2014 20:44:51 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sBUKijoC078529 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 30 Dec 2014 22:44:45 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sBUKijoC078529 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sBUKij3s078528; Tue, 30 Dec 2014 22:44:45 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 30 Dec 2014 22:44:45 +0200 From: Konstantin Belousov To: "Simon J. Gerraty" Subject: Re: Disabling ptrace Message-ID: <20141230204445.GM42409@kib.kiev.ua> References: <20141230111941.GE42409@kib.kiev.ua> <20141230140709.GA96469@stack.nl> <3368390.qHnOScdmzK@shawnwebb-laptop> <29058.1419970932@chaos> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <29058.1419970932@chaos> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: freebsd-arch@freebsd.org, Jilles Tjoelker , Shawn Webb X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Dec 2014 20:44:51 -0000 On Tue, Dec 30, 2014 at 12:22:12PM -0800, Simon J. Gerraty wrote: > Shawn Webb wrote: > > I'm curious what the use case was that brought this up. And why the requester > > thinks it's actually useful. > > Being able to disable ptrace is useful - provided it cannot be bypassed. > In Junos we leveraged the signed binary implementation (based on NetBSD's > verified exec) to tag processes for which ptrace should fail. The > signed binary stuff also supposed to prevent games with LD_PRELOAD - > assuming we didn't provide and sign the lib in question. Look. If somebody can preload a library into the process, or arbitrary modify the text segment, circumventing ptrace(2) ban is a least worry. The reference to the "Old New Thing" blog I posted before explains that with with fireworks, based on real 'security reports' sent to the security team at MS. > > When we re-implemented veriexec as a MAC module, the above was left out, > in anticipation of using a separate module (though perhaps still > leveraging veriexec to set the labels). From owner-freebsd-arch@FreeBSD.ORG Tue Dec 30 20:52:31 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8D377DB7 for ; Tue, 30 Dec 2014 20:52:31 +0000 (UTC) Received: from mail-qg0-x230.google.com (mail-qg0-x230.google.com [IPv6:2607:f8b0:400d:c04::230]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3EBED2491 for ; Tue, 30 Dec 2014 20:52:31 +0000 (UTC) Received: by mail-qg0-f48.google.com with SMTP id f51so10678286qge.7 for ; Tue, 30 Dec 2014 12:52:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:user-agent:in-reply-to :references:mime-version:content-type; bh=adudmKJ1Hedf8I3A0XjCzJ8eXaR36qrC7nycYktjNhk=; b=UAjTIO1akTkFOvXE0vqfOQvsxH/+2sUF1gT34pDDuUdvYJxQ4SYkhOEgU2z7VEKLSV zYGK7iDomVQmlDpcuHChHeM43wXFE5QbOzFMpsnvR6xMsYpKPfDo5yFQGBc4hFDgboaw uRMvkiTxnLYQWmhLty5ZQK3OEy6yp4ecNgRJ4gNXr44ZR3OF0a1/uriqhW10MGY7Y5yq k/JTEyYXaN+9MA8Ec0D93LruDbb6esUr8Z66kU6jF34F7FPvy1GbXTn5Nq6qDzoQDGOR sPM10/iocs4gjAQxhnmPa7WyVw/5iOeghk3Z+O8eYwoDh7xkOyr6WYTeBF2iVPPc0jwv bpZQ== X-Received: by 10.224.51.11 with SMTP id b11mr103872552qag.43.1419972750132; Tue, 30 Dec 2014 12:52:30 -0800 (PST) Received: from shawnwebb-laptop.localnet ([73.173.99.185]) by mx.google.com with ESMTPSA id z5sm12341463qal.11.2014.12.30.12.52.28 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 Dec 2014 12:52:29 -0800 (PST) From: Shawn Webb To: Konstantin Belousov Subject: Re: Disabling ptrace Date: Tue, 30 Dec 2014 15:52:25 -0500 Message-ID: <2246813.ih6odxTDOy@shawnwebb-laptop> User-Agent: KMail/4.14.2 (FreeBSD/11.0-CURRENT; KDE/4.14.2; amd64; ; ) In-Reply-To: <20141230204445.GM42409@kib.kiev.ua> References: <20141230111941.GE42409@kib.kiev.ua> <29058.1419970932@chaos> <20141230204445.GM42409@kib.kiev.ua> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart12681899.qa3kkoyJ5M"; micalg="pgp-sha256"; protocol="application/pgp-signature" Cc: freebsd-arch@freebsd.org, Jilles Tjoelker , "Simon J. Gerraty" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Dec 2014 20:52:31 -0000 --nextPart12681899.qa3kkoyJ5M Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" On Tuesday, December 30, 2014 10:44:45 PM Konstantin Belousov wrote: > On Tue, Dec 30, 2014 at 12:22:12PM -0800, Simon J. Gerraty wrote: > > Shawn Webb wrote: > > > I'm curious what the use case was that brought this up. And why the > > > requester thinks it's actually useful. > > > > Being able to disable ptrace is useful - provided it cannot be bypassed. > > In Junos we leveraged the signed binary implementation (based on NetBSD's > > verified exec) to tag processes for which ptrace should fail. The > > signed binary stuff also supposed to prevent games with LD_PRELOAD - > > assuming we didn't provide and sign the lib in question. > > Look. If somebody can preload a library into the process, or arbitrary > modify the text segment, circumventing ptrace(2) ban is a least worry. > The reference to the "Old New Thing" blog I posted before explains > that with with fireworks, based on real 'security reports' sent to the > security team at MS. I asked about use case mainly because the applications I'm familiar with that care about disabling debugging facilities are ones that are trying to deter reverse engineering. Which is silly. Disabling ptrace, though, is a great idea overall. I'm all for it. Tools like libhijack work only through ptrace. Disabling ptrace would also disrupt tools like libhijack (a good thing). My only concern was the use case of anti-reverse engineering. Disabling ptrace in that case is useless. > > > When we re-implemented veriexec as a MAC module, the above was left out, > > in anticipation of using a separate module (though perhaps still > > leveraging veriexec to set the labels). --nextPart12681899.qa3kkoyJ5M Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part. Content-Transfer-Encoding: 7Bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJUoxCJAAoJEGqEZY9SRW7u9VoQAJBQxbXU918drc+/6q5gX1p4 3z4jiF/tGWM6KKT2McCRJhPaIrJ2YUg8986QbXewGpoTMIOM7kYGWba7SL7DpNc/ QIjmDoIcRNmYjWJuhcOE+vSXxOAATILCGCyH4F0jsn5zp0gFcZ/rpgBmub2m5Z9D 5y9N2McWySYDvaS0Yy/EsECDlb/9JQpqadCH80iAsH1DuCPr/GaUSFqpE/KSzdFe S1YrYj/FUizLjZp5ffF60E7e75jrP7HjgjWU1UZ4zGnnIx6lJhxh7s4ZToDniJRf M+lyJOHor2ZX9uioGA09/SsTKcWj9bmiykp0VmWwNC93M4Cq9FrB6nAuzxaxfF4E AzHc+CqqVw4grdc42zJu30gz+5zlrUFA8phqhpmjbGUD3pqsKlCeNSCLPZ0fOMJa rdWdy8mK1yYDiUYdU0F02ugzroPOVYGnlQdqVV1v59Nk+ttkGPivvMF/1jUJjpFn 2SDray6GTwLnWBooHg2+ig06LexouM5ObkDaQX/BEyecRuCsxBSlw1982rRHVVcY B0MMHRc12Cbl1J5pOWMgHqVxpm8yHxvU4OTqW+Y4EWcEe6n+XjYedNhjj/pkGVDC GODLMNJugszWXukwZJSXGrMdhFIsHNehqPafPAIAqOrOCXPIENJ8D5GHXuPo4bVl peXAozbu0c4v8peopSk5 =r1GY -----END PGP SIGNATURE----- --nextPart12681899.qa3kkoyJ5M-- From owner-freebsd-arch@FreeBSD.ORG Tue Dec 30 20:56:59 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id F273FE86 for ; Tue, 30 Dec 2014 20:56:58 +0000 (UTC) Received: from na01-by2-obe.outbound.protection.outlook.com (mail-by2on0139.outbound.protection.outlook.com [207.46.100.139]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A870824EB for ; Tue, 30 Dec 2014 20:56:57 +0000 (UTC) Received: from CO2PR05CA011.namprd05.prod.outlook.com (10.141.241.139) by DM2PR05MB445.namprd05.prod.outlook.com (10.141.104.154) with Microsoft SMTP Server (TLS) id 15.1.49.12; Tue, 30 Dec 2014 20:22:34 +0000 Received: from BN1AFFO11FD038.protection.gbl (2a01:111:f400:7c10::193) by CO2PR05CA011.outlook.office365.com (2a01:111:e400:1429::11) with Microsoft SMTP Server (TLS) id 15.1.49.12 via Frontend Transport; Tue, 30 Dec 2014 20:22:33 +0000 Received: from P-EMF02-SAC.jnpr.net (66.129.239.16) by BN1AFFO11FD038.mail.protection.outlook.com (10.58.52.242) with Microsoft SMTP Server (TLS) id 15.1.49.13 via Frontend Transport; Tue, 30 Dec 2014 20:22:33 +0000 Received: from magenta.juniper.net (172.17.27.123) by P-EMF02-SAC.jnpr.net (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.146.0; Tue, 30 Dec 2014 12:22:13 -0800 Received: from chaos.jnpr.net (chaos.jnpr.net [172.21.16.28]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id sBUKMCW19513; Tue, 30 Dec 2014 12:22:12 -0800 (PST) (envelope-from sjg@juniper.net) Received: from chaos (localhost [127.0.0.1]) by chaos.jnpr.net (Postfix) with ESMTP id AB452580A3; Tue, 30 Dec 2014 12:22:12 -0800 (PST) To: Shawn Webb Subject: Re: Disabling ptrace In-Reply-To: <3368390.qHnOScdmzK@shawnwebb-laptop> References: <20141230111941.GE42409@kib.kiev.ua> <20141230140709.GA96469@stack.nl> <3368390.qHnOScdmzK@shawnwebb-laptop> Comments: In-reply-to: Shawn Webb message dated "Tue, 30 Dec 2014 10:38:56 -0500." From: "Simon J. Gerraty" X-Mailer: MH-E 8.0.3; nmh 1.3; GNU Emacs 22.3.1 Date: Tue, 30 Dec 2014 12:22:12 -0800 Message-ID: <29058.1419970932@chaos> MIME-Version: 1.0 Content-Type: text/plain X-EOPAttributedMessage: 0 Received-SPF: SoftFail (protection.outlook.com: domain of transitioning juniper.net discourages use of 66.129.239.16 as permitted sender) Authentication-Results: spf=softfail (sender IP is 66.129.239.16) smtp.mailfrom=sjg@juniper.net; X-Forefront-Antispam-Report: CIP:66.129.239.16; CTRY:US; IPV:NLI; EFV:NLI; SFV:NSPM; SFS:(10019020)(6009001)(24454002)(189002)(199003)(2950100001)(47776003)(99396003)(89996001)(221733001)(21056001)(77156002)(62966003)(105596002)(92566001)(106466001)(107046002)(68736005)(77096005)(31966008)(64706001)(81156004)(50466002)(48376002)(46102003)(33716001)(97736003)(6806004)(1411001)(19580395003)(84676001)(50226001)(76506005)(76176999)(19580405001)(110136001)(50986999)(120916001)(87936001)(4396001)(86362001)(117636001)(57986006)(20776003)(69596002)(62816006)(42262002); DIR:OUT; SFP:1102; SCL:1; SRVR:DM2PR05MB445; H:P-EMF02-SAC.jnpr.net; FPR:; SPF:SoftFail; MLV:sfv; PTR:InfoDomainNonexistent; MX:1; A:1; LANG:en; X-Microsoft-Antispam: UriScan:; X-Microsoft-Antispam: BCL:0;PCL:0;RULEID:;SRVR:DM2PR05MB445; X-Forefront-PRVS: 04410E544A X-OriginatorOrg: juniper.net X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Dec 2014 20:22:33.0281 (UTC) X-MS-Exchange-CrossTenant-Id: bea78b3c-4cdb-4130-854a-1d193232e5f4 X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=bea78b3c-4cdb-4130-854a-1d193232e5f4; Ip=[66.129.239.16] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM2PR05MB445 Cc: Konstantin Belousov , Jilles Tjoelker , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Dec 2014 20:56:59 -0000 Shawn Webb wrote: > I'm curious what the use case was that brought this up. And why the requester > thinks it's actually useful. Being able to disable ptrace is useful - provided it cannot be bypassed. In Junos we leveraged the signed binary implementation (based on NetBSD's verified exec) to tag processes for which ptrace should fail. The signed binary stuff also supposed to prevent games with LD_PRELOAD - assuming we didn't provide and sign the lib in question. When we re-implemented veriexec as a MAC module, the above was left out, in anticipation of using a separate module (though perhaps still leveraging veriexec to set the labels). From owner-freebsd-arch@FreeBSD.ORG Tue Dec 30 21:45:56 2014 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A08868EF for ; Tue, 30 Dec 2014 21:45:56 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 8E27D64ACF for ; Tue, 30 Dec 2014 21:45:56 +0000 (UTC) Received: from AlfredMacbookAir.local (c-76-21-10-192.hsd1.ca.comcast.net [76.21.10.192]) by elvis.mu.org (Postfix) with ESMTPSA id EF38F341F854; Tue, 30 Dec 2014 13:45:49 -0800 (PST) Message-ID: <54A31D45.8080704@freebsd.org> Date: Tue, 30 Dec 2014 13:46:45 -0800 From: Alfred Perlstein Organization: FreeBSD User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Jilles Tjoelker , Konstantin Belousov Subject: Re: Disabling ptrace References: <20141230111941.GE42409@kib.kiev.ua> <20141230140709.GA96469@stack.nl> In-Reply-To: <20141230140709.GA96469@stack.nl> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Dec 2014 21:45:56 -0000 On 12/30/14 6:07 AM, Jilles Tjoelker wrote: > Philosophically, FreeBSD should serve the system administrator first > and only then the application programmer. That's a good way to make sure there aren't a lot of FreeBSD systems to administer. ;) -Alfred From owner-freebsd-arch@FreeBSD.ORG Wed Dec 31 02:47:32 2014 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 35947C31 for ; Wed, 31 Dec 2014 02:47:32 +0000 (UTC) Received: from na01-bn1-obe.outbound.protection.outlook.com (mail-bn1bn0102.outbound.protection.outlook.com [157.56.110.102]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "MSIT Machine Auth CA 2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D1AC62BFC for ; Wed, 31 Dec 2014 02:47:31 +0000 (UTC) Received: from CO2PR05CA039.namprd05.prod.outlook.com (10.141.241.167) by BLUPR05MB433.namprd05.prod.outlook.com (10.141.27.140) with Microsoft SMTP Server (TLS) id 15.1.49.12; Tue, 30 Dec 2014 23:14:45 +0000 Received: from BY2FFO11FD028.protection.gbl (2a01:111:f400:7c0c::148) by CO2PR05CA039.outlook.office365.com (2a01:111:e400:1429::39) with Microsoft SMTP Server (TLS) id 15.1.49.12 via Frontend Transport; Tue, 30 Dec 2014 23:14:44 +0000 Received: from P-EMF02-SAC.jnpr.net (66.129.239.16) by BY2FFO11FD028.mail.protection.outlook.com (10.1.15.217) with Microsoft SMTP Server (TLS) id 15.1.49.13 via Frontend Transport; Tue, 30 Dec 2014 23:14:44 +0000 Received: from magenta.juniper.net (172.17.27.123) by P-EMF02-SAC.jnpr.net (172.24.192.21) with Microsoft SMTP Server (TLS) id 14.3.146.0; Tue, 30 Dec 2014 15:14:43 -0800 Received: from chaos.jnpr.net (chaos.jnpr.net [172.21.16.28]) by magenta.juniper.net (8.11.3/8.11.3) with ESMTP id sBUNEhW04498; Tue, 30 Dec 2014 15:14:43 -0800 (PST) (envelope-from sjg@juniper.net) Received: from chaos (localhost [127.0.0.1]) by chaos.jnpr.net (Postfix) with ESMTP id 0CC62580A3; Tue, 30 Dec 2014 15:14:43 -0800 (PST) To: Shawn Webb Subject: Re: Disabling ptrace In-Reply-To: <2246813.ih6odxTDOy@shawnwebb-laptop> References: <20141230111941.GE42409@kib.kiev.ua> <29058.1419970932@chaos> <20141230204445.GM42409@kib.kiev.ua> <2246813.ih6odxTDOy@shawnwebb-laptop> Comments: In-reply-to: Shawn Webb message dated "Tue, 30 Dec 2014 15:52:25 -0500." From: "Simon J. Gerraty" X-Mailer: MH-E 8.0.3; nmh 1.3; GNU Emacs 22.3.1 Date: Tue, 30 Dec 2014 15:14:43 -0800 Message-ID: <11126.1419981283@chaos> MIME-Version: 1.0 Content-Type: text/plain X-EOPAttributedMessage: 0 Received-SPF: SoftFail (protection.outlook.com: domain of transitioning juniper.net discourages use of 66.129.239.16 as permitted sender) Authentication-Results: spf=softfail (sender IP is 66.129.239.16) smtp.mailfrom=sjg@juniper.net; X-Forefront-Antispam-Report: CIP:66.129.239.16; CTRY:US; IPV:NLI; EFV:NLI; SFV:NSPM; SFS:(10019020)(6009001)(24454002)(199003)(189002)(77156002)(50226001)(62966003)(97736003)(33716001)(92566001)(89996001)(31966008)(77096005)(68736005)(57986006)(2950100001)(107046002)(76506005)(4396001)(19580395003)(69596002)(221733001)(110136001)(6806004)(81156004)(106466001)(46102003)(20776003)(47776003)(64706001)(21056001)(105596002)(48376002)(120916001)(50466002)(50986999)(76176999)(1411001)(99396003)(93886004)(84676001)(19580405001)(87936001)(117636001)(86362001)(42262002)(62816006); DIR:OUT; SFP:1102; SCL:1; SRVR:BLUPR05MB433; H:P-EMF02-SAC.jnpr.net; FPR:; SPF:SoftFail; MLV:sfv; PTR:InfoDomainNonexistent; A:1; MX:1; LANG:en; X-Microsoft-Antispam: UriScan:; X-Microsoft-Antispam: BCL:0;PCL:0;RULEID:;SRVR:BLUPR05MB433; X-Forefront-PRVS: 04410E544A X-OriginatorOrg: juniper.net X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Dec 2014 23:14:44.0046 (UTC) X-MS-Exchange-CrossTenant-Id: bea78b3c-4cdb-4130-854a-1d193232e5f4 X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=bea78b3c-4cdb-4130-854a-1d193232e5f4; Ip=[66.129.239.16] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BLUPR05MB433 Cc: Konstantin Belousov , Jilles Tjoelker , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Dec 2014 02:47:32 -0000 Shawn Webb wrote: > I asked about use case mainly because the applications I'm familiar with that > care about disabling debugging facilities are ones that are trying to deter > reverse engineering. Which is silly. Yes, we only do it on the apps responsible for verifying signatures and the like. From owner-freebsd-arch@FreeBSD.ORG Thu Jan 1 12:35:50 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9A7B3BB2; Thu, 1 Jan 2015 12:35:50 +0000 (UTC) Received: from mail.turbocat.net (heidi.turbocat.net [88.198.202.214]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 583CA645A9; Thu, 1 Jan 2015 12:35:49 +0000 (UTC) Received: from laptop015.home.selasky.org (cm-176.74.213.204.customer.telag.net [176.74.213.204]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id 849B81FE022; Thu, 1 Jan 2015 13:35:46 +0100 (CET) Message-ID: <54A53F4F.2000003@selasky.org> Date: Thu, 01 Jan 2015 13:36:31 +0100 From: Hans Petter Selasky User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Ivan Klymenko Subject: Re: [RFC] kern/kern_timeout.c rewrite in progress References: <54A1B38C.1000709@selasky.org> <20150101005613.4f788b0c@nonamehost.local> <54A49CA5.2060801@selasky.org> <54A4A002.8010802@selasky.org> In-Reply-To: <54A4A002.8010802@selasky.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Cc: FreeBSD Current , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jan 2015 12:35:50 -0000 On 01/01/15 02:16, Hans Petter Selasky wrote: > On 01/01/15 02:02, Hans Petter Selasky wrote: >> On 12/31/14 23:56, Ivan Klymenko wrote: >>> В Mon, 29 Dec 2014 21:03:24 +0100 >>> Hans Petter Selasky пишет: >>> >> >> Hi, >> >> Is your kernel compiled with Witness? Do you see any lock order reversal >> warnings? >> >> Can you do from kgdb: >> >> thread apply all bt >> >> And send me the result off-list? >> >> I'll have a closer look at this tomorrow. >> >>> >>> panic: spin lock held too long >>> http://paste.org.ru/?acf7io >>> >> >> Thank you! >> > Hi, I see what is going on. There is an LOR which is not printed because MTX_QUIET is passed when locking inside the callback process routine. It happens because cv_timedwait() is using callouts() to timeout and callout_drain() is using cv_wait() to wait for draining. This was not so well documented in the old code. I'll update my patch and send out a new one later today. I see a room for doing some improvements too: callout_init(&td->td_slpcallout, CALLOUT_MPSAFE) Possibly we could use a so-called "DIRECT" callback from the fast IRQ of the timer, when waking up other threads from cv_timedwait(), hence only spinlocks are involved? This would save waking up the callout SWI only to wakeup another thread ! Thank you for testing! --HPS From owner-freebsd-arch@FreeBSD.ORG Fri Jan 2 09:16:49 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CAB94D0F for ; Fri, 2 Jan 2015 09:16:49 +0000 (UTC) Received: from cyrus.watson.org (cyrus.watson.org [198.74.231.69]) by mx1.freebsd.org (Postfix) with ESMTP id 6F4AD66F57 for ; Fri, 2 Jan 2015 09:16:49 +0000 (UTC) Received: from fledge.watson.org (fledge.watson.org [198.74.231.63]) by cyrus.watson.org (Postfix) with ESMTPS id 4864F46B43; Fri, 2 Jan 2015 04:16:42 -0500 (EST) Date: Fri, 2 Jan 2015 09:16:42 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Konstantin Belousov Subject: Re: Disabling ptrace In-Reply-To: <20141230111941.GE42409@kib.kiev.ua> Message-ID: References: <20141230111941.GE42409@kib.kiev.ua> User-Agent: Alpine 2.11 (BSF 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jan 2015 09:16:49 -0000 On Tue, 30 Dec 2014, Konstantin Belousov wrote: > The question about a facility to disable introspection functionality (ptrace > etc) for a process was asked several times. The latest query made me > actually code the feature. Note that other systems, e.g. Linux and OSX, do > have similar facilities. > > Patch is below, it provides two new procctl(2) requests. PROC_TRACE_ENABLE > enables or disables tracing. It includes core dumping, ptrace, ktrace, > debugging sysctls and hwpmc. PROC_TRACE_STATUS allows to get the tracing > state. > > Most interesting question is how should disabling of trace behave with > regard of fork and exec. IMO, the right model is to protect access to the > _program_ address space, which translates to inheritance of the attribute > for fork, and reenabling the tracing on exec. On the other hand, I > understand that some users want to inherit the tracing disable on exec, so > there are PROC_TRACE_SET_DISABLED and PROC_TRACE_SET_DISABLED_EXEC, the > later makes disable to be kept after exec. > > Note that it is trivial for root on the host to circumvent the feature. I admit some curiosity about the use case as well -- if this is a security feature, possibly this should be a property of the credential rather than the process? Among other things, this could help implement trace limitations in asynchronous contexts -- not that I'm aware of too many currently, but if you don't want the I/O of a program traced in the synchronous context, I guess it might follow that you wouldn't want that to happen asynchronously either (etc). On the other hand, if this is a security feature, it doesn't really 'stand alone' very well, as in absence of a more complete policy, simply disabling introspection features doesn't provide guarantees, just annoyance until someone writes a suitable script to work around it. This has been one of my objections to securelevel, FWIW: while the various bits building it are useful for admins (e.g., file flages), securelevel isn't a 'coherent' policy that gives you strong integrity properties, rather, a set of manual frobs and inconsistent design choices that mean it's very hard to imagine an admin ever getting it to be useful without running with read-only filesystems, etc (and even then). In MAC policies such as Biba and MLS, we mediate the ability to debug target processes based on additional labels, but there framing integrity and confidentiality policies give the restriction a security context. In the Mac OS X version of the MAC Framework, support for disabling debugging following execve() is slightly more mature in FreeBSD: in FreeBSD it's done based on label change (a la credential change), but in Mac OS X it tracks a policy entry point that decides whether the change has access-control significance (which means you can have information-flow tracking policies that manipulate labels but don't change functional aspects of the system, which isn't possible in the FreeBSD version currently). There's a good argument we should pick up the Mac OS X design choice there at some point. The reason I wonder about the use case is actually one to do with accurate performance debugging of systems: ssh, ssh-agent, and friends can already request that they not be core dumped to avoid leaking keying material to filesystems, but having them be exempt from hwpmc, DTrace, etc, will make understanding system and application behaviour harder -- and if they 'fail silently' then users could suffer potentially confusing (and misleading) results. Is the general concern here that people are turning on tracing/profiling/etc tools and not understanding that they may be leaking keying material? (Note that we do have a global safety frob to disable ptrace and friends systemically, security.bsd.unprivileged_proc_debug, which is intended as an easily available workaround for security vulnerabilities discovered in process-debugging facilities, which have historically proven a bit vulnerability-prone across all operating systems I'm aware of. We should remember this next time we cut an advisory on a ptrace/ktrace/ec vulnerability.) Robert > > diff --git a/lib/libc/sys/procctl.2 b/lib/libc/sys/procctl.2 > index 649e0ad..4e7e5b8 100644 > --- a/lib/libc/sys/procctl.2 > +++ b/lib/libc/sys/procctl.2 > @@ -29,7 +29,7 @@ > .\" > .\" $FreeBSD$ > .\" > -.Dd December 16, 2014 > +.Dd December 29, 2014 > .Dt PROCCTL 2 > .Os > .Sh NAME > @@ -275,6 +275,51 @@ delivery failed, e.g. due to the permission problems. > If no such process exist, the > .Fa rk_fpid > field is set to -1. > +.It Dv PROC_TRACE_ENABLE > +Enable or disable tracing of the specified process(es), according to the > +value of the integer argument. > +Tracing includes attachment to the process using > +.Xr ptrace 2 > +and > +.Xr ktrace 2 , > +debugging sysctls, > +.Xr hwpmc 4 > +and core dumping. > +Possible values for the > +.Fa data > +argument are: > +.Bl -tag -width "Dv PROC_TRACE_SET_DISABLED_EXEC" > +.It Dv PROC_TRACE_SET_ENABLED > +Enable the tracing, after it was disabled by > +.Dv PROC_TRACE_SET_DISABLED . > +Only allowed for self. > +.It Dv PROC_TRACE_SET_DISABLED > +Disable tracing for the specified process. > +The tracing is re-enabled when the process changes the executing > +program with > +.Xr execve 2 > +syscall. > +A child inherits the trace settings from the parent. > +.It Dv PROC_TRACE_SET_DISABLED_EXEC > +Same as > +.Dv PROC_TRACE_SET_DISABLED , > +but setting persist for the process even after > +.Xr execve 2 . > +.It Dv PROC_TRACE_STATUS > +Returns the current status of tracing for the specified process into > +the integer variable pointed to by > +.Fa data . > +If tracing is disabled, > +.Fa data > +is set to -1. > +If tracing is enabled, but no debugger is attached by > +.Xr ptrace 2 > +syscall, the > +.Fa data > +is set to 0. > +If a debugger is attached, > +.Fa data > +is set to the pid of the debugger process. > .El > .Sh RETURN VALUES > If an error occurs, a value of -1 is returned and > @@ -343,9 +388,28 @@ The > .Dv PROC_REAP_ACQUIRE > request was issued by a process that had already acquired reaper status > and has not yet released it. > +.It Bq Er EBUSY > +The > +.Dv PROC_TRACE_ENABLE > +request was issued for the process already traced. > +.It Bq Er EPERM > +The > +.Dv PROC_TRACE_ENABLE > +request to re-enable tracing of the process ( > +.Dv PROC_TRACE_SET_ENABLED ) , > +or to disable persistence of the > +.Dv PROC_TRACE_SET_DISABLED > +on > +.Xr execve 2 > +was issued for the non-current process. > +.It Bq Er EINVAL > +The value of integer parameter for the > +.Dv PROC_TRACE_ENABLE > +request is invalid. > .El > .Sh SEE ALSO > .Xr kill 2 , > +.Xr ktrace 2 , > .Xr ptrace 2 , > .Xr wait 2 , > .Xr init 8 > diff --git a/sys/compat/freebsd32/freebsd32_misc.c b/sys/compat/freebsd32/freebsd32_misc.c > index 1457f57..f96e78e 100644 > --- a/sys/compat/freebsd32/freebsd32_misc.c > +++ b/sys/compat/freebsd32/freebsd32_misc.c > @@ -2969,6 +2969,7 @@ freebsd32_procctl(struct thread *td, struct freebsd32_procctl_args *uap) > > switch (uap->com) { > case PROC_SPROTECT: > + case PROC_TRACE_ENABLE: > error = copyin(PTRIN(uap->data), &flags, sizeof(flags)); > if (error != 0) > return (error); > @@ -2997,6 +2998,9 @@ freebsd32_procctl(struct thread *td, struct freebsd32_procctl_args *uap) > return (error); > data = &x.rk; > break; > + case PROT_TRACE_STATUS: > + data = &flags; > + break; > default: > return (EINVAL); > } > @@ -3012,6 +3016,10 @@ freebsd32_procctl(struct thread *td, struct freebsd32_procctl_args *uap) > if (error == 0) > error = error1; > break; > + case PROC_TRACE_STATUS: > + if (error == 0) > + error = copyout(&flags, uap->data, sizeof(flags)); > + break; > } > return (error); > } > diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c > index 19c33b6..5b80f5c 100644 > --- a/sys/kern/kern_exec.c > +++ b/sys/kern/kern_exec.c > @@ -634,6 +634,8 @@ interpret: > * it that it now has its own resources back > */ > p->p_flag |= P_EXEC; > + if ((p->p_flag2 & P2_NOTRACE_EXEC) == 0) > + p->p_flag2 &= ~P2_NOTRACE; > if (p->p_flag & P_PPWAIT) { > p->p_flag &= ~(P_PPWAIT | P_PPTRACE); > cv_broadcast(&p->p_pwait); > diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c > index f469db6..2c83422 100644 > --- a/sys/kern/kern_fork.c > +++ b/sys/kern/kern_fork.c > @@ -494,7 +494,7 @@ do_fork(struct thread *td, int flags, struct proc *p2, struct thread *td2, > * Increase reference counts on shared objects. > */ > p2->p_flag = P_INMEM; > - p2->p_flag2 = 0; > + p2->p_flag2 = p1->p_flag2 & (P2_NOTRACE | P2_NOTRACE_EXEC); > p2->p_swtick = ticks; > if (p1->p_flag & P_PROFIL) > startprofclock(p2); > diff --git a/sys/kern/kern_procctl.c b/sys/kern/kern_procctl.c > index 9b0d14a..e265ed5 100644 > --- a/sys/kern/kern_procctl.c > +++ b/sys/kern/kern_procctl.c > @@ -280,6 +280,62 @@ reap_kill(struct thread *td, struct proc *p, struct procctl_reaper_kill *rk) > return (error); > } > > +static int > +trace_enable(struct thread *td, struct proc *p, int state) > +{ > + > + PROC_LOCK_ASSERT(p, MA_OWNED); > + > + /* > + * Ktrace changes p_traceflag from or to zero under the > + * process lock, so the test does not need to acquire ktrace > + * mutex. > + */ > + if ((p->p_flag & P_TRACED) != 0 || p->p_traceflag != 0) > + return (EBUSY); > + > + switch (state) { > + case PROC_TRACE_SET_ENABLED: > + if (td->td_proc != p) > + return (EPERM); > + p->p_flag2 &= ~(P2_NOTRACE | P2_NOTRACE_EXEC); > + break; > + case PROC_TRACE_SET_DISABLED_EXEC: > + p->p_flag2 |= P2_NOTRACE_EXEC | P2_NOTRACE; > + break; > + case PROC_TRACE_SET_DISABLED: > + if ((p->p_flag2 & P2_NOTRACE_EXEC) != 0) { > + KASSERT((p->p_flag2 & P2_NOTRACE) != 0, > + ("dandling P2_NOTRACE_EXEC")); > + if (td->td_proc != p) > + return (EPERM); > + p->p_flag2 &= ~P2_NOTRACE_EXEC; > + } else { > + p->p_flag2 |= P2_NOTRACE; > + } > + break; > + default: > + return (EINVAL); > + } > + return (0); > +} > + > +static int > +trace_status(struct thread *td, struct proc *p, int *data) > +{ > + > + if ((p->p_flag2 & P2_NOTRACE) != 0) { > + KASSERT((p->p_flag & P_TRACED) == 0, > + ("%d traced but tracing disabled", p->p_pid)); > + *data = -1; > + } else if ((p->p_flag & P_TRACED) != 0) { > + *data = p->p_pptr->p_pid; > + } else { > + *data = 0; > + } > + return (0); > +} > + > #ifndef _SYS_SYSPROTO_H_ > struct procctl_args { > idtype_t idtype; > @@ -302,6 +358,7 @@ sys_procctl(struct thread *td, struct procctl_args *uap) > > switch (uap->com) { > case PROC_SPROTECT: > + case PROC_TRACE_ENABLE: > error = copyin(uap->data, &flags, sizeof(flags)); > if (error != 0) > return (error); > @@ -328,6 +385,9 @@ sys_procctl(struct thread *td, struct procctl_args *uap) > return (error); > data = &x.rk; > break; > + case PROC_TRACE_STATUS: > + data = &flags; > + break; > default: > return (EINVAL); > } > @@ -342,6 +402,10 @@ sys_procctl(struct thread *td, struct procctl_args *uap) > if (error == 0) > error = error1; > break; > + case PROC_TRACE_STATUS: > + if (error == 0) > + error = copyout(&flags, uap->data, sizeof(flags)); > + break; > } > return (error); > } > @@ -364,6 +428,10 @@ kern_procctl_single(struct thread *td, struct proc *p, int com, void *data) > return (reap_getpids(td, p, data)); > case PROC_REAP_KILL: > return (reap_kill(td, p, data)); > + case PROC_TRACE_ENABLE: > + return (trace_enable(td, p, *(int *)data)); > + case PROC_TRACE_STATUS: > + return (trace_status(td, p, data)); > default: > return (EINVAL); > } > @@ -375,6 +443,7 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data) > struct pgrp *pg; > struct proc *p; > int error, first_error, ok; > + bool tree_locked; > > switch (com) { > case PROC_REAP_ACQUIRE: > @@ -382,6 +451,7 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data) > case PROC_REAP_STATUS: > case PROC_REAP_GETPIDS: > case PROC_REAP_KILL: > + case PROC_TRACE_STATUS: > if (idtype != P_PID) > return (EINVAL); > } > @@ -391,11 +461,17 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data) > case PROC_REAP_STATUS: > case PROC_REAP_GETPIDS: > case PROC_REAP_KILL: > + case PROC_TRACE_ENABLE: > sx_slock(&proctree_lock); > + tree_locked = true; > break; > case PROC_REAP_ACQUIRE: > case PROC_REAP_RELEASE: > sx_xlock(&proctree_lock); > + tree_locked = true; > + break; > + case PROC_TRACE_STATUS: > + tree_locked = false; > break; > default: > return (EINVAL); > @@ -456,6 +532,7 @@ kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data) > error = EINVAL; > break; > } > - sx_unlock(&proctree_lock); > + if (tree_locked) > + sx_unlock(&proctree_lock); > return (error); > } > diff --git a/sys/kern/kern_prot.c b/sys/kern/kern_prot.c > index 2a667b5..a5d9596 100644 > --- a/sys/kern/kern_prot.c > +++ b/sys/kern/kern_prot.c > @@ -1714,6 +1714,10 @@ p_candebug(struct thread *td, struct proc *p) > if ((p->p_flag & P_INEXEC) != 0) > return (EBUSY); > > + /* Denied explicitely */ > + if ((p->p_flag2 & P2_NOTRACE) != 0) > + return (EPERM); > + > return (0); > } > > diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c > index a4f0f88..c4025ba 100644 > --- a/sys/kern/kern_sig.c > +++ b/sys/kern/kern_sig.c > @@ -3243,7 +3243,8 @@ coredump(struct thread *td) > MPASS((p->p_flag & P_HADTHREADS) == 0 || p->p_singlethread == td); > _STOPEVENT(p, S_CORE, 0); > > - if (!do_coredump || (!sugid_coredump && (p->p_flag & P_SUGID) != 0)) { > + if (!do_coredump || (!sugid_coredump && (p->p_flag & P_SUGID) != 0) || > + (p->p_flag2 & P2_NOTRACE) != 0) { > PROC_UNLOCK(p); > return (EFAULT); > } > diff --git a/sys/sys/proc.h b/sys/sys/proc.h > index d7a45e9..4b660d5 100644 > --- a/sys/sys/proc.h > +++ b/sys/sys/proc.h > @@ -674,6 +674,8 @@ struct proc { > > /* These flags are kept in p_flag2. */ > #define P2_INHERIT_PROTECTED 0x00000001 /* New children get P_PROTECTED. */ > +#define P2_NOTRACE 0x00000002 /* No ptrace(2) attach or coredumps. */ > +#define P2_NOTRACE_EXEC 0x00000004 /* Keep P2_NOPTRACE on exec(2). */ > > /* Flags protected by proctree_lock, kept in p_treeflags. */ > #define P_TREE_ORPHANED 0x00000001 /* Reparented, on orphan list */ > diff --git a/sys/sys/procctl.h b/sys/sys/procctl.h > index d11b2b2..503c09b 100644 > --- a/sys/sys/procctl.h > +++ b/sys/sys/procctl.h > @@ -41,6 +41,8 @@ > #define PROC_REAP_STATUS 4 /* reaping status */ > #define PROC_REAP_GETPIDS 5 /* get descendants */ > #define PROC_REAP_KILL 6 /* kill descendants */ > +#define PROC_TRACE_ENABLE 7 /* en/dis ptrace and coredumps */ > +#define PROC_TRACE_STATUS 8 /* query tracing status */ > > /* Operations for PROC_SPROTECT (passed in integer arg). */ > #define PPROT_OP(x) ((x) & 0xf) > @@ -96,6 +98,10 @@ struct procctl_reaper_kill { > #define REAPER_KILL_CHILDREN 0x00000001 > #define REAPER_KILL_SUBTREE 0x00000002 > > +#define PROC_TRACE_SET_ENABLED 1 > +#define PROC_TRACE_SET_DISABLED 2 > +#define PROC_TRACE_SET_DISABLED_EXEC 3 > + > #ifndef _KERNEL > __BEGIN_DECLS > int procctl(idtype_t, id_t, int, void *); > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" > From owner-freebsd-arch@FreeBSD.ORG Fri Jan 2 17:13:22 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5373DCE1; Fri, 2 Jan 2015 17:13:22 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CDF7B660A0; Fri, 2 Jan 2015 17:13:21 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t02HDFAU074198 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 2 Jan 2015 19:13:15 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t02HDFAU074198 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t02HDE5X074197; Fri, 2 Jan 2015 19:13:14 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 2 Jan 2015 19:13:14 +0200 From: Konstantin Belousov To: Robert Watson Subject: Re: Disabling ptrace Message-ID: <20150102171314.GS42409@kib.kiev.ua> References: <20141230111941.GE42409@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jan 2015 17:13:22 -0000 On Fri, Jan 02, 2015 at 09:16:42AM +0000, Robert Watson wrote: > On Tue, 30 Dec 2014, Konstantin Belousov wrote: > > > The question about a facility to disable introspection functionality (ptrace > > etc) for a process was asked several times. The latest query made me > > actually code the feature. Note that other systems, e.g. Linux and OSX, do > > have similar facilities. > > > > Patch is below, it provides two new procctl(2) requests. PROC_TRACE_ENABLE > > enables or disables tracing. It includes core dumping, ptrace, ktrace, > > debugging sysctls and hwpmc. PROC_TRACE_STATUS allows to get the tracing > > state. > > > > Most interesting question is how should disabling of trace behave with > > regard of fork and exec. IMO, the right model is to protect access to the > > _program_ address space, which translates to inheritance of the attribute > > for fork, and reenabling the tracing on exec. On the other hand, I > > understand that some users want to inherit the tracing disable on exec, so > > there are PROC_TRACE_SET_DISABLED and PROC_TRACE_SET_DISABLED_EXEC, the > > later makes disable to be kept after exec. > > > > Note that it is trivial for root on the host to circumvent the feature. > > I admit some curiosity about the use case as well -- if this is a security > feature, possibly this should be a property of the credential rather than the > process? Among other things, this could help implement trace limitations in > asynchronous contexts -- not that I'm aware of too many currently, but if > you don't want the I/O of a program traced in the synchronous context, I > guess it might follow that you wouldn't want that to happen asynchronously > either (etc). > > On the other hand, if this is a security feature, it doesn't really 'stand > alone' very well, as in absence of a more complete policy, simply disabling > introspection features doesn't provide guarantees, just annoyance until > someone writes a suitable script to work around it. This has been one of my > objections to securelevel, FWIW: while the various bits building it are useful > for admins (e.g., file flages), securelevel isn't a 'coherent' policy that > gives you strong integrity properties, rather, a set of manual frobs and > inconsistent design choices that mean it's very hard to imagine an admin ever > getting it to be useful without running with read-only filesystems, etc (and > even then). > > In MAC policies such as Biba and MLS, we mediate the ability to debug target > processes based on additional labels, but there framing integrity and > confidentiality policies give the restriction a security context. In the Mac > OS X version of the MAC Framework, support for disabling debugging following > execve() is slightly more mature in FreeBSD: in FreeBSD it's done based on > label change (a la credential change), but in Mac OS X it tracks a policy > entry point that decides whether the change has access-control significance > (which means you can have information-flow tracking policies that manipulate > labels but don't change functional aspects of the system, which isn't possible > in the FreeBSD version currently). There's a good argument we should pick up > the Mac OS X design choice there at some point. > > The reason I wonder about the use case is actually one to do with accurate > performance debugging of systems: ssh, ssh-agent, and friends can already > request that they not be core dumped to avoid leaking keying material to > filesystems, but having them be exempt from hwpmc, DTrace, etc, will make > understanding system and application behaviour harder -- and if they 'fail > silently' then users could suffer potentially confusing (and misleading) > results. Is the general concern here that people are turning on > tracing/profiling/etc tools and not understanding that they may be leaking > keying material? > > (Note that we do have a global safety frob to disable ptrace and friends > systemically, security.bsd.unprivileged_proc_debug, which is intended as an > easily available workaround for security vulnerabilities discovered in > process-debugging facilities, which have historically proven a bit > vulnerability-prone across all operating systems I'm aware of. We should > remember this next time we cut an advisory on a ptrace/ktrace/ec > vulnerability.) Use case does not have anything with security. It is obfuscation and making the introspection harder, for some dynamic scope. AFAIU, it is the same as the usage of the similar facilities in Linux and OSX. Everybody involved in conversation understood the inherit limitations of the reliability of the knob. It is usual cost of implementing vs. cost of breaking protection scheme. and since exactly this bit of reverse-engineering defense is asked for many times, I do not see it much wrong in falling into conformity with other OSes. I did not intended the PROC_TRACE_ENABLE applied to the programs manipulating the keying material. I do not see any 'protection' the ban could add there, user already posses an access to his own keys. Even if you mean access to the temporal session keys, generated by for the conversation instance, as opposed to the private key material used to establish sessions, I do not see why would system need to actively prevent user from gaining access to it. Inter-credential barriers must provide adequate separation. I should note that I very much dislike Ubuntu-style vandalization of the ptrace(2) API. AFAIK, it only allows PT_ATTACH to the (indirect) child of the debugger. WRT putting the trace-disabled indicator into the credentials, instead of just having it as a process attribute, I am not sure that this is useful thing. Do you propose to modify existing ucred for the process, effectively applying the notrace policy to the whole set of processes sharing the credential, or do COW for ucred on disable ? If not COW, then the set of the affected processes is somewhat random and too wide. If COW, I do not see much practical difference with the per-process bit, esp. due to the execve(2) handling. Also, I am puzzled by reference to an async context. Could you provide explicit examples where we do such tracing ? From owner-freebsd-arch@FreeBSD.ORG Sat Jan 3 13:37:36 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C050A586 for ; Sat, 3 Jan 2015 13:37:36 +0000 (UTC) Received: from cyrus.watson.org (cyrus.watson.org [198.74.231.69]) by mx1.freebsd.org (Postfix) with ESMTP id 9A486179E for ; Sat, 3 Jan 2015 13:37:36 +0000 (UTC) Received: from [10.0.1.21] (host86-132-107-174.range86-132.btcentralplus.com [86.132.107.174]) by cyrus.watson.org (Postfix) with ESMTPSA id 0F77446B0D; Sat, 3 Jan 2015 08:37:34 -0500 (EST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.1 \(1993\)) Subject: Re: Disabling ptrace From: Robert Watson In-Reply-To: <20150102171314.GS42409@kib.kiev.ua> Date: Sat, 3 Jan 2015 13:37:33 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: <179DAA4D-3526-446C-A0A2-9F7DA137293F@FreeBSD.org> References: <20141230111941.GE42409@kib.kiev.ua> <20150102171314.GS42409@kib.kiev.ua> To: Konstantin Belousov X-Mailer: Apple Mail (2.1993) Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jan 2015 13:37:36 -0000 On 2 Jan 2015, at 17:13, Konstantin Belousov = wrote: > WRT putting the trace-disabled indicator into the credentials, instead > of just having it as a process attribute, I am not sure that this is > useful thing. Do you propose to modify existing ucred for the = process, > effectively applying the notrace policy to the whole set of processes > sharing the credential, or do COW for ucred on disable ? If not COW, > then the set of the affected processes is somewhat random and too = wide. > If COW, I do not see much practical difference with the per-process > bit, esp. due to the execve(2) handling. >=20 > Also, I am puzzled by reference to an async context. > Could you provide explicit examples where we do such tracing ? It=E2=80=99s less about what we do do, and more about what we perhaps = should be doing. I had two things vaguely in mind, neither of which is a = perfect match: (1) Normally, we expose the data from synchronous read(), write(), = send(), and recv() operations using ktrace =E2=80=94 but we don=E2=80=99t = currently do this for asynchronous I/O calls. We likely should, in which = case I was pondering which of the asynchronously available bits of state = we might want to use for it. In principle, we have both the calling = process and credential available, so there, the process would be fine. (2) For other kinds of tracing =E2=80=94 specifically, for the audit = trail =E2=80=94 we make use of saved credentials in a number of cases, = and store audit-related filtering data (not so dissimilar from a = no-trace flag in some ways) in the credential for this reason. While you = are not proposing limiting auditing using this process flag, I think = there are a number of structural similarities here that could suggest a = similar approach. In general, we had always planned to allow auditing of far more = asynchronous events than we currently do =E2=80=94 e.g., firewall events = triggered asynchronously by system-call behaviour. We also had loose = plans to allow auditing of NFS-originated RPCs, although those are = arguably fairly synchronous and not so dissimilar to system calls in = various ways. I=E2=80=99m OK with putting the flag on the process, but frequently the = process credential is where we stick security-related subject/object = flags... Robert= From owner-freebsd-arch@FreeBSD.ORG Sat Jan 3 14:25:41 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DD835147; Sat, 3 Jan 2015 14:25:40 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7F2FA1F1B; Sat, 3 Jan 2015 14:25:40 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t03EPZIC019099 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 3 Jan 2015 16:25:35 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t03EPZIC019099 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t03EPZ1H019047; Sat, 3 Jan 2015 16:25:35 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 3 Jan 2015 16:25:35 +0200 From: Konstantin Belousov To: Robert Watson Subject: Re: Disabling ptrace Message-ID: <20150103142535.GW42409@kib.kiev.ua> References: <20141230111941.GE42409@kib.kiev.ua> <20150102171314.GS42409@kib.kiev.ua> <179DAA4D-3526-446C-A0A2-9F7DA137293F@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <179DAA4D-3526-446C-A0A2-9F7DA137293F@FreeBSD.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jan 2015 14:25:41 -0000 On Sat, Jan 03, 2015 at 01:37:33PM +0000, Robert Watson wrote: > On 2 Jan 2015, at 17:13, Konstantin Belousov wrote: > > > WRT putting the trace-disabled indicator into the credentials, instead > > of just having it as a process attribute, I am not sure that this is > > useful thing. Do you propose to modify existing ucred for the process, > > effectively applying the notrace policy to the whole set of processes > > sharing the credential, or do COW for ucred on disable ? If not COW, > > then the set of the affected processes is somewhat random and too wide. > > If COW, I do not see much practical difference with the per-process > > bit, esp. due to the execve(2) handling. > > > > Also, I am puzzled by reference to an async context. > > Could you provide explicit examples where we do such tracing ? > > It???s less about what we do do, and more about what we perhaps should be doing. I had two things vaguely in mind, neither of which is a perfect match: > > (1) Normally, we expose the data from synchronous read(), write(), send(), and recv() operations using ktrace ??? but we don???t currently do this for asynchronous I/O calls. We likely should, in which case I was pondering which of the asynchronously available bits of state we might want to use for it. In principle, we have both the calling process and credential available, so there, the process would be fine. > > (2) For other kinds of tracing ??? specifically, for the audit trail ??? we make use of saved credentials in a number of cases, and store audit-related filtering data (not so dissimilar from a no-trace flag in some ways) in the credential for this reason. While you are not proposing limiting auditing using this process flag, I think there are a number of structural similarities here that could suggest a similar approach. > > In general, we had always planned to allow auditing of far more asynchronous events than we currently do ??? e.g., firewall events triggered asynchronously by system-call behaviour. We also had loose plans to allow auditing of NFS-originated RPCs, although those are arguably fairly synchronous and not so dissimilar to system calls in various ways. Isn't allowing a process to exempt itself from aduting a real security bug ? > > I???m OK with putting the flag on the process, but frequently the process credential is where we stick security-related subject/object flags... Should I interpret the statement as agreement, in principle, with the patch ? From owner-freebsd-arch@FreeBSD.ORG Sat Jan 3 16:13:09 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CFC6B7EE; Sat, 3 Jan 2015 16:13:09 +0000 (UTC) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailhost.stack.nl", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 90B383F4F; Sat, 3 Jan 2015 16:13:09 +0000 (UTC) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id A8E73359306; Sat, 3 Jan 2015 17:13:06 +0100 (CET) Received: by snail.stack.nl (Postfix, from userid 1677) id 8E32128494; Sat, 3 Jan 2015 17:13:06 +0100 (CET) Date: Sat, 3 Jan 2015 17:13:06 +0100 From: Jilles Tjoelker To: Garrett Wollman Subject: futimens/utimensat (was: Re: Change default VFS timestamp precision?) Message-ID: <20150103161306.GB46373@stack.nl> References: <201412161348.41219.jhb@freebsd.org> <77322.1418933100@critter.freebsd.dk> <77371.1418933642@critter.freebsd.dk> <7567696.mqJ3jgzJgL@ralph.baldwin.cx> <82135.1419010861@critter.freebsd.dk> <20141219194800.GA29107@stack.nl> <201412192012.sBJKC1rW086109@hergotha.csail.mit.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201412192012.sBJKC1rW086109@hergotha.csail.mit.edu> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: portmgr@FreeBSD.org, pluknet@FreeBSD.org, freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jan 2015 16:13:09 -0000 On Fri, Dec 19, 2014 at 03:12:01PM -0500, Garrett Wollman wrote: > In article <20141219194800.GA29107@stack.nl>, jilles@stack.nl writes: > >Because there is no API to set timestamps with nanosecond resolution, > >and therefore a cp -p copy of a file will appear older than the original > >with 99.9% probability. I think that is undesirable. > But that's something we can easily fix -- and should have done, years > ago. Why don't we just *do* that? > Of course, in the case of NFS clients, where this issue is most > severe, the RPCs are already defined. The underlying VOP_SETATTR has > no trouble with nanoseconds, either. It's just a matter of providing > a standard library interface (and associated system call(s)) to do it, > and since Linux has already implemented this, we can just implement > that interface and applications will get it "for free". OK, I dusted off the old patch from pluknet@ and added many necessary things. Please review at https://reviews.freebsd.org/D1426 I added a compatibility wrapper, mainly to save portmgr some work. Of course, this does not add to the old kernel a version of lutimes() that works relative to a file descriptor. I also have changes for cp, mv and more utilities, but that's a different patch. There is also contrib code that either only supports old calls or is explicitly configured to use only old calls. -- Jilles Tjoelker From owner-freebsd-arch@FreeBSD.ORG Sat Jan 3 16:32:55 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1B56DBCE; Sat, 3 Jan 2015 16:32:55 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 99B721209; Sat, 3 Jan 2015 16:32:54 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t03GWnAE047306 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 3 Jan 2015 18:32:49 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t03GWnAE047306 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t03GWnXP047305; Sat, 3 Jan 2015 18:32:49 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 3 Jan 2015 18:32:49 +0200 From: Konstantin Belousov To: Robert Watson Subject: Re: Disabling ptrace Message-ID: <20150103163249.GX42409@kib.kiev.ua> References: <20141230111941.GE42409@kib.kiev.ua> <20150102171314.GS42409@kib.kiev.ua> <179DAA4D-3526-446C-A0A2-9F7DA137293F@FreeBSD.org> <20150103142535.GW42409@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150103142535.GW42409@kib.kiev.ua> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jan 2015 16:32:55 -0000 On Sat, Jan 03, 2015 at 04:25:35PM +0200, Konstantin Belousov wrote: > On Sat, Jan 03, 2015 at 01:37:33PM +0000, Robert Watson wrote: > > I???m OK with putting the flag on the process, but frequently the > > process credential is where we stick security-related subject/object > > flags... Hm, credentials store the rights of the subject, related to the credentials (am I using the correct terminology ?). While the no-trace attribute is not rights, it is very similar to e.g. DAC or ACL on the files, which are stored in inode. No-trace is an attribute of the process, and by the DAC analogy, should be stored in the object which is protected. In other words, we do not disallow some user to do attach with ptrace, but mark some process as not attachable. From owner-freebsd-arch@FreeBSD.ORG Sat Jan 3 17:12:08 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B52D5A64 for ; Sat, 3 Jan 2015 17:12:08 +0000 (UTC) Received: from mail-wg0-x22c.google.com (mail-wg0-x22c.google.com [IPv6:2a00:1450:400c:c00::22c]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4B47B648E9 for ; Sat, 3 Jan 2015 17:12:08 +0000 (UTC) Received: by mail-wg0-f44.google.com with SMTP id b13so25660719wgh.31 for ; Sat, 03 Jan 2015 09:12:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:content-type; bh=JCDADHFA4sha4ReO0PQ29CN4P4zrdKs3uRgayggxE/8=; b=AdmY1fvaOIyg0PYsdX6fR2uGinGdlHs8/XbziTaxEVv3Y+jFQV83g5oysVU9dNSSkt VK2+7hxtwezr1YFzjb6ZhcoVqyMD1PzyvBmUrFoJfIxV4egLGVsyy1bpNSlg4QcEwjdh e9IwFhpx2l+RFvvovC8P1X9qUNkLOs80GmF751p0rPz0cOmDKH6AVYb0xQ6Pg0K9utPU JyN5CFpDJlBli/IH6G+ahhPtNh/DxublkDXUSdgzfZz70+YuXFGGbjzKgKq2wpTd5qT4 zH6HSx7877FtTPxSPa995ViEcDSnaUHwnCGiyy0G93dj+MGpOLJFKVVbH2L65wKuWlQa Ft5Q== MIME-Version: 1.0 X-Received: by 10.180.80.163 with SMTP id s3mr8341099wix.59.1420305126746; Sat, 03 Jan 2015 09:12:06 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.216.41.136 with HTTP; Sat, 3 Jan 2015 09:12:06 -0800 (PST) Date: Sat, 3 Jan 2015 09:12:06 -0800 X-Google-Sender-Auth: sYQD9CasKHd9-AJAJTVO1o4vjR4 Message-ID: Subject: odd behaviour with graphics, vt(4), VGA, and probing the text region leading to NMIs at boot From: Adrian Chadd To: "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jan 2015 17:12:08 -0000 Hi, I found -head didn't boot on a T400 I have here with external radeon graphics (ie, not using the intel graphics.) I was getting a memory parity error NMI upon boot, shortly after probing option ROMs. It turns out that it's the ISA device probe/attach path trying to attach the VGA adapter, and doing testing of the memory regions at 0xb800:0000 (mono text) and 0xb000:0000 (colour text) - yes, I'm using the 8086 real mode notation because I'm talking about ISA graphics cards. If I boot -HEAD with vt(4) in graphics mode, then it's using 640x480 graphics and that's mapped at 0xa000:0000 for 64k. There apparently isn't anything at 0xb800:0000 or 0xb000:0000 and touching those regions leads to an NMI. Now, the tricksy bit is that it's not accessing those addresses directly - it's going via the physical map, that's mapped it at 0xc00b8000 and 0xc00b0000 - yes, those are 32 bit (virtual, kernel mode) addresses. I don't know if this is part of the problem. If I boot -HEAD with vt(4) in text mode (hw.vga.textmode=1) then it boots fine. So - has anyone else seen/debugged this? It doesn't happen on everything - just this one T400 I have. But I can't help but wonder what else is going to get finnicky about ISA probing like this. From owner-freebsd-arch@FreeBSD.ORG Sat Jan 3 17:27:39 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8E7D2A32; Sat, 3 Jan 2015 17:27:39 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 7C8C764B5F; Sat, 3 Jan 2015 17:27:39 +0000 (UTC) Received: from [10.0.1.110] (c-76-21-10-192.hsd1.ca.comcast.net [76.21.10.192]) by elvis.mu.org (Postfix) with ESMTPSA id 69CBE341F854; Sat, 3 Jan 2015 09:27:38 -0800 (PST) Subject: Re: odd behaviour with graphics, vt(4), VGA, and probing the text region leading to NMIs at boot Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=us-ascii From: Alfred Perlstein In-Reply-To: Date: Sat, 3 Jan 2015 09:31:22 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <2F715AD9-CF34-4516-B75E-E14C77BFD2E3@mu.org> References: To: Adrian Chadd X-Mailer: Apple Mail (2.1283) Cc: "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jan 2015 17:27:39 -0000 Sounds like the virtualization code is sensitive to the mode. Meaning = that whatever is virtualizing the graphics card doesn't do the proper = thing when you're in a graphical mode. Can't the probes be turned off? Basically when the "card" is in gfx mode, it doesn't do the right thing = for those memory addresses. Sounds like the fix would be to not touch = those regions unless you're actually in text mode. =20 On Jan 3, 2015, at 9:12 AM, Adrian Chadd wrote: > Hi, >=20 > I found -head didn't boot on a T400 I have here with external radeon > graphics (ie, not using the intel graphics.) >=20 > I was getting a memory parity error NMI upon boot, shortly after > probing option ROMs. >=20 > It turns out that it's the ISA device probe/attach path trying to > attach the VGA adapter, and doing testing of the memory regions at > 0xb800:0000 (mono text) and 0xb000:0000 (colour text) - yes, I'm using > the 8086 real mode notation because I'm talking about ISA graphics > cards. >=20 > If I boot -HEAD with vt(4) in graphics mode, then it's using 640x480 > graphics and that's mapped at 0xa000:0000 for 64k. There apparently > isn't anything at 0xb800:0000 or 0xb000:0000 and touching those > regions leads to an NMI. >=20 > Now, the tricksy bit is that it's not accessing those addresses > directly - it's going via the physical map, that's mapped it at > 0xc00b8000 and 0xc00b0000 - yes, those are 32 bit (virtual, kernel > mode) addresses. I don't know if this is part of the problem. >=20 > If I boot -HEAD with vt(4) in text mode (hw.vga.textmode=3D1) then it = boots fine. >=20 > So - has anyone else seen/debugged this? It doesn't happen on > everything - just this one T400 I have. But I can't help but wonder > what else is going to get finnicky about ISA probing like this. > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to = "freebsd-arch-unsubscribe@freebsd.org" >=20 From owner-freebsd-arch@FreeBSD.ORG Sat Jan 3 17:35:02 2015 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 71334C1A; Sat, 3 Jan 2015 17:35:02 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 5EECD64CC2; Sat, 3 Jan 2015 17:35:02 +0000 (UTC) Received: from [10.0.1.110] (c-76-21-10-192.hsd1.ca.comcast.net [76.21.10.192]) by elvis.mu.org (Postfix) with ESMTPSA id 01B0B341F853; Sat, 3 Jan 2015 09:35:01 -0800 (PST) Subject: Re: odd behaviour with graphics, vt(4), VGA, and probing the text region leading to NMIs at boot Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=us-ascii From: Alfred Perlstein In-Reply-To: <2F715AD9-CF34-4516-B75E-E14C77BFD2E3@mu.org> Date: Sat, 3 Jan 2015 09:38:46 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: References: <2F715AD9-CF34-4516-B75E-E14C77BFD2E3@mu.org> To: Adrian Chadd X-Mailer: Apple Mail (2.1283) Cc: "freebsd-arch@freebsd.org" X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jan 2015 17:35:02 -0000 Another option might be to try forcing text mode, then switching to gfx = then trying to write to the b800/b000 memory region. See if that "initializes" things properly. -Alfred On Jan 3, 2015, at 9:31 AM, Alfred Perlstein wrote: > Sounds like the virtualization code is sensitive to the mode. Meaning = that whatever is virtualizing the graphics card doesn't do the proper = thing when you're in a graphical mode. Can't the probes be turned off? >=20 > Basically when the "card" is in gfx mode, it doesn't do the right = thing for those memory addresses. Sounds like the fix would be to not = touch those regions unless you're actually in text mode. =20 >=20 > On Jan 3, 2015, at 9:12 AM, Adrian Chadd wrote: >=20 >> Hi, >>=20 >> I found -head didn't boot on a T400 I have here with external radeon >> graphics (ie, not using the intel graphics.) >>=20 >> I was getting a memory parity error NMI upon boot, shortly after >> probing option ROMs. >>=20 >> It turns out that it's the ISA device probe/attach path trying to >> attach the VGA adapter, and doing testing of the memory regions at >> 0xb800:0000 (mono text) and 0xb000:0000 (colour text) - yes, I'm = using >> the 8086 real mode notation because I'm talking about ISA graphics >> cards. >>=20 >> If I boot -HEAD with vt(4) in graphics mode, then it's using 640x480 >> graphics and that's mapped at 0xa000:0000 for 64k. There apparently >> isn't anything at 0xb800:0000 or 0xb000:0000 and touching those >> regions leads to an NMI. >>=20 >> Now, the tricksy bit is that it's not accessing those addresses >> directly - it's going via the physical map, that's mapped it at >> 0xc00b8000 and 0xc00b0000 - yes, those are 32 bit (virtual, kernel >> mode) addresses. I don't know if this is part of the problem. >>=20 >> If I boot -HEAD with vt(4) in text mode (hw.vga.textmode=3D1) then it = boots fine. >>=20 >> So - has anyone else seen/debugged this? It doesn't happen on >> everything - just this one T400 I have. But I can't help but wonder >> what else is going to get finnicky about ISA probing like this. >> _______________________________________________ >> freebsd-arch@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-arch >> To unsubscribe, send any mail to = "freebsd-arch-unsubscribe@freebsd.org" >>=20 >=20 > _______________________________________________ > freebsd-arch@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-arch > To unsubscribe, send any mail to = "freebsd-arch-unsubscribe@freebsd.org" >=20 From owner-freebsd-arch@FreeBSD.ORG Sat Jan 3 17:38:21 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 73A16D9B for ; Sat, 3 Jan 2015 17:38:21 +0000 (UTC) Received: from cyrus.watson.org (cyrus.watson.org [198.74.231.69]) by mx1.freebsd.org (Postfix) with ESMTP id 47C1E64D26 for ; Sat, 3 Jan 2015 17:38:21 +0000 (UTC) Received: from [10.0.1.17] (host86-132-107-174.range86-132.btcentralplus.com [86.132.107.174]) by cyrus.watson.org (Postfix) with ESMTPSA id 9323846B0D; Sat, 3 Jan 2015 12:38:19 -0500 (EST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 8.1 \(1993\)) Subject: Re: Disabling ptrace From: "Robert N. M. Watson" In-Reply-To: <20150103163249.GX42409@kib.kiev.ua> Date: Sat, 3 Jan 2015 17:38:18 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: References: <20141230111941.GE42409@kib.kiev.ua> <20150102171314.GS42409@kib.kiev.ua> <179DAA4D-3526-446C-A0A2-9F7DA137293F@FreeBSD.org> <20150103142535.GW42409@kib.kiev.ua> <20150103163249.GX42409@kib.kiev.ua> To: Konstantin Belousov X-Mailer: Apple Mail (2.1993) Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jan 2015 17:38:21 -0000 On 3 Jan 2015, at 16:32, Konstantin Belousov = wrote: >=20 > On Sat, Jan 03, 2015 at 04:25:35PM +0200, Konstantin Belousov wrote: >> On Sat, Jan 03, 2015 at 01:37:33PM +0000, Robert Watson wrote: >>> I???m OK with putting the flag on the process, but frequently the >>> process credential is where we stick security-related subject/object >>> flags... > Hm, credentials store the rights of the subject, related to the > credentials (am I using the correct terminology ?). While the no-trace > attribute is not rights, it is very similar to e.g. DAC or ACL on the > files, which are stored in inode. No-trace is an attribute of the > process, and by the DAC analogy, should be stored in the object which = is > protected. >=20 > In other words, we do not disallow some user to do attach with ptrace, > but mark some process as not attachable. Processes are different from most other kernels objects in that they are = both subjects and objects of operations. While subject 'credentials' in = the classic UNIX model (UIDs, GIDs, additional groups) differ from = object metadata (e.g., user/group/permissions), for other models the = same data structures are used for both the subject and object (e.g., for = most labeled MAC policies). When we do inter-process access control, the = credential of the target process is used for most aspects of protection, = just as file ownership/permissions would be, so really are its object = properties as much as its subject properties. Robert= From owner-freebsd-arch@FreeBSD.ORG Sat Jan 3 17:51:03 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 913F9258 for ; Sat, 3 Jan 2015 17:51:03 +0000 (UTC) Received: from cyrus.watson.org (cyrus.watson.org [198.74.231.69]) by mx1.freebsd.org (Postfix) with ESMTP id 68C8864F23 for ; Sat, 3 Jan 2015 17:51:03 +0000 (UTC) Received: from [10.0.1.21] (host86-132-107-174.range86-132.btcentralplus.com [86.132.107.174]) by cyrus.watson.org (Postfix) with ESMTPSA id 500BE46B3F; Sat, 3 Jan 2015 12:51:02 -0500 (EST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.1 \(1993\)) Subject: Re: Disabling ptrace From: Robert Watson In-Reply-To: <20150103142535.GW42409@kib.kiev.ua> Date: Sat, 3 Jan 2015 17:51:00 +0000 Content-Transfer-Encoding: quoted-printable Message-Id: <4DC2F5F5-1C46-453C-80C9-0BCC8884A1A1@FreeBSD.org> References: <20141230111941.GE42409@kib.kiev.ua> <20150102171314.GS42409@kib.kiev.ua> <179DAA4D-3526-446C-A0A2-9F7DA137293F@FreeBSD.org> <20150103142535.GW42409@kib.kiev.ua> To: Konstantin Belousov X-Mailer: Apple Mail (2.1993) Cc: arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jan 2015 17:51:03 -0000 On 3 Jan 2015, at 14:25, Konstantin Belousov = wrote: >=20 >> In general, we had always planned to allow auditing of far more = asynchronous events than we currently do ??? e.g., firewall events = triggered asynchronously by system-call behaviour. We also had loose = plans to allow auditing of NFS-originated RPCs, although those are = arguably fairly synchronous and not so dissimilar to system calls in = various ways. >=20 > Isn't allowing a process to exempt itself from aduting a real security = bug ? Oh, definitely. This was an example, however, of more asynchronous = tracing types and events, where having access to the =E2=80=98tracing = disabled=E2=80=99 state of the originating process might prove = important. For example, if we extended ktrace to support tracing some of = the same sorts of asynchronous events, where full process context = isn=E2=80=99t available, but the events can be cleanly tied back to the = initiating process via a saved credential. >> I???m OK with putting the flag on the process, but frequently the = process credential is where we stick security-related subject/object = flags... >=20 > Should I interpret the statement as agreement, in principle, with the = patch ? As long as it is clearly and carefully documented in the man page that = this is a non-security feature, I=E2=80=99m alright with it being = brought in. We might want to think about how tools such as DTrace, etc, = should report tracing failures when the flag is set. Robert= From owner-freebsd-arch@FreeBSD.ORG Sat Jan 3 21:28:40 2015 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E6D75985; Sat, 3 Jan 2015 21:28:40 +0000 (UTC) Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "mailhost.stack.nl", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 76E153139; Sat, 3 Jan 2015 21:28:40 +0000 (UTC) Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131]) by mx1.stack.nl (Postfix) with ESMTP id D964E3592DF; Sat, 3 Jan 2015 22:28:37 +0100 (CET) Received: by snail.stack.nl (Postfix, from userid 1677) id BDAF028494; Sat, 3 Jan 2015 22:28:37 +0100 (CET) Date: Sat, 3 Jan 2015 22:28:37 +0100 From: Jilles Tjoelker To: Konstantin Belousov Subject: Re: Fixing dlopen("libpthread.so") Message-ID: <20150103212837.GC46373@stack.nl> References: <20141226165337.GJ1754@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141226165337.GJ1754@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: threads@freebsd.org, arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Jan 2015 21:28:41 -0000 On Fri, Dec 26, 2014 at 06:53:37PM +0200, Konstantin Belousov wrote: > It is somewhat well-known that our libthr.so cannot be loaded > dynamically into the process. Or rather, it can be, but the > consequences are catastrophic. We recommend to link any program which > may load modules, explicitely with -lpthread; the known workaround is > to do LD_PRELOAD=libthr.so.3 for binaries which were not. I > implemented support for ld -z nodlopen some time ago, but attempt to > mark libthr.so as non-loadable caused extreme roar. > A common opinion is that the proper way to fix the problem is > to merge the actual code from libthr into libc, leaving libthr as the > filter to preserve the current ABI. Unfortunately, there are some > non-trivial and undesirable consequences of doing this. > First, all pthread mutexes (and other kind of locks) would become > fully initialized and used even for single-threaded programs, at least > I do not see a way to work around this. Right now, libc shims for > pthread_mutex_init() and pthread_mutex_lock(3) are nop. After the > merge, init needs to allocate memory and lock/unlock operations, > although uncontested, will start costing one atomic each. In > particular, malloc(3) and stdio(3) are affected. > Another very delicate issue is introducing unwanted cancellation > points into libc functions after libthr wrappers become mandatory. > This is fixable, but requires lot of mundane work and probably a long > time to find missed places (i.e. bugs). > There are probably more problems, and this brings an obvious > alternative: fix the issues which make dlopen("libthr.so") so > destructive. > One known show-stopper is the broken errno after the load. The libthr > provides the interposer for the errno and all cancellable functions > from libc. If any interposed symbols have been resolved before the > libthr.so was loaded, or non-lazy binding mode is requested, the > bindings cannot be undonde. In particular, references to __error(), > which implements errno, are bound to return locate of the main thread > errno variable. Similarly, code referencing cancellable functions > still gets the uncancellable libc implementations of them. > Another issue is the recursion between malloc(3) and mutex_init(). > The statically initialized pthread_mutex_t needs some further > initialization before first use. Jemalloc calls pthread_mutex_init(3) > for internally-used mutexes, which is nop stub from libc until libthr > is loaded. After the load, first use of any mutex by malloc(3) leads > to the thr_mutex.c initialization code, which needs calloc(3). This > immediately leads to hang due to recursion on some internal libthr > umtx. Making the lock recursive does not solve the problem, which is > the infinite mutual recursion between malloc and pthread_mutex_lock() > for uninitialized malloc mutex. > Yet another issue is the signal handlers. The libthr routes signal > delivery through its internal signal handler, to avoid interrupting > critical sections. Any signal handler installed prior to libthr is > loaded misses the wrapper, potentially breaking cancellation and > critical sections. > Proposed patch does the following: > - Remove libthr interposers of the libc functions, including > __error(). Instead, functions calls are indirected through the > interposing table, similar to how pthread stubs in libc are already > done. Libc by default points either to syscall trampolines or to > existing libc implementations. On libthr load, it rewrites the > pointers to the cancellable implementations already in libthr. > - Postpone the malloc(3) internal mutexes initialization until libthr > is loaded. > - Reinstall signal handlers with wrapper on libthr load. > The signal handler reinstallation on libthr initialization is only > needed when libthr.so is dlopened. Performing 128*2 sigaction(2) > calls on the startup of the binary which is linked to libthr, and thus > libthr is guaranteed to install proper sighandler wrappers, is huge > waste. So, I perform the hand-over of signal handlers only for the > dlopen-ed libthr, which now needs to detect loading at startup > vs. dlopen. I was unable to distinguish the cases using existing > facilities, so new private rtld interface is implemented, > _rtld_is_dlopened(), to query the way library was brought into the > process address space. > Without some special measures, static binaries would pull in the whole > set of the interposed syscalls due to references from the > interposition table. To fix it, the references are made weak. Also, > to not pull in the pthread stubs, the interposition table is separate > from pthreads stubs indirection table. > The patch is available at > https://www.kib.kiev.ua/kib/libthr_dlopen.1.patch . > Among other things, I tested it with the program illustrating the > issues https://www.kib.kiev.ua/kib/threaded_errno.c . > Note that you must use matching versions of rtld, libc and libthr. > Using old ld-elf.so.1 or old libc.so.7 with new libthr.so.3 will > break the system. I think this can work, conceptually. Does this also mean that the order of libc and libthr on the link line no longer matters? (so SVN r265003 could be reverted?) It looks like this change may introduce more indirection in common paths which might harm performance. The effect is slight and on the other hand more single-threaded applications can benefit from non-libthr performance. I noticed a problem in lib/libc/gen/raise.c: it calls the INTERPOS_pause entry instead of the INTERPOS_raise entry. Alternatively, libc could call thr_self() and thr_kill() so libthr need not do anything (much like how lib/libc/gen/sem_new.c works). The number of interposers could be reduced slightly by doing simple work like wait(), waitpid() and wait3() before instead of after interposition. This would be an additional change and the current patch is more like the existing code. In lib/libc/sys/__error.c, __error_selector should be declared static or __hidden and an accessor function added for libthr, to avoid an additional pointer chase in a common path. One more PLT and GOT reference in some situations may be avoided by making a __hidden alias for __error_unthreaded and making that __error_selector's initial value. Although __libc_interposing is properly not exported (with __libc_interposing_slot accessor), the full benefit is not obtained because __libc_interposing is not declared __hidden in lib/libc/include/libc_private.h, so the compiler will generate an indirect access anyway (and ld will fix it up using a relative relocation). In lib/libthr/thread/thr_private.h, __error_threaded() can be declared __hidden. Sharing __sys_* code between libc and libthr is a net loss (the two copies of the name, GOT entry, PLT entry and longer instruction sequences are almost certainly larger than a syscall stub (32 bytes on amd64)), so I think the best fix for most indirection from __libc_interposing entries is to give libthr its own copy and mark both copies hidden, instead of adding hidden aliases now. -- Jilles Tjoelker