From owner-svn-src-head@freebsd.org Tue Nov 28 16:26:07 2017 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0BBB4DBB94E; Tue, 28 Nov 2017 16:26:07 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by mx1.freebsd.org (Postfix) with ESMTP id 8144A6C450; Tue, 28 Nov 2017 16:26:06 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id 87926102316; Wed, 29 Nov 2017 02:56:03 +1100 (AEDT) Date: Wed, 29 Nov 2017 02:56:02 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Andrew Turner cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r326312 - head/sys/arm64/arm64 In-Reply-To: <201711281104.vASB4lTu024913@repo.freebsd.org> Message-ID: <20171128234051.L1761@besplex.bde.org> References: <201711281104.vASB4lTu024913@repo.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=cK6QihWN c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=Jb6GJt-qubM306kzQlUA:9 a=4mpojyaBOmDga0F6:21 a=EBaCygjbx1TqSPLS:21 a=CjuIK1q_8ugA:10 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 28 Nov 2017 16:26:07 -0000 On Tue, 28 Nov 2017, Andrew Turner wrote: > Log: > When we exit the kernel debugger having entered because of a breakpoint > instruction we need to jump over the instruction. Without this we will > execute the same instruction again and enter into the debugger again. > > PR: 223917 > Reported by: emaste > MFC after: 1 week > Sponsored by: DARPA, AFRL This belongs in the debugger. ddb has several macros for skipping back and forth over the breakpoint instruction, and normally returns with it skipped over. gdb presumably does the same. The macros seem to be correct for arm64. > Modified: head/sys/arm64/arm64/trap.c > ============================================================================== > --- head/sys/arm64/arm64/trap.c Tue Nov 28 09:34:43 2017 (r326311) > +++ head/sys/arm64/arm64/trap.c Tue Nov 28 11:04:47 2017 (r326312) > @@ -323,7 +323,10 @@ do_el1h_sync(struct thread *td, struct trapframe *fram > break; > } > #endif > - /* FALLTHROUGH */ > + kdb_trap(exception, 0, > + (td->td_frame != NULL) ? td->td_frame : frame); > + frame->tf_elr -= 4; This seems to have the wrong sign as well as being fundamentally wrong. Execution only decrements the pc on exotic arches, and arm64 seems to be non-exotic, since the BKPT_SKIP() instruction in db_machdep.h for skipping the pc does frame->tf_elr += BKPT_SIZE. BKPT_SKIP also does kdb_thrctx->pcb_rip += BKPT_SIZE on amd64, i386 and mips. This is missing on arm, arm64 and riscv and sparc64. BKPT_SKIP is missing on powerpc. On sparc64, BKPT_SKIP also advances a second pc by 2*BKPT_SIZE. > + break; > case EXCP_WATCHPT_EL1: > case EXCP_SOFTSTP_EL1: > #ifdef KDB Falling through was almost correct. I don't see anything wrong with BKPT_SKIP(), so the bug seems to break the usual case where ddb skips over the breakpoint. Then the above reverse the skip. Perhaps gdb is broken and advances by 8. Then the above corrects +8 to +4. Unusual cases are much more complicated and broken. Sometimes the instruction needs to be restarted or user has set the pc to an unrelated value -- then only the debugger knows what the pc should be. Old bugs in the fall-through case: - non-bug: when KDB is not configured, panic is called. This is correct, especially for the breakpoint case since the instruction stream might be corrupt in this case. - when KDB configured but no backend is attached, kdb_trap() fails and panic should be called then, as on x86, for the same reasons as when KDB is not configured. - it is a style bug to ifdef the call to kdb_trap() with KDB. When KDB is not configured, kdb_trap() fails and correct code checks for this and panics, as on x86. - non-bug: since the hardware doesn't advance the pc on arm64, the instruction stream is actually never corrupt when kdb_trap() fails - kdb_trap() can also fail when ddb is attached but the console is transiently unavailable. Then the instruction stream is corrupt in some cases, but it is wrong to turn console unavailability into a panic. This is partly fixed in my version, but fixing up the instruction stream and returning success from ddb (with the usual complications for continuing over a breakpoint). Console unavailability remains very broken as implemented -- it does little except try to give this panic but not give it in many cases. New bugs given by not falling through: - non-bug: it is missing the style bug of the KDB ifdef - it is missing the panic in the case where KDB is not configured - non-new-bug: when KDB is not configured it is missing the error check and panic as before. One case that is clearly broken now but probably worked before is when multiple CPUs hit the same breakpoint and the breakpoint is removed on one of them that is not the last one to enter ddb. Then the correct behaviour is: - enter with all pc's advanced or not over the breakpoint instruction (assume that it is a specialy instruction written into the instruction stream; this corrupts the stream and must be fixed up) - in each ddb instance, unadvance the pc if necessary to the breakpoint instruction - determine if the pc is for a breakpoint set by us. Until the breakpoint is unset by one of the ddb instances, find that it was set by us. Continue normally in this case (put back the original instruction and single-step over it. This commit seems to completely break this handling, even for a single ddb instance). Then get a single-step trap and put back the breakpoint instruction.) - there is not enough synchronization, so ddb loses control when it returns to single-step. Other ddb instances hitting the breakpoint see the original instruction while racing with the previous step. They should put it back over itself and do much the same. - eventually 1 of the ddb instances removes the breakpoint. It puts back the original instruction and continues without single-stepping. - ddb instances after this enter with a breakpoint trap but no sign of the breakpoint in the instruction stream. They must restart at where the breakpoint was. This is very broken on x86 (except in my version). I think it used to work automatically on arm64, because the hardware doesn't advance the pc. Now the software breaks it by advancing unconditionally. Here is the ddb code for fixing up the pc after a breakpoint, with my fixes applied: X bool X db_stop_at_pc(int type, int code, bool *is_breakpoint, bool *is_watchpoint) X { X db_addr_t pc; X db_breakpoint_t bkpt; X X *is_breakpoint = IS_BREAKPOINT_TRAP(type, code); X *is_watchpoint = IS_WATCHPOINT_TRAP(type, code); X pc = PC_REGS(); X if (db_pc_is_singlestep(pc)) X *is_breakpoint = false; ddb watchpoints are mostly broken and cause confusion. x86 doesn't have any, but maps x86 hardware breakpoints to ddb watchpoints. I think the latter are supposed to be in software and can be implemented using software interrupts, but x86 doesn't bother. It is unclear what they can do that breakpoint software interrupts can't do -- they can at least give a different class of breakpoints encoded by the software interrupt number. arm64 has a watchpoint exception number EXCP_WATCHPOINT_EL1. This maps to is_watchpoint = TRUE in the above. X X db_clear_single_step(); X db_clear_breakpoints(); X db_clear_watchpoints(); X X #ifdef FIXUP_PC_AFTER_BREAK X if (*is_breakpoint) { X /* X * Breakpoint trap. Fix up the PC if the X * machine requires it. X */ X FIXUP_PC_AFTER_BREAK X pc = PC_REGS(); X } X #endif FIXUP_PC_AFTER_BREAK must be defined by arches where the hardware doesn't advance the pc over breakpoint instructions in the instruction stream. On x86, it unadvances the pc by 1. On arm64, it is not defined, since the pc is already unadvanced. X X /* X * Now check for a breakpoint at this address. X */ X bkpt = db_find_breakpoint_here(pc); X if (bkpt) { X if (--bkpt->count == 0) { X bkpt->count = bkpt->init_count; X *is_breakpoint = true; X return (true); /* stop here */ X } X return (false); /* continue the countdown */ X } else if (*is_breakpoint) { X #ifdef BKPT_SKIP We unadvanced the pc to look it up in the breakpoint table. Now we must advance it over the breakpoint instruction to restart from there since it was not in our table. It should be part of the normal instruction stream. If another debugger overwrote the instruction stream and gave control to us, then that is a bug in the other debugger and we will probably crash soon. X #ifdef SMP X /* X * In the SMP case, 'bkpt' doesn't tell us if this breakpoint X * was controlled by us when we hit it, since the breakpoint X * may have been cleared during a ddb entry by another CPU X * which won the race to enter. Only skip if the breakpoint X * is still in the instruction stream. Otherwise, turn the X * breakpoint into an unknown trap except for ignoring it if X * we are continuing. X */ X BKPT_INST_TYPE memval, testval; X X memval = db_get_value(pc, BKPT_SIZE, FALSE); X testval = BKPT_SET(&testval); X if (bcmp(&memval, &testval, BKPT_SIZE) == 0) X BKPT_SKIP; X else { X *is_breakpoint = false; X if (db_run_mode == STEP_CONTINUE) X return (false); /* continue */ X } For the SMP case, the other debugger can be ddb, and this is my fix for it. X #else X BKPT_SKIP; The !SMP case is simpler. BKPT_SKIP usually just advances the pc by BKPT_SIZE, but does a contitional advance related to the current problem on mips. X #endif X #endif X } We already returned with the unadvanced pc if we are contining over a breakpoint. Advancing in the caller breaks this. Unadvancing breaks it more. X X *is_breakpoint = false; /* might be a breakpoint, but not ours */ Here we continue with the advanced pc. Again it seems to be correct on arm64, and advancing or unadvancing in the caller breaks it. Here are my fixes in the above code: X Index: db_run.c X =================================================================== X --- db_run.c (revision 325403) X +++ db_run.c (working copy) X @@ -36,6 +36,7 @@ X __FBSDID("$FreeBSD$"); X X #include X +#include X #include X #include X X @@ -129,8 +130,31 @@ X return (false); /* continue the countdown */ X } else if (*is_breakpoint) { X #ifdef BKPT_SKIP X +#ifdef SMP X + /* X + * In the SMP case, 'bkpt' doesn't tell us if this breakpoint X + * was controlled by us when we hit it, since the breakpoint X + * may have been cleared during a ddb entry by another CPU X + * which won the race to enter. Only skip if the breakpoint X + * is still in the instruction stream. Otherwise, turn the X + * breakpoint into an unknown trap except for ignoring it if X + * we are continuing. X + */ X + BKPT_INST_TYPE memval, testval; X + X + memval = db_get_value(pc, BKPT_SIZE, FALSE); X + testval = BKPT_SET(&testval); BKPT_SET() is a misimplemented macro that takes an arg which is always ignored; it is always implemented as a manifest constant (ddb macros mostly have the opposite error of looking like manifest constants but being function-like macros with no args but side effects). It is apparently meant for setting a breakpoint instruction directly in the instruction stream, but that is too hard so it is always used as above by setting a local variable and then copying that to memory using a special ddb access function. X + if (bcmp(&memval, &testval, BKPT_SIZE) == 0) X BKPT_SKIP; kib didn't like the MD'ness of this, but it is no worse than hard-coding the setting of the instruction using the accesss function because the BKPT_SET() macro is useless for applying MD settings to the instruction stream (as used, it might apply them only to a local variable)). X + else { X + *is_breakpoint = false; X + if (db_run_mode == STEP_CONTINUE) X + return (false); /* continue */ X + } X +#else X + BKPT_SKIP; X #endif X +#endif X } X X *is_breakpoint = false; /* might be a breakpoint, but not ours */ X Behaviour of various arches for advancing and unadvancing the pc after a breakpoint: - and64, i386: +-1 over 1-byte instruction. 1 is spelled as itself instead of as BKPT_SIZE in the implementation. Most implentations have bogus parentheses around literal constants. - arm: +4 and no unadvance over 4-byte instruction. 4 is spelled BKPT_SIZE and further obfuscated by spelling BKPT_SIZE as INSN_SIZE in the implementation. INSN_SIZE is defined in another header, so db_machdep.h requires more namespace pollution than it should. BKPT_INSTRUCTION is similarly obfuscated by spelling it as KERNEL_BREAKPOINT which is defined in yet another header so even more pollution is required. - arm64: like arm except without the pollution. Minor obfuscation of spelling 4 as BKPT_SIZE in the implementation. The change should use the macro if it is correct, but spells BKPT_SIZE as -4. - mips: +4 and no unadvance over 4-byte instruction. 4 is spelled BKPT_SIZE BKPT_SIZE but BKPT_SIZE is not obfuscated as on arm. BREAKPOINT_INSTRUCTION doesn't exit and shouldn't be exported as on arm*, but its literal value is more interesting. mips apparently has many software interrupts which give the same debugger exception and the instruction has to be decoded to decide which one. BKPT_SE() returns one for ddb, MIPS_BREAK_DDB, and pollution is implemented by not defining this in ddb's header file. BKPT_SKIP() is complicted to read the instruction stream much like my patch does. Although MIPS_BREAK_DDB is a single instruction, BKPT_SKIP ignores some bits in the comparision so as to skip for this and some other instructions. It makes some sense to use different software interrupts for different debuggers, but I don't see how this can fix the problem of debuggers possibly corrupting each others' instruction streams with breapoint instructions. It can help detect the problem. - powerpc: like arm64 - riscv: like arm - sparc64: like arm64, except it spells BKPT_SIZE as 4 in the implementation and has a secondary pc which it advances. amd64 and i386 are the only arches where the hardware advances over the breakpoint so that the unadvance is needed. kdb seems to be most broken for powerpc. powerpc has no advance and no unadvance. It doesn't even call kdb_reenter(). Bruce