Date:      Wed, 24 Oct 2007 00:58:29 +0200
From:      "Heiko Wundram (Beenic)" <wundram@beenic.net>
To:        freebsd-questions@freebsd.org
Subject:   Re: Mentor for C self study wanted
Message-ID:  <200710240058.29506.wundram@beenic.net>
In-Reply-To: <200710232324.09851.h.schmalzbauer@omnisec.de>
References:  <200710232044.53240.h.schmalzbauer@omnisec.de> <20071023162454.93851854.wmoran@potentialtech.com> <200710232324.09851.h.schmalzbauer@omnisec.de>

On Tuesday, 23 October 2007 23:24:09, Harald Schmalzbauer wrote:
> #include <stdio.h>
>
> void main()
> {
>   short nnote;
>
>   // Read the numeric grade value
>   printf("Please enter the numeric school grade: ");
>   scanf("%d",&nnote);

man 3 scanf (the most important thing to look at with any such problem is
the C library documentation, which is excellent on FreeBSD) says that for
"%d" the passed pointer has to be a pointer to int, which &nnote is not.
&nnote is a pointer to short, which points to 2 bytes of storage, whereas a
pointer to int points to 4 bytes (on i386 and most current platforms).

Generally, the compiler reserves nnote on the stack (as it's a local
variable) with two bytes (though this depends on your platform), and &nnote
points to the beginning of this area.

As you are probably running on a little-endian architecture, the layout that
scanf presumes is (from low to high):

---> increasing addresses
lsbyte 2 3 msbyte
^
|-- &nnote points here

of which only the first two are interpreted as nnote by the rest of the
program; the upper two are different stack content (probably a return
address to the C initialization code calling main(), or a pushed stack
pointer, or some such, as your procedure defines no other locals; see below).

Now, when scanf assigns the four bytes, it properly enters the lower two
bytes of the integer into "lsbyte 2" (which is nnote, in the same byte
order), but it also overwrites the two bytes above it.

When main() finishes, the (now broken) saved address (of which "3 msbyte" is
the lower half) is popped, which leads to the SIGSEGV you're seeing.

If you were on big-endian, the result would be different (i.e., the order
would be reversed, so that nnote would always be zero or minus one if you
entered integers that are small in absolute value), but effectively, the
return address would be overwritten as well, breaking it.

This is effectively a buffer overflow.

Just to finish this: the proper format would be "%hd", in which the length
modifier "h" signifies that the pointer is a pointer to a "short int"; this
is also documented in man 3 scanf.

Why aren't you seeing this behaviour with printf (i.e., why can you pass a
short but still specify "%d")? Because C defines that functions taking a
variable number of arguments (of which printf is one) receive those
arguments after the "default argument promotions": every integer type
narrower than int (such as char and short) is promoted to int, and float is
promoted to double. So when you pass a short to a varargs function, the
C compiler inserts code which makes sure that the value is widened to an int
on the argument stack for printf, which is exactly what "%d" expects. scanf
is also a varargs function, but you're not passing the value of nnote, but
rather a pointer to it; a pointer is passed unchanged, and the type it
points to is not adjusted in any way.

Finally, looking at (parts of) the assembly that gcc generates (on a
little-endian i386 machine):

.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp

; Set up the pointer to the local frame (EBP on i386). All locals are
; relative to EBP in a function.
        movl    %esp, %ebp

; ECX is the first (hidden) local.
        pushl   %ecx

        subl    $20, %esp
        subl    $12, %esp
        pushl   $.LC0
        call    printf
        addl    $16, %esp
        subl    $8, %esp

; Load the effective address of EBP-6, i.e., nnote, into EAX, which
; is pushed for scanf. scanf will thus write its output on EBP-6 up to
; EBP-3, where EBP-4 and EBP-3 are part of the value that's been
; pushed in the "pushl %ecx" above.
        leal    -6(%ebp), %eax

        pushl   %eax
        pushl   $.LC1
        call    scanf

...

; Restore the value at EBP-4 (i.e., the ECX that was pushed above) into
; ECX at function exit. This value has been corrupted by the integer
; assignment due to scanf.
        movl    -4(%ebp), %ecx

        leave

; Restore the stack pointer from the (invalidated) %ecx, i.e. produce a
; bogus stack pointer.
        leal    -4(%ecx), %esp

        ret

This produces a segfault after the return to the C initialization code,
simply because the stack pointer is totally bogus.

> P.S.:
> I found that declaring nnote as int solves my problem, but I couldn't
> understand why.

Everything clear now? ;-)

-- 
Heiko Wundram
Product & Application Development
-------------------------------------
Office Germany - EXPO PARK HANNOVER

Beenic Networks GmbH
Mailänder Straße 2
30539 Hannover

Phone      +49 511 / 590 935 - 15
Fax        +49 511 / 590 935 - 29
Mobile     +49 172 / 437 3 734
Mail       wundram@beenic.net


Beenic Networks GmbH
-------------------------------------
Registered office: Hannover
Managing director: Jorge Delgado
Registration number: HRB 61869
Register court: Amtsgericht Hannover


