Date: Tue, 12 Aug 2014 17:07:56 +0200
From: Daniel Lovasko <daniel.lovasko@gmail.com>
To: soc-status@freebsd.org
Subject: libctf & ddb
Message-ID: <CA+cSnN0oG-YA2b+5c0-AEsU2nU2+WGL9DBrq2vppfOEY0Z6R8Q@mail.gmail.com>
Hi all,

I am _very_ sorry that I forgot to write these weekly reports for the past few weeks and I would like to make this right, so here is the work that I have done:

* I introduced a public API that is generated by C macros, so that the generic get_something() function is written only once and not N times. All API functions return an integer that is one of the CTF_E_* constants or CTF_OK; if a function provides a value, it is always handed back through the last argument. All types such as ctf_typedef, ctf_enum_entry, ctf_float, ..., are typedefs of the appropriate structs and are meant to be used as opaque types. This gives us implementation freedom for the future (we may decide that names are not stored as char*, but rather as an index into a string table or similar). All programs that use this libctf (ctfdump, ctfstats and ctfquery, see below) now go through this API. The library is pollution free: every non-static function carries the _ctf prefix, so the chance of a name collision is as close to zero as possible. Almost every data structure and its members, every algorithm and every function now has manually written documentation. A small sketch of the macro idea follows below.

* The ctfdump program, which serves a very similar purpose to the CDDL-licensed one. I had an idea: maybe the "Machine-readable utilities" project could provide (with some help, of course) its multi-format support. I have seen repositories [1] that tried to do this, so there may be an audience for such a feature, and showing the ability to cooperate is very nice indeed.

* The ctfstats program, which computes and emits statistics about the CTF data (similar to the old ctfdump -S).

* The ctfquery program, which serves as an intermediate implementation before the code moves into DDB. The program takes an input, a type name, and looks it up in the type database. After a successful match it presents the data type in an appropriate manner: for a typedef it resolves the possible typedef chain down to the basic type; for a struct it prints all its members (and if a member is another struct, it is printed with additional indentation). The only thing missing is the memory access that would make this completely DDB-like, but almost all of this code is usable in DDB as-is.

The type lookup by name uses a very naive/simple approach: an O(n) traversal of all types, reporting success on the first hit. I have done no real benchmarking, since the search appears instantaneous (I even tried simulating a high workload and the loop ran just as swiftly). In case this is not good enough, there are several possible algorithmic improvements: building a trie, which would speed the lookup up to O(length of the longest type name), and making buckets of types - a struct bucket, a union bucket, an enum bucket - so that when the user requests e.g. "struct dpt_ccb", we can safely restrict the enquiry to the struct bucket. The bucketing can, obviously, be combined with any underlying representation - simple linked lists or tries. The search also works when the user omits the struct/union/enum part. A sketch of the current loop is below.

One thing that needs separate attention is the ctfquery feature that guesses the kind of data structure: linked list, binary tree, n-ary tree, and all the queue(3) and tree(3) types. The struct in question is tested for the presence of the characteristic members; for example, being a queue(3) SLIST means that the struct contains a tagless struct that has only one member, a pointer to the parent struct, which has to be named "sle_next" (see the expansion below). This was maybe the most enjoyable coding of the project so far.
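To make the macro idea concrete, here is a stripped-down sketch of the generator - the macro name, the CTF_E_NULL constant and the struct layout are simplified stand-ins, not the actual libctf code:

    #include <stddef.h>

    #define CTF_OK       0
    #define CTF_E_NULL   1  /* stand-in for one of the CTF_E_* constants */

    /*
     * Generate a getter once instead of writing N of them by hand.
     * Every generated function follows the API convention: it returns
     * CTF_OK or a CTF_E_* constant and hands the value back through
     * the last argument.
     */
    #define CTF_GET_IMPL(type, member, member_type)                       \
        int                                                               \
        ctf_##type##_get_##member(ctf_##type *self, member_type *out)     \
        {                                                                 \
            if (self == NULL || out == NULL)                              \
                return CTF_E_NULL;                                        \
            *out = self->member;                                          \
            return CTF_OK;                                                \
        }

    /* Users only ever see the typedef; the layout stays private. */
    struct _ctf_typedef {
        char *name; /* may later become a string table index instead */
    };
    typedef struct _ctf_typedef ctf_typedef;

    CTF_GET_IMPL(typedef, name, char *)

A generated call then looks like ctf_typedef_get_name(td, &name), with the caller just comparing the result against CTF_OK.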
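The lookup loop, roughly as it works today - the node layout is again just a stand-in for the internal representation:

    #include <stddef.h>
    #include <string.h>

    enum kind { KIND_STRUCT, KIND_UNION, KIND_ENUM, KIND_OTHER };

    struct type_node {
        struct type_node *next;
        enum kind kind;
        const char *name;
    };

    /*
     * O(n) traversal, first hit wins. With per-kind buckets the same
     * loop would only walk the struct/union/enum list that matches the
     * request, and a trie over the names would cut the search down to
     * O(length of the longest type name).
     */
    static struct type_node *
    lookup_type(struct type_node *head, enum kind kind, const char *name)
    {
        struct type_node *n;

        for (n = head; n != NULL; n = n->next)
            if (n->kind == kind && strcmp(n->name, name) == 0)
                return n;

        return NULL;
    }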
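And this is what the SLIST heuristic keys on - queue(3)'s SLIST_ENTRY() expands to a tagless inner struct:

    #include <sys/queue.h>

    struct item {
        int payload;
        SLIST_ENTRY(item) link;
        /*
         * The line above expands to:
         *
         *     struct { struct item *sle_next; } link;
         *
         * so the guesser checks three things: the member's type is a
         * tagless struct, that struct has exactly one member, and the
         * member is a pointer to the parent struct named "sle_next".
         */
    };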
The usefulness of this feature is that once we discover that a struct is a linked list, we can print it more intelligently (see my proposal for this). Visualisation of other data structures will not be done within this project, but I am open to future suggestions (although visualising a red-black tree on an 80x25 terminal might be ... well, challenging).

Type-to-string conversion: if the CTF data in question is, for example, pointer -> const -> struct dpt_ccb, it gets resolved to "const dpt_ccb*". This is used in the ctfdump and ctfquery programs. There are still some crazy scenarios that need to be taken care of, but the majority of types are converted correctly. (A sketch of the walk is at the end of this mail.)

The libctf code underwent some linting and valgrinding, which uncovered some nasty hidden memory leaks and potential bugs that are now fixed. One of the bugs took me 4 days to fix - improper handling of large struct members - the mistake was hidden under three layers of logic and I must admit that I was pretty happy after I finally found and fixed it.

I also wrote the DDB code that parses the arguments of the command (this was a bit tricky thanks to the lack of documentation and some weird naming).

Right now I am fighting a huge problem: while writing the proposal last winter, I was able to use the linker_ctf_get function to obtain the CTF data of the kernel file in kernel space. Unfortunately, the same code on the same installation (and on clean 10.0 and 9.2 installations) now crashes very badly, and the problem seems to be the vn_open call in linker_elf_ctf_get in sys/kern/kern_ctf.c. I tried to call the vn_open function directly from my modified DDB code and it crashes too - the exact call looks like this (it is taken directly from the kern_ctf.c file): [2] I am looking forward to any ideas about this problem :)

My plans for the next few days: I need to adapt the libctf allocation routines to work in kernel space too, therefore I need to #ifdef the usage of all malloc(3)s with malloc(9)s, plus some minor changes around strdup, strcpy and such. This should not pose any problem; I have something like the first sketch below in mind. There is no need for zlib in the kernel-space version of the library, as the linker_ctf_get() function already returns unzipped CTF data. Small changes are also needed in the libctf loading code: right now we can only take a file name and read all the ELF sections ourselves, but linker_ctf_get() already does this step, so we can omit it too (second sketch below).

To summarize, the next baby-step is to be able to print all CTF types inside DDB and then to basically copy the ctfquery code over, adding some usability/user-experience functionality such as DDB modifiers for hexadecimal output.
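For the allocation shim I have something like this in mind (the M_LIBCTF name and the ctf_* wrapper names are not settled yet):

    #ifdef _KERNEL
    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/malloc.h>

    MALLOC_DEFINE(M_LIBCTF, "libctf", "libctf CTF data");

    #define ctf_malloc(size)  malloc((size), M_LIBCTF, M_WAITOK)
    #define ctf_free(ptr)     free((ptr), M_LIBCTF)
    #define ctf_strdup(str)   strdup((str), M_LIBCTF)
    #else
    #include <stdlib.h>
    #include <string.h>

    #define ctf_malloc(size)  malloc((size))
    #define ctf_free(ptr)     free((ptr))
    #define ctf_strdup(str)   strdup((str))
    #endif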
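The loading side would then start from linker_ctf_get() instead of a file name - roughly like this, where ctf_parse_buffer() stands for a libctf entry point that does not exist yet:

    #include <sys/param.h>
    #include <sys/linker.h>

    static int
    ctf_load_from_linker(linker_file_t lf)
    {
        linker_ctf_t lc;
        int error;

        error = linker_ctf_get(lf, &lc);
        if (error != 0)
            return error;

        /*
         * lc.ctftab already points at the uncompressed CTF data and
         * lc.ctfcnt holds its size, so the kernel path needs neither
         * zlib nor our own ELF section walk.
         */
        return ctf_parse_buffer(lc.ctftab, lc.ctfcnt);
    }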
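Finally, going back to the type-to-string conversion mentioned above, the walk is conceptually just this (the kind constants and the type node are simplified stand-ins for the real API):

    #include <stdio.h>
    #include <string.h>

    enum kind { KIND_POINTER, KIND_CONST, KIND_STRUCT };

    struct type {
        enum kind kind;
        const struct type *ref; /* referenced type for pointer/const */
        const char *name;       /* set for the terminal type */
    };

    /*
     * pointer -> const -> struct dpt_ccb turns into "const dpt_ccb*":
     * qualifiers are collected on the way down the chain and the '*'s
     * are appended once the terminal type has been reached.
     */
    static void
    type_to_string(const struct type *t, char *buf, size_t len)
    {
        unsigned nptr = 0;
        int is_const = 0;

        while (t->kind == KIND_POINTER || t->kind == KIND_CONST) {
            if (t->kind == KIND_POINTER)
                nptr++;
            else
                is_const = 1;
            t = t->ref;
        }

        snprintf(buf, len, "%s%s", is_const ? "const " : "", t->name);
        while (nptr-- > 0)
            strlcat(buf, "*", len);
    }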
Maybe I forgot some things or details, so if I think of more additions later, I will write them here. Again, I am sorry for the delay; please do not get the impression of a lazy attitude or that I have not been working on the project.

Best,
Daniel

[1] https://github.com/rmustacc/ctf2json
[2] http://pastebin.com/gxG55vHn