From: Darryl Okahata
To: freebsd-hackers@FreeBSD.ORG
Subject: Re: Search a symbol in the source tree
Date: Thu, 14 Oct 1999 20:03:11 -0700

Peter Jeremy wrote:

> I use id-utils (/usr/ports/devel/id-utils). It builds a single database
> file and has a variety of tools (including e-lisp) to search the database.
>
> Since global(1) was mentioned in this thread, I decided to have a look
> at it. It seems much slower and my sample (samba-2.0.5a) database was
> nearly 20 times larger.

Well, as a longtime user of mkid, mkid2, and mkid3 (the predecessors to
id-utils), here are some comments on the various packages:

[ Note: in the following, I'm not quite comparing apples and apples.
  I'm too lazy to do a strict comparison, but this should still give
  people a vague idea of each package's performance.
  Take the following as you will, with a grain of salt. ]

* As a baseline, let's look at plain grep. First, generate a list of
  files to search (this assumes that we don't want to look through all
  files, including Makefiles, man pages, etc.):

       cd /usr/src
       find * -type f | time grep '\.[chsSly][cxp]*$' > /tmp/foo

  Now, on my system (-current from Aug. 21, a PII 300MHz w/128MB & a
  F/W SCSI disk), this takes around 50 seconds (real time):

       xargs grep ptrace < /tmp/foo

  Not too bad, but not great, either. Let's try looking for utmp.h:

       xargs grep 'utmp\.h' < /tmp/foo

  This takes around a minute. Now, let's look at "grep -R":

       cd /usr/src
       grep -R ptrace .        # 2 minutes 42 seconds
       grep -R 'utmp\.h' .     # 2 minutes 40 seconds

  In other words, with grep, you need to limit your searches. Also,
  "grep -R" doesn't work very well if you also happen to have glimpse,
  global, or mkid/id-utils indices under /usr/src.

* Global is OK (does not appear to support C++, though), but generates
  HUGE databases (by default). For /usr/src, the databases are around
  as large as the total size of the indexed source files (the gtags
  "-c" option was not used). Indexing is slow, but searching seems to
  be quite fast. In particular, "global -x name" is nice, because it
  just returns where "name" is defined, as opposed to a plain grep,
  which can also return matches on "fooname" and "namebar", as well as
  where "name" is used.

  However, global appears to be optimized for locating where a function
  is defined. It appears to be difficult to locate, for example, where
  a preprocessor macro is defined; except for "global -g" (which is
  often too slow to be usable), I haven't found a way of getting global
  to search through .h header files.

  On my system, indexing /usr/src took around an hour, and the indices
  took up around 240MB+ (this was with "gtags" and not "gtags -c").
  This is 20+ times larger than a glimpse or mkid/id-utils database.

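  As a rough illustration of the "fooname"/"namebar" point above, here
  is a minimal sketch (not part of the timings in this message) of
  substring versus whole-word matching with plain grep. The demo tree,
  file names, and file contents are made up for illustration, and it
  assumes a grep that supports the widely available "-w" option.

```shell
#!/bin/sh
# Hypothetical demo tree: these file names and contents are made up.
demo=$(mktemp -d)
mkdir -p "$demo/sys" "$demo/doc"
printf 'int ptrace(int req);\n'       > "$demo/sys/a.c"
printf 'int do_ptrace_stuff(void);\n' > "$demo/sys/b.c"
printf 'notes about ptrace\n'         > "$demo/doc/README"

# Build the file list with the same suffix filter as the grep baseline.
# README and the list file itself have no matching suffix, so the
# filter leaves them out.
(cd "$demo" && find . -type f | grep '\.[chsSly][cxp]*$') > "$demo/filelist"

# Substring search: also hits "do_ptrace_stuff" in b.c.
plain=$(cd "$demo" && xargs grep -l ptrace < "$demo/filelist" | sort)

# Whole-word search (-w): only a.c matches.
word=$(cd "$demo" && xargs grep -l -w ptrace < "$demo/filelist" | sort)

echo "substring matches:"
echo "$plain"
echo "whole-word matches:"
echo "$word"

rm -rf "$demo"
```

  This only narrows matches to whole words; unlike "global -x", it
  still cannot tell a definition from a use.
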
  It's interesting to note that "global -x -g ptrace" takes around
  twice as long to execute (over two minutes), compared to plain grep.
  However, "global -x -s ptrace" is very fast (under 1 second).

  Searching for ptrace generates two (2) lines of output, in well under
  one second:

       global -x ptrace

  as do these:

       global -x -s ptrace
       global -x -s uap

  Looking for where "utmp.h" is used:

       global -x -s utmp.h

  This takes more than 2212 seconds (over 36 minutes!), and outputs
  nothing. Well, let's try this instead:

       global -x -g utmp.h

  This works, taking a bit over a minute and a half. However, plain
  grep is faster (note that, as global searches through source files
  only, you have to compare it to the source-file-only grep, and not
  "grep -R").

  However, looking for the definition of a preprocessor macro is a
  pain. Try looking for KBD_DATA_PORT:

       global -x KBD_DATA_PORT

  This runs quickly, but displays nothing. Next, try:

       global -x -s KBD_DATA_PORT

  This runs quickly, and shows where this is used in .c source files.
  However, where's the definition? It's not shown. This works:

       global -x -g KBD_DATA_PORT

  However, this takes around two minutes to run, which is much slower
  than a plain grep.

* Glimpse is a general-purpose text indexer which can be used to index
  source files. It's basically an intelligent grep, but it works quite
  well. Unlike mkid, you can search through comments and non-source
  files (like Makefiles, man pages, READMEs, etc.).

  On my system, indexing /usr/src took around 6 minutes (using the
  "-M 20" option), and the indices took up around 10MB.

  On my system, searching for ptrace took 35 seconds, with 505 lines of
  output (ChangeLogs, man pages, etc. account for the extra lines):

       glimpse -w ptrace

  Searching for uap takes around 21 seconds:

       glimpse -w uap

  Looking for utmp.h:

       glimpse -y -w utmp.h

  This takes a bit over 45 seconds. However, glimpse searched through
  (and displayed hits in) non-source files, like configure,
  configure.in, Makefiles, etc.

  It is possible to have glimpse exclude certain files and index only
  those files you want indexed. However, I don't have the time to
  configure and test this. Perhaps someone else will do this.

* Mkid/mkid2/mkid3/id-utils appear to generate the smallest index
  databases, and they run quickly. They're great for looking up where
  a particular identifier is used (e.g., "gid ptrace", which is an
  intelligent grep), but they can't tell you just where something is
  defined, and only that place. The place where something is defined
  is output along with every place that it's used. You're basically
  doing a very intelligent grep. However, grep'ing via gid is *MUCH*
  faster than "global -g" (it's like 100X faster); on the other hand,
  "global -s" is often comparable to gid.

  Mkid and friends can also (supposedly, as I've never tried it) tell
  you where a number occurs, in any base. If you know the number 100
  is somewhere in your source code, mkid can show you where it occurs,
  as "100" (decimal), "64" (hex), or "144" (octal).

  Only source files are indexed, as mkid & friends only know about
  certain languages (C, C++, & assembly being a few). Also, comments
  aren't indexed, although gid will display hits in comments (because
  the file being grep'd contains a hit in a non-comment line).

  However, the "id-utils-3.2" package for -current dumps core when used
  to index /usr/src. I don't have the time to track this down.

  On my system, indexing /usr/src using mkid3 took a bit over 2
  minutes, and the indices took up around 9.1MB. The index was built
  using:

       find . -type f | grep '\.[chsSly][cxp]*$' | time mkid -

  (Note: id-utils is further broken, since it cannot take the list of
  files to index from stdin or a file -- this example is for mkid3.)
  Both glimpse and global index more files by default (in the case of
  glimpse, Makefiles, CVS/Root, CVS/Repository, COPYRIGHT files, etc.
  were indexed).

  Searching with gid is VERY fast.
  On my system, searching for ptrace takes under 0.5 sec.:

       gid ptrace

  Yup, that's under one-half second, with 195 lines of output.

  Let's try looking for where "utmp.h" is used:

       gid utmp.h

  This takes around 2.5 seconds.

*****

Bottom line:

For general-purpose use, mkid and friends are best, as long as you
don't need to search through comments or non-source files (Makefiles,
READMEs, etc.). The database index is reasonably small, the indexing
time is relatively quick, and the search times are often comparable to
or better than those of global. However, mkid and friends can't tell
you just where something is defined; they can only show where it is
defined and used.

If you need to search through comments, or need to search non-source
files, glimpse is good. The index is larger than that of
mkid/id-utils, and the search speed is decent, but not great. For many
searches, it's faster than plain grep, although it can be comparable to
grep in some cases.

I've got mixed feelings about global. On the one hand, you can't beat
it for locating where a function is defined, and it's very good at
showing where a variable is used. However, for best results, you have
to remember to use different options when searching for function
definitions, identifier usage, preprocessor definitions, etc., and you
may still have to resort to doing a full grep because, for some
searches, global is too slow. The indices for global are HUGE, and
indexing takes much longer than other approaches. I'm surprised that
global is part of the base distribution, instead of being a port.

--
        Darryl Okahata
        darrylo@sr.hp.com

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Hewlett-Packard, or of the
little green men that have been following him all day.