Date:      Sun, 27 Sep 2009 15:27:00 +0200
From:      Gabor Kovesdan <gabor@FreeBSD.org>
To:        hackers@FreeBSD.ORG
Cc:        Roman Divacky <rdivacky@FreeBSD.org>
Subject:   BSDL texttools status and further thoughts...
Message-ID:  <4ABF6824.9090601@FreeBSD.org>

Hello all,

I recently discussed the status of these tools with rdivacky@: bc, dc,
grep, sort and iconv. He persuaded me to write a summary here in case
someone else is interested in contributing to them.

BSD bc/dc will come just after 8.0-RELEASE. They are quite mature and
delphij@ offered to help me get them into the tree by reviewing and
approving my changes (I only have doc/ports commit bits).

BSD grep is also quite mature; I fixed the last critical bug recently.
My only concern is performance. GNU grep is fast but has ~8 KSLOC. BSD
grep is slightly slower but has only ~1.5 KSLOC. That is a huge
difference in complexity. GNU grep is very hard to read, but it gets its
performance from a lot of custom optimizations. I think we should go
another way and have a well-optimized and mature regex library. The
current one is very old, doesn't have wchar support, is slow like hell
and doesn't support the custom GNU bullshit that is unfortunately
necessary to maintain compatibility (e.g. "(a|)" is considered invalid
in strict POSIX regex but GNU accepts it!). Because of this, BSD grep is
linked to the GNU regex library at the moment, but because of the custom
magic in GNU grep it's still a bit slower.

If we can live with this slight performance hit, I think we can commit
it, because it's quite feature-complete. I'm a beginner, but I think the
code of BSD grep is so tiny and simple that there is almost no room to
optimize it further by simplifying the code, so further optimization
should be done in the regex library. As for the regex library, NetBSD's
SoC project is worth a look. I'm interested in this but I have too many
things in the queue to start another one...
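
To make the "(a|)" compatibility point concrete, here is a minimal
sketch that asks whatever regex library the program is linked against
whether it accepts a given extended regular expression. It uses only
the standard regcomp(3)/regerror(3)/regfree(3) interface; the helper
names try_compile() and report() are made up for this illustration and
are not part of BSD grep.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Try to compile `pattern` as a POSIX extended regular expression.
 * Returns 0 on success, otherwise the regcomp(3) error code; on
 * failure the human-readable message is copied into errbuf.
 * (Hypothetical helper, for illustration only.)
 */
static int
try_compile(const char *pattern, char *errbuf, size_t errlen)
{
	regex_t re;
	int rc;

	rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	if (rc != 0) {
		regerror(rc, &re, errbuf, errlen);
		return (rc);
	}
	if (errlen > 0)
		errbuf[0] = '\0';
	regfree(&re);
	return (0);
}

/* Print whether the linked regex library accepts `pattern`. */
static void
report(const char *pattern)
{
	char err[128];

	if (try_compile(pattern, err, sizeof(err)) == 0)
		printf("%-8s accepted\n", pattern);
	else
		printf("%-8s rejected: %s\n", pattern, err);
}
```

Calling report("(a|)") from a small main() shows which behaviour a
given library implements; the result depends on the library the
program is linked with, so it should differ between a strict POSIX
implementation and the GNU one.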

As for sort, it isn't so mature yet. I've just made a TODO list of the 
known missing features or bugs:
- sometimes it segfaults when reading huge files
- the -k option isn't implemented yet
- the -n option doesn't work correctly
- preproc() optimization (I don't know what it refers to actually, but
I had it on my previous TODO list; will have to check)
- polishing man page
- adding some more test cases to the regression test
- checking performance (in this case it really matters, because sorting
is an algorithmic problem and I'm not an algorithmic guru... And this
version of sort was written by me from scratch. The OpenBSD one isn't
wchar-clean and can't be fixed because of its design. This sort is much
smaller but it seems the algorithm isn't optimal.)
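
For readers wondering what a wchar-clean -n comparison might look
like, here is a rough sketch. It is not the code from this sort;
numeric_cmp() and sort_lines_numeric() are hypothetical names. It also
simplifies: wcstod(3) accepts more than POSIX sort -n allows
(exponents, hex, "inf"), and lines without a leading number simply
count as 0.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * qsort(3) comparator for an array of `const wchar_t *` lines.
 * Lines are ordered by their leading numeric value; ties fall back
 * to the locale's collation order via wcscoll(3).
 * (Simplified sketch, not the real sort -n semantics.)
 */
static int
numeric_cmp(const void *a, const void *b)
{
	const wchar_t *sa = *(const wchar_t * const *)a;
	const wchar_t *sb = *(const wchar_t * const *)b;
	double da = wcstod(sa, NULL);
	double db = wcstod(sb, NULL);

	if (da < db)
		return (-1);
	if (da > db)
		return (1);
	return (wcscoll(sa, sb));
}

/* Sort `n` wide-character lines in place by numeric value. */
static void
sort_lines_numeric(const wchar_t **lines, size_t n)
{
	qsort(lines, n, sizeof(lines[0]), numeric_cmp);
}
```

This leans on qsort(3) from libc, so it says nothing about the
algorithm question above; it only illustrates comparing wide strings
instead of bytes.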

As for iconv, I'll keep working on it in my BSc thesis. The forward (foo
-> utf32) conversions are almost completely GNU-compatible, the reverse
ones not so much. GNU has optional transliteration, while BSD iconv
uses it by default, so I compared the output to GNU's transliterated
output, and GNU has some more advanced mappings for this. Apart from
this, almost all encodings that we have in locale(1) charmaps are
supported, but the Big5 module segfaults. I hope I'll be able to solve
these issues and check performance as part of my BSc thesis.
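
As an illustration of a "forward" conversion as seen by a caller, here
is a small sketch using the standard iconv(3) API. to_utf32le() is a
hypothetical helper, and the encoding names it is used with
("ISO-8859-1", "UTF-32LE") are assumptions about what the installed
iconv implementation provides.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Convert `inlen` bytes of `in` from encoding `from` to UTF-32LE.
 * Returns the number of bytes written to `out`, or (size_t)-1 on
 * error (unknown encoding, invalid input, or output buffer too small).
 * (Hypothetical helper, for illustration only.)
 */
static size_t
to_utf32le(const char *from, const char *in, size_t inlen,
    char *out, size_t outlen)
{
	iconv_t cd;
	char *inp = (char *)in;	/* iconv(3) takes a non-const char ** */
	char *outp = out;
	size_t inleft = inlen, outleft = outlen;
	size_t rc;

	cd = iconv_open("UTF-32LE", from);
	if (cd == (iconv_t)-1)
		return ((size_t)-1);

	rc = iconv(cd, &inp, &inleft, &outp, &outleft);
	if (rc != (size_t)-1)
		/* Flush any pending shift state of the converter. */
		rc = iconv(cd, NULL, NULL, &outp, &outleft);
	iconv_close(cd);

	if (rc == (size_t)-1)
		return ((size_t)-1);
	return (outlen - outleft);
}
```

Running the same input through two iconv implementations and
comparing the output buffers byte by byte is one simple way to check
GNU compatibility of a given conversion.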

Regards,

-- 
Gabor Kovesdan
FreeBSD Volunteer

EMAIL: gabor@FreeBSD.org .:|:. gabor@kovesdan.org
WEB:   http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org
