From: Jesse Hagewood
To: soc-status@freebsd.org
Date: Mon, 25 Jun 2012 16:49:27 -0400
Subject: Improving BSD licensed text-processing tools

Progress this week:

- Diff's context, unified, and normal formats seem to be completely GNU compatible now.
  Most of it was timestamp issues; a little of it was the output diff gives when run on binary files or directories.

- The bug I found that involved input files over a few hundred bytes turned out not to be about size. It actually occurred because BSD diff would search the input file for non-ASCII characters and, if it found any at all, would consider the file binary. GNU diff doesn't do that. This meant that any text file with Unicode characters was treated as a binary file. My fix for this problem is to instead check the first few bytes of the file to see if it is an ELF format file, and if not, assume the file is a text file.

- Lots of code clean-up in diff. There were many uses of putchar(), puts(), and similar output functions in diffreg.c; I replaced all of them with printf() and also fixed a lot of style issues. Not really finished in this respect, though.

- Put together a test script for diff.

- Studied the --ignore-*-* options. I've found that the ones that were previously implemented don't work correctly. For example, in ignore-blank-lines' output, the line in the diff dealing with the blank lines is followed by an 'o' character.

- Did a write-up on man/mdoc macros on my wiki. So far I've described the specific source files involved in implementing macros, and I will add more information soon.

Here's my to-do list for diff: https://socsvn.freebsd.org/socsvn/soc2012/jhagewood/diff/TODO
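
For reference, the ELF check described in the second item above could look roughly like the sketch below. This is only an illustration, not the code in diffreg.c: the helper name looks_like_text() and the idea of passing in an already-read buffer are assumptions made for the example. The only fixed fact it relies on is that ELF files start with the bytes 0x7f 'E' 'L' 'F'.

```c
#include <stddef.h>
#include <string.h>

/* ELF files begin with the four magic bytes 0x7f 'E' 'L' 'F'. */
static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

/*
 * Hypothetical helper (not the actual diffreg.c code): given the first
 * bytes of a file, return 0 if they carry the ELF magic number and
 * 1 otherwise. Non-ASCII bytes alone do not make a file binary here,
 * so UTF-8 text is still treated as text.
 */
static int
looks_like_text(const unsigned char *buf, size_t len)
{
	if (len >= sizeof(elf_magic) &&
	    memcmp(buf, elf_magic, sizeof(elf_magic)) == 0)
		return (0);	/* ELF header found: treat as binary */
	return (1);		/* anything else: treat as text */
}
```

A caller would read the first few bytes of each input file into a buffer and pass them in. Note that a check like this only catches ELF objects; other binary formats would still be compared as text.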