From owner-freebsd-questions@FreeBSD.ORG Thu Oct 29 15:39:14 2009 Return-Path: Delivered-To: freebsd-questions@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EBA31106566B for ; Thu, 29 Oct 2009 15:39:14 +0000 (UTC) (envelope-from martin@dc.cis.okstate.edu) Received: from dc.cis.okstate.edu (dc.cis.okstate.edu [139.78.103.93]) by mx1.freebsd.org (Postfix) with ESMTP id C26EF8FC20 for ; Thu, 29 Oct 2009 15:39:14 +0000 (UTC) Received: from dc.cis.okstate.edu (localhost.cis.okstate.edu [127.0.0.1]) by dc.cis.okstate.edu (8.14.2/8.13.8) with ESMTP id n9TFcuKB078966 for ; Thu, 29 Oct 2009 10:39:07 -0500 (CDT) (envelope-from martin@dc.cis.okstate.edu) Message-Id: <200910291539.n9TFcuKB078966@dc.cis.okstate.edu> To: freebsd-questions@freebsd.org Date: Thu, 29 Oct 2009 10:38:56 -0500 From: Martin McCormick Subject: Merging Related Information from 2 Tables X-BeenThere: freebsd-questions@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: User questions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Oct 2009 15:39:15 -0000 This is probably going to be a hashing exercise but I am checking to see if any of the building blocks needed are already out there. The problem is simple to describe in that there are 2 tables. One is a DNS zone transfer table of all the A or Address records in a given zone or from several zones for that matter. the other table is from the same zones and consists of text or TXT records. The only thing the 2 tables have in common is that some of the TXT records share the exact same name field as the A records so we should be able to display the important contents of the A and TXT records on the same line if their names match. The challenge is to do this quickly so some sort of hash function is needed to locate A and TXT records having the same name. Grep does this beautifully for single entries across multiple files, but I need to merge the text part of the TXT record with the IP address and host name from the A record with the same name. The only hard part is finding the quickest way to match the roughly 25,000 host names in the A records with around half as many TXT records. This is basically a bucket list problem in which we can either have an A record name in a bucket by itself or an A record in a given bucket and a TXT record in another bucket with the same name as the A record. In the interest of standing on the shoulders of giants, I am checking to see how much tried and tested tools already exist and how much needs to be home-grown. It is also possible to use egrep to search for A and TXT records in 1 pass through a file in which case one would search from the same file for both record types but the problem is the same. In case anybody wonders: egrep '([[:space:]]IN([[:space:]]TXT[[:space:]]|[[:space:]]A[[:space:]]))' okstate.zone >ATXT.txt The line break here is for Email consideration. The above command should all be on one line. Thanks for any suggestions. Martin McCormick WB5AGZ Stillwater, OK Systems Engineer OSU Information Technology Department Telecommunications Services Group