Date: 13 Aug 2010 18:51:46 -0000
Message-ID: <20100813185146.2645.qmail@joyce.lan>
From: John Levine
To: freebsd-questions@freebsd.org
Cc: j.mckeown@ru.ac.za
In-Reply-To: <201008131601.34182.j.mckeown@ru.ac.za>
Subject: Re: Grepping a list of words

>> Since I will have a need to run this check frequently, any suggestions for
>> a better approach are welcome.
>
>sort -u and comm(1)?

sort is O(N log N) while grep is O(N).  Which is faster depends on the
constant factors in each, but as the data sets get bigger, the log N
term will dominate.

That is, for small sets of data, I don't know which will be faster, but
either will be fast enough, so who cares.  For large sets of data, the
sort will be slow.

R's,
John
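
For anyone who wants to try both approaches side by side, here is a
minimal sketch.  The thread does not say what the files look like, so
the file names (wanted.txt, corpus.txt), the one-word-per-line layout,
and the sample words are all assumptions made up for illustration.  It
only shows that the two pipelines give the same answer on a toy input;
it says nothing about which is faster.

```shell
#!/bin/sh
# Sketch only: wanted.txt and corpus.txt are hypothetical names, and the
# one-word-per-line format is an assumption, not something from the thread.
set -eu
tmp=$(mktemp -d)

printf '%s\n' pear apple fig > "$tmp/wanted.txt"
printf '%s\n' apple banana fig apple kiwi > "$tmp/corpus.txt"

# Approach 1: grep.
#   -F  treat patterns as fixed strings, not regular expressions
#   -x  match whole lines only
#   -f  read the patterns from a file
# The trailing sort -u is only there so the output can be compared with
# approach 2; it runs over the matches, not the whole corpus.
by_grep=$(grep -Fxf "$tmp/wanted.txt" "$tmp/corpus.txt" | sort -u)

# Approach 2: sort -u both files, then let comm(1) print the lines that
# appear in both (-12 suppresses the "only in file 1/2" columns).
sort -u "$tmp/wanted.txt" > "$tmp/wanted.sorted"
sort -u "$tmp/corpus.txt" > "$tmp/corpus.sorted"
by_comm=$(comm -12 "$tmp/wanted.sorted" "$tmp/corpus.sorted")

if [ "$by_grep" = "$by_comm" ]; then
    echo "both approaches agree:"
    echo "$by_comm"
fi

rm -rf "$tmp"
```

To see where the crossover is on real data, wrap each approach in
time(1) and run it against files of the size you actually expect.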