Date: Tue, 9 Aug 2011 14:36:32 +0100
From: Anton Shterenlikht
To: freebsd-questions@freebsd.org
Message-ID: <20110809133632.GA37445@mech-cluster241.men.bris.ac.uk>
Subject: extracting text from docx files

I often receive information in *.docx format from my MS-using
colleagues. Sometimes I can ask for a PDF (or similar) instead,
but not always.

Usually I unzip the docx and then search through all the *.xml files
to find the useful data. However, I can't find any XML stylesheets
to use, so I have to convert the relevant XML file(s) to plain text
by hand.

I wonder if anybody can suggest a better way.
Perhaps there's something in ports that can help.

Many thanks

Anton

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423
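
The manual procedure described above (unzip the docx, find the relevant
XML file, strip the markup) can be sketched in Python using only the
standard library. This is a rough illustration, not an existing port or
tool. The function name docx_to_text is made up for this example, and it
assumes the main text is in word/document.xml inside the archive, which
is where the body is normally stored. Headers, footers and footnotes
live in separate XML files and are ignored here.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    Hypothetical helper: assumes the body is in word/document.xml.
    """
    # A .docx file is a zip archive; read the main document part.
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> element is one paragraph.
    for para in root.iter(f"{{{W_NS}}}p"):
        pieces = []
        for node in para.iter():
            if node.tag == f"{{{W_NS}}}t":
                # <w:t> holds the actual text of a run.
                pieces.append(node.text or "")
            elif node.tag == f"{{{W_NS}}}tab":
                pieces.append("\t")
            elif node.tag in (f"{{{W_NS}}}br", f"{{{W_NS}}}cr"):
                # Explicit line break inside a paragraph.
                pieces.append("\n")
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)
```

With a file called, say, report.docx (a placeholder name), calling
print(docx_to_text("report.docx")) would write the paragraphs to standard
output. Tables come out as one paragraph per cell, and any formatting is
lost, so this only helps when plain text is all that is needed.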