Date: Sat, 23 Jan 2021 11:14:41 +0100
From: Polytropon
To: "Steve O'Hara-Smith"
Cc: freebsd-questions@freebsd.org
Subject: Re: Convert PDF to Excel
Message-Id: <20210123111441.b8c5de4e.freebsd@edvax.de>
In-Reply-To: <20210123090421.7fb3ede1754fe280b685f83c@sohara.org>
Organization: EDVAX

On Sat, 23 Jan 2021 09:04:21 +0000, Steve O'Hara-Smith wrote:
> On Sat, 23 Jan 2021 09:40:41 +0100
> Polytropon wrote:
>
> > They contain text, so the OCR problem is out of the way.
> > Sadly, the text is re-arranged so the optimal solution (one
> > line in a table equals one line of text, with the columns
> > being separated by whitespace) does not appear, instead it
> > is the other way round: one line equals one column.
>
> I spy a fun interview question buried in this problem - flipping a
> text file like that efficiently is far from easy - dead easy if you
> don't mind eating memory of course.

The lesson to learn for this potential interview question
simply is RTFM; from "man pdftotext":

	-layout

will try its best to preserve the original display in the
raw output. So data that is in lines, but arranged in columns,
will then be output as columns; each "dataset" is one line.

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
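
For anyone who still ends up with the "one line equals one column"
output (for example because the text was extracted without -layout),
here is a minimal sketch of the memory-eating flip mentioned in the
quoted text. It is only an illustration, not something from the
thread: the function name transpose_lines and the sample data are
made up, and it assumes that cells are separated by whitespace and
contain no whitespace themselves.

```python
from itertools import zip_longest


def transpose_lines(lines, sep=" "):
    """Turn 'one line per column' text into 'one line per row' text.

    Each input line is split on whitespace into cells; the n-th cells
    of all input lines are joined into the n-th output line. The whole
    input is held in memory, so this is the easy, memory-hungry variant.
    Assumption: no cell contains whitespace.
    """
    columns = [line.split() for line in lines]
    # zip_longest pads short columns with "" so that no row is dropped.
    rows = zip_longest(*columns, fillvalue="")
    return [sep.join(row).rstrip() for row in rows]


if __name__ == "__main__":
    # Hypothetical sample: every line holds one column of the table.
    sample = ["Name Alice Bob", "Age 30 25", "City Paris Rome"]
    for line in transpose_lines(sample):
        print(line)
```

In most cases it should be simpler to avoid the flip altogether and
run "pdftotext -layout input.pdf output.txt" (file names are
placeholders), so that each dataset already arrives as one line.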