From nobody Tue Oct 25 20:49:33 2022
X-Original-To: dev-commits-ports-all@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
    by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4MxkZk1hhDz4fwjL;
    Tue, 25 Oct 2022 20:49:34 +0000 (UTC)
    (envelope-from git@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
     key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
     client-signature RSA-PSS (4096 bits) client-digest SHA256)
    (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK))
    by mx1.freebsd.org (Postfix) with ESMTPS id 4MxkZk18pjz3ZW7;
    Tue, 25 Oct 2022 20:49:34 +0000 (UTC)
    (envelope-from git@FreeBSD.org)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim;
    t=1666730974;
    h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
     to:to:cc:mime-version:mime-version:content-type:content-type:
     content-transfer-encoding:content-transfer-encoding;
    bh=1yaBkZ0P8o9pnXRouliUgwFuCedGIJILUzBMfRWBTro=;
    b=VtFX60qbUULAORYg8pup/Azw6dqxv+lU7nqKNo9jsa2DI8MFx8e6EgC+KgFpeAp9IQN4pn
     NEopI0KbXupdW+omoRkTsrY6qro9UGV8fejjzP/36xLmnDyn4Ol9kiRDXDntVVe2JoZWCM
     5I3TuJ4eGwaJZO0dfyqA2sksYjCLlcXYbEnzwJJPnPKkajSd6aMbG28U+XLS6SMSkNJ4eo
     IA6dKcBdhJVXOr8+qFrREpltQgdJg4F9kBc9REyCBH7mFr3rfoY7J9Yji4xRNmqzf+Ab/A
     I+32XAK6BS8UH2uuF1iQNpi6X5nZDCG/D3abJJRnZlBpYPn1QEybnWxE9NOO2A==
Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
     key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
    (Client did not present a certificate)
    by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 4MxkZj71VnzrqL;
    Tue, 25 Oct 2022 20:49:33 +0000 (UTC)
    (envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org ([127.0.1.44])
    by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 29PKnXJl091044;
    Tue, 25 Oct 2022 20:49:33 GMT
    (envelope-from git@gitrepo.freebsd.org)
Received: (from git@localhost)
    by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 29PKnXhC091043;
    Tue, 25 Oct 2022 20:49:33 GMT
    (envelope-from git)
Date: Tue, 25 Oct 2022 20:49:33 GMT
Message-Id: <202210252049.29PKnXhC091043@gitrepo.freebsd.org>
To: ports-committers@FreeBSD.org, dev-commits-ports-all@FreeBSD.org, dev-commits-ports-main@FreeBSD.org
From: Li-Wen Hsu
Subject: git: b6e6388dab6d - main - Add textproc/py-textract: Extract text from any document
List-Id: Commit messages for all branches of the ports repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-ports-all
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
Sender: owner-dev-commits-ports-all@freebsd.org
X-BeenThere: dev-commits-ports-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: lwhsu
X-Git-Repository: ports
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: b6e6388dab6dd78e37adebf738e568997db6d15a
Auto-Submitted: auto-generated
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org;
    s=dkim; t=1666730974;
    h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
     to:to:cc:mime-version:mime-version:content-type:content-type:
     content-transfer-encoding:content-transfer-encoding;
    bh=1yaBkZ0P8o9pnXRouliUgwFuCedGIJILUzBMfRWBTro=;
    b=F8QRxihEoD+1KsPgqsLCQZC4mihDsEACX9r1/GO1aZCeX3A/8s2AZpKPzd5c+tv8N7Eqqv
     fRrBSPxTRX0KkDDAprC55vJ3FQ/TQVR2MnlZ7ih5LJsTC3EvAU6BN9eptt8Sp+JRsEM7xv
     xr6vl+fNYyMDxMOnnOgATEChj/GpsSDm72KUACuTMTOIyZP96wB2wMrK9uFWvfUjcv0rQx
     NN6l1ZlY2+Cdl5uaYGo2JKkfJTw0tVkIXthz78ttRQqiXHdMnh8zxvrnG7jLeDKh60inm0
     jTUg6oPrDw7ieOpNvbATPHzfe2KabforwMgR6ucntW3fkfd2FFkVsXDMjQfmhA==
ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1666730974; a=rsa-sha256; cv=none;
    b=d+4fuBpstTzcd/uGKj3SmSxqn0JP7XaBDAoUWS9t7JtA9cOc9P4HBddXbjjcIuZQnwF0F8
     MckywuGPVg2+P0hRD8kxsK7Jzc9HgvE2uxYzXjRn65XRR7EwXrV00dMkTHOxXDfFyTuJlP
     u/6ZDfs362BbS0NQtMpDtIHyFezM7qfEtWK6aqbjhOuQOCJ5ZVrw77AvU5LDxXoXdRroBG
     jxFXlrPv1soyB5/hsSmr2eEaqX7F8jOPWNNWCegkzEfQFrGgJS4qxrCWP/CDlwj1JNqwA9
     EvZv30EEbd6Ohk6aBEuw00FEzsOpVN09LhJqwuPXoFettCZZ8cMWqZid5RjuWQ==
ARC-Authentication-Results: i=1; mx1.freebsd.org; none
X-ThisMailContainsUnwantedMimeParts: N

The branch main has been updated by lwhsu:

URL: https://cgit.FreeBSD.org/ports/commit/?id=b6e6388dab6dd78e37adebf738e568997db6d15a

commit b6e6388dab6dd78e37adebf738e568997db6d15a
Author:     Jesús Daniel Colmenares Oviedo
AuthorDate: 2022-09-23 16:18:31 +0000
Commit:     Li-Wen Hsu
CommitDate: 2022-10-25 20:49:12 +0000

    Add textproc/py-textract: Extract text from any document

    textract provides a single interface for extracting content embedded
    from Word documents, PowerPoint presentations, PDFs and much more,
    which can be used for further textual analysis and visualization.

    WWW: https://github.com/deanmalmgren/textract

    PR: 265768
---
 textproc/Makefile              |  1 +
 textproc/py-textract/Makefile  | 69 ++++++++++++++++++++++++++++++++++++++++++
 textproc/py-textract/distinfo  |  3 ++
 textproc/py-textract/pkg-descr |  3 ++
 4 files changed, 76 insertions(+)

diff --git a/textproc/Makefile b/textproc/Makefile
index 5b5097135fcb..8b52f2176b4f 100644
--- a/textproc/Makefile
+++ b/textproc/Makefile
@@ -1545,6 +1545,7 @@
     SUBDIR += py-tablib
     SUBDIR += py-terminaltables
     SUBDIR += py-textdistance
+    SUBDIR += py-textract
     SUBDIR += py-textfsm
     SUBDIR += py-texttable
     SUBDIR += py-textual
diff --git a/textproc/py-textract/Makefile b/textproc/py-textract/Makefile
new file mode 100644
index 000000000000..a1a57ea56e62
--- /dev/null
+++ b/textproc/py-textract/Makefile
@@ -0,0 +1,69 @@
+PORTNAME= textract
+PORTVERSION= 1.6.5
+CATEGORIES= textproc python
+MASTER_SITES= CHEESESHOP
+PKGNAMEPREFIX= ${PYTHON_PKGNAMEPREFIX}
+
+MAINTAINER= DtxdF@disroot.org
+COMMENT= Extract text from any document
+WWW= https://github.com/deanmalmgren/textract
+
+LICENSE= MIT
+LICENSE_FILE= ${WRKSRC}/LICENSE
+
+RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}argcomplete>=1.10.0:devel/py-argcomplete@${PY_FLAVOR} \
+	${PYTHON_PKGNAMEPREFIX}chardet>=3:textproc/py-chardet@${PY_FLAVOR} \
+	${PYTHON_PKGNAMEPREFIX}six>1.12.0:devel/py-six@${PY_FLAVOR}
+
+USES= python:3.8+
+USE_PYTHON= autoplist distutils
+
+OPTIONS_DEFINE= ANTIWORD BEAUTIFULSOUP DOCX2TXT MSG LIBXML2 \
+	LIBXSLT PPTX PS SPREADSHEET UNRTF
+OPTIONS_DEFAULT= ANTIWORD BEAUTIFULSOUP DOCX2TXT FFMPEG FLAC JPEG_TURBO \
+	LAME LIBXML2 LIBXSLT MSG PDFTOTEXT PPTX PS SOX \
+	SPEECH_RECOGNITION SPREADSHEET TESSERACT UNRTF
+OPTIONS_GROUP= AUDIO OCR PDF RTF
+OPTIONS_GROUP_AUDIO= FFMPEG FLAC LAME POCKETSPHINX SOX SPEECH_RECOGNITION
+OPTIONS_GROUP_OCR= JPEG_TURBO TESSERACT
+OPTIONS_GROUP_PDF= PDFMINER PDFTOTEXT
+
+ANTIWORD_DESC= DOC document support
+BEAUTIFULSOUP_DESC= HTML parsing library
+DOCX2TXT_DESC= DOCX document support
+JPEG_TURBO_DESC= SIMD-accelerated JPEG codec
+LIBXML2_DESC= Python interface for XML parser library
+LIBXSLT_DESC= XML stylesheet transformation library
+MSG_DESC= MS Outlook MSG file format support
+PDFMINER_DESC= PDF parser and analyzer
+PDFTOTEXT_DESC= Extract text from a PDF document
+POCKETSPHINX_DESC= Interface to CMU Sphinxbase and Pocketsphinx
+PPTX_DESC= MS PowerPoint PPTX presentations support
+SOX_DESC= Command-line audio processing tool
+SPEECH_RECOGNITION_DESC= Python library for performing speech recognition
+SPREADSHEET_DESC= XLS and XLSX spreadsheet support
+TESSERACT_DESC= Commercial quality open source OCR engine
+UNRTF_DESC= RTF document support
+
+ANTIWORD_RUN_DEPENDS= antiword>0:textproc/antiword
+BEAUTIFULSOUP_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}beautifulsoup>=4.8.0:www/py-beautifulsoup@${PY_FLAVOR}
+DOCX2TXT_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}docx2txt>=0.8:textproc/py-docx2txt@${PY_FLAVOR}
+FFMPEG_RUN_DEPENDS= ffmpeg>0:multimedia/ffmpeg
+FLAC_RUN_DEPENDS= flac>0:audio/flac
+JPEG_TURBO_RUN_DEPENDS= jpeg-turbo>0:graphics/jpeg-turbo
+LAME_RUN_DEPENDS= lame>0:audio/lame
+LIBXML2_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}libxml2>0:textproc/py-libxml2@${PY_FLAVOR}
+LIBXSLT_RUN_DEPENDS= libxslt>=1.1.15:textproc/libxslt
+MSG_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}extract-msg>=0.29:textproc/py-extract-msg@${PY_FLAVOR}
+PDFMINER_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}pdfminer.six>=20191110:textproc/py-pdfminer.six@${PY_FLAVOR}
+PDFTOTEXT_RUN_DEPENDS= poppler-utils>0:graphics/poppler-utils
+POCKETSPHINX_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}pocketsphinx>0:audio/py-pocketsphinx@${PY_FLAVOR}
+PPTX_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}python-pptx>=0.6.18:textproc/py-python-pptx@${PY_FLAVOR}
+PS_RUN_DEPENDS= pstotext>0:print/pstotext
+SOX_RUN_DEPENDS= sox>0:audio/sox
+SPEECH_RECOGNITION_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}SpeechRecognition>=3.8.1:audio/py-speechrecognition@${PY_FLAVOR}
+SPREADSHEET_RUN_DEPENDS= ${PYTHON_PKGNAMEPREFIX}xlrd>=1.2.0:textproc/py-xlrd@${PY_FLAVOR}
+TESSERACT_RUN_DEPENDS= tesseract>0:graphics/tesseract
+UNRTF_RUN_DEPENDS= unrtf>0:textproc/unrtf
+
+.include
diff --git a/textproc/py-textract/distinfo b/textproc/py-textract/distinfo
new file mode 100644
index 000000000000..14f25b8e65e4
--- /dev/null
+++ b/textproc/py-textract/distinfo
@@ -0,0 +1,3 @@
+TIMESTAMP = 1659835075
+SHA256 (textract-1.6.5.tar.gz) = 68f0f09056885821e6c43d8538987518daa94057c306679f2857cc5ee66ad850
+SIZE (textract-1.6.5.tar.gz) = 17871
diff --git a/textproc/py-textract/pkg-descr b/textproc/py-textract/pkg-descr
new file mode 100644
index 000000000000..7d4986c9d8cb
--- /dev/null
+++ b/textproc/py-textract/pkg-descr
@@ -0,0 +1,3 @@
+textract provides a single interface for extracting content embedded
+from Word documents, PowerPoint presentations, PDFs and much more,
+which can be used for further textual analysis and visualization.
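
Note on the options above: the port attaches each document format to its own option, so a given file type is only handled when the matching option's run dependency is installed. The sketch below is a rough, illustrative lookup from file extension to the option that appears to cover it. The mapping is an assumption inferred from the *_DESC and *_RUN_DEPENDS lines in the Makefile, not taken from textract itself, and the names OPTION_BY_EXTENSION and options_for are hypothetical.

```python
import os

# Assumed mapping, inferred from the *_DESC and *_RUN_DEPENDS lines of
# textproc/py-textract/Makefile. It is illustrative and not exhaustive.
OPTION_BY_EXTENSION = {
    ".doc": ["ANTIWORD"],
    ".docx": ["DOCX2TXT"],
    ".msg": ["MSG"],
    ".pdf": ["PDFMINER", "PDFTOTEXT"],
    ".pptx": ["PPTX"],
    ".ps": ["PS"],
    ".rtf": ["UNRTF"],
    ".xls": ["SPREADSHEET"],
    ".xlsx": ["SPREADSHEET"],
}


def options_for(filename):
    """Return the port options that appear to cover this file type."""
    # Compare on the lower-cased extension so "REPORT.DOCX" matches too.
    ext = os.path.splitext(filename)[1].lower()
    # Unknown extensions yield an empty list rather than raising.
    return list(OPTION_BY_EXTENSION.get(ext, []))


if __name__ == "__main__":
    for name in ("report.docx", "slides.pptx", "scan.pdf", "notes.txt"):
        print(name, options_for(name))
```

For PDF files either option in the PDF group may apply; only PDFTOTEXT is listed in OPTIONS_DEFAULT.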