Date:      Fri, 20 Dec 2024 02:09:50 GMT
From:      Wen Heping <wen@FreeBSD.org>
To:        ports-committers@FreeBSD.org, dev-commits-ports-all@FreeBSD.org, dev-commits-ports-main@FreeBSD.org
Subject:   git: 89d55115a4c0 - main - converters/py-markitdown: New port
Message-ID:  <202412200209.4BK29oeC092787@gitrepo.freebsd.org>

The branch main has been updated by wen:

URL: https://cgit.FreeBSD.org/ports/commit/?id=89d55115a4c0f52bbeb08ea5f5899d6e6b62fa1b

commit 89d55115a4c0f52bbeb08ea5f5899d6e6b62fa1b
Author:     Wen Heping <wen@FreeBSD.org>
AuthorDate: 2024-12-20 02:01:25 +0000
Commit:     Wen Heping <wen@FreeBSD.org>
CommitDate: 2024-12-20 02:09:15 +0000

    converters/py-markitdown: New port
    
    The MarkItDown library is a utility for converting various files to
    Markdown (e.g., for indexing or text analysis).
    
    It presently supports:
      * PDF (.pdf)
      * PowerPoint (.pptx)
      * Word (.docx)
      * Excel (.xlsx)
      * Images (EXIF metadata, and OCR)
      * Audio (EXIF metadata, and speech transcription)
      * HTML (special handling of Wikipedia, etc.)
      * Various other text-based formats (csv, json, xml, etc.)
      * ZIP (iterates over contents and converts each file)
---
 converters/Makefile                |  1 +
 converters/py-markitdown/Makefile  | 27 +++++++++++++++++++++++++++
 converters/py-markitdown/distinfo  |  3 +++
 converters/py-markitdown/pkg-descr | 13 +++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/converters/Makefile b/converters/Makefile
index d963b78583d0..645b3b83065f 100644
--- a/converters/Makefile
+++ b/converters/Makefile
@@ -153,6 +153,7 @@
     SUBDIR += py-bsdconv
     SUBDIR += py-gotenberg-client
     SUBDIR += py-mammoth
+    SUBDIR += py-markitdown
     SUBDIR += py-rencode
     SUBDIR += py-svglib
     SUBDIR += py-text-unidecode
diff --git a/converters/py-markitdown/Makefile b/converters/py-markitdown/Makefile
new file mode 100644
index 000000000000..a9ce7a689d57
--- /dev/null
+++ b/converters/py-markitdown/Makefile
@@ -0,0 +1,27 @@
+PORTNAME=	markitdown
+DISTVERSION=	0.0.1a3
+CATEGORIES=	converters python
+MASTER_SITES=	PYPI
+PKGNAMEPREFIX=	${PYTHON_PKGNAMEPREFIX}
+
+MAINTAINER=	wen@FreeBSD.org
+COMMENT=	Utility tool for converting various files to Markdown
+WWW=		https://pypi.org/project/tlv8/
+
+LICENSE=	APACHE20
+
+BUILD_DEPENDS=	${PYTHON_PKGNAMEPREFIX}hatchling>=0:devel/py-hatchling@${PY_FLAVOR}
+RUN_DEPENDS=	${PYTHON_PKGNAMEPREFIX}mammoth>=0:converters/py-mammoth@${PY_FLAVOR} \
+		${PYTHON_PKGNAMEPREFIX}markdownify>=0:textproc/py-markdownify@${PY_FLAVOR} \
+		${PYTHON_PKGNAMEPREFIX}pandas>=0:math/py-pandas@${PY_FLAVOR} \
+		${PYTHON_PKGNAMEPREFIX}pdfminer.six>=0:textproc/py-pdfminer.six@${PY_FLAVOR} \
+		${PYTHON_PKGNAMEPREFIX}python-pptx>=0:textproc/py-python-pptx@${PY_FLAVOR} \
+		${PYTHON_PKGNAMEPREFIX}puremagic>=0:sysutils/py-puremagic@${PY_FLAVOR} \
+		${PYTHON_PKGNAMEPREFIX}requests>=0:www/py-requests@${PY_FLAVOR}
+
+USES=		python
+USE_PYTHON=	autoplist pep517
+
+NO_ARCH=	yes
+
+.include <bsd.port.mk>
diff --git a/converters/py-markitdown/distinfo b/converters/py-markitdown/distinfo
new file mode 100644
index 000000000000..a69065a058ef
--- /dev/null
+++ b/converters/py-markitdown/distinfo
@@ -0,0 +1,3 @@
+TIMESTAMP = 1734654122
+SHA256 (markitdown-0.0.1a3.tar.gz) = f6c8f5f7f5541e91c6c535218318968fefd71e2a6faa0eb782b3492e04cd023d
+SIZE (markitdown-0.0.1a3.tar.gz) = 16073
diff --git a/converters/py-markitdown/pkg-descr b/converters/py-markitdown/pkg-descr
new file mode 100644
index 000000000000..8871cf0e5603
--- /dev/null
+++ b/converters/py-markitdown/pkg-descr
@@ -0,0 +1,13 @@
+MarkItDown library is a utility tool for converting various files to Markdown
+(e.g., for indexing, text analysis, etc.)
+
+It presently supports:
+  *PDF (.pdf)
+  *PowerPoint (.pptx)
+  *Word (.docx)
+  *Excel (.xlsx)
+  *Images (EXIF metadata, and OCR)
+  *Audio (EXIF metadata, and speech transcription)
+  *HTML (special handling of Wikipedia, etc.)
+  *Various other text-based formats (csv, json, xml, etc.)
+  *ZIP (Iterates over contents and converts each file)
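
For readers unfamiliar with the dependency lines in the new Makefile: each BUILD_DEPENDS/RUN_DEPENDS entry above has the shape `package-spec:origin@flavor`. The following is a minimal sketch that splits one such entry into its parts. It is an illustration only, not how the ports framework itself processes these variables; the `Dependency` type and `parse_dependency` function are hypothetical names, and make variables such as `${PY_FLAVOR}` are left unexpanded.

```python
import re
from typing import NamedTuple, Optional


class Dependency(NamedTuple):
    """One dependency entry, split into its parts (hypothetical helper type)."""
    package: str            # e.g. "${PYTHON_PKGNAMEPREFIX}mammoth"
    constraint: str         # e.g. ">=0" (empty string if none given)
    origin: str             # e.g. "converters/py-mammoth"
    flavor: Optional[str]   # e.g. "${PY_FLAVOR}", or None if no flavor


def parse_dependency(entry: str) -> Dependency:
    """Split a 'package-spec:origin@flavor' entry; variables are not expanded."""
    spec, sep, target = entry.strip().partition(":")
    if not sep or not target:
        raise ValueError(f"not a dependency entry: {entry!r}")

    # The flavor, when present, follows the origin after an '@'.
    origin, _, flavor = target.partition("@")

    # The version constraint starts at the first comparison character.
    match = re.search(r"[<>=]", spec)
    if match:
        package, constraint = spec[:match.start()], spec[match.start():]
    else:
        package, constraint = spec, ""

    return Dependency(package, constraint, origin, flavor or None)


if __name__ == "__main__":
    line = "${PYTHON_PKGNAMEPREFIX}mammoth>=0:converters/py-mammoth@${PY_FLAVOR}"
    print(parse_dependency(line))
```

Read this way, every runtime dependency in the port accepts any version (`>=0`) and follows the Python flavor the port is built for.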