From nobody Mon Jan 16 09:41:03 2023
X-Original-To: dev-commits-ports-main@mlmmj.nyi.freebsd.org
Date: Mon, 16 Jan 2023 09:41:03 GMT
Message-Id: <202301160941.30G9f3Tg030013@gitrepo.freebsd.org>
To: ports-committers@FreeBSD.org, dev-commits-ports-all@FreeBSD.org, dev-commits-ports-main@FreeBSD.org
From: Yuri Victorovich
Subject: git: 922291e01926 - main - textproc/sentencepiece: New port: Unsupervised text tokenizer for Neural Network-based text generation
List-Id: Commits to the main branch of the FreeBSD ports repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-ports-main
Sender: owner-dev-commits-ports-main@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: yuri
X-Git-Repository: ports
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 922291e019260419b7bf80e0db65caf4563c2174
Auto-Submitted: auto-generated

The branch main has been updated by yuri:

URL: https://cgit.FreeBSD.org/ports/commit/?id=922291e019260419b7bf80e0db65caf4563c2174

commit 922291e019260419b7bf80e0db65caf4563c2174
Author:     Yuri Victorovich
AuthorDate: 2023-01-16 09:36:02 +0000
Commit:     Yuri Victorovich
CommitDate: 2023-01-16 09:41:00 +0000

    textproc/sentencepiece: New port: Unsupervised text tokenizer for Neural Network-based text generation
---
 textproc/Makefile                |  1 +
 textproc/sentencepiece/Makefile  | 21 +++++++++++++++++++++
 textproc/sentencepiece/distinfo  |  3 +++
 textproc/sentencepiece/pkg-descr |  7 +++++++
 textproc/sentencepiece/pkg-plist | 16 ++++++++++++++++
 5 files changed, 48 insertions(+)

diff --git a/textproc/Makefile b/textproc/Makefile
index a85511af2b50..e2d0e0ea9521 100644
--- a/textproc/Makefile
+++ b/textproc/Makefile
@@ -1888,6 +1888,7 @@
     SUBDIR += sdocbook-xml
     SUBDIR += sdom
     SUBDIR += senna
+    SUBDIR += sentencepiece
     SUBDIR += sgmlformat
     SUBDIR += sgmls
     SUBDIR += sgrep
diff --git a/textproc/sentencepiece/Makefile b/textproc/sentencepiece/Makefile
new file mode 100644
index 000000000000..84e7ac9ca43e
--- /dev/null
+++ b/textproc/sentencepiece/Makefile
@@ -0,0 +1,21 @@
+PORTNAME=	sentencepiece
+DISTVERSIONPREFIX=	v
+DISTVERSION=	0.1.97
+CATEGORIES=	textproc # machine-learning
+
+MAINTAINER=	yuri@FreeBSD.org
+COMMENT=	Unsupervised text tokenizer for Neural Network-based text generation
+WWW=		https://github.com/google/sentencepiece
+
+LICENSE=	APACHE20
+LICENSE_FILE=	${WRKSRC}/LICENSE
+
+USES=		cmake:testing compiler:c++17-lang
+USE_LDCONFIG=	yes
+
+USE_GITHUB=	yes
+GH_ACCOUNT=	google
+
+CMAKE_TESTING_ON=	SPM_BUILD_TEST
+
+.include
diff --git a/textproc/sentencepiece/distinfo b/textproc/sentencepiece/distinfo
new file mode 100644
index 000000000000..c29dc9430710
--- /dev/null
+++ b/textproc/sentencepiece/distinfo
@@ -0,0 +1,3 @@
+TIMESTAMP = 1673860778
+SHA256 (google-sentencepiece-v0.1.97_GH0.tar.gz) = 41c3a07f315e3ac87605460c8bb8d739955bc8e7f478caec4017ef9b7d78669b
+SIZE (google-sentencepiece-v0.1.97_GH0.tar.gz) = 11945436
diff --git a/textproc/sentencepiece/pkg-descr b/textproc/sentencepiece/pkg-descr
new file mode 100644
index 000000000000..62b7de5f4ece
--- /dev/null
+++ b/textproc/sentencepiece/pkg-descr
@@ -0,0 +1,7 @@
+SentencePiece is an unsupervised text tokenizer and detokenizer mainly for
+Neural Network-based text generation systems where the vocabulary size is
+predetermined prior to the neural model training. SentencePiece implements
+subword units (e.g., byte-pair-encoding (BPE)) and unigram language model
+with the extension of direct training from raw sentences. SentencePiece
+allows us to make a purely end-to-end system that does not depend on
+language-specific pre/postprocessing.
diff --git a/textproc/sentencepiece/pkg-plist b/textproc/sentencepiece/pkg-plist
new file mode 100644
index 000000000000..7640dc4d9c23
--- /dev/null
+++ b/textproc/sentencepiece/pkg-plist
@@ -0,0 +1,16 @@
+bin/spm_decode
+bin/spm_encode
+bin/spm_export_vocab
+bin/spm_normalize
+bin/spm_train
+include/sentencepiece_processor.h
+include/sentencepiece_trainer.h
+lib/libsentencepiece.a
+lib/libsentencepiece.so
+lib/libsentencepiece.so.0
+lib/libsentencepiece.so.0.0.0
+lib/libsentencepiece_train.a
+lib/libsentencepiece_train.so
+lib/libsentencepiece_train.so.0
+lib/libsentencepiece_train.so.0.0.0
+libdata/pkgconfig/sentencepiece.pc
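
The pkg-descr in this commit names byte-pair encoding (BPE) as one of the subword models SentencePiece implements. As a rough illustration of what BPE training does, here is a toy C++17 sketch: it is not SentencePiece's code or API, and the function name learn_bpe and its word-frequency input are hypothetical. It repeatedly counts adjacent symbol pairs (weighted by word frequency) and merges the most frequent pair everywhere. Unlike SentencePiece, which trains directly on raw sentences and treats whitespace as an ordinary symbol, this sketch starts from pre-split words.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy BPE trainer for illustration only; not SentencePiece's code or API.
// A word is a sequence of symbols; training starts from single characters.
using Symbols = std::vector<std::string>;
using Merge = std::pair<std::string, std::string>;

// Hypothetical helper: learn up to num_merges merge rules from word frequencies.
std::vector<Merge> learn_bpe(const std::map<std::string, int>& word_counts,
                             int num_merges) {
  // Split every word into single-character symbols, keeping its frequency.
  std::vector<std::pair<Symbols, int>> corpus;
  for (const auto& [word, count] : word_counts) {
    Symbols symbols;
    for (char c : word) symbols.emplace_back(1, c);
    corpus.emplace_back(std::move(symbols), count);
  }

  std::vector<Merge> merges;
  for (int step = 0; step < num_merges; ++step) {
    // Count adjacent symbol pairs, weighted by word frequency.
    std::map<Merge, int> pair_counts;
    for (const auto& [symbols, count] : corpus) {
      for (std::size_t i = 0; i + 1 < symbols.size(); ++i) {
        pair_counts[{symbols[i], symbols[i + 1]}] += count;
      }
    }
    if (pair_counts.empty()) break;  // every word is already a single symbol

    // Pick the most frequent pair. std::map iterates in key order and we only
    // replace on a strictly higher count, so ties go to the smallest pair.
    auto best = pair_counts.begin();
    for (auto it = pair_counts.begin(); it != pair_counts.end(); ++it) {
      if (it->second > best->second) best = it;
    }
    const Merge chosen = best->first;
    merges.push_back(chosen);

    // Rewrite every word, replacing each (left, right) occurrence with left+right.
    for (auto& entry : corpus) {
      Symbols merged;
      const Symbols& symbols = entry.first;
      for (std::size_t i = 0; i < symbols.size(); ++i) {
        if (i + 1 < symbols.size() && symbols[i] == chosen.first &&
            symbols[i + 1] == chosen.second) {
          merged.push_back(chosen.first + chosen.second);
          ++i;  // skip the right half of the merged pair
        } else {
          merged.push_back(symbols[i]);
        }
      }
      entry.first = std::move(merged);
    }
  }
  return merges;
}
```

For the toy corpus {low: 5, lower: 2, newest: 6, widest: 3}, the first three merges learned are (e, s), (es, t) and (l, o). The third step is a tie between (l, o) and (o, w) at 7, which this sketch breaks by key order; a real trainer may break ties differently.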