Date:      Tue, 24 Mar 2026 19:17:08 +0000
From:      Robert Clausecker <fuz@FreeBSD.org>
To:        ports-committers@FreeBSD.org, dev-commits-ports-all@FreeBSD.org, dev-commits-ports-main@FreeBSD.org
Cc:        Wade Markham <wadegimpbc@tuta.com>
Subject:   git: 02a5aded7e25 - main - textproc/sonic: Make tokenizer features optional via OPTIONS, adopt port
Message-ID:  <69c2e334.41043.6b787d0f@gitrepo.freebsd.org>


The branch main has been updated by fuz:

URL: https://cgit.FreeBSD.org/ports/commit/?id=02a5aded7e2587143522858fe321ae1a14a56d9e

commit 02a5aded7e2587143522858fe321ae1a14a56d9e
Author:     Wade Markham <wadegimpbc@tuta.com>
AuthorDate: 2026-03-21 08:16:20 +0000
Commit:     Robert Clausecker <fuz@FreeBSD.org>
CommitDate: 2026-03-24 19:12:37 +0000

    textproc/sonic: Make tokenizer features optional via OPTIONS, adopt port
    
    This patch makes the Japanese and Chinese word segmentation features
    optional via FreeBSD OPTIONS helpers, and adopts the port.
    Currently the port unconditionally downloads a ~100MB UniDic Japanese
    dictionary (unidic-mecab-2.1.2_src.zip) for every build, regardless of
    whether the user needs Japanese tokenization. Upstream removed
    tokenizer-japanese from the default cargo features in v1.4.2 because it
    increased the final binary size roughly tenfold. This patch brings the
    port in line with upstream's intent.
    
    Changes:
    
     - MAINTAINER changed to wadegimpbc@tuta.com
     - Added CHINESE and JAPANESE OPTIONS using OPTIONS helpers
     - OPTIONS_DEFAULT includes CHINESE (matching upstream's default features)
     - UniDic download now conditional on JAPANESE option
     - CARGO_FEATURES uses --no-default-features with allocator-jemalloc as
       the base, per cargo.mk convention (lines 23-26, 192, 197-200)
     - Added missing zstd dependency (LIB_DEPENDS on archivers/zstd)
    
    PR:             293943
---
 textproc/sonic/Makefile | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/textproc/sonic/Makefile b/textproc/sonic/Makefile
index c533ef8857c7..e84acf782907 100644
--- a/textproc/sonic/Makefile
+++ b/textproc/sonic/Makefile
@@ -3,10 +3,7 @@ DISTVERSIONPREFIX=	v
 DISTVERSION=	1.4.9
 PORTREVISION=	16
 CATEGORIES=	textproc
-MASTER_SITES+=	https://clrd.ninjal.ac.jp/unidic_archive/cwj/2.1.2/:unidic
-DISTFILES+=	unidic-mecab-2.1.2_src.zip:unidic # check cargo-crates/lindera-unidic-XXX/build.rs
-
-MAINTAINER=	ports@FreeBSD.org
+MAINTAINER=	wadegimpbc@tuta.com
 COMMENT=	Fast, lightweight, and schema-less search backend
 WWW=		https://github.com/valeriansaliou/sonic
 
@@ -14,6 +11,7 @@ LICENSE=	MPL20
 LICENSE_FILE=	${WRKSRC}/LICENSE.md
 
 BUILD_DEPENDS=	llvm${LLVM_DEFAULT}>0:devel/llvm${LLVM_DEFAULT}
+LIB_DEPENDS=	libzstd.so:archivers/zstd
 
 USES=		cargo compiler:c++11-lang gmake
 USE_GITHUB=	yes
@@ -26,9 +24,17 @@ GROUPS=		sonic
 PLIST_FILES=	bin/sonic \
 		"@sample ${ETCDIR}/config.cfg.sample"
 PORTDOCS=	CONFIGURATION.md PROTOCOL.md README.md
-OPTIONS_DEFINE=	DOCS
+OPTIONS_DEFINE=	CHINESE DOCS JAPANESE
+OPTIONS_DEFAULT=	CHINESE
+CHINESE_DESC=	Chinese word segmentation
+JAPANESE_DESC=	Japanese word segmentation (adds ~100MB UniDic download)
 
 CARGO_ENV+=	DISTDIR=${DISTDIR}
+CARGO_FEATURES=	--no-default-features allocator-jemalloc
+CHINESE_VARS=	CARGO_FEATURES+=tokenizer-chinese
+JAPANESE_VARS=	CARGO_FEATURES+=tokenizer-japanese
+JAPANESE_MASTER_SITES=	https://clrd.ninjal.ac.jp/unidic_archive/cwj/2.1.2/:unidic
+JAPANESE_DISTFILES=	unidic-mecab-2.1.2_src.zip:unidic
 
 post-install:
 	@${MKDIR} ${STAGEDIR}${ETCDIR}

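The OPTIONS wiring in the diff above can be summarized with a small model. The sketch below is a hypothetical Python simulation, not ports framework code. It assumes that an enabled option's `<OPT>_VARS` appends to CARGO_FEATURES and its `<OPT>_DISTFILES` appends to DISTFILES, and that cargo.mk turns a `--no-default-features` entry into its own flag while passing the remaining entries through `--features`. The function names (`cargo_features`, `cargo_build_args`, `extra_distfiles`) are illustrative only.

```python
# Hypothetical model of textproc/sonic's OPTIONS wiring (not ports framework code).
# Assumption: <OPT>_VARS and <OPT>_DISTFILES only take effect for enabled options,
# and cargo.mk turns a "--no-default-features" entry in CARGO_FEATURES into its
# own flag while passing the remaining entries to cargo via --features.

# Mirrors the base CARGO_FEATURES line in the Makefile.
BASE_CARGO_FEATURES = ["--no-default-features", "allocator-jemalloc"]

# Mirrors CHINESE_VARS / JAPANESE_VARS (CARGO_FEATURES+=...).
OPTION_CARGO_FEATURES = {
    "CHINESE": "tokenizer-chinese",
    "JAPANESE": "tokenizer-japanese",
}

# Mirrors JAPANESE_DISTFILES; CHINESE needs no extra distfile.
OPTION_DISTFILES = {
    "JAPANESE": "unidic-mecab-2.1.2_src.zip:unidic",
}

# Mirrors OPTIONS_DEFAULT.
OPTIONS_DEFAULT = {"CHINESE"}


def cargo_features(enabled):
    """Return CARGO_FEATURES after appending the features of enabled options.

    Options are visited in alphabetical order (an assumption of this model).
    """
    features = list(BASE_CARGO_FEATURES)
    for option in sorted(OPTION_CARGO_FEATURES):
        if option in enabled:
            features.append(OPTION_CARGO_FEATURES[option])
    return features


def cargo_build_args(features):
    """Split CARGO_FEATURES into cargo command-line arguments."""
    args = []
    if "--no-default-features" in features:
        args.append("--no-default-features")
    rest = [f for f in features if f != "--no-default-features"]
    if rest:
        # Remaining features are passed as one space-separated --features list.
        args.append("--features=" + " ".join(rest))
    return args


def extra_distfiles(enabled):
    """Return the option-dependent distfiles that would be fetched."""
    return [OPTION_DISTFILES[o] for o in sorted(OPTION_DISTFILES) if o in enabled]


if __name__ == "__main__":
    # Default options, no options, and both tokenizers enabled.
    for enabled in (OPTIONS_DEFAULT, set(), {"CHINESE", "JAPANESE"}):
        args = cargo_build_args(cargo_features(enabled))
        print(sorted(enabled), args, extra_distfiles(enabled))
```

In this model, the default option set builds with tokenizer-chinese only and fetches no UniDic archive; the ~100MB dictionary is only requested once JAPANESE is enabled.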
