From: bugzilla-noreply@freebsd.org
To: ports-bugs@FreeBSD.org
Subject: [Bug 278424] deskutils/py-paperless-ngx: man page doesn't mention NLTK's Snowball Stemmer
Date: Thu, 18 Apr 2024 07:21:48 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278424

            Bug ID: 278424
           Summary: deskutils/py-paperless-ngx: man page doesn't mention
                    NLTK's Snowball Stemmer
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: Individual Port(s)
          Assignee: grembo@FreeBSD.org
          Reporter: freebsd.bugzilla@mail.tinsuke.com
             Flags: maintainer-feedback?(grembo@FreeBSD.org)

The man page states, about setting up NLTK:

> NLTK DATA
>      In order to process scanned documents using machine learning,
>      paperless-ngx requires NLTK (natural language toolkit) data. The
>      required files can be downloaded by using these commands:
>
>            /usr/local/bin/python3.9 -m nltk.downloader \
>              stopwords punkt -d /var/db/paperless/nltkdata

It is missing the "snowball_data" file to be downloaded. The file is referred
to in the project's doc (https://docs.paperless-ngx.com/setup/#bare_metal):

> Optional: If using the NLTK machine learning processing (see
> PAPERLESS_ENABLE_NLTK for details), download the NLTK data for the Snowball
> Stemmer, Stopwords and Punkt tokenizer to your PAPERLESS_DATA_DIR/nltk.
> Refer to the NLTK instructions for details on how to download the data.
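
For illustration, the command quoted from the man page with "snowball_data" added might look like the sketch below. The interpreter path and data directory are the ones quoted above; the RUN switch and the variable names are hypothetical, added only so the command can be inspected before anything is downloaded. This is not the port's actual man page text.

```shell
#!/bin/sh
# Sketch only: the man page's download command plus "snowball_data".
# PYTHON and NLTK_DIR are copied from the man page excerpt; adjust as needed.
PYTHON=/usr/local/bin/python3.9
NLTK_DIR=/var/db/paperless/nltkdata
NLTK_PACKAGES="snowball_data stopwords punkt"

# RUN is a hypothetical switch: by default the command is only printed.
# Set RUN=1 to perform the download (requires network access and NLTK).
if [ "${RUN:-0}" = "1" ]; then
    # Intentionally unquoted so each package id becomes its own argument.
    "$PYTHON" -m nltk.downloader $NLTK_PACKAGES -d "$NLTK_DIR"
else
    echo "$PYTHON -m nltk.downloader $NLTK_PACKAGES -d $NLTK_DIR"
fi
```

Run as-is, the script only prints the command it would execute.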

I can't vouch for how useful it is to have that data available to NLTK, but
it sounds very useful from its description
(https://github.com/snowballstem/snowball?tab=readme-ov-file#what-is-stemming):

> What is Stemming?
> Stemming maps different forms of the same word to a common "stem" - for
> example, the English stemmer maps connection, connections, connective,
> connected, and connecting to connect. So a search for connected would also
> find documents which only have the other forms.

I suggest adding "snowball_data" to the man page's sample NLTK download
command, so that it is in line with the project's docs and useful to users of
this port (thanks for it, btw!).

-- 
You are receiving this mail because:
You are the assignee for the bug.