FreeBSD Mail Archives

Date:      Mon, 22 Mar 2021 14:16:03 GMT
From:      Baptiste Daroussin <bapt@FreeBSD.org>
To:        src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
Subject:   git: a0409676120c - main - libucl: vendor import snapshort 20210314
Message-ID:  <202103221416.12MEG3w9043302@gitrepo.freebsd.org>

The branch main has been updated by bapt:

URL: https://cgit.FreeBSD.org/src/commit/?id=a0409676120c1e558d0ade943019934e0f15118d

commit a0409676120c1e558d0ade943019934e0f15118d
Merge: 64a790d26480 3c319408d0de
Author:     Baptiste Daroussin <bapt@FreeBSD.org>
AuthorDate: 2021-03-22 14:13:02 +0000
Commit:     Baptiste Daroussin <bapt@FreeBSD.org>
CommitDate: 2021-03-22 14:13:02 +0000

    libucl: vendor import snapshort 20210314

 contrib/libucl/CMakeLists.txt                      | 105 ++-
 contrib/libucl/ChangeLog.md                        |  38 +-
 contrib/libucl/README.md                           |  64 +-
 contrib/libucl/configure.ac                        |   8 +-
 contrib/libucl/doc/api.md                          |  17 +-
 contrib/libucl/doc/libucl.3                        |  26 +-
 contrib/libucl/doc/lua_api.md                      |   4 +-
 contrib/libucl/include/lua_ucl.h                   |  20 +-
 contrib/libucl/include/ucl++.h                     | 191 ++++-
 contrib/libucl/include/ucl.h                       | 191 ++++-
 contrib/libucl/klib/kvec.h                         |  78 ++-
 contrib/libucl/lua/lua_ucl.c                       | 450 ++++++++++--
 contrib/libucl/python/MANIFEST.in                  |   5 +
 contrib/libucl/python/setup.py                     |  42 +-
 contrib/libucl/python/src/uclmodule.c              |   3 +-
 contrib/libucl/python/tests/test_example.py        |  59 ++
 contrib/libucl/python/tests/test_load.py           |  17 +-
 contrib/libucl/src/mum.h                           |   8 +-
 contrib/libucl/src/ucl_chartable.h                 |   4 +-
 contrib/libucl/src/ucl_emitter.c                   |  12 +-
 contrib/libucl/src/ucl_emitter_utils.c             |  57 +-
 contrib/libucl/src/ucl_hash.c                      | 218 +++++-
 contrib/libucl/src/ucl_hash.h                      |  20 +-
 contrib/libucl/src/ucl_internal.h                  | 105 ++-
 contrib/libucl/src/ucl_msgpack.c                   |  82 ++-
 contrib/libucl/src/ucl_parser.c                    | 552 ++++++++++++---
 contrib/libucl/src/ucl_schema.c                    |  29 +-
 contrib/libucl/src/ucl_util.c                      | 780 ++++++++++++++++-----
 contrib/libucl/tests/basic.test                    |   2 +-
 contrib/libucl/tests/basic/13.in                   |   2 +-
 contrib/libucl/tests/basic/20.in                   |   2 -
 contrib/libucl/tests/basic/20.res                  |   5 -
 contrib/libucl/tests/basic/21.in                   |   2 -
 contrib/libucl/tests/basic/21.res                  |  10 -
 contrib/libucl/tests/basic/9.in                    |   2 +-
 contrib/libucl/tests/basic/9.res                   |   8 +-
 contrib/libucl/tests/basic/squote.in               |   8 +
 contrib/libucl/tests/basic/squote.res              |   7 +
 .../libucl/tests/fuzzers/ucl_add_string_fuzzer.c   |  25 +
 contrib/libucl/tests/fuzzers/ucl_msgpack_fuzzer.c  |  29 +
 contrib/libucl/tests/generate.test                 |   2 +-
 contrib/libucl/tests/run_tests.sh                  |   4 +-
 contrib/libucl/tests/streamline.test               |   2 +-
 contrib/libucl/tests/test_basic.c                  |  11 +-
 contrib/libucl/tests/test_generate.c               |  15 +-
 contrib/libucl/tests/test_msgpack.c                |   1 +
 contrib/libucl/utils/CMakeLists.txt                |  12 +
 contrib/libucl/utils/objdump.c                     |  17 +-
 contrib/libucl/utils/ucl-tool.c                    | 100 +--
 49 files changed, 2820 insertions(+), 631 deletions(-)

diff --cc contrib/libucl/README.md
index 44983c57d643,000000000000..53d8a651d73b
mode 100644,000000..100644
--- a/contrib/libucl/README.md
+++ b/contrib/libucl/README.md
@@@ -1,384 -1,0 +1,418 @@@
 +# LIBUCL
 +
- [![Build Status](https://travis-ci.org/vstakhov/libucl.svg?branch=master)](https://travis-ci.org/vstakhov/libucl)
++[![CircleCI](https://circleci.com/gh/vstakhov/libucl.svg?style=svg)](https://circleci.com/gh/vstakhov/libucl)
 +[![Coverity](https://scan.coverity.com/projects/4138/badge.svg)](https://scan.coverity.com/projects/4138)
 +[![Coverage Status](https://coveralls.io/repos/github/vstakhov/libucl/badge.svg?branch=master)](https://coveralls.io/github/vstakhov/libucl?branch=master)
 +
 +**Table of Contents**  *generated with [DocToc](http://doctoc.herokuapp.com/)*
 +
 +- [Introduction](#introduction)
 +- [Basic structure](#basic-structure)
 +- [Improvements to the json notation](#improvements-to-the-json-notation)
 +	- [General syntax sugar](#general-syntax-sugar)
 +	- [Automatic arrays creation](#automatic-arrays-creation)
 +	- [Named keys hierarchy](#named-keys-hierarchy)
 +	- [Convenient numbers and booleans](#convenient-numbers-and-booleans)
 +- [General improvements](#general-improvements)
 +	- [Comments](#comments)
 +	- [Macros support](#macros-support)
 +	- [Variables support](#variables-support)
 +	- [Multiline strings](#multiline-strings)
++	- [Single quoted strings](#single-quoted-strings)
 +- [Emitter](#emitter)
 +- [Validation](#validation)
 +- [Performance](#performance)
 +- [Conclusion](#conclusion)
 +
 +## Introduction
 +
 +This document describes the main features and principles of the configuration
 +language called `UCL` - universal configuration language.
 +
 +If you are looking for the libucl API documentation you can find it at [this page](doc/api.md).
 +
 +## Basic structure
 +
 +UCL is heavily infused by `nginx` configuration as the example of a convenient configuration
 +system. However, UCL is fully compatible with `JSON` format and is able to parse json files.
 +For example, you can write the same configuration in the following ways:
 +
 +* in nginx like:
 +
 +```nginx
 +param = value;
 +section {
 +    param = value;
 +    param1 = value1;
 +    flag = true;
 +    number = 10k;
 +    time = 0.2s;
 +    string = "something";
 +    subsection {
 +        host = {
 +            host = "hostname";
 +            port = 900;
 +        }
 +        host = {
 +            host = "hostname";
 +            port = 901;
 +        }
 +    }
 +}
 +```
 +
 +* or in JSON:
 +
 +```json
 +{
 +    "param": "value",
-     "param1": "value1",
-     "flag": true,
-     "subsection": {
-         "host": [
-         {
-             "host": "hostname",
-             "port": 900
-         },
-         {
-             "host": "hostname",
-             "port": 901
++    "section": {
++        "param": "value",
++        "param1": "value1",
++        "flag": true,
++        "number": 10000,
++        "time": "0.2s",
++        "string": "something",
++        "subsection": {
++            "host": [
++                {
++                    "host": "hostname",
++                    "port": 900
++                },
++                {
++                    "host": "hostname",
++                    "port": 901
++                }
++            ]
 +        }
-         ]
 +    }
 +}
 +```
 +
 +## Improvements to the json notation.
 +
 +There are various things that make ucl configuration more convenient for editing than strict json:
 +
 +### General syntax sugar
 +
 +* Braces are not necessary to enclose a top object: it is automatically treated as an object:
 +
 +```json
 +"key": "value"
 +```
 +is equal to:
 +```json
 +{"key": "value"}
 +```
 +
 +* There is no requirement of quotes for strings and keys, moreover, `:` may be replaced `=` or even be skipped for objects:
 +
 +```nginx
 +key = value;
 +section {
 +    key = value;
 +}
 +```
 +is equal to:
 +```json
 +{
 +    "key": "value",
 +    "section": {
 +        "key": "value"
 +    }
 +}
 +```
 +
 +* No commas mess: you can safely place a comma or semicolon for the last element in an array or an object:
 +
 +```json
 +{
 +    "key1": "value",
 +    "key2": "value",
 +}
 +```
 +### Automatic arrays creation
 +
 +* Non-unique keys in an object are allowed and are automatically converted to the arrays internally:
 +
 +```json
 +{
 +    "key": "value1",
 +    "key": "value2"
 +}
 +```
 +is converted to:
 +```json
 +{
 +    "key": ["value1", "value2"]
 +}
 +```
 +
 +### Named keys hierarchy
 +
 +UCL accepts named keys and organize them into objects hierarchy internally. Here is an example of this process:
 +```nginx
 +section "blah" {
 +	key = value;
 +}
 +section foo {
 +	key = value;
 +}
 +```
 +
 +is converted to the following object:
 +
 +```nginx
 +section {
 +	blah {
 +		key = value;
 +	}
 +	foo {
 +		key = value;
 +	}
 +}
 +```
 +
 +Plain definitions may be more complex and contain more than a single level of nested objects:
 +
 +```nginx
 +section "blah" "foo" {
 +	key = value;
 +}
 +```
 +
 +is presented as:
 +
 +```nginx
 +section {
 +	blah {
 +		foo {
 +			key = value;
 +		}
 +	}
 +}
 +```
 +
 +### Convenient numbers and booleans
 +
 +* Numbers can have suffixes to specify standard multipliers:
 +    + `[kKmMgG]` - standard 10 base multipliers (so `1k` is translated to 1000)
 +    + `[kKmMgG]b` - 2 power multipliers (so `1kb` is translated to 1024)
 +    + `[s|min|d|w|y]` - time multipliers, all time values are translated to float number of seconds, for example `10min` is translated to 600.0 and `10ms` is translated to 0.01
 +* Hexadecimal integers can be used by `0x` prefix, for example `key = 0xff`. However, floating point values can use decimal base only.
 +* Booleans can be specified as `true` or `yes` or `on` and `false` or `no` or `off`.
 +* It is still possible to treat numbers and booleans as strings by enclosing them in double quotes.
 +
 +## General improvements
 +
 +### Comments
 +
 +UCL supports different style of comments:
 +
 +* single line: `#`
 +* multiline: `/* ... */`
 +
 +Multiline comments may be nested:
 +```c
 +# Sample single line comment
 +/*
 + some comment
 + /* nested comment */
 + end of comment
 +*/
 +```
 +
 +### Macros support
 +
 +UCL supports external macros both multiline and single line ones:
 +```nginx
 +.macro_name "sometext";
 +.macro_name {
 +    Some long text
 +    ....
 +};
 +```
 +
 +Moreover, each macro can accept an optional list of arguments in braces. These
 +arguments themselves are the UCL object that is parsed and passed to a macro as
 +options:
 +
 +```nginx
 +.macro_name(param=value) "something";
 +.macro_name(param={key=value}) "something";
 +.macro_name(.include "params.conf") "something";
 +.macro_name(#this is multiline macro
 +param = [value1, value2]) "something";
 +.macro_name(key="()") "something";
 +```
 +
 +UCL also provide a convenient `include` macro to load content from another files
 +to the current UCL object. This macro accepts either path to file:
 +
 +```nginx
 +.include "/full/path.conf"
 +.include "./relative/path.conf"
 +.include "${CURDIR}/path.conf"
 +```
 +
 +or URL (if ucl is built with url support provided by either `libcurl` or `libfetch`):
 +
 +	.include "http://example.com/file.conf"
 +
 +`.include` macro supports a set of options:
 +
 +* `try` (default: **false**) - if this option is `true` than UCL treats errors on loading of
 +this file as non-fatal. For example, such a file can be absent but it won't stop the parsing
 +of the top-level document.
 +* `sign` (default: **false**) - if this option is `true` UCL loads and checks the signature for
 +a file from path named `<FILEPATH>.sig`. Trusted public keys should be provided for UCL API after
 +parser is created but before any configurations are parsed.
 +* `glob` (default: **false**) - if this option is `true` UCL treats the filename as GLOB pattern and load
 +all files that matches the specified pattern (normally the format of patterns is defined in `glob` manual page
 +for your operating system). This option is meaningless for URL includes.
 +* `url` (default: **true**) - allow URL includes.
 +* `path` (default: empty) - A UCL_ARRAY of directories to search for the include file.
 +Search ends after the first match, unless `glob` is true, then all matches are included.
 +* `prefix` (default false) - Put included contents inside an object, instead
 +of loading them into the root. If no `key` is provided, one is automatically generated based on each files basename()
 +* `key` (default: <empty string>) - Key to load contents of include into. If
 +the key already exists, it must be the correct type
 +* `target` (default: object) - Specify if the `prefix` `key` should be an
 +object or an array.
 +* `priority` (default: 0) - specify priority for the include (see below).
 +* `duplicate` (default: 'append') - specify policy of duplicates resolving:
 +	- `append` - default strategy, if we have new object of higher priority then it replaces old one, if we have new object with less priority it is ignored completely, and if we have two duplicate objects with the same priority then we have a multi-value key (implicit array)
 +	- `merge` - if we have object or array, then new keys are merged inside, if we have a plain object then an implicit array is formed (regardless of priorities)
 +	- `error` - create error on duplicate keys and stop parsing
 +	- `rewrite` - always rewrite an old value with new one (ignoring priorities)
 +
 +Priorities are used by UCL parser to manage the policy of objects rewriting during including other files
 +as following:
 +
 +* If we have two objects with the same priority then we form an implicit array
 +* If a new object has bigger priority then we overwrite an old one
 +* If a new object has lower priority then we ignore it
 +
 +By default, the priority of top-level object is set to zero (lowest priority). Currently,
 +you can define up to 16 priorities (from 0 to 15). Includes with bigger priorities will
- rewrite keys from the objects with lower priorities as specified by the policy.
++rewrite keys from the objects with lower priorities as specified by the policy. The priority
++of the top-level or any other object can be changed with the `.priority` macro, which has no
++options and takes the new priority:
++
++```
++# Default priority: 0.
++foo = 6
++.priority 5
++# The following will have priority 5.
++bar = 6
++baz = 7
++# The following will be included with a priority of 3, 5, and 6 respectively.
++.include(priority=3) "path.conf"
++.include(priority=5) "equivalent-path.conf"
++.include(priority=6) "highpriority-path.conf"
++```
 +
 +### Variables support
 +
 +UCL supports variables in input. Variables are registered by a user of the UCL parser and can be presented in the following forms:
 +
 +* `${VARIABLE}`
 +* `$VARIABLE`
 +
 +UCL currently does not support nested variables. To escape variables one could use double dollar signs:
 +
 +* `$${VARIABLE}` is converted to `${VARIABLE}`
 +* `$$VARIABLE` is converted to `$VARIABLE`
 +
 +However, if no valid variables are found in a string, no expansion will be performed (and `$$` thus remains unchanged). This may be a subject
 +to change in future libucl releases.
 +
 +### Multiline strings
 +
 +UCL can handle multiline strings as well as single line ones. It uses shell/perl like notation for such objects:
 +```
 +key = <<EOD
 +some text
 +splitted to
 +lines
 +EOD
 +```
 +
 +In this example `key` will be interpreted as the following string: `some text\nsplitted to\nlines`.
 +Here are some rules for this syntax:
 +
 +* Multiline terminator must start just after `<<` symbols and it must consist of capital letters only (e.g. `<<eof` or `<< EOF` won't work);
 +* Terminator must end with a single newline character (and no spaces are allowed between terminator and newline character);
 +* To finish multiline string you need to include a terminator string just after newline and followed by a newline (no spaces or other characters are allowed as well);
 +* The initial and the final newlines are not inserted to the resulting string, but you can still specify newlines at the beginning and at the end of a value, for example:
 +
 +```
 +key <<EOD
 +
 +some
 +text
 +
 +EOD
 +```
 +
++### Single quoted strings
++
++It is possible to use single quoted strings to simplify escaping rules. All values passed in single quoted strings are *NOT* escaped, with two exceptions: a single `'` character just before `\` character, and a newline character just after `\` character that is ignored.
++
++```
++key = 'value'; # Read as value
++key = 'value\n\'; # Read as  value\n\
++key = 'value\''; # Read as value'
++key = 'value\
++bla'; # Read as valuebla
++```
++
 +## Emitter
 +
- Each UCL object can be serialized to one of the three supported formats:
++Each UCL object can be serialized to one of the four supported formats:
 +
 +* `JSON` - canonic json notation (with spaces indented structure);
 +* `Compacted JSON` - compact json notation (without spaces or newlines);
 +* `Configuration` - nginx like notation;
 +* `YAML` - yaml inlined notation.
 +
 +## Validation
 +
 +UCL allows validation of objects. It uses the same schema that is used for json: [json schema v4](http://json-schema.org). UCL supports the full set of json schema with the exception of remote references. This feature is unlikely useful for configuration objects. Of course, a schema definition can be in UCL format instead of JSON that simplifies schemas writing. Moreover, since UCL supports multiple values for keys in an object it is possible to specify generic integer constraints `maxValues` and `minValues` to define the limits of values count in a single key. UCL currently is not absolutely strict about validation schemas themselves, therefore UCL users should supply valid schemas (as it is defined in json-schema draft v4) to ensure that the input objects are validated properly.
 +
 +## Performance
 +
 +Are UCL parser and emitter fast enough? Well, there are some numbers.
 +I got a 19Mb file that consist of ~700 thousand lines of json (obtained via
 +http://www.json-generator.com/). Then I checked jansson library that performs json
 +parsing and emitting and compared it with UCL. Here are results:
 +
 +```
 +jansson: parsed json in 1.3899 seconds
 +jansson: emitted object in 0.2609 seconds
 +
 +ucl: parsed input in 0.6649 seconds
 +ucl: emitted config in 0.2423 seconds
 +ucl: emitted json in 0.2329 seconds
 +ucl: emitted compact json in 0.1811 seconds
 +ucl: emitted yaml in 0.2489 seconds
 +```
 +
 +So far, UCL seems to be significantly faster than jansson on parsing and slightly faster on emitting. Moreover,
 +UCL compiled with optimizations (-O3) performs significantly faster:
 +```
 +ucl: parsed input in 0.3002 seconds
 +ucl: emitted config in 0.1174 seconds
 +ucl: emitted json in 0.1174 seconds
 +ucl: emitted compact json in 0.0991 seconds
 +ucl: emitted yaml in 0.1354 seconds
 +```
 +
 +You can do your own benchmarks by running `make check` in libucl top directory.
 +
 +## Conclusion
 +
 +UCL has clear design that should be very convenient for reading and writing. At the same time it is compatible with
 +JSON language and therefore can be used as a simple JSON parser. Macro logic provides an ability to extend configuration
 +language (for example by including some lua code) and comments allow to disable or enable the parts of a configuration
 +quickly.
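
As an illustration of the size suffixes described in the README's "Convenient numbers and booleans" section above, here is a minimal C sketch. `parse_size_suffix` is a hypothetical helper written for this note, not a libucl function, and it does not claim to reproduce libucl's parser. It only mirrors the two documented rules (`[kKmMgG]` multiplies by a power of 1000, `[kKmMgG]b` by a power of 1024) and rejects everything else, including the time suffixes.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of libucl): parse "<digits>[kKmMgG][b]".
 * Returns 0 and stores the scaled value in *out on success, -1 otherwise.
 * Assumption: no overflow checking, and only a lowercase 'b' selects the
 * power-of-two multipliers, as written in the README.
 */
static int
parse_size_suffix(const char *s, int64_t *out)
{
	char *end;
	long long v = strtoll(s, &end, 10);
	int64_t base = 1000, mult = 1;
	int steps;

	if (end == s)
		return -1;		/* no leading digits */

	switch (tolower((unsigned char)*end)) {
	case '\0':			/* plain number, no suffix */
		*out = v;
		return 0;
	case 'k': steps = 1; break;
	case 'm': steps = 2; break;
	case 'g': steps = 3; break;
	default:
		return -1;		/* unknown suffix */
	}
	end++;

	if (*end == 'b') {		/* "kb", "mb", "gb": powers of 1024 */
		base = 1024;
		end++;
	}
	if (*end != '\0')
		return -1;		/* trailing characters, e.g. "10min" */

	while (steps-- > 0)
		mult *= base;
	*out = (int64_t)v * mult;
	return 0;
}
```

Under these assumptions `parse_size_suffix("1k", &v)` stores 1000 and `parse_size_suffix("1kb", &v)` stores 1024, matching the two examples in the README text.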
diff --cc contrib/libucl/python/MANIFEST.in
index 000000000000,336a4b3a7a07..336a4b3a7a07
mode 000000,100644..100644
--- a/contrib/libucl/python/MANIFEST.in
+++ b/contrib/libucl/python/MANIFEST.in
diff --cc contrib/libucl/python/tests/test_example.py
index 000000000000,f0785531f4e2..f0785531f4e2
mode 000000,100644..100644
--- a/contrib/libucl/python/tests/test_example.py
+++ b/contrib/libucl/python/tests/test_example.py
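
The mum.h diff that follows shows `_mum`, whose fallback path builds a 64x64->128-bit product from four 32x32->64-bit multiplications and then returns the sum of the high and low halves. The sketch below isolates that step. The names `mul64x64` and `mum_mix` are chosen for this note and do not appear in mum.h, and unlike the original, which tracks the carries only when `MUM_TARGET_INDEPENDENT_HASH` is defined, this version always tracks them so the result equals the exact 128-bit product.

```c
#include <stdint.h>

/*
 * Illustrative only: exact 64x64->128-bit multiplication using four
 * 32x32->64-bit partial products, with both carries always propagated.
 */
static void
mul64x64(uint64_t v, uint64_t p, uint64_t *hi, uint64_t *lo)
{
	uint64_t hv = v >> 32, lv = (uint32_t)v;
	uint64_t hp = p >> 32, lp = (uint32_t)p;
	uint64_t rh = hv * hp;		/* contributes to bits 64..127 */
	uint64_t rm_0 = hv * lp;	/* contributes to bits 32..95 */
	uint64_t rm_1 = hp * lv;	/* contributes to bits 32..95 */
	uint64_t rl = lv * lp;		/* contributes to bits 0..63 */
	uint64_t t, carry;

	t = rl + (rm_0 << 32);
	carry = t < rl;			/* wrapped around: carry into hi */
	*lo = t + (rm_1 << 32);
	carry += *lo < t;		/* second possible carry into hi */
	*hi = rh + (rm_0 >> 32) + (rm_1 >> 32) + carry;
}

/* Mix step: add the two halves of the product, as _mum does. */
static uint64_t
mum_mix(uint64_t v, uint64_t p)
{
	uint64_t hi, lo;

	mul64x64(v, p, &hi, &lo);
	return hi + lo;
}
```

Always propagating the carries costs two extra comparisons. Per the comment in the original source, that is the price of getting the same hash on 32-bit and 64-bit targets.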
diff --cc contrib/libucl/src/mum.h
index 9c719e515538,000000000000..148161a3ec58
mode 100644,000000..100644
--- a/contrib/libucl/src/mum.h
+++ b/contrib/libucl/src/mum.h
@@@ -1,419 -1,0 +1,419 @@@
 +/* Copyright (c) 2016 Vladimir Makarov <vmakarov@gcc.gnu.org>
 +
 +   Permission is hereby granted, free of charge, to any person
 +   obtaining a copy of this software and associated documentation
 +   files (the "Software"), to deal in the Software without
 +   restriction, including without limitation the rights to use, copy,
 +   modify, merge, publish, distribute, sublicense, and/or sell copies
 +   of the Software, and to permit persons to whom the Software is
 +   furnished to do so, subject to the following conditions:
 +
 +   The above copyright notice and this permission notice shall be
 +   included in all copies or substantial portions of the Software.
 +
 +   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 +   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 +   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 +   NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
 +   BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
 +   ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 +   CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 +   SOFTWARE.
 +*/
 +
 +/* This file implements MUM (MUltiply and Mix) hashing.  We randomize
 +   input data by 64x64-bit multiplication and mixing hi- and low-parts
 +   of the multiplication result by using an addition and then mix it
 +   into the current state.  We use prime numbers randomly generated
 +   with the equal probability of their bit values for the
 +   multiplication.  When all primes are used once, the state is
 +   randomized and the same prime numbers are used again for data
 +   randomization.
 +
 +   The MUM hashing passes all SMHasher tests.  Pseudo Random Number
 +   Generator based on MUM also passes NIST Statistical Test Suite for
 +   Random and Pseudorandom Number Generators for Cryptographic
 +   Applications (version 2.2.1) with 1000 bitstreams each containing
 +   1M bits.  MUM hashing is also faster Spooky64 and City64 on small
-    strings (at least upto 512-bit) on Haswell and Power7.  The MUM bulk
++   strings (at least up to 512-bit) on Haswell and Power7.  The MUM bulk
 +   speed (speed on very long data) is bigger than Spooky and City on
 +   Power7.  On Haswell the bulk speed is bigger than Spooky one and
 +   close to City speed.  */
 +
 +#ifndef __MUM_HASH__
 +#define __MUM_HASH__
 +
 +#include <stddef.h>
 +#include <stdlib.h>
 +#include <string.h>
 +#include <limits.h>
 +
 +#ifdef _MSC_VER
 +typedef unsigned __int16 uint16_t;
 +typedef unsigned __int32 uint32_t;
 +typedef unsigned __int64 uint64_t;
 +#else
 +#include <stdint.h>
 +#endif
 +
 +/* Macro saying to use 128-bit integers implemented by GCC for some
 +   targets.  */
 +#ifndef _MUM_USE_INT128
 +/* In GCC uint128_t is defined if HOST_BITS_PER_WIDE_INT >= 64.
 +   HOST_WIDE_INT is long if HOST_BITS_PER_LONG > HOST_BITS_PER_INT,
 +   otherwise int. */
 +#if defined(__GNUC__) && UINT_MAX != ULONG_MAX
 +#define _MUM_USE_INT128 1
 +#else
 +#define _MUM_USE_INT128 0
 +#endif
 +#endif
 +
 +#if 0
 +#if defined(__GNUC__) && ((__GNUC__ == 4) &&  (__GNUC_MINOR__ >= 9) || (__GNUC__ > 4))
 +#define _MUM_FRESH_GCC
 +#endif
 +#endif
 +
 +#if defined(__GNUC__) && !defined(__llvm__) && defined(_MUM_FRESH_GCC)
 +#define _MUM_ATTRIBUTE_UNUSED  __attribute__((unused))
 +#define _MUM_OPTIMIZE(opts) __attribute__((__optimize__ (opts)))
 +#define _MUM_TARGET(opts) __attribute__((__target__ (opts)))
 +#else
 +#define _MUM_ATTRIBUTE_UNUSED
 +#define _MUM_OPTIMIZE(opts)
 +#define _MUM_TARGET(opts)
 +#endif
 +
 +
 +/* Here are different primes randomly generated with the equal
 +   probability of their bit values.  They are used to randomize input
 +   values.  */
 +static uint64_t _mum_hash_step_prime = 0x2e0bb864e9ea7df5ULL;
 +static uint64_t _mum_key_step_prime = 0xcdb32970830fcaa1ULL;
 +static uint64_t _mum_block_start_prime = 0xc42b5e2e6480b23bULL;
 +static uint64_t _mum_unroll_prime = 0x7b51ec3d22f7096fULL;
 +static uint64_t _mum_tail_prime = 0xaf47d47c99b1461bULL;
 +static uint64_t _mum_finish_prime1 = 0xa9a7ae7ceff79f3fULL;
 +static uint64_t _mum_finish_prime2 = 0xaf47d47c99b1461bULL;
 +
 +static uint64_t _mum_primes [] = {
 +  0X9ebdcae10d981691, 0X32b9b9b97a27ac7d, 0X29b5584d83d35bbd, 0X4b04e0e61401255f,
 +  0X25e8f7b1f1c9d027, 0X80d4c8c000f3e881, 0Xbd1255431904b9dd, 0X8a3bd4485eee6d81,
 +  0X3bc721b2aad05197, 0X71b1a19b907d6e33, 0X525e6c1084a8534b, 0X9e4c2cd340c1299f,
 +  0Xde3add92e94caa37, 0X7e14eadb1f65311d, 0X3f5aa40f89812853, 0X33b15a3b587d15c9,
 +};
 +
 +/* Multiply 64-bit V and P and return sum of high and low parts of the
 +   result.  */
 +static inline uint64_t
 +_mum (uint64_t v, uint64_t p) {
 +  uint64_t hi, lo;
 +#if _MUM_USE_INT128
 +#if defined(__aarch64__)
 +  /* AARCH64 needs 2 insns to calculate 128-bit result of the
 +     multiplication.  If we use a generic code we actually call a
 +     function doing 128x128->128 bit multiplication.  The function is
 +     very slow.  */
 +  lo = v * p, hi;
 +  asm ("umulh %0, %1, %2" : "=r" (hi) : "r" (v), "r" (p));
 +#else
 +  __uint128_t r = (__uint128_t) v * (__uint128_t) p;
 +  hi = (uint64_t) (r >> 64);
 +  lo = (uint64_t) r;
 +#endif
 +#else
 +  /* Implementation of 64x64->128-bit multiplication by four 32x32->64
 +     bit multiplication.  */
 +  uint64_t hv = v >> 32, hp = p >> 32;
 +  uint64_t lv = (uint32_t) v, lp = (uint32_t) p;
 +  uint64_t rh =  hv * hp;
 +  uint64_t rm_0 = hv * lp;
 +  uint64_t rm_1 = hp * lv;
 +  uint64_t rl =  lv * lp;
 +  uint64_t t, carry = 0;
 +
 +  /* We could ignore a carry bit here if we did not care about the
 +     same hash for 32-bit and 64-bit targets.  */
 +  t = rl + (rm_0 << 32);
 +#ifdef MUM_TARGET_INDEPENDENT_HASH
 +  carry = t < rl;
 +#endif
 +  lo = t + (rm_1 << 32);
 +#ifdef MUM_TARGET_INDEPENDENT_HASH
 +  carry += lo < t;
 +#endif
 +  hi = rh + (rm_0 >> 32) + (rm_1 >> 32) + carry;
 +#endif
 +  /* We could use XOR here too but, for some reasons, on Haswell and
 +     Power7 using an addition improves hashing performance by 10% for
 +     small strings.  */
 +  return hi + lo;
 +}
 +
 +#if defined(_MSC_VER)
 +#define _mum_bswap_32(x) _byteswap_uint32_t (x)
 +#define _mum_bswap_64(x) _byteswap_uint64_t (x)
 +#elif defined(__APPLE__)
 +#include <libkern/OSByteOrder.h>
 +#define _mum_bswap_32(x) OSSwapInt32 (x)
 +#define _mum_bswap_64(x) OSSwapInt64 (x)
 +#elif defined(__GNUC__)
 +#define _mum_bswap32(x) __builtin_bswap32 (x)
 +#define _mum_bswap64(x) __builtin_bswap64 (x)
 +#else
 +#include <byteswap.h>
 +#define _mum_bswap32(x) bswap32 (x)
 +#define _mum_bswap64(x) bswap64 (x)
 +#endif
 +
 +static inline uint64_t
 +_mum_le (uint64_t v) {
 +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ || !defined(MUM_TARGET_INDEPENDENT_HASH)
 +  return v;
 +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
 +  return _mum_bswap64 (v);
 +#else
- #error "Unknown endianess"
++#error "Unknown endianness"
 +#endif
 +}
 +
 +static inline uint32_t
 +_mum_le32 (uint32_t v) {
 +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ || !defined(MUM_TARGET_INDEPENDENT_HASH)
 +  return v;
 +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
 +  return _mum_bswap32 (v);
 +#else
- #error "Unknown endianess"
++#error "Unknown endianness"
 +#endif
 +}
 +
 +/* Macro defining how many times the most nested loop in
 +   _mum_hash_aligned will be unrolled by the compiler (although it can
 +   make an own decision:).  Use only a constant here to help a
 +   compiler to unroll a major loop.
 +
 +   The macro value affects the result hash for strings > 128 bit.  The
 +   unroll factor greatly affects the hashing speed.  We prefer the
 +   speed.  */
 +#ifndef _MUM_UNROLL_FACTOR_POWER
 +#if defined(__PPC64__) && !defined(MUM_TARGET_INDEPENDENT_HASH)
 +#define _MUM_UNROLL_FACTOR_POWER 3
 +#elif defined(__aarch64__) && !defined(MUM_TARGET_INDEPENDENT_HASH)
 +#define _MUM_UNROLL_FACTOR_POWER 4
 +#else
 +#define _MUM_UNROLL_FACTOR_POWER 2
 +#endif
 +#endif
 +
 +#if _MUM_UNROLL_FACTOR_POWER < 1
 +#error "too small unroll factor"
 +#elif _MUM_UNROLL_FACTOR_POWER > 4
 +#error "We have not enough primes for such unroll factor"
 +#endif
 +
 +#define _MUM_UNROLL_FACTOR (1 << _MUM_UNROLL_FACTOR_POWER)
 +
 +static inline uint64_t _MUM_OPTIMIZE("unroll-loops")
 +_mum_hash_aligned (uint64_t start, const void *key, size_t len) {
 +  uint64_t result = start;
 +  const unsigned char *str = (const unsigned char *) key;
 +  uint64_t u64;
 +  int i;
 +  size_t n;
 +
 +  result = _mum (result, _mum_block_start_prime);
 +  while  (len > _MUM_UNROLL_FACTOR * sizeof (uint64_t)) {
 +    /* This loop could be vectorized when we have vector insns for
 +       64x64->128-bit multiplication.  AVX2 currently only have a
 +       vector insn for 4 32x32->64-bit multiplication.  */
 +    for (i = 0; i < _MUM_UNROLL_FACTOR; i++)
 +      result ^= _mum (_mum_le (((uint64_t *) str)[i]), _mum_primes[i]);
 +    len -= _MUM_UNROLL_FACTOR * sizeof (uint64_t);
 +    str += _MUM_UNROLL_FACTOR * sizeof (uint64_t);
 +    /* We will use the same prime numbers on the next iterations --
 +       randomize the state.  */
 +    result = _mum (result, _mum_unroll_prime);
 +  }
 +  n = len / sizeof (uint64_t);
 +  for (i = 0; i < (int)n; i++)
 +    result ^= _mum (_mum_le (((uint64_t *) str)[i]), _mum_primes[i]);
 +  len -= n * sizeof (uint64_t); str += n * sizeof (uint64_t);
 +  switch (len) {
 +  case 7:
 +    u64 = _mum_le32 (*(uint32_t *) str);
 +    u64 |= (uint64_t) str[4] << 32;
 +    u64 |= (uint64_t) str[5] << 40;
 +    u64 |= (uint64_t) str[6] << 48;
 +    return result ^ _mum (u64, _mum_tail_prime);
 +  case 6:
 +    u64 = _mum_le32 (*(uint32_t *) str);
 +    u64 |= (uint64_t) str[4] << 32;
 +    u64 |= (uint64_t) str[5] << 40;
 +    return result ^ _mum (u64, _mum_tail_prime);
 +  case 5:
 +    u64 = _mum_le32 (*(uint32_t *) str);
 +    u64 |= (uint64_t) str[4] << 32;
 +    return result ^ _mum (u64, _mum_tail_prime);
 +  case 4:
 +    u64 = _mum_le32 (*(uint32_t *) str);
 +    return result ^ _mum (u64, _mum_tail_prime);
 +  case 3:
 +    u64 = str[0];
 +    u64 |= (uint64_t) str[1] << 8;
 +    u64 |= (uint64_t) str[2] << 16;
 +    return result ^ _mum (u64, _mum_tail_prime);
 +  case 2:
 +    u64 = str[0];
 +    u64 |= (uint64_t) str[1] << 8;
 +    return result ^ _mum (u64, _mum_tail_prime);
 +  case 1:
 +    u64 = str[0];
 +    return result ^ _mum (u64, _mum_tail_prime);
 +  }
 +  return result;
 +}
 +
 +/* Final randomization of H.  */
 +static inline uint64_t
 +_mum_final (uint64_t h) {
 +  h ^= _mum (h, _mum_finish_prime1);
 +  h ^= _mum (h, _mum_finish_prime2);
 +  return h;
 +}
 +
 +#if defined(__x86_64__) && defined(_MUM_FRESH_GCC)
 +
 +/* We want to use AVX2 insn MULX instead of generic x86-64 MULQ where
 +   it is possible.  Although on modern Intel processors MULQ takes
 +   3-cycles vs. 4 for MULX, MULX permits more freedom in insn
 +   scheduling as it uses less fixed registers.  */
 +static inline uint64_t _MUM_TARGET("arch=haswell")
 +_mum_hash_avx2 (const void * key, size_t len, uint64_t seed) {
 +  return _mum_final (_mum_hash_aligned (seed + len, key, len));
 +}
 +#endif
 +
 +#ifndef _MUM_UNALIGNED_ACCESS
 +#if defined(__x86_64__) || defined(__i386__) || defined(__PPC64__) \
 +    || defined(__s390__) || defined(__m32c__) || defined(cris)     \
 +    || defined(__CR16__) || defined(__vax__) || defined(__m68k__) \
 +    || defined(__aarch64__)
 +#define _MUM_UNALIGNED_ACCESS 1
 +#else
 +#define _MUM_UNALIGNED_ACCESS 0
 +#endif
 +#endif
 +
 +/* When we need an aligned access to data being hashed we move part of
 +   the unaligned data to an aligned block of given size and then
 +   process it, repeating processing the data by the block.  */
 +#ifndef _MUM_BLOCK_LEN
 +#define _MUM_BLOCK_LEN 1024
 +#endif
 +
 +#if _MUM_BLOCK_LEN < 8
 +#error "too small block length"
 +#endif
 +
 +static inline uint64_t
 +#if defined(__x86_64__)
 +_MUM_TARGET("inline-all-stringops")
 +#endif
 +_mum_hash_default (const void *key, size_t len, uint64_t seed) {
 +  uint64_t result;
 +  const unsigned char *str = (const unsigned char *) key;
 +  size_t block_len;
 +  uint64_t buf[_MUM_BLOCK_LEN / sizeof (uint64_t)];
 +
 +  result = seed + len;
 +  if (_MUM_UNALIGNED_ACCESS || ((size_t) str & 0x7) == 0)
 +    result = _mum_hash_aligned (result, key, len);
 +  else {
 +    while (len != 0) {
 +      block_len = len < _MUM_BLOCK_LEN ? len : _MUM_BLOCK_LEN;
 +      memmove (buf, str, block_len);
 +      result = _mum_hash_aligned (result, buf, block_len);
 +      len -= block_len;
 +      str += block_len;
 +    }
 +  }
 +  return _mum_final (result);
 +}
 +
 +static inline uint64_t
 +_mum_next_factor (void) {
 +  uint64_t start = 0;
 +  int i;
 +
 +  for (i = 0; i < 8; i++)
 +    start = (start << 8) | rand() % 256;
 +  return start;
 +}
 +
 +/* ++++++++++++++++++++++++++ Interface functions: +++++++++++++++++++  */
 +
 +/* Set random multiplicators depending on SEED.  */
 +static inline void
 +mum_hash_randomize (uint64_t seed) {
 +  int i;
 +
 +  srand (seed);
 +  _mum_hash_step_prime = _mum_next_factor ();
 +  _mum_key_step_prime = _mum_next_factor ();
 +  _mum_finish_prime1 = _mum_next_factor ();
 +  _mum_finish_prime2 = _mum_next_factor ();
 +  _mum_block_start_prime = _mum_next_factor ();
 +  _mum_unroll_prime = _mum_next_factor ();
 +  _mum_tail_prime = _mum_next_factor ();
 +  for (i = 0; i < (int)(sizeof (_mum_primes) / sizeof (uint64_t)); i++)
 +    _mum_primes[i] = _mum_next_factor ();
 +}
 +
 +/* Start hashing data with SEED.  Return the state.  */
 +static inline uint64_t
 +mum_hash_init (uint64_t seed) {
 +  return seed;
 +}
 +
 +/* Process data KEY with the state H and return the updated state.  */
 +static inline uint64_t
 +mum_hash_step (uint64_t h, uint64_t key)
 +{
 +  return _mum (h, _mum_hash_step_prime) ^ _mum (key, _mum_key_step_prime);
 +}
 +
 +/* Return the result of hashing using the current state H.  */
 +static inline uint64_t
 +mum_hash_finish (uint64_t h) {
 +  return _mum_final (h);
 +}
 +
 +/* Fast hashing of KEY with SEED.  The hash is always the same for the
 +   same key on any target. */
 +static inline size_t
 +mum_hash64 (uint64_t key, uint64_t seed) {
 +  return mum_hash_finish (mum_hash_step (mum_hash_init (seed), key));
 +}
 +
 +/* Hash data KEY of length LEN and SEED.  The hash depends on the
-    target endianess and the unroll factor.  */
++   target endianness and the unroll factor.  */
 +static inline uint64_t
 +mum_hash (const void *key, size_t len, uint64_t seed) {
 +#if defined(__x86_64__) && defined(_MUM_FRESH_GCC)
 +  static int avx2_support = 0;
 +
 +  if (avx2_support > 0)
 +    return _mum_hash_avx2 (key, len, seed);
 +  else if (! avx2_support) {
 +    __builtin_cpu_init ();
 +    avx2_support =  __builtin_cpu_supports ("avx2") ? 1 : -1;
 +    if (avx2_support > 0)
 +      return _mum_hash_avx2 (key, len, seed);
 +  }
 +#endif
 +  return _mum_hash_default (key, len, seed);
 +}
 +
 +#endif
diff --cc contrib/libucl/tests/basic/squote.in
index 000000000000,2ee9d25d8122..2ee9d25d8122
mode 000000,100644..100644
--- a/contrib/libucl/tests/basic/squote.in
+++ b/contrib/libucl/tests/basic/squote.in
diff --cc contrib/libucl/tests/basic/squote.res
index 000000000000,a595269567bd..a595269567bd
mode 000000,100644..100644
--- a/contrib/libucl/tests/basic/squote.res
+++ b/contrib/libucl/tests/basic/squote.res
diff --cc contrib/libucl/tests/fuzzers/ucl_add_string_fuzzer.c
index 000000000000,388ce5dffebb..388ce5dffebb
mode 000000,100644..100644
--- a/contrib/libucl/tests/fuzzers/ucl_add_string_fuzzer.c
+++ b/contrib/libucl/tests/fuzzers/ucl_add_string_fuzzer.c
diff --cc contrib/libucl/tests/fuzzers/ucl_msgpack_fuzzer.c
index 000000000000,6989ec0e4375..6989ec0e4375
mode 000000,100644..100644
--- a/contrib/libucl/tests/fuzzers/ucl_msgpack_fuzzer.c
+++ b/contrib/libucl/tests/fuzzers/ucl_msgpack_fuzzer.c
diff --cc contrib/libucl/utils/CMakeLists.txt
index 000000000000,4de95fd7b4b0..4de95fd7b4b0
mode 000000,100644..100644
--- a/contrib/libucl/utils/CMakeLists.txt
+++ b/contrib/libucl/utils/CMakeLists.txt
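
Note on the unaligned path in _mum_hash_default above: when the target does
not allow unaligned loads and the key pointer is not 8-byte aligned, the data
is copied block by block into an aligned uint64_t buffer and each block is
processed in turn. The following is only a minimal sketch of that pattern, not
the code from mum.h. The names sketch_fold, sketch_fold_blocks and
SKETCH_BLOCK_LEN are hypothetical, and a byte-wise FNV-1a style fold stands in
for _mum_hash_aligned, so the values it produces are unrelated to MUM hashes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block size for the sketch; mum.h defaults _MUM_BLOCK_LEN
   to 1024.  */
#define SKETCH_BLOCK_LEN 64

/* Stand-in for _mum_hash_aligned: a byte-wise FNV-1a style fold.  This
   is NOT the MUM mixing function, only a placeholder so the sketch is
   self-contained.  */
static uint64_t
sketch_fold (uint64_t state, const void *data, size_t len) {
  const unsigned char *p = (const unsigned char *) data;
  size_t i;

  for (i = 0; i < len; i++) {
    state ^= p[i];
    state *= 0x100000001b3ULL;
  }
  return state;
}

/* Process KEY through an aligned scratch buffer, one block at a time,
   so the fold never reads from a misaligned address.  */
static uint64_t
sketch_fold_blocks (uint64_t state, const void *key, size_t len) {
  const unsigned char *str = (const unsigned char *) key;
  uint64_t buf[SKETCH_BLOCK_LEN / sizeof (uint64_t)];
  size_t block_len;

  assert (key != NULL || len == 0);
  while (len != 0) {
    block_len = len < SKETCH_BLOCK_LEN ? len : SKETCH_BLOCK_LEN;
    /* The scratch buffer never overlaps the input, so memcpy suffices
       in this sketch; the diff above uses memmove.  */
    memcpy (buf, str, block_len);
    state = sketch_fold (state, buf, block_len);
    len -= block_len;
    str += block_len;
  }
  return state;
}
```

Because the stand-in fold is strictly byte-sequential, the blocked path returns
the same value as folding the input directly. Whether the same holds for the
real _mum_hash_aligned depends on how the block length relates to its unroll
size, which this excerpt does not show.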


Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?202103221416.12MEG3w9043302>