From: Jung-uk Kim Date: Tue, 27 Mar 2018 17:03:01 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org Subject: svn commit: r331625 - in vendor-crypto/openssl/dist: . apps crypto crypto/asn1 crypto/bf crypto/bio crypto/bn crypto/conf crypto/des crypto/dh crypto/dsa crypto/ec crypto/ecdh crypto/engine crypto/... Message-Id: <201803271703.w2RH316x049460@repo.freebsd.org>
Author: jkim Date: Tue Mar 27 17:03:01 2018 New Revision: 331625 URL: https://svnweb.freebsd.org/changeset/base/331625 Log: Import OpenSSL 1.0.2o. Modified: vendor-crypto/openssl/dist/CHANGES vendor-crypto/openssl/dist/Configure vendor-crypto/openssl/dist/FREEBSD-upgrade vendor-crypto/openssl/dist/LICENSE vendor-crypto/openssl/dist/Makefile vendor-crypto/openssl/dist/NEWS vendor-crypto/openssl/dist/README vendor-crypto/openssl/dist/apps/app_rand.c vendor-crypto/openssl/dist/apps/apps.c vendor-crypto/openssl/dist/apps/ca.c vendor-crypto/openssl/dist/apps/ciphers.c vendor-crypto/openssl/dist/apps/cms.c vendor-crypto/openssl/dist/apps/dgst.c vendor-crypto/openssl/dist/apps/dsaparam.c vendor-crypto/openssl/dist/apps/ecparam.c vendor-crypto/openssl/dist/apps/enc.c vendor-crypto/openssl/dist/apps/errstr.c vendor-crypto/openssl/dist/apps/ocsp.c vendor-crypto/openssl/dist/apps/openssl.c vendor-crypto/openssl/dist/apps/passwd.c vendor-crypto/openssl/dist/apps/pkcs12.c vendor-crypto/openssl/dist/apps/pkcs8.c vendor-crypto/openssl/dist/apps/rand.c vendor-crypto/openssl/dist/apps/req.c vendor-crypto/openssl/dist/apps/s_client.c vendor-crypto/openssl/dist/apps/s_server.c vendor-crypto/openssl/dist/apps/s_socket.c vendor-crypto/openssl/dist/apps/s_time.c vendor-crypto/openssl/dist/apps/speed.c vendor-crypto/openssl/dist/apps/x509.c vendor-crypto/openssl/dist/crypto/asn1/a_gentm.c 
vendor-crypto/openssl/dist/crypto/asn1/a_mbstr.c vendor-crypto/openssl/dist/crypto/asn1/a_object.c vendor-crypto/openssl/dist/crypto/asn1/a_strex.c vendor-crypto/openssl/dist/crypto/asn1/a_time.c vendor-crypto/openssl/dist/crypto/asn1/a_utctm.c vendor-crypto/openssl/dist/crypto/asn1/asn1.h vendor-crypto/openssl/dist/crypto/asn1/asn1_err.c vendor-crypto/openssl/dist/crypto/asn1/asn1_lib.c vendor-crypto/openssl/dist/crypto/asn1/asn1_par.c vendor-crypto/openssl/dist/crypto/asn1/asn_mime.c vendor-crypto/openssl/dist/crypto/asn1/t_x509a.c vendor-crypto/openssl/dist/crypto/asn1/tasn_dec.c vendor-crypto/openssl/dist/crypto/asn1/tasn_prn.c vendor-crypto/openssl/dist/crypto/bf/bftest.c vendor-crypto/openssl/dist/crypto/bio/b_dump.c vendor-crypto/openssl/dist/crypto/bio/b_print.c vendor-crypto/openssl/dist/crypto/bio/bio_cb.c vendor-crypto/openssl/dist/crypto/bio/bss_bio.c vendor-crypto/openssl/dist/crypto/bio/bss_conn.c vendor-crypto/openssl/dist/crypto/bio/bss_file.c vendor-crypto/openssl/dist/crypto/bn/bn_exp.c vendor-crypto/openssl/dist/crypto/bn/bn_lib.c vendor-crypto/openssl/dist/crypto/bn/bn_mont.c vendor-crypto/openssl/dist/crypto/bn/bn_print.c vendor-crypto/openssl/dist/crypto/bn/bntest.c vendor-crypto/openssl/dist/crypto/bn/expspeed.c vendor-crypto/openssl/dist/crypto/bn/exptest.c vendor-crypto/openssl/dist/crypto/conf/conf_def.c vendor-crypto/openssl/dist/crypto/conf/conf_mod.c vendor-crypto/openssl/dist/crypto/des/destest.c vendor-crypto/openssl/dist/crypto/des/ecb_enc.c vendor-crypto/openssl/dist/crypto/des/fcrypt.c vendor-crypto/openssl/dist/crypto/des/read_pwd.c vendor-crypto/openssl/dist/crypto/des/set_key.c vendor-crypto/openssl/dist/crypto/dh/dhtest.c vendor-crypto/openssl/dist/crypto/dsa/dsatest.c vendor-crypto/openssl/dist/crypto/ec/ec_lib.c vendor-crypto/openssl/dist/crypto/ec/ec_mult.c vendor-crypto/openssl/dist/crypto/ec/ecp_nistp224.c vendor-crypto/openssl/dist/crypto/ec/ecp_nistp256.c vendor-crypto/openssl/dist/crypto/ec/ecp_nistp521.c 
vendor-crypto/openssl/dist/crypto/ec/ecp_nistz256.c vendor-crypto/openssl/dist/crypto/ec/ecp_smpl.c vendor-crypto/openssl/dist/crypto/ec/ectest.c vendor-crypto/openssl/dist/crypto/ecdh/ecdhtest.c vendor-crypto/openssl/dist/crypto/engine/eng_cryptodev.c vendor-crypto/openssl/dist/crypto/engine/eng_table.c vendor-crypto/openssl/dist/crypto/err/err.c vendor-crypto/openssl/dist/crypto/err/err_prn.c vendor-crypto/openssl/dist/crypto/evp/bio_b64.c vendor-crypto/openssl/dist/crypto/evp/digest.c vendor-crypto/openssl/dist/crypto/evp/e_aes.c vendor-crypto/openssl/dist/crypto/evp/e_camellia.c vendor-crypto/openssl/dist/crypto/evp/evp_enc.c vendor-crypto/openssl/dist/crypto/evp/evp_locl.h vendor-crypto/openssl/dist/crypto/evp/evp_pbe.c vendor-crypto/openssl/dist/crypto/evp/evp_test.c vendor-crypto/openssl/dist/crypto/evp/openbsd_hw.c vendor-crypto/openssl/dist/crypto/evp/p5_crpt2.c vendor-crypto/openssl/dist/crypto/hmac/hmac.c vendor-crypto/openssl/dist/crypto/jpake/jpake.c vendor-crypto/openssl/dist/crypto/md2/md2_dgst.c vendor-crypto/openssl/dist/crypto/md4/md4.c vendor-crypto/openssl/dist/crypto/mem_dbg.c vendor-crypto/openssl/dist/crypto/o_init.c vendor-crypto/openssl/dist/crypto/o_time.c vendor-crypto/openssl/dist/crypto/objects/o_names.c vendor-crypto/openssl/dist/crypto/objects/obj_dat.c vendor-crypto/openssl/dist/crypto/opensslv.h vendor-crypto/openssl/dist/crypto/pem/pem_info.c vendor-crypto/openssl/dist/crypto/pem/pem_lib.c vendor-crypto/openssl/dist/crypto/pkcs7/pk7_doit.c vendor-crypto/openssl/dist/crypto/rand/md_rand.c vendor-crypto/openssl/dist/crypto/rand/rand_egd.c vendor-crypto/openssl/dist/crypto/rand/rand_unix.c vendor-crypto/openssl/dist/crypto/rsa/rsa_crpt.c vendor-crypto/openssl/dist/crypto/rsa/rsa_gen.c vendor-crypto/openssl/dist/crypto/rsa/rsa_pss.c vendor-crypto/openssl/dist/crypto/rsa/rsa_test.c vendor-crypto/openssl/dist/crypto/srp/srp_grps.h vendor-crypto/openssl/dist/crypto/threads/mttest.c vendor-crypto/openssl/dist/crypto/ts/Makefile 
vendor-crypto/openssl/dist/crypto/ts/ts_rsp_sign.c vendor-crypto/openssl/dist/crypto/ui/ui_openssl.c vendor-crypto/openssl/dist/crypto/x509/x509_txt.c vendor-crypto/openssl/dist/crypto/x509/x509_v3.c vendor-crypto/openssl/dist/crypto/x509/x509_vpm.c vendor-crypto/openssl/dist/crypto/x509v3/v3_alt.c vendor-crypto/openssl/dist/crypto/x509v3/v3_conf.c vendor-crypto/openssl/dist/crypto/x509v3/v3_info.c vendor-crypto/openssl/dist/doc/apps/ca.pod vendor-crypto/openssl/dist/doc/apps/ecparam.pod vendor-crypto/openssl/dist/doc/apps/s_client.pod vendor-crypto/openssl/dist/doc/apps/verify.pod vendor-crypto/openssl/dist/doc/apps/x509.pod vendor-crypto/openssl/dist/doc/crypto/ASN1_STRING_length.pod vendor-crypto/openssl/dist/doc/crypto/BIO_s_mem.pod vendor-crypto/openssl/dist/doc/crypto/BN_zero.pod vendor-crypto/openssl/dist/doc/crypto/EVP_EncryptInit.pod vendor-crypto/openssl/dist/doc/crypto/X509_VERIFY_PARAM_set_flags.pod vendor-crypto/openssl/dist/doc/crypto/threads.pod vendor-crypto/openssl/dist/engines/ccgost/README.gost vendor-crypto/openssl/dist/engines/ccgost/gost_eng.c vendor-crypto/openssl/dist/engines/e_atalla.c vendor-crypto/openssl/dist/ssl/Makefile vendor-crypto/openssl/dist/ssl/bad_dtls_test.c vendor-crypto/openssl/dist/ssl/d1_lib.c vendor-crypto/openssl/dist/ssl/d1_pkt.c vendor-crypto/openssl/dist/ssl/fatalerrtest.c vendor-crypto/openssl/dist/ssl/kssl.c vendor-crypto/openssl/dist/ssl/s23_srvr.c vendor-crypto/openssl/dist/ssl/s2_clnt.c vendor-crypto/openssl/dist/ssl/s2_enc.c vendor-crypto/openssl/dist/ssl/s2_lib.c vendor-crypto/openssl/dist/ssl/s2_srvr.c vendor-crypto/openssl/dist/ssl/s3_clnt.c vendor-crypto/openssl/dist/ssl/s3_lib.c vendor-crypto/openssl/dist/ssl/s3_pkt.c vendor-crypto/openssl/dist/ssl/s3_srvr.c vendor-crypto/openssl/dist/ssl/ssl_cert.c vendor-crypto/openssl/dist/ssl/ssl_lib.c vendor-crypto/openssl/dist/ssl/ssl_sess.c vendor-crypto/openssl/dist/ssl/ssltest.c vendor-crypto/openssl/dist/ssl/t1_enc.c vendor-crypto/openssl/dist/ssl/t1_lib.c 
vendor-crypto/openssl/dist/ssl/t1_trce.c Modified: vendor-crypto/openssl/dist/CHANGES ============================================================================== --- vendor-crypto/openssl/dist/CHANGES Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/CHANGES Tue Mar 27 17:03:01 2018 (r331625) @@ -7,6 +7,21 @@ https://github.com/openssl/openssl/commits/ and pick the appropriate release branch. + Changes between 1.0.2n and 1.0.2o [27 Mar 2018] + + *) Constructed ASN.1 types with a recursive definition could exceed the stack + + Constructed ASN.1 types with a recursive definition (such as can be found + in PKCS7) could eventually exceed the stack given malicious input with + excessive recursion. This could result in a Denial Of Service attack. There + are no such structures used within SSL/TLS that come from untrusted sources + so this is considered safe. + + This issue was reported to OpenSSL on 4th January 2018 by the OSS-fuzz + project. + (CVE-2018-0739) + [Matt Caswell] + Changes between 1.0.2m and 1.0.2n [7 Dec 2017] *) Read/write after SSL object in error state @@ -2012,8 +2027,11 @@ to work with OPENSSL_NO_SSL_INTERN defined. [Steve Henson] - *) Add SRP support. - [Tom Wu and Ben Laurie] + *) A long standing patch to add support for SRP from EdelWeb (Peter + Sylvester and Christophe Renou) was integrated. + [Christophe Renou , Peter Sylvester + , Tom Wu , and + Ben Laurie] *) Add functions to copy EVP_PKEY_METHOD and retrieve flags and id. 
[Steve Henson] Modified: vendor-crypto/openssl/dist/Configure ============================================================================== --- vendor-crypto/openssl/dist/Configure Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/Configure Tue Mar 27 17:03:01 2018 (r331625) @@ -354,7 +354,7 @@ my %table=( "hpux-gcc", "gcc:-DB_ENDIAN -DBN_DIV2W -O3::(unknown)::-Wl,+s -ldld:DES_PTR DES_UNROLL DES_RISC1:${no_asm}:dl:hpux-shared:-fPIC:-shared:.sl.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)", #### HP MPE/iX http://jazz.external.hp.com/src/openssl/ -"MPE/iX-gcc", "gcc:-D_ENDIAN -DBN_DIV2W -O3 -D_POSIX_SOURCE -D_SOCKET_SOURCE -I/SYSLOG/PUB::(unknown):MPE:-L/SYSLOG/PUB -lsyslog -lsocket -lcurses:BN_LLONG DES_PTR DES_UNROLL DES_RISC1:::", +"MPE/iX-gcc", "gcc:-DBN_DIV2W -O3 -D_POSIX_SOURCE -D_SOCKET_SOURCE -I/SYSLOG/PUB::(unknown):MPE:-L/SYSLOG/PUB -lsyslog -lsocket -lcurses:BN_LLONG DES_PTR DES_UNROLL DES_RISC1:::", # DEC Alpha OSF/1/Tru64 targets. # @@ -1269,7 +1269,7 @@ my ($prelflags,$postlflags)=split('%',$lflags); if (defined($postlflags)) { $lflags=$postlflags; } else { $lflags=$prelflags; undef $prelflags; } -if ($target =~ /^mingw/ && `$cc --target-help 2>&1` !~ m/\-mno\-cygwin/m) +if ($target =~ /^mingw/ && `$cross_compile_prefix$cc --target-help 2>&1` !~ m/\-mno\-cygwin/m) { $cflags =~ s/\-mno\-cygwin\s*//; $shared_ldflag =~ s/\-mno\-cygwin\s*//; @@ -1661,18 +1661,25 @@ if ($shlib_version_number =~ /(^[0-9]*)\.([0-9\.]*)/) $shlib_minor=$2; } -my $ecc = $cc; -$ecc = "clang" if `$cc --version 2>&1` =~ /clang/; +my %predefined; +# collect compiler pre-defines from gcc or gcc-alike... +open(PIPE, "$cross_compile_prefix$cc -dM -E -x c /dev/null 2>&1 |"); +while (<PIPE>) { + m/^#define\s+(\w+(?:\(\w+\))?)(?:\s+(.+))?/ or last; + $predefined{$1} = defined($2) ? 
$2 : ""; +} +close(PIPE); + if ($strict_warnings) { my $wopt; - die "ERROR --strict-warnings requires gcc or clang" unless ($ecc =~ /gcc$/ or $ecc =~ /clang$/); + die "ERROR --strict-warnings requires gcc or clang" unless defined($predefined{__GNUC__}); foreach $wopt (split /\s+/, $gcc_devteam_warn) { $cflags .= " $wopt" unless ($cflags =~ /(^|\s)$wopt(\s|$)/) } - if ($ecc eq "clang") + if (defined($predefined{__clang__})) { foreach $wopt (split /\s+/, $clang_devteam_warn) { @@ -1723,15 +1730,14 @@ while () s/^NM=\s*/NM= \$\(CROSS_COMPILE\)/; s/^RANLIB=\s*/RANLIB= \$\(CROSS_COMPILE\)/; s/^RC=\s*/RC= \$\(CROSS_COMPILE\)/; - s/^MAKEDEPPROG=.*$/MAKEDEPPROG= \$\(CROSS_COMPILE\)$cc/ if $cc eq "gcc"; + s/^MAKEDEPPROG=.*$/MAKEDEPPROG= \$\(CROSS_COMPILE\)$cc/ if $predefined{__GNUC__} >= 3; } else { s/^CC=.*$/CC= $cc/; s/^AR=\s*ar/AR= $ar/; s/^RANLIB=.*/RANLIB= $ranlib/; s/^RC=.*/RC= $windres/; - s/^MAKEDEPPROG=.*$/MAKEDEPPROG= $cc/ if $cc eq "gcc"; - s/^MAKEDEPPROG=.*$/MAKEDEPPROG= $cc/ if $ecc eq "gcc" || $ecc eq "clang"; + s/^MAKEDEPPROG=.*$/MAKEDEPPROG= $cc/ if $predefined{__GNUC__} >= 3; } s/^CFLAG=.*$/CFLAG= $cflags/; s/^DEPFLAG=.*$/DEPFLAG=$depflags/; Modified: vendor-crypto/openssl/dist/FREEBSD-upgrade ============================================================================== --- vendor-crypto/openssl/dist/FREEBSD-upgrade Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/FREEBSD-upgrade Tue Mar 27 17:03:01 2018 (r331625) @@ -11,8 +11,8 @@ First, read http://wiki.freebsd.org/SubversionPrimer/V # Xlist setenv XLIST /FreeBSD/work/openssl/svn-FREEBSD-files/FREEBSD-Xlist setenv FSVN "svn+ssh://repo.freebsd.org/base" -setenv OSSLVER 1.0.2n -# OSSLTAG format: v1_0_2n +setenv OSSLVER 1.0.2o +# OSSLTAG format: v1_0_2o ###setenv OSSLTAG v`echo ${OSSLVER} | tr . 
_` Modified: vendor-crypto/openssl/dist/LICENSE ============================================================================== --- vendor-crypto/openssl/dist/LICENSE Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/LICENSE Tue Mar 27 17:03:01 2018 (r331625) @@ -12,7 +12,7 @@ --------------- /* ==================================================================== - * Copyright (c) 1998-2017 The OpenSSL Project. All rights reserved. + * Copyright (c) 1998-2018 The OpenSSL Project. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions Modified: vendor-crypto/openssl/dist/Makefile ============================================================================== --- vendor-crypto/openssl/dist/Makefile Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/Makefile Tue Mar 27 17:03:01 2018 (r331625) @@ -4,7 +4,7 @@ ## Makefile for OpenSSL ## -VERSION=1.0.2n +VERSION=1.0.2o MAJOR=1 MINOR=0.2 SHLIB_VERSION_NUMBER=1.0.0 @@ -73,7 +73,7 @@ NM= nm PERL= /usr/bin/perl TAR= tar TARFLAGS= --no-recursion -MAKEDEPPROG=makedepend +MAKEDEPPROG= cc LIBDIR=lib # We let the C compiler driver to take care of .s files. This is done in Modified: vendor-crypto/openssl/dist/NEWS ============================================================================== --- vendor-crypto/openssl/dist/NEWS Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/NEWS Tue Mar 27 17:03:01 2018 (r331625) @@ -5,6 +5,11 @@ This file gives a brief overview of the major changes between each OpenSSL release. For more details please read the CHANGES file. 
+ Major changes between OpenSSL 1.0.2n and OpenSSL 1.0.2o [27 Mar 2018] + + o Constructed ASN.1 types with a recursive definition could exceed the + stack (CVE-2018-0739) + Major changes between OpenSSL 1.0.2m and OpenSSL 1.0.2n [7 Dec 2017] o Read/write after SSL object in error state (CVE-2017-3737) Modified: vendor-crypto/openssl/dist/README ============================================================================== --- vendor-crypto/openssl/dist/README Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/README Tue Mar 27 17:03:01 2018 (r331625) @@ -1,5 +1,5 @@ - OpenSSL 1.0.2n 7 Dec 2017 + OpenSSL 1.0.2o 27 Mar 2018 Copyright (c) 1998-2015 The OpenSSL Project Copyright (c) 1995-1998 Eric A. Young, Tim J. Hudson Modified: vendor-crypto/openssl/dist/apps/app_rand.c ============================================================================== --- vendor-crypto/openssl/dist/apps/app_rand.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/app_rand.c Tue Mar 27 17:03:01 2018 (r331625) @@ -128,7 +128,7 @@ int app_RAND_load_file(const char *file, BIO *bio_e, i #endif if (file == NULL) - file = RAND_file_name(buffer, sizeof buffer); + file = RAND_file_name(buffer, sizeof(buffer)); else if (RAND_egd(file) > 0) { /* * we try if the given filename is an EGD socket. 
if it is, we don't @@ -203,7 +203,7 @@ int app_RAND_write_file(const char *file, BIO *bio_e) return 0; if (file == NULL) - file = RAND_file_name(buffer, sizeof buffer); + file = RAND_file_name(buffer, sizeof(buffer)); if (file == NULL || !RAND_write_file(file)) { BIO_printf(bio_e, "unable to write 'random state'\n"); return 0; Modified: vendor-crypto/openssl/dist/apps/apps.c ============================================================================== --- vendor-crypto/openssl/dist/apps/apps.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/apps.c Tue Mar 27 17:03:01 2018 (r331625) @@ -1738,9 +1738,9 @@ int save_serial(char *serialfile, char *suffix, BIGNUM BUF_strlcpy(buf[0], serialfile, BSIZE); else { #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[0], sizeof buf[0], "%s.%s", serialfile, suffix); + j = BIO_snprintf(buf[0], sizeof(buf[0]), "%s.%s", serialfile, suffix); #else - j = BIO_snprintf(buf[0], sizeof buf[0], "%s-%s", serialfile, suffix); + j = BIO_snprintf(buf[0], sizeof(buf[0]), "%s-%s", serialfile, suffix); #endif } #ifdef RL_DEBUG @@ -1789,14 +1789,14 @@ int rotate_serial(char *serialfile, char *new_suffix, goto err; } #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[0], sizeof buf[0], "%s.%s", serialfile, new_suffix); + j = BIO_snprintf(buf[0], sizeof(buf[0]), "%s.%s", serialfile, new_suffix); #else - j = BIO_snprintf(buf[0], sizeof buf[0], "%s-%s", serialfile, new_suffix); + j = BIO_snprintf(buf[0], sizeof(buf[0]), "%s-%s", serialfile, new_suffix); #endif #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[1], sizeof buf[1], "%s.%s", serialfile, old_suffix); + j = BIO_snprintf(buf[1], sizeof(buf[1]), "%s.%s", serialfile, old_suffix); #else - j = BIO_snprintf(buf[1], sizeof buf[1], "%s-%s", serialfile, old_suffix); + j = BIO_snprintf(buf[1], sizeof(buf[1]), "%s-%s", serialfile, old_suffix); #endif #ifdef RL_DEBUG BIO_printf(bio_err, "DEBUG: renaming \"%s\" to \"%s\"\n", @@ -1877,9 +1877,9 @@ CA_DB *load_index(char *dbfile, DB_ATTR 
*db_attr) goto err; #ifndef OPENSSL_SYS_VMS - BIO_snprintf(buf[0], sizeof buf[0], "%s.attr", dbfile); + BIO_snprintf(buf[0], sizeof(buf[0]), "%s.attr", dbfile); #else - BIO_snprintf(buf[0], sizeof buf[0], "%s-attr", dbfile); + BIO_snprintf(buf[0], sizeof(buf[0]), "%s-attr", dbfile); #endif dbattr_conf = NCONF_new(NULL); if (NCONF_load(dbattr_conf, buf[0], &errorline) <= 0) { @@ -1967,19 +1967,19 @@ int save_index(const char *dbfile, const char *suffix, goto err; } #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[2], sizeof buf[2], "%s.attr", dbfile); + j = BIO_snprintf(buf[2], sizeof(buf[2]), "%s.attr", dbfile); #else - j = BIO_snprintf(buf[2], sizeof buf[2], "%s-attr", dbfile); + j = BIO_snprintf(buf[2], sizeof(buf[2]), "%s-attr", dbfile); #endif #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[1], sizeof buf[1], "%s.attr.%s", dbfile, suffix); + j = BIO_snprintf(buf[1], sizeof(buf[1]), "%s.attr.%s", dbfile, suffix); #else - j = BIO_snprintf(buf[1], sizeof buf[1], "%s-attr-%s", dbfile, suffix); + j = BIO_snprintf(buf[1], sizeof(buf[1]), "%s-attr-%s", dbfile, suffix); #endif #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[0], sizeof buf[0], "%s.%s", dbfile, suffix); + j = BIO_snprintf(buf[0], sizeof(buf[0]), "%s.%s", dbfile, suffix); #else - j = BIO_snprintf(buf[0], sizeof buf[0], "%s-%s", dbfile, suffix); + j = BIO_snprintf(buf[0], sizeof(buf[0]), "%s-%s", dbfile, suffix); #endif #ifdef RL_DEBUG BIO_printf(bio_err, "DEBUG: writing \"%s\"\n", buf[0]); @@ -2028,29 +2028,29 @@ int rotate_index(const char *dbfile, const char *new_s goto err; } #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[4], sizeof buf[4], "%s.attr", dbfile); + j = BIO_snprintf(buf[4], sizeof(buf[4]), "%s.attr", dbfile); #else - j = BIO_snprintf(buf[4], sizeof buf[4], "%s-attr", dbfile); + j = BIO_snprintf(buf[4], sizeof(buf[4]), "%s-attr", dbfile); #endif #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[2], sizeof buf[2], "%s.attr.%s", dbfile, new_suffix); + j = BIO_snprintf(buf[2], sizeof(buf[2]), 
"%s.attr.%s", dbfile, new_suffix); #else - j = BIO_snprintf(buf[2], sizeof buf[2], "%s-attr-%s", dbfile, new_suffix); + j = BIO_snprintf(buf[2], sizeof(buf[2]), "%s-attr-%s", dbfile, new_suffix); #endif #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[0], sizeof buf[0], "%s.%s", dbfile, new_suffix); + j = BIO_snprintf(buf[0], sizeof(buf[0]), "%s.%s", dbfile, new_suffix); #else - j = BIO_snprintf(buf[0], sizeof buf[0], "%s-%s", dbfile, new_suffix); + j = BIO_snprintf(buf[0], sizeof(buf[0]), "%s-%s", dbfile, new_suffix); #endif #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[1], sizeof buf[1], "%s.%s", dbfile, old_suffix); + j = BIO_snprintf(buf[1], sizeof(buf[1]), "%s.%s", dbfile, old_suffix); #else - j = BIO_snprintf(buf[1], sizeof buf[1], "%s-%s", dbfile, old_suffix); + j = BIO_snprintf(buf[1], sizeof(buf[1]), "%s-%s", dbfile, old_suffix); #endif #ifndef OPENSSL_SYS_VMS - j = BIO_snprintf(buf[3], sizeof buf[3], "%s.attr.%s", dbfile, old_suffix); + j = BIO_snprintf(buf[3], sizeof(buf[3]), "%s.attr.%s", dbfile, old_suffix); #else - j = BIO_snprintf(buf[3], sizeof buf[3], "%s-attr-%s", dbfile, old_suffix); + j = BIO_snprintf(buf[3], sizeof(buf[3]), "%s-attr-%s", dbfile, old_suffix); #endif #ifdef RL_DEBUG BIO_printf(bio_err, "DEBUG: renaming \"%s\" to \"%s\"\n", dbfile, buf[1]); @@ -2604,7 +2604,7 @@ static void jpake_send_step3a(BIO *bconn, JPAKE_CTX *c JPAKE_STEP3A_init(&s3a); JPAKE_STEP3A_generate(&s3a, ctx); - BIO_write(bconn, s3a.hhk, sizeof s3a.hhk); + BIO_write(bconn, s3a.hhk, sizeof(s3a.hhk)); (void)BIO_flush(bconn); JPAKE_STEP3A_release(&s3a); } @@ -2615,7 +2615,7 @@ static void jpake_send_step3b(BIO *bconn, JPAKE_CTX *c JPAKE_STEP3B_init(&s3b); JPAKE_STEP3B_generate(&s3b, ctx); - BIO_write(bconn, s3b.hk, sizeof s3b.hk); + BIO_write(bconn, s3b.hk, sizeof(s3b.hk)); (void)BIO_flush(bconn); JPAKE_STEP3B_release(&s3b); } @@ -2625,7 +2625,7 @@ static void readbn(BIGNUM **bn, BIO *bconn) char buf[10240]; int l; - l = BIO_gets(bconn, buf, sizeof buf); + l = 
BIO_gets(bconn, buf, sizeof(buf)); assert(l > 0); assert(buf[l - 1] == '\n'); buf[l - 1] = '\0'; @@ -2672,8 +2672,8 @@ static void jpake_receive_step3a(JPAKE_CTX *ctx, BIO * int l; JPAKE_STEP3A_init(&s3a); - l = BIO_read(bconn, s3a.hhk, sizeof s3a.hhk); - assert(l == sizeof s3a.hhk); + l = BIO_read(bconn, s3a.hhk, sizeof(s3a.hhk)); + assert(l == sizeof(s3a.hhk)); if (!JPAKE_STEP3A_process(ctx, &s3a)) { ERR_print_errors(bio_err); exit(1); @@ -2687,8 +2687,8 @@ static void jpake_receive_step3b(JPAKE_CTX *ctx, BIO * int l; JPAKE_STEP3B_init(&s3b); - l = BIO_read(bconn, s3b.hk, sizeof s3b.hk); - assert(l == sizeof s3b.hk); + l = BIO_read(bconn, s3b.hk, sizeof(s3b.hk)); + assert(l == sizeof(s3b.hk)); if (!JPAKE_STEP3B_process(ctx, &s3b)) { ERR_print_errors(bio_err); exit(1); Modified: vendor-crypto/openssl/dist/apps/ca.c ============================================================================== --- vendor-crypto/openssl/dist/apps/ca.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/ca.c Tue Mar 27 17:03:01 2018 (r331625) @@ -1628,8 +1628,7 @@ static int do_body(X509 **xret, EVP_PKEY *pkey, X509 * CONF *lconf, unsigned long certopt, unsigned long nameopt, int default_op, int ext_copy, int selfsign) { - X509_NAME *name = NULL, *CAname = NULL, *subject = NULL, *dn_subject = - NULL; + X509_NAME *name = NULL, *CAname = NULL, *subject = NULL; ASN1_UTCTIME *tm, *tmptm; ASN1_STRING *str, *str2; ASN1_OBJECT *obj; @@ -1817,8 +1816,6 @@ static int do_body(X509 **xret, EVP_PKEY *pkey, X509 * if (push != NULL) { if (!X509_NAME_add_entry(subject, push, -1, 0)) { - if (push != NULL) - X509_NAME_ENTRY_free(push); BIO_printf(bio_err, "Memory allocation failure\n"); goto err; } @@ -1836,104 +1833,6 @@ static int do_body(X509 **xret, EVP_PKEY *pkey, X509 * goto err; } - if (verbose) - BIO_printf(bio_err, - "The subject name appears to be ok, checking data base for clashes\n"); - - /* Build the correct Subject if no e-mail is wanted in the subject */ - /* - * and 
add it later on because of the method extensions are added - * (altName) - */ - - if (email_dn) - dn_subject = subject; - else { - X509_NAME_ENTRY *tmpne; - /* - * Its best to dup the subject DN and then delete any email addresses - * because this retains its structure. - */ - if (!(dn_subject = X509_NAME_dup(subject))) { - BIO_printf(bio_err, "Memory allocation failure\n"); - goto err; - } - while ((i = X509_NAME_get_index_by_NID(dn_subject, - NID_pkcs9_emailAddress, - -1)) >= 0) { - tmpne = X509_NAME_get_entry(dn_subject, i); - X509_NAME_delete_entry(dn_subject, i); - X509_NAME_ENTRY_free(tmpne); - } - } - - if (BN_is_zero(serial)) - row[DB_serial] = BUF_strdup("00"); - else - row[DB_serial] = BN_bn2hex(serial); - if (row[DB_serial] == NULL) { - BIO_printf(bio_err, "Memory allocation failure\n"); - goto err; - } - - if (db->attributes.unique_subject) { - OPENSSL_STRING *crow = row; - - rrow = TXT_DB_get_by_index(db->db, DB_name, crow); - if (rrow != NULL) { - BIO_printf(bio_err, - "ERROR:There is already a certificate for %s\n", - row[DB_name]); - } - } - if (rrow == NULL) { - rrow = TXT_DB_get_by_index(db->db, DB_serial, row); - if (rrow != NULL) { - BIO_printf(bio_err, - "ERROR:Serial number %s has already been issued,\n", - row[DB_serial]); - BIO_printf(bio_err, - " check the database/serial_file for corruption\n"); - } - } - - if (rrow != NULL) { - BIO_printf(bio_err, "The matching entry has the following details\n"); - if (rrow[DB_type][0] == 'E') - p = "Expired"; - else if (rrow[DB_type][0] == 'R') - p = "Revoked"; - else if (rrow[DB_type][0] == 'V') - p = "Valid"; - else - p = "\ninvalid type, Data base error\n"; - BIO_printf(bio_err, "Type :%s\n", p);; - if (rrow[DB_type][0] == 'R') { - p = rrow[DB_exp_date]; - if (p == NULL) - p = "undef"; - BIO_printf(bio_err, "Was revoked on:%s\n", p); - } - p = rrow[DB_exp_date]; - if (p == NULL) - p = "undef"; - BIO_printf(bio_err, "Expires on :%s\n", p); - p = rrow[DB_serial]; - if (p == NULL) - p = "undef"; - 
BIO_printf(bio_err, "Serial Number :%s\n", p); - p = rrow[DB_file]; - if (p == NULL) - p = "undef"; - BIO_printf(bio_err, "File name :%s\n", p); - p = rrow[DB_name]; - if (p == NULL) - p = "undef"; - BIO_printf(bio_err, "Subject Name :%s\n", p); - ok = -1; /* This is now a 'bad' error. */ - goto err; - } - /* We are now totally happy, lets make and sign the certificate */ if (verbose) BIO_printf(bio_err, @@ -2056,12 +1955,126 @@ static int do_body(X509 **xret, EVP_PKEY *pkey, X509 * goto err; } - /* Set the right value for the noemailDN option */ - if (email_dn == 0) { - if (!X509_set_subject_name(ret, dn_subject)) + if (verbose) + BIO_printf(bio_err, + "The subject name appears to be ok, checking data base for clashes\n"); + + /* Build the correct Subject if no e-mail is wanted in the subject */ + + if (!email_dn) { + X509_NAME_ENTRY *tmpne; + X509_NAME *dn_subject; + + /* + * Its best to dup the subject DN and then delete any email addresses + * because this retains its structure. + */ + if (!(dn_subject = X509_NAME_dup(subject))) { + BIO_printf(bio_err, "Memory allocation failure\n"); goto err; + } + while ((i = X509_NAME_get_index_by_NID(dn_subject, + NID_pkcs9_emailAddress, + -1)) >= 0) { + tmpne = X509_NAME_get_entry(dn_subject, i); + X509_NAME_delete_entry(dn_subject, i); + X509_NAME_ENTRY_free(tmpne); + } + + if (!X509_set_subject_name(ret, dn_subject)) { + X509_NAME_free(dn_subject); + goto err; + } + X509_NAME_free(dn_subject); } + row[DB_name] = X509_NAME_oneline(X509_get_subject_name(ret), NULL, 0); + if (row[DB_name] == NULL) { + BIO_printf(bio_err, "Memory allocation failure\n"); + goto err; + } + + if (BN_is_zero(serial)) + row[DB_serial] = BUF_strdup("00"); + else + row[DB_serial] = BN_bn2hex(serial); + if (row[DB_serial] == NULL) { + BIO_printf(bio_err, "Memory allocation failure\n"); + goto err; + } + + if (row[DB_name][0] == '\0') { + /* + * An empty subject! We'll use the serial number instead. 
If + * unique_subject is in use then we don't want different entries with + * empty subjects matching each other. + */ + OPENSSL_free(row[DB_name]); + row[DB_name] = OPENSSL_strdup(row[DB_serial]); + if (row[DB_name] == NULL) { + BIO_printf(bio_err, "Memory allocation failure\n"); + goto err; + } + } + + if (db->attributes.unique_subject) { + OPENSSL_STRING *crow = row; + + rrow = TXT_DB_get_by_index(db->db, DB_name, crow); + if (rrow != NULL) { + BIO_printf(bio_err, + "ERROR:There is already a certificate for %s\n", + row[DB_name]); + } + } + if (rrow == NULL) { + rrow = TXT_DB_get_by_index(db->db, DB_serial, row); + if (rrow != NULL) { + BIO_printf(bio_err, + "ERROR:Serial number %s has already been issued,\n", + row[DB_serial]); + BIO_printf(bio_err, + " check the database/serial_file for corruption\n"); + } + } + + if (rrow != NULL) { + BIO_printf(bio_err, "The matching entry has the following details\n"); + if (rrow[DB_type][0] == 'E') + p = "Expired"; + else if (rrow[DB_type][0] == 'R') + p = "Revoked"; + else if (rrow[DB_type][0] == 'V') + p = "Valid"; + else + p = "\ninvalid type, Data base error\n"; + BIO_printf(bio_err, "Type :%s\n", p);; + if (rrow[DB_type][0] == 'R') { + p = rrow[DB_exp_date]; + if (p == NULL) + p = "undef"; + BIO_printf(bio_err, "Was revoked on:%s\n", p); + } + p = rrow[DB_exp_date]; + if (p == NULL) + p = "undef"; + BIO_printf(bio_err, "Expires on :%s\n", p); + p = rrow[DB_serial]; + if (p == NULL) + p = "undef"; + BIO_printf(bio_err, "Serial Number :%s\n", p); + p = rrow[DB_file]; + if (p == NULL) + p = "undef"; + BIO_printf(bio_err, "File name :%s\n", p); + p = rrow[DB_name]; + if (p == NULL) + p = "undef"; + BIO_printf(bio_err, "Subject Name :%s\n", p); + ok = -1; /* This is now a 'bad' error. 
*/ + goto err; + } + if (!default_op) { BIO_printf(bio_err, "Certificate Details:\n"); /* @@ -2110,10 +2123,9 @@ static int do_body(X509 **xret, EVP_PKEY *pkey, X509 * row[DB_exp_date] = OPENSSL_malloc(tm->length + 1); row[DB_rev_date] = OPENSSL_malloc(1); row[DB_file] = OPENSSL_malloc(8); - row[DB_name] = X509_NAME_oneline(X509_get_subject_name(ret), NULL, 0); if ((row[DB_type] == NULL) || (row[DB_exp_date] == NULL) || (row[DB_rev_date] == NULL) || - (row[DB_file] == NULL) || (row[DB_name] == NULL)) { + (row[DB_file] == NULL)) { BIO_printf(bio_err, "Memory allocation failure\n"); goto err; } @@ -2143,18 +2155,16 @@ static int do_body(X509 **xret, EVP_PKEY *pkey, X509 * irow = NULL; ok = 1; err: - if (irow != NULL) { + if (ok != 1) { for (i = 0; i < DB_NUMBER; i++) OPENSSL_free(row[i]); - OPENSSL_free(irow); } + OPENSSL_free(irow); if (CAname != NULL) X509_NAME_free(CAname); if (subject != NULL) X509_NAME_free(subject); - if ((dn_subject != NULL) && !email_dn) - X509_NAME_free(dn_subject); if (tmptm != NULL) ASN1_UTCTIME_free(tmptm); if (ok <= 0) { @@ -2357,6 +2367,11 @@ static int do_revoke(X509 *x509, CA_DB *db, int type, else row[DB_serial] = BN_bn2hex(bn); BN_free(bn); + if (row[DB_name] != NULL && row[DB_name][0] == '\0') { + /* Entries with empty Subjects actually use the serial number instead */ + OPENSSL_free(row[DB_name]); + row[DB_name] = OPENSSL_strdup(row[DB_serial]); + } if ((row[DB_name] == NULL) || (row[DB_serial] == NULL)) { BIO_printf(bio_err, "Memory allocation failure\n"); goto err; Modified: vendor-crypto/openssl/dist/apps/ciphers.c ============================================================================== --- vendor-crypto/openssl/dist/apps/ciphers.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/ciphers.c Tue Mar 27 17:03:01 2018 (r331625) @@ -217,7 +217,7 @@ int MAIN(int argc, char **argv) BIO_printf(STDout, "%s - ", nm); } #endif - BIO_puts(STDout, SSL_CIPHER_description(c, buf, sizeof buf)); + BIO_puts(STDout, 
SSL_CIPHER_description(c, buf, sizeof(buf))); } } Modified: vendor-crypto/openssl/dist/apps/cms.c ============================================================================== --- vendor-crypto/openssl/dist/apps/cms.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/cms.c Tue Mar 27 17:03:01 2018 (r331625) @@ -4,7 +4,7 @@ * project. */ /* ==================================================================== - * Copyright (c) 2008 The OpenSSL Project. All rights reserved. + * Copyright (c) 2008-2018 The OpenSSL Project. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -977,12 +977,16 @@ int MAIN(int argc, char **argv) signer = load_cert(bio_err, signerfile, FORMAT_PEM, NULL, e, "signer certificate"); - if (!signer) + if (!signer) { + ret = 2; goto end; + } key = load_key(bio_err, keyfile, keyform, 0, passin, e, "signing key file"); - if (!key) + if (!key) { + ret = 2; goto end; + } for (kparam = key_first; kparam; kparam = kparam->next) { if (kparam->idx == i) { tflags |= CMS_KEY_PARAM; Modified: vendor-crypto/openssl/dist/apps/dgst.c ============================================================================== --- vendor-crypto/openssl/dist/apps/dgst.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/dgst.c Tue Mar 27 17:03:01 2018 (r331625) @@ -145,7 +145,7 @@ int MAIN(int argc, char **argv) goto end; /* first check the program name */ - program_name(argv[0], pname, sizeof pname); + program_name(argv[0], pname, sizeof(pname)); md = EVP_get_digestbyname(pname); Modified: vendor-crypto/openssl/dist/apps/dsaparam.c ============================================================================== --- vendor-crypto/openssl/dist/apps/dsaparam.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/dsaparam.c Tue Mar 27 17:03:01 2018 (r331625) @@ -382,6 +382,9 @@ int MAIN(int argc, char **argv) 
printf("\treturn(dsa);\n\t}\n"); } + if (outformat == FORMAT_ASN1 && genkey) + noout = 1; + if (!noout) { if (outformat == FORMAT_ASN1) i = i2d_DSAparams_bio(out, dsa); Modified: vendor-crypto/openssl/dist/apps/ecparam.c ============================================================================== --- vendor-crypto/openssl/dist/apps/ecparam.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/ecparam.c Tue Mar 27 17:03:01 2018 (r331625) @@ -3,7 +3,7 @@ * Written by Nils Larsch for the OpenSSL project. */ /* ==================================================================== - * Copyright (c) 1998-2005 The OpenSSL Project. All rights reserved. + * Copyright (c) 1998-2018 The OpenSSL Project. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions @@ -546,6 +546,9 @@ int MAIN(int argc, char **argv) BIO_printf(out, "\treturn(group);\n\t}\n"); } + if (outformat == FORMAT_ASN1 && genkey) + noout = 1; + if (!noout) { if (outformat == FORMAT_ASN1) i = i2d_ECPKParameters_bio(out, group); @@ -581,6 +584,9 @@ int MAIN(int argc, char **argv) if (EC_KEY_set_group(eckey, group) == 0) goto end; + + if (new_form) + EC_KEY_set_conv_form(eckey, form); if (!EC_KEY_generate_key(eckey)) { EC_KEY_free(eckey); Modified: vendor-crypto/openssl/dist/apps/enc.c ============================================================================== --- vendor-crypto/openssl/dist/apps/enc.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/enc.c Tue Mar 27 17:03:01 2018 (r331625) @@ -114,7 +114,7 @@ int MAIN(int, char **); int MAIN(int argc, char **argv) { static const char magic[] = "Salted__"; - char mbuf[sizeof magic - 1]; + char mbuf[sizeof(magic) - 1]; char *strbuf = NULL; unsigned char *buff = NULL, *bufsize = NULL; int bsize = BSIZE, verbose = 0; @@ -154,7 +154,7 @@ int MAIN(int argc, char **argv) goto end; /* first check the program name */ - 
program_name(argv[0], pname, sizeof pname); + program_name(argv[0], pname, sizeof(pname)); if (strcmp(pname, "base64") == 0) base64 = 1; #ifdef ZLIB @@ -247,7 +247,7 @@ int MAIN(int argc, char **argv) goto bad; } buf[0] = '\0'; - if (!fgets(buf, sizeof buf, infile)) { + if (!fgets(buf, sizeof(buf), infile)) { BIO_printf(bio_err, "unable to read key from '%s'\n", file); goto bad; } @@ -432,7 +432,7 @@ int MAIN(int argc, char **argv) for (;;) { char buf[200]; - BIO_snprintf(buf, sizeof buf, "enter %s %s password:", + BIO_snprintf(buf, sizeof(buf), "enter %s %s password:", OBJ_nid2ln(EVP_CIPHER_nid(cipher)), (enc) ? "encryption" : "decryption"); strbuf[0] = '\0'; @@ -517,31 +517,31 @@ int MAIN(int argc, char **argv) else { if (enc) { if (hsalt) { - if (!set_hex(hsalt, salt, sizeof salt)) { + if (!set_hex(hsalt, salt, sizeof(salt))) { BIO_printf(bio_err, "invalid hex salt value\n"); goto end; } - } else if (RAND_bytes(salt, sizeof salt) <= 0) + } else if (RAND_bytes(salt, sizeof(salt)) <= 0) goto end; /* * If -P option then don't bother writing */ if ((printkey != 2) && (BIO_write(wbio, magic, - sizeof magic - 1) != sizeof magic - 1 + sizeof(magic) - 1) != sizeof(magic) - 1 || BIO_write(wbio, (char *)salt, - sizeof salt) != sizeof salt)) { + sizeof(salt)) != sizeof(salt))) { BIO_printf(bio_err, "error writing output file\n"); goto end; } - } else if (BIO_read(rbio, mbuf, sizeof mbuf) != sizeof mbuf + } else if (BIO_read(rbio, mbuf, sizeof(mbuf)) != sizeof(mbuf) || BIO_read(rbio, (unsigned char *)salt, - sizeof salt) != sizeof salt) { + sizeof(salt)) != sizeof(salt)) { BIO_printf(bio_err, "error reading input file\n"); goto end; - } else if (memcmp(mbuf, magic, sizeof magic - 1)) { + } else if (memcmp(mbuf, magic, sizeof(magic) - 1)) { BIO_printf(bio_err, "bad magic number\n"); goto end; } @@ -564,7 +564,7 @@ int MAIN(int argc, char **argv) int siz = EVP_CIPHER_iv_length(cipher); if (siz == 0) { BIO_printf(bio_err, "warning: iv not use by this cipher\n"); - } else if 
(!set_hex(hiv, iv, sizeof iv)) { + } else if (!set_hex(hiv, iv, sizeof(iv))) { BIO_printf(bio_err, "invalid hex iv value\n"); goto end; } Modified: vendor-crypto/openssl/dist/apps/errstr.c ============================================================================== --- vendor-crypto/openssl/dist/apps/errstr.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/errstr.c Tue Mar 27 17:03:01 2018 (r331625) @@ -108,7 +108,7 @@ int MAIN(int argc, char **argv) for (i = 1; i < argc; i++) { if (sscanf(argv[i], "%lx", &l)) { - ERR_error_string_n(l, buf, sizeof buf); + ERR_error_string_n(l, buf, sizeof(buf)); printf("%s\n", buf); } else { printf("%s: bad error code\n", argv[i]); Modified: vendor-crypto/openssl/dist/apps/ocsp.c ============================================================================== --- vendor-crypto/openssl/dist/apps/ocsp.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/ocsp.c Tue Mar 27 17:03:01 2018 (r331625) @@ -1195,7 +1195,7 @@ static int do_responder(OCSP_REQUEST **preq, BIO **pcb *pcbio = cbio; for (;;) { - len = BIO_gets(cbio, inbuf, sizeof inbuf); + len = BIO_gets(cbio, inbuf, sizeof(inbuf)); if (len <= 0) return 1; /* Look for "POST" signalling start of query */ Modified: vendor-crypto/openssl/dist/apps/openssl.c ============================================================================== --- vendor-crypto/openssl/dist/apps/openssl.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/openssl.c Tue Mar 27 17:03:01 2018 (r331625) @@ -351,7 +351,7 @@ int main(int Argc, char *ARGV[]) prog = prog_init(); /* first check the program name */ - program_name(Argv[0], pname, sizeof pname); + program_name(Argv[0], pname, sizeof(pname)); f.name = pname; fp = lh_FUNCTION_retrieve(prog, &f); @@ -379,7 +379,7 @@ int main(int Argc, char *ARGV[]) for (;;) { ret = 0; p = buf; - n = sizeof buf; + n = sizeof(buf); i = 0; for (;;) { p[0] = '\0'; @@ -685,7 +685,7 @@ static LHASH_OF(FUNCTION) 
*prog_init(void) /* Purely so it looks nice when the user hits ? */ for (i = 0, f = functions; f->name != NULL; ++f, ++i) ; - qsort(functions, i, sizeof *functions, SortFnByName); + qsort(functions, i, sizeof(*functions), SortFnByName); if ((ret = lh_FUNCTION_new()) == NULL) return (NULL); Modified: vendor-crypto/openssl/dist/apps/passwd.c ============================================================================== --- vendor-crypto/openssl/dist/apps/passwd.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/passwd.c Tue Mar 27 17:03:01 2018 (r331625) @@ -252,7 +252,7 @@ int MAIN(int argc, char **argv) /* ignore rest of line */ char trash[BUFSIZ]; do - r = BIO_gets(in, trash, sizeof trash); + r = BIO_gets(in, trash, sizeof(trash)); while ((r > 0) && (!strchr(trash, '\n'))); } @@ -329,8 +329,8 @@ static char *md5crypt(const char *passwd, const char * EVP_DigestUpdate(&md2, passwd, passwd_len); EVP_DigestFinal_ex(&md2, buf, NULL); - for (i = passwd_len; i > sizeof buf; i -= sizeof buf) - EVP_DigestUpdate(&md, buf, sizeof buf); + for (i = passwd_len; i > sizeof(buf); i -= sizeof(buf)) + EVP_DigestUpdate(&md, buf, sizeof(buf)); EVP_DigestUpdate(&md, buf, i); n = passwd_len; @@ -343,13 +343,13 @@ static char *md5crypt(const char *passwd, const char * for (i = 0; i < 1000; i++) { EVP_DigestInit_ex(&md2, EVP_md5(), NULL); EVP_DigestUpdate(&md2, (i & 1) ? (unsigned const char *)passwd : buf, - (i & 1) ? passwd_len : sizeof buf); + (i & 1) ? passwd_len : sizeof(buf)); if (i % 3) EVP_DigestUpdate(&md2, salt_out, salt_len); if (i % 7) EVP_DigestUpdate(&md2, passwd, passwd_len); EVP_DigestUpdate(&md2, (i & 1) ? buf : (unsigned const char *)passwd, - (i & 1) ? sizeof buf : passwd_len); + (i & 1) ? 
sizeof(buf) : passwd_len); EVP_DigestFinal_ex(&md2, buf, NULL); } EVP_MD_CTX_cleanup(&md2); @@ -357,7 +357,7 @@ static char *md5crypt(const char *passwd, const char * { /* transform buf into output string */ - unsigned char buf_perm[sizeof buf]; + unsigned char buf_perm[sizeof(buf)]; int dest, source; char *output; @@ -369,7 +369,7 @@ static char *md5crypt(const char *passwd, const char * buf_perm[15] = buf[11]; # ifndef PEDANTIC /* Unfortunately, this generates a "no * effect" warning */ - assert(16 == sizeof buf_perm); + assert(16 == sizeof(buf_perm)); # endif output = salt_out + salt_len; Modified: vendor-crypto/openssl/dist/apps/pkcs12.c ============================================================================== --- vendor-crypto/openssl/dist/apps/pkcs12.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/pkcs12.c Tue Mar 27 17:03:01 2018 (r331625) @@ -481,7 +481,7 @@ int MAIN(int argc, char **argv) CRYPTO_push_info("read MAC password"); # endif if (EVP_read_pw_string - (macpass, sizeof macpass, "Enter MAC Password:", export_cert)) { + (macpass, sizeof(macpass), "Enter MAC Password:", export_cert)) { BIO_printf(bio_err, "Can't read Password\n"); goto end; } @@ -629,13 +629,13 @@ int MAIN(int argc, char **argv) # endif if (!noprompt && - EVP_read_pw_string(pass, sizeof pass, "Enter Export Password:", + EVP_read_pw_string(pass, sizeof(pass), "Enter Export Password:", 1)) { BIO_printf(bio_err, "Can't read Password\n"); goto export_end; } if (!twopass) - BUF_strlcpy(macpass, pass, sizeof macpass); + BUF_strlcpy(macpass, pass, sizeof(macpass)); # ifdef CRYPTO_MDEBUG CRYPTO_pop_info(); @@ -698,7 +698,7 @@ int MAIN(int argc, char **argv) CRYPTO_push_info("read import password"); # endif if (!noprompt - && EVP_read_pw_string(pass, sizeof pass, "Enter Import Password:", + && EVP_read_pw_string(pass, sizeof(pass), "Enter Import Password:", 0)) { BIO_printf(bio_err, "Can't read Password\n"); goto end; @@ -708,7 +708,7 @@ int MAIN(int argc, char 
**argv) # endif if (!twopass) - BUF_strlcpy(macpass, pass, sizeof macpass); + BUF_strlcpy(macpass, pass, sizeof(macpass)); if ((options & INFO) && p12->mac) BIO_printf(bio_err, "MAC Iteration %ld\n", Modified: vendor-crypto/openssl/dist/apps/pkcs8.c ============================================================================== --- vendor-crypto/openssl/dist/apps/pkcs8.c Tue Mar 27 16:38:32 2018 (r331624) +++ vendor-crypto/openssl/dist/apps/pkcs8.c Tue Mar 27 17:03:01 2018 (r331625) @@ -277,7 +277,7 @@ int MAIN(int argc, char **argv) else { p8pass = pass; if (EVP_read_pw_string - (pass, sizeof pass, "Enter Encryption Password:", 1)) + (pass, sizeof(pass), "Enter Encryption Password:", 1)) goto end; } app_RAND_load_file(NULL, bio_err, 0); @@ -331,7 +331,7 @@ int MAIN(int argc, char **argv) p8pass = passin; *** DIFF OUTPUT TRUNCATED AT 1000 LINES *** From owner-svn-src-vendor@freebsd.org Tue Mar 27 17:04:02 2018 Return-Path: Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 91D0CF6B2EB; Tue, 27 Mar 2018 17:04:02 +0000 (UTC) (envelope-from jkim@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 457527247F; Tue, 27 Mar 2018 17:04:02 +0000 (UTC) (envelope-from jkim@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 25F711EEE2; Tue, 27 Mar 2018 17:04:02 +0000 (UTC) (envelope-from jkim@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with 
ESMTP id w2RH41w8049565; Tue, 27 Mar 2018 17:04:01 GMT (envelope-from jkim@FreeBSD.org) Received: (from jkim@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2RH41OU049564; Tue, 27 Mar 2018 17:04:01 GMT (envelope-from jkim@FreeBSD.org) Message-Id: <201803271704.w2RH41OU049564@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: jkim set sender to jkim@FreeBSD.org using -f From: Jung-uk Kim Date: Tue, 27 Mar 2018 17:04:01 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org Subject: svn commit: r331626 - vendor-crypto/openssl/1.0.2o X-SVN-Group: vendor-crypto X-SVN-Commit-Author: jkim X-SVN-Commit-Paths: vendor-crypto/openssl/1.0.2o X-SVN-Commit-Revision: 331626 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-vendor@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the vendor work area tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Mar 2018 17:04:02 -0000 Author: jkim Date: Tue Mar 27 17:04:01 2018 New Revision: 331626 URL: https://svnweb.freebsd.org/changeset/base/331626 Log: Tag OpenSSL 1.0.2o. 
Added: vendor-crypto/openssl/1.0.2o/ - copied from r331625, vendor-crypto/openssl/dist/ From owner-svn-src-vendor@freebsd.org Wed Mar 28 18:12:08 2018 Return-Path: Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 366F8F6E87F; Wed, 28 Mar 2018 18:12:08 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DB00C7B071; Wed, 28 Mar 2018 18:12:07 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id D5D4F6874; Wed, 28 Mar 2018 18:12:07 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2SIC7ik011299; Wed, 28 Mar 2018 18:12:07 GMT (envelope-from mav@FreeBSD.org) Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SIC6ps011283; Wed, 28 Mar 2018 18:12:06 GMT (envelope-from mav@FreeBSD.org) Message-Id: <201803281812.w2SIC6ps011283@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f From: Alexander Motin Date: Wed, 28 Mar 2018 18:12:06 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org Subject: svn commit: r331695 - vendor-sys/illumos/dist/common/zfs vendor-sys/illumos/dist/uts/common vendor-sys/illumos/dist/uts/common/fs/zfs vendor-sys/illumos/dist/uts/common/fs/zfs/sys vendor-sys/illumo... 
X-SVN-Group: vendor-sys X-SVN-Commit-Author: mav X-SVN-Commit-Paths: vendor-sys/illumos/dist/common/zfs vendor-sys/illumos/dist/uts/common vendor-sys/illumos/dist/uts/common/fs/zfs vendor-sys/illumos/dist/uts/common/fs/zfs/sys vendor-sys/illumos/dist/uts/common/sys/fs ... X-SVN-Commit-Revision: 331695 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-vendor@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the vendor work area tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Mar 2018 18:12:08 -0000 Author: mav Date: Wed Mar 28 18:12:06 2018 New Revision: 331695 URL: https://svnweb.freebsd.org/changeset/base/331695 Log: 9166 zfs storage pool checkpoint illumos/illumos-gate@8671400134a11c848244896ca51a7db4d0f69da4 The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with exactly that. It can be thought of as a “pool-wide snapshot” (or a variation of extreme rewind that doesn’t corrupt your data). It remembers the entire state of the pool at the point that it was taken and the user can revert back to it later or discard it. Its generic use case is an administrator that is about to perform a set of destructive actions to ZFS as part of a critical procedure. She takes a checkpoint of the pool before performing the actions, then rewinds back to it if one of them fails or puts the pool into an unexpected state. Otherwise, she discards it. With the assumption that no one else is making modifications to ZFS, she basically wraps all these actions into a “high-level transaction”. 
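The checkpoint, rewind, or discard workflow described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: none of the names below (toy_pool_t, toy_checkpoint, toy_rewind, toy_discard, toy_run_guarded, and the two sample actions) come from ZFS or from this commit, the "pool" is just a small array, and the rules that only one checkpoint exists at a time and that a rewind consumes it are choices made for this model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for pool state. Hypothetical; NOT a ZFS structure. */
#define TOY_NBLOCKS 4

typedef struct toy_pool {
    int blocks[TOY_NBLOCKS];   /* current contents of the "pool" */
    int saved[TOY_NBLOCKS];    /* contents captured by the checkpoint */
    bool has_checkpoint;       /* model assumption: at most one checkpoint */
} toy_pool_t;

/* Remember the entire current state. Fails if a checkpoint already exists. */
bool toy_checkpoint(toy_pool_t *p)
{
    if (p->has_checkpoint)
        return false;
    memcpy(p->saved, p->blocks, sizeof(p->blocks));
    p->has_checkpoint = true;
    return true;
}

/* Revert to the checkpointed state. In this model the rewind consumes it. */
bool toy_rewind(toy_pool_t *p)
{
    if (!p->has_checkpoint)
        return false;
    memcpy(p->blocks, p->saved, sizeof(p->blocks));
    p->has_checkpoint = false;
    return true;
}

/* Keep the current state and drop the checkpoint. */
bool toy_discard(toy_pool_t *p)
{
    if (!p->has_checkpoint)
        return false;
    p->has_checkpoint = false;
    return true;
}

typedef bool (*toy_action_t)(toy_pool_t *);

/*
 * Wrap a set of actions into one "high-level transaction":
 * checkpoint first, rewind if any action fails, discard otherwise.
 */
bool toy_run_guarded(toy_pool_t *p, const toy_action_t *actions, size_t n)
{
    if (!toy_checkpoint(p))
        return false;
    for (size_t i = 0; i < n; i++) {
        if (!actions[i](p)) {
            toy_rewind(p);     /* undo everything done since the checkpoint */
            return false;
        }
    }
    toy_discard(p);            /* all actions succeeded; keep the new state */
    return true;
}

/* Sample actions: one succeeds, one modifies state and then reports failure. */
bool toy_action_ok(toy_pool_t *p)   { p->blocks[0] = 42; return true; }
bool toy_action_fail(toy_pool_t *p) { p->blocks[1] = -1; return false; }
```

Passing toy_action_ok followed by toy_action_fail to toy_run_guarded leaves the blocks as they were before the call, while passing only toy_action_ok keeps the modification; in both cases no checkpoint remains afterwards.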
Reviewed by: Matthew Ahrens Reviewed by: John Kennedy Reviewed by: Dan Kimmel Approved by: Richard Lowe Author: Serapheim Dimitropoulos Modified: vendor-sys/illumos/dist/common/zfs/zfeature_common.c vendor-sys/illumos/dist/common/zfs/zfeature_common.h vendor-sys/illumos/dist/common/zfs/zpool_prop.c vendor-sys/illumos/dist/uts/common/Makefile.files vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_traverse.c vendor-sys/illumos/dist/uts/common/fs/zfs/dnode.c vendor-sys/illumos/dist/uts/common/fs/zfs/dnode_sync.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dataset.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_destroy.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dir.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_pool.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_scan.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_synctask.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_userhold.c vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c vendor-sys/illumos/dist/uts/common/fs/zfs/range_tree.c vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c vendor-sys/illumos/dist/uts/common/fs/zfs/spa_misc.c vendor-sys/illumos/dist/uts/common/fs/zfs/space_map.c vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dsl_dir.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dsl_pool.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dsl_synctask.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/metaslab.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/metaslab_impl.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/range_tree.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa_impl.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/space_map.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/uberblock_impl.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/vdev.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/vdev_impl.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/vdev_removal.h 
vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zio.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zthr.h vendor-sys/illumos/dist/uts/common/fs/zfs/uberblock.c vendor-sys/illumos/dist/uts/common/fs/zfs/vdev.c vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_indirect.c vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_label.c vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_removal.c vendor-sys/illumos/dist/uts/common/fs/zfs/zcp.c vendor-sys/illumos/dist/uts/common/fs/zfs/zcp_synctask.c vendor-sys/illumos/dist/uts/common/fs/zfs/zfs_ioctl.c vendor-sys/illumos/dist/uts/common/fs/zfs/zil.c vendor-sys/illumos/dist/uts/common/fs/zfs/zio.c vendor-sys/illumos/dist/uts/common/fs/zfs/zthr.c vendor-sys/illumos/dist/uts/common/sys/fs/zfs.h Changes in other areas also in this revision: Modified: vendor/illumos/dist/cmd/zdb/zdb.c vendor/illumos/dist/cmd/zdb/zdb_il.c vendor/illumos/dist/cmd/zpool/zpool_main.c vendor/illumos/dist/cmd/ztest/ztest.c vendor/illumos/dist/lib/libzfs/common/libzfs.h vendor/illumos/dist/lib/libzfs/common/libzfs_pool.c vendor/illumos/dist/lib/libzfs/common/libzfs_util.c vendor/illumos/dist/lib/libzfs_core/common/libzfs_core.c vendor/illumos/dist/lib/libzfs_core/common/libzfs_core.h vendor/illumos/dist/man/man1m/zdb.1m vendor/illumos/dist/man/man1m/zpool.1m vendor/illumos/dist/man/man5/zpool-features.5 Modified: vendor-sys/illumos/dist/common/zfs/zfeature_common.c ============================================================================== --- vendor-sys/illumos/dist/common/zfs/zfeature_common.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/common/zfs/zfeature_common.c Wed Mar 28 18:12:06 2018 (r331695) @@ -20,7 +20,7 @@ */ /* - * Copyright (c) 2011, 2015 by Delphix. All rights reserved. + * Copyright (c) 2011, 2017 by Delphix. All rights reserved. * Copyright (c) 2013 by Saso Kiselkov. All rights reserved. * Copyright (c) 2013, Joyent, Inc. All rights reserved. * Copyright (c) 2014, Nexenta Systems, Inc. All rights reserved. 
@@ -224,6 +224,11 @@ zpool_feature_init(void) "Blocks which compress very well use even less space.", ZFEATURE_FLAG_MOS | ZFEATURE_FLAG_ACTIVATE_ON_ENABLE, NULL); + + zfeature_register(SPA_FEATURE_POOL_CHECKPOINT, + "com.delphix:zpool_checkpoint", "zpool_checkpoint", + "Pool state can be checkpointed, allowing rewind later.", + ZFEATURE_FLAG_READONLY_COMPAT, NULL); static const spa_feature_t large_blocks_deps[] = { SPA_FEATURE_EXTENSIBLE_DATASET, Modified: vendor-sys/illumos/dist/common/zfs/zfeature_common.h ============================================================================== --- vendor-sys/illumos/dist/common/zfs/zfeature_common.h Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/common/zfs/zfeature_common.h Wed Mar 28 18:12:06 2018 (r331695) @@ -20,7 +20,7 @@ */ /* - * Copyright (c) 2011, 2015 by Delphix. All rights reserved. + * Copyright (c) 2011, 2017 by Delphix. All rights reserved. * Copyright (c) 2013 by Saso Kiselkov. All rights reserved. * Copyright (c) 2013, Joyent, Inc. All rights reserved. * Copyright (c) 2014 Integros [integros.com] @@ -58,6 +58,7 @@ typedef enum spa_feature { SPA_FEATURE_EDONR, SPA_FEATURE_DEVICE_REMOVAL, SPA_FEATURE_OBSOLETE_COUNTS, + SPA_FEATURE_POOL_CHECKPOINT, SPA_FEATURES } spa_feature_t; Modified: vendor-sys/illumos/dist/common/zfs/zpool_prop.c ============================================================================== --- vendor-sys/illumos/dist/common/zfs/zpool_prop.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/common/zfs/zpool_prop.c Wed Mar 28 18:12:06 2018 (r331695) @@ -21,7 +21,7 @@ /* * Copyright (c) 2007, 2010, Oracle and/or its affiliates. All rights reserved. * Copyright 2011 Nexenta Systems, Inc. All rights reserved. - * Copyright (c) 2012, 2014 by Delphix. All rights reserved. + * Copyright (c) 2012, 2017 by Delphix. All rights reserved. 
* Copyright (c) 2014 Integros [integros.com] */ @@ -82,6 +82,8 @@ zpool_prop_init(void) ZFS_TYPE_POOL, "", "FREE"); zprop_register_number(ZPOOL_PROP_FREEING, "freeing", 0, PROP_READONLY, ZFS_TYPE_POOL, "", "FREEING"); + zprop_register_number(ZPOOL_PROP_CHECKPOINT, "checkpoint", 0, + PROP_READONLY, ZFS_TYPE_POOL, "", "CKPOINT"); zprop_register_number(ZPOOL_PROP_LEAKED, "leaked", 0, PROP_READONLY, ZFS_TYPE_POOL, "", "LEAKED"); zprop_register_number(ZPOOL_PROP_ALLOCATED, "allocated", 0, Modified: vendor-sys/illumos/dist/uts/common/Makefile.files ============================================================================== --- vendor-sys/illumos/dist/uts/common/Makefile.files Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/Makefile.files Wed Mar 28 18:12:06 2018 (r331695) @@ -1385,6 +1385,7 @@ ZFS_COMMON_OBJS += \ edonr_zfs.o \ skein_zfs.o \ spa.o \ + spa_checkpoint.o \ spa_config.o \ spa_errlog.o \ spa_history.o \ Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_traverse.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_traverse.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_traverse.c Wed Mar 28 18:12:06 2018 (r331695) @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2012, 2016 by Delphix. All rights reserved. + * Copyright (c) 2012, 2018 by Delphix. All rights reserved. 
*/ #include @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include @@ -80,8 +81,8 @@ traverse_zil_block(zilog_t *zilog, blkptr_t *bp, void if (BP_IS_HOLE(bp)) return (0); - if (claim_txg == 0 && bp->blk_birth >= spa_first_txg(td->td_spa)) - return (0); + if (claim_txg == 0 && bp->blk_birth >= spa_min_claim_txg(td->td_spa)) + return (-1); SET_BOOKMARK(&zb, td->td_objset, ZB_ZIL_OBJECT, ZB_ZIL_LEVEL, bp->blk_cksum.zc_word[ZIL_ZC_SEQ]); @@ -120,20 +121,17 @@ static void traverse_zil(traverse_data_t *td, zil_header_t *zh) { uint64_t claim_txg = zh->zh_claim_txg; - zilog_t *zilog; /* * We only want to visit blocks that have been claimed but not yet - * replayed; plus, in read-only mode, blocks that are already stable. + * replayed; plus blocks that are already stable in read-only mode. */ if (claim_txg == 0 && spa_writeable(td->td_spa)) return; - zilog = zil_alloc(spa_get_dsl(td->td_spa)->dp_meta_objset, zh); - + zilog_t *zilog = zil_alloc(spa_get_dsl(td->td_spa)->dp_meta_objset, zh); (void) zil_parse(zilog, traverse_zil_block, traverse_zil_record, td, claim_txg); - zil_free(zilog); } Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dnode.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dnode.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dnode.c Wed Mar 28 18:12:06 2018 (r331695) @@ -1083,6 +1083,8 @@ dnode_hold_impl(objset_t *os, uint64_t object, int fla (spa_is_root(os->os_spa) && spa_config_held(os->os_spa, SCL_STATE, RW_WRITER))); + ASSERT((flag & DNODE_MUST_BE_ALLOCATED) || (flag & DNODE_MUST_BE_FREE)); + if (object == DMU_USERUSED_OBJECT || object == DMU_GROUPUSED_OBJECT) { dn = (object == DMU_USERUSED_OBJECT) ? 
DMU_USERUSED_DNODE(os) : DMU_GROUPUSED_DNODE(os); @@ -1176,7 +1178,7 @@ dnode_hold_impl(objset_t *os, uint64_t object, int fla mutex_exit(&dn->dn_mtx); zrl_remove(&dnh->dnh_zrlock); dbuf_rele(db, FTAG); - return (type == DMU_OT_NONE ? ENOENT : EEXIST); + return ((flag & DNODE_MUST_BE_ALLOCATED) ? ENOENT : EEXIST); } if (refcount_add(&dn->dn_holds, tag) == 1) dbuf_add_ref(db, dnh); Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dnode_sync.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dnode_sync.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dnode_sync.c Wed Mar 28 18:12:06 2018 (r331695) @@ -608,7 +608,7 @@ dnode_sync(dnode_t *dn, dmu_tx_t *tx) dn->dn_maxblkid == 0 || list_head(list) != NULL || dn->dn_next_blksz[txgoff] >> SPA_MINBLOCKSHIFT == dnp->dn_datablkszsec || - range_tree_space(dn->dn_free_ranges[txgoff]) != 0); + !range_tree_is_empty(dn->dn_free_ranges[txgoff])); dnp->dn_datablkszsec = dn->dn_next_blksz[txgoff] >> SPA_MINBLOCKSHIFT; dn->dn_next_blksz[txgoff] = 0; Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dataset.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dataset.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dataset.c Wed Mar 28 18:12:06 2018 (r331695) @@ -46,6 +46,7 @@ #include #include #include +#include #include #include #include @@ -205,7 +206,9 @@ int dsl_dataset_block_kill(dsl_dataset_t *ds, const blkptr_t *bp, dmu_tx_t *tx, boolean_t async) { - int used = bp_get_dsize_sync(tx->tx_pool->dp_spa, bp); + spa_t *spa = dmu_tx_pool(tx)->dp_spa; + + int used = bp_get_dsize_sync(spa, bp); int compressed = BP_GET_PSIZE(bp); int uncompressed = BP_GET_UCSIZE(bp); @@ -3717,7 +3720,8 @@ dsl_dataset_set_refquota(const char *dsname, zprop_sou ddsqra.ddsqra_value = refquota; return (dsl_sync_task(dsname, 
dsl_dataset_set_refquota_check, - dsl_dataset_set_refquota_sync, &ddsqra, 0, ZFS_SPACE_CHECK_NONE)); + dsl_dataset_set_refquota_sync, &ddsqra, 0, + ZFS_SPACE_CHECK_EXTRA_RESERVED)); } static int @@ -3832,8 +3836,8 @@ dsl_dataset_set_refreservation(const char *dsname, zpr ddsqra.ddsqra_value = refreservation; return (dsl_sync_task(dsname, dsl_dataset_set_refreservation_check, - dsl_dataset_set_refreservation_sync, &ddsqra, - 0, ZFS_SPACE_CHECK_NONE)); + dsl_dataset_set_refreservation_sync, &ddsqra, 0, + ZFS_SPACE_CHECK_EXTRA_RESERVED)); } /* Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_destroy.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_destroy.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_destroy.c Wed Mar 28 18:12:06 2018 (r331695) @@ -1022,7 +1022,7 @@ dsl_destroy_head(const char *name) error = dsl_sync_task(name, dsl_destroy_head_check, dsl_destroy_head_begin_sync, &ddha, - 0, ZFS_SPACE_CHECK_NONE); + 0, ZFS_SPACE_CHECK_DESTROY); if (error != 0) return (error); @@ -1047,7 +1047,7 @@ dsl_destroy_head(const char *name) } return (dsl_sync_task(name, dsl_destroy_head_check, - dsl_destroy_head_sync, &ddha, 0, ZFS_SPACE_CHECK_NONE)); + dsl_destroy_head_sync, &ddha, 0, ZFS_SPACE_CHECK_DESTROY)); } /* Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dir.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dir.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dir.c Wed Mar 28 18:12:06 2018 (r331695) @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2012, 2016 by Delphix. All rights reserved. + * Copyright (c) 2012, 2017 by Delphix. All rights reserved. * Copyright (c) 2013 Martin Matuska. All rights reserved. * Copyright (c) 2014 Joyent, Inc. 
All rights reserved. * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved. @@ -921,14 +921,14 @@ dsl_dir_create_sync(dsl_pool_t *dp, dsl_dir_t *pds, co ddobj = dmu_object_alloc(mos, DMU_OT_DSL_DIR, 0, DMU_OT_DSL_DIR, sizeof (dsl_dir_phys_t), tx); if (pds) { - VERIFY(0 == zap_add(mos, dsl_dir_phys(pds)->dd_child_dir_zapobj, + VERIFY0(zap_add(mos, dsl_dir_phys(pds)->dd_child_dir_zapobj, name, sizeof (uint64_t), 1, &ddobj, tx)); } else { /* it's the root dir */ - VERIFY(0 == zap_add(mos, DMU_POOL_DIRECTORY_OBJECT, + VERIFY0(zap_add(mos, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_ROOT_DATASET, sizeof (uint64_t), 1, &ddobj, tx)); } - VERIFY(0 == dmu_bonus_hold(mos, ddobj, FTAG, &dbuf)); + VERIFY0(dmu_bonus_hold(mos, ddobj, FTAG, &dbuf)); dmu_buf_will_dirty(dbuf, tx); ddphys = dbuf->db_data; @@ -967,6 +967,12 @@ dsl_dir_get_used(dsl_dir_t *dd) } uint64_t +dsl_dir_get_compressed(dsl_dir_t *dd) +{ + return (dsl_dir_phys(dd)->dd_compressed_bytes); +} + +uint64_t dsl_dir_get_quota(dsl_dir_t *dd) { return (dsl_dir_phys(dd)->dd_quota); @@ -1193,7 +1199,8 @@ dsl_dir_space_available(dsl_dir_t *dd, used += dsl_dir_space_towrite(dd); if (dd->dd_parent == NULL) { - uint64_t poolsize = dsl_pool_adjustedsize(dd->dd_pool, FALSE); + uint64_t poolsize = dsl_pool_adjustedsize(dd->dd_pool, + ZFS_SPACE_CHECK_NORMAL); quota = MIN(quota, poolsize); } @@ -1298,11 +1305,12 @@ dsl_dir_tempreserve_impl(dsl_dir_t *dd, uint64_t asize */ uint64_t deferred = 0; if (dd->dd_parent == NULL) { - spa_t *spa = dd->dd_pool->dp_spa; - uint64_t poolsize = dsl_pool_adjustedsize(dd->dd_pool, netfree); - deferred = metaslab_class_get_deferred(spa_normal_class(spa)); - if (poolsize - deferred < quota) { - quota = poolsize - deferred; + uint64_t avail = dsl_pool_unreserved_space(dd->dd_pool, + (netfree) ? 
+ ZFS_SPACE_CHECK_RESERVED : ZFS_SPACE_CHECK_NORMAL); + + if (avail < quota) { + quota = avail; retval = ENOSPC; } } @@ -1639,7 +1647,8 @@ dsl_dir_set_quota(const char *ddname, zprop_source_t s ddsqra.ddsqra_value = quota; return (dsl_sync_task(ddname, dsl_dir_set_quota_check, - dsl_dir_set_quota_sync, &ddsqra, 0, ZFS_SPACE_CHECK_NONE)); + dsl_dir_set_quota_sync, &ddsqra, 0, + ZFS_SPACE_CHECK_EXTRA_RESERVED)); } int @@ -1682,7 +1691,8 @@ dsl_dir_set_reservation_check(void *arg, dmu_tx_t *tx) avail = dsl_dir_space_available(dd->dd_parent, NULL, 0, FALSE); } else { - avail = dsl_pool_adjustedsize(dd->dd_pool, B_FALSE) - used; + avail = dsl_pool_adjustedsize(dd->dd_pool, + ZFS_SPACE_CHECK_NORMAL) - used; } if (MAX(used, newval) > MAX(used, dsl_dir_phys(dd)->dd_reserved)) { @@ -1761,7 +1771,8 @@ dsl_dir_set_reservation(const char *ddname, zprop_sour ddsqra.ddsqra_value = reservation; return (dsl_sync_task(ddname, dsl_dir_set_reservation_check, - dsl_dir_set_reservation_sync, &ddsqra, 0, ZFS_SPACE_CHECK_NONE)); + dsl_dir_set_reservation_sync, &ddsqra, 0, + ZFS_SPACE_CHECK_EXTRA_RESERVED)); } static dsl_dir_t * Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_pool.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_pool.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_pool.c Wed Mar 28 18:12:06 2018 (r331695) @@ -44,6 +44,8 @@ #include #include #include +#include +#include #include #include #include @@ -197,6 +199,8 @@ dsl_pool_open_impl(spa_t *spa, uint64_t txg) offsetof(dsl_dir_t, dd_dirty_link)); txg_list_create(&dp->dp_sync_tasks, spa, offsetof(dsl_sync_task_t, dst_node)); + txg_list_create(&dp->dp_early_sync_tasks, spa, + offsetof(dsl_sync_task_t, dst_node)); dp->dp_sync_taskq = taskq_create("dp_sync_taskq", zfs_sync_taskq_batch_pct, minclsyspri, 1, INT_MAX, @@ -373,6 +377,7 @@ dsl_pool_close(dsl_pool_t *dp) 
txg_list_destroy(&dp->dp_dirty_datasets); txg_list_destroy(&dp->dp_dirty_zilogs); txg_list_destroy(&dp->dp_sync_tasks); + txg_list_destroy(&dp->dp_early_sync_tasks); txg_list_destroy(&dp->dp_dirty_dirs); taskq_destroy(dp->dp_zil_clean_taskq); @@ -545,6 +550,27 @@ dsl_pool_dirty_delta(dsl_pool_t *dp, int64_t delta) cv_signal(&dp->dp_spaceavail_cv); } +static boolean_t +dsl_early_sync_task_verify(dsl_pool_t *dp, uint64_t txg) +{ + spa_t *spa = dp->dp_spa; + vdev_t *rvd = spa->spa_root_vdev; + + for (uint64_t c = 0; c < rvd->vdev_children; c++) { + vdev_t *vd = rvd->vdev_child[c]; + txg_list_t *tl = &vd->vdev_ms_list; + metaslab_t *ms; + + for (ms = txg_list_head(tl, TXG_CLEAN(txg)); ms; + ms = txg_list_next(tl, ms, TXG_CLEAN(txg))) { + VERIFY(range_tree_is_empty(ms->ms_freeing)); + VERIFY(range_tree_is_empty(ms->ms_checkpointing)); + } + } + + return (B_TRUE); +} + void dsl_pool_sync(dsl_pool_t *dp, uint64_t txg) { @@ -561,6 +587,23 @@ dsl_pool_sync(dsl_pool_t *dp, uint64_t txg) tx = dmu_tx_create_assigned(dp, txg); /* + * Run all early sync tasks before writing out any dirty blocks. + * For more info on early sync tasks see block comment in + * dsl_early_sync_task(). + */ + if (!txg_list_empty(&dp->dp_early_sync_tasks, txg)) { + dsl_sync_task_t *dst; + + ASSERT3U(spa_sync_pass(dp->dp_spa), ==, 1); + while ((dst = + txg_list_remove(&dp->dp_early_sync_tasks, txg)) != NULL) { + ASSERT(dsl_early_sync_task_verify(dp, txg)); + dsl_sync_task_sync(dst, tx); + } + ASSERT(dsl_early_sync_task_verify(dp, txg)); + } + + /* * Write out all dirty blocks of dirty datasets. */ zio = zio_root(dp->dp_spa, NULL, NULL, ZIO_FLAG_MUSTSUCCEED); @@ -714,22 +757,66 @@ dsl_pool_sync_context(dsl_pool_t *dp) taskq_member(dp->dp_sync_taskq, curthread)); } +/* + * This function returns the amount of allocatable space in the pool + * minus whatever space is currently reserved by ZFS for specific + * purposes. 
Specifically: + * + * 1] Any reserved SLOP space + * 2] Any space used by the checkpoint + * 3] Any space used for deferred frees + * + * The latter 2 are especially important because they are needed to + * rectify the SPA's and DMU's different understanding of how much space + * is used. Now the DMU is aware of that extra space tracked by the SPA + * without having to maintain a separate special dir (e.g similar to + * $MOS, $FREEING, and $LEAKED). + * + * Note: By deferred frees here, we mean the frees that were deferred + * in spa_sync() after sync pass 1 (spa_deferred_bpobj), and not the + * segments placed in ms_defer trees during metaslab_sync_done(). + */ uint64_t -dsl_pool_adjustedsize(dsl_pool_t *dp, boolean_t netfree) +dsl_pool_adjustedsize(dsl_pool_t *dp, zfs_space_check_t slop_policy) { - uint64_t space, resv; + spa_t *spa = dp->dp_spa; + uint64_t space, resv, adjustedsize; + uint64_t spa_deferred_frees = + spa->spa_deferred_bpobj.bpo_phys->bpo_bytes; - /* - * If we're trying to assess whether it's OK to do a free, - * cut the reservation in half to allow forward progress - * (e.g. make it possible to rm(1) files from a full pool). - */ - space = spa_get_dspace(dp->dp_spa); - resv = spa_get_slop_space(dp->dp_spa); - if (netfree) + space = spa_get_dspace(spa) + - spa_get_checkpoint_space(spa) - spa_deferred_frees; + resv = spa_get_slop_space(spa); + + switch (slop_policy) { + case ZFS_SPACE_CHECK_NORMAL: + break; + case ZFS_SPACE_CHECK_RESERVED: resv >>= 1; + break; + case ZFS_SPACE_CHECK_EXTRA_RESERVED: + resv >>= 2; + break; + case ZFS_SPACE_CHECK_NONE: + resv = 0; + break; + default: + panic("invalid slop policy value: %d", slop_policy); + break; + } + adjustedsize = (space >= resv) ? 
(space - resv) : 0; - return (space - resv); + return (adjustedsize); +} + +uint64_t +dsl_pool_unreserved_space(dsl_pool_t *dp, zfs_space_check_t slop_policy) +{ + uint64_t poolsize = dsl_pool_adjustedsize(dp, slop_policy); + uint64_t deferred = + metaslab_class_get_deferred(spa_normal_class(dp->dp_spa)); + uint64_t quota = (poolsize >= deferred) ? (poolsize - deferred) : 0; + return (quota); } boolean_t Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_scan.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_scan.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_scan.c Wed Mar 28 18:12:06 2018 (r331695) @@ -20,6 +20,7 @@ */ /* * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved. + * Copyright (c) 2011, 2017 by Delphix. All rights reserved. * Copyright 2016 Gary Mills * Copyright (c) 2011, 2017 by Delphix. All rights reserved. * Copyright 2017 Joyent, Inc. @@ -325,13 +326,23 @@ dsl_scan_done(dsl_scan_t *scn, boolean_t complete, dmu * If the scrub/resilver completed, update all DTLs to * reflect this. Whether it succeeded or not, vacate * all temporary scrub DTLs. + * + * As the scrub does not currently support traversing + * data that have been freed but are part of a checkpoint, + * we don't mark the scrub as done in the DTLs as faults + * may still exist in those vdevs. */ - vdev_dtl_reassess(spa->spa_root_vdev, tx->tx_txg, - complete ? scn->scn_phys.scn_max_txg : 0, B_TRUE); - if (complete) { + if (complete && + !spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT)) { + vdev_dtl_reassess(spa->spa_root_vdev, tx->tx_txg, + scn->scn_phys.scn_max_txg, B_TRUE); + spa_event_notify(spa, NULL, NULL, scn->scn_phys.scn_min_txg ? 
ESC_ZFS_RESILVER_FINISH : ESC_ZFS_SCRUB_FINISH); + } else { + vdev_dtl_reassess(spa->spa_root_vdev, tx->tx_txg, + 0, B_TRUE); } spa_errlog_rotate(spa); @@ -583,7 +594,7 @@ dsl_scan_zil_block(zilog_t *zilog, blkptr_t *bp, void * (on-disk) even if it hasn't been claimed (even though for * scrub there's nothing to do to it). */ - if (claim_txg == 0 && bp->blk_birth >= spa_first_txg(dp->dp_spa)) + if (claim_txg == 0 && bp->blk_birth >= spa_min_claim_txg(dp->dp_spa)) return (0); SET_BOOKMARK(&zb, zh->zh_log.blk_cksum.zc_word[ZIL_ZC_OBJSET], @@ -634,11 +645,13 @@ dsl_scan_zil(dsl_pool_t *dp, zil_header_t *zh) zil_scan_arg_t zsa = { dp, zh }; zilog_t *zilog; + ASSERT(spa_writeable(dp->dp_spa)); + /* - * We only want to visit blocks that have been claimed but not yet - * replayed (or, in read-only mode, blocks that *would* be claimed). + * We only want to visit blocks that have been claimed + * but not yet replayed. */ - if (claim_txg == 0 && spa_writeable(dp->dp_spa)) + if (claim_txg == 0) return; zilog = zil_alloc(dp->dp_meta_objset, zh); @@ -1562,61 +1575,16 @@ dsl_scan_active(dsl_scan_t *scn) return (used != 0); } -/* Called whenever a txg syncs. */ -void -dsl_scan_sync(dsl_pool_t *dp, dmu_tx_t *tx) +static int +dsl_process_async_destroys(dsl_pool_t *dp, dmu_tx_t *tx) { dsl_scan_t *scn = dp->dp_scan; spa_t *spa = dp->dp_spa; int err = 0; - /* - * Check for scn_restart_txg before checking spa_load_state, so - * that we can restart an old-style scan while the pool is being - * imported (see dsl_scan_init). - */ - if (dsl_scan_restarting(scn, tx)) { - pool_scan_func_t func = POOL_SCAN_SCRUB; - dsl_scan_done(scn, B_FALSE, tx); - if (vdev_resilver_needed(spa->spa_root_vdev, NULL, NULL)) - func = POOL_SCAN_RESILVER; - zfs_dbgmsg("restarting scan func=%u txg=%llu", - func, tx->tx_txg); - dsl_scan_setup_sync(&func, tx); - } + if (spa_suspend_async_destroy(spa)) + return (0); - /* - * Only process scans in sync pass 1. 
- */ - if (spa_sync_pass(dp->dp_spa) > 1) - return; - - /* - * If the spa is shutting down, then stop scanning. This will - * ensure that the scan does not dirty any new data during the - * shutdown phase. - */ - if (spa_shutting_down(spa)) - return; - - /* - * If the scan is inactive due to a stalled async destroy, try again. - */ - if (!scn->scn_async_stalled && !dsl_scan_active(scn)) - return; - - scn->scn_visited_this_txg = 0; - scn->scn_suspending = B_FALSE; - scn->scn_sync_start_time = gethrtime(); - spa->spa_scrub_active = B_TRUE; - - /* - * First process the async destroys. If we suspend, don't do - * any scrubbing or resilvering. This ensures that there are no - * async destroys while we are scanning, so the scan code doesn't - * have to worry about traversing it. It is also faster to free the - * blocks than to scrub them. - */ if (zfs_free_bpobj_enabled && spa_version(dp->dp_spa) >= SPA_VERSION_DEADLISTS) { scn->scn_is_bptree = B_FALSE; @@ -1690,7 +1658,7 @@ dsl_scan_sync(dsl_pool_t *dp, dmu_tx_t *tx) ddt_sync(spa, tx->tx_txg); } if (err != 0) - return; + return (err); if (dp->dp_free_dir != NULL && !scn->scn_async_destroying && zfs_free_leak_on_eio && (dsl_dir_phys(dp->dp_free_dir)->dd_used_bytes != 0 || @@ -1744,6 +1712,67 @@ dsl_scan_sync(dsl_pool_t *dp, dmu_tx_t *tx) dsl_pool_destroy_obsolete_bpobj(dp, tx); } + return (0); +} + +void +dsl_scan_sync(dsl_pool_t *dp, dmu_tx_t *tx) +{ + dsl_scan_t *scn = dp->dp_scan; + spa_t *spa = dp->dp_spa; + int err = 0; + + /* + * Check for scn_restart_txg before checking spa_load_state, so + * that we can restart an old-style scan while the pool is being + * imported (see dsl_scan_init). 
+ */ + if (dsl_scan_restarting(scn, tx)) { + pool_scan_func_t func = POOL_SCAN_SCRUB; + dsl_scan_done(scn, B_FALSE, tx); + if (vdev_resilver_needed(spa->spa_root_vdev, NULL, NULL)) + func = POOL_SCAN_RESILVER; + zfs_dbgmsg("restarting scan func=%u txg=%llu", + func, tx->tx_txg); + dsl_scan_setup_sync(&func, tx); + } + + /* + * Only process scans in sync pass 1. + */ + if (spa_sync_pass(dp->dp_spa) > 1) + return; + + /* + * If the spa is shutting down, then stop scanning. This will + * ensure that the scan does not dirty any new data during the + * shutdown phase. + */ + if (spa_shutting_down(spa)) + return; + + /* + * If the scan is inactive due to a stalled async destroy, try again. + */ + if (!scn->scn_async_stalled && !dsl_scan_active(scn)) + return; + + scn->scn_visited_this_txg = 0; + scn->scn_suspending = B_FALSE; + scn->scn_sync_start_time = gethrtime(); + spa->spa_scrub_active = B_TRUE; + + /* + * First process the async destroys. If we pause, don't do + * any scrubbing or resilvering. This ensures that there are no + * async destroys while we are scanning, so the scan code doesn't + * have to worry about traversing it. It is also faster to free the + * blocks than to scrub them. 
+ */ + err = dsl_process_async_destroys(dp, tx); + if (err != 0) + return; + if (scn->scn_phys.scn_state != DSS_SCANNING) return; @@ -2038,7 +2067,7 @@ dsl_scan(dsl_pool_t *dp, pool_scan_func_t func) } return (dsl_sync_task(spa_name(spa), dsl_scan_setup_check, - dsl_scan_setup_sync, &func, 0, ZFS_SPACE_CHECK_NONE)); + dsl_scan_setup_sync, &func, 0, ZFS_SPACE_CHECK_EXTRA_RESERVED)); } static boolean_t Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_synctask.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_synctask.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_synctask.c Wed Mar 28 18:12:06 2018 (r331695) @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2012, 2014 by Delphix. All rights reserved. + * Copyright (c) 2012, 2017 by Delphix. All rights reserved. */ #include @@ -39,33 +39,10 @@ dsl_null_checkfunc(void *arg, dmu_tx_t *tx) return (0); } -/* - * Called from open context to perform a callback in syncing context. Waits - * for the operation to complete. - * - * The checkfunc will be called from open context as a preliminary check - * which can quickly fail. If it succeeds, it will be called again from - * syncing context. The checkfunc should generally be designed to work - * properly in either context, but if necessary it can check - * dmu_tx_is_syncing(tx). - * - * The synctask infrastructure enforces proper locking strategy with respect - * to the dp_config_rwlock -- the lock will always be held when the callbacks - * are called. It will be held for read during the open-context (preliminary) - * call to the checkfunc, and then held for write from syncing context during - * the calls to the check and sync funcs. - * - * A dataset or pool name can be passed as the first argument. 
Typically, - * the check func will hold, check the return value of the hold, and then - * release the dataset. The sync func will VERIFYO(hold()) the dataset. - * This is safe because no changes can be made between the check and sync funcs, - * and the sync func will only be called if the check func successfully opened - * the dataset. - */ -int -dsl_sync_task(const char *pool, dsl_checkfunc_t *checkfunc, +static int +dsl_sync_task_common(const char *pool, dsl_checkfunc_t *checkfunc, dsl_syncfunc_t *syncfunc, void *arg, - int blocks_modified, zfs_space_check_t space_check) + int blocks_modified, zfs_space_check_t space_check, boolean_t early) { spa_t *spa; dmu_tx_t *tx; @@ -102,7 +79,9 @@ top: return (err); } - VERIFY(txg_list_add_tail(&dp->dp_sync_tasks, &dst, dst.dst_txg)); + txg_list_t *task_list = (early) ? + &dp->dp_early_sync_tasks : &dp->dp_sync_tasks; + VERIFY(txg_list_add_tail(task_list, &dst, dst.dst_txg)); dmu_tx_commit(tx); @@ -117,10 +96,65 @@ top: return (dst.dst_error); } -void -dsl_sync_task_nowait(dsl_pool_t *dp, dsl_syncfunc_t *syncfunc, void *arg, - int blocks_modified, zfs_space_check_t space_check, dmu_tx_t *tx) +/* + * Called from open context to perform a callback in syncing context. Waits + * for the operation to complete. + * + * The checkfunc will be called from open context as a preliminary check + * which can quickly fail. If it succeeds, it will be called again from + * syncing context. The checkfunc should generally be designed to work + * properly in either context, but if necessary it can check + * dmu_tx_is_syncing(tx). + * + * The synctask infrastructure enforces proper locking strategy with respect + * to the dp_config_rwlock -- the lock will always be held when the callbacks + * are called. It will be held for read during the open-context (preliminary) + * call to the checkfunc, and then held for write from syncing context during + * the calls to the check and sync funcs. 
+ * + * A dataset or pool name can be passed as the first argument. Typically, + * the check func will hold, check the return value of the hold, and then + * release the dataset. The sync func will VERIFYO(hold()) the dataset. + * This is safe because no changes can be made between the check and sync funcs, + * and the sync func will only be called if the check func successfully opened + * the dataset. + */ +int +dsl_sync_task(const char *pool, dsl_checkfunc_t *checkfunc, + dsl_syncfunc_t *syncfunc, void *arg, + int blocks_modified, zfs_space_check_t space_check) { + return (dsl_sync_task_common(pool, checkfunc, syncfunc, arg, + blocks_modified, space_check, B_FALSE)); +} + +/* + * An early synctask works exactly as a standard synctask with one important + * difference on the way it is handled during syncing context. Standard + * synctasks run after we've written out all the dirty blocks of dirty + * datasets. Early synctasks are executed before writing out any dirty data, + * and thus before standard synctasks. + * + * For that reason, early synctasks can affect the process of writing dirty + * changes to disk for the txg that they run and should be used with caution. + * In addition, early synctasks should not dirty any metaslabs as this would + * invalidate the precodition/invariant for subsequent early synctasks. 
+ * [see dsl_pool_sync() and dsl_early_sync_task_verify()] + */ +int +dsl_early_sync_task(const char *pool, dsl_checkfunc_t *checkfunc, + dsl_syncfunc_t *syncfunc, void *arg, + int blocks_modified, zfs_space_check_t space_check) +{ + return (dsl_sync_task_common(pool, checkfunc, syncfunc, arg, + blocks_modified, space_check, B_TRUE)); +} + +static void +dsl_sync_task_nowait_common(dsl_pool_t *dp, dsl_syncfunc_t *syncfunc, void *arg, + int blocks_modified, zfs_space_check_t space_check, dmu_tx_t *tx, + boolean_t early) +{ dsl_sync_task_t *dst = kmem_zalloc(sizeof (*dst), KM_SLEEP); dst->dst_pool = dp; @@ -133,9 +167,27 @@ dsl_sync_task_nowait(dsl_pool_t *dp, dsl_syncfunc_t *s dst->dst_error = 0; dst->dst_nowaiter = B_TRUE; - VERIFY(txg_list_add_tail(&dp->dp_sync_tasks, dst, dst->dst_txg)); + txg_list_t *task_list = (early) ? + &dp->dp_early_sync_tasks : &dp->dp_sync_tasks; + VERIFY(txg_list_add_tail(task_list, dst, dst->dst_txg)); } +void +dsl_sync_task_nowait(dsl_pool_t *dp, dsl_syncfunc_t *syncfunc, void *arg, + int blocks_modified, zfs_space_check_t space_check, dmu_tx_t *tx) +{ + dsl_sync_task_nowait_common(dp, syncfunc, arg, + blocks_modified, space_check, tx, B_FALSE); +} + +void +dsl_early_sync_task_nowait(dsl_pool_t *dp, dsl_syncfunc_t *syncfunc, void *arg, + int blocks_modified, zfs_space_check_t space_check, dmu_tx_t *tx) +{ + dsl_sync_task_nowait_common(dp, syncfunc, arg, + blocks_modified, space_check, tx, B_TRUE); +} + /* * Called in syncing context to execute the synctask. */ @@ -160,12 +212,12 @@ dsl_sync_task_sync(dsl_sync_task_t *dst, dmu_tx_t *tx) * (arc_tempreserve, dsl_pool_tempreserve). 
*/ if (dst->dst_space_check != ZFS_SPACE_CHECK_NONE) { - uint64_t quota = dsl_pool_adjustedsize(dp, - dst->dst_space_check == ZFS_SPACE_CHECK_RESERVED) - - metaslab_class_get_deferred(spa_normal_class(dp->dp_spa)); + uint64_t quota = dsl_pool_unreserved_space(dp, + dst->dst_space_check); uint64_t used = dsl_dir_phys(dp->dp_root_dir)->dd_used_bytes; + /* MOS space is triple-dittoed, so we multiply by 3. */ - if (dst->dst_space > 0 && used + dst->dst_space * 3 > quota) { + if (used + dst->dst_space * 3 > quota) { dst->dst_error = SET_ERROR(ENOSPC); if (dst->dst_nowaiter) kmem_free(dst, sizeof (*dst)); Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_userhold.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_userhold.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_userhold.c Wed Mar 28 18:12:06 2018 (r331695) @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2012, 2015 by Delphix. All rights reserved. + * Copyright (c) 2012, 2017 by Delphix. All rights reserved. * Copyright (c) 2013 Steven Hartland. All rights reserved. 
*/ @@ -602,7 +602,8 @@ dsl_dataset_user_release_impl(nvlist_t *holds, nvlist_ ddura.ddura_chkholds = fnvlist_alloc(); error = dsl_sync_task(pool, dsl_dataset_user_release_check, - dsl_dataset_user_release_sync, &ddura, 0, ZFS_SPACE_CHECK_NONE); + dsl_dataset_user_release_sync, &ddura, 0, + ZFS_SPACE_CHECK_EXTRA_RESERVED); fnvlist_free(ddura.ddura_todelete); fnvlist_free(ddura.ddura_chkholds); Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c Wed Mar 28 18:12:06 2018 (r331695) @@ -35,6 +35,7 @@ #include #include #include +#include #define GANG_ALLOCATION(flags) \ ((flags) & (METASLAB_GANG_CHILD | METASLAB_GANG_HEADER)) @@ -43,6 +44,14 @@ uint64_t metaslab_aliquot = 512ULL << 10; uint64_t metaslab_gang_bang = SPA_MAXBLOCKSIZE + 1; /* force gang blocks */ /* + * Since we can touch multiple metaslabs (and their respective space maps) + * with each transaction group, we benefit from having a smaller space map + * block size since it allows us to issue more I/O operations scattered + * around the disk. + */ +int zfs_metaslab_sm_blksz = (1 << 12); + +/* * The in-core space map representation is more compact than its on-disk form. * The zfs_condense_pct determines how much more compact the in-core * space map representation must be before we compact it on-disk. 
@@ -201,7 +210,7 @@ uint64_t metaslab_trace_max_entries = 5000; static uint64_t metaslab_weight(metaslab_t *); static void metaslab_set_fragmentation(metaslab_t *); -static void metaslab_free_impl(vdev_t *, uint64_t, uint64_t, uint64_t); +static void metaslab_free_impl(vdev_t *, uint64_t, uint64_t, boolean_t); static void metaslab_check_free_impl(vdev_t *, uint64_t, uint64_t); kmem_cache_t *metaslab_alloc_trace_cache; @@ -486,11 +495,11 @@ metaslab_verify_space(metaslab_t *msp, uint64_t txg) */ for (int t = 0; t < TXG_CONCURRENT_STATES; t++) { allocated += - range_tree_space(msp->ms_alloctree[(txg + t) & TXG_MASK]); + range_tree_space(msp->ms_allocating[(txg + t) & TXG_MASK]); } - msp_free_space = range_tree_space(msp->ms_tree) + allocated + - msp->ms_deferspace + range_tree_space(msp->ms_freedtree); + msp_free_space = range_tree_space(msp->ms_allocatable) + allocated + + msp->ms_deferspace + range_tree_space(msp->ms_freed); VERIFY3U(sm_free_space, ==, msp_free_space); } @@ -1028,9 +1037,9 @@ metaslab_rt_create(range_tree_t *rt, void *arg) metaslab_t *msp = arg; ASSERT3P(rt->rt_arg, ==, msp); - ASSERT(msp->ms_tree == NULL); + ASSERT(msp->ms_allocatable == NULL); - avl_create(&msp->ms_size_tree, metaslab_rangesize_compare, + avl_create(&msp->ms_allocatable_by_size, metaslab_rangesize_compare, sizeof (range_seg_t), offsetof(range_seg_t, rs_pp_node)); } @@ -1043,10 +1052,10 @@ metaslab_rt_destroy(range_tree_t *rt, void *arg) metaslab_t *msp = arg; ASSERT3P(rt->rt_arg, ==, msp); - ASSERT3P(msp->ms_tree, ==, rt); - ASSERT0(avl_numnodes(&msp->ms_size_tree)); + ASSERT3P(msp->ms_allocatable, ==, rt); + ASSERT0(avl_numnodes(&msp->ms_allocatable_by_size)); - avl_destroy(&msp->ms_size_tree); + avl_destroy(&msp->ms_allocatable_by_size); } static void @@ -1055,9 +1064,9 @@ metaslab_rt_add(range_tree_t *rt, range_seg_t *rs, voi metaslab_t *msp = arg; ASSERT3P(rt->rt_arg, ==, msp); - ASSERT3P(msp->ms_tree, ==, rt); + ASSERT3P(msp->ms_allocatable, ==, rt); 
VERIFY(!msp->ms_condensing); - avl_add(&msp->ms_size_tree, rs); + avl_add(&msp->ms_allocatable_by_size, rs); } static void @@ -1066,9 +1075,9 @@ metaslab_rt_remove(range_tree_t *rt, range_seg_t *rs, metaslab_t *msp = arg; ASSERT3P(rt->rt_arg, ==, msp); - ASSERT3P(msp->ms_tree, ==, rt); + ASSERT3P(msp->ms_allocatable, ==, rt); VERIFY(!msp->ms_condensing); - avl_remove(&msp->ms_size_tree, rs); + avl_remove(&msp->ms_allocatable_by_size, rs); } static void @@ -1077,7 +1086,7 @@ metaslab_rt_vacate(range_tree_t *rt, void *arg) metaslab_t *msp = arg; ASSERT3P(rt->rt_arg, ==, msp); - ASSERT3P(msp->ms_tree, ==, rt); + ASSERT3P(msp->ms_allocatable, ==, rt); /* * Normally one would walk the tree freeing nodes along the way. @@ -1085,7 +1094,7 @@ metaslab_rt_vacate(range_tree_t *rt, void *arg) * walking all nodes and just reinitialize the avl tree. The nodes * will be freed by the range tree, so we don't want to free them here. */ - avl_create(&msp->ms_size_tree, metaslab_rangesize_compare, + avl_create(&msp->ms_allocatable_by_size, metaslab_rangesize_compare, *** DIFF OUTPUT TRUNCATED AT 1000 LINES *** From owner-svn-src-vendor@freebsd.org Wed Mar 28 18:12:08 2018 Return-Path: Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 71ABAF6E880; Wed, 28 Mar 2018 18:12:08 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 227D47B072; Wed, 28 Mar 2018 18:12:08 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not 
present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 1D0226875; Wed, 28 Mar 2018 18:12:08 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2SIC7oq011306; Wed, 28 Mar 2018 18:12:07 GMT (envelope-from mav@FreeBSD.org) Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SIC7Ti011304; Wed, 28 Mar 2018 18:12:07 GMT (envelope-from mav@FreeBSD.org) Message-Id: <201803281812.w2SIC7Ti011304@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f From: Alexander Motin Date: Wed, 28 Mar 2018 18:12:07 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org Subject: svn commit: r331695 - vendor-sys/illumos/dist/common/zfs vendor-sys/illumos/dist/uts/common vendor-sys/illumos/dist/uts/common/fs/zfs vendor-sys/illumos/dist/uts/common/fs/zfs/sys vendor-sys/illumo... X-SVN-Group: vendor X-SVN-Commit-Author: mav X-SVN-Commit-Paths: vendor-sys/illumos/dist/common/zfs vendor-sys/illumos/dist/uts/common vendor-sys/illumos/dist/uts/common/fs/zfs vendor-sys/illumos/dist/uts/common/fs/zfs/sys vendor-sys/illumos/dist/uts/common/sys/fs ... X-SVN-Commit-Revision: 331695 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-vendor@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the vendor work area tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Mar 2018 18:12:08 -0000 Author: mav Date: Wed Mar 28 18:12:06 2018 New Revision: 331695 URL: https://svnweb.freebsd.org/changeset/base/331695 Log: 9166 zfs storage pool checkpoint illumos/illumos-gate@8671400134a11c848244896ca51a7db4d0f69da4 The idea of Storage Pool Checkpoint (aka zpool checkpoint) deals with exactly that. 
It can be thought of as a “pool-wide snapshot” (or a variation of extreme rewind that doesn’t corrupt your data). It remembers the entire state of the pool at the point that it was taken and the user can revert back to it later or discard it. Its generic use case is an administrator that is about to perform a set of destructive actions to ZFS as part of a critical procedure. She takes a checkpoint of the pool before performing the actions, then rewinds back to it if one of them fails or puts the pool into an unexpected state. Otherwise, she discards it. With the assumption that no one else is making modifications to ZFS, she basically wraps all these actions into a “high-level transaction”. Reviewed by: Matthew Ahrens Reviewed by: John Kennedy Reviewed by: Dan Kimmel Approved by: Richard Lowe Author: Serapheim Dimitropoulos Modified: vendor/illumos/dist/cmd/zdb/zdb.c vendor/illumos/dist/cmd/zdb/zdb_il.c vendor/illumos/dist/cmd/zpool/zpool_main.c vendor/illumos/dist/cmd/ztest/ztest.c vendor/illumos/dist/lib/libzfs/common/libzfs.h vendor/illumos/dist/lib/libzfs/common/libzfs_pool.c vendor/illumos/dist/lib/libzfs/common/libzfs_util.c vendor/illumos/dist/lib/libzfs_core/common/libzfs_core.c vendor/illumos/dist/lib/libzfs_core/common/libzfs_core.h vendor/illumos/dist/man/man1m/zdb.1m vendor/illumos/dist/man/man1m/zpool.1m vendor/illumos/dist/man/man5/zpool-features.5 Changes in other areas also in this revision: Modified: vendor-sys/illumos/dist/common/zfs/zfeature_common.c vendor-sys/illumos/dist/common/zfs/zfeature_common.h vendor-sys/illumos/dist/common/zfs/zpool_prop.c vendor-sys/illumos/dist/uts/common/Makefile.files vendor-sys/illumos/dist/uts/common/fs/zfs/dmu_traverse.c vendor-sys/illumos/dist/uts/common/fs/zfs/dnode.c vendor-sys/illumos/dist/uts/common/fs/zfs/dnode_sync.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dataset.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_destroy.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_dir.c 
vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_pool.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_scan.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_synctask.c vendor-sys/illumos/dist/uts/common/fs/zfs/dsl_userhold.c vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c vendor-sys/illumos/dist/uts/common/fs/zfs/range_tree.c vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c vendor-sys/illumos/dist/uts/common/fs/zfs/spa_misc.c vendor-sys/illumos/dist/uts/common/fs/zfs/space_map.c vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dmu.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dsl_dir.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dsl_pool.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/dsl_synctask.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/metaslab.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/metaslab_impl.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/range_tree.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa_impl.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/space_map.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/uberblock_impl.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/vdev.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/vdev_impl.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/vdev_removal.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zio.h vendor-sys/illumos/dist/uts/common/fs/zfs/sys/zthr.h vendor-sys/illumos/dist/uts/common/fs/zfs/uberblock.c vendor-sys/illumos/dist/uts/common/fs/zfs/vdev.c vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_indirect.c vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_label.c vendor-sys/illumos/dist/uts/common/fs/zfs/vdev_removal.c vendor-sys/illumos/dist/uts/common/fs/zfs/zcp.c vendor-sys/illumos/dist/uts/common/fs/zfs/zcp_synctask.c vendor-sys/illumos/dist/uts/common/fs/zfs/zfs_ioctl.c vendor-sys/illumos/dist/uts/common/fs/zfs/zil.c vendor-sys/illumos/dist/uts/common/fs/zfs/zio.c vendor-sys/illumos/dist/uts/common/fs/zfs/zthr.c 
vendor-sys/illumos/dist/uts/common/sys/fs/zfs.h Modified: vendor/illumos/dist/cmd/zdb/zdb.c ============================================================================== --- vendor/illumos/dist/cmd/zdb/zdb.c Wed Mar 28 17:54:34 2018 (r331694) +++ vendor/illumos/dist/cmd/zdb/zdb.c Wed Mar 28 18:12:06 2018 (r331695) @@ -131,7 +131,7 @@ static void usage(void) { (void) fprintf(stderr, - "Usage:\t%s [-AbcdDFGhiLMPsvX] [-e [-V] [-p ...]] " + "Usage:\t%s [-AbcdDFGhikLMPsvX] [-e [-V] [-p ...]] " "[-I ]\n" "\t\t[-o =]... [-t ] [-U ] [-x ]\n" "\t\t[ [ ...]]\n" @@ -168,6 +168,8 @@ usage(void) (void) fprintf(stderr, " -h pool history\n"); (void) fprintf(stderr, " -i intent logs\n"); (void) fprintf(stderr, " -l read label contents\n"); + (void) fprintf(stderr, " -k examine the checkpointed state " + "of the pool\n"); (void) fprintf(stderr, " -L disable leak tracking (do not " "load spacemaps)\n"); (void) fprintf(stderr, " -m metaslabs\n"); @@ -729,6 +731,22 @@ get_prev_obsolete_spacemap_refcount(spa_t *spa) } static int +get_checkpoint_refcount(vdev_t *vd) +{ + int refcount = 0; + + if (vd->vdev_top == vd && vd->vdev_top_zap != 0 && + zap_contains(spa_meta_objset(vd->vdev_spa), + vd->vdev_top_zap, VDEV_TOP_ZAP_POOL_CHECKPOINT_SM) == 0) + refcount++; + + for (uint64_t c = 0; c < vd->vdev_children; c++) + refcount += get_checkpoint_refcount(vd->vdev_child[c]); + + return (refcount); +} + +static int verify_spacemap_refcounts(spa_t *spa) { uint64_t expected_refcount = 0; @@ -741,6 +759,7 @@ verify_spacemap_refcounts(spa_t *spa) actual_refcount += get_metaslab_refcount(spa->spa_root_vdev); actual_refcount += get_obsolete_refcount(spa->spa_root_vdev); actual_refcount += get_prev_obsolete_spacemap_refcount(spa); + actual_refcount += get_checkpoint_refcount(spa->spa_root_vdev); if (expected_refcount != actual_refcount) { (void) printf("space map refcount mismatch: expected %lld != " @@ -814,8 +833,8 @@ static void dump_metaslab_stats(metaslab_t *msp) { char maxbuf[32]; - 
range_tree_t *rt = msp->ms_tree; - avl_tree_t *t = &msp->ms_size_tree; + range_tree_t *rt = msp->ms_allocatable; + avl_tree_t *t = &msp->ms_allocatable_by_size; int free_pct = range_tree_space(rt) * 100 / msp->ms_size; /* max sure nicenum has enough space */ @@ -851,7 +870,7 @@ dump_metaslab(metaslab_t *msp) metaslab_load_wait(msp); if (!msp->ms_loaded) { VERIFY0(metaslab_load(msp)); - range_tree_stat_verify(msp->ms_tree); + range_tree_stat_verify(msp->ms_allocatable); } dump_metaslab_stats(msp); metaslab_unload(msp); @@ -2264,6 +2283,8 @@ dump_uberblock(uberblock_t *ub, const char *header, co snprintf_blkptr(blkbuf, sizeof (blkbuf), &ub->ub_rootbp); (void) printf("\trootbp = %s\n", blkbuf); } + (void) printf("\tcheckpoint_txg = %llu\n", + (u_longlong_t)ub->ub_checkpoint_txg); (void) printf("%s", footer ? footer : ""); } @@ -2624,6 +2645,7 @@ static const char *zdb_ot_extname[] = { typedef struct zdb_cb { zdb_blkstats_t zcb_type[ZB_TOTAL + 1][ZDB_OT_TOTAL + 1]; uint64_t zcb_removing_size; + uint64_t zcb_checkpoint_size; uint64_t zcb_dedup_asize; uint64_t zcb_dedup_blocks; uint64_t zcb_embedded_blocks[NUM_BP_EMBEDDED_TYPES]; @@ -2723,7 +2745,7 @@ zdb_count_block(zdb_cb_t *zcb, zilog_t *zilog, const b } VERIFY3U(zio_wait(zio_claim(NULL, zcb->zcb_spa, - refcnt ? 0 : spa_first_txg(zcb->zcb_spa), + refcnt ? 0 : spa_min_claim_txg(zcb->zcb_spa), bp, NULL, NULL, ZIO_FLAG_CANFAIL)), ==, 0); } @@ -2923,7 +2945,7 @@ claim_segment_impl_cb(uint64_t inner_offset, vdev_t *v ASSERT(vdev_is_concrete(vd)); VERIFY0(metaslab_claim_impl(vd, offset, size, - spa_first_txg(vd->vdev_spa))); + spa_min_claim_txg(vd->vdev_spa))); } static void @@ -2984,70 +3006,6 @@ zdb_claim_removing(spa_t *spa, zdb_cb_t *zcb) spa_config_exit(spa, SCL_CONFIG, FTAG); } -/* - * vm_idxp is an in-out parameter which (for indirect vdevs) is the - * index in vim_entries that has the first entry in this metaslab. On - * return, it will be set to the first entry after this metaslab. 
- */ -static void -zdb_leak_init_ms(metaslab_t *msp, uint64_t *vim_idxp) -{ - metaslab_group_t *mg = msp->ms_group; - vdev_t *vd = mg->mg_vd; - vdev_t *rvd = vd->vdev_spa->spa_root_vdev; - - mutex_enter(&msp->ms_lock); - metaslab_unload(msp); - - /* - * We don't want to spend the CPU manipulating the size-ordered - * tree, so clear the range_tree ops. - */ - msp->ms_tree->rt_ops = NULL; - - (void) fprintf(stderr, - "\rloading vdev %llu of %llu, metaslab %llu of %llu ...", - (longlong_t)vd->vdev_id, - (longlong_t)rvd->vdev_children, - (longlong_t)msp->ms_id, - (longlong_t)vd->vdev_ms_count); - - /* - * For leak detection, we overload the metaslab ms_tree to - * contain allocated segments instead of free segments. As a - * result, we can't use the normal metaslab_load/unload - * interfaces. - */ - if (vd->vdev_ops == &vdev_indirect_ops) { - vdev_indirect_mapping_t *vim = vd->vdev_indirect_mapping; - for (; *vim_idxp < vdev_indirect_mapping_num_entries(vim); - (*vim_idxp)++) { - vdev_indirect_mapping_entry_phys_t *vimep = - &vim->vim_entries[*vim_idxp]; - uint64_t ent_offset = DVA_MAPPING_GET_SRC_OFFSET(vimep); - uint64_t ent_len = DVA_GET_ASIZE(&vimep->vimep_dst); - ASSERT3U(ent_offset, >=, msp->ms_start); - if (ent_offset >= msp->ms_start + msp->ms_size) - break; - - /* - * Mappings do not cross metaslab boundaries, - * because we create them by walking the metaslabs. 
- */ - ASSERT3U(ent_offset + ent_len, <=, - msp->ms_start + msp->ms_size); - range_tree_add(msp->ms_tree, ent_offset, ent_len); - } - } else if (msp->ms_sm != NULL) { - VERIFY0(space_map_load(msp->ms_sm, msp->ms_tree, SM_ALLOC)); - } - - if (!msp->ms_loaded) { - msp->ms_loaded = B_TRUE; - } - mutex_exit(&msp->ms_lock); -} - /* ARGSUSED */ static int increment_indirect_mapping_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx) @@ -3104,7 +3062,243 @@ zdb_load_obsolete_counts(vdev_t *vd) return (counts); } +typedef struct checkpoint_sm_exclude_entry_arg { + vdev_t *cseea_vd; + uint64_t cseea_checkpoint_size; +} checkpoint_sm_exclude_entry_arg_t; + +static int +checkpoint_sm_exclude_entry_cb(maptype_t type, uint64_t offset, uint64_t size, + void *arg) +{ + checkpoint_sm_exclude_entry_arg_t *cseea = arg; + vdev_t *vd = cseea->cseea_vd; + metaslab_t *ms = vd->vdev_ms[offset >> vd->vdev_ms_shift]; + uint64_t end = offset + size; + + ASSERT(type == SM_FREE); + + /* + * Since the vdev_checkpoint_sm exists in the vdev level + * and the ms_sm space maps exist in the metaslab level, + * an entry in the checkpoint space map could theoretically + * cross the boundaries of the metaslab that it belongs. + * + * In reality, because of the way that we populate and + * manipulate the checkpoint's space maps currently, + * there shouldn't be any entries that cross metaslabs. + * Hence the assertion below. + * + * That said, there is no fundamental requirement that + * the checkpoint's space map entries should not cross + * metaslab boundaries. So if needed we could add code + * that handles metaslab-crossing segments in the future. + */ + VERIFY3U(offset, >=, ms->ms_start); + VERIFY3U(end, <=, ms->ms_start + ms->ms_size); + + /* + * By removing the entry from the allocated segments we + * also verify that the entry is there to begin with. 
+ */ + mutex_enter(&ms->ms_lock); + range_tree_remove(ms->ms_allocatable, offset, size); + mutex_exit(&ms->ms_lock); + + cseea->cseea_checkpoint_size += size; + return (0); +} + static void +zdb_leak_init_vdev_exclude_checkpoint(vdev_t *vd, zdb_cb_t *zcb) +{ + spa_t *spa = vd->vdev_spa; + space_map_t *checkpoint_sm = NULL; + uint64_t checkpoint_sm_obj; + + /* + * If there is no vdev_top_zap, we are in a pool whose + * version predates the pool checkpoint feature. + */ + if (vd->vdev_top_zap == 0) + return; + + /* + * If there is no reference of the vdev_checkpoint_sm in + * the vdev_top_zap, then one of the following scenarios + * is true: + * + * 1] There is no checkpoint + * 2] There is a checkpoint, but no checkpointed blocks + * have been freed yet + * 3] The current vdev is indirect + * + * In these cases we return immediately. + */ + if (zap_contains(spa_meta_objset(spa), vd->vdev_top_zap, + VDEV_TOP_ZAP_POOL_CHECKPOINT_SM) != 0) + return; + + VERIFY0(zap_lookup(spa_meta_objset(spa), vd->vdev_top_zap, + VDEV_TOP_ZAP_POOL_CHECKPOINT_SM, sizeof (uint64_t), 1, + &checkpoint_sm_obj)); + + checkpoint_sm_exclude_entry_arg_t cseea; + cseea.cseea_vd = vd; + cseea.cseea_checkpoint_size = 0; + + VERIFY0(space_map_open(&checkpoint_sm, spa_meta_objset(spa), + checkpoint_sm_obj, 0, vd->vdev_asize, vd->vdev_ashift)); + space_map_update(checkpoint_sm); + + VERIFY0(space_map_iterate(checkpoint_sm, + checkpoint_sm_exclude_entry_cb, &cseea)); + space_map_close(checkpoint_sm); + + zcb->zcb_checkpoint_size += cseea.cseea_checkpoint_size; +} + +static void +zdb_leak_init_exclude_checkpoint(spa_t *spa, zdb_cb_t *zcb) +{ + vdev_t *rvd = spa->spa_root_vdev; + for (uint64_t c = 0; c < rvd->vdev_children; c++) { + ASSERT3U(c, ==, rvd->vdev_child[c]->vdev_id); + zdb_leak_init_vdev_exclude_checkpoint(rvd->vdev_child[c], zcb); + } +} + +static void +load_concrete_ms_allocatable_trees(spa_t *spa, maptype_t maptype) +{ + vdev_t *rvd = spa->spa_root_vdev; + for (uint64_t i = 0; i < 
rvd->vdev_children; i++) { + vdev_t *vd = rvd->vdev_child[i]; + + ASSERT3U(i, ==, vd->vdev_id); + + if (vd->vdev_ops == &vdev_indirect_ops) + continue; + + for (uint64_t m = 0; m < vd->vdev_ms_count; m++) { + metaslab_t *msp = vd->vdev_ms[m]; + + (void) fprintf(stderr, + "\rloading concrete vdev %llu, " + "metaslab %llu of %llu ...", + (longlong_t)vd->vdev_id, + (longlong_t)msp->ms_id, + (longlong_t)vd->vdev_ms_count); + + mutex_enter(&msp->ms_lock); + metaslab_unload(msp); + + /* + * We don't want to spend the CPU manipulating the + * size-ordered tree, so clear the range_tree ops. + */ + msp->ms_allocatable->rt_ops = NULL; + + if (msp->ms_sm != NULL) { + VERIFY0(space_map_load(msp->ms_sm, + msp->ms_allocatable, maptype)); + } + if (!msp->ms_loaded) + msp->ms_loaded = B_TRUE; + mutex_exit(&msp->ms_lock); + } + } +} + +/* + * vm_idxp is an in-out parameter which (for indirect vdevs) is the + * index in vim_entries that has the first entry in this metaslab. + * On return, it will be set to the first entry after this metaslab. + */ +static void +load_indirect_ms_allocatable_tree(vdev_t *vd, metaslab_t *msp, + uint64_t *vim_idxp) +{ + vdev_indirect_mapping_t *vim = vd->vdev_indirect_mapping; + + mutex_enter(&msp->ms_lock); + metaslab_unload(msp); + + /* + * We don't want to spend the CPU manipulating the + * size-ordered tree, so clear the range_tree ops. + */ + msp->ms_allocatable->rt_ops = NULL; + + for (; *vim_idxp < vdev_indirect_mapping_num_entries(vim); + (*vim_idxp)++) { + vdev_indirect_mapping_entry_phys_t *vimep = + &vim->vim_entries[*vim_idxp]; + uint64_t ent_offset = DVA_MAPPING_GET_SRC_OFFSET(vimep); + uint64_t ent_len = DVA_GET_ASIZE(&vimep->vimep_dst); + ASSERT3U(ent_offset, >=, msp->ms_start); + if (ent_offset >= msp->ms_start + msp->ms_size) + break; + + /* + * Mappings do not cross metaslab boundaries, + * because we create them by walking the metaslabs. 
+ */ + ASSERT3U(ent_offset + ent_len, <=, + msp->ms_start + msp->ms_size); + range_tree_add(msp->ms_allocatable, ent_offset, ent_len); + } + + if (!msp->ms_loaded) + msp->ms_loaded = B_TRUE; + mutex_exit(&msp->ms_lock); +} + +static void +zdb_leak_init_prepare_indirect_vdevs(spa_t *spa, zdb_cb_t *zcb) +{ + vdev_t *rvd = spa->spa_root_vdev; + for (uint64_t c = 0; c < rvd->vdev_children; c++) { + vdev_t *vd = rvd->vdev_child[c]; + + ASSERT3U(c, ==, vd->vdev_id); + + if (vd->vdev_ops != &vdev_indirect_ops) + continue; + + /* + * Note: we don't check for mapping leaks on + * removing vdevs because their ms_allocatable's + * are used to look for leaks in allocated space. + */ + zcb->zcb_vd_obsolete_counts[c] = zdb_load_obsolete_counts(vd); + + /* + * Normally, indirect vdevs don't have any + * metaslabs. We want to set them up for + * zio_claim(). + */ + VERIFY0(vdev_metaslab_init(vd, 0)); + + vdev_indirect_mapping_t *vim = vd->vdev_indirect_mapping; + uint64_t vim_idx = 0; + for (uint64_t m = 0; m < vd->vdev_ms_count; m++) { + + (void) fprintf(stderr, + "\rloading indirect vdev %llu, " + "metaslab %llu of %llu ...", + (longlong_t)vd->vdev_id, + (longlong_t)vd->vdev_ms[m]->ms_id, + (longlong_t)vd->vdev_ms_count); + + load_indirect_ms_allocatable_tree(vd, vd->vdev_ms[m], + &vim_idx); + } + ASSERT3U(vim_idx, ==, vdev_indirect_mapping_num_entries(vim)); + } +} + +static void zdb_leak_init(spa_t *spa, zdb_cb_t *zcb) { zcb->zcb_spa = spa; @@ -3115,7 +3309,7 @@ zdb_leak_init(spa_t *spa, zdb_cb_t *zcb) /* * We are going to be changing the meaning of the metaslab's - * ms_tree. Ensure that the allocator doesn't try to + * ms_allocatable. Ensure that the allocator doesn't try to * use the tree. 
*/ spa->spa_normal_class->mc_ops = &zdb_metaslab_ops; @@ -3125,39 +3319,37 @@ zdb_leak_init(spa_t *spa, zdb_cb_t *zcb) umem_zalloc(rvd->vdev_children * sizeof (uint32_t *), UMEM_NOFAIL); + /* + * For leak detection, we overload the ms_allocatable trees + * to contain allocated segments instead of free segments. + * As a result, we can't use the normal metaslab_load/unload + * interfaces. + */ + zdb_leak_init_prepare_indirect_vdevs(spa, zcb); + load_concrete_ms_allocatable_trees(spa, SM_ALLOC); - for (uint64_t c = 0; c < rvd->vdev_children; c++) { - vdev_t *vd = rvd->vdev_child[c]; - uint64_t vim_idx = 0; + /* + * On load_concrete_ms_allocatable_trees() we loaded all the + * allocated entries from the ms_sm to the ms_allocatable for + * each metaslab. If the pool has a checkpoint or is in the + * middle of discarding a checkpoint, some of these blocks + * may have been freed but their ms_sm may not have been + * updated because they are referenced by the checkpoint. In + * order to avoid false-positives during leak-detection, we + * go through the vdev's checkpoint space map and exclude all + * its entries from their relevant ms_allocatable. + * + * We also aggregate the space held by the checkpoint and add + * it to zcb_checkpoint_size. + * + * Note that at this point we are also verifying that all the + * entries on the checkpoint_sm are marked as allocated in + * the ms_sm of their relevant metaslab. + * [see comment in checkpoint_sm_exclude_entry_cb()] + */ + zdb_leak_init_exclude_checkpoint(spa, zcb); - ASSERT3U(c, ==, vd->vdev_id); - - /* - * Note: we don't check for mapping leaks on - * removing vdevs because their ms_tree's are - * used to look for leaks in allocated space. - */ - if (vd->vdev_ops == &vdev_indirect_ops) { - zcb->zcb_vd_obsolete_counts[c] = - zdb_load_obsolete_counts(vd); - - /* - * Normally, indirect vdevs don't have any - * metaslabs. We want to set them up for - * zio_claim(). 
- */ - VERIFY0(vdev_metaslab_init(vd, 0)); - } - - for (uint64_t m = 0; m < vd->vdev_ms_count; m++) { - zdb_leak_init_ms(vd->vdev_ms[m], &vim_idx); - } - if (vd->vdev_ops == &vdev_indirect_ops) { - ASSERT3U(vim_idx, ==, - vdev_indirect_mapping_num_entries( - vd->vdev_indirect_mapping)); - } - } + /* for cleaner progress output */ (void) fprintf(stderr, "\n"); if (bpobj_is_open(&dp->dp_obsolete_bpobj)) { @@ -3166,12 +3358,16 @@ zdb_leak_init(spa_t *spa, zdb_cb_t *zcb) (void) bpobj_iterate_nofree(&dp->dp_obsolete_bpobj, increment_indirect_mapping_cb, zcb, NULL); } + } else { + /* + * If leak tracing is disabled, we still need to consider + * any checkpointed space in our space verification. + */ + zcb->zcb_checkpoint_size += spa_get_checkpoint_space(spa); } spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER); - zdb_ddt_leak_init(spa, zcb); - spa_config_exit(spa, SCL_CONFIG, FTAG); } @@ -3198,7 +3394,7 @@ zdb_check_for_obsolete_leaks(vdev_t *vd, zdb_cb_t *zcb for (uint64_t inner_offset = 0; inner_offset < DVA_GET_ASIZE(&vimep->vimep_dst); inner_offset += 1 << vd->vdev_ashift) { - if (range_tree_contains(msp->ms_tree, + if (range_tree_contains(msp->ms_allocatable, offset + inner_offset, 1 << vd->vdev_ashift)) { obsolete_bytes += 1 << vd->vdev_ashift; } @@ -3264,23 +3460,23 @@ zdb_leak_fini(spa_t *spa, zdb_cb_t *zcb) ASSERT3P(mg, ==, msp->ms_group); /* - * The ms_tree has been overloaded to - * contain allocated segments. Now that we - * finished traversing all blocks, any - * block that remains in the ms_tree + * ms_allocatable has been overloaded + * to contain allocated segments. Now that + * we finished traversing all blocks, any + * block that remains in the ms_allocatable * represents an allocated block that we * did not claim during the traversal. * Claimed blocks would have been removed - * from the ms_tree. For indirect vdevs, - * space remaining in the tree represents - * parts of the mapping that are not - * referenced, which is not a bug. 
+ * from the ms_allocatable. For indirect + * vdevs, space remaining in the tree + * represents parts of the mapping that are + * not referenced, which is not a bug. */ if (vd->vdev_ops == &vdev_indirect_ops) { - range_tree_vacate(msp->ms_tree, + range_tree_vacate(msp->ms_allocatable, NULL, NULL); } else { - range_tree_vacate(msp->ms_tree, + range_tree_vacate(msp->ms_allocatable, zdb_leak, vd); } @@ -3403,7 +3599,7 @@ dump_block_stats(spa_t *spa) total_alloc = norm_alloc + metaslab_class_get_alloc(spa_log_class(spa)); total_found = tzb->zb_asize - zcb.zcb_dedup_asize + - zcb.zcb_removing_size; + zcb.zcb_removing_size + zcb.zcb_checkpoint_size; if (total_found == total_alloc) { if (!dump_opt['L']) @@ -3812,7 +4008,385 @@ verify_device_removal_feature_counts(spa_t *spa) return (ret); } +#define BOGUS_SUFFIX "_CHECKPOINTED_UNIVERSE" +/* + * Import the checkpointed state of the pool specified by the target + * parameter as readonly. The function also accepts a pool config + * as an optional parameter, else it attempts to infer the config by + * the name of the target pool. + * + * Note that the checkpointed state's pool name will be the name of + * the original pool with the above suffix appened to it. In addition, + * if the target is not a pool name (e.g. a path to a dataset) then + * the new_path parameter is populated with the updated path to + * reflect the fact that we are looking into the checkpointed state. + * + * The function returns a newly-allocated copy of the name of the + * pool containing the checkpointed state. When this copy is no + * longer needed it should be freed with free(3C). Same thing + * applies to the new_path parameter if allocated. 
+ */ +static char * +import_checkpointed_state(char *target, nvlist_t *cfg, char **new_path) +{ + int error = 0; + char *poolname, *bogus_name; + + /* If the target is not a pool, the extract the pool name */ + char *path_start = strchr(target, '/'); + if (path_start != NULL) { + size_t poolname_len = path_start - target; + poolname = strndup(target, poolname_len); + } else { + poolname = target; + } + + if (cfg == NULL) { + error = spa_get_stats(poolname, &cfg, NULL, 0); + if (error != 0) { + fatal("Tried to read config of pool \"%s\" but " + "spa_get_stats() failed with error %d\n", + poolname, error); + } + } + + (void) asprintf(&bogus_name, "%s%s", poolname, BOGUS_SUFFIX); + fnvlist_add_string(cfg, ZPOOL_CONFIG_POOL_NAME, bogus_name); + + error = spa_import(bogus_name, cfg, NULL, + ZFS_IMPORT_MISSING_LOG | ZFS_IMPORT_CHECKPOINT); + if (error != 0) { + fatal("Tried to import pool \"%s\" but spa_import() failed " + "with error %d\n", bogus_name, error); + } + + if (new_path != NULL && path_start != NULL) + (void) asprintf(new_path, "%s%s", bogus_name, path_start); + + if (target != poolname) + free(poolname); + + return (bogus_name); +} + +typedef struct verify_checkpoint_sm_entry_cb_arg { + vdev_t *vcsec_vd; + + /* the following fields are only used for printing progress */ + uint64_t vcsec_entryid; + uint64_t vcsec_num_entries; +} verify_checkpoint_sm_entry_cb_arg_t; + +#define ENTRIES_PER_PROGRESS_UPDATE 10000 + +static int +verify_checkpoint_sm_entry_cb(maptype_t type, uint64_t offset, uint64_t size, + void *arg) +{ + verify_checkpoint_sm_entry_cb_arg_t *vcsec = arg; + vdev_t *vd = vcsec->vcsec_vd; + metaslab_t *ms = vd->vdev_ms[offset >> vd->vdev_ms_shift]; + uint64_t end = offset + size; + + ASSERT(type == SM_FREE); + + if ((vcsec->vcsec_entryid % ENTRIES_PER_PROGRESS_UPDATE) == 0) { + (void) fprintf(stderr, + "\rverifying vdev %llu, space map entry %llu of %llu ...", + (longlong_t)vd->vdev_id, + (longlong_t)vcsec->vcsec_entryid, + 
(longlong_t)vcsec->vcsec_num_entries); + } + vcsec->vcsec_entryid++; + + /* + * See comment in checkpoint_sm_exclude_entry_cb() + */ + VERIFY3U(offset, >=, ms->ms_start); + VERIFY3U(end, <=, ms->ms_start + ms->ms_size); + + /* + * The entries in the vdev_checkpoint_sm should be marked as + * allocated in the checkpointed state of the pool, therefore + * their respective ms_allocateable trees should not contain them. + */ + mutex_enter(&ms->ms_lock); + range_tree_verify(ms->ms_allocatable, offset, size); + mutex_exit(&ms->ms_lock); + + return (0); +} + +/* + * Verify that all segments in the vdev_checkpoint_sm are allocated + * according to the checkpoint's ms_sm (i.e. are not in the checkpoint's + * ms_allocatable). + * + * Do so by comparing the checkpoint space maps (vdev_checkpoint_sm) of + * each vdev in the current state of the pool to the metaslab space maps + * (ms_sm) of the checkpointed state of the pool. + * + * Note that the function changes the state of the ms_allocatable + * trees of the current spa_t. The entries of these ms_allocatable + * trees are cleared out and then repopulated from with the free + * entries of their respective ms_sm space maps. + */ static void +verify_checkpoint_vdev_spacemaps(spa_t *checkpoint, spa_t *current) +{ + vdev_t *ckpoint_rvd = checkpoint->spa_root_vdev; + vdev_t *current_rvd = current->spa_root_vdev; + + load_concrete_ms_allocatable_trees(checkpoint, SM_FREE); + + for (uint64_t c = 0; c < ckpoint_rvd->vdev_children; c++) { + vdev_t *ckpoint_vd = ckpoint_rvd->vdev_child[c]; + vdev_t *current_vd = current_rvd->vdev_child[c]; + + space_map_t *checkpoint_sm = NULL; + uint64_t checkpoint_sm_obj; + + if (ckpoint_vd->vdev_ops == &vdev_indirect_ops) { + /* + * Since we don't allow device removal in a pool + * that has a checkpoint, we expect that all removed + * vdevs were removed from the pool before the + * checkpoint. 
+ */ + ASSERT3P(current_vd->vdev_ops, ==, &vdev_indirect_ops); + continue; + } + + /* + * If the checkpoint space map doesn't exist, then nothing + * here is checkpointed so there's nothing to verify. + */ + if (current_vd->vdev_top_zap == 0 || + zap_contains(spa_meta_objset(current), + current_vd->vdev_top_zap, + VDEV_TOP_ZAP_POOL_CHECKPOINT_SM) != 0) + continue; + + VERIFY0(zap_lookup(spa_meta_objset(current), + current_vd->vdev_top_zap, VDEV_TOP_ZAP_POOL_CHECKPOINT_SM, + sizeof (uint64_t), 1, &checkpoint_sm_obj)); + + VERIFY0(space_map_open(&checkpoint_sm, spa_meta_objset(current), + checkpoint_sm_obj, 0, current_vd->vdev_asize, + current_vd->vdev_ashift)); + space_map_update(checkpoint_sm); + + verify_checkpoint_sm_entry_cb_arg_t vcsec; + vcsec.vcsec_vd = ckpoint_vd; + vcsec.vcsec_entryid = 0; + vcsec.vcsec_num_entries = + space_map_length(checkpoint_sm) / sizeof (uint64_t); + VERIFY0(space_map_iterate(checkpoint_sm, + verify_checkpoint_sm_entry_cb, &vcsec)); + dump_spacemap(current->spa_meta_objset, checkpoint_sm); + space_map_close(checkpoint_sm); + } + + /* + * If we've added vdevs since we took the checkpoint, ensure + * that their checkpoint space maps are empty. + */ + if (ckpoint_rvd->vdev_children < current_rvd->vdev_children) { + for (uint64_t c = ckpoint_rvd->vdev_children; + c < current_rvd->vdev_children; c++) { + vdev_t *current_vd = current_rvd->vdev_child[c]; + ASSERT3P(current_vd->vdev_checkpoint_sm, ==, NULL); + } + } + + /* for cleaner progress output */ + (void) fprintf(stderr, "\n"); +} + +/* + * Verifies that all space that's allocated in the checkpoint is + * still allocated in the current version, by checking that everything + * in checkpoint's ms_allocatable (which is actually allocated, not + * allocatable/free) is not present in current's ms_allocatable. + * + * Note that the function changes the state of the ms_allocatable + * trees of both spas when called. 
The entries of all ms_allocatable + * trees are cleared out and then repopulated from their respective + * ms_sm space maps. In the checkpointed state we load the allocated + * entries, and in the current state we load the free entries. + */ +static void +verify_checkpoint_ms_spacemaps(spa_t *checkpoint, spa_t *current) +{ + vdev_t *ckpoint_rvd = checkpoint->spa_root_vdev; + vdev_t *current_rvd = current->spa_root_vdev; + + load_concrete_ms_allocatable_trees(checkpoint, SM_ALLOC); + load_concrete_ms_allocatable_trees(current, SM_FREE); + + for (uint64_t i = 0; i < ckpoint_rvd->vdev_children; i++) { + vdev_t *ckpoint_vd = ckpoint_rvd->vdev_child[i]; + vdev_t *current_vd = current_rvd->vdev_child[i]; + + if (ckpoint_vd->vdev_ops == &vdev_indirect_ops) { + /* + * See comment in verify_checkpoint_vdev_spacemaps() + */ + ASSERT3P(current_vd->vdev_ops, ==, &vdev_indirect_ops); + continue; + } + + for (uint64_t m = 0; m < ckpoint_vd->vdev_ms_count; m++) { + metaslab_t *ckpoint_msp = ckpoint_vd->vdev_ms[m]; + metaslab_t *current_msp = current_vd->vdev_ms[m]; + + (void) fprintf(stderr, + "\rverifying vdev %llu of %llu, " + "metaslab %llu of %llu ...", + (longlong_t)current_vd->vdev_id, + (longlong_t)current_rvd->vdev_children, + (longlong_t)current_vd->vdev_ms[m]->ms_id, + (longlong_t)current_vd->vdev_ms_count); + + /* + * We walk through the ms_allocatable trees that + * are loaded with the allocated blocks from the + * ms_sm spacemaps of the checkpoint. For each + * one of these ranges we ensure that none of them + * exists in the ms_allocatable trees of the + * current state which are loaded with the ranges + * that are currently free. + * + * This way we ensure that none of the blocks that + * are part of the checkpoint were freed by mistake. 
+ */ + range_tree_walk(ckpoint_msp->ms_allocatable, + (range_tree_func_t *)range_tree_verify, + current_msp->ms_allocatable); + } + } + + /* for cleaner progress output */ + (void) fprintf(stderr, "\n"); +} + +static void +verify_checkpoint_blocks(spa_t *spa) +{ + spa_t *checkpoint_spa; + char *checkpoint_pool; + nvlist_t *config = NULL; + int error = 0; + + /* + * We import the checkpointed state of the pool (under a different + * name) so we can do verification on it against the current state + * of the pool. + */ + checkpoint_pool = import_checkpointed_state(spa->spa_name, config, + NULL); + ASSERT(strcmp(spa->spa_name, checkpoint_pool) != 0); + + error = spa_open(checkpoint_pool, &checkpoint_spa, FTAG); + if (error != 0) { + fatal("Tried to open pool \"%s\" but spa_open() failed with " + "error %d\n", checkpoint_pool, error); + } + + /* + * Ensure that ranges in the checkpoint space maps of each vdev + * are allocated according to the checkpointed state's metaslab + * space maps. + */ + verify_checkpoint_vdev_spacemaps(checkpoint_spa, spa); + + /* + * Ensure that allocated ranges in the checkpoint's metaslab + * space maps remain allocated in the metaslab space maps of + * the current state. + */ + verify_checkpoint_ms_spacemaps(checkpoint_spa, spa); + + /* + * Once we are done, we get rid of the checkpointed state. 
+ */ + spa_close(checkpoint_spa, FTAG); + free(checkpoint_pool); +} + +static void +dump_leftover_checkpoint_blocks(spa_t *spa) +{ + vdev_t *rvd = spa->spa_root_vdev; + + for (uint64_t i = 0; i < rvd->vdev_children; i++) { + vdev_t *vd = rvd->vdev_child[i]; + + space_map_t *checkpoint_sm = NULL; + uint64_t checkpoint_sm_obj; + + if (vd->vdev_top_zap == 0) + continue; + + if (zap_contains(spa_meta_objset(spa), vd->vdev_top_zap, + VDEV_TOP_ZAP_POOL_CHECKPOINT_SM) != 0) + continue; + + VERIFY0(zap_lookup(spa_meta_objset(spa), vd->vdev_top_zap, + VDEV_TOP_ZAP_POOL_CHECKPOINT_SM, + sizeof (uint64_t), 1, &checkpoint_sm_obj)); + + VERIFY0(space_map_open(&checkpoint_sm, spa_meta_objset(spa), + checkpoint_sm_obj, 0, vd->vdev_asize, vd->vdev_ashift)); + space_map_update(checkpoint_sm); + dump_spacemap(spa->spa_meta_objset, checkpoint_sm); + space_map_close(checkpoint_sm); + } +} + +static int +verify_checkpoint(spa_t *spa) +{ + uberblock_t checkpoint; + int error; + + if (!spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT)) + return (0); + + error = zap_lookup(spa->spa_meta_objset, DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_ZPOOL_CHECKPOINT, sizeof (uint64_t), + sizeof (uberblock_t) / sizeof (uint64_t), &checkpoint); + + if (error == ENOENT) { + /* + * If the feature is active but the uberblock is missing + * then we must be in the middle of discarding the + * checkpoint. 
+ */ + (void) printf("\nPartially discarded checkpoint " + "state found:\n"); + dump_leftover_checkpoint_blocks(spa); + return (0); + } else if (error != 0) { + (void) printf("lookup error %d when looking for " + "checkpointed uberblock in MOS\n", error); + return (error); + } + dump_uberblock(&checkpoint, "\nCheckpointed uberblock found:\n", "\n"); + + if (checkpoint.ub_checkpoint_txg == 0) { + (void) printf("\nub_checkpoint_txg not set in checkpointed " + "uberblock\n"); + error = 3; + } + + if (error == 0) + verify_checkpoint_blocks(spa); + + return (error); +} + +static void dump_zpool(spa_t *spa) { dsl_pool_t *dp = spa_get_dsl(spa); @@ -3911,6 +4485,9 @@ dump_zpool(spa_t *spa) if (dump_opt['h']) dump_history(spa); + if (rc == 0 && !dump_opt['L']) + rc = verify_checkpoint(spa); + if (rc != 0) { dump_debug_buffer(); exit(rc); @@ -4413,6 +4990,7 @@ main(int argc, char **argv) int rewind = ZPOOL_NEVER_REWIND; char *spa_config_path_env; boolean_t target_is_spa = B_TRUE; + nvlist_t *cfg = NULL; (void) setrlimit(RLIMIT_NOFILE, &rl); (void) enable_extended_FILE_stdio(-1, -1); @@ -4429,7 +5007,7 @@ main(int argc, char **argv) spa_config_path = spa_config_path_env; while ((c = getopt(argc, argv, - "AbcCdDeEFGhiI:lLmMo:Op:PqRsSt:uU:vVx:X")) != -1) { + "AbcCdDeEFGhiI:klLmMo:Op:PqRsSt:uU:vVx:X")) != -1) { switch (c) { case 'b': case 'c': @@ -4454,6 +5032,7 @@ main(int argc, char **argv) case 'A': case 'e': case 'F': + case 'k': case 'L': case 'P': case 'q': @@ -4559,7 +5138,7 @@ main(int argc, char **argv) verbose = MAX(verbose, 1); for (c = 0; c < 256; c++) { - if (dump_all && strchr("AeEFlLOPRSX", c) == NULL) + if (dump_all && strchr("AeEFklLOPRSX", c) == NULL) dump_opt[c] = 1; if (dump_opt[c]) dump_opt[c] += verbose; *** DIFF OUTPUT TRUNCATED AT 1000 LINES *** From owner-svn-src-vendor@freebsd.org Wed Mar 28 21:00:35 2018 Return-Path: Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by 
mailman.ysv.freebsd.org (Postfix) with ESMTP id 09262F730D0; Wed, 28 Mar 2018 21:00:35 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id AB0FB826FA; Wed, 28 Mar 2018 21:00:34 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id A5CD81023C; Wed, 28 Mar 2018 21:00:34 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2SL0Y62095970; Wed, 28 Mar 2018 21:00:34 GMT (envelope-from mav@FreeBSD.org) Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SL0YHk095968; Wed, 28 Mar 2018 21:00:34 GMT (envelope-from mav@FreeBSD.org) Message-Id: <201803282100.w2SL0YHk095968@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f From: Alexander Motin Date: Wed, 28 Mar 2018 21:00:34 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org Subject: svn commit: r331700 - in vendor-sys/illumos/dist/uts/common/fs/zfs: . sys X-SVN-Group: vendor-sys X-SVN-Commit-Author: mav X-SVN-Commit-Paths: in vendor-sys/illumos/dist/uts/common/fs/zfs: . 
sys X-SVN-Commit-Revision: 331700 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-vendor@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the vendor work area tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Mar 2018 21:00:35 -0000 Author: mav Date: Wed Mar 28 21:00:34 2018 New Revision: 331700 URL: https://svnweb.freebsd.org/changeset/base/331700 Log: Add files missed from r331695. Added: vendor-sys/illumos/dist/uts/common/fs/zfs/spa_checkpoint.c (contents, props changed) vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa_checkpoint.h (contents, props changed) Added: vendor-sys/illumos/dist/uts/common/fs/zfs/spa_checkpoint.c ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/spa_checkpoint.c Wed Mar 28 21:00:34 2018 (r331700) @@ -0,0 +1,623 @@ +/* + * CDDL HEADER START + * + * The contents of this file are subject to the terms of the + * Common Development and Distribution License (the "License"). + * You may not use this file except in compliance with the License. + * + * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE + * or http://www.opensolaris.org/os/licensing. + * See the License for the specific language governing permissions + * and limitations under the License. + * + * When distributing Covered Code, include this CDDL HEADER in each + * file and include the License file at usr/src/OPENSOLARIS.LICENSE. + * If applicable, add the following below this CDDL HEADER, with the + * fields enclosed by brackets "[]" replaced with your own identifying + * information: Portions Copyright [yyyy] [name of copyright owner] + * + * CDDL HEADER END + */ + +/* + * Copyright (c) 2017 by Delphix. All rights reserved. 
+ */ + +/* + * Storage Pool Checkpoint + * + * A storage pool checkpoint can be thought of as a pool-wide snapshot or + * a stable version of extreme rewind that guarantees no blocks from the + * checkpointed state will have been overwritten. It remembers the entire + * state of the storage pool (e.g. snapshots, dataset names, etc..) from the + * point that it was taken and the user can rewind back to that point even if + * they applied destructive operations on their datasets or even enabled new + * zpool on-disk features. If a pool has a checkpoint that is no longer + * needed, the user can discard it. + * + * == On disk data structures used == + * + * - The pool has a new feature flag and a new entry in the MOS. The feature + * flag is set to active when we create the checkpoint and remains active + * until the checkpoint is fully discarded. The entry in the MOS config + * (DMU_POOL_ZPOOL_CHECKPOINT) is populated with the uberblock that + * references the state of the pool when we take the checkpoint. The entry + * remains populated until we start discarding the checkpoint or we rewind + * back to it. + * + * - Each vdev contains a vdev-wide space map while the pool has a checkpoint, + * which persists until the checkpoint is fully discarded. The space map + * contains entries that have been freed in the current state of the pool + * but we want to keep around in case we decide to rewind to the checkpoint. + * [see vdev_checkpoint_sm] + * + * - Each metaslab's ms_sm space map behaves the same as without the + * checkpoint, with the only exception being the scenario when we free + * blocks that belong to the checkpoint. In this case, these blocks remain + * ALLOCATED in the metaslab's space map and they are added as FREE in the + * vdev's checkpoint space map. + * + * - Each uberblock has a field (ub_checkpoint_txg) which holds the txg that + * the uberblock was checkpointed. For normal uberblocks this field is 0. 
+ * + * == Overview of operations == + * + * - To create a checkpoint, we first wait for the current TXG to be synced, + * so we can use the most recently synced uberblock (spa_ubsync) as the + * checkpointed uberblock. Then we use an early synctask to place that + * uberblock in MOS config, increment the feature flag for the checkpoint + * (marking it active), and setting spa_checkpoint_txg (see its use below) + * to the TXG of the checkpointed uberblock. We use an early synctask for + * the aforementioned operations to ensure that no blocks were dirtied + * between the current TXG and the TXG of the checkpointed uberblock + * (e.g the previous txg). + * + * - When a checkpoint exists, we need to ensure that the blocks that + * belong to the checkpoint are freed but never reused. This means that + * these blocks should never end up in the ms_allocatable or the ms_freeing + * trees of a metaslab. Therefore, whenever there is a checkpoint the new + * ms_checkpointing tree is used in addition to the aforementioned ones. + * + * Whenever a block is freed and we find out that it is referenced by the + * checkpoint (we find out by comparing its birth to spa_checkpoint_txg), + * we place it in the ms_checkpointing tree instead of the ms_freeingtree. + * This way, we divide the blocks that are being freed into checkpointed + * and not-checkpointed blocks. + * + * In order to persist these frees, we write the extents from the + * ms_freeingtree to the ms_sm as usual, and the extents from the + * ms_checkpointing tree to the vdev_checkpoint_sm. This way, these + * checkpointed extents will remain allocated in the metaslab's ms_sm space + * map, and therefore won't be reused [see metaslab_sync()]. In addition, + * when we discard the checkpoint, we can find the entries that have + * actually been freed in vdev_checkpoint_sm. 
+ * [see spa_checkpoint_discard_thread_sync()] + * + * - To discard the checkpoint we use an early synctask to delete the + * checkpointed uberblock from the MOS config, set spa_checkpoint_txg to 0, + * and wakeup the discarding zthr thread (an open-context async thread). + * We use an early synctask to ensure that the operation happens before any + * new data end up in the checkpoint's data structures. + * + * Once the synctask is done and the discarding zthr is awake, we discard + * the checkpointed data over multiple TXGs by having the zthr prefetching + * entries from vdev_checkpoint_sm and then starting a synctask that places + * them as free blocks in to their respective ms_allocatable and ms_sm + * structures. + * [see spa_checkpoint_discard_thread()] + * + * When there are no entries left in the vdev_checkpoint_sm of all + * top-level vdevs, a final synctask runs that decrements the feature flag. + * + * - To rewind to the checkpoint, we first use the current uberblock and + * open the MOS so we can access the checkpointed uberblock from the MOS + * config. After we retrieve the checkpointed uberblock, we use it as the + * current uberblock for the pool by writing it to disk with an updated + * TXG, opening its version of the MOS, and moving on as usual from there. + * [see spa_ld_checkpoint_rewind()] + * + * An important note on rewinding to the checkpoint has to do with how we + * handle ZIL blocks. In the scenario of a rewind, we clear out any ZIL + * blocks that have not been claimed by the time we took the checkpoint + * as they should no longer be valid. + * [see comment in zil_claim()] + * + * == Miscellaneous information == + * + * - In the hypothetical event that we take a checkpoint, remove a vdev, + * and attempt to rewind, the rewind would fail as the checkpointed + * uberblock would reference data in the removed device. 
For this reason + * and others of similar nature, we disallow the following operations that + * can change the config: + * vdev removal and attach/detach, mirror splitting, and pool reguid. + * + * - As most of the checkpoint logic is implemented in the SPA and doesn't + * distinguish datasets when it comes to space accounting, having a + * checkpoint can potentially break the boundaries set by dataset + * reservations. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * The following parameter limits the amount of memory to be used for the + * prefetching of the checkpoint space map done on each vdev while + * discarding the checkpoint. + * + * The reason it exists is because top-level vdevs with long checkpoint + * space maps can potentially take up a lot of memory depending on the + * amount of checkpointed data that has been freed within them while + * the pool had a checkpoint. + */ +uint64_t zfs_spa_discard_memory_limit = 16 * 1024 * 1024; + +int +spa_checkpoint_get_stats(spa_t *spa, pool_checkpoint_stat_t *pcs) +{ + if (!spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT)) + return (SET_ERROR(ZFS_ERR_NO_CHECKPOINT)); + + bzero(pcs, sizeof (pool_checkpoint_stat_t)); + + int error = zap_contains(spa_meta_objset(spa), + DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_ZPOOL_CHECKPOINT); + ASSERT(error == 0 || error == ENOENT); + + if (error == ENOENT) + pcs->pcs_state = CS_CHECKPOINT_DISCARDING; + else + pcs->pcs_state = CS_CHECKPOINT_EXISTS; + + pcs->pcs_space = spa->spa_checkpoint_info.sci_dspace; + pcs->pcs_start_time = spa->spa_checkpoint_info.sci_timestamp; + + return (0); +} + +static void +spa_checkpoint_discard_complete_sync(void *arg, dmu_tx_t *tx) +{ + spa_t *spa = arg; + + spa->spa_checkpoint_info.sci_timestamp = 0; + + spa_feature_decr(spa, SPA_FEATURE_POOL_CHECKPOINT, tx); + + spa_history_log_internal(spa, "spa discard checkpoint", tx, + "finished discarding checkpointed state from the pool"); 
+} + +typedef struct spa_checkpoint_discard_sync_callback_arg { + vdev_t *sdc_vd; + uint64_t sdc_txg; + uint64_t sdc_entry_limit; +} spa_checkpoint_discard_sync_callback_arg_t; + +static int +spa_checkpoint_discard_sync_callback(maptype_t type, uint64_t offset, + uint64_t size, void *arg) +{ + spa_checkpoint_discard_sync_callback_arg_t *sdc = arg; + vdev_t *vd = sdc->sdc_vd; + metaslab_t *ms = vd->vdev_ms[offset >> vd->vdev_ms_shift]; + uint64_t end = offset + size; + + if (sdc->sdc_entry_limit == 0) + return (EINTR); + + /* + * Since the space map is not condensed, we know that + * none of its entries is crossing the boundaries of + * its respective metaslab. + * + * That said, there is no fundamental requirement that + * the checkpoint's space map entries should not cross + * metaslab boundaries. So if needed we could add code + * that handles metaslab-crossing segments in the future. + */ + VERIFY3U(type, ==, SM_FREE); + VERIFY3U(offset, >=, ms->ms_start); + VERIFY3U(end, <=, ms->ms_start + ms->ms_size); + + /* + * At this point we should not be processing any + * other frees concurrently, so the lock is technically + * unnecessary. We use the lock anyway though to + * potentially save ourselves from future headaches. 
+ */ + mutex_enter(&ms->ms_lock); + if (range_tree_is_empty(ms->ms_freeing)) + vdev_dirty(vd, VDD_METASLAB, ms, sdc->sdc_txg); + range_tree_add(ms->ms_freeing, offset, size); + mutex_exit(&ms->ms_lock); + + ASSERT3U(vd->vdev_spa->spa_checkpoint_info.sci_dspace, >=, size); + ASSERT3U(vd->vdev_stat.vs_checkpoint_space, >=, size); + + vd->vdev_spa->spa_checkpoint_info.sci_dspace -= size; + vd->vdev_stat.vs_checkpoint_space -= size; + sdc->sdc_entry_limit--; + + return (0); +} + +static void +spa_checkpoint_accounting_verify(spa_t *spa) +{ + vdev_t *rvd = spa->spa_root_vdev; + uint64_t ckpoint_sm_space_sum = 0; + uint64_t vs_ckpoint_space_sum = 0; + + for (uint64_t c = 0; c < rvd->vdev_children; c++) { + vdev_t *vd = rvd->vdev_child[c]; + + if (vd->vdev_checkpoint_sm != NULL) { + ckpoint_sm_space_sum += + -vd->vdev_checkpoint_sm->sm_alloc; + vs_ckpoint_space_sum += + vd->vdev_stat.vs_checkpoint_space; + ASSERT3U(ckpoint_sm_space_sum, ==, + vs_ckpoint_space_sum); + } else { + ASSERT0(vd->vdev_stat.vs_checkpoint_space); + } + } + ASSERT3U(spa->spa_checkpoint_info.sci_dspace, ==, ckpoint_sm_space_sum); +} + +static void +spa_checkpoint_discard_thread_sync(void *arg, dmu_tx_t *tx) +{ + vdev_t *vd = arg; + int error; + + /* + * The space map callback is applied only to non-debug entries. + * Because the number of debug entries is less or equal to the + * number of non-debug entries, we want to ensure that we only + * read what we prefetched from open-context. + * + * Thus, we set the maximum entries that the space map callback + * will be applied to be half the entries that could fit in the + * imposed memory limit. + */ + uint64_t max_entry_limit = + (zfs_spa_discard_memory_limit / sizeof (uint64_t)) >> 1; + + uint64_t entries_in_sm = + space_map_length(vd->vdev_checkpoint_sm) / sizeof (uint64_t); + + /* + * Iterate from the end of the space map towards the beginning, + * placing its entries on ms_freeing and removing them from the + * space map. 
The iteration stops if one of the following + * conditions is true: + * + * 1] We reached the beginning of the space map. At this point + * the space map should be completely empty and + * space_map_incremental_destroy should have returned 0. + * The next step would be to free and close the space map + * and remove its entry from its vdev's top zap. This allows + * spa_checkpoint_discard_thread() to move on to the next vdev. + * + * 2] We reached the memory limit (amount of memory used to hold + * space map entries in memory) and space_map_incremental_destroy + * returned EINTR. This means that there are entries remaining + * in the space map that will be cleared in a future invocation + * of this function by spa_checkpoint_discard_thread(). + */ + spa_checkpoint_discard_sync_callback_arg_t sdc; + sdc.sdc_vd = vd; + sdc.sdc_txg = tx->tx_txg; + sdc.sdc_entry_limit = MIN(entries_in_sm, max_entry_limit); + + uint64_t entries_before = entries_in_sm; + + error = space_map_incremental_destroy(vd->vdev_checkpoint_sm, + spa_checkpoint_discard_sync_callback, &sdc, tx); + + uint64_t entries_after = + space_map_length(vd->vdev_checkpoint_sm) / sizeof (uint64_t); + +#ifdef DEBUG + spa_checkpoint_accounting_verify(vd->vdev_spa); +#endif + + zfs_dbgmsg("discarding checkpoint: txg %llu, vdev id %d, " + "deleted %llu entries - %llu entries are left", + tx->tx_txg, vd->vdev_id, (entries_before - entries_after), + entries_after); + + if (error != EINTR) { + if (error != 0) { + zfs_panic_recover("zfs: error %d was returned " + "while incrementally destroying the checkpoint " + "space map of vdev %llu\n", + error, vd->vdev_id); + } + ASSERT0(entries_after); + ASSERT0(vd->vdev_checkpoint_sm->sm_alloc); + ASSERT0(vd->vdev_checkpoint_sm->sm_length); + + space_map_free(vd->vdev_checkpoint_sm, tx); + space_map_close(vd->vdev_checkpoint_sm); + vd->vdev_checkpoint_sm = NULL; + + VERIFY0(zap_remove(vd->vdev_spa->spa_meta_objset, + vd->vdev_top_zap, VDEV_TOP_ZAP_POOL_CHECKPOINT_SM, tx)); + } 
+} + +static boolean_t +spa_checkpoint_discard_is_done(spa_t *spa) +{ + vdev_t *rvd = spa->spa_root_vdev; + + ASSERT(!spa_has_checkpoint(spa)); + ASSERT(spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT)); + + for (uint64_t c = 0; c < rvd->vdev_children; c++) { + if (rvd->vdev_child[c]->vdev_checkpoint_sm != NULL) + return (B_FALSE); + ASSERT0(rvd->vdev_child[c]->vdev_stat.vs_checkpoint_space); + } + + return (B_TRUE); +} + +/* ARGSUSED */ +boolean_t +spa_checkpoint_discard_thread_check(void *arg, zthr_t *zthr) +{ + spa_t *spa = arg; + + if (!spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT)) + return (B_FALSE); + + if (spa_has_checkpoint(spa)) + return (B_FALSE); + + return (B_TRUE); +} + +int +spa_checkpoint_discard_thread(void *arg, zthr_t *zthr) +{ + spa_t *spa = arg; + vdev_t *rvd = spa->spa_root_vdev; + + for (uint64_t c = 0; c < rvd->vdev_children; c++) { + vdev_t *vd = rvd->vdev_child[c]; + + while (vd->vdev_checkpoint_sm != NULL) { + space_map_t *checkpoint_sm = vd->vdev_checkpoint_sm; + int numbufs; + dmu_buf_t **dbp; + + if (zthr_iscancelled(zthr)) + return (0); + + ASSERT3P(vd->vdev_ops, !=, &vdev_indirect_ops); + + uint64_t size = MIN(space_map_length(checkpoint_sm), + zfs_spa_discard_memory_limit); + uint64_t offset = + space_map_length(checkpoint_sm) - size; + + /* + * Ensure that the part of the space map that will + * be destroyed by the synctask, is prefetched in + * memory before the synctask runs. 
+ */ + int error = dmu_buf_hold_array_by_bonus( + checkpoint_sm->sm_dbuf, offset, size, + B_TRUE, FTAG, &numbufs, &dbp); + if (error != 0) { + zfs_panic_recover("zfs: error %d was returned " + "while prefetching checkpoint space map " + "entries of vdev %llu\n", + error, vd->vdev_id); + } + + VERIFY0(dsl_sync_task(spa->spa_name, NULL, + spa_checkpoint_discard_thread_sync, vd, + 0, ZFS_SPACE_CHECK_NONE)); + + dmu_buf_rele_array(dbp, numbufs, FTAG); + } + } + + VERIFY(spa_checkpoint_discard_is_done(spa)); + VERIFY0(spa->spa_checkpoint_info.sci_dspace); + VERIFY0(dsl_sync_task(spa->spa_name, NULL, + spa_checkpoint_discard_complete_sync, spa, + 0, ZFS_SPACE_CHECK_NONE)); + + return (0); +} + + +/* ARGSUSED */ +static int +spa_checkpoint_check(void *arg, dmu_tx_t *tx) +{ + spa_t *spa = dmu_tx_pool(tx)->dp_spa; + + if (!spa_feature_is_enabled(spa, SPA_FEATURE_POOL_CHECKPOINT)) + return (SET_ERROR(ENOTSUP)); + + if (!spa_top_vdevs_spacemap_addressable(spa)) + return (SET_ERROR(ZFS_ERR_VDEV_TOO_BIG)); + + if (spa->spa_vdev_removal != NULL) + return (SET_ERROR(ZFS_ERR_DEVRM_IN_PROGRESS)); + + if (spa->spa_checkpoint_txg != 0) + return (SET_ERROR(ZFS_ERR_CHECKPOINT_EXISTS)); + + if (spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT)) + return (SET_ERROR(ZFS_ERR_DISCARDING_CHECKPOINT)); + + return (0); +} + +/* ARGSUSED */ +static void +spa_checkpoint_sync(void *arg, dmu_tx_t *tx) +{ + dsl_pool_t *dp = dmu_tx_pool(tx); + spa_t *spa = dp->dp_spa; + uberblock_t checkpoint = spa->spa_ubsync; + + /* + * At this point, there should not be a checkpoint in the MOS. + */ + ASSERT3U(zap_contains(spa_meta_objset(spa), DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_ZPOOL_CHECKPOINT), ==, ENOENT); + + ASSERT0(spa->spa_checkpoint_info.sci_timestamp); + ASSERT0(spa->spa_checkpoint_info.sci_dspace); + + /* + * Since the checkpointed uberblock is the one that just got synced + * (we use spa_ubsync), its txg must be equal to the txg number of + * the txg we are syncing, minus 1. 
+ */ + ASSERT3U(checkpoint.ub_txg, ==, spa->spa_syncing_txg - 1); + + /* + * Once the checkpoint is in place, we need to ensure that none of + * its blocks will be marked for reuse after it has been freed. + * When there is a checkpoint and a block is freed, we compare its + * birth txg to the txg of the checkpointed uberblock to see if the + * block is part of the checkpoint or not. Therefore, we have to set + * spa_checkpoint_txg before any frees happen in this txg (which is + * why this is done as an early_synctask as explained in the comment + * in spa_checkpoint()). + */ + spa->spa_checkpoint_txg = checkpoint.ub_txg; + spa->spa_checkpoint_info.sci_timestamp = checkpoint.ub_timestamp; + + checkpoint.ub_checkpoint_txg = checkpoint.ub_txg; + VERIFY0(zap_add(spa->spa_dsl_pool->dp_meta_objset, + DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_ZPOOL_CHECKPOINT, + sizeof (uint64_t), sizeof (uberblock_t) / sizeof (uint64_t), + &checkpoint, tx)); + + /* + * Increment the feature refcount and thus activate the feature. + * Note that the feature will be deactivated when we've + * completely discarded all checkpointed state (both vdev + * space maps and uberblock). + */ + spa_feature_incr(spa, SPA_FEATURE_POOL_CHECKPOINT, tx); + + spa_history_log_internal(spa, "spa checkpoint", tx, + "checkpointed uberblock txg=%llu", checkpoint.ub_txg); +} + +/* + * Create a checkpoint for the pool. + */ +int +spa_checkpoint(const char *pool) +{ + int error; + spa_t *spa; + + error = spa_open(pool, &spa, FTAG); + if (error != 0) + return (error); + + mutex_enter(&spa->spa_vdev_top_lock); + + /* + * Wait for current syncing txg to finish so the latest synced + * uberblock (spa_ubsync) has all the changes that we expect + * to see if we were to revert later to the checkpoint. In other + * words we want the checkpointed uberblock to include/reference + * all the changes that were pending at the time that we issued + * the checkpoint command. 
+ */ + txg_wait_synced(spa_get_dsl(spa), 0); + + /* + * As the checkpointed uberblock references blocks from the previous + * txg (spa_ubsync) we want to ensure that are not freeing any of + * these blocks in the same txg that the following synctask will + * run. Thus, we run it as an early synctask, so the dirty changes + * that are synced to disk afterwards during zios and other synctasks + * do not reuse checkpointed blocks. + */ + error = dsl_early_sync_task(pool, spa_checkpoint_check, + spa_checkpoint_sync, NULL, 0, ZFS_SPACE_CHECK_NORMAL); + + mutex_exit(&spa->spa_vdev_top_lock); + + spa_close(spa, FTAG); + return (error); +} + +/* ARGSUSED */ +static int +spa_checkpoint_discard_check(void *arg, dmu_tx_t *tx) +{ + spa_t *spa = dmu_tx_pool(tx)->dp_spa; + + if (!spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT)) + return (SET_ERROR(ZFS_ERR_NO_CHECKPOINT)); + + if (spa->spa_checkpoint_txg == 0) + return (SET_ERROR(ZFS_ERR_DISCARDING_CHECKPOINT)); + + VERIFY0(zap_contains(spa_meta_objset(spa), + DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_ZPOOL_CHECKPOINT)); + + return (0); +} + +/* ARGSUSED */ +static void +spa_checkpoint_discard_sync(void *arg, dmu_tx_t *tx) +{ + spa_t *spa = dmu_tx_pool(tx)->dp_spa; + + VERIFY0(zap_remove(spa_meta_objset(spa), DMU_POOL_DIRECTORY_OBJECT, + DMU_POOL_ZPOOL_CHECKPOINT, tx)); + + spa->spa_checkpoint_txg = 0; + + zthr_wakeup(spa->spa_checkpoint_discard_zthr); + + spa_history_log_internal(spa, "spa discard checkpoint", tx, + "started discarding checkpointed state from the pool"); +} + +/* + * Discard the checkpoint from a pool. + */ +int +spa_checkpoint_discard(const char *pool) +{ + /* + * Similarly to spa_checkpoint(), we want our synctask to run + * before any pending dirty data are written to disk so they + * won't end up in the checkpoint's data structures (e.g. + * ms_checkpointing and vdev_checkpoint_sm) and re-create any + * space maps that the discarding open-context thread has + * deleted. 
+ * [see spa_discard_checkpoint_sync and spa_discard_checkpoint_thread] + */ + return (dsl_early_sync_task(pool, spa_checkpoint_discard_check, + spa_checkpoint_discard_sync, NULL, 0, + ZFS_SPACE_CHECK_DISCARD_CHECKPOINT)); +} Added: vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa_checkpoint.h ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/sys/spa_checkpoint.h Wed Mar 28 21:00:34 2018 (r331700) @@ -0,0 +1,44 @@ +/* + * CDDL HEADER START + * + * The contents of this file are subject to the terms of the + * Common Development and Distribution License (the "License"). + * You may not use this file except in compliance with the License. + * + * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE + * or http://www.opensolaris.org/os/licensing. + * See the License for the specific language governing permissions + * and limitations under the License. + * + * When distributing Covered Code, include this CDDL HEADER in each + * file and include the License file at usr/src/OPENSOLARIS.LICENSE. + * If applicable, add the following below this CDDL HEADER, with the + * fields enclosed by brackets "[]" replaced with your own identifying + * information: Portions Copyright [yyyy] [name of copyright owner] + * + * CDDL HEADER END + */ + +/* + * Copyright (c) 2017 by Delphix. All rights reserved. 
+ */
+
+#ifndef _SYS_SPA_CHECKPOINT_H
+#define _SYS_SPA_CHECKPOINT_H
+
+#include
+
+typedef struct spa_checkpoint_info {
+	uint64_t sci_timestamp; /* when checkpointed uberblock was synced */
+	uint64_t sci_dspace; /* disk space used by checkpoint in bytes */
+} spa_checkpoint_info_t;
+
+int spa_checkpoint(const char *);
+int spa_checkpoint_discard(const char *);
+
+boolean_t spa_checkpoint_discard_thread_check(void *, zthr_t *);
+int spa_checkpoint_discard_thread(void *, zthr_t *);
+
+int spa_checkpoint_get_stats(spa_t *, pool_checkpoint_stat_t *);
+
+#endif /* _SYS_SPA_CHECKPOINT_H */

From owner-svn-src-vendor@freebsd.org Wed Mar 28 22:06:13 2018
Return-Path:
Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F21EEF76341; Wed, 28 Mar 2018 22:06:12 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A52BE84D10; Wed, 28 Mar 2018 22:06:12 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id A015410D55; Wed, 28 Mar 2018 22:06:12 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2SM6CQH030777; Wed, 28 Mar 2018 22:06:12 GMT (envelope-from mav@FreeBSD.org)
Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SM6CX3030776; Wed, 28 Mar 2018 22:06:12 GMT (envelope-from mav@FreeBSD.org)
Message-Id:
<201803282206.w2SM6CX3030776@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f
From: Alexander Motin
Date: Wed, 28 Mar 2018 22:06:12 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org
Subject: svn commit: r331702 - vendor-sys/illumos/dist/uts/common/fs/zfs
X-SVN-Group: vendor-sys
X-SVN-Commit-Author: mav
X-SVN-Commit-Paths: vendor-sys/illumos/dist/uts/common/fs/zfs
X-SVN-Commit-Revision: 331702
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-vendor@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: SVN commit messages for the vendor work area tree
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 28 Mar 2018 22:06:13 -0000

Author: mav
Date: Wed Mar 28 22:06:12 2018
New Revision: 331702
URL: https://svnweb.freebsd.org/changeset/base/331702

Log:
  9187 racing condition between vdev label and spa_last_synced_txg in vdev_validate

  illumos/illumos-gate@d1de72cfa29ab77ff80e2bb0e668a6afa5bccaf0

  ztest failed with uncorrectable IO error despite having the fix for #7163.
  Both sides of the mirror have CANT_OPEN_BAD_LABEL, which also distinguishes it
  from that issue.

  Definitely seems like a racing condition between the vdev_validate and spa_sync:
  1. Thread A (spa_sync): vdev label is updated to latest txg
  2. Thread B (vdev_validate): vdev label's txg is compared to spa_last_synced_txg and is ahead.
  3. Thread A (spa_sync): spa_last_synced_txg is updated to latest txg.

  Solution: do not check txg in vdev_validate unless config lock is held.
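
  The three-step interleaving in this log can be replayed with a small,
  self-contained C sketch. This is not the illumos code: struct pool_state,
  max_label_txg(), label_txg_ok() and replay_race_window() are hypothetical
  names, and the config lock is modelled as a plain flag rather than a call
  to spa_config_held(). The sketch only assumes, as the log states, that a
  label whose txg is ahead of the last synced txg is treated as bad.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the pool fields involved. */
struct pool_state {
	uint64_t last_synced_txg;	/* models spa_last_synced_txg(spa) */
	bool extreme_rewind;		/* models spa->spa_extreme_rewind */
	bool config_lock_held;		/* models holding SCL_CONFIG as writer */
};

/*
 * Upper bound on the label txg that validation accepts. Without the
 * config lock, last_synced_txg may lag behind a label that the sync
 * thread has just written, so no bound is applied.
 */
static uint64_t
max_label_txg(const struct pool_state *ps)
{
	if (ps->extreme_rewind || ps->last_synced_txg == 0 ||
	    !ps->config_lock_held)
		return (UINT64_MAX);
	return (ps->last_synced_txg);
}

static bool
label_txg_ok(const struct pool_state *ps, uint64_t label_txg)
{
	return (label_txg <= max_label_txg(ps));
}

/* Replays the window from the log: label at txg 101, pool still at 100. */
static void
replay_race_window(void)
{
	struct pool_state ps = { 100, false, false };
	uint64_t label_txg = 101;	/* step 1: the label was updated */

	/* Step 2 without the config lock: the stale bound is not applied. */
	assert(label_txg_ok(&ps, label_txg));

	/* With the lock held the bound is enforced against txg 100. */
	ps.config_lock_held = true;
	assert(!label_txg_ok(&ps, label_txg));

	/* Step 3: once last_synced_txg catches up, the label passes. */
	ps.last_synced_txg = 101;
	assert(label_txg_ok(&ps, label_txg));
}
```

  Calling replay_race_window() from any test driver walks through the window.
  The vdev.c change in this commit expresses the same condition with
  spa_config_held(spa, SCL_CONFIG, RW_WRITER) != SCL_CONFIG.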

  Reviewed by: George Wilson
  Reviewed by: Matt Ahrens
  Approved by: Robert Mustacchi
  Author: Pavel Zakharov

Modified:
  vendor-sys/illumos/dist/uts/common/fs/zfs/vdev.c

Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/vdev.c
==============================================================================
--- vendor-sys/illumos/dist/uts/common/fs/zfs/vdev.c Wed Mar 28 22:01:27 2018 (r331701)
+++ vendor-sys/illumos/dist/uts/common/fs/zfs/vdev.c Wed Mar 28 22:06:12 2018 (r331702)
@@ -1573,8 +1573,11 @@ vdev_validate(vdev_t *vd)
 	/*
 	 * If we are performing an extreme rewind, we allow for a label that
 	 * was modified at a point after the current txg.
+	 * If config lock is not held do not check for the txg. spa_sync could
+	 * be updating the vdev's label before updating spa_last_synced_txg.
 	 */
-	if (spa->spa_extreme_rewind || spa_last_synced_txg(spa) == 0)
+	if (spa->spa_extreme_rewind || spa_last_synced_txg(spa) == 0 ||
+	    spa_config_held(spa, SCL_CONFIG, RW_WRITER) != SCL_CONFIG)
 		txg = UINT64_MAX;
 	else
 		txg = spa_last_synced_txg(spa);

From owner-svn-src-vendor@freebsd.org Wed Mar 28 22:08:58 2018
Return-Path:
Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 37A7BF76648; Wed, 28 Mar 2018 22:08:58 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id D7578850C5; Wed, 28 Mar 2018 22:08:57 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS
id D205410D58; Wed, 28 Mar 2018 22:08:57 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2SM8vvj030964; Wed, 28 Mar 2018 22:08:57 GMT (envelope-from mav@FreeBSD.org)
Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SM8vel030963; Wed, 28 Mar 2018 22:08:57 GMT (envelope-from mav@FreeBSD.org)
Message-Id: <201803282208.w2SM8vel030963@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f
From: Alexander Motin
Date: Wed, 28 Mar 2018 22:08:57 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org
Subject: svn commit: r331704 - vendor-sys/illumos/dist/uts/common/fs/zfs
X-SVN-Group: vendor-sys
X-SVN-Commit-Author: mav
X-SVN-Commit-Paths: vendor-sys/illumos/dist/uts/common/fs/zfs
X-SVN-Commit-Revision: 331704
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-vendor@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: SVN commit messages for the vendor work area tree
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 28 Mar 2018 22:08:58 -0000

Author: mav
Date: Wed Mar 28 22:08:57 2018
New Revision: 331704
URL: https://svnweb.freebsd.org/changeset/base/331704

Log:
  9191 dump vdev tree to zfs_dbgmsg when spa load fails due to missing log devices

  illumos/illumos-gate@ccef24b493bcbd146fcd6d8946666cae081470b6

  Reviewed by: George Wilson
  Reviewed by: Prakash Surya
  Reviewed by: Matt Ahrens
  Approved by: Robert Mustacchi
  Author: Pavel Zakharov

Modified:
  vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c

Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c
==============================================================================
--- vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c Wed Mar 28 22:07:31 2018 (r331703)
+++
vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c Wed Mar 28 22:08:57 2018 (r331704)
@@ -1814,6 +1814,7 @@ spa_check_for_missing_logs(spa_t *spa)
 
 		if (idx > 0) {
 			spa_load_failed(spa, "some log devices are missing");
+			vdev_dbgmsg_print_tree(rvd, 2);
 			return (SET_ERROR(ENXIO));
 		}
 	} else {
@@ -1825,6 +1826,7 @@ spa_check_for_missing_logs(spa_t *spa)
 				spa_set_log_state(spa, SPA_LOG_CLEAR);
 				spa_load_note(spa, "some log devices are "
 				    "missing, ZIL is dropped.");
+				vdev_dbgmsg_print_tree(rvd, 2);
 				break;
 			}
 		}

From owner-svn-src-vendor@freebsd.org Wed Mar 28 22:16:52 2018
Return-Path:
Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AFC5AF4E007; Wed, 28 Mar 2018 22:16:52 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6020285BC7; Wed, 28 Mar 2018 22:16:52 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 56BAB10EFF; Wed, 28 Mar 2018 22:16:52 +0000 (UTC) (envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2SMGqrn035971; Wed, 28 Mar 2018 22:16:52 GMT (envelope-from mav@FreeBSD.org)
Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SMGpKW035967; Wed, 28 Mar 2018 22:16:51 GMT (envelope-from mav@FreeBSD.org)
Message-Id: <201803282216.w2SMGpKW035967@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: mav set sender to
mav@FreeBSD.org using -f
From: Alexander Motin
Date: Wed, 28 Mar 2018 22:16:51 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org
Subject: svn commit: r331706 - vendor-sys/illumos/dist/common/zfs vendor-sys/illumos/dist/uts/common/fs/zfs vendor-sys/illumos/dist/uts/common/sys/fs vendor/illumos/dist/cmd/zdb vendor/illumos/dist/cmd/zpoo...
X-SVN-Group: vendor-sys
X-SVN-Commit-Author: mav
X-SVN-Commit-Paths: vendor-sys/illumos/dist/common/zfs vendor-sys/illumos/dist/uts/common/fs/zfs vendor-sys/illumos/dist/uts/common/sys/fs vendor/illumos/dist/cmd/zdb vendor/illumos/dist/cmd/zpool vendor/illumos/dist/lib...
X-SVN-Commit-Revision: 331706
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-vendor@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: SVN commit messages for the vendor work area tree
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 28 Mar 2018 22:16:52 -0000

Author: mav
Date: Wed Mar 28 22:16:51 2018
New Revision: 331706
URL: https://svnweb.freebsd.org/changeset/base/331706

Log:
  9235 rename zpool_rewind_policy_t to zpool_load_policy_t

  illumos/illumos-gate@5dafeea3ebd2dd77affc802bcb90f63faf01589f

  We want to be able to pass various settings during import/open of a pool,
  which are not only related to rewind. Instead of adding a new policy and
  duplicate a bunch of code, we should just rename rewind_policy to a more
  generic term like load_policy.

  For instance, we'd like to set spa->spa_import_flags from the nvlist,
  rather from a flags parameter passed to spa_import as in some cases we want
  those flags not only for the import case, but also for the open case. One such
  flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would allow zfs to
  open a pool when logs are missing.
Reviewed by: Matt Ahrens Reviewed by: George Wilson Approved by: Robert Mustacchi Author: Pavel Zakharov Modified: vendor-sys/illumos/dist/common/zfs/zfs_comutil.c vendor-sys/illumos/dist/common/zfs/zfs_comutil.h vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c vendor-sys/illumos/dist/uts/common/sys/fs/zfs.h Changes in other areas also in this revision: Modified: vendor/illumos/dist/cmd/zdb/zdb.c vendor/illumos/dist/cmd/zpool/zpool_main.c vendor/illumos/dist/lib/libzfs/common/libzfs.h vendor/illumos/dist/lib/libzfs/common/libzfs_import.c vendor/illumos/dist/lib/libzfs/common/libzfs_pool.c Modified: vendor-sys/illumos/dist/common/zfs/zfs_comutil.c ============================================================================== --- vendor-sys/illumos/dist/common/zfs/zfs_comutil.c Wed Mar 28 22:10:06 2018 (r331705) +++ vendor-sys/illumos/dist/common/zfs/zfs_comutil.c Wed Mar 28 22:16:51 2018 (r331706) @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2012 by Delphix. All rights reserved. + * Copyright (c) 2012, 2017 by Delphix. All rights reserved. 
*/ /* @@ -67,17 +67,17 @@ zfs_allocatable_devs(nvlist_t *nv) } void -zpool_get_rewind_policy(nvlist_t *nvl, zpool_rewind_policy_t *zrpp) +zpool_get_load_policy(nvlist_t *nvl, zpool_load_policy_t *zlpp) { nvlist_t *policy; nvpair_t *elem; char *nm; /* Defaults */ - zrpp->zrp_request = ZPOOL_NO_REWIND; - zrpp->zrp_maxmeta = 0; - zrpp->zrp_maxdata = UINT64_MAX; - zrpp->zrp_txg = UINT64_MAX; + zlpp->zlp_rewind = ZPOOL_NO_REWIND; + zlpp->zlp_maxmeta = 0; + zlpp->zlp_maxdata = UINT64_MAX; + zlpp->zlp_txg = UINT64_MAX; if (nvl == NULL) return; @@ -85,24 +85,24 @@ zpool_get_rewind_policy(nvlist_t *nvl, zpool_rewind_po elem = NULL; while ((elem = nvlist_next_nvpair(nvl, elem)) != NULL) { nm = nvpair_name(elem); - if (strcmp(nm, ZPOOL_REWIND_POLICY) == 0) { + if (strcmp(nm, ZPOOL_LOAD_POLICY) == 0) { if (nvpair_value_nvlist(elem, &policy) == 0) - zpool_get_rewind_policy(policy, zrpp); + zpool_get_load_policy(policy, zlpp); return; - } else if (strcmp(nm, ZPOOL_REWIND_REQUEST) == 0) { - if (nvpair_value_uint32(elem, &zrpp->zrp_request) == 0) - if (zrpp->zrp_request & ~ZPOOL_REWIND_POLICIES) - zrpp->zrp_request = ZPOOL_NO_REWIND; - } else if (strcmp(nm, ZPOOL_REWIND_REQUEST_TXG) == 0) { - (void) nvpair_value_uint64(elem, &zrpp->zrp_txg); - } else if (strcmp(nm, ZPOOL_REWIND_META_THRESH) == 0) { - (void) nvpair_value_uint64(elem, &zrpp->zrp_maxmeta); - } else if (strcmp(nm, ZPOOL_REWIND_DATA_THRESH) == 0) { - (void) nvpair_value_uint64(elem, &zrpp->zrp_maxdata); + } else if (strcmp(nm, ZPOOL_LOAD_REWIND_POLICY) == 0) { + if (nvpair_value_uint32(elem, &zlpp->zlp_rewind) == 0) + if (zlpp->zlp_rewind & ~ZPOOL_REWIND_POLICIES) + zlpp->zlp_rewind = ZPOOL_NO_REWIND; + } else if (strcmp(nm, ZPOOL_LOAD_REQUEST_TXG) == 0) { + (void) nvpair_value_uint64(elem, &zlpp->zlp_txg); + } else if (strcmp(nm, ZPOOL_LOAD_META_THRESH) == 0) { + (void) nvpair_value_uint64(elem, &zlpp->zlp_maxmeta); + } else if (strcmp(nm, ZPOOL_LOAD_DATA_THRESH) == 0) { + (void) nvpair_value_uint64(elem, 
&zlpp->zlp_maxdata); } } - if (zrpp->zrp_request == 0) - zrpp->zrp_request = ZPOOL_NO_REWIND; + if (zlpp->zlp_rewind == 0) + zlpp->zlp_rewind = ZPOOL_NO_REWIND; } typedef struct zfs_version_spa_map { Modified: vendor-sys/illumos/dist/common/zfs/zfs_comutil.h ============================================================================== --- vendor-sys/illumos/dist/common/zfs/zfs_comutil.h Wed Mar 28 22:10:06 2018 (r331705) +++ vendor-sys/illumos/dist/common/zfs/zfs_comutil.h Wed Mar 28 22:16:51 2018 (r331706) @@ -20,7 +20,7 @@ */ /* * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2012 by Delphix. All rights reserved. + * Copyright (c) 2012, 2017 by Delphix. All rights reserved. */ #ifndef _ZFS_COMUTIL_H @@ -34,7 +34,7 @@ extern "C" { #endif extern boolean_t zfs_allocatable_devs(nvlist_t *); -extern void zpool_get_rewind_policy(nvlist_t *, zpool_rewind_policy_t *); +extern void zpool_get_load_policy(nvlist_t *, zpool_load_policy_t *); extern int zfs_zpl_version_map(int spa_version); extern int zfs_spa_version_map(int zpl_version); Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c Wed Mar 28 22:10:06 2018 (r331705) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c Wed Mar 28 22:16:51 2018 (r331706) @@ -2021,13 +2021,13 @@ spa_load_verify(spa_t *spa) { zio_t *rio; spa_load_error_t sle = { 0 }; - zpool_rewind_policy_t policy; + zpool_load_policy_t policy; boolean_t verify_ok = B_FALSE; int error = 0; - zpool_get_rewind_policy(spa->spa_config, &policy); + zpool_get_load_policy(spa->spa_config, &policy); - if (policy.zrp_request & ZPOOL_NEVER_REWIND) + if (policy.zlp_rewind & ZPOOL_NEVER_REWIND) return (0); dsl_pool_config_enter(spa->spa_dsl_pool, FTAG); @@ -2066,8 +2066,8 @@ spa_load_verify(spa_t *spa) } if (spa_load_verify_dryrun || - (!error && sle.sle_meta_count <= 
policy.zrp_maxmeta && - sle.sle_data_count <= policy.zrp_maxdata)) { + (!error && sle.sle_meta_count <= policy.zlp_maxmeta && + sle.sle_data_count <= policy.zlp_maxdata)) { int64_t loss = 0; verify_ok = B_TRUE; @@ -2767,17 +2767,17 @@ spa_ld_trusted_config(spa_t *spa, spa_import_type_t ty /* * We will use spa_config if we decide to reload the spa or if spa_load * fails and we rewind. We must thus regenerate the config using the - * MOS information with the updated paths. Rewind policy is an import - * setting and is not in the MOS. We copy it over to our new, trusted - * config. + * MOS information with the updated paths. ZPOOL_LOAD_POLICY is used to + * pass settings on how to load the pool and is not stored in the MOS. + * We copy it over to our new, trusted config. */ mos_config_txg = fnvlist_lookup_uint64(mos_config, ZPOOL_CONFIG_POOL_TXG); nvlist_free(mos_config); mos_config = spa_config_generate(spa, NULL, mos_config_txg, B_FALSE); - if (nvlist_lookup_nvlist(spa->spa_config, ZPOOL_REWIND_POLICY, + if (nvlist_lookup_nvlist(spa->spa_config, ZPOOL_LOAD_POLICY, &policy) == 0) - fnvlist_add_nvlist(mos_config, ZPOOL_REWIND_POLICY, policy); + fnvlist_add_nvlist(mos_config, ZPOOL_LOAD_POLICY, policy); spa_config_set(spa, mos_config); spa->spa_config_source = SPA_CONFIG_SRC_MOS; @@ -4036,11 +4036,11 @@ spa_open_common(const char *pool, spa_t **spapp, void } if (spa->spa_state == POOL_STATE_UNINITIALIZED) { - zpool_rewind_policy_t policy; + zpool_load_policy_t policy; - zpool_get_rewind_policy(nvpolicy ? nvpolicy : spa->spa_config, + zpool_get_load_policy(nvpolicy ? 
nvpolicy : spa->spa_config, &policy); - if (policy.zrp_request & ZPOOL_DO_REWIND) + if (policy.zlp_rewind & ZPOOL_DO_REWIND) state = SPA_LOAD_RECOVER; spa_activate(spa, spa_mode_global); @@ -4050,8 +4050,8 @@ spa_open_common(const char *pool, spa_t **spapp, void spa->spa_config_source = SPA_CONFIG_SRC_CACHEFILE; zfs_dbgmsg("spa_open_common: opening %s", pool); - error = spa_load_best(spa, state, policy.zrp_txg, - policy.zrp_request); + error = spa_load_best(spa, state, policy.zlp_txg, + policy.zlp_rewind); if (error == EBADF) { /* @@ -5018,7 +5018,7 @@ spa_import(const char *pool, nvlist_t *config, nvlist_ spa_t *spa; char *altroot = NULL; spa_load_state_t state = SPA_LOAD_IMPORT; - zpool_rewind_policy_t policy; + zpool_load_policy_t policy; uint64_t mode = spa_mode_global; uint64_t readonly = B_FALSE; int error; @@ -5069,8 +5069,8 @@ spa_import(const char *pool, nvlist_t *config, nvlist_ */ spa_async_suspend(spa); - zpool_get_rewind_policy(config, &policy); - if (policy.zrp_request & ZPOOL_DO_REWIND) + zpool_get_load_policy(config, &policy); + if (policy.zlp_rewind & ZPOOL_DO_REWIND) state = SPA_LOAD_RECOVER; spa->spa_config_source = SPA_CONFIG_SRC_TRYIMPORT; @@ -5080,9 +5080,9 @@ spa_import(const char *pool, nvlist_t *config, nvlist_ zfs_dbgmsg("spa_import: importing %s", pool); } else { zfs_dbgmsg("spa_import: importing %s, max_txg=%lld " - "(RECOVERY MODE)", pool, (longlong_t)policy.zrp_txg); + "(RECOVERY MODE)", pool, (longlong_t)policy.zlp_txg); } - error = spa_load_best(spa, state, policy.zrp_txg, policy.zrp_request); + error = spa_load_best(spa, state, policy.zlp_txg, policy.zlp_rewind); /* * Propagate anything learned while loading the pool and pass it @@ -5204,7 +5204,7 @@ spa_tryimport(nvlist_t *tryconfig) spa_t *spa; uint64_t state; int error; - zpool_rewind_policy_t policy; + zpool_load_policy_t policy; if (nvlist_lookup_string(tryconfig, ZPOOL_CONFIG_POOL_NAME, &poolname)) return (NULL); @@ -5220,16 +5220,14 @@ spa_tryimport(nvlist_t *tryconfig) 
spa_activate(spa, FREAD); /* - * Rewind pool if a max txg was provided. Note that even though we - * retrieve the complete rewind policy, only the rewind txg is relevant - * for tryimport. + * Rewind pool if a max txg was provided. */ - zpool_get_rewind_policy(spa->spa_config, &policy); - if (policy.zrp_txg != UINT64_MAX) { - spa->spa_load_max_txg = policy.zrp_txg; + zpool_get_load_policy(spa->spa_config, &policy); + if (policy.zlp_txg != UINT64_MAX) { + spa->spa_load_max_txg = policy.zlp_txg; spa->spa_extreme_rewind = B_TRUE; zfs_dbgmsg("spa_tryimport: importing %s, max_txg=%lld", - poolname, (longlong_t)policy.zrp_txg); + poolname, (longlong_t)policy.zlp_txg); } else { zfs_dbgmsg("spa_tryimport: importing %s", poolname); } Modified: vendor-sys/illumos/dist/uts/common/sys/fs/zfs.h ============================================================================== --- vendor-sys/illumos/dist/uts/common/sys/fs/zfs.h Wed Mar 28 22:10:06 2018 (r331705) +++ vendor-sys/illumos/dist/uts/common/sys/fs/zfs.h Wed Mar 28 22:16:51 2018 (r331706) @@ -491,7 +491,7 @@ typedef enum { #define ZPL_VERSION_USERSPACE ZPL_VERSION_4 #define ZPL_VERSION_SA ZPL_VERSION_5 -/* Rewind request information */ +/* Rewind policy information */ #define ZPOOL_NO_REWIND 1 /* No policy - default behavior */ #define ZPOOL_NEVER_REWIND 2 /* Do not search for best txg or rewind */ #define ZPOOL_TRY_REWIND 4 /* Search for best txg, but do not rewind */ @@ -500,12 +500,12 @@ typedef enum { #define ZPOOL_REWIND_MASK 28 /* All the possible rewind bits */ #define ZPOOL_REWIND_POLICIES 31 /* All the possible policy bits */ -typedef struct zpool_rewind_policy { - uint32_t zrp_request; /* rewind behavior requested */ - uint64_t zrp_maxmeta; /* max acceptable meta-data errors */ - uint64_t zrp_maxdata; /* max acceptable data errors */ - uint64_t zrp_txg; /* specific txg to load */ -} zpool_rewind_policy_t; +typedef struct zpool_load_policy { + uint32_t zlp_rewind; /* rewind policy requested */ + uint64_t 
zlp_maxmeta; /* max acceptable meta-data errors */ + uint64_t zlp_maxdata; /* max acceptable data errors */ + uint64_t zlp_txg; /* specific txg to load */ +} zpool_load_policy_t; /* * The following are configuration names used in the nvlist describing a pool's @@ -593,12 +593,12 @@ typedef struct zpool_rewind_policy { #define ZPOOL_CONFIG_FRU "fru" #define ZPOOL_CONFIG_AUX_STATE "aux_state" -/* Rewind policy parameters */ -#define ZPOOL_REWIND_POLICY "rewind-policy" -#define ZPOOL_REWIND_REQUEST "rewind-request" -#define ZPOOL_REWIND_REQUEST_TXG "rewind-request-txg" -#define ZPOOL_REWIND_META_THRESH "rewind-meta-thresh" -#define ZPOOL_REWIND_DATA_THRESH "rewind-data-thresh" +/* Pool load policy parameters */ +#define ZPOOL_LOAD_POLICY "load-policy" +#define ZPOOL_LOAD_REWIND_POLICY "load-rewind-policy" +#define ZPOOL_LOAD_REQUEST_TXG "load-request-txg" +#define ZPOOL_LOAD_META_THRESH "load-meta-thresh" +#define ZPOOL_LOAD_DATA_THRESH "load-data-thresh" /* Rewind data discovered */ #define ZPOOL_CONFIG_LOAD_TIME "rewind_txg_ts" From owner-svn-src-vendor@freebsd.org Wed Mar 28 22:16:53 2018 Return-Path: Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4DDC5F4E010; Wed, 28 Mar 2018 22:16:53 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EED3885BC9; Wed, 28 Mar 2018 22:16:52 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with 
ESMTPS id E9F4E10F00; Wed, 28 Mar 2018 22:16:52 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2SMGq3Y035982; Wed, 28 Mar 2018 22:16:52 GMT (envelope-from mav@FreeBSD.org) Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SMGqUK035976; Wed, 28 Mar 2018 22:16:52 GMT (envelope-from mav@FreeBSD.org) Message-Id: <201803282216.w2SMGqUK035976@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f From: Alexander Motin Date: Wed, 28 Mar 2018 22:16:52 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org Subject: svn commit: r331706 - vendor-sys/illumos/dist/common/zfs vendor-sys/illumos/dist/uts/common/fs/zfs vendor-sys/illumos/dist/uts/common/sys/fs vendor/illumos/dist/cmd/zdb vendor/illumos/dist/cmd/zpoo... X-SVN-Group: vendor X-SVN-Commit-Author: mav X-SVN-Commit-Paths: vendor-sys/illumos/dist/common/zfs vendor-sys/illumos/dist/uts/common/fs/zfs vendor-sys/illumos/dist/uts/common/sys/fs vendor/illumos/dist/cmd/zdb vendor/illumos/dist/cmd/zpool vendor/illumos/dist/lib... X-SVN-Commit-Revision: 331706 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-vendor@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the vendor work area tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Mar 2018 22:16:53 -0000 Author: mav Date: Wed Mar 28 22:16:51 2018 New Revision: 331706 URL: https://svnweb.freebsd.org/changeset/base/331706 Log: 9235 rename zpool_rewind_policy_t to zpool_load_policy_t illumos/illumos-gate@5dafeea3ebd2dd77affc802bcb90f63faf01589f We want to be able to pass various settings during import/open of a pool, which are not only related to rewind. 
Instead of adding a new policy and duplicating a bunch of code, we should just rename rewind_policy to a more generic term like load_policy. For instance, we'd like to set spa->spa_import_flags from the nvlist, rather than from a flags parameter passed to spa_import, because in some cases we want those flags not only for the import case, but also for the open case. One such flag could be ZFS_IMPORT_MISSING_LOG (as used in zdb) which would allow zfs to open a pool when logs are missing. Reviewed by: Matt Ahrens Reviewed by: George Wilson Approved by: Robert Mustacchi Author: Pavel Zakharov Modified: vendor/illumos/dist/cmd/zdb/zdb.c vendor/illumos/dist/cmd/zpool/zpool_main.c vendor/illumos/dist/lib/libzfs/common/libzfs.h vendor/illumos/dist/lib/libzfs/common/libzfs_import.c vendor/illumos/dist/lib/libzfs/common/libzfs_pool.c Changes in other areas also in this revision: Modified: vendor-sys/illumos/dist/common/zfs/zfs_comutil.c vendor-sys/illumos/dist/common/zfs/zfs_comutil.h vendor-sys/illumos/dist/uts/common/fs/zfs/spa.c vendor-sys/illumos/dist/uts/common/sys/fs/zfs.h Modified: vendor/illumos/dist/cmd/zdb/zdb.c ============================================================================== --- vendor/illumos/dist/cmd/zdb/zdb.c Wed Mar 28 22:10:06 2018 (r331705) +++ vendor/illumos/dist/cmd/zdb/zdb.c Wed Mar 28 22:16:51 2018 (r331706) @@ -5183,8 +5183,8 @@ main(int argc, char **argv) (dump_opt['X'] ? 
ZPOOL_EXTREME_REWIND : 0); if (nvlist_alloc(&policy, NV_UNIQUE_NAME_TYPE, 0) != 0 || - nvlist_add_uint64(policy, ZPOOL_REWIND_REQUEST_TXG, max_txg) != 0 || - nvlist_add_uint32(policy, ZPOOL_REWIND_REQUEST, rewind) != 0) + nvlist_add_uint64(policy, ZPOOL_LOAD_REQUEST_TXG, max_txg) != 0 || + nvlist_add_uint32(policy, ZPOOL_LOAD_REWIND_POLICY, rewind) != 0) fatal("internal error: %s", strerror(ENOMEM)); error = 0; @@ -5201,7 +5201,7 @@ main(int argc, char **argv) } if (nvlist_add_nvlist(cfg, - ZPOOL_REWIND_POLICY, policy) != 0) { + ZPOOL_LOAD_POLICY, policy) != 0) { fatal("can't open '%s': %s", target, strerror(ENOMEM)); } Modified: vendor/illumos/dist/cmd/zpool/zpool_main.c ============================================================================== --- vendor/illumos/dist/cmd/zpool/zpool_main.c Wed Mar 28 22:10:06 2018 (r331705) +++ vendor/illumos/dist/cmd/zpool/zpool_main.c Wed Mar 28 22:16:51 2018 (r331706) @@ -2325,8 +2325,9 @@ zpool_do_import(int argc, char **argv) /* In the future, we can capture further policy and include it here */ if (nvlist_alloc(&policy, NV_UNIQUE_NAME, 0) != 0 || - nvlist_add_uint64(policy, ZPOOL_REWIND_REQUEST_TXG, txg) != 0 || - nvlist_add_uint32(policy, ZPOOL_REWIND_REQUEST, rewind_policy) != 0) + nvlist_add_uint64(policy, ZPOOL_LOAD_REQUEST_TXG, txg) != 0 || + nvlist_add_uint32(policy, ZPOOL_LOAD_REWIND_POLICY, + rewind_policy) != 0) goto error; if (searchdirs == NULL) { @@ -2451,7 +2452,7 @@ zpool_do_import(int argc, char **argv) if (do_destroyed && pool_state != POOL_STATE_DESTROYED) continue; - verify(nvlist_add_nvlist(config, ZPOOL_REWIND_POLICY, + verify(nvlist_add_nvlist(config, ZPOOL_LOAD_POLICY, policy) == 0); if (argc == 0) { @@ -3939,8 +3940,10 @@ zpool_do_clear(int argc, char **argv) /* In future, further rewind policy choices can be passed along here */ if (nvlist_alloc(&policy, NV_UNIQUE_NAME, 0) != 0 || - nvlist_add_uint32(policy, ZPOOL_REWIND_REQUEST, rewind_policy) != 0) + nvlist_add_uint32(policy, 
ZPOOL_LOAD_REWIND_POLICY, + rewind_policy) != 0) { return (1); + } pool = argv[0]; device = argc == 2 ? argv[1] : NULL; Modified: vendor/illumos/dist/lib/libzfs/common/libzfs.h ============================================================================== --- vendor/illumos/dist/lib/libzfs/common/libzfs.h Wed Mar 28 22:10:06 2018 (r331705) +++ vendor/illumos/dist/lib/libzfs/common/libzfs.h Wed Mar 28 22:16:51 2018 (r331706) @@ -393,7 +393,7 @@ typedef struct importargs { int can_be_active : 1; /* can the pool be active? */ int unique : 1; /* does 'poolname' already exist? */ int exists : 1; /* set on return if pool already exists */ - nvlist_t *policy; /* rewind policy (rewind txg, etc.) */ + nvlist_t *policy; /* load policy (max txg, rewind, etc.) */ } importargs_t; extern nvlist_t *zpool_search_import(libzfs_handle_t *, importargs_t *); Modified: vendor/illumos/dist/lib/libzfs/common/libzfs_import.c ============================================================================== --- vendor/illumos/dist/lib/libzfs/common/libzfs_import.c Wed Mar 28 22:10:06 2018 (r331705) +++ vendor/illumos/dist/lib/libzfs/common/libzfs_import.c Wed Mar 28 22:16:51 2018 (r331706) @@ -21,7 +21,7 @@ /* * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved. - * Copyright (c) 2012, 2016 by Delphix. All rights reserved. + * Copyright (c) 2012, 2017 by Delphix. All rights reserved. * Copyright 2015 RackTop Systems. * Copyright 2017 Nexenta Systems, Inc. 
*/ @@ -748,7 +748,7 @@ get_configs(libzfs_handle_t *hdl, pool_list_t *pl, boo } if (policy != NULL) { - if (nvlist_add_nvlist(config, ZPOOL_REWIND_POLICY, + if (nvlist_add_nvlist(config, ZPOOL_LOAD_POLICY, policy) != 0) goto nomem; } Modified: vendor/illumos/dist/lib/libzfs/common/libzfs_pool.c ============================================================================== --- vendor/illumos/dist/lib/libzfs/common/libzfs_pool.c Wed Mar 28 22:10:06 2018 (r331705) +++ vendor/illumos/dist/lib/libzfs/common/libzfs_pool.c Wed Mar 28 22:16:51 2018 (r331706) @@ -1714,7 +1714,7 @@ zpool_import_props(libzfs_handle_t *hdl, nvlist_t *con nvlist_t *props, int flags) { zfs_cmd_t zc = { 0 }; - zpool_rewind_policy_t policy; + zpool_load_policy_t policy; nvlist_t *nv = NULL; nvlist_t *nvinfo = NULL; nvlist_t *missing = NULL; @@ -1786,7 +1786,7 @@ zpool_import_props(libzfs_handle_t *hdl, nvlist_t *con zcmd_free_nvlists(&zc); - zpool_get_rewind_policy(config, &policy); + zpool_get_load_policy(config, &policy); if (error) { char desc[1024]; @@ -1795,7 +1795,7 @@ zpool_import_props(libzfs_handle_t *hdl, nvlist_t *con * Dry-run failed, but we print out what success * looks like if we found a best txg */ - if (policy.zrp_request & ZPOOL_TRY_REWIND) { + if (policy.zlp_rewind & ZPOOL_TRY_REWIND) { zpool_rewind_exclaim(hdl, newname ? origname : thename, B_TRUE, nv); nvlist_free(nv); @@ -1888,10 +1888,10 @@ zpool_import_props(libzfs_handle_t *hdl, nvlist_t *con ret = -1; else if (zhp != NULL) zpool_close(zhp); - if (policy.zrp_request & + if (policy.zlp_rewind & (ZPOOL_DO_REWIND | ZPOOL_TRY_REWIND)) { zpool_rewind_exclaim(hdl, newname ? 
origname : thename, - ((policy.zrp_request & ZPOOL_TRY_REWIND) != 0), nv); + ((policy.zlp_rewind & ZPOOL_TRY_REWIND) != 0), nv); } nvlist_free(nv); return (0); @@ -3268,7 +3268,7 @@ zpool_clear(zpool_handle_t *zhp, const char *path, nvl zfs_cmd_t zc = { 0 }; char msg[1024]; nvlist_t *tgt; - zpool_rewind_policy_t policy; + zpool_load_policy_t policy; boolean_t avail_spare, l2cache; libzfs_handle_t *hdl = zhp->zpool_hdl; nvlist_t *nvi = NULL; @@ -3300,8 +3300,8 @@ zpool_clear(zpool_handle_t *zhp, const char *path, nvl &zc.zc_guid) == 0); } - zpool_get_rewind_policy(rewindnvl, &policy); - zc.zc_cookie = policy.zrp_request; + zpool_get_load_policy(rewindnvl, &policy); + zc.zc_cookie = policy.zlp_rewind; if (zcmd_alloc_dst_nvlist(hdl, &zc, zhp->zpool_config_size * 2) != 0) return (-1); @@ -3317,13 +3317,13 @@ zpool_clear(zpool_handle_t *zhp, const char *path, nvl } } - if (!error || ((policy.zrp_request & ZPOOL_TRY_REWIND) && + if (!error || ((policy.zlp_rewind & ZPOOL_TRY_REWIND) && errno != EPERM && errno != EACCES)) { - if (policy.zrp_request & + if (policy.zlp_rewind & (ZPOOL_DO_REWIND | ZPOOL_TRY_REWIND)) { (void) zcmd_read_dst_nvlist(hdl, &zc, &nvi); zpool_rewind_exclaim(hdl, zc.zc_name, - ((policy.zrp_request & ZPOOL_TRY_REWIND) != 0), + ((policy.zlp_rewind & ZPOOL_TRY_REWIND) != 0), nvi); nvlist_free(nvi); } From owner-svn-src-vendor@freebsd.org Wed Mar 28 22:43:55 2018 Return-Path: Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B8BE1F5032B; Wed, 28 Mar 2018 22:43:55 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 6C46287038; Wed, 28 Mar 
2018 22:43:55 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 673B81140A; Wed, 28 Mar 2018 22:43:55 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2SMht26051021; Wed, 28 Mar 2018 22:43:55 GMT (envelope-from mav@FreeBSD.org) Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SMhtES051020; Wed, 28 Mar 2018 22:43:55 GMT (envelope-from mav@FreeBSD.org) Message-Id: <201803282243.w2SMhtES051020@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f From: Alexander Motin Date: Wed, 28 Mar 2018 22:43:55 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org Subject: svn commit: r331708 - vendor-sys/illumos/dist/uts/common/fs/zfs X-SVN-Group: vendor-sys X-SVN-Commit-Author: mav X-SVN-Commit-Paths: vendor-sys/illumos/dist/uts/common/fs/zfs X-SVN-Commit-Revision: 331708 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-vendor@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the vendor work area tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Mar 2018 22:43:55 -0000 Author: mav Date: Wed Mar 28 22:43:55 2018 New Revision: 331708 URL: https://svnweb.freebsd.org/changeset/base/331708 Log: 9321 arc_loan_compressed_buf() can increment arc_loaned_bytes by the wrong value illumos/illumos-gate@9be12bd737714550277bd02b0c693db560976990 arc_loan_compressed_buf() increments arc_loaned_bytes by psize unconditionally. In the case of
zfs_compressed_arc_enabled=0, when the buf is returned via arc_return_buf(), if ARC_BUF_COMPRESSED(buf) is false, then arc_loaned_bytes is decremented by lsize, not psize. Switch to using arc_buf_size(buf), instead of psize, which will return psize or lsize, depending on the result of ARC_BUF_COMPRESSED(buf). Reviewed by: Matt Ahrens Reviewed by: George Wilson Approved by: Garrett D'Amore Author: Allan Jude Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/arc.c Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/arc.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/arc.c Wed Mar 28 22:29:06 2018 (r331707) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/arc.c Wed Mar 28 22:43:55 2018 (r331708) @@ -2557,7 +2557,7 @@ arc_loan_buf(spa_t *spa, boolean_t is_metadata, int si arc_buf_t *buf = arc_alloc_buf(spa, arc_onloan_tag, is_metadata ? ARC_BUFC_METADATA : ARC_BUFC_DATA, size); - arc_loaned_bytes_update(size); + arc_loaned_bytes_update(arc_buf_size(buf)); return (buf); } @@ -2569,7 +2569,7 @@ arc_loan_compressed_buf(spa_t *spa, uint64_t psize, ui arc_buf_t *buf = arc_alloc_compressed_buf(spa, arc_onloan_tag, psize, lsize, compression_type); - arc_loaned_bytes_update(psize); + arc_loaned_bytes_update(arc_buf_size(buf)); return (buf); } From owner-svn-src-vendor@freebsd.org Wed Mar 28 22:57:03 2018 Return-Path: Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9E602F51684; Wed, 28 Mar 2018 22:57:03 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4875887A7E; Wed, 28 Mar 2018 
22:57:03 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 3AF3D115BB; Wed, 28 Mar 2018 22:57:03 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2SMv3pe056221; Wed, 28 Mar 2018 22:57:03 GMT (envelope-from mav@FreeBSD.org) Received: (from mav@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SMv3WN056220; Wed, 28 Mar 2018 22:57:03 GMT (envelope-from mav@FreeBSD.org) Message-Id: <201803282257.w2SMv3WN056220@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f From: Alexander Motin Date: Wed, 28 Mar 2018 22:57:03 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org Subject: svn commit: r331710 - vendor-sys/illumos/dist/uts/common/fs/zfs X-SVN-Group: vendor-sys X-SVN-Commit-Author: mav X-SVN-Commit-Paths: vendor-sys/illumos/dist/uts/common/fs/zfs X-SVN-Commit-Revision: 331710 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-vendor@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the vendor work area tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Mar 2018 22:57:03 -0000 Author: mav Date: Wed Mar 28 22:57:02 2018 New Revision: 331710 URL: https://svnweb.freebsd.org/changeset/base/331710 Log: 9188 increase size of dbuf cache to reduce indirect block decompression illumos/illumos-gate@268bbb2a2fa79c36d4695d13a595ba50a7754b76 With compressed ARC (6950) we use up to 25% of our CPU to decompress indirect blocks, under a workload of random cached reads. 
To reduce this decompression cost, we would like to increase the size of the dbuf cache so that more indirect blocks can be stored uncompressed. If we are caching entire large files of recordsize=8K, the indirect blocks use 1/64th as much memory as the data blocks (assuming they have the same compression ratio). We suggest making the dbuf cache be 1/32nd of all memory, so that in this scenario we should be able to keep all the indirect blocks decompressed in the dbuf cache. (We want it to be more than the 1/64th that the indirect blocks would use because we need to cache other stuff in the dbuf cache as well.) In real world workloads, this won't help as dramatically as the example above, but we think it's still worth it because the risk of decreasing performance is low. The potential negative performance impact is that we will be slightly reducing the size of the ARC (by ~3%). Reviewed by: Dan Kimmel Reviewed by: Prashanth Sreenivasa Reviewed by: Paul Dagnelie Reviewed by: Sanjay Nadkarni Reviewed by: Allan Jude Reviewed by: Igor Kozhukhov Approved by: Garrett D'Amore Author: George Wilson Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c ============================================================================== --- vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c Wed Mar 28 22:50:05 2018 (r331709) +++ vendor-sys/illumos/dist/uts/common/fs/zfs/dbuf.c Wed Mar 28 22:57:02 2018 (r331710) @@ -85,10 +85,10 @@ static boolean_t dbuf_evict_thread_exit; */ static multilist_t *dbuf_cache; static refcount_t dbuf_cache_size; -uint64_t dbuf_cache_max_bytes = 100 * 1024 * 1024; +uint64_t dbuf_cache_max_bytes = 0; -/* Cap the size of the dbuf cache to log2 fraction of arc size. */ -int dbuf_cache_max_shift = 5; +/* Set the default size of the dbuf cache to log2 fraction of arc size. 
*/ +int dbuf_cache_shift = 5; /* * The dbuf cache uses a three-stage eviction policy: @@ -600,11 +600,15 @@ retry: mutex_init(&h->hash_mutexes[i], NULL, MUTEX_DEFAULT, NULL); /* - * Setup the parameters for the dbuf cache. We cap the size of the - * dbuf cache to 1/32nd (default) of the size of the ARC. + * Setup the parameters for the dbuf cache. We set the size of the + * dbuf cache to 1/32nd (default) of the size of the ARC. If the value + * has been set in /etc/system and it's not greater than the size of + * the ARC, then we honor that value. */ - dbuf_cache_max_bytes = MIN(dbuf_cache_max_bytes, - arc_max_bytes() >> dbuf_cache_max_shift); + if (dbuf_cache_max_bytes == 0 || + dbuf_cache_max_bytes >= arc_max_bytes()) { + dbuf_cache_max_bytes = arc_max_bytes() >> dbuf_cache_shift; + } /* * All entries are queued via taskq_dispatch_ent(), so min/maxalloc From owner-svn-src-vendor@freebsd.org Wed Mar 28 23:12:04 2018 Return-Path: Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 793E5F52BEF; Wed, 28 Mar 2018 23:12:04 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2A9896866A; Wed, 28 Mar 2018 23:12:04 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 257D2117B7; Wed, 28 Mar 2018 23:12:04 +0000 (UTC) (envelope-from mav@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) 
	with ESMTP id w2SNC4bB065367;
	Wed, 28 Mar 2018 23:12:04 GMT
	(envelope-from mav@FreeBSD.org)
Received: (from mav@localhost)
	by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SNC4iL065366;
	Wed, 28 Mar 2018 23:12:04 GMT
	(envelope-from mav@FreeBSD.org)
Message-Id: <201803282312.w2SNC4iL065366@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f
From: Alexander Motin
Date: Wed, 28 Mar 2018 23:12:04 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org
Subject: svn commit: r331712 - vendor-sys/illumos/dist/uts/common/fs/zfs vendor/illumos/dist/cmd/ztest
X-SVN-Group: vendor-sys
X-SVN-Commit-Author: mav
X-SVN-Commit-Paths: vendor-sys/illumos/dist/uts/common/fs/zfs vendor/illumos/dist/cmd/ztest
X-SVN-Commit-Revision: 331712
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-vendor@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: SVN commit messages for the vendor work area tree
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 28 Mar 2018 23:12:04 -0000

Author: mav
Date: Wed Mar 28 23:12:03 2018
New Revision: 331712
URL: https://svnweb.freebsd.org/changeset/base/331712

Log:
  9280 Assertion failure while running removal_with_ganging test with 4K devices

  illumos/illumos-gate@243952c7eeef020886e3e2e3df99a513df40584a

  Reviewed by: George Wilson
  Reviewed by: John Kennedy
  Approved by: Garrett D'Amore
  Author: Matt Ahrens

Modified:
  vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c

Changes in other areas also in this revision:
Modified:
  vendor/illumos/dist/cmd/ztest/ztest.c

Modified: vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c
==============================================================================
--- vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c	Wed Mar 28 23:05:48 2018	(r331711)
+++ vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c	Wed Mar 28 23:12:03 2018	(r331712)
@@ -41,7 +41,7 @@
 	((flags) & (METASLAB_GANG_CHILD | METASLAB_GANG_HEADER))
 
 uint64_t metaslab_aliquot = 512ULL << 10;
-uint64_t metaslab_gang_bang = SPA_MAXBLOCKSIZE + 1;	/* force gang blocks */
+uint64_t metaslab_force_ganging = SPA_MAXBLOCKSIZE + 1;	/* force gang blocks */
 
 /*
  * Since we can touch multiple metaslabs (and their respective space maps)
@@ -3080,7 +3080,7 @@ metaslab_alloc_dva(spa_t *spa, metaslab_class_t *mc, u
 	/*
 	 * For testing, make some blocks above a certain size be gang blocks.
 	 */
-	if (psize >= metaslab_gang_bang && (ddi_get_lbolt() & 3) == 0) {
+	if (psize >= metaslab_force_ganging && (ddi_get_lbolt() & 3) == 0) {
 		metaslab_trace_add(zal, NULL, NULL, psize, d, TRACE_FORCE_GANG);
 		return (SET_ERROR(ENOSPC));
 	}

From owner-svn-src-vendor@freebsd.org Wed Mar 28 23:12:04 2018
Return-Path:
Delivered-To: svn-src-vendor@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mailman.ysv.freebsd.org (Postfix) with ESMTP id B4533F52BF7;
	Wed, 28 Mar 2018 23:12:04 +0000 (UTC)
	(envelope-from mav@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 675526866C;
	Wed, 28 Mar 2018 23:12:04 +0000 (UTC)
	(envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client did not present a certificate)
	by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 62524117B9;
	Wed, 28 Mar 2018 23:12:04 +0000 (UTC)
	(envelope-from mav@FreeBSD.org)
Received: from repo.freebsd.org ([127.0.1.37])
	by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w2SNC4W5065373;
	Wed, 28 Mar 2018 23:12:04 GMT
	(envelope-from mav@FreeBSD.org)
Received: (from mav@localhost)
	by repo.freebsd.org (8.15.2/8.15.2/Submit) id w2SNC4R4065372;
	Wed, 28 Mar 2018 23:12:04 GMT
	(envelope-from mav@FreeBSD.org)
Message-Id: <201803282312.w2SNC4R4065372@repo.freebsd.org>
X-Authentication-Warning: repo.freebsd.org: mav set sender to mav@FreeBSD.org using -f
From: Alexander Motin
Date: Wed, 28 Mar 2018 23:12:04 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-vendor@freebsd.org
Subject: svn commit: r331712 - vendor-sys/illumos/dist/uts/common/fs/zfs vendor/illumos/dist/cmd/ztest
X-SVN-Group: vendor
X-SVN-Commit-Author: mav
X-SVN-Commit-Paths: vendor-sys/illumos/dist/uts/common/fs/zfs vendor/illumos/dist/cmd/ztest
X-SVN-Commit-Revision: 331712
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-vendor@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: SVN commit messages for the vendor work area tree
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 28 Mar 2018 23:12:05 -0000

Author: mav
Date: Wed Mar 28 23:12:03 2018
New Revision: 331712
URL: https://svnweb.freebsd.org/changeset/base/331712

Log:
  9280 Assertion failure while running removal_with_ganging test with 4K devices

  illumos/illumos-gate@243952c7eeef020886e3e2e3df99a513df40584a

  Reviewed by: George Wilson
  Reviewed by: John Kennedy
  Approved by: Garrett D'Amore
  Author: Matt Ahrens

Modified:
  vendor/illumos/dist/cmd/ztest/ztest.c

Changes in other areas also in this revision:
Modified:
  vendor-sys/illumos/dist/uts/common/fs/zfs/metaslab.c

Modified: vendor/illumos/dist/cmd/ztest/ztest.c
==============================================================================
--- vendor/illumos/dist/cmd/ztest/ztest.c	Wed Mar 28 23:05:48 2018	(r331711)
+++ vendor/illumos/dist/cmd/ztest/ztest.c	Wed Mar 28 23:12:03 2018	(r331712)
@@ -162,7 +162,7 @@ typedef struct ztest_shared_opts {
 	int zo_init;
 	uint64_t zo_time;
 	uint64_t zo_maxloops;
-	uint64_t zo_metaslab_gang_bang;
+	uint64_t zo_metaslab_force_ganging;
 } ztest_shared_opts_t;
 
 static const ztest_shared_opts_t ztest_opts_defaults = {
@@ -184,10 +184,10 @@ static const ztest_shared_opts_t ztest_opts_defaults =
 	.zo_init = 1,
 	.zo_time = 300,	/* 5 minutes */
 	.zo_maxloops = 50,	/* max loops during spa_freeze() */
-	.zo_metaslab_gang_bang = 32 << 10
+	.zo_metaslab_force_ganging = 32 << 10
 };
 
-extern uint64_t metaslab_gang_bang;
+extern uint64_t metaslab_force_ganging;
 extern uint64_t metaslab_df_alloc_threshold;
 extern uint64_t zfs_deadman_synctime_ms;
 extern int metaslab_preload_limit;
@@ -565,12 +565,12 @@ usage(boolean_t requested)
 	const ztest_shared_opts_t *zo = &ztest_opts_defaults;
 
 	char nice_vdev_size[NN_NUMBUF_SZ];
-	char nice_gang_bang[NN_NUMBUF_SZ];
+	char nice_force_ganging[NN_NUMBUF_SZ];
 	FILE *fp = requested ? stdout : stderr;
 
 	nicenum(zo->zo_vdev_size, nice_vdev_size, sizeof (nice_vdev_size));
-	nicenum(zo->zo_metaslab_gang_bang, nice_gang_bang,
-	    sizeof (nice_gang_bang));
+	nicenum(zo->zo_metaslab_force_ganging, nice_force_ganging,
+	    sizeof (nice_force_ganging));
 
 	(void) fprintf(fp, "Usage: %s\n"
 	    "\t[-v vdevs (default: %llu)]\n"
@@ -605,7 +605,7 @@ usage(boolean_t requested)
 	    zo->zo_raidz_parity,	/* -R */
 	    zo->zo_datasets,	/* -d */
 	    zo->zo_threads,	/* -t */
-	    nice_gang_bang,	/* -g */
+	    nice_force_ganging,	/* -g */
 	    zo->zo_init,	/* -i */
 	    (u_longlong_t)zo->zo_killrate,	/* -k */
 	    zo->zo_pool,	/* -p */
@@ -674,8 +674,8 @@ process_options(int argc, char **argv)
 			zo->zo_threads = MAX(1, value);
 			break;
 		case 'g':
-			zo->zo_metaslab_gang_bang = MAX(SPA_MINBLOCKSIZE << 1,
-			    value);
+			zo->zo_metaslab_force_ganging =
+			    MAX(SPA_MINBLOCKSIZE << 1, value);
 			break;
 		case 'i':
 			zo->zo_init = value;
@@ -6418,7 +6418,7 @@ main(int argc, char **argv)
 	zs = ztest_shared;
 
 	if (fd_data_str) {
-		metaslab_gang_bang = ztest_opts.zo_metaslab_gang_bang;
+		metaslab_force_ganging = ztest_opts.zo_metaslab_force_ganging;
 		metaslab_df_alloc_threshold =
 		    zs->zs_metaslab_df_alloc_threshold;