From: "Valeri Galtsev"
Reply-To: galtsev@kicp.uchicago.edu
To: "David Christensen"
Cc: freebsd-questions@freebsd.org
Date: Sat, 23 Apr 2016 09:39:27 -0500 (CDT)
Subject: Re: Storage cluster advise, anybody?
Message-ID: <54220.76.193.16.109.1461422367.squirrel@cosmo.uchicago.edu>
In-Reply-To: <571AD606.4000203@holgerdanske.com>
References: <29462.128.135.52.6.1461352625.squirrel@cosmo.uchicago.edu> <571AD606.4000203@holgerdanske.com>

On Fri, April 22, 2016 8:55 pm, David Christensen wrote:
> On 04/22/2016 12:17 PM, Valeri Galtsev wrote:
>> I would like to ask everybody: what would you advise to use as a
>> storage cluster, or as a distributed filesystem.
> ...
>> My requirements are:
>>
>> 1. I would like to have one big (say, comparable to a petabyte)
>> filesystem, accessible on more than one machine, composed of disk
>> space leftovers on a bunch of machines having 1 gigabit per second
>> ethernet connections.
>>
>> 2. It can be a bit slow, as befits a filesystem one would use for
>> backups (say, using bacula or bareos) and/or for long term storage
>> of large datasets, portions of which can be copied over to faster
>> storage for processing if necessary. I am thinking of 1-2 TB of
>> data written to it daily.
>>
>> 3. It would be great to have it resilient to a single machine
>> failure or reboot.
>>
>> 4. Metadata machines should be redundant (or at least a backup
>> metadata host should be manually convertible into the master
>> metadata host if a fatal failure of the master, or corruption of
>> its data, happens).
>>
>> What I would like to avoid/exclude:
>>
>> 1. Proprietary commercial solutions, as:
>>
>> a. I would like to stay on as minimal a budget as possible;
>> b. I want to be able to predict that it will exist for a long
>> time, and I have better experience with my predictions of this
>> sort about open source projects as opposed to proprietary ones.
>>
>> 2. Open source solutions using portions of proprietary closed
>> source binaries/libraries (e.g., I would like to stay away from
>> google proprietary code/binaries/libraries/modules).
>>
>> 3. Kernel level modifications. I really would like to have this as
>> independent of the OS as I can, or rather available on multiple
>> OSes (though I do not like Java based things - just my personal
>> experience with some of them). I have a bunch of Linux boxes and a
>> bunch of FreeBSD boxes, and I do not want to exclude either of
>> them if possible. Also, the need for a custom Linux kernel
>> specifically scares me: Linux kernels get critical updates often,
>> and having customizations lag behind a needed critical update is
>> as unpleasant as having to reboot the machine because of a kernel
>> update.
>>
>> I'm not too scared of "split nature" projects: proprietary
>> projects having an open source satellite. I have mixed experience
>> with those (using the open source satellite, I mean). Some of them
>> are indeed not neglected, and even though you may be missing some
>> features the commercial counterpart has, some are really great:
>> they are just missing commercial support, and maybe have somewhat
>> sparse documentation, thus making you invest more effort into
>> making them work, which I don't mind: I can earn my sysadmin's
>> salary here. I would say I have more often had good experiences
>> with those than bad ones (and I have a list of early indications
>> of potential bad outcomes, so I can more or less predict my future
>> with this kind of project).
>>
>> ... moosefs. ...
>
> If you want a solution that works on both BSD and Linux, FUSE comes
> to mind:
>
> https://en.wikipedia.org/wiki/Filesystem_in_Userspace

Indeed. That is what moosefs is taking advantage of. Though I'm
through with moosefs (at least for quite some time).
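For list members who haven't looked at FUSE: the entire filesystem
is an ordinary userspace process talking to the small fuse kernel
module that both Linux and FreeBSD already provide, which is exactly
why no kernel modifications are needed. A minimal sketch using the
fusepy Python binding (the script name and the "/hello" file are
invented here purely for illustration):

# hellofs.py - a minimal read-only FUSE filesystem via fusepy
# (pip install fusepy); serves one in-memory file, /hello.
import errno
import stat
import sys

from fuse import FUSE, FuseOSError, Operations

class HelloFS(Operations):
    DATA = b"served entirely from a userspace process\n"

    def getattr(self, path, fh=None):
        if path == "/":
            return dict(st_mode=(stat.S_IFDIR | 0o755), st_nlink=2)
        if path == "/hello":
            return dict(st_mode=(stat.S_IFREG | 0o444),
                        st_nlink=1, st_size=len(self.DATA))
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return [".", "..", "hello"]

    def read(self, path, size, offset, fh):
        return self.DATA[offset:offset + size]

if __name__ == "__main__":
    # usage: python hellofs.py <mountpoint>
    FUSE(HelloFS(), sys.argv[1], foreground=True, ro=True)

Mounting is just "python hellofs.py /mnt/test"; unmount with
umount(8) as usual. Moosefs's client does essentially this, only
backed by the chunkservers over the network instead of memory.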
> Are all the file system storage member computers located in one
> room? In one building? On one campus?

Yes. Same group of racks, connected to the same stack of switches
(1 Gbps connections).

> You say "composed of disk space leftovers on a bunch of machines".
> How many machines? What form of leftovers -- whole disks, whole
> partitions, block-contiguous files, non-contiguous files?

About a dozen machines currently. Mostly large partitions on RAIDs,
sometimes whole RAIDs or whole disks. The majority of them differ in
size.

> Do you need checksumming, mirroring, RAID, caching, whatever?

Yes if possible, but not a must (which is why I didn't list it in my
"requirements"). Yes for RAID (RAID is in a sense a subset of my
"fault tolerance to one machine failure or reboot" requirement).

> What are your degradation expectations in the face of missing/
> unresponsive member machines? Failing drives/files?

I expect to have only one trouble at a time, meaning one machine at
a time. If a machine adds more than one "chunk of space" to the
bucket, having all of those chunks drop out of the bucket
simultaneously shouldn't be catastrophic. (Moosefs is pretty good
about that, as I understand it: if you set the "goal" to 2, meaning
each chunk of data has two copies somewhere, it makes sure those two
copies live on different physical machines.)

> Are you going to serve the file system via NFS? Samba? Other?

Doesn't matter. Whichever it is able to do, I will use.

> You indicate 1-2 TB/day of writes, expected to be backups and
> dataset archives. What sort of read traffic do you expect? How many
> connected clients? How many hitting it at the same time? Any usage
> spikes (say, everyone downloading files at work start, everyone
> uploading or backing up at quitting time)?

The write pattern is mostly spread randomly over time, as one
usually arranges backup scheduling. The read pattern: sometimes
small portions of data will be accessed at the best available read
rate (as when you restore small directories from backup), and
sometimes there may be a steady, longer read, most likely 1-10 TB in
size, to faster storage for processing - at whatever read speed is
available.
Thanks for asking. I didn't mention these details in my original
question so as not to overburden it with points that are of lower
priority for me, and that I can live with once the main goals are
met.

Valeri

> David
>
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to
> "freebsd-questions-unsubscribe@freebsd.org"

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++