Date: Sat, 9 Nov 2013 10:37:58 -0700
From: Alan Somers <asomers@freebsd.org>
To: symbolics@gmx.com
Cc: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: freebsd perf testing
Message-ID: <CAOtMX2gYw5OO8pUp1-aKHWKibL_puJkM02bVK-aY-tUpfojpFw@mail.gmail.com>
In-Reply-To: <20131109133754.GA11249@lemon>
References: <527C462F.9040707@elischer.org> <CA%2Bq%2BTcrVYbQVjTKQrAreksZRUEtBF-SUUWp6ojxY_mRZduVCbA@mail.gmail.com> <527D4952.7040407@freebsd.org> <20131109133754.GA11249@lemon>
On Sat, Nov 9, 2013 at 6:37 AM, <symbolics@gmx.com> wrote:
> On Fri, Nov 08, 2013 at 12:28:02PM -0800, Julian Elischer wrote:
>> On 11/8/13, 1:54 AM, Olivier Cochard-Labbé wrote:
>> > On Fri, Nov 8, 2013 at 3:02 AM, Julian Elischer
>> > <julian@elischer.org> wrote:
>> >
>> > Some time ago someone showed some freebsd performance graphs
>> > graphed against time.
>> > He had them up on a website that was updated each day or so.
>> >
>> > I think they were network perf tests but I'm not sure.
>> > He indicated that he was going to continue the daily testing
>> > but I've not seen any mention of them since.
>> >
>> > If you know who that was or how to find him let me (or gnn) know...
>> >
>> >
>> > Hi Julian,
>> >
>> > Perhaps you are referring to my network performance graphs on this
>> > thread:
>> > http://lists.freebsd.org/pipermail/freebsd-current/2013-April/041323.html
>> >
>>
>> yes, you are the person we are looking for.
>> In yesterday's 'vendor summit' we were discussing performance
>> monitoring and your methodology was cited as one worth looking at.
>>
>> The idea of graphing the output of various performance tests against
>> svn commit number is a very good one.
>> I think it might even be worth doing these tests daily, and putting
>> the output onto a web site, showing the last month, the last year and
>> the whole range.
>> It would even be interesting to put out 'xplot' files so that people
>> can zoom in and out using xplot to see exactly which revision was
>> responsible for regressions or problems.
>>
>> George.. this is what we mentioned at the meeting yesterday.
>>
>> Julian
>>
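A minimal sketch of the per-revision data collection Julian describes
above might look like the script below. It is only an illustration:
run-some-benchmark is a placeholder for whatever test is being tracked,
the output path is arbitrary, and it assumes /usr/src is a Subversion
checkout so that svnversion(1) can report the revision under test.

    #!/bin/sh
    # Append one "revision <tab> result" line per run, so the history
    # can later be graphed against svn commit number (gnuplot, xplot,
    # etc.).  run-some-benchmark is a placeholder for whatever test is
    # being tracked; /usr/src is assumed to be an svn working copy.
    rev=$(svnversion /usr/src 2>/dev/null || echo unknown)
    result=$(run-some-benchmark)
    printf '%s\t%s\n' "$rev" "$result" >> /var/db/perf-history.tsv
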
> As it happens I've been thinking over a design for something along
> these lines recently. It's just some ideas at the moment but it might
> be of interest to others. Forgive me; it's a long E-mail and it gets a
> bit pie-in-the-sky too.
>
> I was prompted to think about the problem in the first place because I
> read commit mail and I see performance-related changes going into the
> tree from time to time. These changes often do not come with any
> specific data, and when they do it's normally quite narrow in focus.
> For instance, an organisation contributes performance improvements
> specific to their workloads and without interest in anyone else's
> (fair enough).
>
> Initially, what I wanted was a way of viewing how performance changed
> for a number of workloads on a commit-by-commit basis. This sounds
> very much like what you are after.
>
> Anyway, after thinking about this for some time it occurred to me that
> much of the infrastructure required to do performance testing could be
> generalised to all sorts of software experiments, e.g. software
> builds, regression tests, and so on. So, my first conclusion was:
> build an experimentation framework within which performance is one
> aspect.
>
> Having decided this, I thought about the scope of experiments I wanted
> to make. For instance, it would be good to test at least every
> supported platform. On top of that I would like to be able to vary the
> relevant configuration options too. Taking the product of commit,
> platform, and n configuration options (not to mention compilers,
> etc.), you start to get some pretty big numbers. The numbers grow far
> too fast, and no person or even organisation could feasibly cover the
> hardware resources required to test every permutation. This led me to
> my next conclusion: build a distributed system that allows anyone to
> contribute their hardware to the cause. Collectively the project,
> vendors, and users could tackle a big chunk of this.
>
> My rough sketch for how this would work is as follows. A bootable USB
> image would be made for all platforms. This would boot up, connect to
> the network and check out a repository. The first phase of the process
> would be to profile what the host can offer. For example, we might
> have experiments that require four identical hard drives, or a
> particular CPU type, and so on. Shell scripts or short programmes
> would be written, e.g. "has-atom-cpu", with these returning either 1
> or 0.
>
> The results of this profiling would be submitted to a service. The
> service matches the host with available experiments based on its
> particular capabilities and the current experimental priorities laid
> down by the developers. A priority system would allow the system to be
> controlled precisely. If, for instance, major work is done to the VM
> subsystem, relevant experiments could be prioritised over others for a
> period.
>
> Once a decision on the experiment to conduct has been made, the
> relevant image must be deployed to the system. Free space on the USB
> device would be used as a staging area, with a scripted installation
> occurring after reboot. The images would need to be built somewhere,
> since it doesn't make sense to rebuild the system endlessly,
> especially if we're including low-powered embedded devices (which we
> should be). One possible solution to this would be to use more
> powerful contributed hosts to cross-build images and make them
> available for download.
>
> Finally, the experiments would be conducted. Data produced would be
> submitted back to the project using another service, where it could be
> processed and analysed. To keep things flexible this would just
> consist of a bunch of flat files, rather than trying to find some
> standardised, one-size-fits-all format. Statistics and graphics could
> be produced from the data with R/Julia/etc. In particular I imagined
> DTrace scripts being attached to experiments so that specific data can
> be collected. If something warranting further investigation is found,
> the experiment could be amended with additional scripts, allowing
> developers to drill down into issues.
>
> After some time the process repeats, with a new image deployed and new
> experiments conducted. I envisage some means of identifying individual
> hosts so that a developer could repeat the same experiment on the same
> host if desired.
>
> Among the many potential problems with this plan, a big one is how we
> would protect contributors' privacy and security whilst still having a
> realistic test environment. I guess the only way to do this would be
> to (1) tell users that they should treat the system as if it's hacked
> and put it in its own network, and (2) prevent the experiment images
> from accessing anything besides FreeBSD.org.
>
> In relation to network performance, this might not be much good, since
> multiple hosts might be necessary. It might be possible to build that
> into the design too, but it's already more than complicated enough.
>
> Anyhow, I think such a facility could be an asset if it could be
> built. I may try and put this together, but I've committed myself to
> enough things recently that I can't take this any further at the
> moment. I'd be interested to hear what people think, naturally.
>
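To make the capability-probe idea above concrete, a probe such as
"has-atom-cpu" could be as small as the sketch below. This is only an
illustration: the hw.model sysctl is one plausible way to answer the
question on FreeBSD, and the print-1-or-0 convention is taken from the
proposal above.

    #!/bin/sh
    # has-atom-cpu: example capability probe for the profiling phase.
    # Prints 1 if the CPU model string looks like an Intel Atom, else 0.
    model=$(sysctl -n hw.model 2>/dev/null)
    case "$model" in
    *Atom*) echo 1 ;;
    *)      echo 0 ;;
    esac
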
This sounds exactly like the Phoronix Test Suite and its web-based
reporting platform, openbenchmarking.org. It already has a large number
of benchmarks to choose from, and it runs on FreeBSD. The downsides are
that it can't do anything involving multiple hosts, and it doesn't have
a good interface to query results vs. machine parameters, e.g. how does
the score of benchmark X vary with the amount of RAM? But it's
open-source, and I'm sure that patches are welcome ;)

-Alan
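For anyone who wants to try the Phoronix Test Suite route on FreeBSD,
the usual sequence is roughly the following; the package/port name and
the pts/openssl test profile are quoted from memory, so treat them as
examples rather than exact invocations.

    # Install the suite (benchmarks/phoronix-test-suite in ports), see
    # which tests are available, then run one; after a run the suite
    # offers to upload the results to OpenBenchmarking.org.
    pkg install phoronix-test-suite
    phoronix-test-suite list-available-tests
    phoronix-test-suite benchmark pts/openssl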