Author Archives: Blair Bethwaite

Computational Resource Framework

Over the past year the Monash NeCTAR porting programme has worked with Griffith University eResearch developers on their Computational Resource Framework (CRF) project. We’re well overdue to promote their excellent work (and a little of ours), and as there will be a presentation on this at eResearch Australasia (A RESTful Web Service for High Performance Computing based on Nimrod/G), now seems a good time to blog about it!


The CRF aims to address one of the long-standing issues in HPC: uptake by non-technical users. HPC is a domain with a well-entrenched command-line ethos. Unfortunately, this alienates a large cohort of potential users, which has negative implications for research productivity and collaboration. At Monash, our HPC specialists go to a great deal of effort to ease new users into the CLI (command line interface) environment. However, this is a labour-intensive process that doesn’t catch everybody and often leaves users reliant on individual consultants or team-members.

For some time portals have been the go-to panacea for democratising HPC and scientific computing, and there are some great systems being deployed on the RC, but they still seem to require a large amount of development effort to build and typically cater to a particular domain. Another common issue with “job-management” style portals (including Nimrod’s own ancient CGI beauty) is that they expose and delegate too much information and responsibility to the end user – typically an end user who doesn’t know, or want to know, about the intricacies of the various computational resources: such mundanities as what credentials they need, which ones are actually working today, etc.

The CRF is different in this respect, as it is not a domain-specific interface. Instead, the Griffith team have taken the approach of concentrating on a minimum set of functionality for some popular HPC and scientific computing applications. The user just inputs info relevant to the application, and the CRF configuration determines where and how the job is run. Currently there are UIs for creating, submitting and managing NAMD, R and MATLAB jobs and sets thereof; AutoDock, Octave and Gaussian are in the works. They’ve put a bunch of effort into building out a web-service in front of Nimrod/G. That layer means their UIs are fairly self-contained bits of PHP that present an interface and translate inputs into a template Nimrod experiment. The web-service has also enabled them to build some other cool bits and pieces, like experiment submission and results over email.


Using Nimrod/G as the back-end resource meta-scheduler means the CRF can intermingle jobs over the local cluster and burst or overflow into the NeCTAR Research Cloud, and that’s the focus of the Monash & Griffith collaboration for this project. We’re now looking to implement scheduling functionality that will make it possible to put simple policies on resources, e.g., use my local cluster but if jobs don’t start within 30 mins then head to the cloud. There should be some updates following in that area after the eResearch conference!
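To make the example policy a little more concrete, here is a minimal Python sketch of the "local first, cloud after 30 minutes" rule. It is purely illustrative and is not how Nimrod/G or the CRF implement scheduling: the `Job` class, the `choose_resource` function, the resource names and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Job:
    """Hypothetical job record; not a Nimrod/G or CRF type."""
    name: str
    submitted_at: datetime
    started: bool = False


def choose_resource(job: Job, now: datetime,
                    max_wait: timedelta = timedelta(minutes=30)) -> str:
    """Prefer the local cluster; overflow to the cloud once the job has
    waited longer than max_wait without starting."""
    if job.started:
        return "local"  # already running locally, nothing to move
    waited = now - job.submitted_at
    return "cloud" if waited > max_wait else "local"


# Arbitrary example times, chosen only to exercise both branches.
submitted = datetime(2013, 10, 1, 9, 0)
job = Job("namd-run-1", submitted)
print(choose_resource(job, submitted + timedelta(minutes=10)))  # local
print(choose_resource(job, submitted + timedelta(minutes=45)))  # cloud
```

A real policy would also have to weigh things like data movement and cloud quota, which this sketch ignores.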

Volumes on R@CMon

The Monash node now has a block-storage volume service in pre-production. The volumes are delivered via OpenStack Cinder and Ceph. Cinder hooks up networked real or virtual block devices to your virtual machine instances from a huge variety of backend storage systems. This gives you persistent storage on-demand, in the mold of Amazon EBS. Like object storage, volumes exist independent of your instances; but unlike object storage, volumes provide virtual direct-attached storage suitable for your favourite filesystem, database or application store. Some tools built on top of the RC require volumes to unleash their full abilities, e.g., Galaxy. There is more general information about the volume service on the NeCTAR wiki Research Cloud Storage page.

Originally, NeCTAR never specifically required the RC nodes to provide persistent block-storage, only to be capable of interfacing with it at some stage in the future – clearly that was an intended meeting/union point (in budget line items) for NeCTAR and RDSI. At Monash we have seeded our volumes capability via NeCTAR and plan to expand it via RDSI. We plan to offer a trial quota of about 50GB to those who ask; we’ll give some out via NeCTAR merit-allocation, and more via ReDS. Thanks to the “data-deluge” (please don’t hurt me for repeating that), block-storage will prove to be the scarcest resource across the RC, so we will probably have to implement some form of quota retirement – similar to how HPC resources put tight quotas on, and regularly clean up, their /short or /scratch filesystems.

We expect the Monash volume service to stay in “pre-production” until the end of the year. What we mean by that is that we are giving it best-effort support. It is a highly available and redundant system (Ceph, if you are interested – more specific details below), but we have not operationalised the system administration to give it 24/7 attention, or for that matter done much in the way of tuning. So in summary, we don’t expect or plan to cause any problems/outages, but we also have no fixed service levels or detailed DR plans as yet.

The current setup is running across eight Dell R720xd boxes with a total of 192TB raw storage, all very closely coupled with the monash-01 compute AZ (availability-zone). The “nova” pool has two replicas of each object, so in practice we have about 90TB of usable capacity at the moment. It seems to be working out quite well so we expect to expand this as part of R@CMon stage 2. A bunch of folks are already using it, as you can see from the status:

root@rcstor01:~# ceph -s
 cluster b72.....x
 health HEALTH_OK
 monmap e1: 5 mons at {0=w.x.y.z2:6789/0,1=w.x.y.z3:6789/0,2=w.x.y.z4:6789/0,3=w.x.y.z5:6789/0,4=w.x.y.z6:6789/0}, election epoch 184, quorum 0,1,2,3,4 0,1,2,3,4
 osdmap e696: 96 osds: 96 up, 96 in
 pgmap v2671880: 4800 pgs: 4800 active+clean; 8709 GB data, 18387 GB used, 156 TB / 174 TB avail; 17697B/s rd, 134KB/s wr, 21op/s
 mdsmap e1: 0/0/1 up
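As a rough sanity check on the capacity figures, the arithmetic can be sketched in a few lines of Python. The numbers are taken from the text and the status output above; the `usable_tb` helper is just an illustrative name, and the calculation assumes every pool keeps two replicas, which the post only states for the “nova” pool.

```python
def usable_tb(raw_tb: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb / replicas


raw_tb = 192            # quoted raw capacity across the eight servers
reported_total_tb = 174 # total capacity reported by `ceph -s`
replicas = 2            # replica count of the "nova" pool
data_gb = 8709          # logical data reported by `ceph -s`
used_gb = 18387         # raw space consumed, reported by `ceph -s`

print(usable_tb(raw_tb, replicas))             # 96.0
print(usable_tb(reported_total_tb, replicas))  # 87.0
print(round(used_gb / data_gb, 2))             # 2.11
```

Both estimates land near the "about 90TB" quoted above, and the used-to-data ratio of a little over two is what you would expect from two replicas plus some overhead.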

The awesome thing about Ceph is the bang for buck and fine-grained scalability. The value proposition is exactly the same as for object storage – you can expand a server at a time at commodity prices per TB (which seem to be about half to two-thirds that of vendor proprietary solutions), you don’t ever need to buy a huge amount more storage than you really need, you keep buying today’s spec of the hardware and reap the capacity and performance improvements, and every time you add a storage server you’re increasing the number of clients you can serve (no other components to scale).

If you think your project needs volumes, please put in a request or edit an existing request through the NeCTAR Dashboard.

R@CMon accelerates forensic data research

Campbell Wilson and Janis Dalins of Monash’s Faculty of IT are developing a forensic data ranking and analysis tool, focusing on accelerating existing multimedia forensics analytics. Their prototype will collect, index and analyse geo-tagged multimedia in a tool intended for use by international law enforcement agencies for multi-jurisdictional victim identification.

Till recently they were running their own Solr cluster on a set of spare desktop machines which Janis was spending a good deal of time maintaining. Then their NAS box died, badly. R@CMon to the rescue!

Janis and I met previously in the server room at Caulfield where I told him all about the research cloud and the imminent availability of the Monash node, so we were both excited to see what kind of Solr performance they could get on the cloud compared to their dedicated hardware. The sticking point previously had been storage performance (or the lack thereof) on the existing cloud node and the relatively small size of the ephemeral storage available in the standard NeCTAR flavors. Fortunately R@CMon came online just in time and with enough flexibility to help them recompute their lost data. We created a special large flavor with 32 cores, 128 GB RAM and 2.4TB of ephemeral storage – a mini cluster-in-a-box.

Janis kindly shared a recent progress update:

Since restarting our indexing almost a fortnight ago, we’ve successfully indexed over 1.2 billion (1,226,707,169 at last count) tweets. This is almost double the count we achieved on our existing cluster, I’d guesstimate the overall index construction workload/performance as being around 130% that of the cluster. Faceted querying now works, and in a timely fashion.

So performance is great, but Janis also nicely sums up the key empowering aspect of the NeCTAR research cloud:

Beyond simple performance measures, I wanted to emphasise probably the most important thing: Unlike the existing indexing cluster, the NeCTAR instance has been treated very much as a turn-key project. We used the instance “as is” (with the exception of adding user accounts and an nginx based reverse proxy for security reasons), and haven’t performed any maintenance or configuration work. I can’t emphasise enough how much time that’s freed up for actually doing research – in commercial lingo, I’d say that’s improved efficiency greatly.

Thanks a bunch to Janis for taking the time to share the story. If you like the sound of this and think we might be able to help with your research computing please contact merc@monash.edu.

Corollary of local ephemeral storage

I mentioned in the last post that the Monash NeCTAR node is using local direct-attached disk for instance ephemeral storage, but I forgot to expand on the caveats, so here goes:

It’s important to note that this means loss or failure of a hypervisor/compute-node could mean loss of the ephemeral storage of instances on that node at the time, depending on the severity of the failure. In many cases we’ll be able to restart and recover these instances later (of course memory state is gone), but how long that will take depends on the failure. Ephemeral storage availability in the face of hardware issues is therefore less predictable than on other NeCTAR nodes using shared storage, which can (given available time and resources) more readily attempt recovery. However, this is cloud computing roughly mimicking the model pioneered by EC2, so really, just don’t rely on ephemeral storage to hold anything you can’t easily recompute or aren’t readily backing up to persistent storage – and go back up your laptop while you’re thinking about it!

Persistent block storage (aka volumes) will make an appearance on the Monash node soon (hopefully a generally available pre-production service before end of July). University of Melbourne should have theirs in production within a few weeks. So, stay tuned…

R@CMon Phase 1 is go!

Two weeks ago the first phase of the Monash node of the NeCTAR RC (Research Cloud) went live to the public (yes, this post is late!). If you’re an RC user you can now select the “monash” cell when launching instances. If you’re not already using the RC then what are you waiting for?! Get to http://www.nectar.org.au/research-cloud and spin up your first server.

You can launch instances to the Monash node via the Dashboard (see pic), through the OS API by specifying the scheduler hint “cell=monash”, or via the EC2 API by specifying availability-zone=“monash”. Those last two instructions are likely to evolve with future versions of the APIs, so stay tuned to http://support.rc.nectar.org.au/support/faq.html (and search for “cells”) for current instructions.

RC-dashboard_monash-cell
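For the OS API route, here is a hedged Python sketch of what a server-create request body carrying the “cell=monash” hint might look like. It assumes the compute API accepts scheduler hints under a top-level `os:scheduler_hints` key; the `build_boot_request` helper, the server name and the `IMAGE_ID` and `FLAVOR_ID` placeholders are hypothetical. Treat the FAQ linked above as the authoritative source, since these details are expected to change.

```python
import json


def build_boot_request(name: str, image_ref: str, flavor_ref: str,
                       cell: str = "monash") -> dict:
    """Build a server-create body with a cell scheduler hint.

    Assumption: hints travel in a top-level "os:scheduler_hints" object
    alongside the "server" object.
    """
    return {
        "server": {
            "name": name,
            "imageRef": image_ref,    # placeholder, substitute a real image ID
            "flavorRef": flavor_ref,  # placeholder, substitute a real flavor ID
        },
        "os:scheduler_hints": {"cell": cell},
    }


body = build_boot_request("my-first-server", "IMAGE_ID", "FLAVOR_ID")
print(json.dumps(body, indent=2))
```

The sketch only builds the JSON body; authenticating and actually sending the request is left out.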

A bit of background about Monash phase 1.

There are about 2.3k cores available in the current deployment – phase 2 will more than double this. The hypervisors are running Ubuntu 12.04 LTS on AMD Opteron 6234 CPUs with 4GB RAM per core.

What’s different from the other NeCTAR RC nodes?

Instance-associated ephemeral storage is delivered by a local RAID10 on each hypervisor (as opposed to a pool of shared storage, e.g., an NFS share), with live-migration possible thanks to KVM’s live block migration feature (anybody interested should note that block migration is replaced with live block streaming in the upstream QEMU code). This means ephemeral storage performance actually scales as you add instances, more or less predictably subject to activity from co-located guests. As OpenStack incorporates features to limit noisy neighbour syndrome we’ll be able to take full advantage of those too.

We pass host CPU information through libvirt to the guests to give the best possible performance to code that can make use of it.

Who built it?

Thanks go to a big team both here at Monash across MeRC (the Monash eResearch Centre) and eSolutions (Monash’s central ICT provider), and also the folks at NeCTAR and the NeCTAR lead node at University of Melbourne. Of course it wouldn’t have been possible without OpenStack and the community of contributors around it, so thanks all!