On July 9-12, 2013, Bioplatforms Australia and CSIRO ran their latest NGS (Next Generation Sequencing) training workshop at Monash University.
This is the second NGS training workshop organised by Bioplatforms Australia to be held at Monash University, following the very first one last year.
Using improved versions of the tools and machine image from the very first workshop, the team provisioned virtual machines on the Monash node of the NeCTAR Research Cloud. See this announcement regarding the Monash node.
In the last 12 months Bioplatforms Australia has conducted 7 NGS training workshops across Australia (Melbourne, Sydney, Brisbane, Adelaide, Canberra and Perth) with a total of 190 attendees, and 2 more workshops are planned for later this year.
Future Bioplatforms Australia workshops are listed on their website.
Campbell Wilson and Janis Dalins of Monash’s Faculty of IT are developing a forensic data ranking and analysis tool, focusing on accelerating existing multimedia forensics analytics. Their prototype will collect, index and analyse geo-tagged multimedia in a tool intended for use by international law enforcement agencies for multi-jurisdictional victim identification.
Till recently they were running their own Solr cluster on a set of spare desktop machines which Janis was spending a good deal of time maintaining. Then their NAS box died, badly. R@CMon to the rescue!
Janis and I met previously in the server room at Caulfield where I told him all about the research cloud and the imminent availability of the Monash node, so we were both excited to see what kind of Solr performance they could get on the cloud compared to their dedicated hardware. The sticking point previously had been storage performance (or the lack thereof) on the existing cloud node and the relatively small size of the ephemeral storage available in the standard NeCTAR flavors. Fortunately R@CMon came online just in time and with enough flexibility to help them recompute their lost data. We created a special large flavor with 32 cores, 128 GB RAM and 2.4TB of ephemeral storage – a mini cluster-in-a-box.
Janis kindly shared a recent progress update:
Since restarting our indexing almost a fortnight ago, we’ve successfully indexed over 1.2 billion (1,226,707,169 at last count) tweets. This is almost double the count we achieved on our existing cluster, I’d guesstimate the overall index construction workload/performance as being around 130% that of the cluster. Faceted querying now works, and in a timely fashion.
So performance is great, but Janis also nicely sums up the key empowering aspect of the NeCTAR research cloud:
Beyond simple performance measures, I wanted to emphasise probably the most important thing: Unlike the existing indexing cluster, the NeCTAR instance has been treated very much as a turn-key project. We used the instance “as is” (with the exception of adding user accounts and an nginx based reverse proxy for security reasons), and haven’t performed any maintenance or configuration work. I can’t emphasise enough how much time that’s freed up for actually doing research – in commercial lingo, I’d say that’s improved efficiency greatly.
Thanks a bunch to Janis for taking the time to share the story. If you like the sound of this and think we might be able to help with your research computing please contact firstname.lastname@example.org.
I mentioned in the last post that the Monash NeCTAR node is using local direct-attached disk for instance ephemeral storage, but I forgot to expand on the caveats, so here goes:
It’s important to note that this means the loss or failure of a hypervisor/compute-node could mean the loss of the ephemeral storage of instances running on that node at the time, depending on the severity of the failure. In many cases we’ll be able to restart and recover these instances later (of course memory state is gone), but how long that takes depends on the failure. So ephemeral storage availability in the face of hardware issues is less predictable than on other NeCTAR nodes using shared storage, which can (given available time and resources) more readily attempt recovery. However, this is cloud computing roughly mimicking the model pioneered by EC2, so really, just don’t rely on ephemeral storage to hold anything you can’t easily recompute or aren’t readily backing up to persistent storage. And go back up your laptop while you’re thinking about it!
Persistent block storage (aka volumes) will make an appearance on the Monash node soon (hopefully a generally available pre-production service before end of July). University of Melbourne should have theirs in production within a few weeks. So, stay tuned…
Two weeks ago the first phase of the Monash node of the NeCTAR RC (Research Cloud) went live to the public (yes, this post is late!). If you’re an RC user you can now select the “monash” cell when launching instances. If you’re not already using the RC then what are you waiting for?! Get to http://www.nectar.org.au/research-cloud and spin up your first server.
You can launch instances on the Monash node via the Dashboard (see pic), through the OS API by specifying the scheduler hint “cell=monash”, or via the EC2 API by specifying availability-zone=“monash”. Those last two instructions are likely to evolve with future versions of the APIs, so stay tuned to http://support.rc.nectar.org.au/support/faq.html (and search for “cells”) for current instructions.
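For the OS API route, a minimal sketch of the request body might look like the following. It assumes the Nova-style "create server" JSON format with a top-level "os:scheduler_hints" object; the helper name build_server_request, the server name and the image and flavor IDs are placeholders for illustration, not values from the Monash node.

```python
import json


def build_server_request(name, image_ref, flavor_ref, cell="monash"):
    """Build a create-server request body that carries a cell scheduler hint.

    Assumption: the cloud accepts Nova-style "os:scheduler_hints" alongside
    the "server" object. All argument values are supplied by the caller.
    """
    return {
        "server": {
            "name": name,
            "imageRef": image_ref,
            "flavorRef": flavor_ref,
        },
        # The hint from the post: ask the scheduler for the Monash cell.
        "os:scheduler_hints": {"cell": cell},
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    body = build_server_request("my-first-server", "IMAGE_ID", "FLAVOR_ID")
    print(json.dumps(body, indent=2))
```

The body would be POSTed to the compute endpoint's servers resource with a valid auth token. Check the FAQ linked above for the hint format currently in use before relying on this.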
A bit of background about Monash phase 1.
There’s about 2.3k cores available in the current deployment – phase 2 will more than double this. The hypervisors are running Ubuntu 12.04 LTS on AMD6234 CPUs with 4GB RAM per core.
What’s different from the other NeCTAR RC nodes?
Instance ephemeral storage is delivered by a local RAID10 on each hypervisor (as opposed to a pool of shared storage, e.g., an NFS share), with live-migration possible thanks to KVM’s live block migration feature (anybody interested should note that block migration is replaced with live block streaming in the upstream QEMU code). This means ephemeral storage performance actually scales as you add instances, more or less predictably, subject to activity from co-located guests. As OpenStack incorporates features to limit noisy neighbour syndrome we’ll be able to take full advantage of those too.
We pass host CPU information from the host through libvirt to the guests to give best performance possible to code that can make use of it.
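One rough way to see what this gives a guest is to look at the CPU flags the instance reports. The sketch below assumes a Linux guest where /proc/cpuinfo contains a "flags" line; the function names and the particular flags checked are illustrative choices, not part of the Monash setup.

```python
import os


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted):
    """Return the wanted flags that the guest does not report, sorted."""
    return sorted(set(wanted) - parse_cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    # Assumption: a Linux guest; the flags listed here are only examples.
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
        print("missing:", missing_flags(text, ["sse4_2", "avx", "aes"]))
```

If a flag your code depends on shows up as missing, the guest is not seeing that host feature and an optimised build may need a fallback path.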
Who built it?
Thanks go to a big team both here at Monash across MeRC (the Monash eResearch Centre) and eSolutions (Monash’s central ICT provider), and also the folks at NeCTAR and the NeCTAR lead node at the University of Melbourne. Of course it wouldn’t have been possible without OpenStack and the community of contributors around it, so thanks all!
We’re currently working on deploying the Garuda/Flint K3 system in the NeCTAR Research Cloud.
The Garuda platform provides fundamental technology to link software and knowledge in systems biology in a coherent manner.
Flint K3 is an online simulation platform that receives PHML and SBML models from PhysioDesigner, CellDesigner and other applications.
More about the Systems Biology Institute can be found on their website.
We’ve successfully engaged Bioplatforms Australia and CSIRO in using the NeCTAR Research Cloud infrastructure to deliver next-generation sequencing workshops at Monash University and the University of New South Wales on July 9-10, 2012.
We’ve created a custom cloud image which contains the relevant training materials, datasets and software stack. The image has been uploaded into the NeCTAR cloud and used to instantiate multiple virtual machines for the hands-on workshop. The same image has also been made available for download so participants can try it on their own machines.
Automation tools that integrate virtual machine provisioning, software stack installation and dataset preparation have been created for easy re-deployment of resources at various workshop sites around Australia.
A manuscript entitled “Next Generation Sequencing (NGS): A challenge to meet the increasing demand for training workshops in Australia” has recently been accepted for publication in Briefings in Bioinformatics.
This article is also available under a Creative Commons licence here.
Welcome to the Research @ Cloud Monash Blog, the hub for cloud computing using the NeCTAR Research Cloud.