Ceph Days are a series of regular events in support of the Ceph open source community. They now occur at locations all around the world. In November, R@CMon hosted Australia’s first Ceph Day. The day drew 70-odd guests, many of whom were from interstate and a few from overseas. The participants were from the research sector, private industry and ICT providers. It was a fantastic culmination of Australia’s growing Ceph community.
If you don’t already know, Ceph is basically an open-source technology for software-defined cluster-based storage. It means our storage backend is essentially infinitely scalable, and our focus can shift to the access mechanisms for data.
Check out the promo:
R@CMon has pioneered the adoption of Ceph for accessible research data storage and in mid-2013 was the first NeCTAR Research Cloud node to provide un-throttled volume storage. R@CMon has also worked closely with what was InkTank, and is now Red Hat, to develop the support model for such an enterprise (see Ceph Enterprise – a disruptive period in the storage marketplace).
The day began with the Ceph Community Director – Patrick McGarry. His presentation included information about the upcoming expanded Ceph metrics platform, what the Ceph User Committee has been up to, new community infrastructure for a better contributor experience, and revised open source governance.
Undoubtedly the highlight of the day was the joint talk given by R@CMon’s very own director – Steve Quenette and technical lead – Blair Bethwaite. Here we explain Ceph in the context of the 21st century microscope – the tool each researcher creates to do modern day research. We also explain how we technically approached creating our fabric.
At SuperComputing 2015 in Austin our network/fabric partner Mellanox announced R@CMon (Monash University) as a “HPC Centre of Excellence”. A core goal of the HPC CoE is to drive the technological innovations required for the next generation of (exascale) supercomputing, whilst also ensuring that such an exascale computer is relevant to modern research. R@CMon is a standout pioneer at converging cloud, HPC and data, all of which are key to the “next generation”.
“We see Monash as a leader in Cloud and HPC on the Cloud with Openstack, Ceph and Lustre on our Ethernet CloudX platform.” Sudarshan Ramachandran, Regional Sales Manager, Australia & New Zealand
From a fabric innovation point of view, it has been a very productive and exciting 24 months for R@CMon. By early 2014 the internal Monash University HPC system “MCC” was burst onto the Research Cloud, allowing a researcher’s own merit to be leveraged with institutional investment. It also represents a shift towards soft HPC, where the size of a HPC system changes regularly with time. Earlier this year we announced our early adoption of RoCE (RDMA over Converged Ethernet) using Mellanox technologies. This meant the same fabric used for cloud networking could also be used for HPC and data storage backplanes. In turn, MCC on R@CMon also enabled RDMA communications, that is, real HPC performance but on an otherwise orchestrated cloud.
Finally, at the Tokyo OpenStack Summit 2015, Mellanox announced R@CMon as debuting the World’s first 100G End-to-End Cloud. This technology eases scaling and heterogeneity of performance aspects. In particular, it sets the basis for processor and storage performance for peak and converged cloud/HPC needs. Watch this space!
Our journey towards R@CMon Storage (Storage-as-a-Service)…
In May 2013 R@CMon went live with an OpenStack cell within the NeCTAR (Australian) Research Cloud confederation. It was an innovation in its own right, targeting the commodity end of both the fundamental and translational research needs of Australia (see R@CMon IDC Spotlight – AMD & DELL). Our technical partner, Dell, has successfully applied the design pattern to many subsequent Research Cloud nodes, and to many other OpenStack based private cloud deployments both nationally and internationally. Shortly after the launch of this initial IaaS compute cell, we introduced Ceph based volume storage, becoming the first volume storage service on the Research Cloud, and in doing so, instigated a collaboration with InkTank (now Red Hat). By November 2014 R@CMon launched the “Phase 2” Specialist IaaS cell, an “e”-resource motivated by research that pushes boundaries. Within this cell R@CMon added an RDMA-able interconnect to our storage and compute fabric, instigating an innovative technical collaboration with Mellanox.
Thus R@CMon is an environment to build what we call “21st Century Microscopes” – where researchers orchestrate the instruments, compute, storage, analysis and visualisation themselves, looking down and tuning this 21st century lens, using big data and big computing to make new discoveries.
And accordingly, R@CMon is an environment for innovative data services for the long-tail (if you like – more ICT like). Unashamedly, our instances of Ceph are what we call “enterprise”, whilst each user or tenant has their own needs on file protocol, capacity and latency.
R@CMon Storage is a collection of storage access methods and underlying storage infrastructure products. Why do we present storage as both front-ends and infrastructure? Because most users want access methods – it should just work, but most microscope builders want infrastructure – it should be a building block. R@CMon Storage is also the Monash operating centre to VicNode – where we explain some of these products.
We now have a series of R@CMon Storage products and services available, ranging from infrastructure products to access methods and data management.
Yesterday Mellanox made the following press release – “Australia’s Largest University Selects Mellanox CloudX Platform and Open Ethernet Switch Systems for Nationwide Research Initiative”. Through Monash University’s own co-investment into R@CMon, the Mellanox CloudX products were chosen as the networking technology for Phase 2, providing RDMA capable networking within and between R@CMon Research Cloud and Data (RDSI) facilities. This means our one fabric can run multi-host MPI workloads, and leverage fast I/O storage, but also remain near the cost-point of commodity networking for the resources that are generic and commodity.
This is a key ingredient to the “21st Century Microscope”, where researchers orchestrate the instruments, compute, storage, analysis and visualisation themselves, looking down and tuning this 21st century lens, using big data and big computing to make new discoveries. R@CMon has been designed to be the platform where Australian researchers can lead the way at establishing their own 21st century microscope – for themselves and for their communities.
Once again Monash is leading platform technology innovation and accessibility by example. Through 2015 we look forward to optimising this technology, and encouraging increased self-service to these sorts of technologies.
Back in 2012 our submission to NeCTAR planned R@CMon as being delivered in two phases. First a commodity phase, letting the ideals of en masse computing dominate technical choices. We have been operating phase 1 since May 2013. Our new specialist second phase went live in October! R@CMon phase 2 (R@CMon RDC cell) scales out high-performing and accelerated hardware as driven by the demands of the precinct. Often ‘big data’ is just not possible without ‘big memory’ to hold the problem space without going to disk (100x slower). Often ‘more memory’ is the barrier, not ‘more cores’. Often ‘I need to interact with a 3D model’. And so on. R@CMon is truly now a scalable and critical mass of self-service, on-demand computing infrastructure. It is also the play-pit where research leaders can build their own 21st century microscopes.
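To make the ‘big memory’ point concrete, here is a minimal back-of-the-envelope sketch in Python. It simply checks whether a dense array would fit in a node’s RAM. The function name `fits_in_memory`, the 80% headroom figure, the example volume size and the 256 GB comparison node are all illustrative assumptions, not measurements from R@CMon.

```python
def fits_in_memory(n_elements: int, bytes_per_element: int,
                   ram_bytes: int, headroom: float = 0.8) -> bool:
    """Return True if a dense array fits within a fraction of a node's RAM.

    `headroom` is an assumed fraction of RAM left for the data once the
    operating system and other processes are accounted for.
    """
    needed = n_elements * bytes_per_element
    return needed <= ram_bytes * headroom


# Hypothetical example: a 5000 x 5000 x 5000 volume of 4-byte floats.
voxels = 5000 ** 3
TB = 10 ** 12
GB = 10 ** 9

print(fits_in_memory(voxels, 4, 1 * TB))    # a 1TB large-memory node
print(fits_in_memory(voxels, 4, 256 * GB))  # an assumed 256 GB node
```

Under these assumptions the volume needs about 0.5 TB, so it can be held entirely in memory on one of the 1TB nodes, whereas the smaller node would have to go to disk.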
One of the four racks of NeCTAR monash-02. From top to bottom: Mellanox 56G switches, management switch, R820 compute nodes, R720 Ceph storage nodes
In addition to phase 1, phase 2 has –
- 2064 new Intel virtual cores
- 3 nodes with 1TB of RAM
- 10 nodes with GPUs for 3D desktops
- 3 nodes (the large memory ones) with high-performance PCIe SSD
- All standard compute nodes mix SAS & SSD for low-latency local ephemeral storage
- All nodes with RDMA (Remote Direct Memory Access – the stuff that makes fast, large-scale, multi-node HPC jobs possible) capable networking
As with phase 1, the entire infrastructure is orchestrated through OpenStack and presented on the Australian Research Cloud. R@CMon is once again pioneering research cloud infrastructure, virtualising all these specialist resources.
Over the next week we’ll blog with emerging examples of GPUs, SSDs and 1TB memory machines…
One of the specialist nodes – a quad-socket R820 with 1TB RAM and high-performance PCIe-attached flash