Tag Archives: HPC

More FLAIR to Fluid Mechanics via the Monash Research Cloud

Advanced research in engineering can often benefit from extra compute capacity, and this is where a research-oriented computational cloud like R@CMon is very handy. We report on how cloud resources were used to augment the computing available for large-scale fluid mechanics studies.

FLAIR (Fluids Laboratory for Aeronautical and Industrial Research), from the Department of Mechanical and Aerospace Engineering, Faculty of Engineering, has been conducting experimental and computational fluid mechanics research for over twenty years, focusing on fundamental fluid flow problems that impact the automotive, aeronautical, industrial and biomedical fields.

A key research focus in recent years has been understanding the wake dynamics of particles near walls. Particle-particle and wall-particle interactions were investigated using an in-house spectral-element numerical solver. Understanding these interactions is key in many engineering industries. In biological engineering, for example, blood cells such as leukocytes are modelled numerically as canonical bluff bodies (i.e., as cylinders and spheres). These simulations are useful not only for understanding biological cell transport but also in mineral processing, chemical engineering and ball sports. Because this research is so computationally and data intensive, obtaining sufficient computing resources has always been a challenge.

In particular, the project aims to understand the wake dynamics of multiple particles in scenarios such as rolling, collisions and vortex-induced vibrations, and the mixing that results from these interactions. The group’s two- and three-dimensional fluid flow solver also incorporates two-way body dynamics to model these effects. Because the studies span multiple parameters, such as Reynolds number, body rotation and the height of the body above the wall, the total parameter space is extensive and requires significant computational resources. While the two-dimensional simulations run on single processors, their three-dimensional counterparts require parallel processing, making NeCTAR nodes an ideal platform for these computations. Some of the visualisations from the group’s three-dimensional simulations are shown in Figures 1 and 2 below.
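Because every parameter multiplies the number of cases, even a coarse sweep grows quickly. The sketch below is purely illustrative and is not FLAIR’s actual workflow: it assumes an Open Grid Scheduler queue such as MCC’s, a hypothetical per-case job script `run_case.sh`, and made-up parameter values. It prints one `qsub` command per combination, so three values for each of three parameters already yields 27 jobs.

```sh
#!/bin/sh
# Sketch: enumerate a parameter sweep as batch-job submissions (dry run).
# The parameter values and the job script "run_case.sh" are hypothetical.
set -eu

REYNOLDS="100 200 300"   # Reynolds numbers
ROTATION="-1 0 1"        # non-dimensional body rotation rates
GAP="0.01 0.1 0.5"       # gap between body and wall, in body diameters

# Print one qsub command per parameter combination.
sweep_jobs() {
  for re in $REYNOLDS; do
    for rot in $ROTATION; do
      for gap in $GAP; do
        # qsub -v exports the listed variables into the job's environment
        echo "qsub -v RE=$re,ROT=$rot,GAP=$gap run_case.sh"
      done
    done
  done
}
```

Reviewing the printed list and then piping it to `sh` (`sweep_jobs | sh`) would submit the cases; keeping generation separate from submission makes it easy to check the case count before committing cluster time.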

Since 2008, the FLAIR team has been making good use of the Monash Campus Cluster (MCC), a high-performance/high-throughput heterogeneous system with over two thousand CPU cores. However, MCC is heavily utilised by researchers from across the university and FLAIR users often found themselves waiting long periods before they could run their fluid flow simulations. It became clear that FLAIR researchers needed additional computational resources.

R@CMon secured a 160-core allocation for the FLAIR team, a valuable addition to the group’s resources. Thanks to both NeCTAR and MCC-R@CMon, the project has since been provided with over one million CPU hours, distributed across 4,000 jobs, for its CPU-intensive calculations.

This has resulted in a number of publications in the highest-impact fluid mechanics journals, with several more in preparation; for example:
  • Rao, A., Thompson, M.C., & Hourigan, K. (2016) “A universal three-dimensional instability of the wakes of two-dimensional bluff bodies.” Journal of Fluid Mechanics, 792, 50–66.
  • Rao, A., Radi, A., Leontini, J.S., Thompson, M.C., Sheridan, J., & Hourigan, K. (2015) “A review of rotating cylinder wake transitions.” Journal of Fluids and Structures, 53, 2–14.
  • Rao, A., Radi, A., Leontini, J.S., Thompson, M.C., Sheridan, J., & Hourigan, K. (2015) “The influence of a small upstream wire on transition in a rotating cylinder wake.” Journal of Fluid Mechanics, 769, R2, 1–12.
  • Rao, A., Thompson, M.C., Leweke, T., & Hourigan, K. (2013) “The flow past a circular cylinder translating at different heights above a wall.” Journal of Fluids and Structures, 41, 9–21.
  • Rao, A., Passaggia, P.-Y., Bolnot, H., Thompson, M.C., Leweke, T., & Hourigan, K. (2012) “Transition to chaos in the wake of a rolling sphere.” Journal of Fluid Mechanics, 695, 135–148.

Figures 1 and 2: Visualisations from FLAIR’s three-dimensional simulations.

R@CMon announced as a Mellanox “HPC Center of Excellence”

At Supercomputing 2015 (SC15) in Austin, our network/fabric partner Mellanox announced R@CMon (Monash University) as an “HPC Centre of Excellence”. A core goal of the HPC CoE is to drive the technological innovations required for next-generation (exascale) supercomputing, while also ensuring that such an exascale computer is relevant to modern research. R@CMon is a standout pioneer in converging cloud, HPC and data, all of which are key to the “next generation”.

“We see Monash as a leader in Cloud and HPC on the Cloud with Openstack, Ceph and Lustre on our Ethernet CloudX platform,” said Sudarshan Ramachandran, Regional Sales Manager, Australia & New Zealand, Mellanox.

From a fabric innovation point of view, it has been a very productive and exciting 24 months for R@CMon. By early 2014 the internal Monash University HPC system, MCC, was bursting onto the Research Cloud, allowing researchers’ own merit-based allocations to be leveraged alongside institutional investment. It also represents a shift towards “soft HPC”, where the size of an HPC system changes regularly over time. Earlier this year we announced our early adoption of RoCE (RDMA over Converged Ethernet) using Mellanox technologies. This meant the same fabric used for cloud networking could also serve as the backplane for HPC and data storage. In turn, MCC on R@CMon gained RDMA communications, that is, real HPC performance on an otherwise orchestrated cloud.

 

Finally, at the 2015 OpenStack Summit in Tokyo, Mellanox announced R@CMon as debuting the world’s first 100G end-to-end cloud. This technology makes performance easier to scale and to vary across heterogeneous resources. In particular, it lays the groundwork for the processor and storage performance needed for peak and converged cloud/HPC workloads. Watch this space!

 

 

Australia’s Largest University Selects Mellanox CloudX Platform and Open Ethernet Switch Systems for Nationwide Research Initiative

Yesterday Mellanox issued the press release “Australia’s Largest University Selects Mellanox CloudX Platform and Open Ethernet Switch Systems for Nationwide Research Initiative”. Through Monash University’s own co-investment in R@CMon, Mellanox CloudX products were chosen as the networking technology for Phase 2, providing RDMA-capable networking within and between the R@CMon Research Cloud and Data (RDSI) facilities. This means a single fabric can run multi-host MPI workloads and deliver fast I/O to storage, while staying near the cost point of commodity networking for resources that are themselves generic and commodity.

This is a key ingredient in the “21st Century Microscope”, where researchers orchestrate the instruments, compute, storage, analysis and visualisation themselves, looking down and tuning this 21st-century lens and using big data and big computing to make new discoveries. R@CMon has been designed to be the platform where Australian researchers can lead the way at establishing their own 21st century microscope – for themselves and for their communities.

Once again, Monash is leading platform technology innovation and accessibility by example. Through 2015 we look forward to optimising this technology and encouraging researchers to make greater self-service use of it.

 


MCC-on-R@CMon Phase 2 – HPC on the cloud

Almost a year ago, the Monash HPC team embarked on a journey to extend the Monash Campus Cluster (MCC), the university’s internal heterogeneous HPC workhorse, onto R@CMon and the wider NeCTAR Australian Research Cloud. This is an ongoing collaborative effort between the R@CMon architects and tech-crew, and the MCC team, which has long-standing and strong engagements with the Monash research community. Recently, this journey has been further enriched by the close coordination with the MASSIVE team, which will enhance the sharing of technical artefacts and learnings between the two teams.

By September 2014, MCC-on-the-Cloud had grown to over 600 cores, spanning three nodes of the Australian Research Cloud. Its growth was limited only because the Research Cloud was full and awaiting a wave of new infrastructure. Nevertheless, Monash researchers from Engineering, Science and FIT have collectively used over 850,000 CPU-core hours. Preferring the “MCC service”, they have handed their NeCTAR allocations to the MCC team to manage, rather than building a cluster and installing the software stack themselves. From the researchers’ perspective, this has a twofold benefit: a user experience consistent with that of the dedicated MCC, and freedom from the burden of managing cloud instances, software deployment, queues and so on.

Deploying a usable high-performance/high-throughput computing (HPC/HTC) service on the cloud poses many challenges. Users expect the robustness and guaranteed service availability typical of traditional clusters, and all of this must be achieved despite the fluidity and heterogeneity of the cloud infrastructure and the nuances in service offerings across Research Cloud nodes. For example, one user reported that jobs were being cancelled by the scheduler for exceeding their wall-time limits; we subsequently discovered that some MCC “cloud” compute nodes were running on oversubscribed hosts, contrary to NeCTAR architecture guidelines. Nevertheless, our efforts have paid off: MCC-on-the-cloud is now operating and delivering a reliable HPC/HTC service, wrapped in the classic MCC look and feel that Monash researchers have come to depend on. Despite the many challenges, we are convinced that this is a good way to drive the federation forward.
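The post does not say how the oversubscription was diagnosed. One common signal, offered here only as an illustration, is CPU steal time: the share of time a guest’s vCPUs were ready to run while the hypervisor was serving something else. A persistently high steal percentage on a compute VM suggests a busy or oversubscribed host. The sketch assumes a Linux guest exposing the standard `/proc/stat` counters; the function names `steal_pct` and `sample_steal` are hypothetical.

```sh
#!/bin/sh
# Sketch: estimate CPU steal time inside a Linux guest from /proc/stat.
set -eu

# steal_pct BEFORE AFTER -- each argument is an aggregate "cpu ..." line.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
# (guest time is already counted in user/nice, so it is left out of the total).
steal_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    total = 0
    for (i = 2; i <= 9; i++) total += y[i] - x[i]
    steal = y[9] - x[9]
    printf "%.1f\n", (total > 0 ? 100 * steal / total : 0)
  }'
}

# sample_steal [SECONDS] -- measure steal over an interval (default 5 s).
sample_steal() {
  before=$(grep '^cpu ' /proc/stat)
  sleep "${1:-5}"
  after=$(grep '^cpu ' /proc/stat)
  steal_pct "$before" "$after"
}
```

Running `sample_steal 10` on a compute node prints the steal percentage over ten seconds; a node that consistently reports a substantial value is a candidate for the kind of wall-time overruns described above.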

Now, with R@CMon Phase 2 coming online, we have moved a step closer to realising the aim of truly “high-performance” computing on the cloud. Equipped with Intel Ivy Bridge Xeon processors, R@CMon Phase 2 hardware stands out amid the commodity hardware on most other NeCTAR nodes. These specialist servers are already proving invaluable for floating-point-intensive MPI applications. In production runs of a three-dimensional spectral-element code, we observed nearly double the performance on these Xeons compared with the AMD Opteron nodes found across most of the rest of the cloud, even with hyper-threading enabled. By pinning the guest vCPUs to a range of hyper-threaded cores on the host, we achieved a further 50% performance improvement, for an overall improvement of over 2.6x relative to the “commodity” AMD nodes. We look forward to using vCPU pinning routinely once it is natively supported in OpenStack Juno, the Research Cloud’s next version.
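The post does not describe how the pinning was applied. Assuming a KVM/libvirt hypervisor, which OpenStack Nova commonly drives, a manual equivalent is libvirt’s `virsh vcpupin`, which binds one guest vCPU to a set of host CPUs. This dry-run sketch prints those commands rather than running them; `pin_commands` and the example domain name are hypothetical.

```sh
#!/bin/sh
# Sketch (dry run): print libvirt commands that pin guest vCPU 0, 1, 2, ...
# to the host logical CPUs given, in order. Assumes a KVM/libvirt host; the
# domain name and CPU numbers used with it are hypothetical.
set -eu

pin_commands() {
  domain=$1; shift
  vcpu=0
  for host_cpu in "$@"; do
    # virsh vcpupin DOMAIN VCPU CPULIST
    echo "virsh vcpupin $domain $vcpu $host_cpu"
    vcpu=$((vcpu + 1))
  done
}
```

For example, `pin_commands instance-0000002a 8 40 9 41 | sh` would pin four vCPUs to two physical cores and their hyper-thread siblings, provided `lscpu -e` on that host showed CPUs 8/40 and 9/41 as sibling pairs (a hypothetical layout).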


Measured performance improvement with a production 3D Spectral Element code
R@CMon Phase 1: AMD Opteron 6276 @ 2.3 GHz
Phase 2: Intel Xeon E5-4620v2 @ 2.6 GHz
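Speedups from independent improvements multiply rather than add. Writing $S_{\mathrm{hw}}$ for the Xeon-over-Opteron gain and $S_{\mathrm{pin}}$ for the gain from vCPU pinning, the figures quoted above fit together as follows:

```latex
\begin{aligned}
S_{\mathrm{total}} &= S_{\mathrm{hw}} \times S_{\mathrm{pin}}, \qquad S_{\mathrm{pin}} \approx 1.5,\\
S_{\mathrm{total}} > 2.6 \;&\Longrightarrow\; S_{\mathrm{hw}} > \frac{2.6}{1.5} \approx 1.73.
\end{aligned}
```

A full 2x from the hardware alone would have given $2 \times 1.5 = 3$x overall, so “over 2.6x” is consistent with a hardware gain of between roughly 1.73x and 2x, that is, “nearly double”.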

Thus, our journey continues… Once RDMA (Remote Direct Memory Access) is enabled on Phase 2, accelerated networking will make it feasible to run large-scale, multi-host MPI workloads. Achieving this will take us even closer to a truly high-performance computing environment on the cloud. Look out for MCC science stories and infrastructure updates soon!

PROSPER on R@CMon

PROSPER (PROtease Specificity Prediction servER) is an integrated feature-based web server that predicts novel substrates, and their cleavage sites, for 24 different protease families from primary sequences. PROSPER addresses the “substrate identification” problem, supporting the understanding of protease biology and the development of therapeutics that target specific protease-regulated pathways.


PROSPER’s web server for protein sequence user input.

Query sequences in FASTA format are submitted to PROSPER through a simple web interface. PROSPER then uses a machine-learning approach based on support vector regression to produce real-valued predictions of substrate cleavage probability. Once the prediction tasks are complete, PROSPER provides users with a link to their query sequence’s prediction results.
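PROSPER’s own input handling is not described here. As an illustration of what a minimal FASTA check involves, the function below accepts one or more records, each a `>` header line followed by sequence lines in the one-letter amino-acid code. The function name and the accepted alphabet (the 20 standard residues plus B, Z, X, U, O and `*`) are assumptions.

```sh
#!/bin/sh
# Sketch: sanity-check protein FASTA read from stdin before submitting it.
# Succeeds only if every record has a ">" header line followed by at least
# one sequence line made of amino-acid letters (plus "*" for a stop).
set -eu

is_protein_fasta() {
  awk '
    /^>/ {
      if (in_rec && !has_seq) bad = 1   # previous header had no sequence
      in_rec = 1; has_seq = 0; next
    }
    {
      line = toupper($0)
      gsub(/[ \t\r]/, "", line)          # ignore spaces and DOS line endings
      if (line == "") next
      if (!in_rec || line !~ /^[ACDEFGHIKLMNPQRSTVWYBZXUO*]+$/) bad = 1
      has_seq = 1
    }
    END { if (!in_rec || !has_seq) bad = 1; exit bad }
  '
}
```

A check like `is_protein_fasta < query.fasta` rejects inputs with a missing header, an empty record or stray characters before any prediction work is queued.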


PROSPER’s result page showing colour-coded predicted cleavage sites.

The R@CMon team helped the Monash Bioinformatics Platform migrate the PROSPER web server to the NeCTAR Research Cloud. PROSPER now uses persistent storage (volumes), granted through a VicNode computational storage allocation, for its model database, which currently contains 24 proteases. To date, PROSPER has served more than 6,000 users from 68 countries in their research, and these numbers are expected to grow.

Double mutant PBP2x T338A/M339F from Streptococcus pneumoniae strain 2 R6 at 2.4 Å resolution (PDB 1PYY).

Future plans for PROSPER include tighter integration with high-performance computing (HPC) facilities and the NeCTAR Research Cloud to enable simultaneous sequence prediction analyses; expanding the model database from 24 to 50 proteases to support the wider protease biology community; and an online computational database of PROSPER-predicted novel substrates and cleavage sites across the whole human proteome. The latter will facilitate functional annotation of the complete human proteome and complement ongoing efforts to characterise protein functions.

The MCC on R@CMon

The Monash Campus Cluster (MCC) is a heterogeneous high-performance computing (HPC) and high-throughput computing (HTC) facility for large-scale, computationally intensive simulations and analyses. With over 2,500 CPU cores across 230 servers with different CPU and memory configurations, the MCC is specifically designed to serve diverse computational workloads. In 2013, the MCC provided over 13 million CPU-core hours to over 300 Monash researchers.

During the past few months, we have been developing a software architecture to extend MCC’s computational resources into the NeCTAR Research Cloud. Users are presented with the MCC’s familiar batch queueing and software environment, so they can seamlessly execute compute jobs on either the legacy cluster nodes of MCC or the NeCTAR Research Cloud.


Monash Campus Cluster, “bursting” into the NeCTAR Research Cloud.

We achieve this by integrating the NeCTAR virtual machines as compute bricks into the MCC batch queuing system, presently the Open Grid Scheduler. This provides users with:

    • a new nectar queue, consisting of MCC-on-R@CMon compute bricks; and
    • the ability to pick a specific availability zone on the Research Cloud to run compute jobs in.

Researchers not only get to burst their computational jobs into the Research Cloud seamlessly, but can also leverage the unique properties of each cloud node (e.g., hardware and/or software capabilities).
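The post does not show how a VM becomes a compute brick. Purely as an illustration, in Open Grid Scheduler an execution host is attached to a cluster queue by appending it to the queue’s `hostlist`, so a brick could join both the catch-all queue and the queue for its availability zone. The function name and host names are hypothetical, and MCC may well use host groups or other automation instead.

```sh
#!/bin/sh
# Sketch (dry run): qconf commands that would add a new cloud VM to the
# generic "nectar" queue and to its zone-specific queue. Assumes the VM already
# runs an Open Grid Scheduler execution daemon known to the master; host
# names passed in are hypothetical.
set -eu

brick_commands() {
  host=$1 zone=$2
  # Accept only the zone queues that exist on MCC (see qconf -sql).
  case $zone in
    gaia|melbourne|monash|sa) ;;
    *) echo "unknown zone: $zone" >&2; return 1 ;;
  esac
  # -aattr OBJECT ATTRIBUTE VALUE TARGET appends VALUE to a list attribute
  echo "qconf -aattr queue hostlist $host nectar"
  echo "qconf -aattr queue hostlist $host nectar-$zone"
}
```

Printing the commands first, then piping them to `sh` on the scheduler’s admin host, keeps a record of exactly which queues each brick joined.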

$ qconf -sql
nectar
nectar-gaia
nectar-melbourne
nectar-monash
nectar-sa
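From the researcher’s side, choosing between the generic and zone-specific queues is a single `qsub` option. A minimal job-script sketch, assuming MCC’s Open Grid Scheduler setup; the job name and one-hour limit are hypothetical, and any site-specific resource requests MCC may require are omitted.

```sh
#!/bin/sh
# Hypothetical Open Grid Scheduler job script for the MCC cloud queues.
# Lines beginning "#$" are read by qsub as options; the shell treats them as comments.
# Interpret the script with /bin/sh regardless of the queue's default shell:
#$ -S /bin/sh
# Job name:
#$ -N cloud-burst-demo
# Queue: only bricks in the Monash zone (use "nectar" to accept any cloud brick)
#$ -q nectar-monash
# Hard wall-time limit (hh:mm:ss):
#$ -l h_rt=01:00:00
# Run in the submission directory and merge stdout with stderr:
#$ -cwd
#$ -j y
set -eu

# The scheduler sets QUEUE inside the job; fall back when run by hand.
msg="running in queue ${QUEUE:-unknown} on $(uname -n)"
echo "$msg"
```

Submit with `qsub job.sh`. To let the job run on any cloud brick, change the `-q` line to `nectar`, or remove it and pass `-q nectar` to `qsub` instead.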

Since December 2013, over 40,000 CPU hours’ worth of computational jobs have been executed in MCC’s NeCTAR queues. The resources used by MCC on R@CMon will be expanded with the deployment of R@CMon Phase 2 to accommodate specialised computational workloads (e.g., high-memory jobs). Monash researchers who have their own NeCTAR allocations and want them presented through MCC can have them managed by the “MCC on R@CMon” project; this can be arranged by contacting the Monash eResearch Centre.

Computational Resource Framework

Over the past year, the Monash NeCTAR porting programme has worked with Griffith University eResearch developers on their Computational Resource Framework (CRF) project. We’re well overdue in promoting their excellent work (and a little of ours), and with a presentation on it coming up at eResearch Australasia (“A RESTful Web Service for High Performance Computing based on Nimrod/G”), now seems a good time to blog about it!


The CRF aims to address one of the long-standing issues in HPC: uptake by non-technical users. HPC has a well-entrenched command-line ethos; unfortunately, this alienates a large cohort of potential users, with negative implications for research productivity and collaboration. At Monash, our HPC specialists go to a great deal of effort to ease new users into the CLI (command-line interface) environment; however, this is a labour-intensive process that doesn’t reach everybody and often leaves users reliant on individual consultants or team members.

For some time, portals have been the go-to panacea for democratising HPC and scientific computing, and there are some great systems being deployed on the Research Cloud, but they still seem to require a large amount of development effort to build and typically cater to a particular domain. Another common issue with “job-management” style portals (including Nimrod’s own ancient CGI beauty) is that they expose and delegate too much information and responsibility to the end user, typically an end user who doesn’t know, or want to know, about the intricacies of the various computational resources: mundanities such as which credentials they need and which resources are actually working today.

The CRF is different in this respect: rather than building a domain-specific interface, the Griffith team have concentrated on a minimum set of functionality for some popular HPC and scientific computing applications. The user just enters information relevant to the application, and the CRF configuration determines where and how the job is run. Currently there are UIs for creating, submitting and managing NAMD, R and MATLAB jobs and sets thereof; AutoDock, Octave and Gaussian are in the works. The team have put a lot of effort into building a web service in front of Nimrod/G; that layer means their UIs are fairly self-contained bits of PHP that present an interface and translate inputs into a template Nimrod experiment. The web service has also enabled them to build some other cool bits and pieces, like experiment submission and results delivery over email.
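The post does not show the CRF’s templates. As a rough sketch of the “translate inputs into a template Nimrod experiment” step, the function below turns two form fields (an R script name and a number of runs) into a parameter sweep over a seed. It is written in shell rather than the CRF’s PHP; `render_r_plan`, the `seed` parameter and the file names are hypothetical, and the plan text follows the general shape of Nimrod/G plan files (parameter declarations plus a `task main ... endtask` block), which should be checked against the Nimrod documentation before use.

```sh
#!/bin/sh
# Sketch: translate two web-form inputs into a Nimrod/G-style plan file.
# render_r_plan, the "seed" parameter and file names are hypothetical; check
# the plan syntax against the Nimrod documentation before relying on it.
set -eu

render_r_plan() {
  script=$1 runs=$2
  # Validate inputs before they are substituted into the template.
  case $script in
    ''|*[!A-Za-z0-9._-]*) echo "bad script name" >&2; return 1 ;;
  esac
  case $runs in
    ''|0*|*[!0-9]*) echo "runs must be a positive integer" >&2; return 1 ;;
  esac
  # ${script} and ${runs} expand now; \$seed and \$jobname stay literal
  # so that Nimrod can substitute them for each job.
  cat <<EOF
parameter seed integer range from 1 to ${runs} step 1;
task main
    copy ${script} node:.
    node:execute Rscript ${script} \$seed > out.txt
    copy node:out.txt out.\$jobname.txt
endtask
EOF
}
```

`render_r_plan model.R 100 > experiment.pln` would describe 100 jobs, each passing its seed to the R script as a trailing argument (readable in R via `commandArgs(trailingOnly = TRUE)`). Validating inputs before substitution keeps arbitrary user text out of the generated plan.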


Using Nimrod/G as the back-end resource meta-scheduler means the CRF can intermingle jobs across the local cluster and burst or overflow into the NeCTAR Research Cloud, and that’s the focus of the Monash and Griffith collaboration on this project. We’re now looking to implement scheduling functionality that will make it possible to put simple policies on resources, e.g., use my local cluster, but if jobs don’t start within 30 minutes, head to the cloud. Expect some updates in that area after the eResearch conference!
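The planned policy is simple enough to state as a pure decision function. This sketch is not the CRF or Nimrod/G implementation, which the post says is still to come; it only illustrates the rule quoted above, with the 30-minute threshold as a parameter. `burst_target` and `BURST_AFTER_MIN` are hypothetical names, and the closing comment shows how the decision might be acted on in an Open Grid Scheduler setup like MCC’s, where `qalter` can change the queue of a job that is still pending.

```sh
#!/bin/sh
# Sketch of a "local first, cloud after 30 minutes" overflow rule.
# burst_target and BURST_AFTER_MIN are hypothetical; times are Unix epoch seconds.
set -eu

BURST_AFTER_MIN=30   # minutes a job may wait locally before overflowing

# burst_target SUBMITTED NOW STARTED -- STARTED is "yes" or "no".
# Prints "local" to leave the job where it is, or "cloud" to redirect it.
burst_target() {
  submitted=$1 now=$2 started=$3
  waited_min=$(( (now - submitted) / 60 ))
  if [ "$started" = yes ] || [ "$waited_min" -lt "$BURST_AFTER_MIN" ]; then
    echo local
  else
    echo cloud
  fi
}

# On an Open Grid Scheduler cluster such as MCC, a pending job could then be
# moved with:  qalter -q nectar JOB_ID
```

Keeping the decision separate from the action makes the policy easy to test and to swap out, for example for a rule based on queue length rather than waiting time.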