Tag Archives: HPC

2nd place in the SC21 Indy Student Cluster Competition

Six Monash University students have taken 2nd prize in the SuperComputing 2021 Indy Student Cluster Competition (IndySCC).

The IndySCC is a 48-hour contest in which students run a range of benchmarking software (this year, HPL and HPCG), well-established scientific applications (GROMACS, John the Ripper) and a mystery program (Devito), all while keeping power consumption under 1.1 kW. That’s right – even the most advanced digital research infrastructure has meaningful Net Zero aspirations!

The six students – the Student Cluster Team – are part of DeepNeuron, an undergraduate team within Monash’s broader group of engineering student teams offering extra-curricular activities. DeepNeuron is focused on improving the world through the combination of Artificial Intelligence (AI) and High-Performance Computing (HPC).

“The experience of participating in such a well known competition and the opportunity to collaborate with different students and experts allowed us to learn valuable skills outside of our classroom. We feel privileged and would like to thank all the support from DeepNeuron, supervisors and the faculty”

Yusuke Miyashita, HPC Lead, DeepNeuron

This achievement is even more impressive given that the students have never physically met each other due to COVID-19 restrictions. Earlier this year, the students also entered the Asia Supercomputing Community 2021 virtual Student Cluster Competition (ASC21 SCC), where they won the Application Innovation Award (shared with Tsinghua University) for the fastest time on the mystery application. That team was led by Emily Trau, who also works as a casual at MeRC.

“Despite the COVID lockdown, the students from Monash University’s Deep Neuron have hit well above their weight, winning significant prizes in two prestigious International Student Cluster Competitions. Well done to all involved”

Simon Michnowicz, Monash HPC team

All teams in the competition were tasked with configuring a resource made available to them on the Chameleon Cloud for each benchmark. Chameleon is similar to the Nectar Research Cloud in that it provides Infrastructure as a Service to researchers. However, Chameleon’s focus is on experiments in edge, cloud and HPC (experiments on the infrastructure itself), whereas the Research Cloud’s focus is on being a resource for, and an instigator of collaboration across, all research disciplines. Where Chameleon and the Research Cloud at Monash are particularly similar is in serving as test beds for new hardware and software technologies pertinent to digital research infrastructure. For example, MASSIVE and Monash’s own MonARCH HPC are built on the Research Cloud.

“It is formally the end of the competition. What a journey! You all did an excellent job and we are impressed by how smart, hard-working and dedicated all the teams were. You all deserve a round of applause”

IndySCC21 Chairs Aroua Gharbi and Darshan Sarojini

John the Ripper cracking passwords

GROMACS simulation of a model membrane

More FLAIR to Fluid Mechanics via the Monash Research Cloud

Advanced research in engineering can often benefit from extra compute capacity. This is where a research-oriented computational cloud like R@CMon is very handy. We report on the use of cloud resources to augment the compute capacity available for running large-scale fluid mechanics studies.

FLAIR (Fluids Laboratory for Aeronautical and Industrial Research), from the Department of Mechanical and Aerospace Engineering, Faculty of Engineering, has been conducting experimental and computational fluid mechanics research for over twenty years, focusing on fundamental fluid flow problems that impact the automotive, aeronautical, industrial and biomedical fields.

A key research focus in recent years has been understanding the wake dynamics of particles near walls. Particle-particle and wall-particle interactions were investigated using an in-house spectral-element numerical solver. Understanding these interactions is important across many engineering industries. In biological engineering, for example, blood cells and leukocytes are modelled numerically as canonical bluff bodies (i.e., cylinders and spheres). These simulations are not only useful for understanding biological cell transport but also have wider applications in mineral processing, chemical engineering and even ball sports. Due to the computational and data-intensive nature of this research, it has always been a challenge to secure sufficient computing resources for its needs.

In particular, the project aims to understand the wake dynamics of multiple particles in scenarios such as rolling, collisions and vortex-induced vibrations, as well as the mixing that results from these interactions. The group’s two- and three-dimensional fluid flow solver also incorporates two-way body dynamics to model these effects. Because the studies span multiple parameters, such as Reynolds number, body rotation and height of the body above the wall, the total parameter space is extensive, requiring significant computational resources. While the two-dimensional simulations are carried out on single processors, their three-dimensional counterparts require parallel processing, making NeCTAR nodes an ideal platform for these computations. Some visualisations from the group’s three-dimensional simulations are shown in Figures 1 and 2 below.
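
To give a sense of how quickly that parameter space grows, here is a minimal sketch of a sweep of the kind described above; the parameter names and value ranges are purely illustrative and are not the group’s actual study ranges.

```python
#!/usr/bin/env python3
"""Illustrative parameter sweep for wake-dynamics simulations.

The ranges below are hypothetical placeholders, used only to show how a
handful of study parameters multiplies into thousands of independent runs.
"""
from itertools import product

reynolds_numbers = range(50, 501, 25)                 # 19 values (hypothetical)
rotation_rates = [x / 10 for x in range(-10, 11, 2)]  # 11 values (hypothetical)
gap_heights = [0.05, 0.1, 0.2, 0.5, 1.0]              # height above the wall, in body diameters

cases = list(product(reynolds_numbers, rotation_rates, gap_heights))
print(f"total simulations in this sweep: {len(cases)}")  # 19 * 11 * 5 = 1045

# In practice each tuple would become one job submission to the solver.
for re, alpha, h in cases[:3]:
    print(f"example case: Re={re}, rotation rate={alpha}, gap height={h}")
```

Each such case is an independent job, which is why this kind of workload maps so naturally onto cloud and cluster queues.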

Since 2008, the FLAIR team has been making good use of the Monash Campus Cluster (MCC), a high-performance/high-throughput heterogeneous system with over two thousand CPU cores. However, MCC is heavily utilised by researchers from across the university and FLAIR users often found themselves waiting long periods before they could run their fluid flow simulations. It became clear that FLAIR researchers needed additional computational resources.

R@CMon was able to secure a 160-core allocation for the FLAIR team, a valuable increase in the resources available to the group. Thanks to both NeCTAR and MCC-R@CMon, over one million CPU hours, distributed across 4,000 jobs, have been provided for the project’s CPU-intensive calculations.

This has resulted in a number of publications in the highest-impact fluid mechanics journals, with several more at the pre-submission stage; for example:
  • Rao, A., Thompson, M.C., & Hourigan, K. (2016) “A universal three-dimensional instability of the wakes of two-dimensional bluff bodies.” Journal of Fluid Mechanics, 792, 50–66.
  • Rao, A., Radi, A., Leontini, J.S., Thompson, M.C., Sheridan, J., & Hourigan, K. (2015) “A review of rotating cylinder wake transitions.” Journal of Fluids and Structures, 53, 2–14.
  • Rao, A., Radi, A., Leontini, J.S., Thompson, M.C., Sheridan, J., & Hourigan, K. (2015) “The influence of a small upstream wire on transition in a rotating cylinder wake.” Journal of Fluid Mechanics, 769 (R2), 1–12 (published online).
  • Rao, A., Thompson, M.C., Leweke, T., & Hourigan, K. (2013) “The flow past a circular cylinder translating at different heights above a wall.” Journal of Fluids and Structures, 41, 9–21.
  • Rao, A., Passaggia, P.-Y., Bolnot, H., Thompson, M.C., Leweke, T., & Hourigan, K. (2012) “Transition to chaos in the wake of a rolling sphere.” Journal of Fluid Mechanics, 695, 135–148.

Figures 1 and 2: visualisations from the group’s three-dimensional simulations.

R@CMon announced as a Mellanox “HPC Center of Excellence”

At SuperComputing 2015 in Austin, our network/fabric partner Mellanox announced R@CMon (Monash University) as an “HPC Centre of Excellence”. A core goal of the HPC CoE is to drive the technological innovations required for the next generation of (exascale) supercomputing, whilst also ensuring that such an exascale computer is relevant to modern research. R@CMon is a stand-out pioneer in converging cloud, HPC and data, all of which are key to the “next generation”.

“We see Monash as a leader in Cloud and HPC on the Cloud with Openstack, Ceph and Lustre on our Ethernet CloudX platform.” Sudarshan Ramachandran, Regional Sales Manager, Australia & New Zealand

From a fabric innovation point of view, it has been a very productive and exciting 24 months for R@CMon. By early 2014, the internal Monash University HPC system “MCC” had been burst onto the Research Cloud, allowing a researcher’s own merit to be leveraged with institutional investment. It also represents a shift towards soft HPC, where the size of an HPC system changes regularly over time. Earlier this year we announced our early adoption of RoCE (RDMA over Converged Ethernet) using Mellanox technologies. This meant the same fabric used for cloud networking could also be used for HPC and data storage backplanes. In turn, MCC on R@CMon also enabled RDMA communications, that is, real HPC performance on an otherwise orchestrated cloud.

Finally, at the OpenStack Summit Tokyo 2015, Mellanox announced R@CMon as debuting the world’s first 100G end-to-end cloud. This technology eases scaling and accommodates heterogeneous performance requirements. In particular, it sets the basis for the processor and storage performance needed for peak and converged cloud/HPC workloads. Watch this space!

Australia’s Largest University Selects Mellanox CloudX Platform and Open Ethernet Switch Systems for Nationwide Research Initiative

Yesterday Mellanox made the following press release – “Australia’s Largest University Selects Mellanox CloudX Platform and Open Ethernet Switch Systems for Nationwide Research Initiative“. Through Monash University’s own co-investment in R@CMon, the Mellanox CloudX products were chosen as the networking technology for Phase 2, providing RDMA-capable networking within and between the R@CMon Research Cloud and Data (RDSI) facilities. This means our one fabric can run multi-host MPI workloads and leverage fast I/O storage, while remaining near the cost-point of commodity networking for resources that are generic and commodity.

This is a key ingredient to the “21st Century Microscope”, where researchers orchestrate the instruments, compute, storage, analysis and visualisation themselves, looking down and tuning this 21st century lens, using big data and big computing to make new discoveries. R@CMon has been designed to be the platform where Australian researchers can lead the way at establishing their own 21st century microscope – for themselves and for their communities.

Once again Monash is leading platform technology innovation and accessibility by example. Through 2015 we look forward to optimising this technology and encouraging increased self-service access to these sorts of technologies.

MCC-on-R@CMon Phase 2 – HPC on the cloud

Almost a year ago, the Monash HPC team embarked on a journey to extend the Monash Campus Cluster (MCC), the university’s internal heterogeneous HPC workhorse, onto R@CMon and the wider NeCTAR Australian Research Cloud. This is an ongoing collaborative effort between the R@CMon architects and tech-crew, and the MCC team, which has long-standing and strong engagements with the Monash research community. Recently, this journey has been further enriched by the close coordination with the MASSIVE team, which will enhance the sharing of technical artefacts and learnings between the two teams.

By September 2014, MCC-on-the-Cloud had grown to over 600 cores, spanning three nodes of the Australian Research Cloud. Its size was limited only because the Research Cloud was full and awaiting a wave of new infrastructure. Nevertheless, Monash researchers from Engineering, Science, and FIT have collectively used over 850,000 CPU-core hours. Preferring the “MCC service”, they have offered their NeCTAR allocations to be managed by the MCC team, rather than building a cluster and installing the software stack themselves. From the researchers’ perspective, this has the twofold benefit of providing a user experience consistent with that of the dedicated MCC and freeing them from the burden of managing cloud instances, software deployment, queue management, etc.

Deploying a usable high-performance/high-throughput computing (HPC/HTC) service on the cloud poses many challenges. Users expect a certain robustness and guaranteed service availability typical of traditional clusters. All this must be achieved despite the fluidity and heterogeneity of the cloud infrastructure and nuances in service offerings across the Research Cloud nodes. For example, one user reported that jobs were cancelled by the scheduler because they exceeded the specified wall time limits, and we subsequently discovered that some MCC “cloud” compute nodes were running on oversubscribed hosts (contrary to NeCTAR architecture guidelines). Nevertheless, we can declare that our efforts have paid off – MCC-on-the-cloud is now operating and delivering the reliable HPC/HTC computing service wrapped in the classic MCC look-and-feel that Monash researchers have come to depend on. Despite the many challenges, we are convinced that this is a good way to drive the federation forward.
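
One practical symptom of an oversubscribed host, seen from inside a guest, is elevated CPU “steal” time. The following is a minimal diagnostic sketch of that idea only; the sampling interval and threshold are illustrative and it is not part of any MCC tooling.

```python
#!/usr/bin/env python3
"""Rough oversubscription check from inside a cloud instance.

Samples /proc/stat twice and reports the fraction of CPU time "stolen"
by the hypervisor. Persistently high steal time suggests the instance is
sharing an oversubscribed host. Interval and threshold are illustrative.
"""
import time


def cpu_times():
    # First line of /proc/stat: "cpu user nice system idle iowait irq softirq steal ..."
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]


def steal_fraction(interval=5.0):
    before = cpu_times()
    time.sleep(interval)
    after = cpu_times()
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    steal = deltas[7] if len(deltas) > 7 else 0  # 8th counter is steal time
    return steal / total if total else 0.0


if __name__ == "__main__":
    frac = steal_fraction()
    print(f"steal time over sample: {frac:.1%}")
    if frac > 0.05:  # illustrative threshold
        print("Warning: noticeable steal time; the host may be oversubscribed.")
```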

Now, with R@CMon Phase 2 coming online, we have taken a step closer towards realising this aim of “high-performance” computing on the cloud. Equipped with Intel Ivy Bridge Xeon processors, R@CMon Phase 2 hardware stands out amidst the cloud of commodity hardware on most other NeCTAR nodes. These specialist servers are already proving invaluable for floating-point-intensive MPI applications. In production runs of a three-dimensional spectral-element method code, we observed nearly double the performance on these Xeons compared to the AMD Opteron nodes found across most of the rest of the cloud, even with hyper-threading enabled. By pinning the guest vCPUs to a range of hyper-threaded cores on the host, we achieved a further 50% performance improvement, effectively over a 2.6x improvement on the “commodity” AMD nodes. We look forward to implementing this vCPU pinning feature once it is natively supported in OpenStack Juno, the Research Cloud’s next version.

Measured performance improvement with a production 3D Spectral Element code
R@CMon Phase 1: AMD Opteron 6276 @ 2.3 GHz
Phase 2: Intel Xeon E5-4620v2 @ 2.6 GHz
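
The pinning described above was done at the hypervisor level. As a rough illustration of the underlying mechanism (not the way OpenStack itself exposes the feature), the libvirt Python bindings can pin a guest’s vCPUs to chosen host cores; the domain name, host core count and mapping below are hypothetical.

```python
#!/usr/bin/env python3
"""Sketch: pin a guest's vCPUs to specific host cores via libvirt.

Assumes the libvirt Python bindings are available on the hypervisor and
that a running domain named "mcc-compute-0" exists; the name, core count
and mapping are hypothetical examples, not R@CMon configuration.
"""
import libvirt

HOST_CPUS = 32                        # logical cores on the hypervisor (assumed)
PIN_MAP = {0: 8, 1: 9, 2: 10, 3: 11}  # guest vCPU -> host logical core (hypothetical)

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("mcc-compute-0")

for vcpu, host_core in PIN_MAP.items():
    # libvirt expects a boolean mask over all host CPUs for each vCPU
    cpumap = tuple(i == host_core for i in range(HOST_CPUS))
    dom.pinVcpu(vcpu, cpumap)
    print(f"pinned vCPU {vcpu} to host core {host_core}")

conn.close()
```

In OpenStack itself, pinning is typically expressed through flavor and scheduler configuration rather than by calling libvirt directly, which is what the native support mentioned above refers to.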

Thus, our journey continues… Once RDMA (Remote Direct Memory Access) is enabled on Phase 2, accelerated networking will make it feasible to run large-scale, multi-host MPI workloads. Achieving this will take us even closer to a truly high-performance computing environment on the cloud. Look out for MCC science stories and infrastructure updates soon!