Tag Archives: Research Stories

Monash Data Science on R@CMon

Back in 2015, the Faculty of Information Technology at Monash University started exploring various data science platforms that are readily available on the web. Many of its researchers, including lecturers, had used interactive Python and R notebooks on their own desktops and laptops for small and medium-sized problems. These interactive notebooks provide ease of use, portability and collaboration tools. These features proved so useful that the faculty decided to use the notebooks for teaching and had the software stack installed on the teaching lab computers. Data science courses could then be run in these labs, with students running their analyses on the notebook instances on each lab machine. For some time, this setup served the faculty’s teaching requirements really well, but as the number of students grew and more advanced and complex problems were tackled, it became apparent that a more scalable and highly available data science platform was needed.

Training Dataset Visualisation in JupyterHub

The R@CMon team started a journey with the faculty’s staff to evaluate the data science platforms already available. The team first deployed SageMathCloud (SMC) on the Monash node of the NeCTAR Research Cloud and assessed it for a couple of months. SageMath and its cloud version, SageMathCloud (SMC), are open-source platforms for mathematical and scientific analyses. They provide a similarly intuitive and interactive interface for running models and generating visualisations. The most attractive feature of SMC is that it has been developed as a teaching platform from the outset, so various plugins for teacher-student interactions, such as notebook sharing and marking, were already available. Although SMC is open-source, the R@CMon team encountered various setup and deployment issues. The team was eventually able to deploy a basic setup of SMC with its key features. The developers and maintainers of SMC were consulted for support, but at that time they did not support private deployments. The team therefore moved on to assess the next available data science platform.

Samples Distribution Visualisation in JupyterHub

The team then moved on to evaluate IBM’s Data Science Workbench (DSW) platform. DSW is not open-source and cannot be deployed privately on the research cloud, but at that time it had the requisite analytic (e.g. Python, R) and collaboration features. The faculty used DSW to deliver a number of teaching courses. However, after several rounds of teaching, licensing issues left teachers and students unable to log in to DSW and caused running notebooks to crash. These issues led the faculty to resume the search for another data science platform.

Features Correlation Visualisation in JupyterHub

JupyterHub is a multi-user system for serving interactive notebooks. It provides comprehensive documentation for various types of deployment and scaling options. Since its inception, JupyterHub has become mainstream in various teaching and research communities. For example, there were some early adopters of JupyterHub for education at UC Berkeley. JupyterHub has also been used to provide a publicly accessible and re-runnable model in Nature. These early adopters inspired the R@CMon team and faculty staff to replicate their success stories in the online Graduate Diploma for Data Science course, which was then being developed.

R Classification Visualisation in JupyterHub

The R@CMon team deployed an instance of JupyterHub locally on the Monash node of the NeCTAR Research Cloud. The team then coordinated with the relevant lecturers to configure the various Python and R libraries (e.g. numpy, scipy, ggplot2, matplotlib) that would be used for the units. To support more dynamic user management in JupyterHub, the R@CMon team integrated it with the Monash User Directory service. This made it easier to add and remove users, and users can use their own Monash credentials to access JupyterHub and do their analysis. To date, after about two years of usage, the R@CMon-hosted JupyterHub service has gone through several teaching periods and served hundreds of students. The R@CMon team is actively engaging with the faculty on future directions in delivering new content (e.g. PySpark) and preparing for the next and more exciting forms of interactive analysis (e.g. JupyterLab).

More FLAIR to Fluid Mechanics via the Monash Research Cloud

Advanced research in engineering can often benefit from extra compute capacity. This is where a research-oriented computational cloud like R@CMon is very handy. We report on the use of cloud resources to augment the resources available for running large-scale fluid mechanics studies.

FLAIR (Fluids Laboratory for Aeronautical and Industrial Research), from the Department of Mechanical and Aerospace Engineering, Faculty of Engineering, has been conducting experimental and computational fluid mechanics research for over twenty years, focusing on fundamental fluid flow problems that impact the automotive, aeronautical, industrial and biomedical fields.

A key research focus in recent years has been understanding the wake dynamics of particles near walls. Particle-particle and wall-particle interactions were investigated using an in-house spectral-element numerical solver. Understanding these interactions is key in many engineering industries. When applied to biological engineering, blood cells such as leukocytes are numerically modelled as canonical bluff bodies (i.e., as cylinders and spheres) and numerical computations are carried out. These simulations are not only useful in understanding biological cell transport but also have wider applications in mineral processing, chemical engineering and ball sports. Due to the computational and data-intensive nature of this research, it has always been a challenge to get access to sufficient computing resources for its needs.

In particular, their project aims to understand the wake dynamics of multiple particles in various scenarios such as rolling, collisions and vortex-induced vibrations, and the mixing that results from these interactions. The group’s two- and three-dimensional fluid flow solver also incorporates two-way body dynamics to model these effects. As the studies involve multiple parameters such as Reynolds number, body rotation and height of the body above the wall, the total parameter space is extensive, requiring significant computational resources. While the two-dimensional simulations are carried out on single processors, their three-dimensional counterparts require parallel processing, making NeCTAR nodes an ideal platform to run these computations. Some of the visualisations from the group’s three-dimensional simulations are shown in Figures 1 and 2 below.
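To give a feel for how quickly such a parameter space grows, here is a minimal Python sketch that enumerates every combination of the three parameters named above. The values and the `build_cases` helper are hypothetical and chosen purely for illustration; they are not taken from the FLAIR solver or its studies.

```python
from itertools import product

# Hypothetical parameter values, chosen only for illustration.
reynolds_numbers = [50, 100, 200, 400]
rotation_rates = [-1.0, 0.0, 1.0]
gap_heights = [0.005, 0.1, 0.5, 1.0]


def build_cases(reynolds, rotations, heights):
    """Return one dict per simulation case (full Cartesian product)."""
    return [
        {"reynolds": re_num, "rotation": rot, "gap_height": height}
        for re_num, rot, height in product(reynolds, rotations, heights)
    ]


cases = build_cases(reynolds_numbers, rotation_rates, gap_heights)
print(len(cases))  # 4 * 3 * 4 = 48 independent simulations
```

Even this small grid yields 48 separate runs, and each additional parameter multiplies the count again, which is why sweeps of this kind map naturally onto many independent cluster or cloud jobs.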

Since 2008, the FLAIR team has been making good use of the Monash Campus Cluster (MCC), a high-performance/high-throughput heterogeneous system with over two thousand CPU cores. However, MCC is heavily utilised by researchers from across the university and FLAIR users often found themselves waiting long periods before they could run their fluid flow simulations. It became clear that FLAIR researchers needed additional computational resources.

R@CMon was able to secure a 160-core allocation for the FLAIR team, a valuable increase in the resources available to the group. Now, thanks to both NeCTAR and MCC-R@CMon, over one million CPU hours distributed across 4,000 jobs have been provided for the project’s CPU-intensive calculations.

This has resulted in a number of publications in the highest impact fluid mechanics journals, with several more in a pre-submission stage; for example:
  • Rao, A., Thompson, M.C., & Hourigan, K. (2016) “A universal three-dimensional instability of the wakes of two-dimensional bluff bodies.” Journal of Fluid Mechanics, 792, 50–66.
  • Rao, A., Radi, A., Leontini, J.S., Thompson, M.C., Sheridan, J., & Hourigan, K. (2015) “A review of rotating cylinder wake transitions.” Journal of Fluids and Structures, 53, 2–14.
  • Rao, A., Radi, A., Leontini, J.S., Thompson, M.C., Sheridan, J., & Hourigan, K. (2015) “The influence of a small upstream wire on transition in a rotating cylinder wake.” Journal of Fluid Mechanics (published online), 769 (R2), 1–12.
  • Rao, A., Thompson, M.C., Leweke, T., & Hourigan, K. (2013) “The flow past a circular cylinder translating at different heights above a wall.” Journal of Fluids and Structures, 41, 9–21.
  • Rao, A., Passaggia, P.-Y., Bolnot, H., Thompson, M.C., Leweke, T., & Hourigan, K. (2012) “Transition to chaos in the wake of a rolling sphere.” Journal of Fluid Mechanics, 695, 135–148.

Figure 1
Figure 2

The Monash Country Lines Archive on R@CMon

The Monash Country Lines Archive (MCLA) is a collaborative project between the Monash Indigenous Centre (MIC), Faculty of Arts and the Faculty of Information Technology with a team of researchers, digital animators and students. The MCLA aims to support the indigenous Australian communities in the preservation of stories that combine their history, knowledge, poetry, songs, performance and language. MCLA began working with the Yanyuwa people of Borroloola, NT, creating a number of animations between 2007 and 2010. It was these animations that caught the attention of Dr Alan Finkel, the then Chancellor of Monash University. In 2011, the Alan and Elizabeth Finkel Foundation supported the project for a further five years.

Render from “Why We All Die” 2015 ©MCLA & Taungurung Dolodanin-dat Animation Group.

Since its foundation in 2011, the MCLA has produced nine short-form animated films ranging from four minutes to twenty-four minutes in length while working with the communities through every step of the animation process: script, storyboards, character and landscape concepts and construction, animation, rendering, sound and post-production. Initially, producing these animations was challenging due to their heavy computational requirements. The MCLA team didn’t have access to any dedicated render-farm resources as would be normal for a commercial animation studio, so all rendering work was done on individual desktops and laptops. This resource limitation forced the MCLA team to forgo advanced rendering techniques in order to quickly render a large number of scenes while still maintaining a certain level of production quality.

Render from “Jibi the Giant Spirit Birds” 2013 ©MCLA & Nyamba Buru Yawuru.

Render frame from “Jibi the Giant Spirit Birds” 2013 ©MCLA & Nyamba Buru Yawuru.

In 2013, the MCLA team gained access to the NeCTAR Research Cloud, giving them a much-needed boost in rendering capacity. The R@CMon team assisted the MCLA in turning their workflow into a distributed rendering workflow on the research cloud that can be scaled dynamically. Modelling, animation and rendering software has been licensed and configured on this virtual render farm. The farm has been configured so that the MCLA can easily access it remotely to submit jobs and inspect their renders. The MCLA then started applying advanced rendering techniques in their workflow, techniques that weren’t possible on their previous setup. After several years of usage, demand for the MCLA to produce more and more high-quality visualisations also increased. This required the render farm to scale more, much more, and it did.
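Distributed rendering works well on a cloud because animation frames can usually be rendered independently of one another. The Python sketch below shows one simple way a frame range might be split into chunks and spread across render nodes. It is only an illustration under assumed names (`split_frames`, `assign_round_robin`, `node-a`, `node-b`) and does not describe the MCLA farm’s actual software or scheduler.

```python
def split_frames(first_frame, last_frame, chunk_size):
    """Split an inclusive frame range into inclusive (start, end) chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks = []
    start = first_frame
    while start <= last_frame:
        end = min(start + chunk_size - 1, last_frame)
        chunks.append((start, end))
        start = end + 1
    return chunks


def assign_round_robin(chunks, workers):
    """Hand chunks to workers in turn; returns {worker: [chunks]}."""
    assignment = {worker: [] for worker in workers}
    for index, chunk in enumerate(chunks):
        assignment[workers[index % len(workers)]].append(chunk)
    return assignment


# Hypothetical 100-frame scene rendered on two hypothetical nodes.
jobs = split_frames(1, 100, 30)
plan = assign_round_robin(jobs, ["node-a", "node-b"])
print(plan)
```

Adding more nodes to the worker list shortens each node’s share of the work, which is the basic reason a cloud-backed farm can grow with demand.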

Render frame from “Janyju the Red Lizard” 2014 ©MCLA & Nyamba Buru Yawuru.

Access to the research cloud-backed render farm removed a huge limitation for the MCLA, inspiring them to produce more animations for the indigenous Australian communities without compromising on quality. The R@CMon team will continue to support the MCLA going forward and will be there when the time comes that the farm needs more power. The MCLA is composed of Dr John Bradley, Dr Shannon Faulkhead, Brent D McKee, Dr Tom Chandler and Chandara Ung.

Monash Macromolecular Crystallisation Facility Upgrade on R@CMon

The Monash Macromolecular Crystallisation Facility (MMCF), a Monash Technology Research Platform, was established in 2009 and is operated by the Structural Biology Unit at Monash University. The MMCF provides access to a fully automated platform for the high-throughput crystallisation of biological macromolecules. Macromolecular crystallography reveals the 3D structure of biological macromolecules in unparalleled detail and provides the basis for the rational design of therapeutics. The MMCF is considered to be the largest macromolecular crystallisation facility in the world.

The Monash Macromolecular Crystallisation Facility

The MMCF partnered with Formulatrix, Monash eSolutions and R@CMon to upgrade the facility’s IT infrastructure for the next generation of crystallisation technology. The R@CMon team provisioned a custom Microsoft Windows-based infrastructure on the Monash node of the NeCTAR Research Cloud to host the platform’s new crystallisation and imaging system. An enterprise database has been configured and is maintained by Monash eSolutions to support this new system. Monash eSolutions has also completely revamped the facility’s networking infrastructure. The R@CMon team worked with the vendor, Formulatrix, to deploy the instrument’s software stack. Dedicated research data storage has been provisioned for the facility’s experiments and imaging data.

Protein Crystals Imaging in Action

The R@CMon team and Monash eSolutions are working together to support the facility going forward. The Monash Macromolecular Crystallisation Facility (MMCF) project story first appeared on Monash University’s The Insider.

Big data mining market segmentation of ANZ Bank EFTPOS data

In Australia, the big four banks receive large amounts of Electronic Funds Transfer at Point of Sale (EFTPOS) transaction data on a daily basis, yet this information-rich data is neither stored nor analysed. Because EFTPOS data is both very large and very messy, it is difficult for the banks themselves to gain visibility of the characteristics of the data’s stakeholders.

That changed in 2014, when a researcher in Monash’s Faculty of IT, Dr. Grace Rumantir, approached us for assistance in building a secure analysis environment for a data mining project on a collection of commercially sensitive EFTPOS data obtained through an award-winning collaboration with the Australia and New Zealand Banking Group (ANZ). To our knowledge, this is the first time market segmentation analyses have been applied to such a large amount of EFTPOS data anywhere in the world.

As a pilot, ANZ collated 5 months of EFTPOS transaction records, from which all customer and retailer identifying data was redacted. Before this commercial-in-confidence data could be released for research purposes, ANZ produced a comprehensive list of requirements pertaining to the secure storage and processing of the data. Securing the release of this data through the ANZ Information Security protocol was a lengthy and difficult process. Success came mainly from our team’s ability to demonstrate that we could confidently meet these requirements with the infrastructure we have in place at Monash.

Our team very quickly built a workhorse but appropriately secure environment on R@CMon (using specialist nodes because of the memory requirements for processing such a large dataset). The R@CMon environment already uses software-defined virtualisation technology. Servers are sandboxed, and R@CMon is housed in Monash’s own secure access facility. All ingress/egress access was locked down to allow only a few known clients (Grace and her research students). Remote desktop software and several data-mining tools of interest were configured for use by the researchers. The data (in daily CSV samples) was stored in an encrypted volume file, which was uploaded to an R@CMon volume attached to the analysis server. Individual passwords were used to unlock and mount the encrypted data, with a strict usage protocol to ensure the data remained locked when not in use.

A paper outlining our experience in acquiring, securely storing and processing the EFTPOS data can be found at:

Ashishkumar Singh, Grace Rumantir, Annie South, and Blair Bethwaite, Clustering Experiments on Big Transaction Data for Market Segmentation. In Proceedings of the 2014 International Conference on Big Data Science and Computing (BigDataScience ’14). ACM, New York, NY, USA, Article 16, DOI=http://dx.doi.org/10.1145/2640087.2644161

The market segmentation experiments on the retailers in the EFTPOS data involve reducing the transaction data using the RFM (Recency, Frequency, Monetary) model, followed by clustering analysis. The results indicate distinct combinations of RFM values among the retailers in each cluster, which could point the bank to different marketing strategies for each retailer performance category. This ground-breaking revelation of the existence of retailer segments extracted from EFTPOS data won the Best Paper Award (Industry Track) at the Australasian Data Mining and Analytics Conference 2014.
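As a rough illustration of the RFM reduction step, the Python sketch below collapses a handful of made-up transactions into one recency, frequency and monetary record per retailer. The sample records, the field names and the `rfm_table` helper are all hypothetical, and the exact RFM definitions used in the published experiments may differ. The clustering step that follows in the papers is not shown.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (retailer_id, transaction_date, amount).
transactions = [
    ("R1", date(2014, 1, 5), 20.0),
    ("R1", date(2014, 1, 30), 35.0),
    ("R2", date(2014, 1, 10), 500.0),
    ("R3", date(2014, 1, 29), 5.0),
    ("R3", date(2014, 1, 30), 7.5),
    ("R3", date(2014, 1, 31), 2.5),
]


def rfm_table(records, as_of):
    """Reduce raw transactions to one RFM record per retailer."""
    grouped = defaultdict(list)
    for retailer, day, amount in records:
        grouped[retailer].append((day, amount))

    table = {}
    for retailer, rows in grouped.items():
        last_day = max(day for day, _ in rows)
        table[retailer] = {
            # Recency: days since the most recent transaction.
            "recency_days": (as_of - last_day).days,
            # Frequency: number of transactions in the period.
            "frequency": len(rows),
            # Monetary: total transaction value in the period.
            "monetary": sum(amount for _, amount in rows),
        }
    return table


rfm = rfm_table(transactions, as_of=date(2014, 1, 31))
for retailer, values in sorted(rfm.items()):
    print(retailer, values)
```

Each retailer is now a three-value point rather than a long list of transactions, which is the kind of compact representation a clustering algorithm can group into segments.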

Publication references:

Ashishkumar Singh, Grace Rumantir and Annie South, Market Segmentation of EFTPOS Retailers. In Proceedings of the 12th Australasian Data Mining Conference (AusDM 2014), Brisbane, Australia (http://ausdm14.ausdm.org/program) – Best Paper Award Industry Track

Ashishkumar Singh, Grace Rumantir. Two-tiered Clustering Classification Experiments for Market Segmentation of EFTPOS Retailers. Australasian Journal of Information Systems, [S.l.], v. 19, sep. 2015. ISSN 1449-8618. Available at: <http://journal.acs.org.au/index.php/ajis/article/view/1184>. Date accessed: 18 Oct. 2015. doi:http://dx.doi.org/10.3127/ajis.v19i0.1184.

This exciting result has been cited in financial industry publications as an important example of how academia can help businesses gain insights from their own massive amounts of data to support business decisions.

On the success of this collaborative project, Patrick Maes, ANZ Chief Technology Officer, writes:

“The key here is to find the data scientists who can work with these models, a skill not easy to find nowadays”

(see http://www.itnews.com.au/news/me-bank-hires-data-boss-in-it-exec-restructure-411908 and https://bluenotes.anz.com/posts/2015/03/big-data-from-customer-targeting-to-customer-centric ).

On lessons learnt from this important pilot project, Dr. Grace Rumantir says:

“There is a long standing gap between what research in academia can offer and the needs in the industry. This gap takes the form of mistrust on the part of the people in the industry that academics may not deliver a solution that is relevant to their business on a timely manner. The results of this ground breaking project using EFTPOS data shows that we do understand what business needs and come up with a practical solution that business can directly translate into business strategies which can give them an edge in the competitive business environment.

We are able to do this with our ability to talk in the same wavelength with our industry clients, with our research skills in bleeding edge technology and with the support of the world class research support and infrastructure that Monash has been investing heavily on.”