
Melbourne Weather Server on R@CMon

Dr. Simon Clarke is a senior lecturer in the School of Mathematical Sciences at Monash University. He has been granted permission by the Bureau of Meteorology to repackage its observational and forecast Melbourne weather data for downstream analysis and visualisation. Weather data from the bureau is downloaded and processed at regular 10-minute intervals, and various metrics and visualisations are then computed using a MATLAB batch script developed in-house. The resulting output is fed to a web server for public presentation, with integrations to other external sites hosted by the bureau. The original weather server was housed on a legacy hosting platform that had reached its end of life, so the Melbourne weather server needed a new home.
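The fetch-and-process cycle described above can be sketched roughly as follows. This is a minimal illustration only: the feed URL, the MATLAB script name and the output filename are hypothetical stand-ins, not the weather server's actual configuration (MATLAB's `-batch` flag does run a script non-interactively and exit with its status).

```python
import json
import subprocess
import urllib.request

# Hypothetical BoM observation feed; the real product/station ID differs.
FEED_URL = "http://www.bom.gov.au/fwo/IDV60901/IDV60901.94870.json"


def fetch_observations(url=FEED_URL):
    """Download the latest observation feed as parsed JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def matlab_batch_command(script="process_weather"):
    """Build the command that runs a MATLAB script non-interactively."""
    return ["matlab", "-batch", script]


def run_cycle():
    """One 10-minute cycle: fetch raw data, then hand off to MATLAB."""
    observations = fetch_observations()
    with open("latest_observations.json", "w") as f:
        json.dump(observations, f)
    subprocess.run(matlab_batch_command(), check=True)


print(matlab_batch_command())  # → ['matlab', '-batch', 'process_weather']
```

In production, `run_cycle()` would be driven by a scheduler (e.g. a 10-minute cron entry) rather than called by hand.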

Melbourne Weather Server

The R@CMon team engaged with Simon to scope the weather server's various hosting requirements. Aside from traditional LAMP-style hosting, the server also needed direct access to MATLAB's batch-mode functionality. A new R@CMon-hosted instance was deployed on the Monash node of the NeCTAR Research Cloud, and a standard LAMP stack was installed and configured on it. A Monash University-licensed installation of MATLAB was made available on the new weather server, allowing downstream analysis of the raw data from the bureau.

Melbourne Weather Server Visits for 2017

The new Melbourne weather server is now publicly accessible worldwide. The regular live feed serves the Australian and international community with live Melbourne weather observations and forecasts. With the support of the R@CMon team, it will continue to do so for many years to come.

Geodata Server on R@CMon

The Australian Bureau of Statistics (ABS) provides public access to internet activity data as “data cubes” under catalogue number 8153.0. These statistics are derived from data provided by internet service providers (ISPs) and offer an estimate of the number of users (frequency) with access to a specific internet technology such as ADSL. While this survey is adequate for general observations, its granularity is too coarse to assess the impact of internet access on Australian society and economic growth. The Geodata Server project, led by Klaus Ackermann (Faculty of Business and Economics, Monash University), was created with the aim of providing significantly enhanced granularity on internet usage, in both the temporal and spatial dimensions, at the local government level for Australia and for other cities worldwide.

IPv4 Heatmap and Project Background, Ackermann, Angus & Raschky: Economics of Technology, Wombat 2016

One of the main challenges in the project is the analysis of 1.5 trillion observations from the ABS data sets. The project requires high-performance, high-throughput computational resources to analyse this vast amount of data, as well as a reasonable amount of storage space for reference and computed data. Another major challenge is how to architect the analysis pipeline to fully utilise the available resources. Over the last three years, several iterations of the methodology and infrastructure setup have been developed and tested to optimise the analysis pipeline. The R@CMon team engaged with Klaus to address the project's various computational, storage and analysis requirements. A dedicated NeCTAR project was provisioned for Geodata Server, including computational resources on the Monash node of the NeCTAR Research Cloud, and storage was provisioned via the VicNode allocation scheme.

Processing Workflow on R@CMon, Ackermann, Angus & Raschky: Economics of Technology, Wombat 2016

With the computational and storage resources in place, the project was able to progress with the development of an analysis pipeline based on various “big data” technologies. In coordination with the R@CMon team, several Hadoop distributions were evaluated, namely Cloudera, MapR and Hortonworks. The latter was chosen for its ease of installation and 100% open-source commitment. The resulting cluster consists of 32 cores with 8TB of Hadoop Distributed File System (HDFS) storage divided among 4 nodes. Tested configurations included 16 cores across 2 nodes and 32 cores on a single node. The data has been distributed across 2TB volumes, and the master node of the cluster has an extra-large volume attached to store the raw (reference) data. To optimise the performance of the distributed HDFS, all loaded data is stored in compressed Lempel–Ziv–Oberhumer (LZO) format to reduce the burden on the network, which is shared with other tenants on the NeCTAR Research Cloud.
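For reference, enabling LZO on a Hadoop cluster is a matter of registering the codec classes in `core-site.xml` once the `hadoop-lzo` library is on the classpath. The fragment below is a generic sketch using the standard hadoop-lzo class names; the cluster's exact configuration isn't recorded in this story.

```xml
<!-- core-site.xml fragment: register the LZO compression codecs -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```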

Multi City Analysis, Ackermann, Angus & Raschky: Economics of Technology, Wombat 2016

Through R@CMon, the Geodata Server project was able to successfully handle and curate trillions of IP-activity observations and link these data accurately to their geo-locations in single- and multi-city models. Analysis tools were laid down as part of the pipeline, from high-performance computing (HPC) on Monash's supercomputers to Hadoop-style data parallelisation on the research cloud. From this, preliminary observations suggest strong spatial correlation and evidence of discontinuities in IP activity at political boundaries, pointing to cultural and/or institutional factors. Some of the models produced by this project are currently being curated in preparation for public release to the wider Australian research community, and the models are actively being improved with additional IP statistical data from other cities around the world. As the data grows, the analysis pipeline and the computational and storage requirements are expected to scale as well. The R@CMon team will continue to support the Geodata Server project as it works towards its next milestones.

Worm Strains Catalogue on R@CMon

Associate Professor Roger Pocock is the head of the Neuronal Development and Plasticity Laboratory at Monash University. Roger's lab investigates the fundamental mechanisms that factor into brain development, using the nematode Caenorhabditis elegans as a model system. Roger joined Monash University in 2014, bringing with him a comprehensive catalogue of worm strain data carefully curated over years at his previous laboratory at the University of Copenhagen. The strains catalogue is held in a FileMaker database, which laboratory members regularly update and query for existing and new strain entries.

C. elegans as a model system, Neuronal Development and Plasticity Laboratory

FileMaker (and its derivatives) is commercial software for creating custom applications for a variety of target platforms (e.g. web, iPad, Windows, Mac). The worm strains catalogue from Roger's lab was the first FileMaker-based database deployment on R@CMon. The R@CMon team installed and configured a fully licensed, up-to-date version of FileMaker Pro on the Monash node of the NeCTAR Research Cloud, inside a dedicated tenancy (i.e. computational and storage resources) provisioned for the lab. The FileMaker software itself was deployed on a Monash-licensed Windows Server instance, which has access to the latest system and security updates from Microsoft.

Worm Strains Catalogue Entry

The FileMaker WebDirect feature has been enabled on the new server to allow easy access to the strains catalogue from standard web browsers via the internet, without any additional programming or software installation on the user's client machine. Proper HTTPS has been configured and enabled on the new WebDirect interface. Since then, and with the ongoing support of R@CMon, the catalogue has grown to include external collaborators' models derived from other strains.

Monash Collections Online

The Monash University Library’s Special Collections are a large compilation of various media, including rare books, music and multimedia, in various forms and languages, spanning Slavic, Asian, Yiddish and Jewish collections. These collections are considered among the most comprehensive in the whole of Australasia. Hosted on legacy infrastructure, the collections' original web presence had become a maintenance challenge for library administrators due to its ageing hardware and software stack. There was also a push in early 2016 to centralise the university's data centre infrastructure, where the legacy collections platform was hosted. This presented an opportunity for Monash University Library to migrate the collections onto one of the latest community-supported public collections publishing platforms. After evaluation, the open-source, freely available Omeka LAMP stack was chosen for the new platform.

Monash Collections Online Main Page

The R@CMon team engaged the various library stakeholders to spin up a test instance of Omeka on the Monash node of the NeCTAR Research Cloud. The library team tested Omeka's various hosting and publishing capabilities, including the installation and integration of custom themes and plugins (e.g. multimedia playback plugins). After several consultations, demonstrations and rounds of rigorous testing between the R@CMon and library teams, the decision was made to adopt Omeka as the new publishing and showcasing platform for the library's special collections.

Monash Collections Online Tall Tales and True Exhibition

The R@CMon team deployed a highly available instance of Omeka on the NeCTAR Research Cloud, utilising the LAMP stack plus HAProxy. Through VicNode, a dedicated, accessible storage share was provisioned for the collections, housing the various types of files and media for current and future public showcases and exhibitions. The newly minted Monash Collections Online platform was officially unveiled at the start of 2017 and is now publicly available. The platform is regularly updated with new content by the library team, and since its release the R@CMon team has continued to support it through standard, regular engagements with the Monash University Library.
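As an illustration of the high-availability arrangement, an HAProxy front end balancing requests across two Omeka web nodes might be configured along these lines; the frontend/backend names and server addresses are hypothetical, not the production configuration.

```
# Hypothetical haproxy.cfg fragment: round-robin across two Omeka LAMP nodes
frontend collections_http
    bind *:80
    default_backend omeka_web

backend omeka_web
    balance roundrobin
    option httpchk GET /
    server web1 192.168.0.11:80 check
    server web2 192.168.0.12:80 check
```

The `check` keyword makes HAProxy health-check each node, so a failed web server is taken out of rotation automatically.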

The Monash Country Lines Archive on R@CMon

The Monash Country Lines Archive (MCLA) is a collaborative project between the Monash Indigenous Centre (MIC), Faculty of Arts and the Faculty of Information Technology with a team of researchers, digital animators and students. The MCLA aims to support the indigenous Australian communities in the preservation of stories that combine their history, knowledge, poetry, songs, performance and language. MCLA began working with the Yanyuwa people of Borroloola, NT, creating a number of animations between 2007 and 2010. It was these animations that caught the attention of Dr Alan Finkel, the then Chancellor of Monash University. In 2011, the Alan and Elizabeth Finkel Foundation supported the project for a further five years.

Render from “Why We All Die” 2015 ©MCLA & Taungurung Dolodanin-dat Animation Group.

Since its foundation in 2011, the MCLA has produced nine short-form animated films, ranging from four to twenty-four minutes in length, while working with the communities through every step of the animation process: script, storyboards, character and landscape concepts and construction, animation, rendering, sound and post-production. Initially, producing these animations was challenging due to their heavy computational requirements. The MCLA team didn't have access to the dedicated render-farm resources that would be normal for a commercial animation studio, so all rendering work was done on individual desktops and laptops. This resource limitation forced the MCLA team to compromise on advanced rendering techniques in order to quickly render a large number of scenes while still maintaining a certain level of production quality.

Render from “Jibi the Giant Spirit Birds” 2013 ©MCLA & Nyamba Buru Yawuru.


In 2013, the MCLA team gained access to the NeCTAR Research Cloud, giving them a much-needed boost in rendering capacity. The R@CMon team assisted the MCLA in deploying their workflow as a dynamically scaling, distributed render farm on the research cloud. Modelling, animation and rendering software was licensed and configured on this virtual render farm, which the MCLA can easily access remotely to submit jobs and inspect their renders. The MCLA then started applying advanced rendering techniques in their workflow, techniques that weren't possible on their previous setup. After several years of use, demand for the MCLA to produce ever more high-quality visualisations also increased. This required the render farm to scale further, and it did.

Render frame from “Janyju the Red Lizard” 2014 ©MCLA & Nyamba Buru Yawuru.


Access to the research cloud-backed render farm removed a huge limitation for the MCLA, inspiring them to produce more animations for the indigenous Australian communities without compromising on quality. The R@CMon team will continue to support the MCLA going forward, and will be there when the farm needs more power. The MCLA is composed of Dr John Bradley, Dr Shannon Faulkhead, Brent D McKee, Dr Tom Chandler and Chandara Ung.


Monash Macromolecular Crystallisation Facility Upgrade on R@CMon

The Monash Macromolecular Crystallisation Facility (MMCF), a Monash Technology Research Platform, was established in 2009 and is operated by the Structural Biology Unit at Monash University. The MMCF provides access to a fully automated platform for the high-throughput crystallisation of biological macromolecules. Macromolecular crystallography provides unparalleled detail of the 3D structure of biological macromolecules and underpins the rational design of therapeutics. The MMCF is considered to be the largest macromolecular crystallisation facility in the world.

The Monash Macromolecular Crystallisation Facility

The MMCF partnered with Formulatrix, Monash eSolutions and R@CMon to upgrade the facility's IT infrastructure for the next generation of crystallisation technology. The R@CMon team provisioned custom Microsoft Windows-based infrastructure on the Monash node of the NeCTAR Research Cloud to host the platform's new crystallisation and imaging system. An enterprise database, configured and maintained by Monash eSolutions, supports this new system, and the facility's networking infrastructure was also completely revamped by Monash eSolutions. The R@CMon team worked with the vendor, Formulatrix, to deploy the instrument's software stack, and dedicated research data storage was provisioned for the facility's experiment and imaging data.

Protein Crystals Imaging in Action


The R@CMon team and Monash eSolutions are working together to support the facility going forward. The Monash Macromolecular Crystallisation Facility (MMCF) project story first appeared on Monash University's The Insider.

Big data mining market segmentation of ANZ Bank EFTPOS data

In Australia, the big four banks receive large amounts of Electronic Funds Transfer at Point of Sale (EFTPOS) transaction data on a daily basis, yet despite this, the information-rich data is neither stored nor analysed. The fact that EFTPOS data is both very large and very messy makes it difficult for the banks themselves to gain visibility of the characteristics of the data's stakeholders.

That changed in 2014, when a researcher in Monash's Faculty of IT, Dr. Grace Rumantir, approached us for assistance in building and accessing a secure analysis environment for a data mining project on a collection of commercially sensitive EFTPOS data, obtained through an award-winning collaboration with the Australia and New Zealand Banking Group (ANZ). To our knowledge, this is the first time market segmentation analyses have been applied to such a large amount of EFTPOS data anywhere in the world.

As a pilot, ANZ collated five months of EFTPOS transaction records in which all customer- and retailer-identifying data was redacted. Before this commercial-in-confidence data could be released for research purposes, ANZ produced a comprehensive list of requirements pertaining to the secure storage and processing of the data. Securing the release of this data through ANZ's information security protocol was a lengthy and difficult process; success was due in large part to our team's ability to demonstrate, with confidence, that the infrastructure we have in place at Monash meets these requirements.

Our team quickly built a workhorse but appropriately secure environment on R@CMon (on specialist nodes, due to the memory requirements of processing such a large dataset). The R@CMon environment already uses software-defined virtualisation technology, servers are sandboxed, and R@CMon is housed in Monash's own secure-access facility. All ingress/egress access was locked down to a few known clients (Grace and her research students). Remote desktop software and several data-mining tools of interest were configured for use by the researchers. The data (daily CSV samples) was stored in an encrypted volume file, which was uploaded to an R@CMon volume attached to the analysis server. Individual passwords were used to unlock and mount the encrypted data, with a strict usage protocol to ensure the data remained locked when not in use.

A paper outlining our experience in acquiring, securely storing and processing the EFTPOS data can be found at:

Ashishkumar Singh, Grace Rumantir, Annie South, and Blair Bethwaite, Clustering Experiments on Big Transaction Data for Market Segmentation. In Proceedings of the 2014 International Conference on Big Data Science and Computing (BigDataScience ’14). ACM, New York, NY, USA, Article 16, DOI=http://dx.doi.org/10.1145/2640087.2644161

The market segmentation experiments on the retailers in the EFTPOS data involve reducing the transaction data using the RFM (Recency, Frequency, Monetary) model, followed by clustering analysis. The results indicate distinct combinations of retailer RFM values within the clusters, which could give the bank indications of the different marketing strategies applicable to each retailer performance category. This ground-breaking demonstration that retailer segments can be extracted from EFTPOS data won the Best Paper Award (Industry Track) at the Australasian Data Mining and Analytics Conference 2014.
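The RFM reduction step can be illustrated with a small, self-contained sketch. The transaction fields and values below are invented for illustration and are not the ANZ schema, and the subsequent clustering stage is omitted.

```python
from collections import defaultdict
from datetime import date


def rfm_reduce(transactions, as_of):
    """Reduce raw transactions to one (recency, frequency, monetary)
    triple per retailer: days since last sale, number of sales, and
    total sales value. Each transaction is a (retailer_id, date,
    amount) tuple; these field names are illustrative only."""
    latest = {}
    freq = defaultdict(int)
    monetary = defaultdict(float)
    for retailer, day, amount in transactions:
        freq[retailer] += 1
        monetary[retailer] += amount
        if retailer not in latest or day > latest[retailer]:
            latest[retailer] = day
    return {
        r: ((as_of - latest[r]).days, freq[r], monetary[r])
        for r in freq
    }


# Toy data: two retailers, three redacted transactions.
txns = [
    ("cafe", date(2014, 5, 1), 12.50),
    ("cafe", date(2014, 5, 20), 8.00),
    ("grocer", date(2014, 4, 2), 55.00),
]
print(rfm_reduce(txns, as_of=date(2014, 6, 1)))
# → {'cafe': (12, 2, 20.5), 'grocer': (60, 1, 55.0)}
```

Each retailer's triple would then be fed to a clustering algorithm to find segments with similar RFM profiles.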

Publication references:

Ashishkumar Singh, Grace Rumantir and Annie South, Market Segmentation of EFTPOS Retailers. In Proceedings of the 12th Australasian Data Mining Conference (AusDM 2014), Brisbane, Australia (http://ausdm14.ausdm.org/program) – Best Paper Award Industry Track

Ashishkumar Singh and Grace Rumantir, Two-tiered Clustering Classification Experiments for Market Segmentation of EFTPOS Retailers. Australasian Journal of Information Systems, vol. 19, September 2015. ISSN 1449-8618. Available at: <http://journal.acs.org.au/index.php/ajis/article/view/1184>. doi:10.3127/ajis.v19i0.1184.

This exciting result has been cited in financial industry publications as an important example of how academia can help businesses gain insights from their own massive amounts of data to support business decision-making.

On the success of this collaborative project, Patrick Maes, ANZ Chief Technology Officer, writes:

“The key here is to find the data scientists who can work with these models, a skill not easy to find nowadays”

(see http://www.itnews.com.au/news/me-bank-hires-data-boss-in-it-exec-restructure-411908 and https://bluenotes.anz.com/posts/2015/03/big-data-from-customer-targeting-to-customer-centric ).

On lessons learnt from this important pilot project, Dr. Grace Rumantir says:

“There is a long standing gap between what research in academia can offer and the needs in the industry. This gap takes the form of mistrust on the part of the people in the industry that academics may not deliver a solution that is relevant to their business on a timely manner. The results of this ground breaking project using EFTPOS data shows that we do understand what business needs and come up with a practical solution that business can directly translate into business strategies which can give them an edge in the competitive business environment.

We are able to do this with our ability to talk in the same wavelength with our industry clients, with our research skills in bleeding edge technology and with the support of the world class research support and infrastructure that Monash has been investing heavily on.”