
Monash Macromolecular Crystallisation Facility Upgrade on R@CMon

The Monash Macromolecular Crystallisation Facility (MMCF), a Monash Technology Research Platform, was established in 2009 and is operated by the Structural Biology Unit at Monash University. The MMCF provides access to a fully automated platform for the high-throughput crystallisation of biological macromolecules. Macromolecular crystallography reveals the 3D structure of biological macromolecules in unparalleled detail and provides the basis for the rational design of therapeutics. The MMCF is considered to be the largest macromolecular crystallisation facility in the world.

The Monash Macromolecular Crystallisation Facility

The MMCF partnered with Formulatrix, Monash eSolutions and R@CMon to upgrade the facility’s IT infrastructure for the next generation of crystallisation technology. The R@CMon team provisioned a custom Microsoft Windows-based infrastructure on the Monash node of the NeCTAR Research Cloud to host the platform’s new crystallisation and imaging system. Monash eSolutions configured and maintains an enterprise database to support this new system, and also completely revamped the facility’s networking infrastructure. The R@CMon team worked with the vendor, Formulatrix, to deploy the instrument’s software stack. Dedicated research data storage has been provisioned for the facility’s experiment and imaging data.

Protein Crystals Imaging in Action


The R@CMon team and Monash eSolutions are working together to support the facility going forward. The Monash Macromolecular Crystallisation Facility (MMCF) project story first appeared on Monash University’s The Insider.

MyTardis for Genomics

The Monash Bioinformatics Platform has recently partnered with MyTardis, the Characterisation Virtual Laboratory (CVL) and R@CMon to develop an automated, structured and managed data management pipeline for sequencing results for the Monash Health Translational Precinct (MHTP) Medical Genomics Facility. The result is the MyTardis-Seq system, an extension to the well-established MyTardis data management platform for Next-Generation Sequencing (NGS) data.

MyTardis with NGS Extension


Information on how MyTardis-Seq integrates with research data storage can be found here. Detailed background on the architecture and workflow can be found on the MyTardis page.

Big data mining market segmentation of ANZ Bank EFTPOS data

In Australia, the big four banks receive large amounts of Electronic Funds Transfer at Point of Sale (EFTPOS) transaction data on a daily basis, yet this information-rich data is neither stored nor analysed. Because EFTPOS data is both very large and very messy, it is difficult for the banks themselves to gain visibility of the characteristics of the stakeholders in the data.

That changed in 2014, when a researcher in Monash’s Faculty of IT, Dr. Grace Rumantir, approached us for assistance in accessing or building a secure analysis environment for a data mining project on a collection of commercially sensitive EFTPOS data obtained through an award-winning collaboration with the Australia and New Zealand Banking Group (ANZ). To our knowledge, this is the first time market segmentation analyses have been applied to such a large amount of EFTPOS data anywhere in the world.

As a pilot, ANZ collated 5 months of EFTPOS transaction records from which all customer and retailer identifying data had been redacted. Before this commercial-in-confidence data could be released for research purposes, ANZ produced a comprehensive list of requirements for the secure storage and processing of the data. Securing the release of this data through the ANZ Information Security protocol was a lengthy and difficult process. It succeeded mainly because our team was able to demonstrate that the infrastructure we have in place at Monash meets these requirements.

Our team very quickly built a powerful but appropriately secure environment on R@CMon, using specialist nodes because of the memory requirements of processing such a large dataset. The R@CMon environment already uses software-defined virtualisation technology. We sandbox servers, and R@CMon is housed in Monash’s own secure access facility. All ingress/egress access was locked down to allow only a few known clients (Grace and her research students). Remote desktop software and several data-mining tools of interest were configured for use by the researchers. The data (in daily CSV samples) was stored in an encrypted volume file, which was uploaded to an R@CMon volume attached to the analysis server. Individual passwords were used to unlock and mount the encrypted data, with a strict usage protocol to ensure the data remained locked when not in use.
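The "known clients only" rule described above can be illustrated with a small allowlist check. This is only a sketch of the idea, not the configuration used on R@CMon (on a cloud platform such a restriction would normally be expressed as network access rules rather than application code). The addresses below are reserved documentation ranges, and the names `ALLOWED_CLIENTS` and `is_allowed` are hypothetical.

```python
import ipaddress

# Hypothetical allowlist: one single host and one small subnet.
# These are reserved documentation addresses, not real client addresses.
ALLOWED_CLIENTS = [
    ipaddress.ip_network("192.0.2.10/32"),
    ipaddress.ip_network("198.51.100.0/28"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if the client address falls inside an allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_CLIENTS)
```

Any address outside the listed networks is rejected by default, which mirrors the deny-unless-known approach described in the text.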

A paper outlining our experience in acquiring, securely storing and processing the EFTPOS data can be found at:

Ashishkumar Singh, Grace Rumantir, Annie South, and Blair Bethwaite, Clustering Experiments on Big Transaction Data for Market Segmentation. In Proceedings of the 2014 International Conference on Big Data Science and Computing (BigDataScience ’14). ACM, New York, NY, USA, Article 16, DOI=http://dx.doi.org/10.1145/2640087.2644161

The market segmentation experiments on the retailers in the EFTPOS data involve reducing the transaction data using the RFM (Recency, Frequency, Monetary) model, followed by clustering analysis. The results indicate distinct combinations of RFM values among the retailers in each cluster, which could suggest to the bank different marketing strategies for each of the retailer performance categories. This ground-breaking revelation of the existence of retailer segments extracted from EFTPOS data won the Best Paper Award (Industry Track) at the Australasian Data Mining and Analytics Conference 2014.
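To make the RFM reduction concrete, here is a minimal sketch of how transaction records might be collapsed into one Recency/Frequency/Monetary row per retailer before clustering. It is not the code used in the study: the record layout `(retailer_id, date, amount)`, the function name `rfm_by_retailer` and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_by_retailer(transactions, reference_date):
    """Reduce (retailer_id, date, amount) records to one RFM row per retailer."""
    last_seen = {}                 # most recent transaction date per retailer
    frequency = defaultdict(int)   # number of transactions per retailer
    monetary = defaultdict(float)  # total transaction value per retailer
    for retailer_id, tx_date, amount in transactions:
        if retailer_id not in last_seen or tx_date > last_seen[retailer_id]:
            last_seen[retailer_id] = tx_date
        frequency[retailer_id] += 1
        monetary[retailer_id] += amount
    return {
        retailer_id: {
            "recency_days": (reference_date - last_seen[retailer_id]).days,
            "frequency": frequency[retailer_id],
            "monetary": monetary[retailer_id],
        }
        for retailer_id in last_seen
    }


# Made-up sample records with redacted-style retailer identifiers.
transactions = [
    ("R1", date(2014, 1, 2), 20.0),
    ("R1", date(2014, 1, 30), 35.5),
    ("R2", date(2014, 1, 10), 400.0),
]
rfm = rfm_by_retailer(transactions, date(2014, 1, 31))
```

Each retailer is thus represented by three numbers, and it is these compact rows, rather than the raw transactions, that a clustering algorithm would group into segments.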

Publication references:

Ashishkumar Singh, Grace Rumantir and Annie South, Market Segmentation of EFTPOS Retailers. In Proceedings of the 12th Australasian Data Mining Conference (AusDM 2014), Brisbane, Australia (http://ausdm14.ausdm.org/program) – Best Paper Award Industry Track

Ashishkumar Singh, Grace Rumantir. Two-tiered Clustering Classification Experiments for Market Segmentation of EFTPOS Retailers. Australasian Journal of Information Systems, [S.l.], v. 19, sep. 2015. ISSN 1449-8618. Available at: <http://journal.acs.org.au/index.php/ajis/article/view/1184>. Date accessed: 18 Oct. 2015. doi:http://dx.doi.org/10.3127/ajis.v19i0.1184.

This exciting result has been cited in financial industry publications as an important example of how academia can help businesses gain insights into their own massive amounts of data and so make better business decisions.

On the success of this collaborative project, Patrick Maes, ANZ Chief Technology Officer, writes:

“The key here is to find the data scientists who can work with these models, a skill not easy to find nowadays”

(see http://www.itnews.com.au/news/me-bank-hires-data-boss-in-it-exec-restructure-411908 and https://bluenotes.anz.com/posts/2015/03/big-data-from-customer-targeting-to-customer-centric ).

On lessons learnt from this important pilot project, Dr. Grace Rumantir says:

“There is a long standing gap between what research in academia can offer and the needs in the industry. This gap takes the form of mistrust on the part of the people in the industry that academics may not deliver a solution that is relevant to their business on a timely manner. The results of this ground breaking project using EFTPOS data shows that we do understand what business needs and come up with a practical solution that business can directly translate into business strategies which can give them an edge in the competitive business environment.

We are able to do this with our ability to talk in the same wavelength with our industry clients, with our research skills in bleeding edge technology and with the support of the world class research support and infrastructure that Monash has been investing heavily on.”

 

Disruptive change in the clinical treatment of pancreatic cancer

Professor Jenkins’ research focuses on pancreatic cancer, an inflammation-associated cancer and the fourth most common cause of cancer death worldwide, with an extremely low 5% five-year survival rate. Typically studies look at gene expression patterns between normal pancreas and cancerous pancreas in order to identify unique signatures, which can be indicative of sensitivity or resistance to specific chemotherapeutic treatments.

“Using next generation gene sequencing, involving big instruments, big data and big computing – allows near-term disruptive change in the clinical treatment of pancreatic cancer.” Prof. Jenkins, Monash Health.

To date, gene expression studies have largely focused on samples taken from open surgical biopsy, a procedure known to be very invasive and only possible in 20% of pancreatic cancers. Prof Jenkins’ group, in collaboration with Dr Daniel Croagh from the Department of Upper Gastrointestinal and Hepatobiliary Surgery at Monash Medical Centre, recently trialled an alternative, less invasive process available to nearly all pancreatic cancer patients. Known as endoscopic ultrasound-guided fine-needle aspirate (EUS-FNA), it uses a thin, hollow needle to collect samples of cells from which genetic material can be extracted and analysed. The challenge then becomes ensuring that gene sequencing from EUS-FNA samples is comparable to that from open surgical biopsy, so that established analysis and treatment can be used.


Twenty-four EUS-FNA-derived genetic samples from normal and cancerous pancreas were sequenced at the MHTP Medical Genomics Facility, producing a total of 40Gb of raw data. The data were securely transferred onto R@CMon by the Monash Bioinformatics Platform for processing, statistical analysis and computational exploration using state-of-the-art bioinformatics methods.


Results thus far from this study show that data from EUS-FNA-derived samples were of high quality and allowed the identification of gene expression signatures that distinguish normal from cancerous pancreas. Professor Jenkins’ group is now confident that EUS-FNA-derived material has the potential not only to capture nearly all pancreatic cancer patients (compared to ~20% by surgery), but also to improve patient management and treatment in the clinic.

“The current clinical genomics research space requires specialized high performance computational and storage infrastructure to support the processing and long term storage of those so-called “big data”. Thus R@CMon plays a major role in the discovery and development of new therapies and the improvement of Human health care in general.” Roxane Legaie, Senior Bioinformatician, Monash Bioinformatics Platform

 

The Digital Object Identifier (DOI) Minter on R@CMon

The Monash Digital Object Identifier (DOI) Minter was developed by the ANDS-funded Monash University Major Open Data Collections (MODC) Project as an extensible service and deployed on the Monash node (R@CMon) of the NeCTAR Research Cloud to provide persistent and unique identifiers for datasets and research publications. A DOI is permanently assigned to a dataset or publication to provide information about it, including where it, or information about it, can be found on the Internet. The DOI will not change even if information about the dataset changes over time.


Store.Synchrotron’s data publishing form using the Monash DOI minter service.

The Monash DOI Minter gives Monash University the ability to mint DOIs for data collections that are hosted and managed by services on R@CMon. Integrating and accessing DOIs has never been easier. For instance, the Monash Library can now use this service to mint DOIs for publicly accessible research collections. It is also being used by the Australian Synchrotron’s Store.Synchrotron service, which manages data produced by the Macromolecular Crystallography (MX) beamline and streamlines DOI minting for datasets through a publication workflow.


A demo published collection on Store.Synchrotron.

An MX beamline user can now collect data on the beamline, which is stored, archived and made accessible through the Store.Synchrotron service. When the researcher has publication-quality data, a copy of this data is deposited in the Protein Data Bank (PDB) with the appropriate metadata. The new publication workflow allows researchers to publish data hosted by the Store.Synchrotron service, with PDB metadata automatically attached to the datasets and a DOI minted and activated after a researcher-selected embargo period. The DOI reference can then be included in their research papers.
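The embargo step of this workflow can be sketched as a small data structure. This is an illustrative model only, not the Store.Synchrotron or DOI Minter implementation: the class `PublicationRequest`, its fields and the sample identifiers are hypothetical, and the rule that the DOI activates on the day the embargo ends is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PublicationRequest:
    """Hypothetical record of a researcher's request to publish a dataset."""

    dataset_id: str     # identifier of the hosted dataset (made up here)
    pdb_id: str         # PDB entry whose metadata is attached (placeholder)
    requested_on: date  # day the researcher submitted the publication form
    embargo_days: int   # researcher-selected embargo period

    @property
    def release_date(self) -> date:
        """Date on which the embargo ends."""
        return self.requested_on + timedelta(days=self.embargo_days)

    def doi_is_active(self, today: date) -> bool:
        """Assumed rule: the DOI becomes active once the embargo has elapsed."""
        return today >= self.release_date


# Made-up example: a 90-day embargo requested on 1 January 2015.
request = PublicationRequest("dataset-42", "1ABC", date(2015, 1, 1), 90)
```

Modelling the release date as a derived property keeps the researcher's chosen embargo as the single source of truth for when the dataset and its DOI become public.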

We think this is a brilliant pattern for accelerating the adoption of persistent identifiers for research data held at universities. To this end, we have made the DOI Minter available for others to instantiate.