Last April 30 to May 1, Associate Professor Paul Lajbcygier and Senior Lecturer Huu Nhan Duong from the Monash Business School organised a Financial Markets Workshop at Monash Caulfield Campus, bringing together a number of prominent Australian and international market microstructure researchers as well as high-profile high-frequency traders and regulators from the US. The workshop covered research topics such as “market design and quality”; “high frequency trading”; “volatility and liquidity modelling”; “short selling”; “stock market crashes”; “cryptocurrencies”; and the real effect of financial markets on corporate decisions. The R@CMon team has worked with Paul’s group for several years now, supporting their “big data analysis” workflows on the research cloud. This support has enabled them to crunch more data, which has contributed to several high-impact publications, ARC grant submissions and the attainment of major SEED funding. The international financial workshop marks the culmination of Paul’s group’s accomplishments in high frequency trading research over the years and serves as a foundation for a future critical mass of research in financial markets. The R@CMon team will continue to support Paul’s group and the Department of Banking and Finance as they work on more high-impact research and tackle the computational challenges they encounter along the way.
Glycosylation is a ubiquitous type of protein post-translational modification (PTM) in eukaryotic cells, which plays vital roles in various biological processes such as cellular communication, ligand recognition and subcellular recognition. It is estimated that more than 50% of the entire human proteome is glycosylated. However, identifying glycosylation sites remains a significant challenge, because it requires expensive and laborious experimental research. Bioinformatics approaches that can predict the glycan occupancy at specific sequons in protein sequences would therefore be useful for understanding and utilising this very important PTM.
Dr. Jiangning Song from the Department of Biochemistry and Molecular Biology at Monash University and his collaborators have designed and developed GlycoMine-Struct, a bioinformatics tool for predicting glycosylation sites. GlycoMine-Struct is a comprehensive tool for the systematic in-silico identification of N-linked and O-linked glycosylation sites in the human proteome. Through R@CMon, a dedicated cloud project with computational and storage resources has been provisioned to develop and host the GlycoMine-Struct tool. The flexible and scalable R@CMon-powered development environment enabled rapid prototyping, testing and re-deployment of the tool.
GlycoMine-Struct is now a publicly accessible web service, available to the wider research community. Users can easily submit protein structure input files in PDB (Protein Data Bank) format to perform site prediction on GlycoMine-Struct. Since it went public, GlycoMine-Struct has been accessed and used by thousands of local and international users, and that number is still growing. A paper has been published in Scientific Reports, highlighting the collaborative work done to develop GlycoMine-Struct as an essential bioinformatics tool for improving the prediction of human glycosylation sites. The R@CMon team is actively supporting the GlycoMine-Struct project as it continues to serve the research community and develop performance improvements.
At the start of 2017, the R@CMon team had its first user consultation with Dr. Sri Ramarathinam, a research fellow from the Immunoproteomics Laboratory (Purcell Laboratory) at the School of Biomedical Sciences in Monash University. Sri and his group at the lab study metabolomics compounds in various samples by conducting a “search” and “identification” process using a pipeline of analysis and visualisation tools. The lab has acquired a license to use the commercial XCMSPlus metabolomics platform from SCIEX in its workflow. XCMSPlus provides a powerful solution for analysis of untargeted metabolomics data in a stand-alone configuration, which will greatly increase the lab’s capacity to analyse more samples, with faster and easier generation and interpretation of results.
During the first engagement meeting with Sri and the lab, it was highlighted that a specialised hosting platform (with appropriate storage and computational capacity) would be required for XCMSPlus. XCMSPlus is distributed by the vendor as a stand-alone appliance (personal cloud). As an appliance, XCMSPlus has been optimised and packaged to be deployed on a single, multi-core and high-memory machine. A minor added complication was that the appliance was distributed in VMware’s appliance format, which needed to be converted into an OpenStack-friendly format. The R@CMon team provided the hosting platform required for XCMSPlus through the Monash node of the Nectar Research Cloud.
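The post does not say which tool was used to convert the appliance. One common route, shown here only as a sketch, is to take the VMDK disk image from the VMware appliance and convert it with `qemu-img` into a format such as qcow2 that OpenStack's image service accepts. The file name below is hypothetical, and the function only builds the command line rather than running it.

```python
import shlex
from pathlib import Path
from typing import List


def build_convert_command(vmdk_path: str, out_format: str = "qcow2") -> List[str]:
    """Build a qemu-img command that converts a VMDK disk to another format.

    Assumption: the appliance disk is already available as a single VMDK file.
    """
    src = Path(vmdk_path)
    # Write the converted image next to the source, with the new extension.
    dst = src.with_suffix("." + out_format)
    return ["qemu-img", "convert", "-f", "vmdk", "-O", out_format, str(src), str(dst)]


# Hypothetical file name, for illustration only.
cmd = build_convert_command("appliance-disk1.vmdk")
print(shlex.join(cmd))
```

The resulting command could be passed to `subprocess.run(cmd, check=True)` on a machine with `qemu-img` installed, and the converted image then uploaded to the cloud's image service.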
A dedicated Nectar project has been provisioned for the lab, which is now being used for hosting XCMSPlus. This project also has enough capacity for future expansion and new analysis platform deployments. The now R@CMon-hosted (and supported) XCMSPlus platform for the Immunoproteomics Laboratory is the first custom XCMSPlus deployment in Australia. As the first of its kind in Australia, it ran into some minor issues during its first test runs. These technical issues were eventually resolved through collaborative troubleshooting by the R@CMon team, the lab and the vendor. After several months of usage, with hundreds of jobs submitted and processed by XCMSPlus and counting, the lab is continuing to integrate it fully into its analysis workflow. The R@CMon team is actively engaging with the lab to support its adoption of XCMSPlus and to plan for future analysis workflow expansions.
Measuring the changes in gene expression levels and determining differentially expressed genes during the processes of human immunodeficiency virus (HIV) infection, replication and latency is instrumental in further understanding HIV infections. These measurements are vital in developing strategies for eradicating the virus from the human body. Dr. Chen Li, a research fellow from the Immunoproteomics Laboratory at Monash University, has developed a novel compendium of comprehensive functional gene annotations from gene expression and proteomics studies. The genes in the compendium have been carefully curated and shown to be differentially expressed during HIV infection, replication and latency.
The R@CMon team assisted with the deployment of the online database, HIVed, on the Monash node of the NeCTAR Research Cloud. The system has been running on R@CMon and serving the public community for more than a year. HIVed is considered to be the first fully comprehensive database that combines datasets from a wide range of experimental studies, carefully curated across a variety of experimental conditions. The datasets are further enriched by integrating them with other public databases to provide additional annotations for each data point. The HIVed online database has been developed to facilitate the functional annotation of HIV-related genes and the formulation of experimental hypotheses about them, with an intuitive web interface that enables dynamic display of common threads across HIV latency and infection conditions and measurements. The work done on the development of HIVed has recently been published in Scientific Reports, and the Immunoproteomics Laboratory plans to incorporate new experimental studies and external annotations into the HIVed database as they become available.
Dr. Simon Clarke is a senior lecturer from the School of Mathematical Sciences at Monash University. He has been granted permission by the Bureau of Meteorology to repackage its observational and forecasting Melbourne weather data for downstream analysis and visualisations. Weather data from the bureau is downloaded and processed at regular 10-minute intervals. Various metrics and visualisations are then computed using a custom-made MATLAB batch script developed in-house. The resulting output is then fed to a web server for public presentation, with integrations to other external sites hosted by the bureau. The original weather server was housed on a “legacy” hosting platform that had reached its end of life. The Melbourne weather server needed a new home.
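The post does not describe how the 10-minute cycle is scheduled. As a hedged illustration only, a job like this is often aligned to wall-clock boundaries (for example :00, :10, :20) so that each run picks up a fresh batch. The sketch below computes the next such boundary; the function name and the sample timestamp are hypothetical, and the interval is assumed to divide 60 evenly.

```python
from datetime import datetime, timedelta


def next_run_time(now: datetime, interval_minutes: int = 10) -> datetime:
    """Return the first interval boundary strictly after `now`.

    Assumption: interval_minutes divides 60, so boundaries fall at fixed
    minute marks within every hour.
    """
    # Round down to the boundary at or before `now`, then step forward once.
    floored = now.replace(
        minute=now.minute - now.minute % interval_minutes,
        second=0,
        microsecond=0,
    )
    return floored + timedelta(minutes=interval_minutes)


# Hypothetical timestamp, for illustration only.
sample = datetime(2017, 6, 1, 9, 47, 30)
print(next_run_time(sample))
```

In practice the same effect is usually achieved with a system scheduler such as cron, which would then invoke the download step and the MATLAB batch script in turn.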
The R@CMon team engaged with Simon to scope the weather server’s various hosting requirements. Aside from the traditional LAMP-style hosting required, the server also needed direct access to MATLAB’s batch mode functionality. A new R@CMon-hosted instance was deployed on the Monash node of the NeCTAR Research Cloud, and a standard LAMP stack was installed and configured on it. A Monash University-licensed installation of MATLAB has been made available on the new weather server, allowing the downstream analysis of the raw data from the bureau to be conducted.
The new Melbourne weather server is now publicly accessible across the world. The regular live feed serves the Australian and international community with live Melbourne weather observations and forecasts. With the support of the R@CMon team, it will continue to do so for years to come.
The Australian Bureau of Statistics (ABS) provides public access to internet activity data as “data cubes” under the catalog number “8153.0”. These statistics are derived from data provided by internet service providers (ISPs) and offer an estimate of the number of users (frequency) having access to a specific Internet technology such as ADSL. While this survey is adequate for general observations, its granularity is too coarse to assess the impact of internet access on Australian society and economic growth. The Geodata Server project, led by Klaus Ackermann (Faculty of Business and Economics, Monash University), was created with the aim of providing significantly enhanced granularity on Internet usage, in both the temporal and spatial dimensions, for Australia at a local government level and for other cities worldwide.
One of the main challenges in the project is the analysis of 1.5 trillion observations from the ABS data sets. The project requires high-performance and high-throughput computational resources to analyse this vast amount of data. A reasonable amount of data storage space is also vital for storing reference and computed data. Another major challenge is how to architect the analysis pipeline to fully utilise the available resources. Over the last 3 years, several iterations of the methodology and of the infrastructure setup have been developed and tested to optimise the analysis pipeline. The R@CMon team engaged with Klaus to address the various computational, storage and analysis requirements of the project. A dedicated NeCTAR project has been provisioned for Geodata Server, which includes the computational resources to be used on the Monash node of the NeCTAR Research Cloud. Computational storage was provisioned to the project via the VicNode allocation scheme.
With the computational and storage resources in place, the project was able to progress with the development of the analysis pipeline based on various “big data” technologies. In coordination with the R@CMon team, several Hadoop distributions have been evaluated, namely Cloudera, MapR and Hortonworks. Hortonworks was chosen for its ease of installation and 100% open source commitment. The resulting cluster consists of 32 cores with 8TB of Hadoop Distributed File System (HDFS) storage divided among 4 nodes. Tested configurations include 16 cores and 2 nodes or 32 cores and 1 node. The data has been distributed across 2TB volume drives. The master node of the cluster has an extra large volume attached to store the raw (reference) data. To optimise the performance of the distributed HDFS, all loaded data is stored in compressed Lempel–Ziv–Oberhumer (LZO) format to reduce the burden on the network, which is shared with other tenants on the NeCTAR Research Cloud.
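The cluster figures above imply a simple per-node breakdown, sketched below. The class and method names are hypothetical. The replication factor is an assumption: HDFS defaults to 3 copies of each block, but the post does not state the value used on this cluster, so the usable-capacity figure is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterLayout:
    total_cores: int
    total_hdfs_tb: float
    nodes: int

    def cores_per_node(self) -> float:
        """Cores available on each node, assuming an even split."""
        return self.total_cores / self.nodes

    def hdfs_tb_per_node(self) -> float:
        """Raw HDFS capacity attached to each node, assuming an even split."""
        return self.total_hdfs_tb / self.nodes

    def usable_tb(self, replication: int) -> float:
        """Capacity left for unique data once every block is stored `replication` times."""
        return self.total_hdfs_tb / replication


# Figures from the text: 32 cores and 8TB of HDFS storage across 4 nodes.
layout = ClusterLayout(total_cores=32, total_hdfs_tb=8.0, nodes=4)
print(layout.cores_per_node(), "cores per node")
print(layout.hdfs_tb_per_node(), "TB of HDFS per node")
# Assumed replication factor of 3 (the HDFS default), not stated in the source.
print(round(layout.usable_tb(replication=3), 2), "TB usable before compression")
```

The 2TB-per-node result matches the 2TB volume drives mentioned in the text, and the gap between raw and usable capacity is one reason storing the data compressed is attractive.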
Through R@CMon, the Geodata Server project was able to successfully handle and curate trillions of IP-activity observations and link them accurately to their geo-locations in single and multi-city models. The analysis tools in the pipeline range from high-performance computing (HPC) on Monash’s supercomputers to Hadoop-style data parallelisation in the research cloud. Preliminary observations suggest strong spatial correlation and evidence of discontinuities in IP activity at political boundaries, which points to cultural and/or institutional factors. Some of the models produced by this project are currently being curated in preparation for public release to the wider Australian research community. The models are actively being improved with additional IP statistical data from other cities in the world. As the data grows, the analysis pipeline and its computational and storage requirements are expected to scale as well. The R@CMon team will continue to support the Geodata Server project as it works towards its next milestones.
Associate Professor Roger Pocock is the head of the Neuronal Development and Plasticity Laboratory at Monash University. Roger’s lab investigates the fundamental mechanisms that factor into brain development, using the Caenorhabditis elegans organism as a model system. Roger joined Monash University in 2014, bringing with him a comprehensive catalogue of worm strain data that had been carefully curated for years at his previous laboratory at the University of Copenhagen. The strains catalogue is held in a FileMaker database that the laboratory members regularly update and query for current and new strain entries.
FileMaker (and its derivatives) is commercial software for creating custom applications for a variety of target platforms (e.g. web, iPad, Windows, Mac). The worm strains catalogue from Roger’s lab was the first FileMaker-based database deployment on R@CMon. The R@CMon team was able to install and configure the latest, fully licensed version of FileMaker Pro on the Monash node of the NeCTAR Research Cloud inside a dedicated tenancy (i.e. computational and storage resources) provisioned for the lab. The FileMaker software itself has been deployed on a Monash-licensed Windows Server instance, which has access to the latest system and security updates from Microsoft.
The FileMaker WebDirect feature has been enabled on the new server to allow easy access to the strains catalogue from standard web browsers over the internet, without any need for additional programming or software installation on the user’s client machine. HTTPS has been configured and enabled on the new WebDirect interface. Since then, and with the ongoing support of R@CMon, the catalogue has grown to include external collaborators’ models that are derived from other strains.