Archives

Rail Network Catastrophe Analysis on R@CMon

Monash University, through the Institute of Railway Technology (IRT), has been working on a research project with Vale S.A., a Brazilian multinational metals and mining corporation and one of the largest logistical operators in Brazil, to continuously monitor and assess the health of the Carajás railroad (EFC), a mixed-use rail network in Northern Brazil. The project aims to identify locations that produce “significant dynamic responses”, so that proactive maintenance can prevent catastrophic rail failure. As part of this project, IRT researchers have been involved in (a) the analysis of the collected data and (b) the establishment of a database with visualisation capabilities that allows interrogation of the analysed data.

GPU-powered DataMap analysis and visualisation on R@CMon.

Researchers use the DataMap analysis software for data interrogation and visualisation. DataMap is a Windows-based client-server tool that integrates data from various measurement and recording systems into a geographical map. Traditionally the software ran on a commodity laptop with a dedicated GPU, connecting to the group’s database server. To scale to larger models, conduct more rigorous analysis and visualisation, and support remote collaboration, the tools needed to go beyond the laptop.
The R@CMon team supported IRT in deploying the software on the NeCTAR Research Cloud. The deployed instance runs on the Monash-licensed Windows flavours with GPU-passthrough to support DataMap’s DirectX requirements.

Through the Research Cloud, IRT researchers and their Vale S.A. counterparts can collaborate on modelling, analysis and interpretation of results via remote access to the GPU-enabled virtual machines.
“The assistance of R@CMon in providing virtual machines that have GPU support, has been instrumental in facilitating global collaboration between staff located at Vale S.A. (Brazil) and Monash University (Australia).”
Dr. Paul Reichl
Senior Research Engineer and Data Scientist
Institute of Railway Technology

3D Stellar Hydrodynamics Volume Rendering on R@CMon Phase 2

Simon Campbell, Research Fellow in the Faculty of Science, Monash University, has been running large-scale 3D stellar hydrodynamics parallel calculations on the Magnus supercomputing facility at iVEC and on Raijin, the national peak facility at NCI. These calculations aim to improve 1D modelling of the core helium burning (CHeB) phase of stars using a novel multi-dimensional fluid dynamics approach. The improved models will have significant impact on many fields of astronomy and astrophysics, such as stellar population synthesis, galactic chemical evolution and the interpretation of extragalactic objects.

The parallel calculations generate raw data dumps (heavy data) containing several scalar variables, which are pre-processed and converted into HDF5. A custom script is used to extract the metadata (light data) into XDMF, a standard format used by HPC codes and recognised by various scientific visualisation applications such as ParaView and VisIt. The stellar data are loaded into VisIt and visualised using volume rendering. Initial volume renderings were done on a modest dual-core laptop using only low-resolution models (200 x 200 x 100, ~10⁶ zones). It was identified that applying the same visualisation workflow to high-resolution models (400 x 200 x 200, ~10⁷ zones and above) would require a parallel (MPI) build of VisIt running on a higher performance machine.
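
For readers curious about what that light-data step involves, here is a minimal sketch of the kind of script that wraps one scalar field from an HDF5 dump in XDMF so that VisIt (or ParaView) can read it. The file name, dataset path and grid layout are hypothetical placeholders, not the project’s actual pre-processing script.

# Minimal sketch: wrap one scalar field from an HDF5 dump in an XDMF file
# so that VisIt/ParaView can read it. File names, dataset paths and grid
# dimensions are hypothetical placeholders.
import h5py

dump = "star_dump_0040.h5"      # hypothetical HDF5 dump from the simulation
field = "/fields/velocity_mag"  # hypothetical scalar dataset inside the dump

with h5py.File(dump, "r") as f:
    nz, ny, nx = f[field].shape  # grid dimensions of the scalar field

xdmf = f"""<?xml version="1.0" ?>
<Xdmf Version="2.0">
  <Domain>
    <Grid Name="stellar_core" GridType="Uniform">
      <Topology TopologyType="3DCoRectMesh" Dimensions="{nz} {ny} {nx}"/>
      <Geometry GeometryType="ORIGIN_DXDYDZ">
        <DataItem Dimensions="3" Format="XML">0.0 0.0 0.0</DataItem>
        <DataItem Dimensions="3" Format="XML">1.0 1.0 1.0</DataItem>
      </Geometry>
      <Attribute Name="velocity_mag" AttributeType="Scalar" Center="Node">
        <DataItem Dimensions="{nz} {ny} {nx}" NumberType="Float"
                  Precision="8" Format="HDF">{dump}:{field}</DataItem>
      </Attribute>
    </Grid>
  </Domain>
</Xdmf>"""

with open("star_dump_0040.xmf", "w") as out:
    out.write(xdmf)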

Snapshot of turbulent convection deep inside the core of a star that has a mass 8 times that of the Sun. Colours indicate gas moving at different velocities. Volume rendered in parallel using VisIt + MPI.

R@CMon Phase 2 to the rescue! The timely release of R@CMon Phase 2 provided the computational grunt required to perform these high-resolution volume renderings. The new specialist kit in this release includes hypervisors housing 1TB of memory. The R@CMon team allocated a share (~50%, ~460GB of memory) of one of these high-memory hypervisors for the high-resolution volume renderings. Persistent storage on R@CMon Phase 2 is also attached to the computational instance, for ingesting data from the supercomputing facilities and storing processing and rendering results. VisIt has been rebuilt on the high-memory instance, this time with MPI capabilities and XDMF support.
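
To give a flavour of the rendering step, the sketch below shows the sort of VisIt CLI (Python) script that opens an XDMF file, adds a Volume plot and saves an image; under the parallel build it would be driven by the compute engine, e.g. visit -np 24 -nowin -cli -s render_dump.py. The file and variable names here are hypothetical, not the actual scripts used in this work.

# Sketch of a VisIt CLI (Python) script for volume rendering one dump.
# File and variable names are hypothetical; run under the parallel engine,
# e.g.: visit -np 24 -nowin -cli -s render_dump.py
OpenDatabase("star_dump_0040.xmf")
AddPlot("Volume", "velocity_mag")

v = VolumeAttributes()
v.rendererType = v.RayCasting   # ray casting gives the smoothest volume render
v.samplesPerRay = 400
SetPlotOptions(v)

DrawPlots()

s = SaveWindowAttributes()
s.format = s.PNG
s.fileName = "core_convection"
s.width, s.height = 1920, 1080
SetSaveWindowAttributes(s)
SaveWindow()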

Initial parallel volume rendering using 24 processes shows a ~10x speed-up. Medium- (400 x 200 x 200, ~10⁷ zones) and high-resolution (800 x 400 x 400, ~10⁸ zones) plots are now being generated on the high-memory instance seamlessly, and an even higher resolution (1536 x 1024 x 1024, ~10⁹ zones) simulation is currently running on Magnus. The resulting datasets from this simulation, expected to be several hundred gigabytes in size, will then be fed through the same parallel volume-rendering workflow.

Stock Price Impact Models Study on R@CMon Phase 2

Paul Lajbcygier, Associate Professor in the Faculty of Business and Economics, Monash University, is studying one of the important changes affecting the cost of trading in financial markets. This change relates to the effect of trading on prices, known as “price impact”, which is brought about by the wide propagation of algorithmic and high-frequency trading and amplified by technological and computational advances. Associate Professor Lajbcygier’s group has recently published new results, supported by R@CMon infrastructure and application migration activities, providing new insights into the trading behaviour of the so-called “Flash Boys”.

This study uses datasets licensed from Sirca, representing stocks in the S&P/ASX 200 index from 2000 to 2014. These datasets are pre-processed using Pentaho and then ingested into relational databases for detailed analysis using advanced queries. Two NeCTAR instances on R@CMon were used in the early stages of the study. One instance serves as the processing engine, where Pentaho and Microsoft Visual Studio 2012 are installed for pre-processing and post-processing tasks. The second instance is configured as the database server where the extraction queries are executed. Persistent volume storage is used to store reference datasets, pre-processed input files and extracted results. A VicNode merit application for research data storage has been submitted to support computational access to the pre-processed data that feeds the analysis workflow running on the NeCTAR Research Cloud.
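
To illustrate what such extraction queries against the ingested trade data might look like (the schema, connection details and query below are hypothetical placeholders, not the study’s actual code), a typical extraction issued from Python could resemble the following:

# Hypothetical sketch of an extraction query against ingested trade data.
# The DSN, table and column names are illustrative only.
import pyodbc

conn = pyodbc.connect("DSN=asx200;UID=analyst;PWD=secret")
cur = conn.cursor()

# Daily traded value and volume-weighted average price per stock,
# a typical building block for price-impact measures.
cur.execute("""
    SELECT stock_code,
           CAST(trade_time AS DATE)              AS trade_date,
           SUM(price * volume)                   AS traded_value,
           SUM(price * volume) / SUM(volume)     AS vwap
    FROM trades
    WHERE trade_time >= ? AND trade_time < ?
    GROUP BY stock_code, CAST(trade_time AS DATE)
""", ("2000-01-01", "2015-01-01"))

for stock, day, value, vwap in cur.fetchall():
    print(stock, day, value, vwap)

conn.close()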

Ingestion of pre-processed data into the database running on the high-memory instance, for analysis.


Initially, econometric analyses were done on just the lowest two groups of stocks in the S&P/ASX 200 index. Some performance hiccups were encountered when processing the higher-frequency groups in the index: some of the extraction queries, which require a significant amount of memory, would not complete when run against the exponentially larger stock groups. The release of R@CMon Phase 2 gave the analysis workflow the capacity to attack the higher stock groups using a high-memory instance, instantiated on the new “specialist” kit. Parallel extraction queries are now running on this instance (at close to 100% utilisation) to traverse the remaining stock groups across 2000 to 2014.

A recent paper by Manh Pham, Huu Nhan Duong and Paul Lajbcygier, entitled “A Comparison of the Forecasting Ability of Immediate Price Impact Models”, has been accepted for the 1st Conference on Recent Developments in Financial Econometrics and Applications. This paper presents the results of the examination of the lowest two groups of the S&P/ASX 200 index, i.e., just the initial results. Future research and publications include examination of the upper groups of the index as the latest reference data become available, and analysis of other price impact models.

This is an excellent example of novel research empowered by specialist infrastructure, and a clear win for a build-it-yourself cloud (you can’t get a 920GB instance from AWS). The researchers are able to use existing and well-understood computational methods, i.e., relational databases, but at much greater capacity than normally available. This has the effect of speeding up initial exploratory work and discovery. Future work may investigate the use of contemporary data-intensive frameworks such as Hadoop + Hive for even larger analyses.

This article is also available, published under Creative Commons, here.

VISIONET on R@CMon

VISIONET (Visualizing Transcriptomic Profiles Integrated with Overlapping Transcription Factor Networks) is a visualisation web service for cellular regulatory network studies. It has been developed as a tool for creating human-readable visualisations of transcription factor networks from users’ microarray and ChIP-seq input data. VISIONET’s node-filtering feature makes visualisations of large networks considerably more human-readable than those produced by tools such as CellDesigner and Cytoscape.


Gata4-Tbx20 transcription factor network.

R@CMon helped SBI Australia port the VISIONET web service onto the NeCTAR Research Cloud, enabling rapid development and customisation. VISIONET’s .NET-based framework now runs on a Windows Server 2012 instance inside R@CMon, using persistent storage (Volumes) to store large generated network visualisations. VISIONET is now publicly available to biologists, and user traffic is expected to grow in the near future.

PROSPER on R@CMon

PROSPER (PROtease Specificity Prediction servER) is an integrated, feature-based web server that predicts novel substrates and their cleavage sites for 24 different protease families from primary sequences. PROSPER addresses the “substrate identification” problem to support understanding of protease biology and the development of therapeutics targeting specific protease-regulated pathways.


PROSPER’s web server for protein sequence user input.

Query sequences in FASTA format are submitted to PROSPER through a simple web interface. PROSPER then uses a machine learning approach based on support vector regression to produce real-valued predictions of substrate cleavage probability. Once the prediction tasks have been performed, PROSPER provides users with a link to their query sequence’s prediction results.
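
For readers unfamiliar with support vector regression, the toy sketch below shows the general shape of such a predictor using scikit-learn. The window encoding, training data and parameters are invented for illustration and bear no relation to PROSPER’s actual features or models.

# Generic illustration of support vector regression for per-residue scores.
# Feature encoding and training data are invented placeholders; this is NOT
# PROSPER's actual model.
import numpy as np
from sklearn.svm import SVR

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_window(window):
    """One-hot encode a fixed-length sequence window around a candidate site."""
    vec = np.zeros(len(window) * len(AMINO_ACIDS))
    for i, aa in enumerate(window):
        if aa in AMINO_ACIDS:
            vec[i * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec

# Toy training set: windows labelled with real-valued cleavage scores.
train_windows = ["ACDEFGHK", "LMNPQRST", "VWYACDEF"]
train_scores = [0.9, 0.1, 0.4]

X = np.array([encode_window(w) for w in train_windows])
y = np.array(train_scores)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X, y)

# Predict a cleavage score for a new candidate window.
print(model.predict([encode_window("ACDEFGHI")]))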


PROSPER’s result page showing colour-coded predicted cleavage sites.

The R@CMon team helped the Monash Bioinformatics Platform migrate the PROSPER web server onto the NeCTAR Research Cloud. PROSPER now uses persistent storage (Volumes), granted via a VicNode computational storage allocation, for its model database, which currently covers 24 proteases. To date, PROSPER has served more than 6000 users from 68 countries for their own research, and these numbers are expected to grow in the near future.


DOUBLE MUTANT PBP2X T338A/M339F FROM STREPTOCOCCUS PNEUMONIAE STRAIN 2 R6 AT 2.4 A RESOLUTION

Future plans for PROSPER include tighter integration with high-performance computing (HPC) facilities and the NeCTAR Research Cloud to enable simultaneous sequence prediction analyses; increasing the coverage of the model database from 24 to 50 proteases to support the wider protease biology community; and implementing an online computational database of PROSPER-predicted novel substrates and cleavage sites across the whole human proteome. The latter will facilitate the functional annotation of the complete human proteome and complement ongoing efforts to characterise protein functions.

The Proteome Browser on R@CMon

The Proteome Browser (TPB) is a web portal that integrates human protein data and information. It provides an up-to-date view of the proteome (the entire library of proteins that can be expressed by cells or organisms – like us!) across large gene sets to support human proteome characterisation as part of the Chromosome-centric Human Proteome Project (C-HPP). Pertinent genomic and protein data from multiple international biological databases are assembled by TPB in a searchable format supporting C-HPP’s global proteomics effort.


TPB’s primary report of chromosome-ordered genes, visualised using a traffic-light colour system.

TPB’s framework extracts biological data from numerous sources, maps it onto the genome, and categorises the results based on quality and information content. The resulting level of evidence is presented by TPB as a simple point matrix coded with a traffic-light system (green: highly reliable evidence; yellow: reasonable evidence; red: some evidence available; black: no available evidence). TPB uses hierarchical data types to group similar information from different experiment types.
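
Conceptually, the traffic-light categorisation is a simple mapping from evidence levels to colours. The toy sketch below illustrates the idea; the data types, levels and grouping are invented for illustration and are not TPB’s actual rules.

# Toy illustration of the traffic-light categorisation; the evidence levels
# and data types are invented for illustration, not TPB's actual rules.
COLOURS = {3: "green",   # highly reliable evidence
           2: "yellow",  # reasonable evidence
           1: "red",     # some evidence available
           0: "black"}   # no available evidence

def traffic_light(evidence_levels):
    """Summarise a group of related data types by their strongest evidence."""
    return COLOURS[max(evidence_levels, default=0)]

# One gene, with evidence grouped by (hypothetical) hierarchical data types.
gene_evidence = {
    "protein_expression": [3, 2],
    "transcript_expression": [1],
    "antibody": [],
}

for data_type, levels in gene_evidence.items():
    print(data_type, traffic_light(levels))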


TPB’s summary report for the chosen chromosome.

TPB is supported by Monash University, the Monash eResearch Centre (MeRC), the Chromosome-centric Human Proteome Project (C-HPP), the Australia/New Zealand Chromosome 7 Consortium and the Australian National Data Service (ANDS). Researchers are now using TPB in a range of proteomics-related discoveries.

The R@CMon cloud team recently helped migrate The Proteome Browser web service onto the Monash node of the NeCTAR Research Cloud. TPB uses persistent storage (Volumes), granted via a VicNode computational storage allocation, to house its underlying database. TPB’s new home ensures stable and scalable long-term hosting, supported by the NeCTAR and RDSI federal research infrastructure programmes.

 

The MCC on R@CMon

The Monash Campus Cluster (MCC) is a heterogeneous high-performance computing (HPC) and high-throughput computing (HTC) facility for conducting large-scale, computationally intensive simulations and analyses. With over 2,500 CPU cores across 230 servers of differing CPU and memory configurations, the MCC is specifically designed to serve diverse computational workloads. In 2013, the MCC provided over 13 million CPU-core hours to over 300 Monash researchers.

During the past few months, we have been developing a software architecture to extend MCC’s computational resources into the NeCTAR Research Cloud. Users are presented with the MCC’s familiar batch queueing and software environment, so they can seamlessly execute compute jobs on either the legacy cluster nodes of MCC or the NeCTAR Research Cloud.


Monash Campus Cluster, “bursting” into the NeCTAR Research Cloud.

We achieve this by integrating the NeCTAR virtual machines as compute bricks into the MCC batch queuing system, presently the Open Grid Scheduler. This provides users with:

    • a new nectar queue which consists of MCC on R@CMon compute bricks; and
    • the ability to pick a specific availability zone on the research cloud to run compute jobs on

Researchers not only get to burst their computational jobs into the research cloud seamlessly; they can also leverage the unique properties of each of the cloud nodes (e.g., hardware and/or software capabilities).

$ qconf -sql
nectar
nectar-gaia
nectar-melbourne
nectar-monash
nectar-sa
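
As an illustration of how a researcher might target one of these queues programmatically, the sketch below submits a job via the Grid Engine DRMAA Python bindings, pinned to the Monash cell. Whether DRMAA is enabled, and the job script itself, are assumptions for the example; the everyday equivalent is simply qsub -q nectar-monash my_analysis.sh.

# Sketch: submit a job to one of the nectar queues via the Grid Engine DRMAA
# Python bindings. Availability of DRMAA and the job script are assumptions.
import drmaa

with drmaa.Session() as s:
    jt = s.createJobTemplate()
    jt.remoteCommand = "./my_analysis.sh"        # hypothetical job script
    jt.nativeSpecification = "-q nectar-monash"  # pin the job to the Monash cell
    job_id = s.runJob(jt)
    print("Submitted job", job_id)

    # Block until the job finishes, then clean up the template.
    s.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    s.deleteJobTemplate(jt)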

Since December 2013, over 40,000 CPU-hours’ worth of computational jobs have been executed through the nectar queue of MCC. The resources used by MCC on R@CMon will be expanded with the deployment of R@CMon Phase 2 to accommodate specialised computational workloads (e.g., high-memory jobs). Monash researchers who have their own NeCTAR allocations and want them presented through MCC can opt to have their allocation managed by the “MCC on R@CMon” project. This can be arranged by contacting the Monash eResearch Centre.