
MaxQuant Proteomic Searches on R@CMon

David Stroud, NHMRC Doherty Fellow and member of the Ryan Lab in the Department of Biochemistry and Molecular Biology, Monash University, does proteomics research and uses the MaxQuant quantitative proteomics software as part of his analysis workflows. MaxQuant is designed for processing high-resolution mass spectrometry data and is freely available for the Microsoft Windows platform. The first step in the workflow is to analyse samples using liquid chromatography-mass spectrometry (LC-MS) on a Thermo Orbitrap mass spectrometer. This step produces raw files containing spectra that represent thousands of peptides. The raw files are then loaded into MaxQuant to perform searches in which the spectra are compared against a known list of peptides. A quantification step follows, enabling peptide abundance to be compared across samples. Once this process is complete, the resulting tab-delimited files are captured for downstream analysis.
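
To give a concrete flavour of that downstream step, the following minimal sketch (illustrative only, not David's actual pipeline) loads one of MaxQuant's tab-delimited output tables with pandas and compares protein intensities between two samples. The proteinGroups.txt file name follows MaxQuant's usual output conventions, but the sample column names used here ("Intensity SampleA", "Intensity SampleB") are placeholders.

import pandas as pd

# Illustrative downstream step: load a MaxQuant tab-delimited output table.
# "proteinGroups.txt" follows MaxQuant's usual output layout; the sample
# column names below are placeholders and must match your experiment design.
groups = pd.read_csv("combined/txt/proteinGroups.txt", sep="\t", low_memory=False)

# Drop decoy and contaminant entries flagged by MaxQuant, if present.
for flag in ("Reverse", "Potential contaminant"):
    if flag in groups.columns:
        groups = groups[groups[flag] != "+"]

# Compare protein abundance across two (assumed) samples.
groups["ratio"] = groups["Intensity SampleA"] / groups["Intensity SampleB"]
print(groups[["Protein IDs", "ratio"]].head())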

Inspection of results using the MaxQuant software.

MaxQuant searches are both CPU- and IO-intensive tasks. A typical search takes 24 to 48 hours, and in some cases up to a week, depending on the size of the raw files being processed. David had been running his workflow on his own machine, with 8 cores, 16 gigabytes of memory (RAM) and a solid state drive (SSD) for storage, where a standard search took 2 to 3 weeks to complete. Performing large MaxQuant searches on the local machine became a struggle, and David needed a bigger machine with a desktop environment to scale up his analysis workflow. The R@CMon team assisted David in deploying the MaxQuant software on the Monash node of the NeCTAR Research Cloud on an m1.xxlarge instance, spawned from the Monash-licensed Windows Server 2012 image. MaxQuant searches on the NeCTAR instance show a 3-4x speed-up over the local machine: what took several weeks locally now takes just several days on the NeCTAR instance.
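
For readers wanting to set up something similar themselves, the sketch below shows roughly how such an instance could be launched with the OpenStack SDK for Python. It is an assumed, simplified example rather than the exact steps the R@CMon team followed; the cloud, image and network names are placeholders, and the Monash-licensed Windows image is only visible to eligible Monash projects.

import openstack

# Credentials come from a clouds.yaml entry; "nectar" is a placeholder name.
conn = openstack.connect(cloud="nectar")

# Launch a large Windows instance. The image and network names below are
# placeholders and must be replaced with those visible in your own project.
server = conn.create_server(
    name="maxquant-search",
    image="Windows Server 2012 (Monash-licensed)",  # placeholder image name
    flavor="m1.xxlarge",
    network="monash-internal",                      # placeholder network name
    wait=True,
)
print(server.status)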

MaxQuant search of Thermo RAW files.

The R@CMon team are currently working with David to explore further scaling options. The high-memory and PCIe SSD-enabled specialist kit on R@CMon Phase 2 can be exploited by MaxQuant to accelerate bursts of IO-intensive activity during searches. More on this coming soon!

VISIONET on R@CMon (Update)

Back in early 2014, the R@CMon team assisted SBI Australia in deploying the VISIONET (Visualizing Transcriptomic Profiles Integrated with Overlapping Transcription Factor Networks) visualisation web service on the Monash node of the NeCTAR Research Cloud. Since then, VISIONET has been further enhanced to support more complex transcription factor network topologies. To date, VISIONET has featured in two publications.

Nim, H.T., Boyd, S.E., and Rosenthal, N.A. (2014). Systems approaches in integrative cardiac biology: Illustrations from cardiac heterocellular signalling studies. Progress in Biophysics and Molecular Biology 117, 69-77.

Nim, H.T., Furtado, M.E., Costa, M.W., Rosenthal, N.A., Kitano, H., and Boyd, S.E. (2015). VISIONET: intuitive visualisation of overlapping transcription factor networks, with applications in cardiogenic gene discovery. BMC Bioinformatics.

The R@CMon team will continue supporting SBI Australia with its plan to further develop the VISIONET web service this year.

Stock Price Impact Models Study on R@CMon Phase 2 (Update)

A mere six months ago, Paul Lajbcygier and his research group started using the R@CMon Phase 2 “specialist kit” for processing and analysing high-frequency stock data as part of their stock price impact models study. Since then, they’ve been running extraction queries continuously and have recently published a paper highlighting their latest findings, acknowledging the NeCTAR Research Cloud infrastructure.

Lajbcygier, P., and Sojka, J. (2015). The Viability of Alternative Indexation when including all Costs. International Review of Financial Analysis.

The group will continue to use the high-memory instance on R@CMon Phase 2 as they progress their research pipeline and the R@CMon team will continue to support them on their journey.

“I expect that over the coming months we will fully utilise the generous resources on the Monash node of the NeCTAR Research Cloud as we extend our research into this cutting edge and exciting data intensive topic.”

Associate Professor Paul Lajbcygier
Faculty of Business and Economics
Department of Accounting and Finance
Department of Banking and Finance
Monash University

Rail Network Catastrophe Analysis on R@CMon

Monash University, through the Institute of Railway Technology (IRT), has been working on a research project with Vale S.A., a Brazilian multinational metals and mining corporation and one of the largest logistical operators in Brazil, to continuously monitor and assess the health of the Carajás Railroad Passenger Train (EFC) mixed-use rail network in Northern Brazil. This project will identify locations that produce “significant dynamic responses”, with the aim of enabling proactive maintenance to prevent catastrophic rail failure. As part of this project, IRT researchers have been involved in (a) the analysis of the collected data and (b) the establishment of a database with visualisation capabilities that allows for interrogation of the analysed data.

GPU-powered DataMap analysis and visualisation on R@CMon.

Researchers use the DataMap analysis software for data interrogation and visualisation. DataMap is a Windows-based client-server tool that integrates data from various measurement and recording systems into a geographical map. Traditionally, the researchers ran the software on a commodity laptop with a dedicated GPU, connecting to their database server. To scale to larger models, conduct more rigorous analysis and visualisation, and support remote collaboration, the toolset needed to go beyond the laptop.
The R@CMon team supported IRT in deploying the software on the NeCTAR Research Cloud. The deployed instance runs on a Monash-licensed Windows flavour with GPU passthrough to support DataMap’s DirectX requirements.

GPU-powered DataMap analysis and visualisation on R@CMon.

Through the Research Cloud, IRT researchers and their Vale S.A. counterparts are able to collaborate on modelling, analysis and results via remote access to the GPU-enabled virtual machines.
“The assistance of R@CMon in providing virtual machines that have GPU support, has been instrumental in facilitating global collaboration between staff located at Vale S.A. (Brazil) and Monash University (Australia).”
Dr. Paul Reichl
Senior Research Engineer and Data Scientist
Institute of Railway Technology

Australia’s Largest University Selects Mellanox CloudX Platform and Open Ethernet Switch Systems for Nationwide Research Initiative

Yesterday Mellanox made the following press release – “Australia’s Largest University Selects Mellanox CloudX Platform and Open Ethernet Switch Systems for Nationwide Research Initiative”. Through Monash University’s own co-investment in R@CMon, the Mellanox CloudX products were chosen as the networking technology for Phase 2, providing RDMA-capable networking within and between the R@CMon Research Cloud and Data (RDSI) facilities. This means the one fabric can run multi-host MPI workloads and leverage fast I/O storage, while remaining near the cost-point of commodity networking for resources that are generic and commodity.

This is a key ingredient to the “21st Century Microscope”, where researchers orchestrate the instruments, compute, storage, analysis and visualisation themselves, looking down and tuning this 21st century lens, using big data and big computing to make new discoveries. R@CMon has been designed to be the platform where Australian researchers can lead the way at establishing their own 21st century microscope – for themselves and for their communities.

Once again Monash is leading platform technology innovation and accessibility by example. Through 2015 we look forward to optimising this technology, and encouraging increased self-service to these sorts of technologies.

 


The CVL on R@CMon Phase 2

Monash is home to the Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE), a national facility for the imaging and characterisation community. An important and rather novel feature of the MASSIVE compute cluster is the interactive desktop visualisation environment available to assist users in the characterisation process. The MASSIVE desktop environment provided part of the inspiration for the Characterisation Virtual Laboratory (CVL), a NeCTAR Virtual Laboratory project that combines specialist visualisation and rendering tools from a variety of disciplines and makes them available on and through the NeCTAR Research Cloud.

The recently released monash-02 zone of the NeCTAR cloud provides enhanced capability to the CVL, bringing a critical mass of GPU-accelerated cloud instances. monash-02 includes ten GPU-capable hypervisors, currently able to provide up to thirty GPU-accelerated instances via direct PCI passthrough. Most of these are NVIDIA GRID K2 GPUs (CUDA compute capability 3.0), though we also have one K1. Special thanks to NVIDIA for providing us with a couple of seed units to get this going and supplement our capacity! After consultation with various users, we created the following set of flavors/instance-types for these GPUs:

Flavor name         #vcores   RAM (MB)   /dev/vda (GB)   /dev/vdb (GB)
mon.r2.5.gpu-k2     1         5400       30              N/A
mon.r2.10.gpu-k2    2         10800      30              40
mon.r2.21.gpu-k2    4         21700      30              160
mon.r2.63.gpu-k2    12        65000      30              320
mon.r2.5.gpu-k1     1         5400       30              N/A
mon.r2.10.gpu-k1    2         10800      30              40
mon.r2.21.gpu-k1    4         21700      30              160
mon.r2.63.gpu-k1    12        65000      30              320
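
As a rough illustration of how GPU passthrough flavors like these can be defined, the sketch below uses the OpenStack SDK for Python to create one of the flavors from the table and tag it with a PCI passthrough alias. This is an assumed example, not the actual monash-02 configuration; the cloud name and the "gridk2" alias are placeholders, and a matching alias must also be defined in nova.conf on the scheduler and hypervisors.

import openstack

# Credentials come from a clouds.yaml entry; "monash" is a placeholder name.
conn = openstack.connect(cloud="monash")

# Create one of the GPU flavors from the table above.
flavor = conn.create_flavor(
    name="mon.r2.5.gpu-k2",
    ram=5400,       # MB
    vcpus=1,
    disk=30,        # /dev/vda in GB
    ephemeral=0,    # no /dev/vdb for this flavor
)

# Request one GRID K2 device via a PCI passthrough alias. The alias name
# "gridk2" is an assumption and must match the alias configured in nova.conf.
conn.set_flavor_specs(flavor.id, {"pci_passthrough:alias": "gridk2:1"})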

R@CMon has so far dedicated two of these GPU nodes to the CVL, and this is our preferred method for use of this equipment, as the CVL provides a managed environment and queuing system for access (regular plain IaaS usage is available where needed). There were some initial hiccups getting the CVL’s base CentOS 6.6 image working with NVIDIA drivers on these nodes, solved by moving to a newer kernel, and some performance tuning tasks still remain. However, the CVL has now been updated to make use of the new GPU flavors on monash-02, as demonstrated in the following video…

GPU-accelerated Chimera application running on the CVL, showing the structure of human follicle-stimulating hormone (FSH) and its receptor.

If you’re interested in using GPGPUs on the cloud, please contact the R@CMon team or the Monash eResearch Centre.

3D Stellar Hydrodynamics Volume Rendering on R@CMon Phase 2

Simon Campbell, Research Fellow in the Faculty of Science, Monash University, has been running large-scale 3D stellar hydrodynamics parallel calculations on the Magnus supercomputing facility at iVEC and on Raijin, the national peak facility at NCI. These calculations aim to improve 1D modelling of the core helium burning (CHeB) phase of stars using a novel multi-dimensional fluid dynamics approach. The improved models will have significant impact on many fields of astronomy and astrophysics, such as stellar population synthesis, galactic chemical evolution and the interpretation of extragalactic objects.

The parallel calculations generate raw data dumps (heavy data) containing several scalar variables, which are pre-processed and converted into HDF5. A custom script is used to extract the metadata (light data) into XDMF, a standard format used by HPC codes and recognised by various scientific visualisation applications such as ParaView and VisIt. The stellar data are loaded into VisIt and visualised using volume rendering. Initial volume renderings were done on a modest dual-core laptop using only low-resolution models (200 x 200 x 100, 10⁶ zones). It was identified that applying the same visualisation workflow to high-resolution models (400 x 200 x 200, 10⁷ zones, and above) would require a parallel (MPI) build of VisIt running on a higher-performance machine.
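
To give a sense of what the light-data step looks like, here is a minimal, hypothetical sketch that writes an XDMF wrapper pointing VisIt or ParaView at a scalar field stored in an HDF5 dump. The file name, dataset path, variable name and grid dimensions are placeholders, not details of the actual custom script.

import h5py

# Placeholder inputs: an HDF5 dump holding one scalar field on a structured
# grid. Names and dimensions are assumptions for illustration only.
h5_file = "dump_0001.h5"
dataset = "/fields/velocity_magnitude"
nz, ny, nx = 100, 200, 200

with h5py.File(h5_file, "r") as f:
    assert f[dataset].shape == (nz, ny, nx)

# Write a small XDMF (XML) file describing where the heavy data lives,
# so VisIt or ParaView can read the HDF5 dump directly.
xdmf = f"""<?xml version="1.0" ?>
<Xdmf Version="2.0">
  <Domain>
    <Grid Name="star" GridType="Uniform">
      <Topology TopologyType="3DCoRectMesh" Dimensions="{nz} {ny} {nx}"/>
      <Geometry GeometryType="ORIGIN_DXDYDZ">
        <DataItem Dimensions="3" Format="XML">0 0 0</DataItem>
        <DataItem Dimensions="3" Format="XML">1 1 1</DataItem>
      </Geometry>
      <Attribute Name="velocity_magnitude" Center="Node">
        <DataItem Dimensions="{nz} {ny} {nx}" Format="HDF">{h5_file}:{dataset}</DataItem>
      </Attribute>
    </Grid>
  </Domain>
</Xdmf>"""

with open("dump_0001.xmf", "w") as out:
    out.write(xdmf)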

Snapshot of turbulent convection deep inside the core of a star that has a mass 8 times that of the Sun. Colours indicate gas moving at different velocities. Volume rendered in parallel using VisIt + MPI.

R@CMon Phase 2 to the rescue! The timely release of R@CMon Phase 2 provided the computational grunt required for these high-resolution volume renderings. The new specialist kit in this release includes hypervisors housing 1 TB of memory. The R@CMon team allocated a share (~50%, ~460 GB of memory) of one of these high-memory hypervisors to the task. Persistent storage on R@CMon Phase 2 is also provided on the computational instance for ingesting data from the supercomputing facilities and storing processing and rendering results. VisIt was rebuilt on the high-memory instance, this time with MPI capabilities and XDMF support.
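
Rendering jobs of this kind are typically driven through VisIt's Python CLI, with the parallel compute engine started via something like visit -np 24 -nowin -cli -s render.py. The script below is a generic, assumed sketch of such a driver, not the actual workflow; the XDMF file, variable name and attribute values are placeholders.

# Run inside VisIt's Python CLI, e.g.: visit -np 24 -nowin -cli -s render.py
# The database and variable names below are placeholders.
OpenDatabase("dump_0001.xmf")

# Volume plot of the (assumed) scalar field described in the XDMF wrapper.
AddPlot("Volume", "velocity_magnitude")

v = VolumeAttributes()
v.rendererType = v.RayCasting   # ray-cast volume rendering
v.samplesPerRay = 500
SetPlotOptions(v)

DrawPlots()

# Save the rendered frame to disk as a PNG image.
s = SaveWindowAttributes()
s.fileName = "convection_frame"
s.format = s.PNG
SetSaveWindowAttributes(s)
SaveWindow()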

Initial parallel volume rendering using 24 processes shows a ~10x speed-up. Medium-resolution (400 x 200 x 200, 10⁷ zones) and high-resolution (800 x 400 x 400, 10⁸ zones) plots are now being generated seamlessly on the high-memory instance, and an even higher resolution (1536 x 1024 x 1024, 10⁹ zones) simulation is currently running on Magnus. The resulting datasets from this simulation, which are expected to be several hundred gigabytes in size, will then be put through the same parallel volume rendering workflow.