Tag Archives: Research Stories

PROSPER on R@CMon

PROSPER (PROtease Specificity Prediction servER) is an integrated feature-based web server that predicts novel substrates, and their cleavage sites, for 24 different protease families from primary sequences. PROSPER addresses the “substrate identification” problem to support the understanding of protease biology and the development of therapeutics targeting specific protease-regulated pathways.

PROSPER’s web server for protein sequence user input.

Query sequences in FASTA format are submitted to PROSPER using a simple web interface. PROSPER then uses a machine learning approach based on “support vector regression” to produce a real-valued prediction of substrate cleavage probability. Once the prediction tasks have completed, PROSPER provides users with a link to the prediction results for their query sequence.
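As a rough illustration of the input side of this workflow, the sketch below parses FASTA text and enumerates fixed-width residue windows around each candidate cleavage bond, which is the kind of input a regression model could score. It is not PROSPER's code: the function names, the four-residue flank on each side of the bond, and the example sequence are assumptions made for this example, and the post does not describe the features PROSPER actually uses.

```python
from typing import Dict, Iterator, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the full header line (minus '>') as the record key.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def candidate_windows(sequence: str, flank: int = 4) -> Iterator[Tuple[int, str]]:
    """Yield (position, window) for each bond with `flank` residues on both sides.

    `position` is the 1-based index of the residue immediately before the bond.
    The flank of 4 is an assumption for illustration only.
    """
    for i in range(flank, len(sequence) - flank + 1):
        yield i, sequence[i - flank:i + flank]


if __name__ == "__main__":
    # Hypothetical query; a real model would assign each window a score.
    query = parse_fasta(">example\nACDEFG\nHIKL\n")
    for name, seq in query.items():
        for position, window in candidate_windows(seq):
            print(name, position, window)
```

Each window would then be converted to numeric features and passed to a trained regressor; that step is left out here because it depends on model details the post does not give.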

PROSPER’s result page showing colour-coded predicted cleavage sites.

The R@CMon team helped the Monash Bioinformatics Platform migrate the PROSPER web server to the NeCTAR Research Cloud. PROSPER now uses persistent storage (Volumes), granted via a VicNode computational storage allocation, for its model database, which currently contains 24 proteases. To date, PROSPER has served more than 6000 users from 68 countries in their own research, and these numbers are expected to grow in the near future.

DOUBLE MUTANT PBP2X T338A/M339F FROM STREPTOCOCCUS PNEUMONIAE STRAIN 2 R6 AT 2.4 A RESOLUTION

Future plans for PROSPER include tighter integration with high-performance computing (HPC) facilities and the NeCTAR Research Cloud to enable simultaneous sequence prediction analyses; increasing the coverage of the model database from 24 to 50 proteases to support the wider protease biology community; and implementing an online computational database of PROSPER-predicted novel substrates and cleavage sites across the whole human proteome. The latter will facilitate the functional annotation of the complete human proteome and complement ongoing efforts to characterise protein functions.

The Proteome Browser on R@CMon

The Proteome Browser (TPB) is a web portal that integrates human protein data and information. It provides an up-to-date view of the proteome (the entire library of proteins that can be expressed by cells or organisms – like us!) across large gene sets to support human proteome characterisation as part of the Chromosome-centric Human Proteome Project (C-HPP). Pertinent genomic and protein data from multiple international biological databases are assembled by TPB in a searchable format supporting C-HPP’s global proteomics effort.

TPB’s primary report of chromosome-ordered genes, visualised using a traffic light colour system.

TPB’s framework extracts biological data from numerous sources, maps it onto the genome, and categorises the results based on quality and information content. The result (level of evidence) is presented by TPB as a simple point matrix, colour-coded using a traffic light system (green – highly reliable evidence, yellow – reasonable evidence, red – some evidence is available, or black – no evidence is available). TPB uses hierarchical data types to group similar information from different experiment types.
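The traffic light coding can be pictured as a lookup from an evidence level to a colour, applied to every gene and data type pair. The sketch below is illustrative only and is not TPB's implementation: the `Evidence` enum, the `colour_matrix` function, and the gene and data type labels are hypothetical names chosen for this example.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Evidence(Enum):
    """Evidence levels and the colours described in the post."""
    HIGH = "green"         # highly reliable evidence
    REASONABLE = "yellow"  # reasonable evidence
    SOME = "red"           # some evidence is available
    NONE = "black"         # no evidence is available


def colour_matrix(
    genes: List[str],
    data_types: List[str],
    evidence: Dict[Tuple[str, str], Evidence],
) -> List[List[str]]:
    """Build one row per gene and one column per data type.

    Pairs with no recorded evidence default to black.
    """
    return [
        [evidence.get((gene, data_type), Evidence.NONE).value
         for data_type in data_types]
        for gene in genes
    ]


if __name__ == "__main__":
    # Hypothetical genes and data types, for illustration only.
    observed = {
        ("GENE_A", "mass_spec"): Evidence.HIGH,
        ("GENE_B", "antibody"): Evidence.SOME,
    }
    for row in colour_matrix(["GENE_A", "GENE_B"], ["mass_spec", "antibody"], observed):
        print(row)
```

Defaulting missing pairs to black mirrors the "no available evidence" category, so the matrix always has a value for every cell.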

TPB’s summary report for the chosen chromosome.

TPB is supported by Monash University, the Monash eResearch Centre (MeRC), the Chromosome-centric Human Proteome Project (C-HPP), the Australia/New Zealand Chromosome 7 Consortium and the Australian National Data Service (ANDS). Researchers are now using TPB in various proteomics-related discoveries.

The R@CMon cloud team recently provided assistance to migrate The Proteome Browser web service to be hosted on the Monash node of the NeCTAR Research Cloud. TPB is using persistent storage (Volumes) granted via a VicNode computational storage allocation to house its underlying database. TPB’s new home will ensure it has stable and scalable long-term hosting supported by the NeCTAR and RDSI federal research infrastructure programmes.

Deakin Bioinformatics Workshop (February 17-19, 2014)

On February 17-19, 2014, a bioinformatics workshop was held at Deakin University – Geelong Waterfront Campus. The workshop covered Genotyping by Sequencing (GBS) methodologies using various well-known bioinformatics tools. The two main tools used in the workshop were Trait Analysis by aSSociation, Evolution and Linkage (TASSEL) and Bowtie. TASSEL is used to investigate relationships between phenotypes and genotypes, while Bowtie is used to align short DNA sequence reads to a reference genome.

Trainees at the Deakin workshop, using the NeCTAR-provisioned training environment.

The workshop was delivered using the NeCTAR Research Cloud infrastructure and the Bioplatforms Australia Training Platform. The R@CMon team supported the workshop organisers at Deakin University in creating a customised cloud image containing the required tools and datasets, and in ensuring the allocation of computational and storage resources in the cloud. The CloudBioLinux-based cloud image was instantiated once per trainee, giving each one a dedicated virtual desktop environment for their analyses.

Workshop trainers demonstrating Genotyping by Sequencing (GBS) methodologies and tools using a custom NeCTAR cloud image.

Feedback collected from participants on the day was overwhelmingly positive. However, some user-experience issues were encountered with the remote desktops (NX), which can be attributed to the network between the cloud servers (hosted on the eRSA Node in South Australia) and the participants. Such issues have not shown up for BPA workshops using the Monash Node up and down the east coast, which demonstrates the importance of being able to reserve local cloud capacity for latency- and jitter-sensitive use cases like this one. Fortunately, those issues were isolated, and according to instructors from Cornell University, the training platform used in the workshop was one of the best they have used. The trainees were keen to attend future GBS-related workshops delivered using the cloud.

The CVL on R@CMon

The Characterisation Virtual Laboratory (CVL) is a powerful platform that integrates Australian imaging facilities with computational and data storage infrastructure, together with sophisticated processing and analysis toolsets. The CVL platform provides scientists working in various fields with a common analysis and collaboration environment. It turns the humble remote desktop into a highly flexible Scientific Software-as-a-Service delivery platform powered by the NeCTAR Research Cloud.

The CVL Desktop

The current production CVL includes toolsets covering Neuroimaging, Energy Materials and Structural Biology research drivers. The project includes so-called “CVL fabric services”, which provide the necessary infrastructure to modularise popular software toolsets from any number of domains.

The R@CMon team assisted the CVL team in migrating CVL services into R@CMon. The use of persistent storage (Volumes on R@CMon) ensured consistent user home directories and software-stack repositories. The default “CVL Desktop” pool is now serving users with software-rendered CVL environments running on R@CMon. The CVL team is also a beta user of GPU flavours on R@CMon and is currently testing GPU-enabled CVL environments on the “CVL GPU node” pool (via CVL Launcher).

The available pools on the CVL Launcher.

The following video demonstrates a GPU-enabled CVL environment launched on R@CMon. It shows the PyMOL and UCSF Chimera applications from the Structural Biology workbench, running and utilising the available GPU. The use of GPU enables seamless interaction and manipulation of datasets.

The plan is to increase the “CVL GPU node” pool to accommodate more users once GPU node capacity on R@CMon has been upgraded with deployment of R@CMon Phase 2. Watch this space for more CVL on R@CMon news. Other updates about the CVL and its sub-projects are also available on the CVL site.

Bioplatforms Australia – CSIRO Metagenomics Workshop (February 6-7 and 10-11, 2014)

Bioplatforms Australia and CSIRO conducted an “Introduction to Metagenomics” workshop on February 6-7, 2014 at the University of New South Wales and on February 10-11, 2014 at Monash University. The workshop was aimed at bench biologists with little or no experience in bioinformatics, and used publicly available data resources and toolsets.

As with previous Bioplatforms Australia workshops, the Metagenomics workshop was delivered using the Monash node of the NeCTAR Research Cloud infrastructure – R@CMon. Cloud provisioning tools from previous workshops were reused to provide a seamless virtual desktop training platform.

The R@CMon team worked with Bioplatforms Australia, CSIRO and EMBL-EBI in producing an appropriate cloud image and toolset for the workshop. Some of the tools used in the workshop are QIIME, FastQC, and InterProScan.

Given the success and popularity of the training, R@CMon has begun work with Bioplatforms Australia and other nodes of the NeCTAR Research Cloud to scale the training environment with trainees as they progress from taking the course, to taking the training environment home with them, to preparing for production genomics facilities.

The trainees had the following to say about the two-day workshop:

“Everyone is very helpful and the content was clear and concise. I thank everyone for getting this program up and running and I would definitely like to return for similar courses.”

“Definitely a fantastic & informative 2-days. I definitely feel that most of what I learned today can be directly applied to the molecular work I am currently engaged in.”

“Good amount of hands on to actually see what happens during the analysis of big data, most of what was taught was very clear and concise. good range of programs were used, only wish there was more time to go through more analyses.”

“I strongly recommend this as a complete course for students or beginners in NGS analyses.  i think the first metagenomic workshop is a winner.”

“Definitely this course has provided me a great basis to look at future data sets.”

“Well ran, good materials, virtual machine made life easy.”