Tag Archives: Bioinformatics

Histone H3.3 Analysis on R@CMon

The Epigenetics and Chromatin (EpiC) Lab at Monash University is working on understanding how mutations in certain chromatin factors promote the formation of brain tumours. This project involves the generation and analysis of high-throughput sequencing data of chromatin modifications and remodellers in normal and mutated cells. The sequencing is carried out at the MHTP Medical Genomics Facility and the resulting datasets are then imported into the analysis workflow running on the Monash node (R@CMon) of the NeCTAR Research Cloud. The sequencing reads are first aligned to the repetitive fraction of the genome using a script developed by Day et al. (Genome Biology 2010) to determine enrichment at repeats. The reads are then aligned to the whole genome using Bowtie. The resulting alignment files are filtered with customised Perl scripts to remove low-quality reads, poor matches and PCR duplicates. The filtered files are then imported into SeqMonk for further analysis.
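The filtering step can be sketched as follows. This is a minimal illustration in Python rather than the lab's Perl scripts, which are not shown here. It assumes the alignments are in SAM format, that "quality" means a mapping-quality (MAPQ) cut-off, and that reads sharing the same reference, start position and strand are PCR duplicates. The function name `filter_sam` and the threshold of 20 are hypothetical.

```python
from typing import Iterable, Iterator

MIN_MAPQ = 20  # assumed cut-off, not taken from the EpiC Lab scripts


def filter_sam(lines: Iterable[str], min_mapq: int = MIN_MAPQ) -> Iterator[str]:
    """Yield SAM header lines plus the first acceptable read at each position."""
    seen = set()  # (reference, position, strand) of reads already kept
    for line in lines:
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # bitwise FLAG column
        rname = fields[2]       # reference sequence name
        pos = int(fields[3])    # 1-based leftmost mapping position
        mapq = int(fields[4])   # mapping quality
        if flag & 0x4:          # read is unmapped
            continue
        if mapq < min_mapq:     # low-confidence alignment
            continue
        strand = "-" if flag & 0x10 else "+"
        # Simplification: duplicates are keyed on the leftmost position,
        # not on the true 5' end of reverse-strand reads.
        key = (rname, pos, strand)
        if key in seen:
            continue
        seen.add(key)
        yield line
```

Applied to an open SAM file, `filter_sam(handle)` yields the lines to write to the filtered output. Only the first read seen at each position is kept, so the result depends on input order.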

Overlap analysis using SeqMonk


SeqMonk allows rapid visualisation of individual aligned reads across the entire genome. Its inbuilt MACS peak caller is used for first-pass peak calling. A selection of peaks is then validated in the lab by ChIP-qPCR experiments, and the peak-calling parameters can be adjusted based on these results. Overlap analysis with regions of interest can also be performed in SeqMonk. Aligned sequence files are converted to BigWig format using customised Perl scripts and uploaded to the NeCTAR Object Storage (Swift), from which they can be loaded directly into the UCSC Genome Browser for visualisation and further investigation. Once the sequence files are in the object storage, they can be easily compared against public ENCODE datasets and UCSC genomic annotations to identify any potentially interesting correlations.
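Loading a BigWig file from object storage into the UCSC Genome Browser comes down to giving the browser a custom track line whose `bigDataUrl` points at the file. A minimal sketch in Python is shown here, assuming the Swift container is publicly readable. The helper name `bigwig_track_line` is hypothetical, and any container URL or object name passed to it is a placeholder, not the lab's actual storage location.

```python
from urllib.parse import quote


def bigwig_track_line(name: str, description: str,
                      container_url: str, object_name: str) -> str:
    """Build a UCSC custom track line for a BigWig file served over HTTP(S)."""
    # Percent-encode the object name so spaces and similar characters are URL-safe.
    url = f"{container_url.rstrip('/')}/{quote(object_name)}"
    return (f'track type=bigWig name="{name}" '
            f'description="{description}" bigDataUrl={url}')
```

The resulting line is pasted into the browser's custom track form. The browser then fetches only the parts of the file needed for the region on display.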

Aligned sequence visualisation using the UCSC Genome Browser.


The R@CMon team and the Monash Bioinformatics Platform supported the EpiC Lab by deploying a dedicated analysis instance on the NeCTAR Research Cloud, based on the training environment first developed for the BPA-CSIRO Bioinformatics Training Platform. Because the training platform is open access and reusable, it can be easily adapted to various analysis workflows. The R@CMon team and the Monash Bioinformatics Platform will continue to engage with the EpiC Lab as they grow and scale their analysis workflow on the NeCTAR Research Cloud.

Interferome on R@CMon

Interferons (IFNs) were identified as antiviral proteins more than 50 years ago. Since then, their involvement in immunomodulation, cell proliferation, inflammation and other homeostatic processes has also been identified. These cytokines are used as therapeutics in many diseases, such as chronic viral infections, cancer and multiple sclerosis. IFNs regulate the transcription of approximately 2000 genes in a manner that depends on IFN subtype, dose, cell type and stimulus.

Interferome Wordle


Interferome is an online database of IFN regulated genes. The database is a valuable resource for biomedical researchers and is regularly used by scientists from across the world. It integrates information from high-throughput experiments to give a detailed understanding of IFN biology. Interferome enables reliable identification of an individual Interferon Regulated Gene (IRG) or IRG signatures from high-throughput data sets (e.g. microarray or proteomic data). It also assists in identifying regulatory elements, chromosomal location and tissue expression of IRGs in humans and mice.

Interferome Database Statistics


The R@CMon team assisted Prof. Paul Hertzog and the Centre of Innate Immunity & Infectious Diseases at MIMR-PHI in migrating versions 1.0 and 2.0 of the Interferome online database to the NeCTAR Research Cloud. Interferome Version 2.0 has quantitative data and more detailed annotation and search capabilities, and can be queried for a single gene or for thousands at once, such as a gene list from a microarray experiment. To ensure availability of data and assist researchers with hypothesis generation and novel biological discoveries, the Interferome database is backed by VicNode Collection 2014R9.06. More information about Interferome is available on the help page.
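Conceptually, a gene-list query separates the submitted symbols into those that are known IRGs and those that are not. The sketch below is a local stand-in in Python and not Interferome's own search interface, which is not described here. The function name `find_irgs` is hypothetical, and the set of IRG symbols is supplied by the caller as a placeholder for the database's contents.

```python
from typing import Iterable, List, Set, Tuple


def find_irgs(gene_list: Iterable[str],
              known_irgs: Set[str]) -> Tuple[List[str], List[str]]:
    """Split a gene list into (IRG hits, other genes), ignoring case and blanks."""
    known = {symbol.upper() for symbol in known_irgs}
    hits, others = [], []
    for gene in gene_list:
        symbol = gene.strip().upper()  # normalise so that 'Mx1' matches 'MX1'
        if not symbol:                 # skip empty rows from the input list
            continue
        (hits if symbol in known else others).append(symbol)
    return hits, others
```

Upper-casing is a simplification that lets human-style and mouse-style symbols match the same entry. A real query would also need to handle aliases and species.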

Bioplatforms Australia – CSIRO NGS Workshop (July 1-3, 2014)

On July 1-3, 2014, the latest Bioplatforms Australia – CSIRO joint Next Generation Sequencing hands-on workshop was held at the University of New South Wales, Sydney. The workshop was delivered using the established Bioinformatics Training Platform running on the NeCTAR Research Cloud and provided bench biologists and PhD students with NGS training on the following topics:

      • Introduction to the command-line interface – Software Carpentry
      • Introduction to Next Generation Sequencing
      • Illumina Next Generation Sequencing Data Quality
      • Sequence Alignment Algorithms
      • ChIP-Seq Analysis
      • RNA-Seq Analysis
      • de novo Genome Assembly

Sequence data quality analysis and visualisation using FastQC and FASTX-Toolkit.

The R@CMon team helped the workshop organisers update the training environment with the latest tools, datasets and other materials, and ensured resource stability throughout the three-day workshop. Future Bioplatforms Australia and CSIRO joint workshops will be announced on the Bioplatforms Australia Training page.


Alignment visualisation using IGV.

The trainees had the following to say about the workshop:

“The practical component made it 1000 times easier to get my head around the course and I feel like I can be confident in actually applying what I’ve learned (instead of just in lecture format).”

“The beginning with introduction to Unix environment and explanation of the de novo assembly was the best part of the course as the commands were described in more detail so I could understand what the different commands were executing. There was more practical work with the de novo assembly which was good.”

“Hands on experience is good, and the first part on command lines is good for the beginners.”


PROSPER (PROtease Specificity Prediction servER) is an integrated feature-based web server that predicts novel substrates and their cleavage sites for 24 different protease families from primary sequences. PROSPER addresses the “substrate identification” problem to support the understanding of protease biology and the development of therapeutics targeting specific protease-regulated pathways.


PROSPER’s web server for protein sequence user input.

Query sequences in FASTA format are submitted to PROSPER using a simple web interface. PROSPER then uses a machine learning approach based on “support vector regression” to produce a real-valued prediction of the substrate cleavage probability. After the prediction tasks have been performed, PROSPER provides users with a link to access the prediction results for their query sequences.
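To make the input side of such a prediction concrete, the sketch below parses FASTA text in Python and lists the local sequence window around every candidate cleavage bond. It does not reproduce PROSPER's features or its support vector regression model. The function names and the window of four residues on each side of the bond are assumptions chosen for illustration only.

```python
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: Dict[str, list] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip() or "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def candidate_sites(sequence: str, flank: int = 4) -> Iterator[Tuple[int, str]]:
    """Yield (i, window) for each bond that has `flank` residues on both sides.

    The bond lies between residues i and i + 1 (1-based numbering); the window
    holds the `flank` residues before the bond and the `flank` residues after it.
    """
    for i in range(flank, len(sequence) - flank + 1):
        yield i, sequence[i - flank:i + flank]
```

Each window would then be converted to features and scored by a trained model, which is the part this sketch leaves out.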


PROSPER’s result page showing colour-coded predicted cleavage sites.

The R@CMon team helped the Monash Bioinformatics Platform in migrating the PROSPER web server into the NeCTAR Research Cloud. PROSPER is now using persistent storage (Volumes) granted via VicNode computational storage allocation for its model database which currently contains 24 proteases. To date, PROSPER has served more than 6000 users from 68 countries for their own research and these numbers are expected to grow in the near future.



Future plans for PROSPER include tighter integration with high-performance computing (HPC) facilities and the NeCTAR Research Cloud to enable simultaneous sequence prediction analyses; increasing the coverage of the model database from 24 to 50 proteases to support the wider protease biology community; and implementing an online computational database of PROSPER-predicted novel substrates and cleavage sites in the whole human proteome. The latter will facilitate the functional annotation of the complete human proteome and complement ongoing efforts to characterise protein functions.

Deakin Bioinformatics Workshop (February 17-19, 2014)

On February 17-19, 2014, a bioinformatics workshop was held at Deakin University – Geelong Waterfront Campus. The workshop covered Genotyping by Sequencing (GBS) methodologies using various well-known bioinformatics tools. The two main tools used in the workshop were Trait Analysis by aSSociation, Evolution and Linkage (TASSEL) and Bowtie. TASSEL is used to investigate relationships between phenotypes and genotypes, while Bowtie is used to align short DNA sequence reads to a reference genome.


Trainees at the Deakin workshop, using the NeCTAR-provisioned training environment.

The workshop was delivered using the NeCTAR Research Cloud infrastructure and the Bioplatforms Australia Training Platform. The R@CMon team supported the workshop organisers at Deakin University in creating a customised cloud image containing the required tools and datasets, and in securing the allocation of computational and storage resources in the cloud. The CloudBioLinux-based cloud image was instantiated once per trainee, giving each one a dedicated virtual desktop environment for their analyses.


Workshop trainers demonstrating Genotyping by Sequencing (GBS) methodologies and tools using a custom NeCTAR cloud image.

Feedback collected from participants on the day was overwhelmingly positive. However, some user-experience issues were encountered with the remote desktops (NX); these can be attributed to the network between the cloud servers (hosted on the eRSA Node in South Australia) and the participants. Such issues haven’t shown up for BPA workshops using the Monash Node up and down the east coast, which demonstrates the importance of being able to reserve local cloud capacity for latency- and jitter-sensitive use cases like this one. Fortunately, those issues were isolated and, according to instructors from Cornell University, the training platform used in the workshop was one of the best they’ve used. The trainees were keen to attend future GBS-related workshops delivered using the cloud.