Tag Archives: HPC

PROSPER on R@CMon

PROSPER (PROtease Specificity Prediction servER) is an integrated, feature-based web server that predicts novel substrates and their cleavage sites for 24 different protease families from primary sequences. PROSPER addresses the “substrate identification” problem, supporting our understanding of protease biology and the development of therapeutics that target specific protease-regulated pathways.


PROSPER’s web server for protein sequence user input.

Query sequences in FASTA format are submitted to PROSPER through a simple web interface. PROSPER then uses a machine learning approach based on “support vector regression” to produce real-valued predictions of substrate cleavage probability. Once the prediction tasks have completed, PROSPER provides users with a link to their query sequence’s prediction results.
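
For readers unfamiliar with the format, a FASTA query is simply a description line followed by the amino-acid sequence. The example below (query.fasta is just an illustrative file name, and the human haemoglobin alpha sequence is truncated) shows the kind of text pasted into PROSPER’s web form:

$ cat query.fasta
>sp|P69905|HBA_HUMAN Hemoglobin subunit alpha
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF...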


PROSPER’s result page showing colour-coded predicted cleavage sites.

The R@CMon team helped the Monash Bioinformatics Platform migrate the PROSPER web server onto the NeCTAR Research Cloud. PROSPER now uses persistent storage (Volumes), granted through a VicNode computational storage allocation, for its model database, which currently covers 24 proteases. To date, PROSPER has served more than 6,000 users from 68 countries in their own research, and these numbers are expected to grow in the near future.

Double mutant PBP2x T338A/M339F from Streptococcus pneumoniae strain R6 at 2.4 Å resolution (PDB entry 1PYY).

Future plans for PROSPER include tighter integration with high-performance computing (HPC) facilities and the NeCTAR Research Cloud to enable simultaneous sequence prediction analyses; expansion of the model database from 24 to 50 proteases to support the wider protease biology community; and an online computational database of PROSPER-predicted novel substrates and cleavage sites across the whole human proteome. The latter will facilitate the functional annotation of the complete human proteome and complement ongoing efforts to characterise protein function.

The MCC on R@CMon

The Monash Campus Cluster (MCC) is a heterogeneous high-performance computing (HPC) and high-throughput computing (HTC) facility for conducting large-scale, computationally intensive simulations and analyses. With over 2,500 CPU cores across 230 servers of differing CPU and memory configurations, the MCC is specifically designed to serve diverse computational workloads. In 2013, the MCC provided over 13 million CPU-core hours to more than 300 Monash researchers.

During the past few months, we have been developing a software architecture to extend the MCC’s computational resources into the NeCTAR Research Cloud. Users are presented with the MCC’s familiar batch queueing and software environment, so they can seamlessly execute compute jobs on either the MCC’s legacy cluster nodes or the NeCTAR Research Cloud.


Monash Campus Cluster, “bursting” into the NeCTAR Research Cloud.

We achieve this by integrating the NeCTAR virtual machines as compute bricks into the MCC batch queuing system, presently the Open Grid Scheduler. This provides users with:

    • a new nectar queue, consisting of MCC on R@CMon compute bricks; and
    • the ability to pick a specific availability zone on the Research Cloud in which to run compute jobs.

Not only can researchers burst their computational jobs into the Research Cloud seamlessly, they can also leverage the unique properties of each of the cloud nodes (e.g., hardware and/or software capabilities).

$ qconf -sql
nectar
nectar-gaia
nectar-melbourne
nectar-monash
nectar-sa
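
From the user’s side, bursting onto the cloud is just a matter of naming one of these queues at submission time. As a sketch (myjob.sh is a placeholder job script, and actual resource requests depend on local MCC submission policies):

# run wherever the cloud-backed queues have capacity
$ qsub -q nectar myjob.sh

# or pin the job to the monash availability zone
$ qsub -q nectar-monash myjob.sh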

Since December 2013, over 40,000 CPU-hours’ worth of computational jobs have been executed through the nectar queue of the MCC. The resources used by MCC on R@CMon will be expanded with the deployment of R@CMon Phase 2 to accommodate specialised computational workloads (e.g., high-memory jobs). Monash researchers who have their own NeCTAR allocations and want them presented through the MCC can opt to have those allocations managed under the “MCC on R@CMon” project; this can be arranged by contacting the Monash eResearch Centre.

Computational Resource Framework

Over the past year the Monash NeCTAR porting programme has worked with Griffith University eResearch developers on their Computational Resource Framework (CRF) project. We’re well overdue to promote their excellent work (and a little of ours), and as there will be a presentation on this at eResearch Australasia (A RESTful Web Service for High Performance Computing based on Nimrod/G), now seems a good time to blog about it!


The CRF aims to address one of the long-standing issues in HPC: uptake by non-technical users. HPC is a domain with a well-entrenched command-line ethos; unfortunately, this alienates a large cohort of potential users, with negative implications for research productivity and collaboration. At Monash, our HPC specialists go to a great deal of effort to ease new users into the CLI (command-line interface) environment, but this is a labour-intensive process that doesn’t catch everybody and often leaves users reliant on individual consultants or team members.

For some time portals have been the go-to panacea for democratising HPC and scientific computing, and there are some great systems being deployed on the Research Cloud, but they still seem to require a large amount of development effort to build and typically cater to a particular domain. Another common issue with “job-management” style portals (including Nimrod’s own ancient CGI beauty) is that they expose and delegate too much information and responsibility to the end user – typically an end user who doesn’t know, or want to know, about the intricacies of the various computational resources: such mundanities as which credentials they need, which resources are actually working today, and so on.

The CRF is different in this respect, as it is not a domain-specific interface; instead, the Griffith team have concentrated on a minimum set of functionality for some popular HPC and scientific computing applications. The user just inputs information relevant to the application, and the CRF configuration determines where and how the job is run. Currently there are UIs for creating, submitting and managing NAMD, R and MATLAB jobs and sets thereof; AutoDock, Octave and Gaussian are in the works. They’ve put a good deal of effort into building a web service in front of Nimrod/G; that layer means their UIs are fairly self-contained bits of PHP that present an interface and translate user inputs into a template Nimrod experiment. The web service has also enabled them to build some other cool bits and pieces, like experiment submission and results delivery over email.
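
To give a flavour of what sits behind those UIs, the sketch below shows roughly the kind of Nimrod/G plan file such a template might expand into for a small parameter sweep. The file name, wrapper script and exact plan syntax here are illustrative assumptions, not lifted from the CRF itself.

$ cat sweep.pln
parameter run integer range from 1 to 10 step 1

task main
    copy run_case.sh node:.
    node:execute ./run_case.sh ${run}
endtask

Each value of run becomes a separate job that Nimrod/G can place on the local cluster or on cloud resources; the CRF’s role is to generate and manage files like this so the user never has to.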


Using Nimrod/G as the back-end resource meta-scheduler means the CRF can intermingle jobs across the local cluster and burst or overflow into the NeCTAR Research Cloud, and that is the focus of the Monash and Griffith collaboration on this project. We’re now looking to implement scheduling functionality that will make it possible to put simple policies on resources, e.g., “use my local cluster, but if jobs don’t start within 30 minutes then head to the cloud”. There should be some updates in that area after the eResearch conference!