Solr | R@CMon

Campbell Wilson and Janis Dalins of Monash’s Faculty of IT are developing a forensic data ranking and analysis tool, focusing on accelerating existing multimedia forensics analytics. Their prototype will collect, index and analyse geo-tagged multimedia in a tool intended for use by international law enforcement agencies for multi-jurisdictional victim identification.

Until recently they were running their own Solr cluster on a set of spare desktop machines, which Janis was spending a good deal of time maintaining. Then their NAS box died, badly. R@CMon to the rescue!

Janis and I had met earlier in the server room at Caulfield, where I told him all about the research cloud and the imminent availability of the Monash node, so we were both excited to see what kind of Solr performance they could get on the cloud compared to their dedicated hardware. Until then the sticking point had been storage performance (or the lack thereof) on the existing cloud node and the relatively small size of the ephemeral storage available in the standard NeCTAR flavors. Fortunately R@CMon came online just in time and with enough flexibility to help them recompute their lost data. We created a special large flavor with 32 cores, 128 GB RAM and 2.4 TB of ephemeral storage – a mini cluster-in-a-box.

Janis kindly shared a recent progress update:

Since restarting our indexing almost a fortnight ago, we’ve successfully indexed over 1.2 billion (1,226,707,169 at last count) tweets. This is almost double the count we achieved on our existing cluster, I’d guesstimate the overall index construction workload/performance as being around 130% that of the cluster. Faceted querying now works, and in a timely fashion.
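To put that count in perspective, here is a back-of-the-envelope sketch of the average indexing rate it implies. It uses only the tweet count quoted above; the one assumption is that "almost a fortnight" is rounded up to a full 14 days, so the results are lower bounds rather than measured figures. The variable and function names are ours, purely for illustration.

```python
# Tweet count quoted in Janis's progress update.
TWEETS_INDEXED = 1_226_707_169

# Assumption: "almost a fortnight" is rounded up to a full 14 days,
# so the rates computed below are lower bounds on the real averages.
ASSUMED_DAYS = 14
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(tweets: int, days: int) -> tuple[float, float]:
    """Return (tweets per day, tweets per second) averaged over `days`."""
    per_day = tweets / days
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


per_day, per_second = average_rates(TWEETS_INDEXED, ASSUMED_DAYS)
print(f"at least {per_day:,.0f} tweets/day")
print(f"at least {per_second:,.0f} tweets/second")
```

Under that assumption the instance has sustained roughly 1,000 tweets per second on average since the restart.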

So performance is great, but Janis also nicely sums up the key empowering aspect of the NeCTAR research cloud:

Beyond simple performance measures, I wanted to emphasise probably the most important thing: Unlike the existing indexing cluster, the NeCTAR instance has been treated very much as a turn-key project. We used the instance “as is” (with the exception of adding user accounts and an nginx based reverse proxy for security reasons), and haven’t performed any maintenance or configuration work. I can’t emphasise enough how much time that’s freed up for actually doing research – in commercial lingo, I’d say that’s improved efficiency greatly.

Thanks a bunch to Janis for taking the time to share the story. If you like the sound of this and think we might be able to help with your research computing please contact merc@monash.edu.

R@CMon accelerates forensic data research