Lloyd Harischandra

Lloyd designs the backend data architecture at AAO-Macquarie, determining how data is ingested and queried. He configures and manages the database cluster (we call it the data lake!) and benchmarks node configurations against current popular database technologies as required. He also writes backend software to ingest data into, and query data out of, the data lake.

Lloyd possesses more than 12 years of experience as a software/data engineer.


Affiliation – Australian Astronomical Optics - Macquarie University
Position – Data Scientist
Twitter – @Llo_y_d

Talks

Realtime Data Transferring and Processing from Remote Telescopes Using Apache NiFi and MongoDB

Data Central at AAO-MQ provides data archiving and hosting solutions to several external telescopes, such as the Anglo-Australian Telescope (AAT) and the Huntsman Telephoto Array in Australia. Until recently, AAT raw data was manually processed and transported to AAO-MQ for archiving, where the astronomical community could access it. This was time-consuming and required extra resources to manage the data flow. In this talk, we will discuss how we automated the data transfer and archiving using Apache NiFi and MongoDB, so that raw data is available for astronomers to query in real time as it is produced by the telescope.
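To make the end state concrete, here is a minimal sketch of the kind of ingestion step that runs at the archive end: each raw frame's metadata is written to MongoDB as it arrives, so it is immediately queryable. This is not the actual AAO-MQ code; the connection URI, database, collection, and header fields are assumptions for illustration.

    # Sketch: record one raw frame in MongoDB as soon as it arrives.
    # URI, database/collection names, and fields are illustrative only.
    from datetime import datetime, timezone
    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")  # hypothetical cluster URI
    frames = client["archive"]["raw_frames"]           # hypothetical db/collection

    # Index on observation time so real-time queries stay fast as the archive grows.
    frames.create_index([("date_obs", ASCENDING)])

    def ingest_frame(filename, header):
        """Record one raw frame as soon as the transfer completes."""
        frames.insert_one({
            "filename": filename,
            "telescope": header.get("TELESCOP"),
            "object": header.get("OBJECT"),
            "date_obs": header.get("DATE-OBS"),
            "ingested_at": datetime.now(timezone.utc),
        })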

We examine the usability, flexibility and scalability of Apache NiFi and the various processors and controllers available to collect, process and move data across multiple data points. We will discuss NiFi's standout features, particularly data provenance, guaranteed delivery and loss tolerance. We will also cover what flow files and flow templates are, the role of the Data Flow Manager, and how to schedule processors. In addition, we will show you how to secure site-to-site data transfers and use NiFi clustering to balance the load.
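As a flavour of how flow files are manipulated, here is a small sketch of a script body for NiFi's ExecuteScript processor (Jython engine), which binds session, log, REL_SUCCESS and REL_FAILURE for the script. The attribute name and filename convention below are assumptions, not our production flow; a downstream RouteOnAttribute processor could branch on the tag.

    # Sketch of an ExecuteScript (Jython) body: tag each flow file with
    # its source telescope so later processors can route it.
    flowFile = session.get()
    if flowFile is not None:
        filename = flowFile.getAttribute("filename")
        # Hypothetical naming convention: AAT frames start with "aat".
        source = "AAT" if filename.startswith("aat") else "huntsman"
        flowFile = session.putAttribute(flowFile, "telescope.source", source)
        session.transfer(flowFile, REL_SUCCESS)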

Lastly, we will show how survey teams can write their custom data processing tasks and data reduction pipelines in their favourite language (Python, R, etc.) and integrate them into the data flow to be automatically processed and ingested into their favourite data stores for further processing or public release.
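One common way to wire such a custom step into the flow is NiFi's ExecuteStreamCommand processor, which pipes the flow file content to a command's stdin and takes its stdout as the new content. The sketch below assumes that pattern; the reduce function is a placeholder standing in for a real reduction pipeline (bias/flat correction, calibration, and so on).

    #!/usr/bin/env python3
    # Sketch of a survey team's custom reduction step, invoked by NiFi's
    # ExecuteStreamCommand processor: flow file content arrives on stdin,
    # and stdout becomes the new flow file content.
    import sys

    def reduce(raw):
        # Placeholder transformation standing in for a real reduction pipeline.
        return raw  # e.g. bytes of the calibrated product

    if __name__ == "__main__":
        raw = sys.stdin.buffer.read()
        sys.stdout.buffer.write(reduce(raw))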