Guillaume Eynard Bontemps
In charge of the CNES Computing Center for a year, Guillaume is a specialist in distributed computing on huge datasets. In the past, he has deployed a Hadoop cluster and deployed algorithms on it. He is also a member of the Pangeo steering committee and an active Dask user.
Pangeo Data Analysis Platform at CNES and Astronomical Use Cases
Pangeo is first and foremost a scientific community, but also a Python software ecosystem and a platform that can be deployed on many infrastructures. Its goal is to make scientific research and programming easier on big datasets coming from simulations run on HPC clusters (climate models) or from sensors such as observation satellites.
In this demo or talk, we will see how a scientist or an engineer can analyze and process huge data volumes interactively, in a few lines of code, using the software components at the heart of Pangeo: Jupyter, Dask and Xarray.
These main pieces of software will be presented:
- Jupyter is the main graphical interface; it advantageously replaces a terminal.
- Dask scales computations and data analysis across many nodes or virtual machines.
- Xarray gives a high-level representation of multi-dimensional scientific data.
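To give an idea of how these pieces fit together, here is a minimal sketch, not taken from the talk itself. It assumes the numpy, xarray and dask packages are installed, and it uses a small synthetic array (the name "temperature" and the grid size are purely illustrative) in place of a real dataset.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a large dataset: 12 time steps on a 100 x 200 grid.
# The variable name and dimensions are illustrative assumptions.
rng = np.random.default_rng(seed=0)
data = xr.DataArray(
    rng.random((12, 100, 200)),
    dims=("time", "y", "x"),
    name="temperature",
)

# Split the array into Dask chunks (one per time step here).
# Operations on the chunked array are lazy from this point on.
chunked = data.chunk({"time": 1})

# This only builds a task graph; nothing is computed yet.
time_mean = chunked.mean(dim="time")

# compute() executes the graph, on local threads by default, or on a
# cluster if a dask.distributed Client has been created beforehand.
result = time_mean.compute()
print(result.shape)
```

In a Jupyter notebook, the same few lines would typically be run cell by cell, with only the cluster behind Dask changing between a laptop and a larger platform.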
We will also describe the main possibilities for deploying a Pangeo platform: your personal laptop, a public cloud provider or an HPC cluster.
Finally, we will demonstrate Pangeo stack usage through some concrete use cases:
- Statistical computations and visualisation of the Gaia data catalog.
- A multi-temporal analysis of Sentinel-2 satellite tiles, in order to follow the evolution of the NDVI (Normalized Difference Vegetation Index) for each pixel.
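As an illustration of the second use case, the NDVI is defined as (NIR - Red) / (NIR + Red); for Sentinel-2 these are commonly bands B08 and B04. The sketch below is only an assumption of what such an analysis could look like with Xarray: the `ndvi` helper, the dates and the reflectance values are all hypothetical, and a tiny in-memory array replaces real tiles.

```python
import numpy as np
import xarray as xr


def ndvi(red, nir):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); NaN where the sum is zero."""
    total = nir + red
    # Masking the denominator first avoids dividing by zero.
    return (nir - red) / total.where(total != 0)


# Hypothetical reflectances for 3 dates on a 1 x 2 pixel tile.
times = np.array(["2020-01-01", "2020-04-01", "2020-07-01"], dtype="datetime64[ns]")
red = xr.DataArray(
    np.array([[[0.2, 0.1]], [[0.1, 0.1]], [[0.05, 0.0]]]),
    dims=("time", "y", "x"),
    coords={"time": times},
)
nir = xr.DataArray(
    np.array([[[0.2, 0.3]], [[0.3, 0.5]], [[0.45, 0.0]]]),
    dims=("time", "y", "x"),
    coords={"time": times},
)

# NDVI keeps the (time, y, x) layout, so each pixel has its own time series.
series = ndvi(red, nir)

# Spatial mean per date, ignoring masked (NaN) pixels.
mean_per_date = series.mean(dim=("y", "x"))
print(mean_per_date.values)
```

With real tiles, the two bands would be opened as chunked, Dask-backed arrays, and the same function would then be evaluated lazily and in parallel.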