Pangeo Data Analysis Platform at CNES and Astronomical Use Cases
2020-11-10, 12:30–13:00, Times in UTC

Pangeo is first and foremost a scientific community, but also a Python software ecosystem and a platform that can be deployed on many infrastructures. Its goal is to make scientific research and programming easier on big datasets coming from simulations run on HPC clusters (such as climate models) or from sensors such as observation satellites.

In this demo or talk, we will see how a scientist or an engineer will be able to analyze and process huge data volumes interactively, in a few lines of code, using the software components that are at the heart of Pangeo: Jupyter, Dask and Xarray.

These main pieces of software will be presented:
- Jupyter is the main graphical interface; it advantageously replaces a terminal.
- Dask scales computations and data analysis across many nodes or virtual machines.
- Xarray gives a high-level representation of multi-dimensional scientific data.

We will also describe the main possibilities for deploying a Pangeo platform: your personal laptop, a public cloud provider or an HPC cluster.

Finally, we will demonstrate Pangeo stack usage through some concrete use cases:
- Statistical computations and visualisation of the Gaia data catalog.
- A multi-temporal analysis of Sentinel-2 satellite tiles, in order to follow the evolution of the NDVI (Normalized Difference Vegetation Index) of each pixel.
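
As an illustration of the second use case, the sketch below computes the NDVI, defined as (NIR - Red) / (NIR + Red), for every pixel and every date of a small Xarray stack, then averages it over the tile to obtain a time series. It is not the code shown in the talk: the reflectance values are synthetic, and the `ndvi` helper, the dimension names (`time`, `y`, `x`) and the dates are assumptions made for the example.

```python
import numpy as np
import xarray as xr


def ndvi(red, nir):
    """Return NDVI = (NIR - Red) / (NIR + Red), computed element-wise."""
    total = nir + red
    # Pixels where both bands are zero become NaN instead of dividing by zero.
    return ((nir - red) / total).where(total != 0)


# Synthetic reflectances standing in for the red and near-infrared bands:
# 3 acquisition dates over a 2x2 pixel tile (all values are made up).
dims = ("time", "y", "x")
times = np.array(["2020-04-01", "2020-06-01", "2020-08-01"], dtype="datetime64[ns]")
shape = (3, 2, 2)

red = xr.DataArray(np.full(shape, 0.1), dims=dims, coords={"time": times})
nir_per_date = np.array([0.3, 0.5, 0.2])[:, None, None]
nir = xr.DataArray(nir_per_date * np.ones(shape), dims=dims, coords={"time": times})

# Per-pixel NDVI for every date, then the tile-wide mean for each date.
index = ndvi(red, nir)
series = index.mean(dim=("y", "x"))
print(series.values)
```

The same expression applies unchanged to larger arrays. If Dask is installed, calling `.chunk()` on the inputs makes the arrays Dask-backed, so the computation is evaluated lazily, chunk by chunk, and can be distributed.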


Theme – Science Platforms and Data Lakes, Cloud Computing at Different Scales, Data Processing Pipelines and Science-Ready Data, Open Source Software and Community Development in Astronomy