Francesc Alted – Mastering Large NDArray Handling with Blosc2 and Caterva2 | PyData Global 2024

www.pydata.org

As data grows larger and more complex, efficient storage and processing become critical to achieving scalable and high-performance computing. Blosc2 (https://www.blosc.org), a powerful meta-compressor library, addresses these challenges by enabling rapid compression and decompression of large, multidimensional arrays (NDArrays). This tutorial will introduce the core concepts of working with Blosc2, focusing on how it can be leveraged to optimize both storage and computational performance in Python.

Attendees will learn how to:

- Efficiently create and manage large NDArrays, including options for persistence.
- Select the best codecs and filters for specific data types and workflows to achieve optimal compression ratios and performance.
- Perform computations directly on compressed data to save memory and speed up processing.
- Seamlessly share NDArrays using Caterva2, a versatile library designed to enable remote sharing and serving of multidimensional datasets.

This tutorial is ideal for Python developers working with large-scale data in scientific computing, machine learning, and other data-intensive fields.

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
