Overview

Datasets Server automatically converts public datasets smaller than 5GB on the Hub to Parquet files and publishes them. Parquet is a column-based format that is well suited to working with large datasets. You can work with the published Parquet files using several libraries:

  • ClickHouse, a column-oriented database management system for online analytical processing

  • DuckDB, a high-performance SQL database for analytical queries

  • Pandas, a data analysis tool for working with data structures

  • Polars, a Rust-based DataFrame library
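
Before using any of these libraries, you need the URLs of the published Parquet files. The following is a minimal sketch of one way to collect them, not a definitive implementation. It assumes the Datasets Server API is reachable at `https://datasets-server.huggingface.co` and that its `/parquet` endpoint returns JSON with a `parquet_files` list whose entries carry `split` and `url` fields; none of this is stated on this page. The helper names and the sample payload are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Assumption: base URL of the Datasets Server API (not given on this page).
API_BASE = "https://datasets-server.huggingface.co"


def parquet_listing_url(dataset: str) -> str:
    """Build the URL assumed to list the Parquet files of a dataset."""
    return f"{API_BASE}/parquet?{urlencode({'dataset': dataset})}"


def parquet_urls(payload: dict, split: str | None = None) -> list[str]:
    """Collect file URLs from a payload shaped like {"parquet_files": [...]}.

    If `split` is given, keep only the files that belong to that split.
    """
    files = payload.get("parquet_files", [])
    return [f["url"] for f in files if split is None or f.get("split") == split]


# Hypothetical payload for illustration only; a real response would come
# from fetching parquet_listing_url(...) and decoding the JSON body.
sample = {
    "parquet_files": [
        {"split": "train", "url": "https://example.com/train/0000.parquet"},
        {"split": "test", "url": "https://example.com/test/0000.parquet"},
    ]
}

train_files = parquet_urls(sample, split="train")
```

Each collected URL can then be handed to the library of your choice, for example `pandas.read_parquet(url)`, provided a Parquet engine such as `pyarrow` is installed.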
