Apache Spark

spark.apache.org

Unified engine for large-scale data analytics

Category: Data & Analytics
Tags: big-data, data-engineering, machine-learning, distributed-computing, open-source, python, spark

/ About /

Apache Spark is an open-source, multi-language analytics engine designed for large-scale data engineering, data science, and machine learning workloads. It supports Python, SQL, Scala, Java, and R, and can run on single-node machines or distributed clusters. Spark provides high-level APIs for batch processing, streaming, and ML model training.

/ How it works /

Users install PySpark or pull the official Docker image, then run data processing jobs through the DataFrame API or SQL, either on a single node or across a cluster.
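
As a rough illustration of that workflow, the sketch below runs one aggregation twice on a local, single-node session: once through the DataFrame API and once through SQL. The sample rows, the column names, the "orders" view name, and the run_example helper are all invented for this example and are not part of Spark. It assumes PySpark and a Java runtime are available on the machine.

```python
"""Minimal PySpark sketch: one aggregation via the DataFrame API and via SQL."""

# Hypothetical sample data, invented for illustration only.
SAMPLE_ROWS = [
    ("alice", "books", 12.50),
    ("bob", "books", 7.25),
    ("alice", "games", 30.00),
]
COLUMNS = ["customer", "category", "amount"]

# SQL form of the aggregation; "orders" is a hypothetical temp view name.
QUERY = "SELECT category, SUM(amount) AS total FROM orders GROUP BY category"


def _totals(df):
    """Collect a (category, total) DataFrame into a plain dict on the driver."""
    return {row["category"]: row["total"] for row in df.collect()}


def run_example():
    """Return per-category totals computed with the DataFrame API and with SQL."""
    # Imported here so this module still loads where PySpark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # "local[*]" keeps everything on this machine; a cluster deployment
    # would point the session at a different master URL.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("spark-sketch")
        .getOrCreate()
    )
    try:
        orders = spark.createDataFrame(SAMPLE_ROWS, COLUMNS)

        # DataFrame API version of the aggregation.
        by_api = orders.groupBy("category").agg(F.sum("amount").alias("total"))

        # SQL version: register the DataFrame as a temp view, then query it.
        orders.createOrReplaceTempView("orders")
        by_sql = spark.sql(QUERY)

        return _totals(by_api), _totals(by_sql)
    finally:
        # Release the local Spark resources even if the job fails.
        spark.stop()
```

Calling run_example() returns two dictionaries, one per API style, which should agree because both express the same group-and-sum over the same rows.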

/ Who it's for /

Data engineers, data scientists, and machine learning practitioners

/ More info /

Background.

Status: launched
Business model: open-source
Company: Apache Software Foundation

/ Discovered patterns /

Similar projects.

Databricks

Trino

Apache Superset