spark
https://github.com/apache/spark
Scala
Apache Spark - A unified analytics engine for large-scale data processing
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Scala not yet supported
75 Subscribers
Help out
- Issues
  - Add recursive read for partition
  - [SPARK-56918][CORE] Add ManagedConsumer SPI for shrinkable external storage memory
  - [SPARK-56894][SQL] Add vectorized Parquet BYTE_STREAM_SPLIT reader
  - [SPARK-56897][SQL] Reduce per-value allocations in DELTA_BYTE_ARRAY Parquet decoder
  - [SPARK-56898][SQL] Rewrite COUNT(DISTINCT IF) to COUNT(DISTINCT) FILTER for Expand reduction
  - [SPARK-56907][SQL] Reduce per-value allocation in DELTA_LENGTH_BYTE_ARRAY Parquet vectorized reader
  - fix issue MapType as a type hint for schema 55900
  - Pyspark: `from_json` is missing `MapType` as a type hint for `schema` argument
  - [SPARK-44734][PYTHON][DOCS] Expand PySpark type conversion guide
  - [WIP][CONNECT] Support directory uploads in Spark Connect copyFromLocalToFs
- Docs
  - Scala not yet supported