spark
https://github.com/apache/spark
Scala
Apache Spark - A unified analytics engine for large-scale data processing
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Scala not yet supported
76 Subscribers
Help out
- Issues
  - [SPARK-55320][SQL][CONNECT] Use raise_error instead of divide by zero in Observation tests
  - [SPARK-55321][PYTHON][TESTS] Ignore null difference when comparing ps df/series
  - [WIP][SPARK-53928][SQL] Enhance DSV2 partition filtering using catalyst expression
  - [SPARK-28098][SQL] Supports reading hive tables when there are subdirectories under the table or partition location.
  - Avoid traversing nested folders if recursiveFileLookup = false
  - [SPARK-53890][SDP] Test (and fix) read/readstream options are respected for pipelines
  - [SQL][MINOR] Defensive code for number of Partitions in rCTEs
  - [DOCS] Fix Missing Documentation for SparkSession in Declarative Pipelines (Python)
  - [SPARK-27853][SQL] Enable custom partitioning logic for Dataset via Partitioner
  - [WIP][PYTHON] Zero Copy Pandas UDF
- Docs
  - Scala not yet supported