beam
https://github.com/apache/beam
Java
Apache Beam is a unified programming model for Batch and Streaming data processing.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Help out
- Issues
  - Read TFRecord Files from hdfs will meet exception if file size is large
  - WindmillStateCache grossly misunderestimates object size in its weighting function
  - Support Fanout in Apache BEAM SQL extension
  - Document optional kwargs argument in Partition Ptransform
  - Create IO tests for synthetic sources
  - Synthetic unbounded source loses (duplicates?) data while splitting
  - CombineGlobally translation is risky and not very performant.
  - ExecutableStage should be able to accept multiple input PCollection
  - Intermittent empty accumulator values in extractOutput of Combine.perKey on Dataflow
  - Retry createJob requests in Dataflow Runner for retriable errors.
- Docs
  - Java not yet supported