beam
https://github.com/apache/beam
Java
Apache Beam is a unified programming model for Batch and Streaming data processing.
Triage Issues!
When you volunteer to triage issues, you'll receive an email each day with a link to an open issue that needs help in this project. You'll also receive instructions on how to triage issues.
Triage Docs!
Receive a documented method or class from your favorite GitHub repos in your inbox every day. If you're really pro, receive undocumented methods or classes and supercharge your commit history.
Java not yet supported
56 Subscribers
Help out
- Issues
  - Make Python side input tags always key, value pairs instead of depending in index suffixed tag names
  - Autodetect Avro schema from Avro file
  - Apache Beam/Dataflow flowed a CalledProcessError with beam.Pipeline("DataflowRunner", options=opts)
  - Use FlinkRunner instead of PortableRunner for load tests
  - Ensure that the environment is propagated through from ExpansionService to Dataflow
  - Reading from pubsub in portable FlinkRunner (ambigious ReadFromPubSub transform)
  - Too many shards in GCS
  - standardize docker and docker-compose usage in ITs
  - "Socket closed" Spurious GRPC errors in Flink/Spark runner log output
  - Pipeline creation with large number of shards/streams takes long time
- Docs
  - Java not yet supported