Starter kit for apache beam

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines.

Beam’s supported distributed processing back-ends, which include Apache Flink, Apache Spark, and Google Cloud Dataflow.

It is Data Processing platform mainly used fro analytics or for ETL which is execution platform agnostics , data agnostics , and also support programming languages like python , java , go , scala also apache beam support cross language pipelines

Note version compatibility java is jdk 8 for python is 3.6, 3.7, and 3.8, 2.7 for Go version 1.10

Below is basic data pipeline for apache beam

Now lets understand some terminology for the apache beam

1) Pipeline

  • A pipeline is a graph of transformations that a user constructs that defines the data processing they want to do.

inputs = [0, 1, 2, 3]

# Create a pipeline.

with beam.Pipeline() as pipeline:

# Feed it some input elements with `Create`.

outputs = pipeline| ‘Create initial values’ >> beam.Create(inputs)

2) PCollection -

  • A PCollection is immutable

3) PTransforms -

  • Transforms are the operations in your pipeline, and provide a generic processing framework

inputs = [0, 1, 2, 3]

with beam.Pipeline() as pipeline:

outputs = (pipeline| ‘Create values’ >> beam.Create(inputs)| ‘Multiply by 2’ >> beam.Map(lambda x: x * 2))

outputs | beam.Map(print)

  • FlatMap

import apache_beam as beam

with beam.Pipeline() as pipeline:

plants = (pipeline | ‘Gardening plants’ >> beam.Create([‘🍓Strawberry 🥕Carrot 🍆Eggplant’,‘🍅Tomato 🥔Potato’])| ‘Split words’ >> beam.FlatMap(str.split)| beam.Map(print))

  • ParDo

def process(self, element):

return [len(element)]

# Apply a ParDo to the PCollection “words” to compute lengths for each word.

word_lengths = words | beam.ParDo(ComputeWordLengthFn())



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store