From 1eddbdca4c5efe4bfccc11af29624f9689ea2997 Mon Sep 17 00:00:00 2001
From: Hai Joey Tran
Date: Thu, 26 Sep 2024 11:49:24 -0400
Subject: [PATCH] Tour of Beam markdown touchups (#32536)

* Touch up introduction md
* runners docs touchup
* add techincal clarificaiton to ptransform definition
---
 .../overview-pipeline/description.md   |  2 +-
 .../runner-concepts/description.md     |  6 ++--
 .../introduction-guide/description.md  | 35 ++++++++++++++-----
 3 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/learning/tour-of-beam/learning-content/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md b/learning/tour-of-beam/learning-content/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md
index 5144f737524f..50955741a9f0 100644
--- a/learning/tour-of-beam/learning-content/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md
+++ b/learning/tour-of-beam/learning-content/introduction/introduction-concepts/pipeline-concepts/overview-pipeline/description.md
@@ -22,7 +22,7 @@ The Beam SDKs provide several abstractions that simplify the mechanics of large-
 → `PCollection`: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.
-→ `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects.
+→ `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes zero or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects.
 {{if (eq .Sdk "go")}}
 → `Scope`: The Go SDK has an explicit scope variable used to build a `Pipeline`. A Pipeline can return it’s root scope with the `Root()` method. The scope variable is then passed to `PTransform` functions that place them in the `Pipeline` that owns the `Scope`.
diff --git a/learning/tour-of-beam/learning-content/introduction/introduction-concepts/runner-concepts/description.md b/learning/tour-of-beam/learning-content/introduction/introduction-concepts/runner-concepts/description.md
index 3989f6de6510..6eb1c04e966a 100644
--- a/learning/tour-of-beam/learning-content/introduction/introduction-concepts/runner-concepts/description.md
+++ b/learning/tour-of-beam/learning-content/introduction/introduction-concepts/runner-concepts/description.md
@@ -15,7 +15,7 @@ limitations under the License.
 Apache Beam provides a portable API layer for building sophisticated data-parallel processing `pipelines` that may be executed across a diversity of execution engines, or `runners`. The core concepts of this layer are based upon the Beam Model (formerly referred to as the Dataflow Model), and implemented to varying degrees in each Beam `runner`.
-### Direct runner
+### Direct Runner
 The Direct Runner executes pipelines on your machine and is designed to validate that pipelines adhere to the Apache Beam model as closely as possible. Instead of focusing on efficient pipeline execution, the Direct Runner performs additional checks to ensure that users do not rely on semantics that are not guaranteed by the model. Some of these checks include:
 * enforcing immutability of elements
@@ -61,9 +61,9 @@ In java, you need to set runner to `args` when you start the program.
 {{end}}
 {{if (eq .Sdk "python")}}
-In the Python SDK , the default is runner **DirectRunner**.
+In the Python SDK , the **DirectRunner** is the default runner and is used if no runner is specified.
-Additionally, you can read more about the Direct Runner [here](https://beam.apache.org/documentation/runners/direct/)
+You can read more about the **DirectRunner** [here](https://beam.apache.org/documentation/runners/direct/)
 #### Run example
diff --git a/learning/tour-of-beam/learning-content/introduction/introduction-guide/description.md b/learning/tour-of-beam/learning-content/introduction/introduction-guide/description.md
index 9b9d7a09827e..a8fb8e750683 100644
--- a/learning/tour-of-beam/learning-content/introduction/introduction-guide/description.md
+++ b/learning/tour-of-beam/learning-content/introduction/introduction-guide/description.md
@@ -11,12 +11,29 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Tour of Beam Programming Guide
-
-Welcome to a Tour Of Beam, a learning guide you can use to familiarize yourself with the Apache Beam.
-The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
-You can access the full list of modules by clicking ‘<<’ button on the left . For each module, learning progress is displayed next to it.
-Throughout the tour, you will find learning materials, examples, exercises and challenges for you to complete.
-Learning units are accompanied by code examples that you can review in the upper right pane. You can edit the code, or just run the example by clicking the ‘Run’ button. Output is displayed in the lower right pane.
-Each module also contains a challenge based on the material learned. Try to solve as many as you can, and if you need help, just click on the ‘Hint’ button or examine the correct solution by clicking the ‘Solution’ button.
-Now let’s start the tour by learning some core Beam principles.
\ No newline at end of file
+# Welcome to a Tour of Beam
+
+The Tour of Beam is a learning guide you can use to familiarize yourself with **Apache Beam**.
+
+The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles. You can access the full list of modules by clicking the ‘<<’ button on the left. For each module, learning progress is displayed next to it.
+
+Throughout the tour, you will find:
+
+- **Learning materials**
+- **Examples**
+- **Exercises**
+- **Challenges** for you to complete
+
+Learning units are accompanied by code examples that you can review in the upper right pane. You can:
+
+- **Edit the code**
+- **Run the example**
+
+After running the example, the output will be displayed in the lower right pane.
+
+Each module also contains a challenge based on the material learned. Try to solve as many as you can, and if you need help, just click on the:
+
+- **Hint** button
+- **Solution** button to examine the correct solution
+
+Now, let’s start the tour by learning some core Beam principles!
\ No newline at end of file