Tour of Beam markdown touchups (#32536)
* Touch up introduction md

* runners docs touchup

* add technical clarification to ptransform definition
hjtran authored Sep 26, 2024
1 parent 11318ae commit 1eddbdc
Showing 3 changed files with 30 additions and 13 deletions.
@@ -22,7 +22,7 @@ The Beam SDKs provide several abstractions that simplify the mechanics of large-

`PCollection`: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.

- `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects.
+ `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes zero or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects.
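
The "zero or more" wording matters because a step that creates data has no input collection at all, while a step that merges data has several. The following is a toy model in plain Python, not the Beam API: the names `toy_create`, `toy_map`, `toy_flatten`, and `toy_write` are hypothetical and exist only to illustrate the possible input and output counts.

```python
# Toy model only: a "collection" here is an immutable tuple of elements.
# None of these functions belong to Apache Beam; they are illustrative.

def toy_create(values):
    """Root step: zero input collections, one output collection."""
    return tuple(values)

def toy_map(fn, pcoll):
    """One input collection, one output collection."""
    return tuple(fn(x) for x in pcoll)

def toy_flatten(*pcolls):
    """Several input collections, one output collection."""
    return tuple(x for pc in pcolls for x in pc)

def toy_write(pcoll, sink):
    """One input collection, zero output collections (side effect only)."""
    sink.extend(pcoll)
    return None

def run_toy_pipeline():
    words = toy_create(["a", "b"])      # 0 inputs -> 1 output
    more = toy_create(["c"])            # 0 inputs -> 1 output
    merged = toy_flatten(words, more)   # 2 inputs -> 1 output
    upper = toy_map(str.upper, merged)  # 1 input  -> 1 output
    sink = []
    toy_write(upper, sink)              # 1 input  -> 0 outputs
    return sink

print(run_toy_pipeline())
```

Under the earlier "one or more" wording, a root step such as `toy_create` would not have counted as a transform, which is presumably what the clarification addresses.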
{{if (eq .Sdk "go")}}

`Scope`: The Go SDK has an explicit scope variable used to build a `Pipeline`. A Pipeline can return its root scope with the `Root()` method. The scope variable is then passed to `PTransform` functions that place them in the `Pipeline` that owns the `Scope`.
@@ -15,7 +15,7 @@ limitations under the License.

Apache Beam provides a portable API layer for building sophisticated data-parallel processing `pipelines` that may be executed across a diversity of execution engines, or `runners`. The core concepts of this layer are based upon the Beam Model (formerly referred to as the Dataflow Model), and implemented to varying degrees in each Beam `runner`.

- ### Direct runner
+ ### Direct Runner
The Direct Runner executes pipelines on your machine and is designed to validate that pipelines adhere to the Apache Beam model as closely as possible. Instead of focusing on efficient pipeline execution, the Direct Runner performs additional checks to ensure that users do not rely on semantics that are not guaranteed by the model. Some of these checks include:

* enforcing immutability of elements
@@ -61,9 +61,9 @@ In java, you need to set runner to `args` when you start the program.
{{end}}

{{if (eq .Sdk "python")}}
- In the Python SDK , the default is runner **DirectRunner**.
+ In the Python SDK, the **DirectRunner** is the default runner and is used if no runner is specified.

- Additionally, you can read more about the Direct Runner [here](https://beam.apache.org/documentation/runners/direct/)
+ You can read more about the **DirectRunner** [here](https://beam.apache.org/documentation/runners/direct/)
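
The fallback described above ("used if no runner is specified") can be pictured with a small standard-library sketch. This is not Beam's implementation: `resolve_runner` is a hypothetical helper, and the only assumption carried over from Beam is that the runner is chosen with a `--runner` command-line flag.

```python
import argparse

# Assumed default, mirroring the text: DirectRunner when nothing is specified.
DEFAULT_RUNNER = "DirectRunner"

def resolve_runner(argv):
    """Return the runner named by --runner, or the default if the flag is absent."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--runner", default=DEFAULT_RUNNER)
    # parse_known_args ignores unrelated pipeline flags instead of erroring out.
    known, _unknown = parser.parse_known_args(argv)
    return known.runner

print(resolve_runner([]))                         # no flag given: falls back to the default
print(resolve_runner(["--runner=FlinkRunner"]))   # explicit flag wins
```

The point of the sketch is only the precedence: an explicit flag overrides the default, and omitting the flag selects the local Direct Runner.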

#### Run example

@@ -11,12 +11,29 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
- # Tour of Beam Programming Guide
-
- Welcome to a Tour Of Beam, a learning guide you can use to familiarize yourself with the Apache Beam.
- The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles.
- You can access the full list of modules by clicking ‘<<’ button on the left . For each module, learning progress is displayed next to it.
- Throughout the tour, you will find learning materials, examples, exercises and challenges for you to complete.
- Learning units are accompanied by code examples that you can review in the upper right pane. You can edit the code, or just run the example by clicking the ‘Run’ button. Output is displayed in the lower right pane.
- Each module also contains a challenge based on the material learned. Try to solve as many as you can, and if you need help, just click on the ‘Hint’ button or examine the correct solution by clicking the ‘Solution’ button.
- Now let’s start the tour by learning some core Beam principles.
+ # Welcome to a Tour of Beam
+
+ The Tour of Beam is a learning guide you can use to familiarize yourself with **Apache Beam**.
+
+ The tour is divided into a list of modules that contain learning units covering various Apache Beam features and principles. You can access the full list of modules by clicking the ‘<<’ button on the left. For each module, learning progress is displayed next to it.
+
+ Throughout the tour, you will find:
+
+ - **Learning materials**
+ - **Examples**
+ - **Exercises**
+ - **Challenges** for you to complete
+
+ Learning units are accompanied by code examples that you can review in the upper right pane. You can:
+
+ - **Edit the code**
+ - **Run the example**
+
+ After running the example, the output will be displayed in the lower right pane.
+
+ Each module also contains a challenge based on the material learned. Try to solve as many as you can, and if you need help, just click on the:
+
+ - **Hint** button
+ - **Solution** button to examine the correct solution
+
+ Now, let’s start the tour by learning some core Beam principles!
