Update Job docs to include info about enabling pod-to-pod communication within a job using pod hostnames #37771

Merged
merged 39 commits into from
Nov 30, 2022
Commits
7a17b9f
Update Job docs to include info about using a headless service to ena…
danielvegamyhre Nov 7, 2022
79be50e
Change section title
danielvegamyhre Nov 7, 2022
0cfbc3b
fix phrasing
danielvegamyhre Nov 8, 2022
477fd2d
update yaml example
danielvegamyhre Nov 8, 2022
6c88a52
update label selector
danielvegamyhre Nov 16, 2022
c6e7857
more specific phrasing
danielvegamyhre Nov 16, 2022
00abb8b
address comments and add new example
danielvegamyhre Nov 20, 2022
62cba84
add note about pod dns policies
danielvegamyhre Nov 20, 2022
a48dba8
minor fixes
danielvegamyhre Nov 20, 2022
003349e
add link to job patterns
danielvegamyhre Nov 20, 2022
7c0a4a0
Update content/en/docs/tasks/job/intra-job-pod-networking-using-pod-h…
danielvegamyhre Nov 21, 2022
b987524
Update content/en/docs/tasks/job/intra-job-pod-networking-using-pod-h…
danielvegamyhre Nov 21, 2022
10458e7
Update content/en/docs/tasks/job/intra-job-pod-networking-using-pod-h…
danielvegamyhre Nov 21, 2022
0973696
Update content/en/docs/tasks/job/intra-job-pod-networking-using-pod-h…
danielvegamyhre Nov 21, 2022
6741374
Update content/en/docs/concepts/workloads/controllers/job.md
danielvegamyhre Nov 21, 2022
b77d098
address comments
danielvegamyhre Nov 21, 2022
bf4272c
clarify sentence
danielvegamyhre Nov 21, 2022
1c001eb
move minikube note to prereqs
danielvegamyhre Nov 21, 2022
ea4b322
address comments
danielvegamyhre Nov 21, 2022
a23d3ab
captitalize all instances of Job
danielvegamyhre Nov 21, 2022
fb02aa3
move minikube notes to bottom of prereqs
danielvegamyhre Nov 21, 2022
e26da91
address comments
danielvegamyhre Nov 21, 2022
59879c0
update example
danielvegamyhre Nov 21, 2022
443e5e8
fix typo
danielvegamyhre Nov 21, 2022
a97339f
update phrasing
danielvegamyhre Nov 21, 2022
2863800
link to this from the completion modes section of the job docs
danielvegamyhre Nov 21, 2022
38114e5
address phrasing comments
danielvegamyhre Nov 22, 2022
34acaeb
add newlines to break up block of text
danielvegamyhre Nov 22, 2022
340520f
update phrasing
danielvegamyhre Nov 22, 2022
7575e90
update phrasing
danielvegamyhre Nov 22, 2022
52349df
Update content/en/docs/concepts/workloads/controllers/job.md
danielvegamyhre Nov 30, 2022
fbc586d
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
5a8b396
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
4d5b1b1
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
67950c4
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
ae413dd
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
6c2f31c
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
60a65e7
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
1bd542e
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
30 changes: 17 additions & 13 deletions content/en/docs/concepts/workloads/controllers/job.md
@@ -263,7 +263,8 @@ Jobs with _fixed completion count_ - that is, jobs that have non null
 - As part of the Pod hostname, following the pattern `$(job-name)-$(index)`.
 When you use an Indexed Job in combination with a
 {{< glossary_tooltip term_id="Service" >}}, Pods within the Job can use
-the deterministic hostnames to address each other via DNS.
+the deterministic hostnames to address each other via DNS. For more information about
+how to configure this, see [Job with Pod-to-Pod Communication](/docs/tasks/job/job-with-pod-to-pod-communication/).
 - From the containerized task, in the environment variable `JOB_COMPLETION_INDEX`.

 The Job is considered complete when there is one successfully completed Pod
@@ -461,12 +462,13 @@ The tradeoffs are:
 The tradeoffs are summarized here, with columns 2 to 4 corresponding to the above tradeoffs.
 The pattern names are also links to examples and more detailed description.

-| Pattern | Single Job object | Fewer pods than work items? | Use app unmodified? |
-| ----------------------------------------- |:-----------------:|:---------------------------:|:-------------------:|
-| [Queue with Pod Per Work Item] | ✓ | | sometimes |
-| [Queue with Variable Pod Count] | ✓ | ✓ | |
-| [Indexed Job with Static Work Assignment] | ✓ | | ✓ |
-| [Job Template Expansion] | | | ✓ |
+| Pattern | Single Job object | Fewer pods than work items? | Use app unmodified? |
+| ----------------------------------------------- |:-----------------:|:---------------------------:|:-------------------:|
+| [Queue with Pod Per Work Item] | ✓ | | sometimes |
+| [Queue with Variable Pod Count] | ✓ | ✓ | |
+| [Indexed Job with Static Work Assignment] | ✓ | | ✓ |
+| [Job Template Expansion] | | | ✓ |
+| [Job with Pod-to-Pod Communication] | ✓ | sometimes | sometimes |

 When you specify completions with `.spec.completions`, each Pod created by the Job controller
 has an identical [`spec`](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status). This means that
@@ -477,17 +479,19 @@ are different ways to arrange for pods to work on different things.
 This table shows the required settings for `.spec.parallelism` and `.spec.completions` for each of the patterns.
 Here, `W` is the number of work items.

-| Pattern | `.spec.completions` | `.spec.parallelism` |
-| ----------------------------------------- |:-------------------:|:--------------------:|
-| [Queue with Pod Per Work Item] | W | any |
-| [Queue with Variable Pod Count] | null | any |
-| [Indexed Job with Static Work Assignment] | W | any |
-| [Job Template Expansion] | 1 | should be 1 |
+| Pattern | `.spec.completions` | `.spec.parallelism` |
+| ----------------------------------------------- |:-------------------:|:--------------------:|
+| [Queue with Pod Per Work Item] | W | any |
+| [Queue with Variable Pod Count] | null | any |
+| [Indexed Job with Static Work Assignment] | W | any |
+| [Job Template Expansion] | 1 | should be 1 |
+| [Job with Pod-to-Pod Communication] | W | W |

 [Queue with Pod Per Work Item]: /docs/tasks/job/coarse-parallel-processing-work-queue/
 [Queue with Variable Pod Count]: /docs/tasks/job/fine-parallel-processing-work-queue/
 [Indexed Job with Static Work Assignment]: /docs/tasks/job/indexed-parallel-processing-static/
 [Job Template Expansion]: /docs/tasks/job/parallel-processing-expansion/
+[Job with Pod-to-Pod Communication]: /docs/tasks/job/job-with-pod-to-pod-communication/
danielvegamyhre marked this conversation as resolved.

Contributor

Remove this line

Member Author

If I remove this line, the references to "Job with Pod-to-Pod Communication" will show up as string literals, rather than as links. Can you clarify why you want to remove it?

Contributor

Line 267 is already a complete link.

Member

I guess the solution is rather the opposite: remove the complete link from line 267

Contributor

Em... both are valid markdown syntax. The full link syntax is preferred over the out-of-band link syntax because we have scripts to scan bad links and that script is not good at handling the out-of-band syntax.


## Advanced usage

127 changes: 127 additions & 0 deletions content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
@@ -0,0 +1,127 @@
---
title: Job with Pod-to-Pod Communication
content_type: task
min-kubernetes-server-version: v1.21
weight: 30
---

<!-- overview -->

In this example, you will run a Job in [Indexed completion mode](/blog/2021/04/19/introducing-indexed-jobs/) configured such that
the pods created by the Job can communicate with each other using pod hostnames rather than pod IP addresses.

Pods within a Job might need to communicate among themselves. The user workload running in each pod could query the Kubernetes API server
to learn the IPs of the other Pods, but it's much simpler to rely on Kubernetes' built-in DNS resolution.

Jobs in Indexed completion mode automatically set each Pod's hostname to the format
`${jobName}-${completionIndex}`. You can use this format to deterministically build
Pod hostnames and enable Pod communication *without* needing to create a client connection to
the Kubernetes control plane to obtain Pod hostnames/IPs via API requests.

This configuration is useful for use cases where Pod networking is required but you don't want
to depend on a network connection with the Kubernetes API server.
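
As a minimal sketch of what the deterministic format allows, a container can recover its own completion index from its hostname without calling the API server. The function name `index_from_hostname` is hypothetical and only used for illustration; the same value is also exposed in the `JOB_COMPLETION_INDEX` environment variable.

```shell
# Derive the completion index from an Indexed Job Pod hostname.
# Assumes the hostname follows the <job-name>-<index> pattern.
index_from_hostname() {
  # Strip everything up to and including the last "-".
  echo "${1##*-}"
}

# Inside a Job Pod you would typically pass "$(hostname)"; a literal
# hostname is used here so the sketch runs anywhere.
index_from_hostname "example-job-2"
```

Because only the text after the last hyphen is kept, the sketch also works for Job names that contain hyphens themselves.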

## {{% heading "prerequisites" %}}

You should already be familiar with the basic use of [Job](/docs/concepts/workloads/controllers/job/).

{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}

{{<note>}}
If you are using minikube or a similar tool, you may need to take
[extra steps](https://minikube.sigs.k8s.io/docs/handbook/addons/ingress-dns/)
to ensure that DNS is available in your cluster.
{{</note>}}

<!-- steps -->

## Starting a Job with Pod-to-Pod Communication

To enable pod-to-pod communication using pod hostnames in a Job, you must do the following:

1. Set up a [headless service](/docs/concepts/services-networking/service/#headless-services)
with a valid label selector for the pods created by your Job. The headless service must be in the same namespace as
Contributor

Suggested change
-with a valid label selector for the pods created by your Job. The headless service must be in the same namespace as
+with a valid label selector for the pods created by your Job.
+The headless service must be in the same namespace as the Job.

the Job. One easy way to do this is to use the `job-name: <your-job-name>` selector, since the `job-name` label will be automatically added by Kubernetes. This configuration will trigger the DNS system to create records of the hostnames of
Contributor

Suggested change
-the Job. One easy way to do this is to use the `job-name: <your-job-name>` selector, since the `job-name` label will be automatically added by Kubernetes. This configuration will trigger the DNS system to create records of the hostnames of
+One easy way to do this is to use the `job-name: <your-job-name>` selector, since the `job-name`
+label will be automatically added by Kubernetes. This configuration will trigger the DNS system to
+create records of the hostnames of the pods running your Job.

the pods running your Job.

2. Configure the headless service as the subdomain service for the Job pods by including the following value in your Job template spec:
Comment on lines +46 to +48
Contributor

Suggested change
-the pods running your Job.
-2. Configure the headless service as subdomain service for the Job pods by including the following value in your Job template spec:
+2. Configure the headless service as subdomain service for the Job pods by including the following value
+in your Job template spec:


```yaml
subdomain: <headless-svc-name>
```
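
The two steps above imply that every peer has a predictable DNS name. The following is a rough sketch, not part of the original example, of how a workload might list those names instead of hard-coding them. `JOB_NAME`, `SUBDOMAIN`, and `COMPLETIONS` are hypothetical variables that you would have to set yourself (for example through `env` in the Pod template); Kubernetes does not provide them.

```shell
# Print one DNS name per completion index: <job-name>-<index>.<subdomain>
# All three arguments are supplied by the caller; nothing is discovered.
peer_hostnames() {
  job_name="$1"
  subdomain="$2"
  completions="$3"
  i=0
  while [ "$i" -lt "$completions" ]; do
    echo "${job_name}-${i}.${subdomain}"
    i=$((i + 1))
  done
}

# Fall back to illustrative defaults when the (hypothetical) variables are unset.
peer_hostnames "${JOB_NAME:-example-job}" "${SUBDOMAIN:-headless-svc}" "${COMPLETIONS:-3}"
```

Passing the values as arguments keeps the function independent of how the workload learns its Job name and completion count.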

### Example
Below is a working example of a Job with pod-to-pod communication via pod hostnames enabled.
The Job is completed only after all pods successfully ping each other using hostnames.

{{<note>}}
In the Bash script executed on each Pod in the example below, the Pod hostnames can also be
qualified with the namespace (for example, `example-job-0.headless-svc.<namespace>`) if the Pod
needs to be reached from outside the namespace.
{{</note>}}

```yaml
apiVersion: v1
kind: Service
metadata:
  name: headless-svc
spec:
  clusterIP: None # clusterIP must be None to create a headless service
  selector:
    job-name: example-job # must match Job name
---
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  completions: 3
  parallelism: 3
  completionMode: Indexed
  template:
    spec:
      subdomain: headless-svc # has to match Service name
      restartPolicy: Never
      containers:
      - name: example-workload
        image: bash:latest
        command:
        - bash
        - -c
        - |
          for i in 0 1 2
          do
            gotStatus="-1"
            wantStatus="0"
            # Retry until the peer's hostname resolves and answers a ping
            while [ $gotStatus -ne $wantStatus ]
            do
              ping -c 1 example-job-${i}.headless-svc > /dev/null 2>&1
              gotStatus=$?
              if [ $gotStatus -ne $wantStatus ]; then
                echo "Failed to ping pod example-job-${i}.headless-svc, retrying in 1 second..."
                sleep 1
              fi
            done
            echo "Successfully pinged pod: example-job-${i}.headless-svc"
          done
```

Member (on the `ping` line of the example above)

can be prefixed by the namespace as well if the pod needs to be reached from outside the namespace.

Member Author (danielvegamyhre, Nov 21, 2022)

Added a note about this above the example, let me know if that is what you had in mind

After applying the example above, the Pods can reach each other over the network
using names of the form `<pod-hostname>.<headless-service-name>`. You should see output similar to the following:

```shell
kubectl logs example-job-0-qws42
```

```
Failed to ping pod example-job-0.headless-svc, retrying in 1 second...
Successfully pinged pod: example-job-0.headless-svc
Successfully pinged pod: example-job-1.headless-svc
Successfully pinged pod: example-job-2.headless-svc
```
{{<note>}}
Keep in mind that the `<pod-hostname>.<headless-service-name>` name format used
in this example does not work when the Pod's DNS policy is set to `None` or `Default`.
To learn more, see [Pod's DNS Policy](/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy).
{{</note>}}
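
For Pods that need to be reached from another namespace, as mentioned in the note before the example, the short name can be expanded into a fully qualified one. The sketch below assembles it from its parts; the function name `pod_fqdn` is hypothetical, and the default cluster domain `cluster.local` is an assumption that does not hold for every cluster.

```shell
# Build the fully qualified DNS name of a Pod that has a hostname and subdomain:
#   <pod-hostname>.<subdomain>.<namespace>.svc.<cluster-domain>
# The fourth argument is optional; "cluster.local" is an assumed default.
pod_fqdn() {
  pod_hostname="$1"
  subdomain="$2"
  namespace="$3"
  cluster_domain="${4:-cluster.local}"
  echo "${pod_hostname}.${subdomain}.${namespace}.svc.${cluster_domain}"
}

# Illustrative values taken from the example Job, assuming the "default" namespace.
pod_fqdn example-job-0 headless-svc default
```

If your cluster uses a different domain, pass it as the fourth argument rather than editing the function.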