Update Job docs to include info about enabling pod-to-pod communication within a job using pod hostnames #37771

Merged
merged 39 commits into from
Nov 30, 2022
Changes from 25 commits
Commits
7a17b9f
Update Job docs to include info about using a headless service to ena…
danielvegamyhre Nov 7, 2022
79be50e
Change section title
danielvegamyhre Nov 7, 2022
0cfbc3b
fix phrasing
danielvegamyhre Nov 8, 2022
477fd2d
update yaml example
danielvegamyhre Nov 8, 2022
6c88a52
update label selector
danielvegamyhre Nov 16, 2022
c6e7857
more specific phrasing
danielvegamyhre Nov 16, 2022
00abb8b
address comments and add new example
danielvegamyhre Nov 20, 2022
62cba84
add note about pod dns policies
danielvegamyhre Nov 20, 2022
a48dba8
minor fixes
danielvegamyhre Nov 20, 2022
003349e
add link to job patterns
danielvegamyhre Nov 20, 2022
7c0a4a0
Update content/en/docs/tasks/job/intra-job-pod-networking-using-pod-h…
danielvegamyhre Nov 21, 2022
b987524
Update content/en/docs/tasks/job/intra-job-pod-networking-using-pod-h…
danielvegamyhre Nov 21, 2022
10458e7
Update content/en/docs/tasks/job/intra-job-pod-networking-using-pod-h…
danielvegamyhre Nov 21, 2022
0973696
Update content/en/docs/tasks/job/intra-job-pod-networking-using-pod-h…
danielvegamyhre Nov 21, 2022
6741374
Update content/en/docs/concepts/workloads/controllers/job.md
danielvegamyhre Nov 21, 2022
b77d098
address comments
danielvegamyhre Nov 21, 2022
bf4272c
clarify sentence
danielvegamyhre Nov 21, 2022
1c001eb
move minikube note to prereqs
danielvegamyhre Nov 21, 2022
ea4b322
address comments
danielvegamyhre Nov 21, 2022
a23d3ab
captitalize all instances of Job
danielvegamyhre Nov 21, 2022
fb02aa3
move minikube notes to bottom of prereqs
danielvegamyhre Nov 21, 2022
e26da91
address comments
danielvegamyhre Nov 21, 2022
59879c0
update example
danielvegamyhre Nov 21, 2022
443e5e8
fix typo
danielvegamyhre Nov 21, 2022
a97339f
update phrasing
danielvegamyhre Nov 21, 2022
2863800
link to this from the completion modes section of the job docs
danielvegamyhre Nov 21, 2022
38114e5
address phrasing comments
danielvegamyhre Nov 22, 2022
34acaeb
add newlines to break up block of text
danielvegamyhre Nov 22, 2022
340520f
update phrasing
danielvegamyhre Nov 22, 2022
7575e90
update phrasing
danielvegamyhre Nov 22, 2022
52349df
Update content/en/docs/concepts/workloads/controllers/job.md
danielvegamyhre Nov 30, 2022
fbc586d
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
5a8b396
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
4d5b1b1
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
67950c4
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
ae413dd
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
6c2f31c
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
60a65e7
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
1bd542e
Update content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
danielvegamyhre Nov 30, 2022
27 changes: 15 additions & 12 deletions content/en/docs/concepts/workloads/controllers/job.md
@@ -461,12 +461,13 @@ The tradeoffs are:
The tradeoffs are summarized here, with columns 2 to 4 corresponding to the above tradeoffs.
The pattern names are also links to examples and more detailed description.

-| Pattern | Single Job object | Fewer pods than work items? | Use app unmodified? |
-| ----------------------------------------- |:-----------------:|:---------------------------:|:-------------------:|
-| [Queue with Pod Per Work Item] | ✓ | | sometimes |
-| [Queue with Variable Pod Count] | ✓ | ✓ | |
-| [Indexed Job with Static Work Assignment] | ✓ | | ✓ |
-| [Job Template Expansion] | | | ✓ |
+| Pattern | Single Job object | Fewer pods than work items? | Use app unmodified? |
+| ----------------------------------------------- |:-----------------:|:---------------------------:|:-------------------:|
+| [Queue with Pod Per Work Item] | ✓ | | sometimes |
+| [Queue with Variable Pod Count] | ✓ | ✓ | |
+| [Indexed Job with Static Work Assignment] | ✓ | | ✓ |
+| [Job Template Expansion] | | | ✓ |
+| [Job with Pod-to-Pod Communication] | ✓ | sometimes | sometimes |

When you specify completions with `.spec.completions`, each Pod created by the Job controller
has an identical [`spec`](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status). This means that
@@ -477,17 +478,19 @@ are different ways to arrange for pods to work on different things.
This table shows the required settings for `.spec.parallelism` and `.spec.completions` for each of the patterns.
Here, `W` is the number of work items.

-| Pattern | `.spec.completions` | `.spec.parallelism` |
-| ----------------------------------------- |:-------------------:|:--------------------:|
-| [Queue with Pod Per Work Item] | W | any |
-| [Queue with Variable Pod Count] | null | any |
-| [Indexed Job with Static Work Assignment] | W | any |
-| [Job Template Expansion] | 1 | should be 1 |
+| Pattern | `.spec.completions` | `.spec.parallelism` |
+| ----------------------------------------------- |:-------------------:|:--------------------:|
+| [Queue with Pod Per Work Item] | W | any |
+| [Queue with Variable Pod Count] | null | any |
+| [Indexed Job with Static Work Assignment] | W | any |
+| [Job Template Expansion] | 1 | should be 1 |
+| [Job with Pod-to-Pod Communication] | W | W |

[Queue with Pod Per Work Item]: /docs/tasks/job/coarse-parallel-processing-work-queue/
[Queue with Variable Pod Count]: /docs/tasks/job/fine-parallel-processing-work-queue/
[Indexed Job with Static Work Assignment]: /docs/tasks/job/indexed-parallel-processing-static/
[Job Template Expansion]: /docs/tasks/job/parallel-processing-expansion/
+[Job with Pod-to-Pod Communication]: /docs/tasks/job/job-with-pod-to-pod-communication/
danielvegamyhre marked this conversation as resolved.

Contributor:
Remove this line

Member Author:
If I remove this line, the references to "Job with Pod-to-Pod Communication" will show up as string literals, rather than as links. Can you clarify why you want to remove it?

Contributor:
Line 267 is already a complete link.

Member:
I guess the solution is rather the opposite: remove the complete link from line 267

Contributor:
Em... both are valid markdown syntax. The full link syntax is preferred over the out-of-band link syntax because we have scripts to scan bad links and that script is not good at handling the out-of-band syntax.


## Advanced usage

111 changes: 111 additions & 0 deletions content/en/docs/tasks/job/job-with-pod-to-pod-communication.md
@@ -0,0 +1,111 @@
---
title: Job with Pod-to-Pod Communication
content_type: task
min-kubernetes-server-version: v1.21
weight: 30
---

<!-- overview -->

In this example, we will run a Job in [Indexed completion mode](https://kubernetes.io/blog/2021/04/19/introducing-indexed-jobs/) configured such that
the pods created by the Job can communicate with each other using pod hostnames rather than pod IPs.

Pods within a Job might need to communicate among themselves. They could query the Kubernetes API
to learn the IPs of the other Pods, but it's much simpler to rely on Kubernetes' built-in DNS resolution.
Jobs in Indexed completion mode automatically set the pods' hostname to be in the format of
`${jobName}-${completionIndex}`. Because this format is deterministic, a pod can work out the
hostnames of its peers and communicate with them *without* needing to create a client connection to
the Kubernetes control plane to obtain pod hostnames/IPs via API requests. This can be useful
for use cases where pod networking is required but we don't want to depend on a network
connection with the Kubernetes API server.
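
As a minimal sketch of the hostname derivation described above, the following shell function prints the names a pod could use to reach its peers. The function name `peer_hostnames` is hypothetical, and the Job name, Service name, and completion count are placeholder values; the `<hostname>.<service>` form assumes a headless Service and `subdomain` are configured as described in the steps later on this page.

```shell
#!/bin/sh
# Sketch: derive peer hostnames for an Indexed Job without calling the Kubernetes API.
# Assumption: the Job name, headless Service name, and completions count are
# known to the workload ahead of time (the values below are placeholders).

# peer_hostnames JOB_NAME SERVICE_NAME COMPLETIONS
# Prints one "<jobName>-<completionIndex>.<serviceName>" name per completion index.
# The body runs in a subshell so its variables do not leak into the caller.
peer_hostnames() (
  job_name="$1"
  service_name="$2"
  completions="$3"
  i=0
  while [ "$i" -lt "$completions" ]; do
    echo "${job_name}-${i}.${service_name}"
    i=$((i + 1))
  done
)

peer_hostnames example-job headless-svc 3
```

Inside a pod of an Indexed Job, the pod's own index is also available in the `JOB_COMPLETION_INDEX` environment variable, which could be used to skip the pod's own hostname when iterating over peers.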

## {{% heading "prerequisites" %}}

You should already be familiar with the basic use of [Job](/docs/concepts/workloads/controllers/job/).

{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}

{{<note>}} If you are using minikube or a similar tool, you may need to take [extra steps](https://minikube.sigs.k8s.io/docs/handbook/addons/ingress-dns/) to ensure you have DNS. {{</note>}}

<!-- steps -->

## Starting a Job with Pod-to-Pod Communication

To enable pod-to-pod communication using pod hostnames in a Job, you must do the following:

1. Set up a [headless service](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services)
   with a valid label selector for the pods created by your Job. The headless service must be
   in the same namespace as the Job. This causes the cluster DNS to create records for the
   hostnames of the pods running your Job. One easy way to do this is to use the
   `job-name: <your-job-name>` selector, since the `job-name` label will be automatically added by Kubernetes.

2. Include the following in your Job template spec: `subdomain: <headless-svc-name>`,
   where `<headless-svc-name>` must match the name of your headless service
   exactly.

### Example
Below is a working example of a Job with pod-to-pod communication via pod hostnames enabled.
The Job completes only after all pods successfully ping each other using hostnames.

{{<note>}} In the Bash script executed on each pod in the example below, the pod hostnames can be prefixed by the namespace as well
if the pod needs to be reached from outside the namespace. {{</note>}}

```yaml
apiVersion: v1
kind: Service
metadata:
  name: headless-svc
spec:
  clusterIP: None # clusterIP must be None to create a headless service
  selector:
    job-name: example-job # must match Job name
---
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  completions: 3
  parallelism: 3
  completionMode: Indexed
  template:
    spec:
      subdomain: headless-svc # has to match Service name
      restartPolicy: Never
      containers:
      - name: example-workload
        image: bash:latest
        command:
        - bash
        - -c
        - |
          for i in 0 1 2
          do
            gotStatus="-1"
            wantStatus="0"
            while [ $gotStatus -ne $wantStatus ]
            do
              ping -c 1 example-job-${i}.headless-svc > /dev/null 2>&1
              gotStatus=$?
              if [ $gotStatus -ne $wantStatus ]; then
                echo "Failed to ping pod example-job-${i}.headless-svc, retrying in 1 second..."
                sleep 1
              fi
            done
            echo "Successfully pinged pod: example-job-${i}.headless-svc"
          done
```

Member (on the `ping` line):
can be prefixed by the namespace as well if the pod needs to be reached from outside the namespace.

Member Author (danielvegamyhre, Nov 21, 2022):
Added a note about this above the example, let me know if that is what you had in mind

After applying the example above, pods will be able to reach each other over the network
using names of the form `<pod-hostname>.<headless-service-name>`. You should see output similar to the following:
```
$ kubectl logs example-job-0-qws42
Failed to ping pod example-job-0.headless-svc, retrying in 1 second...
Successfully pinged pod: example-job-0.headless-svc
Successfully pinged pod: example-job-1.headless-svc
Successfully pinged pod: example-job-2.headless-svc
```
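
The short `<pod-hostname>.<headless-service-name>` form is resolved relative to the namespace of the pod doing the lookup. When a pod has to be reached from another namespace, the name generally needs to carry the namespace as well. The sketch below builds the fully qualified form; the function name `pod_fqdn` is hypothetical, the `default` namespace is a placeholder, and the cluster domain `cluster.local` is an assumption that may differ in your cluster.

```shell
#!/bin/sh
# Sketch: build a fully qualified DNS name for a pod behind a headless Service.
# Assumptions: the namespace is a placeholder and "cluster.local" is assumed
# as the cluster domain when none is given.

# pod_fqdn POD_HOSTNAME SERVICE_NAME NAMESPACE [CLUSTER_DOMAIN]
# Prints "<pod-hostname>.<service>.<namespace>.svc.<cluster-domain>".
pod_fqdn() (
  cluster_domain="${4:-cluster.local}"
  echo "$1.$2.$3.svc.${cluster_domain}"
)

pod_fqdn example-job-0 headless-svc default
```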
{{<note>}} It is important to note that the `<pod-hostname>.<headless-service-name>` name format used
in this example would not work with DNS policy set to `None` or `Default`. You can learn more about pod
DNS policies [here](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy). {{</note>}}