docs(flow): clean up (#5255)
alexcg1 authored Oct 10, 2022
1 parent 573f607 commit bcf17c3
Showing 8 changed files with 271 additions and 293 deletions.
120 changes: 60 additions & 60 deletions docs/fundamentals/flow/add-executors.md

Large diffs are not rendered by default.

43 changes: 21 additions & 22 deletions docs/fundamentals/flow/create-flow.md
@@ -1,8 +1,8 @@
(flow)=
# Basic
# Basics


{class}`~jina.Flow` defines how your Executors are connected together and how your data *flows* through them.
A {class}`~jina.Flow` defines how your Executors are connected together and how your data *flows* through them.


## Create
@@ -32,14 +32,14 @@ An empty Flow contains only {ref}`the Gateway<flow>`.
:scale: 70%
```

For production, it is recommended to define the Flows with YAML. This is because YAML files are independent of Python logic code and easy to maintain.
For production, you should define your Flows with YAML. This is because YAML files are independent of the Python logic code and easier to maintain.


### Conversion between Python and YAML

Python Flow definition can be easily converted to/from YAML definition.
A Python Flow definition can be easily converted to/from a YAML definition.

To load a Flow from a YAML file, use the {meth}`~jina.Flow.load_config`:
To load a Flow from a YAML file, use {meth}`~jina.Flow.load_config`:

```python
from jina import Flow
@@ -61,12 +61,12 @@ f.save_config('flow.yml')

When a {class}`~jina.Flow` starts, all its {ref}`added Executors <flow-add-executors>` will start as well, making it possible to {ref}`reach the service through its API <access-flow-api>`.

There are three ways to start a Flow. Depending on the use case, you can start a Flow either in Python, or from a YAML file, or from the terminal.
There are three ways to start a Flow: in Python, from a YAML file, or from the terminal.

- Generally in Python: use Flow as a context manager in Python.
- As an entrypoint from terminal: use Jina CLI and a Flow YAML.
- As an entrypoint from the terminal: use the {ref}`Jina CLI <cli>` and a Flow YAML file.
- As an entrypoint from Python code: use Flow as a context manager inside `if __name__ == '__main__'`.
- No context manager: manually call {meth}`~jina.Flow.start` and {meth}`~jina.Flow.close`.
- No context manager: manually call {meth}`~jina.Flow.start` and {meth}`~jina.Flow.close`.
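
As a sketch of the last option, you can manage the Flow's lifetime by hand; the trivial Flow below is an illustrative assumption:

```python
from jina import Flow

f = Flow().add()  # a minimal Flow, for illustration only

f.start()  # start the Gateway and all Executors
try:
    ...  # interact with the Flow, e.g. through a Client
finally:
    f.close()  # always release processes and ports when done
```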


````{tab} General in Python
@@ -119,14 +119,14 @@ A successful start of a Flow looks like this:
:scale: 70%
```

Your addresses and entrypoints can be found in the output. When enabling more features such as monitoring, HTTP gateway, TLS encryption, this display will also expand to contain more information.
Your addresses and entrypoints can be found in the output. When you enable more features such as monitoring, an HTTP gateway, or TLS encryption, this display expands to contain more information.


### Set multiprocessing `spawn`

Some cornet cases require to force `spawn` start method for multiprocessing, e.g. if you encounter "Cannot re-initialize CUDA in forked subprocess".
Some corner cases require forcing a `spawn` start method for multiprocessing, for example if you encounter "Cannot re-initialize CUDA in forked subprocess".

You may try `JINA_MP_START_METHOD=spawn` before starting the Python script to enable this.
You can use `JINA_MP_START_METHOD=spawn` before starting the Python script to enable this.

```bash
JINA_MP_START_METHOD=spawn python app.py
@@ -139,8 +139,7 @@ There's no need to set this for Windows, as it only supports spawn method for mu
## Serve forever

In most scenarios, a Flow should remain reachable for prolonged periods of time.
This can be achieved by `jina flow --uses flow.yml` from terminal.

This can be achieved by running `jina flow --uses flow.yml` from the terminal.

Or if you are serving a Flow from Python:

@@ -153,7 +152,7 @@ with f:
f.block()
```

The `.block()` method blocks the execution of the current thread or process, which enables external clients to access the Flow.
The `.block()` method blocks the execution of the current thread or process, enabling external clients to access the Flow.

In this case, the Flow can be stopped by interrupting the thread or process.
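
`f.block()` can also be unblocked programmatically by setting an event, as the `e.set()` line in the diff context below suggests. A minimal sketch, assuming {meth}`~jina.Flow.block` accepts a `stop_event` argument:

```python
import threading

from jina import Flow


def serve(f: Flow, stop_event: threading.Event):
    with f:
        f.block(stop_event=stop_event)  # returns once stop_event is set


e = threading.Event()
f = Flow().add()

t = threading.Thread(target=serve, args=(f, e))
t.start()

# ... send requests to the Flow here ...

e.set()  # set event and stop (unblock) the Flow
t.join()
```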

@@ -186,7 +185,7 @@ e.set() # set event and stop (unblock) the Flow

## Visualize

A {class}`~jina.Flow` has a built-in `.plot()` function which can be used to visualize a `Flow`:
A {class}`~jina.Flow` has a built-in `.plot()` function which can be used to visualize the `Flow`:
```python
from jina import Flow

@@ -210,13 +209,13 @@ f.plot('flow-2.svg')
:width: 70%
```

One can also do it in the terminal via:
You can also do it in the terminal:

```bash
jina export flowchart flow.yml flow.svg
```

One can also visualize a remote Flow by passing the URL to `jina export flowchart`.
You can also visualize a remote Flow by passing the URL to `jina export flowchart`.

## Export

@@ -230,7 +229,7 @@ f = Flow().add()
f.to_docker_compose_yaml()
```

One can also do it in the terminal via:
You can also do it in the terminal:

```shell
jina export docker-compose flow.yml docker-compose.yml
@@ -250,16 +249,16 @@ f = Flow().add()
f.to_kubernetes_yaml('flow_k8s_configuration')
```

One can also do it in the terminal via:
You can also do it in the terminal:

```shell
jina export kubernetes flow.yml ./my-k8s
```

This will generate the necessary Kubernetes configuration files for all the {class}`~jina.Executor`s of the Flow.
This generates the Kubernetes configuration files for all the {class}`~jina.Executor`s in the Flow.
The generated folder can be used directly with `kubectl` to deploy the Flow to an existing Kubernetes cluster.

For an advance utilisation of Kubernetes with jina please refer to this {ref}`How to <kubernetes>`
For advanced usage of Kubernetes with Jina, please refer to the {ref}`How to <kubernetes>`.


```{tip}
@@ -270,7 +269,7 @@ If you do not wish to rebuild the image, set the environment variable `JINA_HUB_

```{admonition} See also
:class: seealso
For more in-depth guides on Flow deployment, take a look at our how-tos for {ref}`Docker compose <docker-compose>` and
For more in-depth guides on Flow deployment, check our how-tos for {ref}`Docker compose <docker-compose>` and
{ref}`Kubernetes <kubernetes>`.
```

77 changes: 38 additions & 39 deletions docs/fundamentals/flow/health-check.md
@@ -1,23 +1,23 @@
# Readiness & health check
A Jina {class}`~jina.Flow` consists of {ref}`a Gateway and Executors<architecture-overview>`,
each of which have to be healthy before the Flow is ready to receive requests.
all of which have to be healthy before the Flow is ready to receive requests.

A Flow is marked as "ready" when all its Executors and its Gateway are fully loaded and ready.

Each Executor provides a health check in the form of a [standardized gRPC endpoint](https://github.com/grpc/grpc/blob/master/doc/health-checking.md) that exposes this information to the outside world.
This means that health checks can automatically be performed by Jina itself as well as external tools like Docker Compose, Kubernetes service meshes, or load balancers.
This means health checks can be automatically performed by Jina itself, as well as external tools like Docker Compose, Kubernetes service meshes, or load balancers.


## Readiness of a Flow
## Flow readiness

In most cases, it is most useful to check if an entire Flow is ready to accept requests.
In most cases, it is useful to check if an entire Flow is ready to accept requests.
To enable this readiness check, the Jina Gateway can aggregate health check information from all services and provide
a readiness check endpoint for the complete Flow.


<!-- start flow-ready -->

{class}`~jina.Client` offer a convenient API to query these readiness endpoints. You can call {meth}`~jina.clients.mixin.HealthCheckMixin.is_flow_ready` or {meth}`~jina.Flow.is_flow_ready`, it will return `True` if the Flow is ready, and `False` when it is not.
{class}`~jina.Client` offers an API to query these readiness endpoints. You can call {meth}`~jina.clients.mixin.HealthCheckMixin.is_flow_ready` or {meth}`~jina.Flow.is_flow_ready`. It returns `True` if the Flow is ready, and `False` if it is not.
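
For example, a minimal sketch using the Client (the port is an assumption):

```python
from jina import Client

# assumes a Flow is already serving on this port
client = Client(port=12345)

print(client.is_flow_ready())  # True if the Flow is ready, False otherwise
```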

````{tab} via Flow
```python
@@ -115,7 +115,7 @@ WARNI… JINA@92986 message lost 100% (3/3)

### Flow status using third-party clients

You can check the status of a Flow using any gRPC/HTTP/Websocket client, not just Jina's Client implementation.
You can check the status of a Flow using any gRPC/HTTP/WebSockets client, not just Jina's Client implementation.

To see how this works, first instantiate the Flow with its corresponding protocol and block it for serving:
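
A minimal sketch of such a Flow (the gRPC protocol, ports, and single Executor are illustrative assumptions):

```python
from jina import Flow

f = Flow(protocol='grpc', port=12345).add(port=12346)

with f:
    f.block()  # serve until interrupted
```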

@@ -149,7 +149,7 @@ DEBUG Flow@19059 2 Deployments (i.e. 2 Pods) are running in this Flow

#### Using gRPC

When using grpc, you can use [grpcurl](https://github.com/fullstorydev/grpcurl) to hit the Gateway's gRPC service that is responsible for reporting the Flow status.
When using gRPC, use [grpcurl](https://github.com/fullstorydev/grpcurl) to access the Gateway's gRPC service that is responsible for reporting the Flow status.

```shell
docker pull fullstorydev/grpcurl:latest
@@ -166,7 +166,7 @@ You can simulate an Executor going offline by killing its process.
kill -9 $EXECUTOR_PID # in this case we can see in the logs that it is 19059
```

Then by doing the same check, you will see that it returns an error:
Then by doing the same check, you can see that it returns an error:

```shell
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 jina.JinaGatewayDryRunRPC/dry_run
@@ -209,40 +209,39 @@ docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 jina
````


#### Using HTTP or Websocket
#### Using HTTP or WebSockets

When using HTTP or Websocket as the Gateway protocol, you can use curl to target the `/dry_run` endpoint and get the status of the Flow.
When using HTTP or WebSockets as the Gateway protocol, use curl to target the `/dry_run` endpoint and get the status of the Flow.


```shell
curl http://localhost:12345/dry_run
```
The error-free output below signifies a correctly running Flow:
Error-free output signifies a correctly running Flow:
```json
{"code":0,"description":"","exception":null}
```

You can simulate an Executor going offline by killing its process.
You can simulate an Executor going offline by killing its process:

```shell script
kill -9 $EXECUTOR_PID # in this case we can see in the logs that it is 19059
```

Then by doing the same check, you will see that the call returns an error:
Then by doing the same check, you can see that the call returns an error:

```json
{"code":1,"description":"failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down.","exception":{"name":"InternalNetworkError","args":["failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down."],"stacks":["Traceback (most recent call last):\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 726, in task_wrapper\n timeout=timeout,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 241, in send_requests\n await call_result,\n"," File \"/home/joan/.local/lib/python3.7/site-packages/grpc/aio/_call.py\", line 291, in __await__\n self._cython_call._status)\n","grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"failed to connect to all addresses\"\n\tdebug_error_string = \"{\"created\":\"@1654074272.702044542\",\"description\":\"Failed to pick subchannel\",\"file\":\"src/core/ext/filters/client_channel/client_channel.cc\",\"file_line\":3134,\"referenced_errors\":[{\"created\":\"@1654074272.702043378\",\"description\":\"failed to connect to all addresses\",\"file\":\"src/core/lib/transport/error_utils.cc\",\"file_line\":163,\"grpc_status\":14}]}\"\n>\n","\nDuring handling of the above exception, another exception occurred:\n\n","Traceback (most recent call last):\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/http/app.py\", line 142, in _flow_health\n data_type=DataInputType.DOCUMENT,\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/http/app.py\", line 399, in _get_singleton_result\n async for k in streamer.stream(request_iterator=request_iterator):\n"," File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 78, in stream\n async for response in async_iter:\n"," File \"/home/joan/jina/jina/jina/serve/stream/__init__.py\", line 154, in _stream_requests\n response = self._result_handler(future.result())\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/request_handling.py\", line 148, in _process_results_at_end_gateway\n partial_responses = await asyncio.gather(*tasks)\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 128, in _wait_previous_and_send\n self._handle_internalnetworkerror(err)\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 70, in _handle_internalnetworkerror\n raise err\n"," File \"/home/joan/jina/jina/jina/serve/runtimes/gateway/graph/topology_graph.py\", line 125, in _wait_previous_and_send\n timeout=self._timeout_send,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 734, in task_wrapper\n num_retries=num_retries,\n"," File \"/home/joan/jina/jina/jina/serve/networking.py\", line 697, in _handle_aiorpcerror\n details=e.details(),\n","jina.excepts.InternalNetworkError: failed to connect to all addresses |Gateway: Communication error with deployment executor0 at address(es) {'0.0.0.0:12346'}. Head or worker(s) may be down.\n"],"executor":""}}
```

(health-check-microservices)=
## Health check of an Executor
## Executor health check

In addition to a performing a readiness check for the entire Flow, it is also possible to check every individual Executor in said Flow,
by utilizing a [standardized gRPC health check endpoint](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
You can check every individual Executor in a Flow, by using a [standard gRPC health check endpoint](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
In most cases this is not necessary, since such checks are performed by Jina, a Kubernetes service mesh or a load balancer under the hood.
Nevertheless, it is possible to perform these checks as a user.
Nevertheless, you can perform these checks yourself.
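
As a sketch, you can also query this standard health service from Python with the `grpcio-health-checking` package (the Executor address below is an assumption):

```python
import grpc
from grpc_health.v1 import health_pb2, health_pb2_grpc

# assumes an Executor is listening on this address
channel = grpc.insecure_channel('localhost:12346')
stub = health_pb2_grpc.HealthStub(channel)

response = stub.Check(health_pb2.HealthCheckRequest(service=''))
print(response.status)  # e.g. 1 == SERVING
```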

When performing these checks, you can expect on of the following `ServingStatus` responses:
When performing these checks, you can expect one of the following `ServingStatus` responses:
- **`UNKNOWN` (0)**: The health of the Executor could not be determined
- **`SERVING` (1)**: The Executor is healthy and ready to receive requests
- **`NOT_SERVING` (2)**: The Executor is *not* healthy and *not* ready to receive requests
@@ -264,7 +263,7 @@ with f:
f.block()
```

On another terminal, you can use [grpcurl](https://github.com/fullstorydev/grpcurl) to send RPC requests to your services.
In another terminal, you can use [grpcurl](https://github.com/fullstorydev/grpcurl) to send gRPC requests to your services.

```shell
docker pull fullstorydev/grpcurl:latest
@@ -278,18 +277,18 @@ docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12346 grpc
```

(health-check-gateway)=
## Health check of the Gateway
## Gateway health check

Just like each individual Executor, the Gateway also exposes a health check endpoint.
Just like each individual Executor, the Gateway also exposes a health check endpoint.

In contrast to Executors however, a Gateway can use gRPC, HTTP, or Websocket, and the health check endpoint changes accordingly.
In contrast to Executors, however, a Gateway can use gRPC, HTTP, or WebSockets, and the health check endpoint changes accordingly.


#### Gateway health check with gRPC

When using gRPC as the protocol to communicate with the Gateway, the Gateway uses the exact same mechanism as Executors to expose its health status: It exposes the [ standard gRPC health check](https://github.com/grpc/grpc/blob/master/doc/health-checking.md) to the outside world.
When using gRPC as the protocol to communicate with the Gateway, the Gateway uses the exact same mechanism as Executors to expose its health status: It exposes the [standard gRPC health check](https://github.com/grpc/grpc/blob/master/doc/health-checking.md) to the outside world.

With the same Flow as described before, you can use the same way to check the Gateway status:
With the same Flow as before, you can check the Gateway status in the same way:

```bash
docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 grpc.health.v1.Health/Check
@@ -302,18 +301,18 @@ docker run --network='host' fullstorydev/grpcurl -plaintext 127.0.0.1:12345 grpc
```


#### Gateway health check with HTTP or Websocket
#### Gateway health check with HTTP or WebSockets

````{admonition} Caution
:class: caution
For Gateways running with HTTP or Websocket, the gRPC health check response codes outlined {ref}`above <health-check-microservices>` do not apply.
For Gateways running with HTTP or WebSockets, the gRPC health check response codes outlined {ref}`above <health-check-microservices>` do not apply.
Instead, an error-free response signifies healthiness.
````

When using HTTP or Websocket as the protocol for the Gateway, it exposes the endpoint `'/'` that one can query to check the status.
When using HTTP or WebSockets as the protocol for the Gateway, you can query the endpoint `'/'` to check the status.

First, crate a Flow with HTTP or Websocket protocol:
First, create a Flow with HTTP or WebSockets protocol:

```python
from jina import Flow
@@ -322,21 +321,21 @@ f = Flow(protocol='http', port=12345).add()
with f:
f.block()
```
Then, you can query the "empty" endpoint:
Then query the "empty" endpoint:
```bash
curl http://localhost:12345
```

And you will get a valid empty response indicating the Gateway's ability to serve.
You get a valid empty response indicating the Gateway's ability to serve:
```json
{}
```

## Use jina ping to do health checks
## Use jina ping for health checks

Once a Flow is running, you can use `jina ping` CLI {ref}`CLI <../api/jina_cli>` to run readiness check of the complete Flow or of individual Executors or Gateway.
Once a Flow is running, you can use the `jina ping` {ref}`CLI <../api/jina_cli>` to run a readiness check of the complete Flow, or of individual Executors or the Gateway.

Let's start a Flow in the terminal by executing the following python code:
Start a Flow in Python:

```python
from jina import Flow
@@ -345,32 +344,32 @@ with Flow(protocol='grpc', port=12345).add(port=12346) as f:
f.block()
```

We can check the readiness of the Flow:
Check the readiness of the Flow:

```bash
jina ping flow grpc://localhost:12345
```

Also we can check the readiness of an Executor:
You can also check the readiness of an Executor:

```bash
jina ping executor localhost:12346
```

or the readiness of the Gateway service:
...or the readiness of the Gateway service:

```bash
jina ping gateway grpc://localhost:12345
```

When these commands succeed, you will see something like:
When these commands succeed, you should see something like:

```text
INFO JINA@28600 readiness check succeeded 1 times!!!
```

```admonition Use it in Kubernetes
```{admonition} Use in Kubernetes
:class: note
This CLI exits with code 1 when the readiness check is not successful, which makes it a good choice to be used as readinessProbe for Executor and Gateway when
The CLI exits with code 1 when the readiness check is not successful, which makes it a good choice as a readinessProbe for Executors and the Gateway when
deployed in Kubernetes.
```
