052-load-balancer-proxies-varnish.md #2267

Merged
merged 1 commit into from
Aug 14, 2024
96 changes: 48 additions & 48 deletions docs/books/web_services/052-load-balancer-proxies-varnish.md
---
author: Antoine Le Morvan
contributors: Ganna Zhyrnova
title: Part 5.2 Varnish
---

## Varnish

This chapter will teach you about the web accelerator proxy cache: Varnish.

****

**Objectives**: You will learn how to:

:heavy_check_mark: Install and configure Varnish;
:heavy_check_mark: Cache the content of a website.

### Generalities

Varnish is an HTTP reverse-proxy-cache service or a website accelerator.

Varnish receives HTTP requests from visitors:

* if the response to the request is available in the cache, it returns the response directly to the client from the server's memory,
* if it does not have the response, Varnish addresses the web server. It then sends the request to the web server, retrieves the response, stores it in its cache, and responds to the client.

Responding from the in-memory cache improves response times for clients. In this case, there is no access to physical disks.

By default, Varnish listens on port **6081** and uses **VCL** (**V**arnish **C**onfiguration **L**anguage) for its configuration. Thanks to VCL, it is possible to:

* Decide what content the client receives
* Decide what content is cached
* Decide from which site, and how, the response is modified.

Varnish is extensible with VMOD modules (Varnish Modules).

#### Ensuring high availability

The use of several mechanisms ensures high availability throughout a web chain:

* If Varnish is behind load balancers (LBs), they are already in HA mode, as the LBs generally run in cluster mode. A check from the LBs verifies Varnish availability. If a Varnish server no longer responds, it is automatically removed from the pool of available servers. In this case, Varnish is in ACTIVE/ACTIVE mode.
* If Varnish is not behind an LB cluster, clients address a VIP (see the Heartbeat chapter) shared between the two Varnish nodes. In this case, Varnish is in ACTIVE/PASSIVE mode. The VIP switches to the second Varnish node if the active server is unavailable.
* When a backend is no longer available, you can remove it from the Varnish backend pool, either automatically (with a health check) or manually in CLI mode (useful for easing upgrades or updates).
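As a hedged illustration, a backend with a health probe might be declared in VCL like this (host, port, and thresholds are hypothetical; adapt to your pool):

```
# Hypothetical probe: backend is healthy when 3 of the last 5 checks pass
probe healthcheck {
    .url = "/";
    .interval = 5s;
    .timeout = 1s;
    .window = 5;
    .threshold = 3;
}

backend front01 {
    .host = "192.168.1.10";   # hypothetical backend address
    .port = "80";
    .probe = healthcheck;
}
```

With such a probe, Varnish stops sending traffic to `front01` as soon as the check fails, without any manual intervention.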

#### Ensuring scalability

If the backends are no longer sufficient to support the workload:

#### Facilitating scalability

A web page is typically composed of HTML (often dynamically generated by PHP) and more static resources (JPG, GIF, CSS, JS, and so on). It quickly becomes interesting to cache the cacheable resources (the static ones), which offloads many requests from the backends.

!!! NOTE

Caching web pages (HTML, PHP, ASP, JSP, etc.) is possible but more complicated. You need to know the application and whether the pages are cacheable, which should be true with a REST API.

When a client accesses a web server directly, the server must return the same image as many times as clients request it. Once the client has received the image for the first time, it is cached on the browser side, depending on the configuration of the site and the web application.

When accessing the server behind a properly configured cache server, the first client requesting the image triggers an initial request to the backend. The image is then cached for a certain period of time and served directly to subsequent clients requesting the same resource.

Although a well-configured browser-side cache reduces the number of requests to the backend, it complements the use of a varnish proxy cache.

#### TLS certificate management

Varnish cannot communicate in HTTPS (and it is not its role to do so).

The certificate must, therefore, be either:

* carried by the LB when the flow passes through it (the recommended solution, since it centralizes certificate management). The flow then passes unencrypted through the data center.
* carried by an Apache, Nginx, or HAProxy service on the Varnish server itself, which only acts as a proxy to Varnish (from port 443 to port 80). This solution is useful when accessing Varnish directly.
* Similarly, Varnish cannot communicate with backends on port 443. When necessary, you need to use an Nginx or Apache reverse proxy to decrypt the request for Varnish.
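For example, a minimal Nginx TLS-termination sketch in front of Varnish (the server name and certificate paths are hypothetical):

```
server {
    listen 443 ssl;
    server_name www.example.org;                        # hypothetical vhost
    ssl_certificate     /etc/pki/tls/certs/site.crt;    # hypothetical paths
    ssl_certificate_key /etc/pki/tls/private/site.key;

    location / {
        # forward decrypted traffic to Varnish on its default port
        proxy_pass http://127.0.0.1:6081;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```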

#### How it works

In a basic web service, the client communicates directly with the service over TCP on port 80.

![How a standard website works](img/varnish_website.png)

To use the cache, the client must communicate with the web service on the default Varnish port 6081.

![How Varnish works by default](img/varnish_website_with_varnish.png)

To make the service transparent to the client, you must change the default listening port for Varnish and the web service vhosts.

![Transparent implementation for the customer](img/varnish_website_with_varnish_port_80.png)

To provide an HTTPS service, add either a load balancer upstream of the varnish service or a proxy service on the varnish server, such as Apache, Nginx, or HAProxy.

### Configuration

```bash
$ sudo systemctl edit varnish.service
ExecStart=/usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,512m
```

To specify a cache storage backend, you can repeat the option several times. Possible storage types are `malloc` (cache in memory, then swap if needed) or `file` (create a file on disk, then map it to memory). Sizes are expressed in K/M/G/T (kilobytes, megabytes, gigabytes, or terabytes).
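For example, a file-backed cache could be declared like this (the path and size are hypothetical):

```
ExecStart=/usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s file,/var/lib/varnish/varnish_storage.bin,10G
```

`malloc` is generally faster since the cache never touches disk, while `file` lets the cache grow beyond available RAM at the cost of disk I/O.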

#### Configuring the backends

Varnish uses a specific language called VCL for its configuration.

This involves compiling the VCL configuration file into C. If the compilation succeeds with no alarms, the service can be restarted.

You can test the varnish configuration with the following command:
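For example, assuming the default configuration path (`-C` compiles the VCL, prints the generated C code, and exits on success):

```
varnishd -C -f /etc/varnish/default.vcl
```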

Then reload the service to apply the configuration:

```bash
systemctl reload varnishd
```

!!! warning

A `systemctl restart varnishd` empties the varnish cache and causes a peak load on the backends. You should therefore avoid restarting `varnishd` and prefer a reload.

!!! NOTE

```
sub vcl_recv {
}

sub vcl_backend_response {
}

sub vcl_deliver {
}
```

* **vcl_recv**: routine called before sending the request to the backend. In this routine, you can modify HTTP headers and cookies, choose the backend, etc. See actions `set req`.
* **vcl_backend_response**: routine called after reception of the backend response (`beresp` means BackEnd RESPonse). See `set bereq.` and `set beresp.` actions.
* **vcl_deliver**: This routine is useful for modifying Varnish output. If you need to modify the final object (e.g., add or remove a header), you can do so in `vcl_deliver`.

#### VCL operators

The most frequent actions:

* **pass**: When returned, the request and subsequent response will come from the application server. No application of cache occurs. `pass` returns from the `vcl_recv` subroutine.
* **hash**: When returned from `vcl_recv`, Varnish will serve the content from the cache even if the request's configuration specifies passing without the cache.
* **pipe**: Used to manage flows. In this case, Varnish will no longer inspect each request but let all bytes pass through to the server. Websockets or video stream management, for example, use `pipe`.
* **deliver**: Delivers the object to the client. Usually from the `vcl_backend_response` subroutine.
* **restart**: Restarts the request processing process. Retains modifications to the `req` object.
* **retry**: Transfers the request back to the application server. Used from `vcl_backend_response` or `vcl_backend_error` if the application response is unsatisfactory.
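A hedged sketch of how some of these actions might be combined in `vcl_recv` (the URL and header patterns are illustrative):

```
sub vcl_recv {
    # let websocket traffic flow through untouched
    if (req.http.Upgrade ~ "(?i)websocket") {
        return (pipe);
    }
    # never cache authenticated areas (hypothetical paths)
    if (req.url ~ "^/(login|admin)") {
        return (pass);
    }
    # everything else goes through the normal cache lookup
    return (hash);
}
```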

Varnish never caches HTTP POST requests or requests containing cookies (whether from the client or the backend).

If the backend uses cookies, content caching will not occur.

To correct this behavior, you can unset the cookies in your requests:
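A minimal sketch, assuming your application does not need cookies for the affected URLs (adapt the rules before using them in production):

```
sub vcl_recv {
    # drop client cookies so requests become cacheable
    unset req.http.Cookie;
}

sub vcl_backend_response {
    # drop backend Set-Cookie on static assets (hypothetical pattern)
    if (bereq.url ~ "\.(css|js|png|jpg|gif)$") {
        unset beresp.http.Set-Cookie;
    }
}
```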


#### Managing backends with CLI

Marking backends as **sick** or **healthy** is possible for administration or maintenance purposes. This action allows you to remove a node from the pool without modifying the Varnish server configuration (without restarting it) or stopping the backend service.

View backend status: The `backend.list` command displays all backends, even those without a health check (probe).

```bash
$ varnishadm backend.list
site.front01 probe Healthy 5/5
site.front02 probe Healthy 5/5
```

To let Varnish decide on the state of its backends, it is imperative to switch backends that were manually set to sick or healthy back to auto mode.

```bash
varnishadm backend.set_health site.front01 auto
```

Declaring the backends is done by following: <https://github.com/mattiasgeniar/v

### Apache logs

As the HTTP service is reverse proxied, the web server no longer has access to the client's IP address, but only to that of the Varnish service.

To take reverse proxy into account in Apache logs, change the format of the event log in the server configuration file:
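One common approach (a sketch; the format name `varnishcombined` is arbitrary, and this assumes Varnish forwards the client address in `X-Forwarded-For`) is to log that header instead of the peer address:

```
# log the client IP forwarded by Varnish instead of the proxy's IP
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" varnishcombined
CustomLog "logs/access_log" varnishcombined
```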


### Cache purge

A few requests to purge the cache:

on the command line:
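For example (a sketch; the URL pattern is illustrative), a ban can be issued with `varnishadm`:

```
# invalidate every cached object whose URL starts with /docs
varnishadm "ban req.url ~ ^/docs"
```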


### Log management

Varnish writes its logs in memory and in binary format so as not to penalize its performance. When it runs out of memory space, it rewrites new records over the old ones, starting from the beginning of its memory space.

It is possible to consult the logs with the `varnishstat` (statistics), `varnishtop` (top for Varnish), `varnishlog` (verbose logging), or `varnishncsa` (logs in NCSA format, like Apache) tools:

```bash
varnishstat
varnishlog -q 'TxHeader eq MISS' -q "ReqHeader ~ '^Host: rockylinux\.org$'"
varnishncsa -q "ReqHeader eq 'X-Cache: MISS'"
```

The `varnishlog` and `varnishncsa` daemons log to disk independently of the `varnishd` daemon. The `varnishd` daemon continues to populate its logs in memory without penalizing performance for clients; the other daemons then copy the logs to disk.

### Workshop

For this workshop, you will need one server with Apache services installed, configured, and secured, as described in the previous chapters.

You will configure a reverse proxy cache in front of it.

Your server has the following IP addresses:

```bash
$ cat /etc/hosts
192.168.1.10 server1 server1.rockylinux.lan
```

#### Task 1: Installation and configuration of Apache

```bash
sudo dnf install -y httpd mod_ssl
```

```bash
Content-Length: 54
Content-Type: text/html; charset=UTF-8
```

#### Task 2: Install varnish

```bash
sudo dnf install -y varnish
sudo firewall-cmd --permanent --add-port=6081/tcp
sudo firewall-cmd --reload
```

#### Task 3: Configure Apache as a backend

Modify `/etc/varnish/default.vcl` to use apache (port 80) as backend:
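A minimal sketch of the backend declaration in `/etc/varnish/default.vcl` (assuming Apache listens on the same host, on port 80):

```
vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "80";
}
```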

```bash
$ curl http://server1.rockylinux.lan:6081
```

As you can see, Apache serves the index page.

Some headers have been added, giving us information that our request was handled by varnish (header `Via`) and the cached time of the page (header `Age`), which tells us that our page was served directly from the varnish memory instead of from the disk with Apache.

#### Task 4: Remove some headers

We will remove some headers that can give unneeded information to hackers.
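A hedged sketch of such a cleanup in `vcl_deliver` (the header names are typical examples; check what your stack actually sends):

```
sub vcl_deliver {
    # hide implementation details from clients
    unset resp.http.Server;
    unset resp.http.X-Powered-By;
    unset resp.http.Via;
    unset resp.http.X-Varnish;

    # add a header useful when troubleshooting cache behavior
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```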

```bash
Accept-Ranges: bytes
Connection: keep-alive
```

As you can see, the unwanted headers have been removed, while the necessary ones (to help with troubleshooting, for example) have been added.

### Conclusion

You now have all the knowledge you need to set up a basic cache server and start adding functionality.

Having a varnish server in your infrastructure can be very useful for many things besides caching: for backend server security, for handling headers, for facilitating updates (blue/green or canary mode, for example), etc.

* [ ] True
* [ ] False

:heavy_check_mark: Does the varnish cache have to be stored in memory?

* [ ] True
* [ ] False