CHAPTER 14. THE BILLING SERVICE

dCache has built-in monitoring capabilities which provide an overview of the activity and performance of the installation’s doors and pools. There are two options for how this data can be represented and stored:

  • a set of log files written to a known location
  • a database (the billing database).

These options can be enabled simultaneously. If the database option is selected, the data in the billing database tables is also displayed as a set of histogram plots on the installation's web page.



The Billing log files

If you installed dCache following the instructions in the chapter Installing dCache, you enabled the billing service in the domain where the httpd service is running (see the extract of the layout file).

[httpdDomain]
[httpdDomain/billing]
[httpdDomain/httpd]

Use the property billing.text.dir to set the location of the log files and the property billing.enable.text to control whether the plain-text log files are generated. To write the logs in JSON format instead of plain text, use the property billing.format.json=true.

By default the log files are located in the directory /var/lib/dcache/billing. Under this directory the log files are organized in a directory tree based on date (YYYY/MM). A separate file is generated for errors. Both the log file and the error file carry the date in their names.

Example:

    log file: /var/lib/dcache/billing/##TODAY_YEAR##/09/billing-##TODAY_YEAR##.09.25
    error file: /var/lib/dcache/billing/##TODAY_YEAR##/09/billing-error-##TODAY_YEAR##.09.25
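The naming scheme above can be sketched in a few lines of Python. This is only an illustration of the layout described here: the helper name billing_log_paths and the sample date are hypothetical, and the base directory is the documented default of billing.text.dir.

```python
from datetime import date
from pathlib import PurePosixPath


def billing_log_paths(day, base_dir="/var/lib/dcache/billing"):
    """Return (log file, error file) for the given day.

    Assumes the default layout described above: <base>/YYYY/MM/ with
    file names billing-YYYY.MM.DD and billing-error-YYYY.MM.DD.
    """
    month_dir = PurePosixPath(base_dir) / f"{day:%Y}" / f"{day:%m}"
    stamp = f"{day:%Y.%m.%d}"
    return month_dir / f"billing-{stamp}", month_dir / f"billing-error-{stamp}"


# Hypothetical sample date, chosen only for illustration.
log_file, error_file = billing_log_paths(date(2012, 9, 25))
print(log_file)
print(error_file)
```

If billing.text.dir has been changed, pass the configured directory as base_dir.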

The log files may contain information about the time, the pool, the pnfsID and size of the transferred file, the storage class, the actual number of bytes transferred, the number of milliseconds the transfer took, the protocol, the subject (identity of the user given as a collection of principals), the data transfer listen port, the return status and a possible error message. The logged information depends on the protocol.

A log entry for a write operation has the default format:

<MM.dd> <HH:mm:ss> [pool:<pool-name>:transfer]
[<pnfsId>,<filesize>] [<path>]
<StoreName>:<StorageGroup>@<type-of-storage-system>
<transferred-bytes>  <connectionTime> <true/false> {<protocol>}
<initiator>  {<return-status>:"<error-message>"}

Example: A typical log entry for a write operation looks like the following. In the log file each entry occupies a single line; it is split over several lines here for readability:

12.10 14:19:42 [pool:pool2@poolDomain-1:transfer]
[0000062774D07847475BA78AC99C60F2C2FC,10475] [Unknown]
<Unknown>:<Unknown>@osm 10475 40 true {GFtp-1.0 131.169.72.103 37850}
[door:WebDAV-example.org@webdavDomain:1355145582248-1355145582485] {0:""}
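For scripted analysis, an entry in the default write format can be split into its fields with a regular expression. The following is a minimal sketch that assumes the default format shown above has not been customized; the group names (including flag for the unnamed true/false field) and the function parse_write_entry are illustrative choices, not dCache identifiers.

```python
import re

# Pattern for the default write-entry layout shown above. Fields are
# separated by whitespace; group names are illustrative, not dCache terms.
WRITE_ENTRY = re.compile(
    r'(?P<date>\d{2}\.\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+'
    r'\[pool:(?P<pool>[^\]]+):transfer\]\s+'
    r'\[(?P<pnfsid>[0-9A-Fa-f]+),(?P<filesize>\d+)\]\s+'
    r'\[(?P<path>[^\]]*)\]\s+'
    r'(?P<storage>\S+)\s+'
    r'(?P<transferred>\d+)\s+(?P<connection_time>\d+)\s+(?P<flag>true|false)\s+'
    r'\{(?P<protocol>[^}]*)\}\s+'
    r'\[(?P<initiator>[^\]]*)\]\s+'
    r'\{(?P<status>\d+):"(?P<error>[^"]*)"\}'
)


def parse_write_entry(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = WRITE_ENTRY.match(line.strip())
    return match.groupdict() if match else None


# The example entry from above, joined back into the single line
# that appears in the log file.
line = " ".join([
    "12.10 14:19:42 [pool:pool2@poolDomain-1:transfer]",
    "[0000062774D07847475BA78AC99C60F2C2FC,10475] [Unknown]",
    "<Unknown>:<Unknown>@osm 10475 40 true {GFtp-1.0 131.169.72.103 37850}",
    '[door:WebDAV-example.org@webdavDomain:1355145582248-1355145582485] {0:""}',
])
entry = parse_write_entry(line)
print(entry["pool"], entry["transferred"], entry["status"])
```

Because the logged information depends on the protocol and the format can be redefined, lines that do not match simply yield None rather than raising an error.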

The formatting of the log messages can be customized by redefining the corresponding format property in the layout configuration. There is one property per message type:

  • billing.text.format.mover-info-message
  • billing.text.format.remove-file-info-message
  • billing.text.format.door-request-info-message
  • billing.text.format.storage-info-message

A full explanation of the formatting is given in the /usr/share/dcache/defaults/billing.properties file. For syntax questions please consult StringTemplate v3 documentation or the cheat sheet.

On the web page generated by the httpd service (default port 2288), there is a link to Action Log. The table which appears there gives a summary overview extracted from the data contained in the billing log files.

Billing High Availability

To replicate the billing service, the underlying store should be shared; otherwise the text records risk being dispersed over several nodes. Hence, a shared RDBMS database instance should be enabled. Absent a database, enabling Kafka may offer an alternative way to keep records centrally without the bottleneck of a single dCache service.

The Billing database

In order to enable the database, the following steps must be taken.

  1. If the billing database does not already exist (see further below on migrating from an existing one), create it (we assume PostgreSQL here):

    createdb -O dcache -U postgres billing
    

    If you are using a version of PostgreSQL prior to 8.4, you will also need to do:

    createlang -U dcache plpgsql billing
    

    No further manual preparation is needed, as the necessary tables, indices, functions and triggers will automatically be generated when you (re)start the domain with the billing database logging turned on (see below).

  2. The property billing.enable.db controls whether the billing cell sends billing messages to the database. By default the option is disabled. To activate, set the value to true and restart the domain.

Customizing the database

In most cases, the billing service can be run out of the box; nevertheless, the administrator can adjust the database configuration if desired.

  • Database name, host, user, and password can be easily modified using the properties:

    • billing.db.name
    • billing.db.host
    • billing.db.user
    • billing.db.password

    The current database values can be checked with the dcache database ls command.

    dcache database ls
    |DOMAIN          CELL        DATABASE HOST      USER    MANAGEABLE AUTO
    |namespaceDomain PnfsManager chimera  localhost dcache  Yes        Yes
    |namespaceDomain cleaner     chimera  localhost dcache  No         No
    |billingDomain   billing     billing  localhost dcache  Yes        Yes
    
  • Database inserts are batched for performance. Since 2.8, the way the billing service handles these inserts can be tuned by adjusting the queue sizes (there are four queues, one for each of the four main tables: billinginfo, storageinfo, doorinfo, hitinfo) and the maximum database batch size.

    • billing.db.inserts.max-queue-size (defaults to 100000 )
    • billing.db.inserts.max-batch-size (defaults to 1000 )

    A further option controls whether to drop messages (the default) or to block when the queue maximum is exceeded.

    • billing.db.inserts.drop-messages-at-limit (defaults to true )

    The default settings should usually be sufficient.

    You can obtain statistics (printed to the billing log and pinboard) via the dCache admin command display insert statistics <on/off>. Activating it logs the following once a minute:

                insert queue (last 0, current 0, change 0/minute)
                commits (last 0, current 0, change 0/minute)
                dropped (last 0, current 0, change 0/minute)
                total memory 505282560; free memory 482253512
    

    "insert queue" refers to how many messages were actually put on the queue; "commits" is the number of messages committed to the database; "dropped" is the number of lost messages. "last" refers to the figures at the previous iteration. For insert queue, the figures give the actual size of the queue; for commits and dropped, they are cumulative totals.

    You can also generate a Java thread dump by issuing the "dump threads" command.
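The interplay between the queue limit, the batch size and the "commits" and "dropped" counters can be pictured with a small model. The sketch below is not the billing service's implementation (which is part of dCache itself); the class InsertQueue and its methods are hypothetical, the limits are deliberately tiny, and only the default behaviour of dropping at the limit is modelled.

```python
from collections import deque


class InsertQueue:
    """Toy model of one bounded insert queue with batched commits."""

    def __init__(self, max_queue_size, max_batch_size):
        self.max_queue_size = max_queue_size  # cf. billing.db.inserts.max-queue-size
        self.max_batch_size = max_batch_size  # cf. billing.db.inserts.max-batch-size
        self.pending = deque()
        self.committed = 0  # cumulative, like "commits" in the statistics
        self.dropped = 0    # cumulative, like "dropped" in the statistics

    def offer(self, message):
        """Queue a message, or drop it if the queue is already full."""
        if len(self.pending) >= self.max_queue_size:
            self.dropped += 1
            return False
        self.pending.append(message)
        return True

    def commit_batch(self):
        """Remove up to max_batch_size messages, standing in for one database commit."""
        size = min(self.max_batch_size, len(self.pending))
        batch = [self.pending.popleft() for _ in range(size)]
        self.committed += len(batch)
        return batch


q = InsertQueue(max_queue_size=5, max_batch_size=2)
for n in range(8):          # eight messages arrive before any commit happens
    q.offer(n)

batch_sizes = []
while q.pending:            # drain the queue batch by batch
    batch_sizes.append(len(q.commit_batch()))

print(batch_sizes, q.committed, q.dropped)
```

With billing.db.inserts.drop-messages-at-limit set to false, the producer would wait for free space instead of incrementing the dropped counter.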

Database automatic truncation of fine-grained tables

It may be useful to limit the growth of the billing database, which can grow quickly if there is a lot of door activity.

A built-in periodic job can be set to run every 24 hours in order to remove older rows from the "fine-grained" tables (billinginfo, doorinfo, storageinfo, hitinfo).

The following properties are relevant:

billing.enable.db-truncate (default = false)

billing.db.fine-grained-truncate-before (default = 365)
billing.db.fine-grained-truncate-before.unit (default = DAYS)
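To see what the two truncate-before properties mean together, the following sketch computes the cutoff before which rows would be considered old. It is only an illustration: the function truncate_cutoff is hypothetical, and the set of unit names other than DAYS is an assumption (modelled on Java's TimeUnit constants).

```python
from datetime import datetime, timedelta

# Assumed unit names; only DAYS is confirmed by the defaults above.
UNIT_TO_KEYWORD = {
    "SECONDS": "seconds",
    "MINUTES": "minutes",
    "HOURS": "hours",
    "DAYS": "days",
}


def truncate_cutoff(now, amount=365, unit="DAYS"):
    """Return the timestamp 'amount' units before 'now'."""
    return now - timedelta(**{UNIT_TO_KEYWORD[unit]: amount})


# With the defaults, rows older than one (non-leap) year fall before the cutoff.
cutoff = truncate_cutoff(datetime(2024, 1, 1))
print(cutoff)
```

Lowering the value shortens the history kept in the fine-grained tables accordingly.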

Billing histogram data

If the database has been enabled, dCache's frontend will regularly collect histograms from it and make these available via several RESTful API calls. Please refer to the frontend documentation for further information.

Similarly, dCache-View (the web interface) includes a publicly available tab for plots generated from this data. These provide aggregate views of the data for 24-hour, 7-day, 30-day and 365-day periods.

The plot types are:

  • (Giga)bytes read and written for both dCache and HSM backend (if any)

  • Number of transactions/transfers for both dCache and HSM backend (if any)

  • Maximum, minimum and average connection time

  • Cache hits and misses

    NOTE

    The data for this last histogram is not automatically sent, since it contributes significantly to message traffic between the pool manager and the billing service. To store this data (and thus generate the relevant plots), the poolmanager.enable.cache-hit-message property must be set either in dcache.conf or in the layout file for the domain where the poolmanager runs:

    poolmanager.enable.cache-hit-message=true
    

Billing records

The frontend also provides an API for viewing the billing records for a given file. This data is also available through dCache-View. Once again, please consult those parts of the documentation for further information.

Upgrading a previous installation

Because it is possible that the newer version may be deployed over an existing installation which already uses the billing database, the Liquibase change-set has been written in such a way as to look for existing tables and to modify them only as necessary.

If you start the domain containing the billing service over a pre-existing installation of the billing database, depending on what was already there, you may observe some messages like the following in the domain log having to do with the logic governing table initialization.

Example:

   INFO 8/23/12 10:35 AM:liquibase: Successfully acquired change log lock
   INFO 8/23/12 10:35 AM:liquibase: Reading from databasechangelog
   INFO 8/23/12 10:35 AM:liquibase: Reading from databasechangelog
   INFO 8/23/12 10:35 AM:liquibase: Successfully released change log lock
   INFO 8/23/12 10:35 AM:liquibase: Successfully released change log lock
   INFO 8/23/12 10:35 AM:liquibase: Successfully acquired change log lock
   INFO 8/23/12 10:35 AM:liquibase: Reading from databasechangelog
   INFO 8/23/12 10:35 AM:liquibase: Reading from databasechangelog
   INFO 8/23/12 10:35 AM:liquibase: ChangeSet org/dcache/services/billing/
   db/sql/billing.changelog-1.9.13.xml::4.1.7::arossi ran successfully in 264ms
   INFO 8/23/12 10:35 AM:liquibase: Marking ChangeSet: org/dcache/services/
   billing/db/sql/billing.changelog-1.9.13.xml::4.1.8::arossi::(Checksum:
   3:faff07731c4ac867864824ca31e8ae81) ran despite precondition failure due
   to onFail='MARK_RAN': classpath:org/dcache/services/billing/db/sql/
   billing.changelog-master.xml : SQL Precondition failed. Expected '0' got '1'
   INFO 8/23/12 10:35 AM:liquibase: ChangeSet org/dcache/services/billing/db/sql/
   billing.changelog-1.9.13.xml::4.1.9::arossi ran successfully in 14ms
   INFO 8/23/12 10:35 AM:liquibase: Successfully released change log lock
   INFO 8/23/12 10:35 AM:liquibase: Successfully released change log lock

Anything logged at a level lower than ERROR is usually entirely normal. Liquibase regularly reports when the preconditions determining whether it needs to do something are not met. All this means is that the update step was not necessary and it will be skipped in the future.

If, on the other hand, there is an ERROR logged by Liquibase, it is possible there may be some other conflict resulting from the upgrade (this should be rare). Such an error will block the domain from starting. One remedy which often works in this case is to do a clean re-initialization by dropping the Liquibase tables from the database:

psql -U dcache billing
|
|billing=> drop table databasechangelog;
|billing=> drop table databasechangeloglock;
|billing=> \q

and then restarting the domain.

NOTE

If the billing database already exists, but contains tables other than the following:

psql -U dcache billing
|    billing=> \dt
|                         List of relations
|     Schema |         Name          | Type  |   Owner
|     -------+-----------------------+-------+-----------
|     public | billinginfo           | table | dcache
|     public | billinginfo_rd_daily  | table | dcache
|     public | billinginfo_tm_daily  | table | dcache
|     public | billinginfo_wr_daily  | table | dcache
|     public | databasechangelog     | table | dcache
|     public | databasechangeloglock | table | dcache
|     public | doorinfo              | table | dcache
|     public | hitinfo               | table | dcache
|     public | hitinfo_daily         | table | dcache
|     public | storageinfo           | table | dcache
|     public | storageinfo_rd_daily  | table | dcache
|     public | storageinfo_wr_daily  | table | dcache
|
|    billing=> \q

that is, if it has been modified by hand or out-of-band to include custom tables not used directly by dCache, those extraneous tables should not prevent dCache from working correctly, provided they are readable by the billing database user (dcache by default). This requirement is imposed by the use of Liquibase. If such tables are owned by another database user, you may therefore need to grant read privileges on them (SELECT, in PostgreSQL) to the billing database user explicitly.