dCache has built-in monitoring capabilities which provide an overview of the activity and performance of the installation’s doors and pools. There are two options for how this data can be represented and stored:
- a set of log files written to a known location
- a database (the billing database).
These options can be enabled simultaneously. If the database option is selected, the data in those tables will also be displayed as a set of histogram plots on the installation's web page.
If you installed dCache following the instructions in the chapter Installing dCache, you enabled the billing service in the domain where the httpd service is running (see the following extract of the layout file):
[httpdDomain]
[httpdDomain/billing]
[httpdDomain/httpd]
Use the property billing.text.dir to set the location of the log files and the property billing.enable.text to control whether the plain-text log files are generated. To write the logs in JSON format instead of plain text, set the property billing.format.json=true.
By default the log files are located in the directory /var/lib/dcache/billing. Under this directory the log files are organized in a directory tree based on date (YYYY/MM). A separate file is generated for errors. Both the log file and the error file are tagged with the date.
Example:
log file: /var/lib/dcache/billing/##TODAY_YEAR##/09/billing-##TODAY_YEAR##.09.25
error file: /var/lib/dcache/billing/##TODAY_YEAR##/09/billing-error-##TODAY_YEAR##.09.25
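For scripts that archive or rotate these files, here is a minimal Python sketch of deriving the paths for a given day. It assumes the default billing.text.dir and the YYYY/MM directory and billing-YYYY.MM.DD file naming shown in the example above; the function name billing_log_paths is hypothetical and not part of dCache.

```python
from datetime import date
from pathlib import PurePosixPath


def billing_log_paths(day: date, base: str = "/var/lib/dcache/billing"):
    """Return (log_file, error_file) for one day.

    Assumes the layout shown above: <base>/YYYY/MM/billing-YYYY.MM.DD
    and <base>/YYYY/MM/billing-error-YYYY.MM.DD.
    """
    directory = PurePosixPath(base) / f"{day.year:04d}" / f"{day.month:02d}"
    stamp = f"{day.year:04d}.{day.month:02d}.{day.day:02d}"
    return (str(directory / f"billing-{stamp}"),
            str(directory / f"billing-error-{stamp}"))


if __name__ == "__main__":
    log_file, error_file = billing_log_paths(date(2024, 9, 25))
    print(log_file)    # /var/lib/dcache/billing/2024/09/billing-2024.09.25
    print(error_file)  # /var/lib/dcache/billing/2024/09/billing-error-2024.09.25
```

If billing.text.dir has been changed, pass the new directory as base.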
The log files may contain information about the time, the pool, the pnfsID and size of the transferred file, the storage class, the actual number of bytes transferred, the number of milliseconds the transfer took, the protocol, the subject (identity of the user given as a collection of principals), the data transfer listen port, the return status and a possible error message. The logged information depends on the protocol.
A log entry for a write operation has the default format:
<MM.dd> <HH:mm:ss> [pool:<pool-name>:transfer]
[<pnfsId>,<filesize>] [<path>]
<StoreName>:<StorageGroup>@<type-of-storage-system>
<transferred-bytes> <connectionTime> <true/false> {<protocol>}
<initiator> {<return-status>:"<error-message>"}
Example: A typical log entry for a write operation looks like this. In the log file each entry occupies a single line; for readability it is split across several lines here:
12.10 14:19:42 [pool:pool2@poolDomain-1:transfer]
[0000062774D07847475BA78AC99C60F2C2FC,10475] [Unknown]
<Unknown>:<Unknown>@osm 10475 40 true {GFtp-1.0 131.169.72.103 37850}
[door:WebDAV-example.org@webdavDomain:1355145582248-1355145582485] {0:""}
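A hedged Python sketch of parsing such an entry once it is back on a single line. It assumes the default write-entry format shown above; because the format can be customized and the logged fields depend on the protocol, the regular expression will not match every line. The function and field names are hypothetical, including flag for the <true/false> field, whose meaning is not described here.

```python
import re

# Default write-entry format from above, matched against one complete line.
# Group names are our own labels, not dCache identifiers.
WRITE_ENTRY = re.compile(
    r'(?P<date>\d{2}\.\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'\[pool:(?P<pool>[^:\]]+):transfer\] '
    r'\[(?P<pnfsid>[0-9A-Fa-f]+),(?P<filesize>\d+)\] '
    r'\[(?P<path>[^\]]*)\] '
    r'(?P<storage_class>\S+) '
    r'(?P<transferred>\d+) (?P<connection_ms>\d+) (?P<flag>true|false) '
    r'\{(?P<protocol>[^}]*)\} '
    r'(?P<initiator>\[[^\]]*\]) '
    r'\{(?P<status>-?\d+):"(?P<message>.*)"\}$'
)


def parse_write_entry(line: str) -> dict:
    """Parse one billing line; raise ValueError if it does not match."""
    match = WRITE_ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized billing entry: {line!r}")
    entry = match.groupdict()
    # Convert the numeric fields; everything else stays a string.
    for key in ("filesize", "transferred", "connection_ms", "status"):
        entry[key] = int(entry[key])
    entry["flag"] = entry["flag"] == "true"
    return entry


# The example entry above, joined back into one line.
SAMPLE = (
    '12.10 14:19:42 [pool:pool2@poolDomain-1:transfer] '
    '[0000062774D07847475BA78AC99C60F2C2FC,10475] [Unknown] '
    '<Unknown>:<Unknown>@osm 10475 40 true {GFtp-1.0 131.169.72.103 37850} '
    '[door:WebDAV-example.org@webdavDomain:1355145582248-1355145582485] {0:""}'
)

if __name__ == "__main__":
    entry = parse_write_entry(SAMPLE)
    print(entry["pool"], entry["transferred"], entry["connection_ms"])
    # pool2@poolDomain-1 10475 40
```

Raising on unmatched lines, rather than silently skipping them, makes it obvious when a customized format or a different protocol produces entries the pattern does not cover.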
The formatting of the log messages can be customized by redefining the following properties in the layout configuration:
- billing.text.format.mover-info-message
- billing.text.format.remove-file-info-message
- billing.text.format.door-request-info-message
- billing.text.format.storage-info-message
A full explanation of the formatting is given in the /usr/share/dcache/defaults/billing.properties file. For syntax questions please consult the StringTemplate v3 documentation or the cheat sheet.
On the web page generated by the httpd service (default port 2288), there is a link to Action Log. The table which appears there gives a summary overview extracted from the data contained in the billing log files.
If the billing service is replicated, the underlying store should be shared; otherwise the text records may end up dispersed over several nodes. A shared RDBMS instance should therefore be used. Without a database, enabling Kafka may offer an alternative means of centralized record-keeping, without the bottleneck of a single dCache service.
In order to enable the database, the following steps must be taken.
- If the billing database does not already exist (see further below on migrating from an existing one), create it (we assume PostgreSQL here):
createdb -O dcache -U postgres billing
If you are using a version of PostgreSQL prior to 8.4, you will also need to run:
createlang -U dcache plpgsql billing
No further manual preparation is needed, as the necessary tables, indices, functions and triggers will automatically be generated when you (re)start the domain with the billing database logging turned on (see below).
- The property billing.enable.db controls whether the billing cell sends billing messages to the database. By default the option is disabled. To activate it, set the value to true and restart the domain.
In most cases the billing service can be run out of the box; nevertheless, the administrator can control the database configuration if desired.
- The database name, host, user and password can be modified using the properties:
- billing.db.name
- billing.db.host
- billing.db.user
- billing.db.password
The current database values can be checked with the dcache database ls command:
dcache database ls
DOMAIN          CELL        DATABASE HOST      USER   MANAGEABLE AUTO
namespaceDomain PnfsManager chimera  localhost dcache Yes        Yes
namespaceDomain cleaner     chimera  localhost dcache No         No
billingDomain   billing     billing  localhost dcache Yes        Yes
- Database inserts are batched for performance. Since dCache 2.8, the way the billing service handles these inserts has been improved, and they can now be tuned by adjusting the sizes of the four queues (one for each of the main tables billinginfo, storageinfo, doorinfo and hitinfo) and the maximum database batch size:
- billing.db.inserts.max-queue-size (defaults to 100000)
- billing.db.inserts.max-batch-size (defaults to 1000)
There is also an option controlling whether to drop messages (the default) or to block when the queue maximum is exceeded:
- billing.db.inserts.drop-messages-at-limit (defaults to true)
The default settings should usually be sufficient.
Statistics (printed to the billing log and the pinboard) can be obtained with the admin command
display insert statistics <on/off>
When activated, it logs the following once a minute:
insert queue (last 0, current 0, change 0/minute)
commits (last 0, current 0, change 0/minute)
dropped (last 0, current 0, change 0/minute)
total memory 505282560; free memory 482253512
"insert queue" refers to how many messages actually were put on the queue; "commits" are the number of messages committed to the database; "dropped" are the number of lost messages. "last" refers to the figures at the last iteration. For insert queue, this is the actual size of the queue; for commits and dropped, these are cumulative totals.
You can also generate a Java thread dump by issuing the "dump threads" command.
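The batching, queue-limit and drop-or-block settings described above, together with the counters reported by display insert statistics, can be illustrated with a toy Python model. It is a sketch of the semantics only: it is not dCache's implementation, it models a single queue (the billing service keeps one per table), and the class and method names are hypothetical.

```python
import queue


class BatchingInserter:
    """Toy model of one bounded billing insert queue drained in batches.

    drop_at_limit=True:  offer() discards a message when the queue is full.
    drop_at_limit=False: offer() blocks until the consumer makes room.
    """

    def __init__(self, max_queue_size=100000, max_batch_size=1000,
                 drop_at_limit=True):
        self.pending = queue.Queue(maxsize=max_queue_size)
        self.max_batch_size = max_batch_size
        self.drop_at_limit = drop_at_limit
        self.committed = 0  # cumulative, comparable to "commits"
        self.dropped = 0    # cumulative, comparable to "dropped"

    def offer(self, message):
        if not self.drop_at_limit:
            self.pending.put(message)  # waits while the queue is full
            return
        try:
            self.pending.put_nowait(message)
        except queue.Full:
            self.dropped += 1

    def drain_one_batch(self, commit):
        """Remove up to max_batch_size messages and hand them to commit()."""
        batch = []
        while len(batch) < self.max_batch_size:
            try:
                batch.append(self.pending.get_nowait())
            except queue.Empty:
                break
        if batch:
            commit(batch)  # stands in for one batched database insert
            self.committed += len(batch)
        return len(batch)

    def statistics(self):
        # Current queue size plus the two cumulative totals.
        return {"insert queue": self.pending.qsize(),
                "commits": self.committed, "dropped": self.dropped}


if __name__ == "__main__":
    inserter = BatchingInserter(max_queue_size=5, max_batch_size=2)
    for i in range(8):
        inserter.offer({"id": i})
    print(inserter.statistics())  # {'insert queue': 5, 'commits': 0, 'dropped': 3}
    while inserter.drain_one_batch(lambda batch: None):
        pass
    print(inserter.statistics())  # {'insert queue': 0, 'commits': 5, 'dropped': 3}
```

The trade-off is visible in the model: dropping keeps the sender responsive but loses records (the dropped counter grows), while blocking preserves every record at the cost of stalling the sender whenever the database falls behind.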
It may be useful to limit the growth of the billing database, which can fill up quickly over time if there is a lot of door activity. A built-in cron job can be enabled to run every 24 hours and remove older rows from the "fine-grained" tables (billinginfo, doorinfo, storageinfo and hitinfo).
The following properties are relevant:
- billing.enable.db-truncate (defaults to false)
- billing.db.fine-grained-truncate-before (defaults to 365)
- billing.db.fine-grained-truncate-before.unit (defaults to DAYS)
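Presumably rows become eligible for removal once they are older than the current time minus the truncate-before value in the given unit. A minimal Python sketch of that arithmetic, assuming the unit property takes time-unit names such as DAYS or HOURS (only a few are mapped here); the function name is hypothetical and this is not dCache's code.

```python
from datetime import datetime, timedelta, timezone

# Assumed unit names for the .unit property; only a subset is mapped.
UNITS = {
    "SECONDS": timedelta(seconds=1),
    "MINUTES": timedelta(minutes=1),
    "HOURS": timedelta(hours=1),
    "DAYS": timedelta(days=1),
}


def truncation_cutoff(now: datetime, before: int = 365,
                      unit: str = "DAYS") -> datetime:
    """Rows with a timestamp older than the returned instant are candidates."""
    return now - before * UNITS[unit]


if __name__ == "__main__":
    now = datetime(2024, 9, 25, 12, 0, tzinfo=timezone.utc)
    print(truncation_cutoff(now).isoformat())  # 2023-09-26T12:00:00+00:00
```

Because the job runs only once every 24 hours, a row may survive up to about a day past this cutoff before it is actually removed.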
If the database has been enabled, dCache's frontend will regularly collect histograms from it and make them available via several RESTful API calls. Please refer to the frontend documentation for further information.
Similarly, dCache-View (the web interface) includes a publicly available tab for plots generated from this data. These provide aggregate views of the data for 24-hour, 7-day, 30-day and 365-day periods.
The plot types are:
- (Giga)bytes read and written for both dCache and the HSM backend (if any)
- Number of transactions/transfers for both dCache and the HSM backend (if any)
- Maximum, minimum and average connection time
- Cache hits and misses
NOTE
The data for this last histogram is not sent by default, since it contributes significantly to message traffic between the pool manager and the billing service. To store this data (and thus generate the relevant plots), the poolmanager.enable.cache-hit-message property must be set either in dcache.conf or in the layout file for the domain where the poolmanager runs:
poolmanager.enable.cache-hit-message=true
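As an illustration of what the connection-time plot summarizes, a Python sketch that computes the maximum, minimum and average connection time over one of the trailing periods mentioned above (24 hours, 7 days, 30 days, 365 days). dCache builds its histograms from the billing database; this is not its implementation, and the input format (timestamp, milliseconds pairs) and names are hypothetical.

```python
from datetime import datetime, timedelta

# Trailing windows matching the plot periods mentioned above.
WINDOWS = {
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
    "365d": timedelta(days=365),
}


def connection_time_summary(records, now, window):
    """Return (max, min, average) connection time for one trailing window.

    records is an iterable of (timestamp, connection_ms) pairs; only those
    with now - window < timestamp <= now are included. Returns None if no
    record falls inside the window.
    """
    start = now - WINDOWS[window]
    times = [ms for ts, ms in records if start < ts <= now]
    if not times:
        return None
    return max(times), min(times), sum(times) / len(times)


if __name__ == "__main__":
    now = datetime(2024, 9, 25, 12, 0)
    records = [
        (now - timedelta(hours=1), 40),
        (now - timedelta(hours=5), 10),
        (now - timedelta(days=3), 700),
    ]
    print(connection_time_summary(records, now, "24h"))  # (40, 10, 25.0)
    print(connection_time_summary(records, now, "7d"))   # (700, 10, 250.0)
```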
The frontend also provides an API for viewing the billing records for a given file. This data is also available through dCache-View. Once again, please consult those parts of the documentation for further information.
Because a newer version may be deployed over an existing installation that already uses the billing database, the Liquibase change-set has been written to look for existing tables and to modify them only as necessary.
If you start the domain containing the billing service over a pre-existing installation of the billing database, then, depending on what was already there, you may observe messages like the following in the domain log, relating to the logic governing table initialization.
Example:
INFO 8/23/12 10:35 AM:liquibase: Successfully acquired change log lock
INFO 8/23/12 10:35 AM:liquibase: Reading from databasechangelog
INFO 8/23/12 10:35 AM:liquibase: Reading from databasechangelog
INFO 8/23/12 10:35 AM:liquibase: Successfully released change log lock
INFO 8/23/12 10:35 AM:liquibase: Successfully released change log lock
INFO 8/23/12 10:35 AM:liquibase: Successfully acquired change log lock
INFO 8/23/12 10:35 AM:liquibase: Reading from databasechangelog
INFO 8/23/12 10:35 AM:liquibase: Reading from databasechangelog
INFO 8/23/12 10:35 AM:liquibase: ChangeSet org/dcache/services/billing/
db/sql/billing.changelog-1.9.13.xml::4.1.7::arossi ran successfully in 264ms
INFO 8/23/12 10:35 AM:liquibase: Marking ChangeSet: org/dcache/services/
billing/db/sql/billing.changelog-1.9.13.xml::4.1.8::arossi::(Checksum:
3:faff07731c4ac867864824ca31e8ae81) ran despite precondition failure due
to onFail='MARK_RAN': classpath:org/dcache/services/billing/db/sql/
billing.changelog-master.xml : SQL Precondition failed. Expected '0' got '1'
INFO 8/23/12 10:35 AM:liquibase: ChangeSet org/dcache/services/billing/db/sql/
billing.changelog-1.9.13.xml::4.1.9::arossi ran successfully in 14ms
INFO 8/23/12 10:35 AM:liquibase: Successfully released change log lock
INFO 8/23/12 10:35 AM:liquibase: Successfully released change log lock
Anything logged at a level lower than ERROR is usually entirely normal. Liquibase regularly reports when the preconditions that determine whether it needs to do something are not met. This only means that the update step was not necessary and will be skipped in the future.
If, on the other hand, Liquibase logs an ERROR, there may be some other conflict resulting from the upgrade (this should be rare). Such an error will block the domain from starting. One remedy which often works in this case is to do a clean re-initialization by dropping the Liquibase tables from the database:
psql -U dcache billing
billing=> drop table databasechangelog;
billing=> drop table databasechangeloglock;
billing=> \q
and then restarting the domain.
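To tell the normal case from the problematic one in a long domain log, a small Python sketch that picks out the Liquibase records logged at ERROR. It assumes the record layout in the example above (level first, then the timestamp, then :liquibase:); the function name is hypothetical.

```python
def liquibase_errors(log_lines, level="ERROR"):
    """Return the Liquibase records logged at the given level.

    Assumes each record starts with its level and contains ':liquibase:',
    as in the example above; for a wrapped record only its first line,
    which carries the level, is returned.
    """
    found = []
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields and fields[0] == level and ":liquibase:" in line:
            found.append(line.rstrip("\n"))
    return found


if __name__ == "__main__":
    sample = [
        "INFO 8/23/12 10:35 AM:liquibase: Successfully acquired change log lock",
        "INFO 8/23/12 10:35 AM:liquibase: Reading from databasechangelog",
    ]
    print(liquibase_errors(sample))  # []
```

An open file object for the domain log can be passed directly as log_lines; an empty result suggests the messages are only the harmless precondition reports described above.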
NOTE
If the billing database already exists, but contains tables other than the following:
psql -U dcache billing
billing=> \dt
                 List of relations
 Schema | Name                  | Type  | Owner
--------+-----------------------+-------+--------
 public | billinginfo           | table | dcache
 public | billinginfo_rd_daily  | table | dcache
 public | billinginfo_tm_daily  | table | dcache
 public | billinginfo_wr_daily  | table | dcache
 public | databasechangelog     | table | dcache
 public | databasechangeloglock | table | dcache
 public | doorinfo              | table | dcache
 public | hitinfo               | table | dcache
 public | hitinfo_daily         | table | dcache
 public | storageinfo           | table | dcache
 public | storageinfo_rd_daily  | table | dcache
 public | storageinfo_wr_daily  | table | dcache
billing=> \q
that is, if it has been previously modified by hand or out-of-band to include custom tables not used directly by dCache, the existence of such extraneous tables should not impede dCache from working correctly, provided those other tables are READ-accessible by the database user for billing, which by default is dcache. This is a requirement imposed by the use of Liquibase. You may therefore need to explicitly grant READ privileges (SELECT, in PostgreSQL terms) to the billing database user on any such tables if they are owned by another database user.
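A hedged Python sketch of finding such extraneous tables and generating the corresponding grants. It assumes you already have the list of table names (for example from the \dt output above) and that the billing user is dcache in the public schema; the helper names are hypothetical. The generated GRANT SELECT ON TABLE ... TO ... statements use standard PostgreSQL syntax and must be run by a role allowed to grant privileges on those tables, for example their owner.

```python
# Tables shown above as created and used by the billing service.
EXPECTED_TABLES = {
    "billinginfo", "billinginfo_rd_daily", "billinginfo_tm_daily",
    "billinginfo_wr_daily", "databasechangelog", "databasechangeloglock",
    "doorinfo", "hitinfo", "hitinfo_daily", "storageinfo",
    "storageinfo_rd_daily", "storageinfo_wr_daily",
}


def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def grant_statements(existing_tables, db_user="dcache", schema="public"):
    """Return one GRANT SELECT statement per table not in EXPECTED_TABLES."""
    extra = sorted(set(existing_tables) - EXPECTED_TABLES)
    return [
        f"GRANT SELECT ON TABLE {quote_ident(schema)}.{quote_ident(name)} "
        f"TO {quote_ident(db_user)};"
        for name in extra
    ]


if __name__ == "__main__":
    for statement in grant_statements(sorted(EXPECTED_TABLES) + ["site_reports"]):
        print(statement)  # GRANT SELECT ON TABLE "public"."site_reports" TO "dcache";
```

Quoting the identifiers keeps mixed-case table names exactly as \dt lists them.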