etlog can be accessed at etlog.cesnet.cz. It gathers and analyzes the national RADIUS log files generated by the eduroam service and presents them to users. etlog is intended for both users and administrators.
Some of the main reasons to create etlog were:
- create a generic interface for processing, analysis and searching of the RADIUS log files
- create a system for generating statistics and reports
- create a system for trend analysis, which can signal service problems
- create a system for anomaly detection (authentication errors, device or identity theft, ...)
etlog is a web application built on Node.js, the Express web application framework and MongoDB.
The application is set up on Debian jessie. It runs as the user etlog and its root is in /home/etlog/etlog/. It listens for incoming HTTP connections on port 8080. An Apache webserver sits in front of the application and acts as a proxy for it.
The main purpose of putting Apache in front of the application itself is authentication. Apache uses the Shibboleth module for authentication against the Czech identity federation eduid.cz.
Add an unprivileged user for the application:
adduser etlog
The application runs as an unprivileged user, so it cannot use the standard HTTP and HTTPS ports; port 8080 is used instead. The Apache webserver sits in front of the application web server and proxies all incoming requests to it. Automatic redirection from port 80 to port 443 is handled by Apache.
Documentation used for the Shibboleth setup is located at http://www.eduid.cz/cs/tech/sp/shibboleth.
etlog assumes that a user's eduroam identity is the same as their eduPersonPrincipalName. If that is not true, the user's home IdP can implement the eduroamUID attribute. This attribute contains the user's eduroam identity (or multiple identities). If the attribute is not implemented by the user's home IdP, their eduPersonPrincipalName is used as their eduroam identity in etlog. The user's home IdP must release the attribute at least for the entityID https://etlog.cesnet.cz/shibboleth. An implementation on Shibboleth IdP 3 may look like:
<AttributeDefinition id="eduroamUID" xsi:type="ScriptedAttribute">
<Dependency ref="uid" />
<AttributeEncoder xsi:type="SAML1String" name="http://eduroam.cz/attributes/eduroamUID" />
<AttributeEncoder xsi:type="SAML2String" name="http://eduroam.cz/attributes/eduroamUID" friendlyName="eduroamUID" />
<Script>
<![CDATA[
if (typeof uid != "undefined" && uid != null) {
eduroamUID.addValue (uid.getValues().get(0) + "@eduroam.%{idp.scope}");
}
]]>
</Script>
</AttributeDefinition>
At the SAML level, the messages can look like:
<saml2:Attribute FriendlyName="eduroamUID"
Name="http://eduroam.cz/attributes/eduroamUID"
NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri">
<saml2:AttributeValue xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="xsd:string">
user@org.eu</saml2:AttributeValue>
<saml2:AttributeValue xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="xsd:string">
user@org.cz</saml2:AttributeValue>
</saml2:Attribute>
Apache in conjunction with Shibboleth is responsible for authentication of users into the application. After successful authentication Apache proxies the request to the application webserver.
Installation of apache webserver:
apt-get install apache2 libapache2-mod-proxy-html
Set up the server certificate in /etc/ssl/certs/etlog.cesnet.cz.crt.pem and the private key in /etc/ssl/private/etlog.cesnet.cz.key.pem.
Add the intermediate certificate to /etc/ssl/certs/etlog.cesnet.cz.crt.pem:
cd /tmp
wget https://pki.cesnet.cz/certs/TERENA_SSL_CA_3.pem
cat TERENA_SSL_CA_3.pem >> /etc/ssl/certs/etlog.cesnet.cz.crt.pem
rm TERENA_SSL_CA_3.pem
cd
SSL default vhost and module are enabled by:
a2enmod ssl
a2dissite 000-default
a2ensite default-ssl
service apache2 restart
Proxy is enabled by:
a2enmod proxy
a2enmod proxy_http
service apache2 restart
Headers and remote ip are enabled by:
a2enmod headers
a2enmod remoteip
service apache2 restart
Configuration for the default SSL apache vhost is in /etc/apache2/sites-enabled/default-ssl.conf.
Set the configuration as below:
<VirtualHost *:80>
ServerAdmin info@eduroam.cz
ServerName etlog.cesnet.cz
Redirect permanent "/" "https://etlog.cesnet.cz/"
</VirtualHost>
<IfModule mod_ssl.c>
# application virtualhost
<VirtualHost _default_:443>
ServerAdmin info@eduroam.cz
ServerName etlog.cesnet.cz
DocumentRoot /var/www/html
ErrorLog ${APACHE_LOG_DIR}/etlog_error.log
CustomLog ${APACHE_LOG_DIR}/etlog_access.log combined
SSLEngine on
SSLCertificateFile /etc/ssl/certs/...
SSLCertificateKeyFile /etc/ssl/private/...
BrowserMatch "MSIE [2-6]" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0
# MSIE 7 and newer should be able to use keepalive
BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown
# HSTS
Header always set Strict-Transport-Security "max-age=63072000; includeSubdomains;"
<Location />
# shibboleth configuration for /
AuthType shibboleth
Require shibboleth
ShibRequestSetting requireSession 1
# pass the REMOTE_USER environment variable as a request header
RequestHeader set REMOTE_USER %{REMOTE_USER}s
# to set additional headers, add a directive for the specific environment variable
RequestHeader set entitlement %{entitlement}e
RequestHeader set eduroamUID %{eduroamUID}e
# proxy
ProxyPass http://127.0.0.1:8080/
ProxyPassReverse http://127.0.0.1:8080/
</Location>
# authentication exception for .well-known
<Location "/.well-known/security.txt">
AuthType shibboleth
Require shibboleth
ShibRequestSetting requireSession 0
</Location>
ProxyRequests Off
RemoteIPHeader X-Forwarded-For
RequestHeader set X-Forwarded-Proto "https"
</VirtualHost>
# virtualhost for nrpe
<VirtualHost 127.0.0.1:443>
ServerAdmin info@eduroam.cz
ServerName etlog.cesnet.cz
DocumentRoot /var/www/html
ErrorLog ${APACHE_LOG_DIR}/etlog_error.log
CustomLog ${APACHE_LOG_DIR}/etlog_access.log combined
SSLEngine on
SSLCertificateFile /etc/ssl/certs/...
SSLCertificateKeyFile /etc/ssl/private/...
BrowserMatch "MSIE [2-6]" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0
# MSIE 7 and newer should be able to use keepalive
BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown
# HSTS
Header always set Strict-Transport-Security "max-age=63072000; includeSubdomains;"
<Location />
# proxy
ProxyPass http://127.0.0.1:8080/
ProxyPassReverse http://127.0.0.1:8080/
</Location>
ProxyRequests Off
RemoteIPHeader X-Forwarded-For
RequestHeader set X-Forwarded-Proto "https"
</VirtualHost>
# virtualhost for api queries and for ermon
<VirtualHost etlog.cesnet.cz:8443>
ServerAdmin info@eduroam.cz
ServerName etlog.cesnet.cz
DocumentRoot /var/www/html
ErrorLog ${APACHE_LOG_DIR}/etlog_error.log
CustomLog ${APACHE_LOG_DIR}/etlog_access.log combined
SSLEngine on
SSLCertificateFile /etc/ssl/certs/...
SSLCertificateKeyFile /etc/ssl/private/...
BrowserMatch "MSIE [2-6]" \
nokeepalive ssl-unclean-shutdown \
downgrade-1.0 force-response-1.0
# MSIE 7 and newer should be able to use keepalive
BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown
# HSTS
Header always set Strict-Transport-Security "max-age=63072000; includeSubdomains;"
<Location />
# proxy
ProxyPass http://127.0.0.1:8080/
ProxyPassReverse http://127.0.0.1:8080/
</Location>
ProxyRequests Off
RemoteIPHeader X-Forwarded-For
RequestHeader set X-Forwarded-Proto "https"
</VirtualHost>
</IfModule>
Set listening ports in /etc/apache2/ports.conf:
# If you just change the port or add more ports here, you will likely also
# have to change the VirtualHost statement in
# /etc/apache2/sites-enabled/000-default.conf
Listen 80
<IfModule ssl_module>
Listen 443
Listen 8443
</IfModule>
<IfModule mod_gnutls.c>
Listen 443
Listen 8443
</IfModule>
# vim: syntax=apache ts=4 sw=4 sts=4 sr noet
Configure apache log rotation in /etc/logrotate.d/apache2:
/var/log/apache2/*.log {
monthly
missingok
rotate 12
compress
delaycompress
notifempty
create 640 root adm
sharedscripts
postrotate
if /etc/init.d/apache2 status > /dev/null ; then \
/etc/init.d/apache2 reload > /dev/null; \
fi;
endscript
prerotate
if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
run-parts /etc/logrotate.d/httpd-prerotate; \
fi; \
endscript
}
Additional settings are needed according to https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPApacheConfig. The page says:
Finally, on non-Windows systems you should make sure Apache is configured in so-called "worker" mode, using the "worker" MPM, either via a setting in an OS-supplied file like /etc/sysconfig/httpd or in the Apache configuration directly. Many servers come incorrectly configured in "prefork" mode, which emulates Apache 1.3's process model and causes vastly greater resource usage inside the shibd daemon.
Enable mpm-worker by:
a2dismod mpm_prefork
a2dismod mpm_event
a2enmod mpm_worker
service apache2 restart
apachectl -M | grep worker
RADIUS data are acquired through syslog. Installation and configuration:
cd /etc/ssl/certs/
wget https://crt.cesnet-ca.cz/CESNET_CA_Root.pem
wget https://crt.cesnet-ca.cz/CESNET_CA_3.pem
c_rehash
apt-get install syslog-ng
cat > /etc/syslog-ng/conf.d/etlog-fticks.conf
source net {
tcp(
port(1999)
tls( ca_dir("/etc/ssl/certs")
key-file("/home/etlog/etlog/cert/etlog.cesnet.cz.key.pem")
cert-file("/home/etlog/etlog/cert/etlog.cesnet.cz.crt.pem"))
);
};
destination fticks { file("/home/etlog/logs/fticks/fticks-$YEAR-$MONTH-$DAY" owner("etlog") group("etlog") perm(0600)); };
log { source(net); destination(fticks); };
^D
service syslog-ng restart
su - etlog
mkdir -p ~/logs/{fticks,transform,mongo,invalid_records,systemd,ldap}
The code above installs the certificates required for the syslog TLS connection.
The next part installs syslog-ng and creates its configuration.
The last part creates the directories /home/etlog/logs/fticks, /home/etlog/logs/transform, /home/etlog/logs/mongo and /home/etlog/logs/invalid_records.
Log files created by syslog are located in /home/etlog/logs/fticks.
LDAP related files are in /home/etlog/logs/ldap.
systemd is used for integration of the application within the system. Logging needs to be configured to capture the application output to log files:
cat >> /etc/syslog-ng/conf.d/etlog-logs.conf
filter f_etlog { facility(local0); };
destination etlog_logs { file("/home/etlog/logs/systemd/log-$YEAR-$MONTH-$DAY" owner("etlog") group("etlog") perm(0600)); };
log { source(s_src); filter(f_etlog); destination(etlog_logs); };
^D
Application log files are located in /home/etlog/logs/systemd/.
Cron is used to run tasks periodically. Application logic tasks are scheduled within the application; importing of incoming logs is scheduled in the user's crontab.
The user's crontab can be edited by using crontab -e.
The crontab contains the following jobs:
command | interval | description |
---|---|---|
/home/etlog/etlog/scripts/data_import.sh | every 5 minutes | new data importing |
/home/etlog/etlog/scripts/ldap/admins.sh | every 5 minutes | ldap synchronization |
/home/etlog/etlog/scripts/ldap/realms.sh | every day at 0:30 | all known czech realms synchronization |
/home/etlog/etlog/scripts/invalid_records.sh | every day at 1:00 | generating of files with invalid records |
/home/etlog/etlog/scripts/invalid_records_mail.sh | every Monday at 6:00 | sending report about invalid records |
/home/etlog/etlog/scripts/archive.sh | every Monday at 6:05 | archiving old log files |
/home/etlog/etlog/scripts/detection_data/create_detection_data.sh &>/dev/null | every Monday at 6:10 | generating login count graphs |
/home/etlog/etlog/scripts/concurrent_users/update_data.sh | every Saturday at 4:30 | generating old concurrent users data |
Crontab contents:
*/5 * * * * /home/etlog/etlog/scripts/data_import.sh
*/5 * * * * /home/etlog/etlog/scripts/ldap/admins.sh
30 0 * * * /home/etlog/etlog/scripts/ldap/realms.sh
0 1 * * * /home/etlog/etlog/scripts/invalid_records.sh
0 6 * * 1 /home/etlog/etlog/scripts/invalid_records_mail.sh
5 6 * * 1 /home/etlog/etlog/scripts/archive.sh
10 6 * * 1 /home/etlog/etlog/scripts/detection_data/create_detection_data.sh &>/dev/null
30 4 * * 6 /home/etlog/etlog/scripts/concurrent_users/update_data.sh
Setup is defined in cron.js. The table below defines how the tasks are run.
Every task in the table below generates data for the collection of the same name.
task name | interval |
---|---|
failed_logins | every day at 02:05:00 |
mac_count | every day at 02:15:00 |
roaming | every day at 02:20:00 |
shared_mac | every day at 02:25:00 |
realm_logins | every day at 02:35:00 |
visinst_logins | every day at 02:40:00 |
heat_map | every day at 02:45:00 |
unique_users | every day at 02:55:00 |
concurrent_users | every day at 03:10:00 |
users_mac | every 15 minutes |
Other tasks:
task name | interval |
---|---|
retention | every day at 03:00:00 |
Task retention deletes data from logs collections which are older than 365 days.
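The scheduling itself is defined in cron.js and the retention logic in cron/delete_logs.js. Purely as an illustration, a minimal sketch of such a task, assuming the cron npm package and the native MongoDB driver (connection URL and variable names are illustrative), might look like:
// illustrative sketch only - the real task lives in cron/delete_logs.js and is scheduled in cron.js
// assumes the "cron" npm package and the native MongoDB driver
var CronJob = require('cron').CronJob;
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost/etlog', function(err, db) {
  if (err) throw err;

  // every day at 03:00:00 (seconds minutes hours day-of-month month day-of-week)
  new CronJob('0 0 3 * * *', function() {
    var cutoff = new Date();
    cutoff.setDate(cutoff.getDate() - 365);      // keep only the last 365 days

    db.collection('logs').deleteMany({ timestamp : { $lt : cutoff } }, function(err, result) {
      if (err) return console.error('retention failed: ' + err);
      console.log('retention removed ' + result.deletedCount + ' documents');
    });
  }, null, true);
});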
A monthly report about failed logins is sent at 5:59 on the first day of every month. For details see reports.
Mail is handled by postfix mail server.
Postfix configuration type is set up as Internet site.
Listening only on the localhost address (for both IPv4 and IPv6) is done with
inet_interfaces = localhost
in /etc/postfix/main.cf
Setup is done with nodemailer package in file mail.js.
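The actual mail api is implemented in mail.js. As an illustration only, sending a mail through the local postfix with nodemailer could look like this (the addresses are made up):
// illustrative sketch only - the real mail api is in mail.js
var nodemailer = require('nodemailer');

// deliver through the local postfix listening on localhost
var transporter = nodemailer.createTransport({ host : '127.0.0.1', port : 25, secure : false });

transporter.sendMail({
  from    : 'etlog@etlog.cesnet.cz',        // illustrative addresses
  to      : 'administrator@example.org',
  subject : 'etlog report',
  text    : 'report body ...'
}, function(err, info) {
  if (err) console.error('mail failed: ' + err);
});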
These packages are necessary for etlog to run:
openssl git tmux htop iptables-persistent curl tmux make syslog-ng gawk logtail postfix mailutils bc duply ncftp lftp libkrb5-dev libapache2-mod-shib2 apache2 libapache2-mod-proxy-html apache2-bin apache2-data apache2-utils ldapscripts ldap-utils ldapscripts pwgen sharutils libdate-manip-perl libxml-libxml-perl libgps-point-perl libjson-perl
Other special packages along with installation are listed below.
The duply package is used for system backup. Configuration is in /etc/duply/system/conf.
Files which should be backed up are defined in /etc/duply/system/exclude.
Backup is executed by root's crontab file. Backup script is run every day. For details see root's crontab file.
mongodump is used for database backup.
mongodump is a utility for creating a binary export of the contents of a database.
Script /etc/duply/system/pre is launched before every system backup and does the database backup using mongodump.
The binary export of the database is located in /home/etlog/backup/dump.
MongoDB is a document oriented database.
At the time of writing this guide, no official documentation for installation on Debian jessie is available.
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/debian jessie/mongodb-org/3.2 main" | tee /etc/apt/sources.list.d/mongodb-org-3.2.list
apt-get update
apt-get install mongodb-org
systemctl enable mongod
service mongod start
Disable THP by following the guide from the official docs. THP is disabled using an init script. No further configuration should be needed.
MongoDB stores all time data as the Date data type, which stores time in UTC. This presents an issue when incoming data contain local time, which is offset from UTC. The official MongoDB docs say:
MongoDB stores times in UTC by default, and will convert any local time representations into this form. Applications that must operate or report on some unmodified local time value may store the time zone alongside the UTC timestamp, and compute the original local time in their application logic.
Data are reconstructed to the original time when presented to the user. Conversion of aggregation results from UTC to local time can be done using:
req.db.logs.aggregate([ { $sort : { timestamp : 1 } }, { $limit : 1 },
{ $project : { timestamp : 1, _id : 0 } } ],
function(err, doc) {
ret.logs.min = convert(doc[0].timestamp).toISOString();
});
// --------------------------------------------------------------------------------------
// convert UTC to localtime based on input
// --------------------------------------------------------------------------------------
function convert(date)
{
  var d = new Date(date);
  d.setTime(d.getTime() + (-1 * d.getTimezoneOffset() * 60 * 1000));
  // offset is variable [ -60 or -120 minutes, depending on daylight saving time ]
  // => offset * 60 * 1000
  // offset * 60 seconds * 1000 milliseconds
  return d;
}
The database can be accessed by the command mongo.
Data are divided into databases, the same as in SQL databases. Each database consists of collections,
which are the equivalent of SQL tables. Collections consist of documents, which use BSON notation, which is based on JSON.
Basic commands:
show databases
lists all databases which are available.
use my_database
switch the current database to my_database
show collections
lists collections for the current database.
db.my_collection.find({})
display all documents in my_collection
db.my_collection.find({}).limit(5)
display 5 documents from my_collection
db.my_collection.find({}).limit(5).pretty()
display 5 nicely formatted documents from my_collection
Node.js is server-side JavaScript. Because the version of Node.js available in Debian jessie is very old (0.10.29~dfsg-2), installation of a newer version is needed. At the time of writing this guide, the current version of Node.js is 6.5.
apt-get install curl
curl -sL https://deb.nodesource.com/setup_6.x | bash -
apt-get install nodejs
etlog consists of Node.js, the Express web application framework and MongoDB. It uses many auxiliary javascript modules. All the necessary modules, including their specific versions, can be found in the file package.json.
The application uses the database etlog. The database is separated into several collections.
In the tables below the column note is just explanatory; it is not really present in the database. Every document also has a field _id, which is for internal MongoDB purposes; it is not shown in the tables below.
Collection represents raw RADIUS log records transformed to JSON format. For details on data transformation see scripts/fticks_to_bson.sh.
Collection has following structure:
field name | data type | note |
---|---|---|
timestamp | Date | timestamp of authentication |
realm | String | domain part of username |
viscountry | String | visited country |
visinst | String | visited institution |
csi | String | mac address |
pn | String | username |
result | String | result of authentication |
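The real database and schema configuration lives in db.js. Purely as an illustration, and assuming Mongoose is used on top of MongoDB, a schema matching the table above could be sketched as:
// illustrative sketch only - the real schemas are defined in db.js; Mongoose is an assumption here
var mongoose = require('mongoose');

var logs_schema = new mongoose.Schema({
  timestamp  : Date,     // timestamp of authentication
  realm      : String,   // domain part of username
  viscountry : String,   // visited country
  visinst    : String,   // visited institution
  csi        : String,   // mac address
  pn         : String,   // username
  result     : String    // result of authentication
});

module.exports = mongoose.model('logs', logs_schema);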
Collection defines the binding between a user and all MAC addresses which they used for successful authentication to eduroam.
Collection has following structure:
field name | data type | note |
---|---|---|
username | String | username |
addrs | Array | array of user's mac addresses |
Collection contains the mapping of users and MAC addresses which they used for successful authentication, for every day. Each user with more than 2 devices (assuming a notebook and a smartphone) is inserted. The address count and all used MAC addresses are also available.
Timestamp field is populated with artificial data, just to indicate which interval the record belongs to. Inserted timestamp is a javascript Date for the corresponding day at 00:00:00:000 (hours, minutes, seconds, milliseconds). Lowest distinction interval for timestamp is 24 hours.
Collection has following structure:
field name | data type | note |
---|---|---|
username | String | username |
count | Number | mac addresses count |
addrs | Array | Array of mac addresses |
timestamp | Date | timestamp |
Collection contains roaming related data. For every existing institution there is the number of provided and used roamings for every day.
Timestamp field is populated with artificial data, just to indicate which interval the record belongs to. Inserted timestamp is a javascript Date for the corresponding day at 00:00:00:000 (hours, minutes, seconds, milliseconds). Lowest distinction interval for timestamp is 24 hours.
Collection has following structure:
field name | data type | note |
---|---|---|
inst_name | String | name of the institution |
used_count | Number | count of institution's users authenticated |
provided_count | Number | count of authentications provided |
timestamp | Date | timestamp |
Collection contains information about users with unsuccessful authentications, for every day. Any user with at least one unsuccessful authentication is inserted. Numbers of both successful and unsuccessful authentications are available. There is also a field representing the ratio (see below).
Timestamp field is populated with artificial data, just to indicate which interval the record belongs to. Inserted timestamp is a javascript Date for the corresponding day at 00:00:00:000 (hours, minutes, seconds, milliseconds). Lowest distinction interval for timestamp is 24 hours.
Collection has following structure:
field name | data type | note |
---|---|---|
username | String | username |
timestamp | Date | timestamp |
fail_count | Number | count of failed login attempts |
ok_count | Number | count of successful login attempts |
ratio | Number | ratio of fail_count to (ok_count + fail_count) |
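The collection is generated by the cron task cron/failed_logins.js. As an illustration only, a mongo shell aggregation producing the same fields for one day could look like the sketch below (it assumes the RESULT values in the logs collection are OK and FAIL):
// illustrative mongo shell sketch only - the real task is cron/failed_logins.js
var day_start = ISODate("2016-10-07T00:00:00Z");      // illustrative day
var day_end   = ISODate("2016-10-08T00:00:00Z");

db.logs.aggregate([
  { $match : { timestamp : { $gte : day_start, $lt : day_end } } },
  { $group : {
      _id        : "$pn",
      fail_count : { $sum : { $cond : [ { $eq : [ "$result", "FAIL" ] }, 1, 0 ] } },
      ok_count   : { $sum : { $cond : [ { $eq : [ "$result", "OK" ] }, 1, 0 ] } }
  } },
  { $match : { fail_count : { $gt : 0 } } },          // only users with at least one failed login
  { $project : {
      _id : 0, username : "$_id", fail_count : 1, ok_count : 1,
      ratio : { $divide : [ "$fail_count", { $add : [ "$ok_count", "$fail_count" ] } ] },
      timestamp : { $literal : day_start }
  } }
])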
Collection contains the count of logins for realms. Both successful and unsuccessful logins are counted. Values are saved in ok_count and fail_count. Unique values are also gathered; they are saved in grouped_ok_count and grouped_fail_count.
Timestamp field is populated with artificial data, just to indicate which interval the record belongs to. Inserted timestamp is a javascript Date for the corresponding day at 00:00:00:000 (hours, minutes, seconds, milliseconds). Lowest distinction interval for timestamp is 24 hours.
Collection has following structure:
field name | data type | note |
---|---|---|
timestamp | Date | timestamp |
realm | String | realm |
ok_count | Number | count of successful logins |
grouped_ok_count | Number | unique count of successful logins |
fail_count | Number | count of unsuccessful logins |
grouped_fail_count | Number | unique count of unsuccessful logins |
Collection contains the count of logins for visited institutions. Both successful and unsuccessful logins are counted. Values are saved in ok_count and fail_count. Unique values are also gathered; they are saved in grouped_ok_count and grouped_fail_count.
Timestamp field is populated with artificial data, just to indicate which interval the record belongs to. Inserted timestamp is a javascript Date for the corresponding day at 00:00:00:000 (hours, minutes, seconds, milliseconds). Lowest distinction interval for timestamp is 24 hours.
Collection has following structure:
field name | data type | note |
---|---|---|
timestamp | Date | timestamp |
realm | String | visited institution |
ok_count | Number | count of successful logins |
grouped_ok_count | Number | unique count of successful logins |
fail_count | Number | count of unsuccessful logins |
grouped_fail_count | Number | unique count of unsuccessful logins |
Collection contains data about users which logged in at different locations concurrently. For a user to be in the collection, the time difference between the authentication at the first visited institution and at the second visited institution must be lower than time_needed. The value of the time_needed field is computed from geographical information about the institutions and a possible travel speed.
Timestamp field is populated with artificial data, just to indicate which interval the record belongs to. Inserted timestamp is a javascript Date for the corresponding day at 00:00:00:000 (hours, minutes, seconds, milliseconds). Lowest distinction interval for timestamp is 24 hours.
Collection has following structure:
field name | data type | note |
---|---|---|
timestamp | Date | timestamp |
timestamp_1 | Date | timestamp of first authentication |
timestamp_2 | Date | timestamp of second authentication |
visinst_1 | String | first visited institution |
visinst_2 | String | second visited institution |
username | String | username |
mac_address | String | MAC address related to incident |
time_needed | Number | time needed to travel from visinst_1 to visinst_2 in seconds |
dist | Number | distance between institutions in meters |
revision | Number | revision number |
Input data are stored in scripts/concurrent_users/inst.json.
Data are converted from a source XML document which contains geographical data for all institutions. The conversion script used is scripts/concurrent_users/inst.pl.
Each run of the cron job which computes new collection data works with the input JSON data.
Data are automatically updated by the script scripts/concurrent_users/update_data.sh.
The script is run every Saturday at 04:30. It gets the new version of institution.xml and compares it to the locally saved one.
If the files differ, a new version of the input data is created by scripts/concurrent_users/inst.pl.
These input data are used to compute new database data for the concurrent_users collection. The newly computed data are 14 days old.
A new revision of the data is also saved.
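As an illustration only of the time_needed computation described above, a sketch working with coordinates such as those in scripts/concurrent_users/inst.json might look like this (the travel speed constant is an assumption, not the value etlog uses):
// illustrative sketch only - the real computation is part of the concurrent_users cron task
var TRAVEL_SPEED = 120 / 3.6;        // assumed maximum travel speed: 120 km/h in metres per second

// great-circle (haversine) distance between two points in metres
function distance(lat1, lon1, lat2, lon2) {
  var R = 6371000;                   // Earth radius in metres
  var to_rad = Math.PI / 180;
  var dlat = (lat2 - lat1) * to_rad;
  var dlon = (lon2 - lon1) * to_rad;
  var a = Math.sin(dlat / 2) * Math.sin(dlat / 2) +
          Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) *
          Math.sin(dlon / 2) * Math.sin(dlon / 2);
  return 2 * R * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// time in seconds needed to travel between two institutions with known coordinates
function time_needed(inst_1, inst_2) {
  var dist = distance(inst_1.lat, inst_1.lon, inst_2.lat, inst_2.lon);
  return dist / TRAVEL_SPEED;
}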
Collection contains all available revisions of the concurrent_users collection data. Data in this collection can be used to retrieve data from a specific revision.
Collection has following structure:
field name | data type | note |
---|---|---|
revisions | Array | Array of all available revisions |
Collection contains unique MAC addresses for realms for every day. Addresses of users from the realm are in the array used_addrs. Addresses of users from other institutions which used the realm as the visited institution are in the array provided_addrs.
Timestamp field is populated with artificial data, just to indicate which interval the record belongs to. Inserted timestamp is a javascript Date for the corresponding day at 00:00:00:000 (hours, minutes, seconds, milliseconds). Lowest distinction interval for timestamp is 24 hours.
Collection has following structure:
field name | data type | note |
---|---|---|
timestamp | Date | timestamp |
realm | String | name of institution |
realm_addrs | Array | array of the institution's users' addresses |
visinst_addrs | Array | array of addresses of users who visited the realm |
Collection contains an array of administrators' emails for institutions. Each institution may have administrators specified.
If the institution is defined and has administrators defined, the administrator(s) get a report once every month. For more see reports.
The only exception is the realm "cz", which does not correspond to any institution. In this case, the administrator receives reports with the most significant problems found.
Realms are hierarchical - they are domain names, which use DNS. Every institution has its domain and may have subdomains. All of these are different realms. Depending on the size of a subdomain/realm, it may be efficient for each one to have separate administration.
This collection is used to determine the administrators of a specific realm. If a realm is defined, then its administrators can be notified about events in their realm.
Collection has following structure:
field name | data type | note |
---|---|---|
realm | String | realm |
admin | String | administrator's email address |
notify_enabled | Boolean | flag if administrator should be notified |
Data insertion may be done easily by:
use etlog
db.realm_admins.insert({realm : "cvut.cz", admins : [ "administrator@cvut.cz" ]})
Data update may be done easily by:
use etlog
db.realm_admins.update({realm : "cvut.cz"}, { $addToSet : { admins : "administrator2@cvut.cz" } } )
Data may be easily erased by:
use etlog
db.realm_admins.remove({realm : "cvut.cz"})
Collection contains information about realm administrators - their possible login identities, their notification email address and the realms they administer.
Collection has following structure:
field name | data type | note |
---|---|---|
admin_login_ids | Array | Array of possible login identities |
admin_notify_address | String | admin's email address |
administered_realms | Array | Array of realms which the admin manages |
Collection contains records about MAC addresses which have been used for successful authentication with multiple different usernames, for every day.
Collection has following structure:
field name | data type | note |
---|---|---|
timestamp | Date | timestamp |
mac_address | String | MAC address |
users | Array | Array of users, which have used specific MAC address |
count | Number | number of users |
Collection contains all known realms from the Czech Republic.
Collection has following structure:
field name | data type | note |
---|---|---|
realm | String | realm |
Collection contains all IP addresses, which are allowed to do machine processing of data.
Collection has following structure:
field name | data type | note |
---|---|---|
ip | String | IP address |
hostname | String | hostname of IP address |
comment | String | comment |
Collection contains data for every known realm (see realms) for every day. The attribute realm represents the institution from which the users roam. The attribute institutions is an array which contains an institution name (also named realm) and a count. This array represents the institutions visited by users of the specific realm on the specific day.
Collection has following structure:
field name | data type | note |
---|---|---|
timestamp | Date | timestamp |
realm | String | institution name |
institutions | Array | array of other institutions |
One record may look like:
{
"_id" : ObjectId("5812681deb7bfee4dcde417d"),
"realm" : "ufa.cas.cz",
"timestamp" : ISODate("2016-10-25T22:00:00Z"),
"institutions" : [
{
"count" : 1,
"realm" : "utia.cas.cz"
},
{
"count" : 9,
"realm" : "asu.cas.cz"
},
{
"count" : 17,
"realm" : "ig.cas.cz"
}
]
}
Collection contains user sessions. Collection data are managed by connect-mongo. Data are updated dynamically based on user authentication and role changes. All relevant information for each authenticated user is stored.
Indexes are used to speed up queries. Following indexes are used:
collection name | indexed fields | note |
---|---|---|
failed_logins | _id, timestamp | |
logs | _id, timestamp, realm, visinst, pn, csi, result | |
realms | _id, realm | |
mac_count | _id, timestamp | |
shared_mac | _id, mac_address | |
privileged_ips | _id | |
realm_admins | _id | |
roaming | _id, timestamp | |
users_mac | _id, username | |
heat_map | _id, timestamp, realm | |
realm_logins | _id, timestamp, realm | |
visinst_logins | _id, timestamp, realm | |
unique_users | _id, timestamp, realm | |
concurrent_users | _id, timestamp, username | |
sessions | _id, expires | |
realm_admin_logins | _id, admin |
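The indexes actually used are kept in scripts/indexes.js. Creating some of them manually in the mongo shell would look, for example, like this:
use etlog
db.logs.createIndex({ timestamp : 1 })
db.logs.createIndex({ realm : 1 })
db.failed_logins.createIndex({ timestamp : 1 })
db.concurrent_users.createIndex({ username : 1 })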
The application produces periodic reports. A report is an email sent to eduroam administrators.
The weekly report is sent only to the national RADIUS administrator. It contains information about invalid records from the past week.
The monthly report is sent to all administrators defined in realm_admins which have the notify_enabled flag set to true. It contains the 100 users with the most failed logins from the corresponding realm. The limit of 100 users is defined in config.js.
Report configuration is located in the config directory.
Weekly reports configuration is located in config/invalid_records_mail. Link to the code generating the report content.
Monthly reports configuration is located in config/config.js. Link to the code generating the report content.
The application contains three privilege levels - user, realm admin and admin. The user is just a regular user with no special permissions and is the least privileged one. The realm admin is an admin of some specific realm(s). The admin is a global admin of all existing realms.
The authentication mechanism can provide additional information about users.
Based on the provided information the user can be recognized as a realm admin or an admin.
Mapping of the groups provided by the authentication process to privilege levels is defined in config/config.js.
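The real logic lives in auth.js and the group mapping in config/config.js. The middleware below is only an illustrative sketch of how the headers set by apache (REMOTE_USER, entitlement, eduroamUID) could be turned into a privilege level; the entitlement values are made up:
// illustrative sketch only - the real logic is in auth.js, the mapping in config/config.js
// header names correspond to the RequestHeader directives in the apache configuration
module.exports = function(req, res, next) {
  var user        = req.headers['remote_user'];     // eduPersonPrincipalName
  var eduroam_uid = req.headers['eduroamuid'];      // eduroam identity, if released by the IdP
  var entitlement = req.headers['entitlement'] || '';

  req.session.username = eduroam_uid || user;       // fall back to eduPersonPrincipalName

  if (entitlement.indexOf('example:admin') != -1)             // made-up group value
    req.session.role = 'admin';
  else if (entitlement.indexOf('example:realm_admin') != -1)  // made-up group value
    req.session.role = 'realm_admin';
  else
    req.session.role = 'user';

  next();
};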
/home/etlog/etlog - application root
|-- app.js - main application file, contains application configuration
|-- auth.js - authentication configuration
|-- bin
`-- www - script to start the application
|-- cert - certificate related files
|-- config - configuration files for reports
|-- cron - cron tasks
`-- delete_logs.js - cron task for deleting old data from logs collection
`-- failed_logins.js - cron task for generating failed_logins collection data
`-- heat_map.js - cron task for generating heat_map collection data
`-- mac_count.js - cron task for generating mac_count collection data
`-- roaming.js - cron task for generating roaming collection data
`-- service_state.js - cron task for checking service state in all known realms
`-- shared_mac.js - cron task for generating shared mac address data
`-- succ_logins.js - cron task for generating succ_logins collection data
`-- users_mac.js - cron task for mapping users and mac addresses
|-- cron.js - cron tasks definition
|-- db.js - database and schema configuration
|-- doc - documentation
|-- error_handling.js - middleware error handlers
|-- gulpfile.js - definition of gulp tasks
|-- javascripts - directory with source frontend javascript files
|-- LICENSE - project LICENSE
|-- mail.js - mail api
|-- mongo_queries - directory with mongo shell queries for debugging purposes
|-- node_modules - application dependency files
|-- package.json - definition of application dependencies and properties
|-- public - directory for referring public files
`-- partials - directory for generated html files from pug templates
|-- README.md - link to doc/notes.md
|-- request.js - wrapper to backend api
|-- routes - application routes
|-- routes.js - mapping of routes to application
|-- scripts - various scripts
`-- archive.sh - script for archiving old data
`-- data_import.sh - cron script to import live data delivered by syslog
`-- detection_data - files for generating service state detection data
`-- fticks_to_bson.sh - transformation script from fticks to bson
`-- indexes.js - simple file with used indexes
`-- invalid_records_mail.sh - script for generating weekly invalid records report
`-- invalid_records.sh - script for generating invalid record files
`-- old_data.sh - script to import old data
`-- process_old_data.js - script to generate database data from old data
|-- stylesheets - source frontend css files
|-- views - templates of displayed pages
`-- templates - directory with pug templates for html pages
Gulp is a build system, which can be used for various tasks.
Gulp must be installed globally by root user by typing:
npm install -g gulp-cli
Everything that gulp does is defined in gulpfile.js.
After defining tasks, they can be run by using gulp.
When no particular task is given as a gulp parameter, all tasks are run.
Gulp is used to generate html files from pug templating language.
Pug files are in views/templates/, html output is in public/partials.
Task is run by gulp views.
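The real task definitions are in gulpfile.js. A minimal sketch of such a views task, assuming the gulp-pug plugin, could look like:
// illustrative sketch only - the real tasks are defined in gulpfile.js
var gulp = require('gulp');
var pug  = require('gulp-pug');

// compile pug templates from views/templates/ into html files in public/partials
gulp.task('views', function() {
  return gulp.src('views/templates/**/*.pug')
    .pipe(pug())
    .pipe(gulp.dest('public/partials'));
});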
Gulp is used to generate a single css file from all used css files.
Source css files are in stylesheets, the concatenated and minified output is in public/stylesheets/app.min.css.
Task is run by gulp css.
Gulp is used to generate a single javascript file from all used javascript files.
Source javascript files are in javascripts/, the concatenated output is in public/javascripts/app.js.
Task is run by gulp js.
Everything related to log files is located in /home/etlog/logs.
/home/etlog/logs - log files root
|-- fticks - directory with log files and offset files
|-- last_date - file with date of last processed log file
|-- mongo - directory with log files generated by mongoimport
|-- transform - directory with files related to transformation from F-Ticks to BSON
`-- err-* - file containing line numbers of invalid records
`-- last_* - file containing number of last processed line of corresponding log file
|-- invalid_records - directory with files containing invalid records for every day
|-- access - webserver access log files for every day
Incoming syslog data are processed by scripts/data_import.sh and subsequently by scripts/fticks_to_bson.sh.
Data are converted from the F-Ticks format (for more see this) to BSON.
Data are processed every 5 minutes by the user's crontab.
The last date file (/home/etlog/logs/last_date) contains the date of the last processed log file.
The file is updated every day, when the last part of the data is imported.
The last_* file for every processed log file contains the last processed line number. The file is updated on every cron job run. It is used to calculate absolute line numbers for error reporting.
The following filtering and replacements are done on incoming data (a sketch of these checks follows below):
- All unprintable characters (ASCII codes 0 - 31), with the exception of newline (\n, ASCII code 10), are replaced with a string representing their code. For example a backspace character (\b, ASCII code 8) in the data is replaced with the string "<8>".
- Backslash ('\') and quote ('"') are escaped: '\' becomes '\\' and '"' becomes '\"'.
- The correct number of fields in each record is checked: each record must contain exactly 7 fields - REALM, VISCOUNTRY, VISINST, CSI, PN, RESULT + (initial log part).
- Each of the attributes REALM, VISCOUNTRY, VISINST and RESULT must be separated from its value by exactly one character '=' and its value must not be empty.
- The VISINST value must begin with the character '1'.
- The CSI value after normalization (all byte separators are deleted - e.g. 123456789abc) must be 12 characters long.
Data which do not meet the filtering criteria are considered invalid and are not imported to the database. Information about invalid records is printed to error log files - see error log files.
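The checks above are implemented by the shell script scripts/fticks_to_bson.sh. The JavaScript function below only illustrates the same rules applied to a single record; the '#' field separator of F-Ticks records is an assumption of this sketch:
// illustrative sketch only - the real implementation is scripts/fticks_to_bson.sh
function valid_record(line) {
  var fields = line.split('#').filter(function(f) { return f.length > 0; });
  if (fields.length != 7) return false;              // initial log part + 6 attributes

  var attrs = {};
  fields.slice(1).forEach(function(f) {
    var pos = f.indexOf('=');
    if (pos > 0) attrs[f.substring(0, pos)] = f.substring(pos + 1);
  });

  // REALM, VISCOUNTRY, VISINST and RESULT must have a non-empty value
  if (!attrs.REALM || !attrs.VISCOUNTRY || !attrs.VISINST || !attrs.RESULT) return false;

  // VISINST value must begin with the character '1'
  if (attrs.VISINST.charAt(0) != '1') return false;

  // normalized CSI (mac address with separators removed) must be 12 characters long
  var csi = (attrs.CSI || '').replace(/[^0-9a-fA-F]/g, '');
  if (csi.length != 12) return false;

  return true;
}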
Transform error log file has this structure:
filename:line number:error reason
Transform error log file may look like:
/home/etlog/logs/fticks/fticks-2016-10-20:681871: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:682504: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:683314: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:684293: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:685727: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:686547: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:688106: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:688122: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:688317: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:689784: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:690431: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:690872: skipped, general error in parsing current record
/home/etlog/logs/fticks/fticks-2016-10-20:692246: skipped, invalid mac address
Older data are archived to save space. Data are archived every Monday at 6:05. F-Ticks files, transform log files and invalid records between one and two weeks old are compressed. gzip is used for compression.
For ease of use, a mapping from query strings to MongoDB queries is provided. The module api-query-params is used for this functionality. The official documentation provides full information on how to use it. The module is slightly modified to support various timestamp formats and to map them correctly to the backend api.
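As an illustration of what the upstream module produces (depending on the module version the parsing function may be exposed as the default export):
// illustrative sketch only - etlog uses a slightly modified copy of api-query-params
var aqp = require('api-query-params').default || require('api-query-params');

var query = aqp('timestamp>2016-09-20&timestamp<2016-09-30&fail_count>100&sort=-fail_count&limit=10');

// query.filter -> { timestamp : { $gt : ..., $lt : ... }, fail_count : { $gt : 100 } }
// query.sort   -> { fail_count : -1 }
// query.limit  -> 10
// the resulting parts can be passed to collection.find(), .sort() and .limit()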
Table below defines operators usage:
URI | example | explanation |
---|---|---|
key=val | type=public | equal |
key>val | count>5 | greater |
key>=val | rating>=9.5 | greater or equal |
key<val | createdAt<2016-01-01 | lower |
key<=val | score<=-5 | lower or equal |
key!=val | status!=success | not equal |
key=val1,val2 | country=GB,US | equal to one of the listed values |
key!=val1,val2 | lang!=fr,en | not equal to any of the listed values |
key | phone | exists |
!key | !email | not exists |
key=/value/<opts> | email=/@gmail\.com$/i | regex equal |
key!=/value/<opts> | phone!=/^06/ | regex not equal |
Other operators usage:
operator type | example | explanation |
---|---|---|
skip | skip=10 | skip 10 items before presenting to the user |
limit | limit=10 | limit query to 10 items |
sort | sort=key | sort ascending by key |
sort | sort=-key | sort descending by key |
sort | sort=-key1,-key2 | sort descending by both key1 and key2 |
Application api:
URL | params | query string variables | note |
---|---|---|---|
/api/search/ | timestamp, pn, [ csi, result, realm, visinst] | ||
/api/failed_logins/ | timestamp, [ username, fail_count, ok_count, ratio ] | ||
/api/mac_count/ | timestamp, [ username, count, addrs ] | ||
/api/roaming/most_provided/ | timestamp, [ inst_name, provided_count ] | ||
/api/roaming/most_used/ | timestamp, [ inst_name, used_count ] | ||
/api/shared_mac/ | timestamp, [ count, mac_address, users ] | ||
/api/heat_map/ | timestamp, [ realm, institutions.realm, institutions.count ] | ||
/api/succ_logins/ | timestamp, [ username, count ] | ||
/api/db_data/ | url with current data state | ||
/api/realms/ | url returning list of realms from realms collection | ||
/api/realm_logins | timestamp, [ realm ] | ||
/api/visinst_logins | timestamp, [ realm ] | ||
/api/unique_users/realm | timestamp, realm | ||
/api/unique_users/visinst | timestamp, realm | ||
/api/concurrent_users | timestamp, [ username, visinst_1, visinst_2, revision, diff_needed_timediff ] | ||
/api/concurrent_inst | timestamp | ||
/api/count/mac_count | timestamp, [ username, count, addrs ] | returns count of records for mac_count collection | |
/api/count/shared_mac | timestamp, [ count, mac_address, users ] | returns count of records for shared_mac collection | |
/api/count/concurrent_users | timestamp, [ username, visinst_1, visinst_2, revision, diff_needed_timediff ] | returns count of records for concurrent_users collection | |
/api/count/logs | timestamp, [ pn, csi, realm, visinst, result ] | returns count of records for logs collection |
Examples below use the curl command, but any other method (wget, browser, ...) of retrieving HTTP content can be used.
Some of the commands below may take some time (on the order of seconds) to complete.
curl 'https://etlog.cesnet.cz/api/mac_count/?timestamp=2016-10-07'
curl 'https://etlog.cesnet.cz/api/roaming/most_provided/?timestamp=2016-10-07'
curl 'https://etlog.cesnet.cz/api/roaming/most_used/?timestamp=2016-10-07'
curl 'https://etlog.cesnet.cz/api/failed_logins/?timestamp=2016-10-07'
curl 'https://etlog.cesnet.cz/api/shared_mac/?timestamp=2016-10-07'
curl 'https://etlog.cesnet.cz/api/heat_map/?timestamp=2016-10-07'
# get mac count records for 2016-10-07 with more than 5 mac addresses
curl 'https://etlog.cesnet.cz/api/mac_count/?timestamp=2016-10-07&count>5'
# get mac count records for 2016-10-07 with more than 5 mac addresses, sort from most to least
curl 'https://etlog.cesnet.cz/api/mac_count/?timestamp=2016-10-07&count>5&sort=-count'
# get mac count records for 2016-10-07 with mac address count between 5 and 15, sort from most to least
curl 'https://etlog.cesnet.cz/api/mac_count/?timestamp=2016-10-07&count>5&count<15&sort=-count'
# get most provided roaming records for 2016-10-07 with more than 1000 provided roamings, sort from most to least
curl 'https://etlog.cesnet.cz/api/roaming/most_provided/?timestamp=2016-10-07&provided_count>1000&sort=-count'
# get most used roaming records for 2016-10-07 with more than 100 used roamings, sort from most to least
curl 'https://etlog.cesnet.cz/api/roaming/most_used/?timestamp=2016-10-07&used_count>100&sort=-count'
# get failed logins records for 2016-10-07 with ratio between 0.4 and 0.9, sort from most to least
curl 'https://etlog.cesnet.cz/api/failed_logins/?timestamp=2016-10-07&ratio>0.4&ratio<0.9&sort=-ratio'
# get failed logins records for 2016-10-07 only for users with realms ending '.cz' with fail count more than 500, sort from most to least
curl 'https://etlog.cesnet.cz/api/failed_logins/?username=/\.cz$/&timestamp=2016-10-07&fail_count>500&sort=-fail_count'
# get failed logins records from 2016-09-20 to 2016-09-30 only for users with realms ending '.edu'
# with fail count more than 100, sort from most to least
curl 'https://etlog.cesnet.cz/api/failed_logins/?username=/\.edu$/&timestamp>2016-09-20&timestamp<2016-09-30&fail_count>100&sort=-fail_count'
# get failed logins records from 2016-09-20 to 2016-10-10 only for users from realm 'fit.cvut.cz'
# with no successful logins, sort from most failed logins to least
# get only 10 results
curl 'https://etlog.cesnet.cz/api/failed_logins/?username=/.*@fit\.cvut\.cz$/&timestamp>2016-09-20&timestamp<2016-10-10&ok_count=0&sort=-fail_count&limit=10'
# get heat map data for 2016-08-30 for institution 'cvut.cz'
curl 'https://etlog.cesnet.cz/api/heat_map/?timestamp=2016-08-30&realm=cvut.cz'
# get heat map data for 2016-08-30 where institution 'vfn.cz' was the visited institution
curl 'https://etlog.cesnet.cz/api/heat_map/?timestamp=2016-08-30&institutions.realm=vfn.cz'
# get heat map data for 2016-08-30 where the visited count was more than 1000
curl 'https://etlog.cesnet.cz/api/heat_map/?timestamp=2016-08-30&institutions.count>1000'
Timestamp value must be in one of the formats in the table below. All timestamps used for querying must have hours, minutes, seconds and milliseconds set to 0, when specified. When not specified, the values for hours, minutes, seconds and milliseconds are automatically set to 0.
format | example |
---|---|
ISO-8601 | 2016-10-06T22:00:00.000Z |
reduced ISO-8601 | 2016-10-06T22:00:00 |
%Y-%m-%d | 2016-10-06 |
The frontend is built with the Pug (formerly Jade) template engine. Only the index page is loaded through the templating engine. All other pages are compiled to html by gulp. This is necessary because all other pages are loaded dynamically via Angular, which is not able to use the templating engine.
AngularJS is a complete JavaScript-based open-source front-end web application framework. It enables dynamic content manipulation through html element attributes.
Frontend has following structure:
state | url | title |
---|---|---|
search | /#/search?pn&csi | etlog: obecné vyhledávání |
mac_count | /#/mac_count | etlog: počet zařízení |
shared_mac | /#/shared_mac | etlog: sdílená zařízení |
failed_logins | /#/failed_logins | etlog: neúspěšná přihlášení |
heat_map | /#/heat_map | etlog: mapa roamingu |
orgs_roaming_most_used | /#/orgs_roaming_most_provided | etlog: organizace nejvíce poskytující konektivitu |
orgs_roaming_most_provided | /#/orgs_roaming_most_used | etlog: organizace nejvíce využívající roaming |
roaming_activity | /#/roaming_activity | etlog: aktivita eduroamu |
detection_data | /#/detection_data | etlog: absolutní počet přihlášení |
detection_data_grouped | /#/detection_data_grouped | etlog: normalizovaný počet přihlíšení |
notifications | /#/notifications | etlog: správa notifikací |
Application api is described in section API. This section describes classic html pages.
URL | explanation |
---|---|
/ | title page |
The application is integrated into the system with the use of systemd. systemd is an init system used in Linux distributions.
Service configuration is in /etc/systemd/system/etlog.service.
File contents:
[Service]
ExecStart=/usr/bin/npm --prefix /home/etlog/etlog/ start
WorkingDirectory=/home/etlog/etlog/
Restart=always
StandardOutput=syslog
StandardError=syslog
SyslogFacility=local0
SyslogLevel=info
SyslogIdentifier=etlog
User=etlog
Group=etlog
[Install]
WantedBy=multi-user.target
Service is enabled by systemctl enable etlog.
Service is launched by systemctl start etlog.
In case the application crashes for some reason, systemd automatically restarts it.