Releases: MI-DPLA/combine
v0.11
- localsettings template for docker changed to reflect use of nginx instead of internal static IP addresses
- mysql port for docker changed to 3307 to facilitate local integration tests run from outside Docker
- added localsettings template for testing
- added new settings LIVY_UI_HOME, SPARK_HOST, ES_UI_HOME, ENABLE_PYTHON, CELERY_RPC_SERVER; removed setting APP_HOST (see the illustrative sketch at the end of this list)
- set the admin site site_url to '/combine'
- added line to log traceback with errors under DEBUG
- altered Validation Scenario, Transformation Scenario, Field Mapper, and Record Identifier Transformation Scenario to prohibit adding python code by default (can be enabled with server setting ENABLE_PYTHON; existing scenarios will still work but can't be modified without the setting)
- added createsuperuser django management command to facilitate docker build script
- allowed Record Groups to be sorted by the date of their most recently run Job
- fixed local transformation includes on import
- ensured all endpoints that should require a login are actually protected
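The exact values for the new settings listed above depend on your deployment; the localsettings.py sketch below is illustrative only, with every value an assumption rather than a shipped default, so consult the updated localsettings templates for the real examples:
# new settings introduced in v0.11 -- all values below are illustrative assumptions
LIVY_UI_HOME = 'http://localhost:8998'    # assumed location of the Livy UI
SPARK_HOST = 'localhost'                  # assumed Spark host
ES_UI_HOME = 'http://localhost:9200'      # assumed location of the ElasticSearch UI/API
ENABLE_PYTHON = 'false'                   # python code in scenarios remains disabled unless enabled here
CELERY_RPC_SERVER = 'redis'               # assumed identifier for the Celery RPC backend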
v0.10
This is a really big release! In addition to all the changes called out here, the codebase has been refactored and cleaned up quite a bit. Some of the dependencies have also been updated, including ElasticSearch. You may need to re-index your Jobs if upgrading in place.
Added
- Add Configuration page to allow editing Validation Scenarios, Transformations, OAI Endpoints, etc. inside the Combine user interface #87
- Allow changing the Publish Set ID on a Job without unpublishing/republishing #407
- "Re-run all jobs" button on Organizations and Record Groups #410
- Global error recording in admin panel #430
- Add logout link #194
- Add 'include upstream Jobs' toggle to Job re-run options #358
- Include OAI harvest details in Job details #374
Changed
- FIXED: trying to view the Test Validation Scenario and related pages when a Record exists with an invalid Job ID #426
- FIXED: Malformed validation scenarios fail silently when running in a Job #431
- Give background tasks the same status display as Jobs and Exports #438
- Improve stateio status indicators #382
- Clarify wording on configuration 'payloads' #441
- FIXED: timestamp sorts #199
- FIXED: Job on rerun with invalid records still marked Valid #379
v0.9
Release Notes - v0.9
For changes see CHANGELOG
Upgrading to v0.9 (Ansible/Vagrant Server)
This version (v0.9) introduces some changes at the server level that the normal update utility cannot address. Steps to manually make these changes are outlined below for the Ansible/Vagrant Server build or Docker deployment.
Switch from standalone Spark cluster to running in local mode
To more closely align with the Docker deployment, and to reduce some complexity with little or no noticeable change in performance, this release switches the Spark application created by Livy from running in a standalone Spark cluster to running in what is called "local" mode, using N threads.
This is optional, but recommended, as future updates and releases will likely assume running in local mode.
- First, stop the Spark cluster, if it is running. The following can be run from anywhere:
# note the trailing colon, which is required
sudo supervisorctl stop spark:
- Second, prevent the Spark cluster from autostarting on reboot. Modify the file /etc/supervisor/supervisord.conf, and under the sections [program:spark_driver] and [program:spark_worker], change autostart and autorestart to false. They should then look something like the following:
[program:spark_driver]
environment =
SPARK_MASTER_HOST=0.0.0.0
command=/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
directory=/opt/spark/
autostart = false
autorestart = false
stdout_logfile = /var/log/spark/spark_driver.stdout
stderr_logfile = /var/log/spark/spark_driver.stderr
user = combine
[program:spark_worker]
command=/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077
directory=/opt/spark
autostart = false
autorestart = false
stdout_logfile = /var/log/spark/spark_worker.stdout
stderr_logfile = /var/log/spark/spark_worker.stderr
user = combine
To apply these changes, run the following:
sudo supervisorctl reread
sudo supervisorctl update
For reference's sake, the configurations and binaries to run this standalone Spark cluster remain in the build, in the event they are deemed helpful or might assist in configuring another cluster.
- Finally, update the parameter livy.spark.master in /opt/livy/conf/livy.conf to the following:
livy.spark.master = local[*]
To apply, restart Livy:
sudo supervisorctl restart livy
Update DPLA's Ingestion3 build
As outlined in this issue, Combine now builds Ingestion3 from pinned commits in the DPLA repository rather than from a forked version.
The Docker deployment ships with a .jar file already compiled, which can be used for our purposes here. Ansible/Vagrant builds as of v0.9 will build this newer, updated version of Ingestion3.
To upgrade in place:
# jump to directory where Ingestion3 jar is located
cd /opt/ingestion3/target/scala-2.11
# backup previous Ingestion3 jar file
mv ingestion3_2.11-0.0.1.jar ingestion3_2.11-0.0.1.jar.BAK
# download pre-built .jar file
wget https://github.com/WSULib/combine-docker/raw/15938f053ccdfad08e41d60e6385588a064dc062/combinelib/ingestion3_2.11-0.0.1.jar
Then, restart Livy:
sudo supervisorctl restart livy
Finally, run the update script as normal:
cd /opt/combine
source activate combine
git checkout master
git pull
pip install -r requirements.txt
./manage.py update --release v0.9
Upgrading to v0.9 (Docker)
From the Combine-Docker git repository directory on your host machine, pull changes:
git pull
Checkout tagged release:
git checkout v0.9
Run update script:
./update_build.sh
v0.8
Release Notes - v0.8
Added
- Global search of Record's mapped fields
- Ability to add Organizations, Record Groups, and/or Jobs to Published Subsets #395
- Remove temporary payloads of static harvests on Job delete #394
- Added CHANGELOG.md
Changed
- Fixed precounts for Published Subsets when included Jobs mutate #396
Upgrading to v0.8
Run the built-in update command to run any migrations, restart services, and pull in new front-end static files:
cd /opt/combine
source activate combine
git checkout master
git pull
./manage.py update --release v0.8
v0.7.1
Release Notes - v0.7.1
- bug fix and improvement of redis and celery python version pinning. Thanks @bibliotechy for finding this.
- bug fix for exporting published subset to S3
Upgrading to v0.7.1
Run the built-in update command to run any migrations, restart services, and pull in new front-end static files:
cd /opt/combine
source activate combine
git checkout master
git pull
./manage.py update --release v0.7.1
v0.7
Release Notes - v0.7
- Introduction of Published Subsets (documentation)
- ability to create subsets of all published records based on Published Set Identifiers
- creates a unique OAI endpoint for this Published Subset
- when viewing a Published Subset, all metrics, exports, and analysis jobs are filtered for this subset
- Published Subsets are included in State Import/Export exports when any Jobs (upstream or downstream) are associated with a Published Subset
- introduce small delay in firing background tasks, avoiding some potential race conditions for Job statuses
- bug fixes for State Import/Export
- pinned the python redis client to 2.10.6 (issue); see the example below
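If the pinned client does not come along automatically when upgrading in place, the pin can be applied manually with pip from within the combine environment; this is a minimal, optional sketch rather than a required upgrade step:
# pin the python redis client to the version called out above
pip install redis==2.10.6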
Upgrading to v0.7
Depending on what version of Combine you're upgrading from, it may be necessary to add the configuration MONGO_HOST to your localsettings.py configuration file. You can see an example in the localsettings.py.template file.
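As a minimal sketch, the addition to localsettings.py might look like the following; the host value mirrors the single-server example later in these notes and is an assumption for your particular environment:
# Mongo server (host value is an assumption; point it at wherever MongoDB actually runs)
MONGO_HOST = '127.0.0.1'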
Run the built-in update command to run any migrations, restart services, and pull in new front-end static files:
cd /opt/combine
source activate combine
git checkout master
git pull
./manage.py update --release v0.7
v0.6.3
Release Notes - v0.6.3
- Hot fix for a bug in v0.6.2 where records harvested via OAI-PMH that were outside of any OAI sets were missing the required oai_set column.
- Fix for pyspark_shell.sh so that it runs the Spark environment on local[*], accommodating firing in a Docker environment and not requiring the Livy session to be stopped.
Upgrading to v0.6.3
Run the built-in update command to run any migrations, restart services, and pull in new front-end static files:
./manage.py update --release v0.6.3
v0.6.2
Release Notes - v0.6.2
Includes a couple fixes / improvements:
- Closes issue #383: draggable Transformation Scenarios should now work cross-browser
- Allows for OAI-PMH harvesting of records not part of an OAI set with new "Harvest All Records" option
Upgrading to v0.6.2
Run the built-in update command to run any migrations, restart services, and pull in new front-end static files:
./manage.py update --release v0.6.2
Note: While not mandatory, it's been observed that adding the following Spark configuration may help when data is highly "skewed" in OAI harvests, meaning some sets are very large, or all records exist outside of OAI sets.
Add the following configuration to the file /opt/spark/conf/spark-defaults.conf, allowing RPC messages of up to 1024 MB:
spark.rpc.message.maxSize 1024
v0.6.1
v0.6
Release Notes - v0.6
v0.6 includes the following two major additions:
- publishing records, mapped fields, or tabular data to S3 buckets
- supports Docker deployment
- read more about that process here: https://github.com/WSULib/combine-docker
The route of building a server dedicated to Combine via Ansible will continue to be supported for the foreseeable future, but increased attention will likely go to the Docker deployment that begins with this version (v0.6).
Upgrading to v0.6
The addition of S3 publishing, and some additional configurations needed to support Dockerization, require a couple of specific changes to files.
- Update /opt/spark/conf/spark-defaults.conf. Add the following package to the setting spark.jars.packages, which allows Spark to communicate with S3:
org.apache.hadoop:hadoop-aws:2.7.3
- Add the following variables to the /opt/combine/localsettings.py file if your installation is Ansible server based (if you are deploying via Docker, these settings should be included automatically via the localsettings.py.docker file):
# Deployment type (suggested as first variable, for clarity's sake)
COMBINE_DEPLOYMENT = 'server'
# (suggested as part of "Spark Tuning" section)
TARGET_RECORDS_PER_PARTITION = 5000
# Mongo server
MONGO_HOST = '127.0.0.1'
As always, you can see examples of these settings in /opt/combine/localsettings.py.template.
Once these changes are made, it is recommended to run the update management command to install any required dependencies, pull in GUI changes, and restart everything:
# from /opt/combine
./manage.py update