-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GPII-3138: Update snapsets in data base #626
Conversation
Modified vagrantCloudBasedContainers.sh script to make use of the changes in the GPII-3138 branch of gpii/gpii-dataloader
Moved deleteSnapset.js from gpii-dataloader to universal's script folder.
Fixed some (grievous) typos.
Fixed erroneous call to fluid.error() -- replaced with fluid.log().
Removed hard-coded host ("localhost") and port for the CouchDB URL and used the actual value passed in on the command line.
- Replaced all occurrences of forEach() with fluid.each(). - Properly set up shell environment variable NODE_PATH.
Added more log messages.
CI job failed: https://ci.gpii.net/job/universal-tests/945/ |
scripts/deleteSnapsets.js
Outdated
var dbLoader = gpii.dataLoader; | ||
dbLoader.couchDbUrl = process.argv[2]; | ||
if (!fluid.isValue(dbLoader.couchDbUrl)) { | ||
fluid.log ("COUCHDB_URL environment variable must be defined"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be useful to also output the command usage: node deleteSnapsets.js $COUCHDBURL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
scripts/deleteSnapsets.js
Outdated
} | ||
fluid.log("COUCHDB_URL: '" + dbLoader.couchDbUrl + "'"); | ||
dbLoader.prefsSafesViewUrl = dbLoader.couchDbUrl + "/_design/views/_view/findSnapsetPrefsSafes"; | ||
dbLoader.gpiiKeyViewUrl = dbLoader.couchDbUrl + "/_design/views/_view/findGpiiKeysByPrefsSafeId"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think to define the full url here to provide a general view: dbLoader.gpiiKeyViewUrl = dbLoader.couchDbUrl + "/_design/views/_view/findGpiiKeysByPrefsSafeId%22%gpiiKey%22"
. When this var is used later on, replace %gpiiKey
with actual key values using fluid.stringTemplate()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we now use the get-all-gpiikeys view, this is no longer needed -- "overtaken by events".
scripts/deleteSnapsets.js
Outdated
dbLoader.addGpiiKeysAndBulkDelete = function (snapSets, docsToRemove) { | ||
fluid.each(snapSets, function (aSnapset) { | ||
var gpiiKeyId = aSnapset.value._id; | ||
fluid.log("Snapset: " + gpiiKeyId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Output a more meaningful message such as "Finding GPII keys associated with the snapset prefs safe id: ".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated all the log messages to be very simple, such as you have suggested. Actual snapset or gpiikeys internal information such as the "_id" field are no longer logged.
scripts/deleteSnapsets.js
Outdated
}; | ||
|
||
/** | ||
* Delete the snapset Prefs Safes and their associated GPII Keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function only deletes prefs safes, not gpii keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the use of promises, this now does delete all prefs safes and associated gpii keys.
scripts/deleteSnapsets.js
Outdated
fluid.log("STATUS: " + res.statusCode); | ||
fluid.log("HEADERS: " + JSON.stringify(res.headers, null, 2)); | ||
res.on('end', function () { | ||
fluid.log('Batch deletion of snapsets'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snapsets -> snapset prefs safes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the refactoring, the log message includes "Prefs Safes", and also "GPII Keys".
scripts/deleteSnapsets.js
Outdated
batchDeleteReq.end(); | ||
}; | ||
|
||
dbLoader.snapSetsRequest = http.request(dbLoader.prefsSafesViewUrl, dbLoader.processSnapsets); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The couchdb url on the GPII cloud might be using https instead http. Will find out by testing with the developer cloud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any progress on this, @cindyli ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the confusion. I meant YOU will find out when testing with the cloud, not me. It's also worth to check with @mrtyler on this information since production/staging might act differently than dev clusters.
scripts/deleteSnapsets.js
Outdated
}); | ||
getGpiiKeysRequest.end(); | ||
}); | ||
dbLoader.doBatchDelete(docsToRemove); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting all prefs safe records here at the end of the script doesn't guarantee it to be run as the last task due to the aync nature of http requests above for getting and deleting gpii keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but it's another instance of "overtaken by events": The latest code handles the asynchrony.
scripts/deleteSnapsets.js
Outdated
var gpiiKeyId = aSnapset.value._id; | ||
fluid.log("Snapset: " + gpiiKeyId); | ||
var gpiiKeyViewUrl = dbLoader.gpiiKeyViewUrl + "?key=%22" + gpiiKeyId + "%22"; | ||
var getGpiiKeysRequest = http.request(gpiiKeyViewUrl, function (resp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worrying lack of abstraction in all of this work. At the least there should be a wrapper for the process of invoking an HTTP request and waiting for responses from it as some kind of promise-producer. I don't want to slow down this work one atom so I'm not suggesting that it be re-expressed using https://github.com/fluid-project/kettle/blob/master/docs/DataSources.md#simple-example-of-using-an-http-datasource but something needs to happen to unwrap the tangle in this method which by line 88 we are 4 closures deep. It's hard to follow this expression of the algorithm, it is brittle, and seems prone to races which are already causing confusion. Also consider utilities like https://docs.fluidproject.org/infusion/development/PromisesAPI.html#fluidpromisesequencesources-options
scripts/deleteSnapsets.js
Outdated
var snapSetsString = ""; | ||
response.setEncoding("utf8"); | ||
response.on("data", function (chunk) { | ||
snapSetsString += chunk; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IANA Javascript developer but to me this looks like it implements the dataloading strategy Cindy and I have discussed.
Two areas I want to make sure we're thinking about:
-
Idempotence - the goal is that we should be able to run the dataloader over and over again, and the end result will always be that the correct data is in the database (and any old data is removed). I think this implementation satisfies that goal, but please confirm for me Joseph/Cindy.
-
Failure modes - the goal is that a failure along the way (e.g. an HTTP timeout while communicating with couchdb) should NOT leave the database in a bad or inconsistent state.
This is a nuanced topic with many possible solutions that range from "cheap but disruptive" to "resilient but expensive". A few examples:
-
Current solution: drop entire database and re-populate from scratch
- Cheap!
- Very disruptive. Ignoring the showstopping problem -- this approach destroys any data that cannot be reloaded from canned snapsets -- a user will be unable to use the system from the moment the database is dropped until the Snapset the user wants is re-uploaded (on the order of 10s of seconds).
-
Proposed solution in this PR: delete all snapsets in a first pass, re-populate snapsets in a second pass
- More expensive (the cost of writing the code in this PR, basically)
- Less disruptive. This solves the showstopping problem above, and narrows the window where a user is affected (on the order of seconds).
-
Possible future solution: delete each snapset individually, then re-upload that snapset immediately
- A bit more expensive to implement
- A bit less disruptive. Narrows the window where a user is affected (on the order of deciseconds).
Note that these are all "happy paths". If the dataloading process halts due to an error after it has deleted a snapset but before it has re-uploaded that snapset, the window where users are affected grows to the time it takes the dataloader process to be run again (on the order of minutes or tens of minutes).
First, are my concerns clear? If not, I'm happy to elaborate and clarify, perhaps in a real-time chat.
Second, are my concerns worth considering now? The solution in this PR is better than what we have today, and deadlines are looming, so perhaps this is sufficient. My main purpose in raising these questions is so that everyone is thinking about failure modes and how to handle them since failure on the internet is inevitable :p.
scripts/deleteSnapsets.js
Outdated
fluid.log ("COUCHDB_URL environment variable must be defined"); | ||
process.exit(1); | ||
} | ||
fluid.log("COUCHDB_URL: '" + dbLoader.couchDbUrl + "'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
COUCHDB_URL can (and usually will) contain credentials. Please do not write it to the log without sanitization.
It looks like url.parse
may be helpful in reporting useful data like hostname without reporting sensitive data like password.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out, @mrtyler. I've gone with logging the
- protocol
- host
- port
- pathname
I don't think any of those contain sensitive information, but I could be wrong. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
scripts/deleteSnapsets.js
Outdated
}); | ||
}); | ||
getGpiiKeysRequest.on("error", function (e) { | ||
fluid.log("Error finding snapsets' associated GPII Keys: " + e.message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be "snapset's", unless the error message is going to contain errors from all the snapsets that failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The closest log message to this old one now has no apostrophe: "Error finding snapset Prefs Safes associated GPII Keys".
scripts/deleteSnapsets.js
Outdated
fluid.log("STATUS: " + res.statusCode); | ||
fluid.log("HEADERS: " + JSON.stringify(res.headers, null, 2)); | ||
res.on('end', function () { | ||
fluid.log('Batch deletion of snapsets'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is clearer as "Finished batch deletion of..."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this one is now "Bulk deletion completed."
GPII-3138: Add a couchDB view to return all GPII keys
CI job failed: https://ci.gpii.net/job/universal-tests/968/ |
@cindyli, @mrtyler, @amb26 I'm pushing the latest code that uses promises. But, this is not complete as I have yet to migrate the code that loads the snapsets and their keys using bash, as you suggested @cindyli. I wanted to test what I have so far using the docker image and see if there are any problems. |
- Refactored deleteSnapsets.js to use promises. - Modified Dockerfile to remove dependency on NODE_ENV, as it is not needed by the gpii-dataloader script built into the docker image.
CI job failed: https://ci.gpii.net/job/universal-tests/977/ |
scripts/deleteSnapsets.js
Outdated
dbLoader.snapsetPrefsSafes.push(aSnapset.value); | ||
}); | ||
fluid.log("\tSnapset Prefs Safes marked for deletion."); | ||
return dbLoader.snapsetPrefsSafes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line seems unnecessary because dbLoader.snapsetPrefsSafes
is a global variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and nothing actually uses the return value. Removed.
scripts/deleteSnapsets.js
Outdated
gpiiKeyRecords, dbLoader.snapsetPrefsSafes | ||
); | ||
fluid.log("\tGPII Keys associated with snapset Prefs Safes marked for deletion."); | ||
return dbLoader.gpiiKeys; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
I believe these snapset data are already in the production and staging databases and marked as "user" type. We can do a one time cleanup to wipe them out before the new dataloader goes alive. |
Okay, I won't add that one time cleanup to the new dataloader. |
scripts/deleteAndLoadSnapsets.js
Outdated
dbOptions.couchDbUrl = processArgv[2]; | ||
dbOptions.staticDataDir = processArgv[3]; | ||
dbOptions.buildDataDir = processArgv[4]; | ||
if (processArgv.length > 5 && processArgv[5] === "--justDelete") { // for debugging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can just be dbOptions.justDelete = processArgv.length > 5 && processArgv[5] === "--justDelete";
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay.
scripts/deleteAndLoadSnapsets.js
Outdated
return dbOptions; | ||
}; | ||
|
||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turn this into a proper doc comment that gets linted. Add a special warning that the input argument will be modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of a warning, I've documented the properties added to the options
input parameter in the function description. It would be nice to be able to use some form of @return block tag here, but everything I tried failed to lint.
scripts/deleteAndLoadSnapsets.js
Outdated
fluid.log("\tViews data " + ( views ? "retrieved." : "missing." )); | ||
}; | ||
|
||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** To make this a proper doc comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the following 3 - fix globally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching them -- done, globally.
scripts/deleteAndLoadSnapsets.js
Outdated
|
||
/* | ||
* Create the step that retrieves the current views from the database. | ||
* @param {Object} options - Object containing the views URL into the database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be a little clearer about what fields are expected/allowed in this options block, possibly using a JSDocs @typedef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below.
scripts/deleteAndLoadSnapsets.js
Outdated
/** | ||
* Generate a response handler, setting up the given promise to resolve/reject | ||
* at the correct time. | ||
* @param {Function} handleEnd - Function to call that deals with the response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document signature of this function, possibly using an @callback JSDocs directive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below.
scripts/deleteAndLoadSnapsets.js
Outdated
* return. It is up to the caller to trigger the request by calling its end() | ||
* function. | ||
* @param {String} databaseURL - URL to query the database with. | ||
* @param {Function} handleResponse - callback that processes the response from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, use @callback/@typedef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below.
scripts/deleteAndLoadSnapsets.js
Outdated
* Utility to configure a step: creates a response callback, binds it to an | ||
* http database request, and configures a promise to resolve/reject when the | ||
* response callback finishes or fails. | ||
* @param {Object} details - Specific information for the request and response, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document mandatory fields in this object either inline here as with http://usejsdoc.org/tags-param.html#parameters-with-properties or else using a JSDocs @typedef http://usejsdoc.org/tags-typedef.html if this options structure appears in the signature of other functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've defined a few @typedef and @callback block tags and used them accordingly throughout. And, where options
is a parameter, I've followed the Parameters with properties JSDoc section, listing the properties of options
specifically used within the function.
That said, this function, gpii.dataLoader.configureStep()
, was a struggle because it is abstract and generic. The actual option properties and their meaning is defined in the specific "set up" functions that ultimately call this one. It's tricky being informative and vague at the same time...
Modified based on Antranig's comments: - modified the JSDoc documentation to better explain the rationale behind each function, its input parameters and outputs.
CI job passed: https://ci.gpii.net/job/universal-tests/1196/ |
README.md
Outdated
`%gpii-universal/build/dbData/snapset/` folder. These are used to update the snapsets in CouchDB when GPII is | ||
run in a production or staging configuration. | ||
* They are also converted into `user` preferences safes and GPII keys and placed into the | ||
`%gpii-universal/build/dbData/user/` folder. These are used with PouchDB when GPII runs in a development |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When GPII runs in a development configuration, it also uses snapset-type data from %gpii-universal/build/dbData/snapset/
. See pouchManager code.
%gpii-universal/build/dbData/user/
is only used for running integration tests. See PouchTestCaseHolder.
The doc in testData/dbData/README.txt for these data sets are correct. Please sync up. Thanks.
README.md
Outdated
* The preferences files for running GPII and for integration tests are located at | ||
`%gpii-universal/testData/preferences`. These files are converted into two types of preferences safes and GPII keys: | ||
* They are converted into `snapset` preferences safes and GPII keys and placed into the | ||
`%gpii-universal/build/dbData/snapset/` folder. These are used to update the snapsets in CouchDB when GPII is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading "when GPII is run in a production or staging configuration" reminds me the use of config files in gpii/configs
directory. All those configs use pouchDB at the backend.
Probably adjust this sentence to express datasets in this folder are:
- loaded into the production and staging CouchDB in the real clouds.
- loaded into the pouchDB when GPII runs locally (regardless which config is used.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cindyli I've reworked the whole section to:
- better sync up with the
testData\dbData\README.txt
, and - adjust the wording to reflect your suggestions.
- modified the main README.md to better explain the various ways that the source preferences files are converted into PrefsSafes and GPII Keys, and for which purposes they are used. - minor grammaticial changes to README.txt.
CI job failed: https://ci.gpii.net/job/universal-tests/1199/ |
Fixed markdown errors.
CI job passed: https://ci.gpii.net/job/universal-tests/1203/ |
I already confirmed this with @cindyli (thanks for help), but can you @klown please also confirm that removing following keys and corresponding prefsSafes is what we want to do for cleanup? (matches the files in https://github.com/GPII/universal/tree/master/testData/preferences)
|
@stepanstipl @cindyli There is also an And, sorry in advance for being pedantic: the prefsSafes for this one-time removal must have Using
Hope that helps |
thanks @klown . You're right that And yes, the I agree, and that's why I'm double-checking, since we're gonna do the changes on live production database, it's better to be safe than sorry! thanks for comments and if you see anything else that might be an issue pls. let me know. |
nyx.json is fixed with #630. |
Of course. Makes sense.
Oh, for sure. |
@cindyli Here is the pull request that goes with the one in gpii-dataloader, gpii-ops/gpii-dataloader#6