Fix issue 12040 #12155
@amaltaro, this is the initial logic based on the provided requirements. I would appreciate it if you could review it and let me know whether it has the expected behavior. In particular, I need a decision about persistent storage and about the acknowledgement responses sent to the upstream caller. Once we settle on this, the rest would be only the implementation of the site update/rules.
Valentin, although I did not cover 100% of your changes, I left some comments along the code.
In addition, regarding the information persisted in the filesystem: if we decide to keep writing a file per workflow, we then need to implement:
- deleting that file once data replacement has been successful
- listing all files pending for data replacement
In my opinion, the filesystem will provide only the name of the workflow that needs replacement. We then fetch the workflow from ReqMgr2 (similar to what is done by getRequestRecords()) and let it go through the service.
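The two operations listed above (listing the pending files and deleting a file after a successful replacement) could look roughly like the sketch below. It assumes one file per workflow, named `<workflow>.json`, inside a storage directory. The function names `listPendingWorkflows` and `removeWorkflowFile` and the `.json` suffix are hypothetical and not part of MSTransferor.

```python
import os


def listPendingWorkflows(storageDir):
    """Return the sorted workflow names that still have a file in storageDir."""
    if not os.path.isdir(storageDir):
        return []
    names = []
    for fname in os.listdir(storageDir):
        base, ext = os.path.splitext(fname)
        # only files following the assumed <workflow>.json convention count
        if ext == ".json":
            names.append(base)
    return sorted(names)


def removeWorkflowFile(storageDir, wflowName):
    """Delete the file of a workflow whose data replacement succeeded.

    Returns True if a file was removed and False if none existed.
    """
    fname = os.path.join(storageDir, wflowName + ".json")
    try:
        os.remove(fname)
        return True
    except FileNotFoundError:
        return False
```

With this layout the service only needs the workflow names; the full request records would still be fetched from ReqMgr2 on each cycle.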
@@ -72,6 +74,11 @@ def __init__(self, msConfig, logger=None):
        """
        super(MSTransferor, self).__init__(msConfig, logger=logger)

        # persistent area for site list processing
        wdir = '{}/storage'.format(os.getcwd())
        self.storage = self.msConfig.get('persistentArea', wdir)
We need to ensure that this area is persistent across POD restarts, so we do not lose data accidentally.
I remember we used to use something like /data/srv/state/ for database-related data.
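As a rough illustration of this point, the storage location can be resolved from the configuration and created if it does not exist yet. The `persistentArea` key and the current-directory fallback come from the diff above; the helper name `resolveStorageDir` is hypothetical.

```python
import os


def resolveStorageDir(msConfig, default=None):
    """Pick the storage directory from the config and make sure it exists."""
    if default is None:
        # fallback used in the diff above: a 'storage' folder under the cwd
        default = os.path.join(os.getcwd(), "storage")
    storage = msConfig.get("persistentArea", default)
    # exist_ok avoids a failure when the directory survived a restart
    os.makedirs(storage, exist_ok=True)
    return storage
```

Whether the directory actually survives a POD restart depends on how the volume is mounted in the deployment; the code can only make sure the configured path is used and present.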
@@ -195,6 +202,13 @@ def execute(self, reqStatus):
        self.logger.info("%d requests information completely processed.", len(reqResults))

        for wflow in reqResults:
            # perform site list updates
I think this code has to be placed outside of this for loop (L197). Otherwise it will only get executed when there are other workflows in the queue for data placement (workflows sitting in assigned).
@@ -195,6 +202,13 @@ def execute(self, reqStatus):
        self.logger.info("%d requests information completely processed.", len(reqResults))

        for wflow in reqResults:
            # perform site list updates
            errors = self._updateSites(wflow)
            if len(errors) == 0:
In practice, you are overwriting this metric with the outcome of the very last workflow.
Instead, the way it has been used so far is to provide a summary of the microservice execution cycle.
That said, my suggestion would be to define it as an integer count of how many workflows have been re-placed.
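A minimal sketch of that suggestion: accumulate a counter over the cycle rather than assigning a per-workflow value inside the loop. The function name `countReplacedWorkflows` is hypothetical, and `updateFunc` stands in for whatever call performs the site update and returns a list of errors.

```python
def countReplacedWorkflows(workflows, updateFunc):
    """Run updateFunc on each workflow and count those that return no errors."""
    replaced = 0
    for wflow in workflows:
        errors = updateFunc(wflow)
        if not errors:
            replaced += 1
    return replaced
```

The returned integer can then be reported once per execution cycle as the summary metric.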
        """
        Update sites API provides asynchronous update of Site info.

        :param doc: JSON payload with the following data structures:
From the source code, it looks like we only save the workflow name. I think that is correct, but we then need to update this docstring.
We receive this record {'workflow': <wflow name>, 'SiteWhiteList': ['T1', ...], 'SiteBlackList': ['T2', ...]} from upstream, and this is what is saved into a file with the workflow name as the file name. This lets us keep the site lists for when we need to run the business logic and avoids extra calls to the upstream service.
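For illustration only, the record described above could be checked and written to disk as follows. The three keys are taken from the comment; the helper names `validatePayload` and `savePayload`, the `.json` suffix, and the specific checks are assumptions rather than the PR's actual saveData implementation.

```python
import json
import os

REQUIRED_KEYS = ("workflow", "SiteWhiteList", "SiteBlackList")


def validatePayload(doc):
    """Return an error string for a malformed record, or '' if it looks fine."""
    for key in REQUIRED_KEYS:
        if key not in doc:
            return "missing key: {}".format(key)
    if not isinstance(doc["workflow"], str) or not doc["workflow"]:
        return "workflow must be a non-empty string"
    for key in ("SiteWhiteList", "SiteBlackList"):
        if not isinstance(doc[key], list):
            return "{} must be a list".format(key)
    return ""


def savePayload(storageDir, doc):
    """Write the record to <storageDir>/<workflow>.json and return the path."""
    fname = os.path.join(storageDir, doc["workflow"] + ".json")
    with open(fname, "w") as ostream:
        json.dump(doc, ostream)
    return fname
```

Validating before saving keeps malformed records out of the storage area, so the daemon never has to handle a half-usable file later.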
        :return: acknowledge dict to upstream caller (ReqMgr2)
        """
        # preserve provided payload to local file system
        errors = []
If this API is supposed to receive a single workflow per HTTP call (and I would say this is what we should implement), then we should convert errors from list to string type.
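If the API does handle a single workflow per call, the acknowledgement could carry one error string instead of a list. The sketch below is hedged: the response format is still under discussion in this PR, the `status` values `ok` and `fail` are taken from the diff, and the `error` key and the function name `buildAck` are hypothetical.

```python
def buildAck(error=""):
    """Build the acknowledgement for a single-workflow request."""
    if error:
        # one workflow per call means at most one error message to report
        return {"status": "fail", "error": error}
    return {"status": "ok"}
```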
        # send acknowledged message back to upstream caller
        resp = {'status': 'ok'}
        if len(errors) != 0:
            resp = {'status': 'fail', 'errors': errors}
I would suggest using the same string as we use in CouchDB, just so we keep error strings as consistent as possible. Please check out the CMSCouch.py module, which I believe sets the non ok answer.
This API is used by the HTTP end-point to return to the upstream caller. Please clearly define how the HTTP end-point should behave in both success and failure modes. In other words, if this API succeeds, what should it return: a code, nothing? And if it fails, what should it return to the upstream code: a string? How can the upstream code identify an error from the return value of this API?
And, on a separate note, why should MSTransferor, or for that matter any MS service, be compliant with how CMSCouch returns errors? I'm not criticizing, just trying to understand. Bottom line, I'm asking how any MS service should report success and failure. Is it standardized across all MS services?
I was probably mixing things up and ended up thinking that this data structure was written to CouchDB, hence reporting any potential errors from the backend database back to the user.
Seeing that I was wrong, I suggest you look into MSPileup (or perhaps pick a different MS service) to see how the server responds to the client, and which data format and content are returned, so that we keep the services as consistent as possible.
            # json.load expects a file object, so pass the stream itself
            data = json.load(istream)
        return data

    def _updateSites(self, wflow):
I would remove all this code and rely on what has already been implemented in MSTransferor, hence, just let the workflow go through the standard algorithm.
When removing this module, please do not squash commits though. Just in case I am missing any detail that would make that not possible.
I'm not sure why I need to remove it, since it is the business logic of the requested feature. How will the standard algorithm execute logic which is not there? So far, the default algorithm does not deal with sites in white/black lists. I don't understand what you require me to do here. Please elaborate.
That is why I am suggesting to have only a list of workflows that need dedicated data placement (instead of having the site lists as well).
You will, of course, have to modify the standard algorithm such that it can also consider a list of workflow(s) retrieved from somewhere else. Other than that, the rest of the logic is already implemented and there is no need for all this code duplication.
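One way to read this suggestion is sketched below: the workflows picked up by the standard algorithm are extended with those whose names come from another source, such as the persistent storage, and the combined list goes through the same logic. The function name `selectWorkflows`, the `fetchRecord` callable, and the assumption that each record is a dict keyed by `RequestName` are illustrative, not existing MSTransferor code.

```python
def selectWorkflows(assignedRecords, pendingNames, fetchRecord):
    """Combine assigned workflows with those pending data re-placement.

    assignedRecords: list of request dicts, each with a 'RequestName' key
    pendingNames: workflow names obtained from somewhere else (e.g. storage)
    fetchRecord: callable returning the request dict for a workflow name
    """
    selected = list(assignedRecords)
    seen = {rec["RequestName"] for rec in assignedRecords}
    for name in pendingNames:
        # skip workflows already queued, and duplicates in pendingNames
        if name in seen:
            continue
        selected.append(fetchRecord(name))
        seen.add(name)
    return selected
```

Because the pending names are merged before the main loop, they are processed even when no workflow is sitting in assigned.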
@amaltaro, I asked a few questions about your comments and I'm not sure you saw them, but I need your response in order to proceed with this PR. Please have a look at the PR threads where I posted my questions.
@amaltaro, this is a kind ping: in order to move forward, I'm awaiting a response to my questions in this PR.
@vkuznet you need to refresh the review request through the "Reviewers" option on the top right side, otherwise I cannot see it in the GitHub filters. In addition, please update the title of this PR and, if needed, amend the commit message as well.
Alan, this is not a review request, since I didn't make any changes; rather, it is a request to answer my questions so that I can proceed. Since I didn't update any code, I thought I should not request a review. Please see my questions in the open threads and reply to each of them directly within its thread.
Fixes #12040
Status
In development
Description
Introduce new logic to update sites and associated rules:
- /ms-transferor/data/transferor updateSites API to handle POST requests and return the status to the upstream caller (ReqMgr2)
- saveData and readData to perform IO operations for the provided JSON payload and handle its persistent storage. So far these APIs rely on the local file system, where the JSON is stored as a file whose name is the workflow name. If we decide to use other storage, e.g. a database, only these two APIs will need to change
- _updateSites API, which will be executed by the execute API of the MSTransferor daemon.
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
<If it's a follow up work; or porting a fix from a different branch, please mention them here.>
External dependencies / deployment changes
<Does it require deployment changes? Does it rely on third-party libraries?>