-
Notifications
You must be signed in to change notification settings - Fork 62
prune non-existent application details #381
Conversation
Hey paltanmoy! Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA. |
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/160247722 The labels on this github issue will be updated when the story is started. |
} | ||
// For each app check if they really exist or not via CC api call | ||
for appID := range appIds { | ||
_, err = as.cfClient.GetApp(appID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple possible errors in GetApp() calls
For example, if the status code = 200 OK, an error returned:
https://github.com/cloudfoundry-incubator/app-autoscaler/blob/9f93c7c2b4ae3091b7b409b855755555c74a6fd6/src/autoscaler/cf/app.go#L41-L45
With the code above, what will happen when the CC api endpoint is temporarily unavailable ? All GetAPP() may fail with error, then all policy might be deleted. We met such a situation in daily operations ..
Technically, we can only delete App with the following response:
RESPONSE: [2018-09-05T10:55:18+08:00]
HTTP/1.1 404 Not Found
{
"description": "The app could not be found: xxxxx",
"error_code": "CF-AppNotFound",
"code": 100004
}
Otherwise, we may false delete a valid policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cdlliuy PR updated after incorporating the review comments
src/autoscaler/cf/app.go
Outdated
var bodydata map[string]interface{} | ||
err = json.Unmarshal([]byte(respBody), &bodydata) | ||
if err != nil { | ||
c.logger.Error("failed-to-unmarshal-response-body-while-getting-app-summary", err, lager.Data{"appid": appId}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Besides resp.StatusCode != http.StatusOK , could we check whether resp.StatusCode == 404 first before parsing the response body ?
-
When CC API is available, I think the response error should be written in JSON format. But if CC API endpoint is not accessible here , the response body maybe empty or plain text . In this case, most likely the unmarshal error message talks about the json parsing error , instead of the original http response. How about adding respBody into the c.logger.Error ?
src/autoscaler/cf/app_test.go
Outdated
Description: "Server error", | ||
ErrorCode: "ServerError", | ||
Code: 100001, | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above, we can't determine there will be a JSON response returned in each failure case.
src/autoscaler/cf/app_test.go
Outdated
@@ -17,6 +18,12 @@ import ( | |||
"net/url" | |||
) | |||
|
|||
type CFSummaryResponse struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like the error response struct, then the name CFSummaryResponse is confusing
09fef38
to
c7d3763
Compare
travis build is failing for go-yaml/yaml#401 which is caused by niemeyer/gopkg#63 We have to wait till this got resolved. |
src/autoscaler/cf/app_test.go
Outdated
@@ -17,6 +18,12 @@ import ( | |||
"net/url" | |||
) | |||
|
|||
type ErrorResponse struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you may want to move this to https://github.com/cloudfoundry-incubator/app-autoscaler/blob/develop/src/autoscaler/models/errors.go together with CFAppNotFound definition
@@ -39,6 +41,32 @@ func (c *cfClient) GetApp(appId string) (*models.AppEntity, error) { | |||
defer resp.Body.Close() | |||
|
|||
if resp.StatusCode != http.StatusOK { | |||
if resp.StatusCode == 404 { | |||
respBody, err := ioutil.ReadAll(resp.Body) | |||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this piece of code can be simplified to unmarshal to CF error response struct
src/autoscaler/helpers/errors.go
Outdated
@@ -0,0 +1,14 @@ | |||
package helpers | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to models package?
@@ -109,6 +120,9 @@ var defaultConfig = Config{ | |||
LockRetryInterval: DefaultRetryInterval, | |||
LockTTL: DefaultLockTTL, | |||
}, | |||
AppSyncer: AppSyncerConfig{ | |||
SyncInterval: DefaultSyncInterval, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the DefaultSyncInterval (10 minutes) is too short. Imagine we may have 10K apps, it will take a pretty long time. And I think remove the orphan app policies is not that urgent so we probably can give it one day as default sync interval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LGTM |
* Consolidate all db maintenance of pruner to operator (#368) * Consolidate all db maintenance of pruner to operator * update per comments * wrong sample request in docs (#374) * Bug fix in liquibase change set when rename pruner to operator * Added DB schema for custom-metrics credentials management (#373) * Added DB schema for custom-metrics credentials management * Deleted binding_id column * Update api.db.changelog.yml * adapt public rest api to cf style (#371) * adapt public rest api to cf style [#159391488] * both order-direction and order are valid to compatible with old version * page * refactor router helper and update prevUrl&nextUrl method * add test cases for function getPageUrl * test case update * Instance metrics can be queried by instance index (#377) * Instance metrics can be queried by instance index [#159756219] * when instance-index is not provided, returns instance metrics of all instances * update for comments * update for comments * Added custom metrics binding credential generation and management in APIServer (#376) * Custom Metrics broker api changes * Modified Integration Test * Minor update * Incorporated Review comments * Updated Integration tests * credentials generated during binding without policy * Rechecking database for invalid credential cache entry * Incorporated Review comments * refactor config of go components (#378) * Add maven dependencies to scheduler to be compatible with JAVA9 or higher versions. (#380) [#160224877] * Check user authorization in apiserver with cc token_point instead of authorization_endpoint (#382) [#160438532] * prune non-existent application details (#381) * synchronize application details * Pruning apps only for CF-AppNotFound type of error * pruning of app is restricted to only 404 statusCode * Updated to incorporate review comments * cache metric data in metrics collector (#379) * cache metric data in metrics collector * update based on review comments * fix the race condition in test (#386) * Add http connection timeout as a configuration to all autoscaler components (#387) * Add http connection timeout as a configuration to all autoscaler components [#160241806] * rename http_request_timeout to http_client_timeout * typo * override tcp dial timeout of cfhttp httpclient (#389) * Added validation to restrict threshold value to be integer only (#392) * [auth] allow admin to access autoscaler (#393) * [auth] allow admin to access autoscaler * [auth] check token via uaa * [auth] add test cases * [auth] supplement test cases * [auth] update test * [auth] [7]string -> []string * Add db connection limitation configurations to scheduler (#395) [#161710050] * Add health endpoint for go component (#394) * Add health endpoint for go component * update for comments * Add health endpoint for scheduler (#397) [#161745454] * unify all the golang fakes (#398) [#161774071] * cut off duration of operator should be any duration rather than days (#399) * cut off duration of operator should be any duration rather than days [#161834647] * the default cut off duration is 30 days * enable cpu metric collection * Configuration enhancement for scheduler to prevent logjam vulnerability (#400) [#161873300] * Updated metric_type schema to support custom metrics (#404) * Updated metric_type schema to support custom metrics - metric_type can be alphanumeric characters and it can also have underscores * Updated apiserver policy validation * enhance integration test for bug "TLS handshake error between operator and scheduler" (#405) [#162230075]
[#159282737]