Skip to content
This repository has been archived by the owner on Dec 1, 2021. It is now read-only.

prune non-existent application details #381

Merged
merged 4 commits into from
Sep 14, 2018

Conversation

paltanmoy
Copy link
Contributor

[#159282737]

@cfdreddbot
Copy link

Hey paltanmoy!

Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA.

@cf-gitbot
Copy link
Collaborator

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/160247722

The labels on this github issue will be updated when the story is started.

}
// For each app check if they really exist or not via CC api call
for appID := range appIds {
_, err = as.cfClient.GetApp(appID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple possible errors in GetApp() calls
For example, if the status code = 200 OK, an error returned:
https://github.com/cloudfoundry-incubator/app-autoscaler/blob/9f93c7c2b4ae3091b7b409b855755555c74a6fd6/src/autoscaler/cf/app.go#L41-L45
With the code above, what will happen when the CC api endpoint is temporarily unavailable ? All GetAPP() may fail with error, then all policy might be deleted. We met such a situation in daily operations ..

Technically, we can only delete App with the following response:
RESPONSE: [2018-09-05T10:55:18+08:00]
HTTP/1.1 404 Not Found
{
"description": "The app could not be found: xxxxx",
"error_code": "CF-AppNotFound",
"code": 100004
}
Otherwise, we may false delete a valid policy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cdlliuy PR updated after incorporating the review comments

var bodydata map[string]interface{}
err = json.Unmarshal([]byte(respBody), &bodydata)
if err != nil {
c.logger.Error("failed-to-unmarshal-response-body-while-getting-app-summary", err, lager.Data{"appid": appId})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Besides resp.StatusCode != http.StatusOK , could we check whether resp.StatusCode == 404 first before parsing the response body ?

  2. When CC API is available, I think the response error should be written in JSON format. But if CC API endpoint is not accessible here , the response body maybe empty or plain text . In this case, most likely the unmarshal error message talks about the json parsing error , instead of the original http response. How about adding respBody into the c.logger.Error ?

Description: "Server error",
ErrorCode: "ServerError",
Code: 100001,
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above, we can't determine there will be a JSON response returned in each failure case.

@@ -17,6 +18,12 @@ import (
"net/url"
)

type CFSummaryResponse struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like the error response struct, then the name CFSummaryResponse is confusing

@paltanmoy
Copy link
Contributor Author

travis build is failing for go-yaml/yaml#401 which is caused by niemeyer/gopkg#63

We have to wait till this got resolved.

@@ -17,6 +18,12 @@ import (
"net/url"
)

type ErrorResponse struct {
Copy link
Contributor

@boyang9527 boyang9527 Sep 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may want to move this to https://github.com/cloudfoundry-incubator/app-autoscaler/blob/develop/src/autoscaler/models/errors.go together with CFAppNotFound definition

@@ -39,6 +41,32 @@ func (c *cfClient) GetApp(appId string) (*models.AppEntity, error) {
defer resp.Body.Close()

if resp.StatusCode != http.StatusOK {
if resp.StatusCode == 404 {
respBody, err := ioutil.ReadAll(resp.Body)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this piece of code can be simplified to unmarshal to CF error response struct

@@ -0,0 +1,14 @@
package helpers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to models package?

@@ -109,6 +120,9 @@ var defaultConfig = Config{
LockRetryInterval: DefaultRetryInterval,
LockTTL: DefaultLockTTL,
},
AppSyncer: AppSyncerConfig{
SyncInterval: DefaultSyncInterval,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the DefaultSyncInterval (10 minutes) is too short. Imagine we may have 10K apps, it will take a pretty long time. And I think remove the orphan app policies is not that urgent so we probably can give it one day as default sync interval

Copy link
Contributor

@boyang9527 boyang9527 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cdlliuy
Copy link
Contributor

cdlliuy commented Sep 14, 2018

LGTM

@cdlliuy cdlliuy merged commit 65f3d81 into cloudfoundry:develop Sep 14, 2018
boyang9527 added a commit that referenced this pull request Dec 1, 2018
* Consolidate all db maintenance of pruner to operator (#368)

* Consolidate all db maintenance of pruner to operator

* update per comments

* wrong sample request in docs (#374)

* Bug fix in liquibase change set when rename pruner to operator

* Added DB schema for custom-metrics credentials management (#373)

* Added DB schema for custom-metrics credentials management

* Deleted binding_id column

* Update api.db.changelog.yml

* adapt public rest api to cf style (#371)

* adapt public rest api to cf style

[#159391488]

* both order-direction and order are valid to compatible with old version

* page

* refactor router helper and update prevUrl&nextUrl method

* add test cases for function getPageUrl

* test case update

* Instance metrics can be queried by instance index (#377)

* Instance metrics can be queried by instance index

[#159756219]

* when instance-index is not provided, returns instance metrics of all instances

* update for comments

* update for comments

*  Added custom metrics binding credential generation and management in APIServer (#376)

* Custom Metrics broker api changes

* Modified Integration Test

* Minor update

* Incorporated Review comments

* Updated Integration tests

* credentials generated during binding without policy

* Rechecking database for invalid credential cache entry

* Incorporated Review comments

* refactor config of go components (#378)

* Add maven dependencies to scheduler to be compatible with JAVA9 or higher versions. (#380)

[#160224877]

* Check user authorization in apiserver with cc token_point instead of authorization_endpoint (#382)

[#160438532]

* prune non-existent application details (#381)

* synchronize application details

* Pruning apps only for CF-AppNotFound type of error

* pruning of app is restricted to only 404 statusCode

* Updated to incorporate review comments

* cache metric data in metrics collector (#379)

* cache metric data in metrics collector

* update based on review comments

* fix the race condition in test (#386)

* Add http connection timeout as a configuration to all autoscaler components (#387)

* Add http connection timeout as a configuration to all autoscaler components

[#160241806]

* rename http_request_timeout to http_client_timeout

* typo

* override tcp dial timeout of cfhttp httpclient (#389)

* Added validation to restrict threshold value to be integer only (#392)

* [auth] allow admin to access autoscaler (#393)

* [auth] allow admin to access autoscaler

* [auth] check token via uaa

* [auth] add test cases

* [auth] supplement test cases

* [auth] update test

* [auth] [7]string -> []string

* Add db connection limitation configurations to  scheduler (#395)

[#161710050]

* Add health endpoint for go component (#394)

* Add health endpoint for go component

* update for comments

* Add health endpoint for scheduler (#397)

[#161745454]

* unify all the golang fakes (#398)

[#161774071]

* cut off duration of operator should be any duration rather than days (#399)

* cut off duration of operator should be any duration rather than days

[#161834647]

* the default cut off duration is 30 days

* enable cpu metric collection

* Configuration enhancement for scheduler to prevent logjam vulnerability (#400)

[#161873300]

* Updated metric_type schema to support custom metrics (#404)

* Updated metric_type schema to support custom metrics

 - metric_type can be alphanumeric characters and it can also have
underscores

* Updated apiserver policy validation

* enhance integration test for bug "TLS handshake error between operator and scheduler" (#405)

[#162230075]
@paltanmoy paltanmoy deleted the sync-app-details branch January 4, 2019 06:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants