Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate novel pub/sub system test failures #1085

Closed
dhermes opened this issue Aug 25, 2015 · 14 comments
Closed

Investigate novel pub/sub system test failures #1085

dhermes opened this issue Aug 25, 2015 · 14 comments
Assignees
Labels
api: pubsub Issues related to the Pub/Sub API. testing

Comments

@dhermes
Copy link
Contributor

dhermes commented Aug 25, 2015

  1. Subscription.pull(return_immedately=False, max_messages=2) returns immediately (See 'Subscription.pull(return_immedately=False, max_messages=2)' returns immediately :( #893)
  2. Missing ACK ID
======================================================================
ERROR: test_message_pull_mode_e2e (pubsub.TestPubsub)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/GoogleCloudPlatform/gcloud-python/system_tests/pubsub.py", line 123, in test_message_pull_mode_e2e
    subscription.acknowledge(ack_ids)
  File "/home/travis/build/GoogleCloudPlatform/gcloud-python/.tox/system-tests/lib/python2.7/site-packages/gcloud/pubsub/subscription.py", line 230, in acknowledge
    method='POST', path='%s:acknowledge' % (self.path,), data=data)
  File "/home/travis/build/GoogleCloudPlatform/gcloud-python/.tox/system-tests/lib/python2.7/site-packages/gcloud/connection.py", line 419, in api_request
    error_info=method + ' ' + url)
gcloud.exceptions.BadRequest: 400 You have not specified an ack ID in the request. (POST https://pubsub.googleapis.com/v1/projects/precise-truck-742/subscriptions/subscribing-now:acknowledge)
----------------------------------------------------------------------

From:

  1. https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/75334450
  2. https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/77188348
  3. https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/77564743
  4. https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/94920686
  5. Only one message returned
======================================================================
ERROR: test_message_pull_mode_e2e (pubsub.TestPubsub)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/GoogleCloudPlatform/gcloud-python/system_tests/pubsub.py", line 129, in test_message_pull_mode_e2e
    message1, message2 = sorted(messages, key=_by_timestamp)
ValueError: need more than 1 value to unpack
----------------------------------------------------------------------

From:

  1. https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/77052107
  2. https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/98407178
  3. Eventual Consistency Error
======================================================================
FAIL: test_list_subscriptions (pubsub.TestPubsub)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/GoogleCloudPlatform/gcloud-python/system_tests/pubsub.py", line 100, in test_list_subscriptions
    self.assertEqual(len(created), len(subscriptions_to_create))
AssertionError: 2 != 3
======================================================================
FAIL: test_list_topics (pubsub.TestPubsub)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/GoogleCloudPlatform/gcloud-python/system_tests/pubsub.py", line 62, in test_list_topics
    self.assertEqual(len(created), len(topics_to_create))
AssertionError: 2 != 3
----------------------------------------------------------------------
Ran 5 tests in 66.301s
FAILED (failures=2)

From:

  1. https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/77600321
  2. https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/78109143
@dhermes dhermes added testing api: pubsub Issues related to the Pub/Sub API. labels Aug 25, 2015
@dhermes dhermes changed the title Investigate novel test failure (missing ACK ID) Investigate novel pub/sub system test failure (missing ACK ID) Aug 27, 2015
@dhermes
Copy link
Contributor Author

dhermes commented Aug 27, 2015

Same issue in https://travis-ci.org/GoogleCloudPlatform/gcloud-python/builds/75334450 (build was from #1060, previously reported in #1062)

@dhermes dhermes changed the title Investigate novel pub/sub system test failure (missing ACK ID) Investigate novel pub/sub system test failures Aug 27, 2015
@dhermes
Copy link
Contributor Author

dhermes commented Dec 4, 2015

@tseaver @tmatsuo The 1st error occurred again today.

@tmatsuo Not sure if there is a place to report this internally, but we are hitting this with a tiny amount of load and we've see this server-side flakiness 4 times now. 99% of the time it works (i.e. it knows there is an Ack ID in the request) but for some reason it doesn't always pick it up.

@tmatsuo
Copy link
Contributor

tmatsuo commented Dec 4, 2015

@dhermes
99%, I think it matches the end-to-end message delivery latency expectations at this point.

The tail latency is mainly caused by server crashes, disk crashes, or network cut etc (but we never lost any messages so far, that's great), as you can imagine, and it's almost inevitable.

Only thing I can think of is longer wait time (and more pull calls) for the tests. Is it possible to have a time limit, and pulls again and again until it reaches the time limit?

@dhermes
Copy link
Contributor Author

dhermes commented Dec 4, 2015

Ah I see. So in the source

ack_ids = [recv[0] for recv in received]
subscription.acknowledge(ack_ids)

we should check that received / ack_ids is not empty and then never send the acknowledge request?

@tmatsuo
Copy link
Contributor

tmatsuo commented Dec 4, 2015

The response can be contain 0 message, 1 message, or 2 messages. You can repeatedly call pull until you get the right amount of messages. sg?

@dhermes
Copy link
Contributor Author

dhermes commented Dec 4, 2015

can be contain message

You mean 0 messages?

@dhermes
Copy link
Contributor Author

dhermes commented Dec 4, 2015

Also

repeatedly call pull until you get the right amount of messages

does sound good

@tmatsuo
Copy link
Contributor

tmatsuo commented Dec 4, 2015

@dhermes

You mean 0 messages?
Yes, I corrected my comment :)

@dhermes
Copy link
Contributor Author

dhermes commented Dec 4, 2015

👍

@daspecster
Copy link
Contributor

@dhermes, for point 4, I tried playing with @Retry in those methods and retrying didn't seem to work.

test_create_subscription_defaults (pubsub.TestPubsub) ... ok
test_create_subscription_w_ack_deadline (pubsub.TestPubsub) ... ok
test_create_topic (pubsub.TestPubsub) ... ok
test_fetch_delete_subscription_w_deleted_topic (pubsub.TestPubsub) ... 0 != 1, Trying again in 30 seconds...
0 != 1, Trying again in 60 seconds...

0 != 1, Trying again in 120 seconds...
FAIL
test_list_subscriptions (pubsub.TestPubsub) ... 2 != 3, Trying again in 30 seconds...
2 != 3, Trying again in 60 seconds...
2 != 3, Trying again in 120 seconds...
2 != 3, Trying again in 30 seconds...
True is not false, Trying again in 60 seconds...
True is not false, Trying again in 120 seconds...
FAIL
test_list_topics (pubsub.TestPubsub) ... 2 != 3, Trying again in 30 seconds...
2 != 3, Trying again in 60 seconds...
2 != 3, Trying again in 120 seconds...
FAIL
test_message_pull_mode_e2e (pubsub.TestPubsub) ... ok
test_subscription_iam_policy (pubsub.TestPubsub) ... ok
test_topic_iam_policy (pubsub.TestPubsub) ... ok

@dhermes
Copy link
Contributor Author

dhermes commented Aug 3, 2016

@daspecster When you say you tried, did you decorate the entire test case or just the sensitive part?

@tseaver
Copy link
Contributor

tseaver commented Aug 3, 2016

#2050 fixes the "#4. Eventual consistency errors" case for me.

@daspecster
Copy link
Contributor

@dhermes just the asserts actually.

@tseaver fantastic! I'm excited for master to turn green again.

@tseaver
Copy link
Contributor

tseaver commented Aug 9, 2016

Point 3) now tracked in #2077. Point 1) is really the same issue, with the failure in parsing the empty response already fixed in #899. Point 2) is also a symptom of #2077.

@tseaver tseaver closed this as completed Aug 9, 2016
@dhermes dhermes added the flaky label Aug 11, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the Pub/Sub API. testing
Projects
None yet
Development

No branches or pull requests

4 participants