Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tagfetcherratelimit #9

Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 25 additions & 4 deletions src/musicbrainz/web/musicbrainzrecordingstask.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
#include "network/httpstatuscode.h"
#include "util/assert.h"
#include "util/compatibility.h"
#include "util/duration.h"
#include "util/logger.h"
#include "util/thread_affinity.h"
#include "util/versionstore.h"
Expand All @@ -26,8 +27,12 @@ const QString kRequestPath = QStringLiteral("/ws/2/recording/");

const QByteArray kUserAgentRawHeaderKey = "User-Agent";

// MusicBrainz allows only a single request per second
constexpr int kRateLimitMillis = 1000;
// According to the MusicBrainz API Doc: https://musicbrainz.org/doc/MusicBrainz_API/Rate_Limiting
// The rate limit should be one query in a second.
// Related Bug: https://github.com/mixxxdj/mixxx/issues/10795
// In order to not hit the rate limits and respect their rate limiting rule.
// Every request was delayed by one second.
constexpr int kMinTimeBetweenMbRequests = 1000;

QString userAgentRawHeaderValue() {
return VersionStore::applicationName() +
Expand Down Expand Up @@ -76,6 +81,7 @@ MusicBrainzRecordingsTask::MusicBrainzRecordingsTask(
networkAccessManager,
parent),
m_queuedRecordingIds(recordingIds),
m_measurementTimer(0),
m_parentTimeoutMillis(0) {
musicbrainz::registerMetaTypesOnce();
}
Expand Down Expand Up @@ -104,6 +110,9 @@ QNetworkReply* MusicBrainzRecordingsTask::doStartNetworkRequest(
<< "GET"
<< networkRequest.url();
}

m_measurementTimer.start();

return networkAccessManager->get(networkRequest);
}

Expand Down Expand Up @@ -156,7 +165,9 @@ void MusicBrainzRecordingsTask::doNetworkReplyFinished(
statusCode),
-1,
QStringLiteral("Failed to parse XML response"));
return;
if (m_queuedRecordingIds.isEmpty()) {
return;
}
}

if (m_queuedRecordingIds.isEmpty()) {
Expand All @@ -170,7 +181,17 @@ void MusicBrainzRecordingsTask::doNetworkReplyFinished(

// Continue with next recording id
DEBUG_ASSERT(!m_queuedRecordingIds.isEmpty());
slotStart(m_parentTimeoutMillis, kRateLimitMillis);
auto timerSinceLastRequest = m_measurementTimer.elapsed(true);
int timerSinceLastRequestMillis = timerSinceLastRequest.toIntegerMillis();
qDebug() << "Task took:" << timerSinceLastRequestMillis;
int delayMillis = kMinTimeBetweenMbRequests - timerSinceLastRequestMillis;
if (delayMillis <= 0) {
qDebug() << "Task took more than a second, slot is calling now.";
slotStart(m_parentTimeoutMillis, 0);
} else {
qDebug() << "Task took less than a second, slot is going to be called in:" << delayMillis;
slotStart(m_parentTimeoutMillis, delayMillis);
}
Comment on lines -173 to +194
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this extra complexity really required? What does advantage does it have?
from what I can tell, delayMillis will always be positive because each task will take longer than a second. Also calling slotStart with 0 delay will cause the request to be fired shortly after the first one, resulting in more than one request per second.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine the a request itself takes 500 ms. Without it, we will have one request each 1500 ms and not as desired one on 1000 ms. The timer is started right after the wait time is over, just before the actual request is done in doStartNetworkRequest();

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well how strict the the musicbrainz rate-limiting? If request A arrived at MB after 550ms (bad network routing, flaky connection, whatever) but then comes back after 50ms and request B only took 50ms per direction, the time between the two requests arriving at MB would only be 500ms.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two consecutive request do not triggered the rate penalty. My guess is actually for this reason.
@fatihemreyildiz can you confirm that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can confirm that @daschuer 👍 .

I have tried it many times with the different tracks. After this fix, I've only hit the rate limit once (I think that is because of the global rate limit (300 requests per second) stated on the MB API doc ) besides that I've never hit the rate limit. If this happens, it can be easily fixed by a retry button replaced on the new GUI.

Since I played with the tag fetcher a lot. I also want to add few things that I noticed. If a track has about 8-9 recording IDs they might hit the rate limit time to time in my case (If someone has a way better connection than I have, this amount of recording IDs might be lower). After 10 recording IDs, I always hit the rate limit, as I said this can be different in your case.

One of my rough test result was like this:

  • Abba - Super Trouper -> Has 5 recording IDs normally took 4.78 seconds - after fix took 6.40 seconds
  • Axwell Ingrosso - MTYK -> Has 6 recording IDs normally took 4.30 seconds - after fix took 6.81 seconds
  • DVBBS - Tsunami -> Has 12 recording IDs normally failed in 8:36 seconds - after fix took 18.15 seconds.

In this case with the delayMillis the difference is not too much for the tracks which has less recording IDs. If we delay the every request by 1000ms that would be easily noticeable for popular tracks.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine. If this avoids a failure we can keep it.

}

void MusicBrainzRecordingsTask::emitSucceeded(
Expand Down
5 changes: 3 additions & 2 deletions src/musicbrainz/web/musicbrainzrecordingstask.h
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@

#include "musicbrainz/musicbrainz.h"
#include "network/webtask.h"
#include "util/timer.h"

namespace mixxx {

Expand Down Expand Up @@ -44,15 +45,15 @@ class MusicBrainzRecordingsTask : public network::WebTask {
int errorCode,
const QString& errorMessage);

void continueWithNextRequest();

const QUrlQuery m_urlQuery;

QList<QUuid> m_queuedRecordingIds;
QSet<QUuid> m_finishedRecordingIds;

QMap<QUuid, musicbrainz::TrackRelease> m_trackReleases;

Timer m_measurementTimer;

int m_parentTimeoutMillis;
};

Expand Down
40 changes: 12 additions & 28 deletions src/network/webtask.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,7 @@ WebTask::WebTask(
QNetworkAccessManager* networkAccessManager,
QObject* parent)
: NetworkTask(networkAccessManager, parent),
m_state(State::Idle),
m_state(State::Initial),
m_timeoutTimerId(kInvalidTimerId),
m_timeoutMillis(kNoTimeout) {
std::call_once(registerMetaTypesOnceFlag, registerMetaTypesOnce);
Expand All @@ -123,32 +123,24 @@ void WebTask::onNetworkError(
QNetworkReply::NetworkError errorCode,
const QString& errorString,
const WebResponseWithContent& responseWithContent) {
DEBUG_ASSERT(m_state == State::Pending);
DEBUG_ASSERT(m_state == State::Failed || m_state == State::Pending);
DEBUG_ASSERT(m_timeoutTimerId == kInvalidTimerId);

DEBUG_ASSERT(errorCode != QNetworkReply::NoError);
switch (errorCode) {
case QNetworkReply::OperationCanceledError:
// Client-side abort or timeout
m_state = State::Aborted;
break;
case QNetworkReply::TimeoutError:
// Network or server-side timeout
case QNetworkReply::OperationCanceledError: // Client-side timeout
case QNetworkReply::TimeoutError: // Network or server-side timeout
m_state = State::TimedOut;
break;
default:
m_state = State::Failed;
}
DEBUG_ASSERT(hasTerminated());

if (m_state == State::Aborted) {
emitAborted(responseWithContent.requestUrl());
Comment on lines -144 to -145
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state is still aborted, why remove the emitAborted signal?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition can no longer be true because we have removed m_state = State::Aborted; above.

The original issue was that emitAborted() is designed to be the responds to manual user Abort(). There is no change in the GUI. Now we emitNetworkError() in the client timeout case as well.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does the Abort state have any purpose then? Can we remove it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need it to change the state from any other state when calling slotAbort()

However we have some other places where State::Abort is used in responds of an error. I will do some checks if this even works as desired.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

} else {
emitNetworkError(
errorCode,
errorString,
responseWithContent);
}
emitNetworkError(
errorCode,
errorString,
responseWithContent);
}

void WebTask::emitNetworkError(
Expand Down Expand Up @@ -183,7 +175,6 @@ void WebTask::slotStart(int timeoutMillis, int delayMillis) {
// Reset state
DEBUG_ASSERT(!m_pendingNetworkReplyWeakPtr);
DEBUG_ASSERT(m_timeoutTimerId == kInvalidTimerId);
m_state = State::Idle;
m_timeoutMillis = kNoTimeout;

if (delayMillis > 0) {
Expand All @@ -205,7 +196,7 @@ void WebTask::slotStart(int timeoutMillis, int delayMillis) {

auto* const pNetworkAccessManager = m_networkAccessManagerWeakPtr.data();
VERIFY_OR_DEBUG_ASSERT(pNetworkAccessManager) {
m_state = State::Pending;
m_state = State::Failed;
onNetworkError(
QNetworkReply::NetworkSessionFailedError,
tr("No network access"),
Expand All @@ -222,12 +213,6 @@ void WebTask::slotStart(int timeoutMillis, int delayMillis) {
m_pendingNetworkReplyWeakPtr = doStartNetworkRequest(
pNetworkAccessManager,
timeoutMillis);
// Still idle, because we are in the same thread.
// The derived class is not allowed to abort a request
// during the callback before it has beeen started
// successfully. Instead it should return nullptr
// to abort the task immediately.
DEBUG_ASSERT(m_state == State::Idle);
if (!m_pendingNetworkReplyWeakPtr) {
kLogger.debug()
<< this
Expand Down Expand Up @@ -260,10 +245,10 @@ void WebTask::slotAbort() {
DEBUG_ASSERT_QOBJECT_THREAD_AFFINITY(this);
if (!isBusy()) {
DEBUG_ASSERT(m_timeoutTimerId == kInvalidTimerId);
if (m_state == State::Idle) {
if (m_state == State::Initial) {
kLogger.debug()
<< this
<< "Cannot abort idle task";
<< "Cannot abort task in Initial state";
} else {
DEBUG_ASSERT(hasTerminated());
kLogger.debug()
Expand All @@ -286,7 +271,6 @@ void WebTask::slotAbort() {
QUrl requestUrl;
auto* const pPendingNetworkReply = m_pendingNetworkReplyWeakPtr.data();
if (pPendingNetworkReply) {
DEBUG_ASSERT(m_state == State::Pending);
if (pPendingNetworkReply->isRunning()) {
kLogger.debug()
<< this
Expand Down Expand Up @@ -332,7 +316,7 @@ void WebTask::timerEvent(QTimerEvent* event) {

if (m_state == State::Starting) {
DEBUG_ASSERT(!m_pendingNetworkReplyWeakPtr);
m_state = State::Idle;
m_state = State::Initial;
slotStart(m_timeoutMillis);
return;
}
Expand Down
2 changes: 1 addition & 1 deletion src/network/webtask.h
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,7 @@ class WebTask : public NetworkTask {

enum class State {
// Initial state
Idle,
Initial,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the renaming only makes the state machine more difficult to understand, but I'm not gonna start bikeshedding over a name.

// Timed start
Starting,
// Pending state
Expand Down