fix: Add initial implementation for rate limiting failed logins #3404
PR Summary
This PR implements rate limiting of failed login attempts in ArgoCD and thereby
addresses CVE-2020-8827.
Implementation details
The rate limiter uses the Redis cache of ArgoCD to store information about
failed login attempts for non-SSO users, i.e. it targets `SessionManager`'s
`VerifyUsernamePassword()` method. For this purpose, we introduce the
application cache to `SessionManager` and make it a mandatory component.
The cache key used is `session|login.attempts`, and it stores a data structure
containing information about failed login attempts. When the password for any
given account has been entered incorrectly a configurable number of consecutive
times (`failures`) within a configurable time frame (`failure window`), the
validation of subsequent logon attempts is delayed by a configurable number of
seconds (`delay`). The `delay` is removed after a successful login or when the
last login attempt was made outside the `failure window`.

So as not to reintroduce account name guessing, the limiter makes no
distinction between existing and non-existing accounts. To prevent possible
DoS attacks on the memory consumption of the cache by unauthenticated users, a
maximum number of entries in the cache is enforced. When this limit is
reached, a random entry in the cache is deleted before a new entry is added.
Design decision 1: Focus on accounts, not origins
The algorithm doesn't take the origin of the logon failure (i.e. the remote IP
address) into account. Logon failures are counted per account, regardless of
where the logon attempts originate from. For one, we simply cannot reliably
determine the origin of the logon request, because we might be behind a
complex reverse-proxy structure (load balancers, Ingress, etc.). Second, an
attacker might hide behind a huge number of different source addresses, e.g.
by utilizing anonymous HTTP proxy networks, a botnet, etc. On the other hand,
legitimate users might all be connecting via, for example, a corporate HTTP
proxy server, thus sharing the same source IP. Reliably and correctly
extracting the real user's source address in all possible user setups might be
next to impossible.
Design decision 2: Delay attempts, but do not lock accounts
As a measure to prevent locking out legitimate users without reliably knowing
their origin, we chose to apply a (configurable) delay before actually
validating a username/password combination and letting the requester know the
result. As described above, this delay kicks in after a number of `failures`
within a given `failure window`. This means that while a `delay` is active,
legitimate users are affected by it as well: if a legitimate user enters their
valid credentials, they have to wait for the `delay` time before the system
actually logs them in.
Design decision 3: Do not exponentially increase delay
To prevent an attacker from deliberately locking legitimate users out of their
accounts, we chose not to increase the delay exponentially toward a possibly
indefinite time. The goal is to make the logon process slow enough to prevent
an attacker from quickly trying different password combinations, while still
giving legitimate users a reasonable chance to use their account while it is
(or was) under a brute-force attempt. The algorithm can be configured to
increase the delay successively by a fixed amount of time, up to a
configurable maximum limit.
Configuration and defaults
The feature can be configured by setting the following environment variables
on the `argocd-server` pod:

- `ARGOCD_SESSION_MAX_FAIL_COUNT`: Maximum number of failed logins before the
  delay kicks in. Default: `5`.
- `ARGOCD_SESSION_FAILURE_DELAY_START`: Time in seconds the authentication
  should be delayed for when the limiter first becomes active. Default: `3`.
- `ARGOCD_SESSION_FAILURE_DELAY_INCREASE`: Time in seconds the authentication
  delay should be increased by on consecutive login failures after the maximum
  fail count has been reached. Default: `2`.
- `ARGOCD_SESSION_FAILURE_DELAY_MAX`: Maximum time in seconds the
  authentication delay can be increased to. Default: `30`.
- `ARGOCD_SESSION_FAILURE_WINDOW`: Number of seconds for the failure window.
  Default: `300` (5 minutes). If this is set to `0`, the failure window is
  disabled and the delay kicks in after 10 consecutive logon failures,
  regardless of the time frame in which they happened.
- `ARGOCD_SESSION_MAX_CACHE_SIZE`: Maximum number of entries allowed in the
  cache. Default: `1000`.
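For reference, setting these on the `argocd-server` container might look like the following manifest fragment. The variable names come from this PR; the values shown are just the documented defaults, not recommendations:

```yaml
# Illustrative env overrides for the argocd-server container spec.
env:
  - name: ARGOCD_SESSION_MAX_FAIL_COUNT
    value: "5"
  - name: ARGOCD_SESSION_FAILURE_DELAY_START
    value: "3"
  - name: ARGOCD_SESSION_FAILURE_DELAY_INCREASE
    value: "2"
  - name: ARGOCD_SESSION_FAILURE_DELAY_MAX
    value: "30"
  - name: ARGOCD_SESSION_FAILURE_WINDOW
    value: "300"
  - name: ARGOCD_SESSION_MAX_CACHE_SIZE
    value: "1000"
```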
Notes
The UI has not been adapted to indicate the delay; this is left for a later
point in time. It should display a spinner or something along these lines
while waiting for session creation by the server. For this, a new API method
to get the currently applied delay has to be implemented.

This change also introduces a maximum username length of 32 characters, mainly
to prevent cache pollution and DoS through insanely long usernames in invalid
requests. This has been documented in the user documentation.
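The username length cap amounts to a simple validation before any cache entry is created. This is a sketch under the assumptions stated in the comments; the function name is hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// maxUsernameLength mirrors the 32-character cap introduced by this PR to
// keep bogus logins from polluting the cache.
const maxUsernameLength = 32

// validateUsername is an illustrative sketch, not the actual Argo CD code:
// it rejects empty names and names longer than the cap before any failed-login
// bookkeeping would touch the cache.
func validateUsername(name string) error {
	if len(name) == 0 {
		return errors.New("username must not be empty")
	}
	if len(name) > maxUsernameLength {
		return fmt.Errorf("username exceeds %d characters", maxUsernameLength)
	}
	return nil
}

func main() {
	fmt.Println(validateUsername("admin"))
	fmt.Println(validateUsername("a-very-long-username-that-clearly-exceeds-the-limit"))
}
```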