
fix: Add initial implementation for rate limiting failed logins #3404

Merged
merged 13 commits into argoproj:master on Apr 21, 2020

Conversation

jannfis
Member

@jannfis jannfis commented Apr 13, 2020

PR Summary

This PR implements rate limiting of failed login attempts in ArgoCD and thereby
addresses CVE-2020-8827.

Implementation details

The rate limiter uses the Redis cache of ArgoCD to store information about
failed login attempts for non-SSO users, i.e. it targets SessionManager's
VerifyUsernamePassword() method.

For this purpose, we introduce the application cache to SessionManager and
make it a mandatory component.

The cache key used is session|login.attempts, and it stores a data structure
containing information about failed login attempts. When the password for any
given account has been entered incorrectly a configurable number of consecutive
times (failures) within a configurable timeframe (failure window), validation
of subsequent logon attempts is delayed by a configurable number of seconds
(delay). The delay is removed after a successful login or when the last login
attempt falls outside the failure window.

So as not to reintroduce account name guessing, the limiter makes no distinction
between existing and non-existing accounts. To prevent possible DoS attacks
on the memory consumption of the cache by unauthenticated users, a maximum
number of entries in the cache is enforced. When this limit is reached, a
random entry in the cache is deleted before a new entry is added.

Design decision 1: Focus on accounts, not origins

The algorithm does not take the origin of the logon failure (i.e. the remote IP
address) into account. Logon failures are counted per account, regardless
of where the logon attempts originate from. For one, we simply cannot
reliably determine the origin of the logon request, because we might be
behind a complex reverse-proxy structure (load balancers, Ingress, etc.).
Second, an attacker might hide behind a huge number of different source
addresses, e.g. by utilizing anonymous HTTP proxy networks, a botnet, etc. On
the other hand, legitimate users might all be connecting via, for example, a
corporate HTTP proxy server, thus sharing the same source IP. Reliably and
correctly extracting the real user's source address in all possible setups
might be next to impossible.

Design decision 2: Delay attempts, but do not lock accounts

As a measure to prevent locking out legitimate users whose origin we cannot
reliably know, we chose to apply a (configurable) delay before actually
validating a username/password combination and letting the requester know the
result. As described above, this delay kicks in after a number of failures
within a given failure window. This means that if a delay is active,
legitimate users will also be affected by it: if a legitimate user enters
their valid credentials, they have to wait for the delay time before the
system actually logs them in.

Design decision 3: Do not exponentially increase delay

To prevent an attacker from deliberately locking legitimate users out of
their accounts, we chose not to increase the delay exponentially to a possibly
indefinite time. The goal is to make the logon process slow enough to prevent
an attacker from quickly trying different combinations of passwords, while
still giving legitimate users a reasonable chance to use their account while
it is (or was) under a brute-force attempt. The algorithm can be configured
to increase the delay successively by a fixed amount of time, up to a
configurable maximum limit.
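The linear-increase-with-cap scheme can be expressed compactly. The sketch below uses the defaults listed later in this description; the function name and exact formula are illustrative, not lifted from the ArgoCD source:

```go
package main

import "fmt"

// Defaults as documented in this PR.
const (
	maxFailCount  = 5  // ARGOCD_SESSION_MAX_FAIL_COUNT
	delayStart    = 3  // ARGOCD_SESSION_FAILURE_DELAY_START (seconds)
	delayIncrease = 2  // ARGOCD_SESSION_FAILURE_DELAY_INCREASE (seconds)
	delayMax      = 30 // ARGOCD_SESSION_FAILURE_DELAY_MAX (seconds)
)

// loginDelaySeconds returns the delay to apply before validating the
// next attempt, given the number of consecutive failures so far: no
// delay below the threshold, then a fixed increase per extra failure,
// capped at delayMax. Illustrative, not the exact implementation.
func loginDelaySeconds(failures int) int {
	if failures < maxFailCount {
		return 0
	}
	delay := delayStart + (failures-maxFailCount)*delayIncrease
	if delay > delayMax {
		delay = delayMax
	}
	return delay
}

func main() {
	for _, f := range []int{4, 5, 6, 10, 50} {
		fmt.Printf("failures=%d delay=%ds\n", f, loginDelaySeconds(f))
	}
}
```

With these defaults the delay grows 3s, 5s, 7s, ... per consecutive failure and saturates at 30s, so an attacker is throttled while a legitimate user never waits more than half a minute.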

Configuration and defaults

The feature can be configured by setting the following environment variables
on the argocd-server pod:

  • ARGOCD_SESSION_MAX_FAIL_COUNT: Maximum number of failed logins before the
    delay kicks in. Default: 5.

  • ARGOCD_SESSION_FAILURE_DELAY_START: Time in seconds the authentication
    should be delayed for when the limiter first becomes active. Default: 3

  • ARGOCD_SESSION_FAILURE_DELAY_INCREASE: Time in seconds the authentication
    delay should be increased on consecutive login failures after max fail count
    has been reached. Default: 2

  • ARGOCD_SESSION_FAILURE_DELAY_MAX: Max time in seconds the authentication
    delay can be increased to. Default: 30

  • ARGOCD_SESSION_FAILURE_WINDOW: Number of seconds for the failure window.
    Default: 300 (5 minutes). If this is set to 0, the failure window is
    disabled and the delay kicks in after 10 consecutive logon failures,
    regardless of the timeframe in which they happened.

  • ARGOCD_SESSION_MAX_CACHE_SIZE: Maximum number of entries allowed in the
    cache. Default: 1000

Notes

  • The UI has not been adapted to indicate the delay; this is left for a later
    point in time. It should display a spinner or something along these lines
    while waiting for session creation by the server. For this, a new API method
    to get the currently applied delay has to be implemented.

  • This change also introduces a maximum username length of 32 characters,
    mainly to prevent cache pollution and DoS via insanely long usernames in
    invalid requests. This has been documented in the user documentation.

@sonarcloud

sonarcloud bot commented Apr 13, 2020

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities (and Security Hotspot 0 Security Hotspots to review)
Code Smell A 2 Code Smells

93.6% 93.6% Coverage
0.0% 0.0% Duplication

@codecov

codecov bot commented Apr 13, 2020

Codecov Report

Merging #3404 into master will increase coverage by 0.24%.
The diff coverage is 86.13%.


@@            Coverage Diff             @@
##           master    #3404      +/-   ##
==========================================
+ Coverage   42.90%   43.15%   +0.24%     
==========================================
  Files         179      179              
  Lines       19953    20142     +189     
  Branches      272      237      -35     
==========================================
+ Hits         8561     8692     +131     
- Misses      10396    10454      +58     
  Partials      996      996              
Impacted Files Coverage Δ
util/oidc/oidc.go 7.57% <0.00%> (ø)
server/cache/cache.go 75.51% <53.84%> (-9.86%) ⬇️
util/session/sessionmanager.go 69.88% <92.30%> (+18.60%) ⬆️
server/server.go 57.59% <100.00%> (ø)
util/cache/appstate/cache.go 93.10% <100.00%> (ø)
ui/src/app/applications/components/utils.tsx 49.83% <0.00%> (ø)
ui/src/app/shared/services/projects-service.ts 19.71% <0.00%> (ø)
server/application/application.go 28.25% <0.00%> (+0.21%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75d9f23...2a8c32a. Read the comment docs.

@jannfis jannfis marked this pull request as ready for review April 13, 2020 16:52
@jannfis jannfis requested review from alexmt and jessesuen April 13, 2020 16:56
@jannfis
Member Author

jannfis commented Apr 17, 2020

I think codecov might be drunk.

@alexmt alexmt merged commit 76bacfd into argoproj:master Apr 21, 2020