Storage resource group #2791

Open · achimnol opened this issue Aug 30, 2024 · 0 comments
Labels: comp:agent (Related to Agent component), comp:manager (Related to Manager component), comp:storage-proxy (Related to Storage proxy component)

achimnol (Member) commented Aug 30, 2024:

This issue defines an extension of the Storage Proxy to host "direct-access" sessions as a fast path for large file transfers:

  • They bypass the App Proxy to minimize traffic-forwarding overhead.
  • They directly access the storage volume mount to minimize the burden on the data plane.

The best place to satisfy both conditions is the Storage Proxy.

This issue supersedes the previously proposed concept of "DIRECT_ACCESS" sessions.

Current design

```mermaid
flowchart LR
    classDef ServerNode fill:#99c2,stroke:#99c;
    classDef Container fill:#9992,stroke:#999;

    User("User")
    AP[App Proxy]

    subgraph ComputeRg["Regular resource group"]
        subgraph ComputeNode1["Compute Node"]
            Ag1[Agent]
            C1["Session container"]
            FB1["Filebrowser container"]
        end
    end

    subgraph StorageNode["Storage Proxy Node"]
        SP[Storage Proxy]
        subgraph SFTPRg["SFTP resource group"]
            SAg["Agent"]
            SC1["SFTP container"]
        end
    end

    class StorageNode ServerNode
    class ComputeNode1 ServerNode
    class C1 Container
    class SC1 Container
    class FB1 Container

    User == "vfolder API" ==> SP
    User -. "SFTP (slow)" .-> AP
    User == "SFTP (fast)" ==> SC1
    User -. "HTTPS" .-> AP
    S[Storage volume]

    SP == "NFS" ==> S
    SC1 == "NFS" ==> S
    AP -.-> C1 == "NFS" ==> S
    AP -. "HTTP" .-> FB1 == "NFS" ==> S
```

So far, we have implemented a specialized form of resource group and session for "SFTP", as the above diagram shows.

Proposed design

```mermaid
flowchart LR
    classDef ServerNode fill:#99c2,stroke:#99c;
    classDef Container fill:#9992,stroke:#999;

    User("User")
    AP[App Proxy]

    subgraph ComputeRg["Regular resource group"]
        subgraph ComputeNode1["Compute Node"]
            Ag1[Agent]
            C1["Session container"]
        end
    end

    subgraph StorageRg["Storage resource group"]
        subgraph StorageNode["Storage Proxy Node"]
            SP["Storage Proxy + Storage Agent"]
            SC1["SFTP container"]
            FB1["Filebrowser container"]
        end
    end

    class StorageNode ServerNode
    class ComputeNode1 ServerNode
    class C1 Container
    class SC1 Container
    class FB1 Container

    User == "vfolder API" ==> SP
    User -. "SFTP (slow)" .-> AP
    User == "SFTP (fast)" ==> SC1
    User == "HTTPS" ==> FB1
    S[Storage volume]

    SP == "NFS" ==> S
    SC1 == "NFS" ==> S
    FB1 == "NFS" ==> S
    AP -.-> C1 == "NFS" ==> S
```

Goals

  • Simplify the deployment and configuration of storage agent(s). Once users/developers install Backend.AI and its Storage Proxy, nothing extra should be required to start using "fast" SFTP sessions.
    • Currently we have to run extra docker pull commands and add agent/resource-group configuration to specify which images are used for SFTP, Filebrowser, etc.
  • Avoid incorporating bug-prone specialized implementations and configurations that exist only for "SFTP" sessions.

Design

  • Generalize the "SFTP resource group" into a Storage resource group.
    • The Manager won't allocate (i.e., exclusively reserve) the occupied resource slots for containers in storage resource groups. Instead, it will limit the total number of containers per storage agent (configured in storage-agent.toml and reported via heartbeat) to allow oversubscription of storage-access containers.
    • Still, storage agents set per-container cgroup resource limits.
  • Generalize the "sftp concurrency limit" into a "max session count per resource group per user".
  • Combine the Storage Proxy with the Storage Agent.
    • The Storage Agent reuses the existing DockerAgent backend but adds security features to allow exposing container ports publicly.
    • In a single Python process, the Storage Proxy's aiohttp application and the Storage Agent's RPC handler run side by side (see the sketch after this list).
    • It uses storage-agent.toml (which is almost compatible with agent.toml) so that a regular agent and the storage agent can run together in an all-in-one setup with ease.
  • Let the Storage Agent also manage Filebrowser sessions and make them faster in the same way.
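The following is a minimal, hypothetical sketch of the "single Python process" item above: running the Storage Proxy's aiohttp application and a Storage Agent RPC endpoint side by side on one asyncio event loop. The ports, routes, and the plain-TCP RPC placeholder are assumptions for illustration only; the actual agent would use its own RPC stack and Docker-backed handlers.

```python
# Sketch only: one process hosting both the Storage Proxy web app and a
# Storage Agent RPC endpoint on a shared asyncio event loop.
import asyncio

from aiohttp import web


async def vfolder_ping(request: web.Request) -> web.Response:
    # Placeholder for the Storage Proxy's vfolder API handlers.
    return web.json_response({"status": "ok"})


async def handle_rpc(
    reader: asyncio.StreamReader, writer: asyncio.StreamWriter
) -> None:
    # Placeholder for the Storage Agent's RPC handler (e.g., create/destroy
    # SFTP or Filebrowser containers upon requests from the Manager).
    request = await reader.readline()
    writer.write(b"ack: " + request)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main() -> None:
    # Storage Proxy side: aiohttp application serving the vfolder API.
    app = web.Application()
    app.router.add_get("/vfolder/ping", vfolder_ping)
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, "0.0.0.0", 6021)  # example port
    await site.start()

    # Storage Agent side: RPC endpoint for the Manager, sharing the same loop.
    rpc_server = await asyncio.start_server(handle_rpc, "0.0.0.0", 6011)  # example port
    async with rpc_server:
        await rpc_server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
```

Keeping both roles in one event loop lets the proxy and the agent share configuration and lifecycle management without an extra daemon to deploy, which aligns with the deployment-simplification goal above.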

Technical issues

  • Filebrowser containers should serve HTTPS (with self-signed or customer-provided certificates) if exposed to the public network. This could be implemented as a storage-agent configuration option to auto-generate or inject the certificates and mount them into the Filebrowser containers, as sketched below.
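As an illustration of the "auto-generate" option, here is a hedged sketch using the `cryptography` package to create a self-signed certificate on the storage agent host. The output directory and the container-side mount path are hypothetical; the generated directory would be bind-mounted read-only into each Filebrowser container.

```python
# Sketch only: generate a self-signed TLS key/cert pair for Filebrowser containers.
import datetime
from pathlib import Path

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID


def generate_self_signed_cert(cert_dir: Path, hostname: str) -> None:
    # Private key for the Filebrowser endpoint.
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    subject = issuer = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, hostname)])
    now = datetime.datetime.now(datetime.timezone.utc)
    cert = (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(issuer)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=365))
        .add_extension(
            x509.SubjectAlternativeName([x509.DNSName(hostname)]),
            critical=False,
        )
        .sign(key, hashes.SHA256())
    )
    cert_dir.mkdir(parents=True, exist_ok=True)
    (cert_dir / "key.pem").write_bytes(
        key.private_bytes(
            serialization.Encoding.PEM,
            serialization.PrivateFormat.TraditionalOpenSSL,
            serialization.NoEncryption(),
        )
    )
    (cert_dir / "cert.pem").write_bytes(cert.public_bytes(serialization.Encoding.PEM))


# The resulting directory would then be bind-mounted read-only into the
# Filebrowser container, e.g. at /etc/backend.ai/tls (path is hypothetical).
generate_self_signed_cert(Path("/var/lib/backend.ai/filebrowser-tls"), "storage.example.com")
```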

References
