DCS 2: Decentralized Cloud Exchange

Author: Adam Bozanich
Status: Final
Type: Standards Track
Category: Core
Created: 2018-03-18

Summary

A decentralized cloud computing exchange connects those who need computing resources (tenants) with those who have computing capacity to lease (providers).

Specification

Workflow

Actors

Tenants

A tenant hosts an application on the Akash Network.

Datacenters

Each datacenter will host an agent that mediates between the Akash Network and datacenter-local infrastructure.

The datacenter agent is responsible for:

  • Bidding on orders fulfillable by the datacenter.
  • Managing active leases for which it is the provider.

Validators

An Akash node that is elected to be a validator in the DPoS consensus scheme.

Marketplace Facilitators

Marketplace facilitators maintain the distributed exchange (marketplace). Validators will initially perform this function.

Distributed Exchange

Global Parameters

Name                      Description
reconfirmation-period     Number of blocks between required lease confirmations
collateral-interest-rate  Interest rate awarded to datacenters for collateral posted with fulfillment orders

Models

ComputeUnit

Field   Definition
cpu     Number of vCPUs
memory  Amount of memory in GB
disk    Amount of block storage in GB

ResourceGroup

Field       Definition
compute     Compute unit definition
price       Price of compute unit per time unit
collateral  Collateral per compute unit
count       Number of defined compute units
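
The two models above can be sketched as plain data types. This is a minimal illustration in Python, not the Akash implementation: the class layout, the use of integers for price and collateral, and the example values are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeUnit:
    cpu: int     # number of vCPUs
    memory: int  # amount of memory in GB
    disk: int    # amount of block storage in GB


@dataclass(frozen=True)
class ResourceGroup:
    compute: ComputeUnit  # compute unit definition
    price: int            # price of one compute unit per time unit (integer token units assumed)
    collateral: int       # collateral per compute unit (integer token units assumed)
    count: int            # number of compute units of this shape


# Hypothetical example: ten units of 2 vCPU / 4 GB memory / 20 GB disk.
web_tier = ResourceGroup(
    compute=ComputeUnit(cpu=2, memory=4, disk=20),
    price=5,
    collateral=100,
    count=10,
)
```

Freezing the dataclasses makes resource definitions hashable and comparable by value, which is convenient when an order's resource groups later need to be compared with a bid's.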

Deployment

A Deployment represents the state of a tenant’s application. It includes desired infrastructure and pricing parameters, as well as workload definitions and connectivity.

Field           Definition
infrastructure  List of deployment infrastructure definitions
wait-duration   Amount of time to wait before matching generated orders with fulfillment orders

DeploymentInfrastructure

DeploymentInfrastructure represents a set of resources (including pricing) that a tenant would like to be provisioned in a single datacenter. Orders are created from deployment infrastructure as necessary.

Field      Definition
region     Geographic region of datacenter
persist    Whether or not to maintain active lease if current lease is broken
resources  List of resource groups for this datacenter

Within the resources list, resource group fields are interpreted as follows:

Field       Definition
price       Maximum price the tenant is willing to pay
collateral  Amount of collateral that the datacenter must post when creating a fulfillment order

Order

An Order is generated for each deployment infrastructure present in the deployment.

Field          Definition
region         Geographic region of datacenter
resources      List of resource groups for this datacenter
wait-duration  Number of blocks to wait before matching the order with fulfillment orders
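
As an illustration of how orders follow from a deployment, the sketch below creates one order per deployment infrastructure entry. It is a simplified example, not the network's implementation: resource groups are kept as plain dictionaries, `generate_orders` is a hypothetical helper name, and the deployment's wait-duration is assumed to be expressed in blocks already.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeploymentInfrastructure:
    region: str
    persist: bool
    resources: List[dict]  # resource groups, kept as plain dicts for brevity


@dataclass
class Deployment:
    infrastructure: List[DeploymentInfrastructure]
    wait_duration: int  # assumed to be a number of blocks


@dataclass
class Order:
    region: str
    resources: List[dict]
    wait_duration: int


def generate_orders(deployment: Deployment) -> List[Order]:
    """Create one order for each deployment infrastructure entry."""
    return [
        Order(
            region=infra.region,
            resources=list(infra.resources),  # copy so orders do not share the list
            wait_duration=deployment.wait_duration,
        )
        for infra in deployment.infrastructure
    ]
```

A deployment that names two regions therefore yields two independent orders, each of which can be bid on and leased separately.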

Fulfillment

A Fulfillment represents a datacenter’s interest in providing the resources requested in an order.

Field      Definition
order      ID of order which is being bid on
resources  List of resource groups for this datacenter

The resources list must match the order’s resources list for each resource group with the following rules:

  • the compute, count, and collateral fields must be the same.
  • the price field represents the datacenter’s offering price and must be less than or equal to the order’s price.

The total collateral required to post a fulfillment order is the sum of collateral fields present in the order’s resources list.
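
The matching rules and the collateral total can be expressed as two small checks. This is a sketch under stated assumptions: resource groups are compared position by position (the text does not say how groups are paired), the helper names `fulfillment_matches` and `required_collateral` are hypothetical, and the collateral total follows the literal wording above by summing the collateral field of each group.

```python
from typing import List, NamedTuple, Tuple


class ResourceGroup(NamedTuple):
    compute: Tuple[int, int, int]  # (vCPUs, memory in GB, disk in GB)
    price: int
    collateral: int
    count: int


def fulfillment_matches(order: List[ResourceGroup], bid: List[ResourceGroup]) -> bool:
    """Return True if the bid's resources satisfy the order's resources."""
    if len(order) != len(bid):
        return False
    for want, offer in zip(order, bid):  # positional pairing is an assumption
        # compute, count, and collateral must be identical
        if (want.compute, want.count, want.collateral) != (
            offer.compute, offer.count, offer.collateral
        ):
            return False
        # the offered price may not exceed the tenant's maximum price
        if offer.price > want.price:
            return False
    return True


def required_collateral(order: List[ResourceGroup]) -> int:
    """Sum of the collateral fields in the order's resources list."""
    return sum(group.collateral for group in order)
```

For example, a bid that copies an order's single resource group and lowers the price from 5 to 4 matches, while one that raises the price to 6 or changes the count does not.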

Lease

A Lease represents a matching order and fulfillment order.

Field              Definition
deployment-order   ID of order
fulfillment-order  ID of fulfillment order

LeaseConfirmation

A LeaseConfirmation represents a confirmation that the resources are being provided by the datacenter. Its creation may initiate a transfer of tokens from the tenant to the datacenter.

Field  Definition
lease  ID of lease being confirmed

Transactions

SubmitDeployment

Sent by a tenant to deploy their application on Akash. An order will be created for each datacenter configuration described in the deployment.

UpdateDeployment

Sent by a tenant to update their application on Akash.

CancelDeployment

Sent by a tenant to cancel their application on Akash.

SubmitFulfillment

Sent by a datacenter to bid on an order.

CancelFulfillment

Sent by a datacenter to cancel an existing fulfillment order.

SubmitLeaseConfirmation

Sent by a datacenter to confirm a lease that it is engaged in. This should be sent at least once every reconfirmation-period blocks.

SubmitLease

Sent by a validator to match an order with a fulfillment order.

SubmitStaleLease

Sent by a validator after finding a lease that has not been confirmed within the last reconfirmation-period blocks.

Workflows

Tenants

Tenants submit their deployment to the network via SubmitDeployment.

Marketplace Facilitators

Every time a new block is created, each facilitator runs MatchOpenOrders and InvalidateStaleLeases.

MatchOpenOrders

For each order that is ready to be fulfilled (state is open and wait-duration has elapsed):

  1. Find the matching fulfillment order with the lowest price.
  2. Emit a SubmitLease transaction to initiate a lease for the matching orders.
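
The MatchOpenOrders steps can be sketched as follows. This is an illustrative simplification rather than the facilitator's actual code: the `id`, `state`, and `created_at` fields are assumed bookkeeping not listed in the Order model, a fulfillment's price is reduced to a single number, and the returned pairs stand in for emitted SubmitLease transactions. Tie-breaking between equally priced bids is not specified in the text; this sketch keeps the first one encountered.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Order:
    id: str
    state: str          # "open" while the order awaits a lease
    created_at: int     # block height at which the order was created (assumed field)
    wait_duration: int  # number of blocks to wait before matching


@dataclass
class Fulfillment:
    id: str
    order: str  # ID of the order being bid on
    price: int  # offered price, simplified to a single number


def match_open_orders(
    height: int, orders: List[Order], fulfillments: List[Fulfillment]
) -> List[Tuple[str, str]]:
    """Return (order ID, fulfillment ID) pairs to submit as leases."""
    leases = []
    for order in orders:
        ready = order.state == "open" and height >= order.created_at + order.wait_duration
        if not ready:
            continue
        bids = [f for f in fulfillments if f.order == order.id]
        if not bids:
            continue  # nothing to match yet
        best = min(bids, key=lambda f: f.price)  # step 1: lowest price wins
        leases.append((order.id, best.id))       # step 2: would emit SubmitLease
    return leases
```

Because the function depends only on the block height and on-chain state, every facilitator running it against the same block would arrive at the same set of matches.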

InvalidateStaleLeases

For each active lease that has not been confirmed in reconfirmation-period:

  1. Emit a SubmitStaleLease transaction.
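
A minimal sketch of the staleness check is shown below. The `active` and `last_confirmed` fields are assumptions introduced for the example (the Lease model above lists only the two order IDs), and the returned IDs stand in for emitted SubmitStaleLease transactions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lease:
    id: str
    active: bool
    last_confirmed: int  # block height of the most recent confirmation (assumed field)


def find_stale_leases(
    height: int, leases: List[Lease], reconfirmation_period: int
) -> List[str]:
    """Return IDs of active leases not confirmed within reconfirmation_period blocks."""
    return [
        lease.id
        for lease in leases
        if lease.active and height - lease.last_confirmed > reconfirmation_period
    ]
```

With a reconfirmation period of 5 blocks at height 100, a lease last confirmed at block 90 is reported as stale, while one confirmed at block 99 is not.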

Datacenters

Every time a new block is created, each datacenter runs ConfirmCurrentLeases and BidOnOpenOrders.

ConfirmCurrentLeases

For each lease currently provided by the datacenter:

  1. Emit a SubmitLeaseConfirmation transaction for the lease.

BidOnOpenOrders

For each open order:

  1. If the datacenter is out of collateral, exit.
  2. If the datacenter is not able to fulfill the order, skip to the next order.
  3. Emit a SubmitFulfillment transaction for the order.
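
The bidding loop can be sketched as below. Several details are assumptions made for illustration: orders are plain dictionaries with hypothetical `id` and `collateral` keys, `can_fulfill` is a caller-supplied capacity check, an order whose collateral exceeds the remaining balance is treated as not fulfillable, and the returned IDs stand in for emitted SubmitFulfillment transactions.

```python
from typing import Callable, List


def bid_on_open_orders(
    open_orders: List[dict],
    available_collateral: int,
    can_fulfill: Callable[[dict], bool],
) -> List[str]:
    """Return the IDs of orders the datacenter would bid on."""
    bids = []
    for order in open_orders:
        if available_collateral <= 0:
            break  # step 1: out of collateral, stop bidding
        if order["collateral"] > available_collateral or not can_fulfill(order):
            continue  # step 2: cannot fulfill this order, try the next one
        bids.append(order["id"])  # step 3: would emit SubmitFulfillment
        available_collateral -= order["collateral"]
    return bids
```

Tracking the remaining collateral inside the loop keeps the datacenter from committing more collateral in one block than it actually holds.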

Deployments

Once resources have been procured, clients must distribute their workloads to providers so that they can execute on the leased resources. We refer to the current state of the client’s workloads on the Akash Network as a “deployment”.

A tenant describes their desired deployment in a “manifest”. The manifest contains workload definitions, configuration, and connection rules. Providers use workload definitions and configuration to execute the workloads on the resources they’re providing, and use the connection rules to build an overlay network and firewall configurations.

A hash of the manifest is known as the deployment “version” and is stored on the blockchain-based distributed database.

Workflow

  1. Stack infrastructure is submitted to the ledger.
  2. Ask orders are generated for resources defined in the stack infrastructure.
  3. Providers (datacenters) bid on orders.
  4. Leases are reached by matching bid and ask orders.
  5. Stack manifest is distributed to deployment datacenters (lease providers).
  6. Datacenters deploy workloads and distribute connection parameters to all other deployment datacenters.
  7. Overlay network is established to allow for connectivity between workloads.

Manifest Distribution

Each on-chain deployment contains a hash of the manifest. This hash represents the deployment version.

The manifest contains sensitive information which should only be shared with participants of the deployment. This poses a problem for self-managed deployments - Akash must distribute the workload definition autonomously, without revealing its contents to unnecessary participants.

To address these issues, we devised a peer-to-peer file sharing scheme in which lease participants distribute the manifest to one another as needed. The protocol runs off-chain over a TLS connection; each participant can verify the manifest they received by computing its hash and comparing this with the deployment version that is stored on the blockchain-backed distributed database.
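
The verification step can be illustrated with a short sketch. The text does not name the hash function or the encoding of the deployment version, so SHA-256 and a hexadecimal digest are assumptions here, and `manifest_version` and `verify_manifest` are hypothetical helper names.

```python
import hashlib
import hmac


def manifest_version(manifest_bytes: bytes) -> str:
    """Compute the deployment version for a manifest (SHA-256 hex digest assumed)."""
    return hashlib.sha256(manifest_bytes).hexdigest()


def verify_manifest(manifest_bytes: bytes, on_chain_version: str) -> bool:
    """Check a manifest received from a peer against the version stored on-chain."""
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(manifest_version(manifest_bytes), on_chain_version)
```

A participant that receives a manifest from a peer would call `verify_manifest` with the version read from the chain and discard the manifest if the check fails, so a peer cannot substitute altered workload definitions.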

In addition to providing private, secure, autonomous manifest distribution, the peer-to-peer protocol also enables fast distribution of large manifests to a large number of datacenters.

Overlay Network

By default, a workload’s network is isolated - nothing can connect to it. While this is secure, it is not practical for real-world applications. For example, consider a simple web application: end-user browsers should have access to the web tier workload, and the web tier needs to communicate with the database workload. Furthermore, the web tier may not be hosted in the same datacenter as the database.

On the Akash Network, clients can selectively allow communications to and between workloads by defining a connection topology within the manifest. Datacenters use this topology to configure firewall rules and to create a secure network between individual workloads as needed.

To support secure cross-datacenter communications, providers expose workloads to each other through an mTLS tunnel. Each workload-to-workload connection uses a distinct tunnel.

Before establishing these tunnels, providers generate a TLS certificate for each required tunnel and exchange these certificates with the necessary peer providers. Each provider’s root certificate is stored on the blockchain-based distributed database, enabling peers to verify the authenticity of the certificates they receive.

Once certificates are exchanged, providers establish an authenticated tunnel and connect the workload’s network to it. All of this is transparent to the workloads themselves - they can connect to one another through stable addresses and standard protocols.

Models

Stack

A stack is a description of all components necessary to deploy an application on the Akash Network.

A stack includes:

  • Infrastructure requirements.
  • Manifest of workloads to deploy on procured infrastructure.

Manifest

A manifest describes workloads and how they should be deployed.

A manifest includes:

  • Workloads to be executed.
  • Datacenter placement for each workload.
  • Connectivity rules describing which entities are allowed to connect to each workload.

Deployment

A deployment represents the current state of a stack as fulfilled by the Akash Network.

A deployment includes:

  • Infrastructure procured via the cloud exchange (leases).
  • Manifest distribution state.
  • Overlay network state.

Workload

Field        Description
name         Workload name
container    Docker container
compute      Resources needed for each instance
count        Number of instances to run
connections  List of allowed incoming connections

Connection

Field       Description
port        TCP port
workload    Workload name to allow incoming connection from
datacenter  Datacenter to allow incoming connection from
global      If true, allow all connections, regardless of source
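
One way a provider might evaluate these rules when building firewall configuration is sketched below. The matching semantics are assumptions, since the table does not define them: a rule applies only to its own port, a global rule admits any source, a non-global rule must name the source workload, and an unset datacenter is treated as matching any datacenter. The `global` field is renamed `is_global` because `global` is a reserved word in Python.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Connection:
    port: int
    workload: Optional[str] = None    # workload allowed to connect
    datacenter: Optional[str] = None  # datacenter allowed to connect; None means any (assumed)
    is_global: bool = False           # the "global" field from the table


def is_allowed(
    rules: List[Connection], port: int, src_workload: str, src_datacenter: str
) -> bool:
    """Return True if any rule admits the incoming connection."""
    for rule in rules:
        if rule.port != port:
            continue
        if rule.is_global:
            return True  # open to every source
        if rule.workload != src_workload:
            continue
        if rule.datacenter is not None and rule.datacenter != src_datacenter:
            continue
        return True
    return False  # isolated by default: no matching rule, no access
```

Returning False when no rule matches mirrors the default described earlier, where a workload's network is isolated unless the manifest opens it.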

LeasedWorkload

Field        Description
lease        Lease ID
workload     Workload name
certificate  SSL certificate for workload
addresses    List of (address, port) for connecting to remote workload

Automation

The dynamic nature of cloud infrastructure is both a blessing and a curse for operations management. That new resources can be provisioned at will is a blessing; the exploding management overhead and complexity of said resources is a curse. The goal of DevOps – the practice of managing deployments programmatically – is to alleviate the pain points of cloud infrastructure by leveraging its strengths.

The Akash Network was built from the ground up to provide DevOps engineers with a simple but powerful toolset for creating highly-automated deployments. The toolset is comprised of the primitives that enable non-management applications – generic workloads and overlay networks – and can be leveraged to create autonomous, self-managed systems.

Self-managed deployments on Akash are a simple matter of creating workloads that manage their own deployment themselves. A DevOps engineer may employ a workload that updates DNS entries as providers join or leave the deployment; tests response times of web tier applications; and scales up and down infrastructure (in accordance with permissions and constraints defined by the client) as needed based on any number of input metrics. The “management tier” may be spread across all datacenters for a deployment, with global state maintained by a distributed database running over the secure overlay network.

Examples

Latency-Optimized Deployment

Many web-based applications are “latency-sensitive” - lower response times from application servers translate into a dramatically improved end-user experience. Modern deployments of such applications employ content delivery networks (CDNs) to deliver static content such as images to end users quickly.

CDNs provide reduced latency by distributing content so that it is geographically close to the users that are accessing it. Deployments on the Akash Network can not only replicate this approach, but beat it - Akash gives clients the ability to place dynamic content close to an application’s users.

To implement a self-managed “dynamic delivery network” on Akash, a DevOps engineer would include a management tier in their deployment which monitors the geographical location of the application’s users. This management tier would add and remove datacenters across the globe, provisioning more resources in regions where user activity is high, and fewer resources in regions where user activity is low.
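
The scaling decision made by such a management tier could look like the sketch below. Everything here is hypothetical and only meant to make the idea concrete: the function name `plan_regions`, the use of request counts as the activity metric, and the fixed capacity per compute unit are assumptions, not part of the Akash design.

```python
import math
from typing import Dict


def plan_regions(
    requests_per_region: Dict[str, int], requests_per_unit: int, min_units: int = 0
) -> Dict[str, int]:
    """Return the desired number of compute units for each region."""
    if requests_per_unit <= 0:
        raise ValueError("requests_per_unit must be positive")
    return {
        region: max(min_units, math.ceil(requests / requests_per_unit))
        for region, requests in requests_per_region.items()
    }


# Hypothetical observation window: request counts grouped by user region.
observed = {"us-west": 950, "eu-central": 100, "ap-south": 0}
desired = plan_regions(observed, requests_per_unit=200)
```

The management tier would then compare `desired` against its current leases and submit deployment updates to add capacity in busy regions or release it in idle ones, within whatever limits the client has set.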

Machine Learning Deployment

Machine learning applications employ a large number of nodes to parallelize computations involving large datasets. They do their work in “batches” - there is no “steady state” of capacity that is required.

A machine learning application on Akash may use a management tier to proactively procure resources within a single datacenter. As a machine learning task begins, the management tier can “scale up” the number of nodes for it; when a task completes, the resources provisioned for it can be relinquished.

History

March 8, 2018: Initial Design based on Akash Whitepaper

All content herein is licensed under Apache 2.0.