
Developer Documentation

# Let's Data: Focus on the data - we'll manage the infrastructure!

Cloud infrastructure that simplifies how you process, analyze and transform data.

Vpcs

Connector destinations such as AWS Kafka require a Virtual Private Cloud (VPC). Let's Data automatically creates and secures the VPC when the write connector is initialized and deletes the VPC when the write connector is deleted. #LetsData also provides self-service infrastructure for connecting to these VPCs.

IP Address Management

#LetsData manages VPC IP addresses automatically. Here is how it allocates IP addresses for datasets:

  • Each VPC is assigned an IP address range (cidrBlock) from Amazon's recommended private IP space. (We currently assign IPs from the 10.0.0.0 IP address range.)
  • Each tenant is allocated a fixed IP range, the TenantIPRange (a 10.X.X.X/21 CIDR block, 2,048 addresses). Each dataset then allocates its own range, the DatasetIPRange, from the tenant's TenantIPRange, and the dataset's resources are created using IPs from that DatasetIPRange.
  • If you want to establish a VpcPeeringConnection to a dataset's VPC, your VPC's cidrBlock must not overlap with the dataset's cidrBlock. Choosing a specific IP range is not currently supported; we may allow users to select a non-overlapping IP address range in the future.
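Because the dataset's IP range can't be chosen, it is worth checking for overlap before requesting a peering connection. Here is a minimal Python sketch using only the standard library's ipaddress module. The dataset cidrBlock below is a hypothetical placeholder, so substitute the value reported for your dataset's VPC.

```python
import ipaddress


def cidrs_overlap(customer_cidr: str, dataset_cidr: str) -> bool:
    """Return True if the two IPv4 CIDR blocks share any addresses.

    AWS cannot peer VPCs whose CIDR blocks match or overlap, so a True
    result means the peering connection cannot be established.
    """
    customer = ipaddress.ip_network(customer_cidr, strict=True)
    dataset = ipaddress.ip_network(dataset_cidr, strict=True)
    return customer.overlaps(dataset)


if __name__ == "__main__":
    # Hypothetical dataset cidrBlock; use the value for your dataset's VPC.
    dataset_cidr = "10.0.8.0/23"
    for candidate in ["10.0.0.0/16", "10.1.0.0/16", "172.31.0.0/16"]:
        status = "overlaps" if cidrs_overlap(candidate, dataset_cidr) else "ok"
        print(f"{candidate}: {status}")
```

In this example 10.0.0.0/16 contains the whole dataset range and would be reported as overlapping, while 10.1.0.0/16 and 172.31.0.0/16 are safe choices for the client VPC.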

To control how many IPs are allocated to each dataset, users can specify a Vpc size (small, medium or large) in their dataset configuration. We currently support the following Vpc sizes:

  • small: allocates an IP range with a /25 prefix (subnet mask of 25 bits), i.e. 128 IP addresses.
  • medium: allocates an IP range with a /23 prefix (subnet mask of 23 bits), i.e. 512 IP addresses.
  • large: allocates an IP range with a /22 prefix (subnet mask of 22 bits), i.e. 1,024 IP addresses.
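Each size's address count follows from its prefix length as 2 ** (32 - prefix). The sketch below computes that and also shows one way a DatasetIPRange could be carved out of a /21 TenantIPRange. #LetsData does not document its allocation algorithm, so the first-fit allocator (allocate_dataset_range) and the tenant range used here are hypothetical illustrations, not the service's actual logic.

```python
import ipaddress

# Prefix lengths for each Vpc size, as listed above.
VPC_SIZE_PREFIX = {"small": 25, "medium": 23, "large": 22}


def address_count(size: str) -> int:
    """Total IPv4 addresses in a dataset range of the given size."""
    return 2 ** (32 - VPC_SIZE_PREFIX[size])


def allocate_dataset_range(tenant_range: str, size: str, allocated: list) -> str:
    """Hypothetical first-fit allocator (not the #LetsData algorithm).

    Returns the first aligned block of the requested size inside
    tenant_range that does not overlap any already-allocated block.
    """
    tenant = ipaddress.ip_network(tenant_range)
    taken = [ipaddress.ip_network(cidr) for cidr in allocated]
    # subnets() yields every aligned block of the new prefix length, in order.
    for candidate in tenant.subnets(new_prefix=VPC_SIZE_PREFIX[size]):
        if not any(candidate.overlaps(block) for block in taken):
            return str(candidate)
    raise ValueError(f"no free {size} range left in {tenant_range}")


if __name__ == "__main__":
    allocated = []
    for size in ["medium", "large", "small"]:
        cidr = allocate_dataset_range("10.0.8.0/21", size, allocated)
        allocated.append(cidr)
        print(size, cidr, address_count(size))
```

With the hypothetical tenant range 10.0.8.0/21, allocating medium, then large, then small returns 10.0.8.0/23, 10.0.12.0/22 and 10.0.10.0/25. AWS also reserves five addresses in every subnet, so the number of usable IPs is somewhat lower than these totals.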

Network Architecture

Our current standard VPC setup is as follows:

  • Private Subnets: Three private subnets, each in a different availability zone. The cluster resources are created in these private subnets.
  • Public Subnets: Three public subnets, one in each availability zone. These are connected to three NAT Gateways (one per availability zone) that provide connectivity to AWS resources.
  • Lambda Connectivity: Elastic network interfaces are created so that the Data Task Lambda function can connect to resources in the VPC.
  • Security Groups / ACLs: The security groups and network ACLs are currently configured to allow inbound and outbound TCP traffic on all ports.
  • External Connectivity: No routes are currently configured to route external traffic to the resources in the private or public subnets. When a VpcPeeringConnection is established, we allow inbound TCP traffic on all ports from the peer VPC. (If you are testing connectivity, ICMP (ping), UDP and other protocols might not work; test with TCP.)
  • Customers can access the VPC by creating a VpcPeeringConnection to it.
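Since the peered VPC only admits TCP traffic, a failed ping tells you little. Here is a minimal Python sketch of a TCP-based reachability check, meant to be run from a host in the client VPC. The host and port are values you supply; the function name is illustrative.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection performs the TCP handshake; closing the socket
        # right away is enough to prove the port is reachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures,
        # all of which are OSError subclasses.
        return False
```

For example, tcp_reachable("10.0.8.10", 9092) would try Kafka's conventional plaintext port on a hypothetical broker IP; use the broker endpoints and port for your cluster. A False result does not distinguish a missing network path from a closed port, so also confirm that the peering connection is active.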
 

VPC Peering Connections

Customers can establish VPC Peering Connections to #Let's Data VPCs to access the resources in them. You can learn about VPC Peering Connections at: AWS Docs - VPC peering basics

To establish a VPC Peering Connection, we need a client VPC: a VPC that the customer has created, also referred to as the requester VPC. The client (requester) VPC should be in the dataset's customerAccountForAccess AWS account. We'll use this VPC to connect to the #Let's Data write connector VPC, which contains the resources we need to access, e.g. the write connector Kafka cluster.

Setup

Creating a VPC peering connection is a two-step process: the requester VPC (client VPC) sends a VPC Peering Connection request, and the accepter VPC (the #Let's Data VPC) accepts it. Here are the steps to set up a VPC Peering Connection:

  • Gather the #LetsData VPC Details: Get the dataset's #Let's Data VPC details using the LetsData CLI's vpcs list command. We'll need the vpcId, ownerId and cidrBlock from the output.
  • Create a VPC Peering Connection Request: Create a VPC peering connection request using the following AWS CLI command. Save the vpcPeeringConnectionId from the output; we'll need it in the next step.
  • Accept the VPC Peering Connection: Accept the VPC peering connection on behalf of #Let's Data using the vpcs vpcPeeringConnections accept LetsData CLI command.
  • List VPC Peering Connections: List the VPC peering connections for a #LetsData VPC using the vpcs vpcPeeringConnections list LetsData CLI command.
  • Delete a VPC Peering Connection: Delete a VPC peering connection for a #LetsData VPC using the vpcs vpcPeeringConnections delete LetsData CLI command.
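The create step can be scripted. The Python sketch below builds the argument list for the AWS CLI's aws ec2 create-vpc-peering-connection command from the vpcId and ownerId gathered in the first step, and extracts the VpcPeeringConnectionId from the command's JSON output. It assumes the vpcs list output has been loaded into a dict with those field names; the helper names are hypothetical, and the LetsData CLI's accept, list and delete commands are not shown because their exact arguments aren't covered here.

```python
import json
from typing import List, Optional


def create_peering_args(letsdata_vpc: dict, client_vpc_id: str,
                        peer_region: Optional[str] = None) -> List[str]:
    """Build AWS CLI arguments requesting a peering connection from the
    client (requester) VPC to the #LetsData (accepter) VPC.

    Assumes letsdata_vpc holds the vpcId and ownerId fields reported by the
    LetsData CLI's vpcs list command. Run the resulting command with
    credentials for the dataset's customerAccountForAccess account.
    """
    args = [
        "aws", "ec2", "create-vpc-peering-connection",
        "--vpc-id", client_vpc_id,
        "--peer-vpc-id", letsdata_vpc["vpcId"],
        "--peer-owner-id", letsdata_vpc["ownerId"],
    ]
    if peer_region:
        # Only needed when the accepter VPC is in a different region.
        args += ["--peer-region", peer_region]
    return args


def peering_connection_id(cli_output: str) -> str:
    """Extract the VpcPeeringConnectionId from the command's JSON output."""
    return json.loads(cli_output)["VpcPeeringConnection"]["VpcPeeringConnectionId"]
```

You could execute the arguments with subprocess.run(args, capture_output=True, text=True, check=True) and pass result.stdout to peering_connection_id, then use the returned ID in the accept step.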

Vpcs & VpcPeeringConnections: CLI Commands

Here are the CLI actions related to Vpcs and VpcPeeringConnections:

Command Syntax:

Command Help:


Command Examples:
