# Let's Data : Focus on the data - we'll manage the infrastructure!
Cloud infrastructure that simplifies how you process, analyze and transform data.
## Customer Access
The #Let's Data datasets write output records to a write destination and error records to an error destination. These write and error destinations can be:
- either located in the #Let's Data AWS account and managed by #Let's Data
- or located in the customer's account.
When the write and error destinations are in the customer's account, accessing the output and error records is simple: the customer can call the AWS APIs with their own credentials and access the records.
However, when these write and error destinations are located in the #Let's Data AWS account, the #Let's Data initialization workflow grants the customer account access to them via an IAM role. This IAM role is listed in the dataset json as the customerAccessRoleArn attribute. You'll also need the dataset's createDatetime, which is set as the externalId (contextId) for the sts:assumeRole call.
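To illustrate, the relevant attributes look roughly like this in the dataset json (customerAccessRoleArn and createDatetime are the attributes named above; the datasetName field and all values are placeholders):

```json
{
  "datasetName": "<datasetName>",
  "createDatetime": "2023-01-01T00:00:00Z",
  "customerAccessRoleArn": "arn:aws:iam::<letsdata-account-id>:role/<customer-access-role>"
}
```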
Here is how the customer can access the data using the customerAccessRoleArn:
- Use the AWS SecurityTokenService (AWS STS) assumeRole API to get access credentials to the resources. In this case, the customer code runs as the customer's AWS account, which then assumes the customerAccessRoleArn IAM role. There is one caveat: the assumeRole API can assume roles only when running as an IAM user (not as the root account). If the customer code is running as the root account, the assumeRole API will return an error. The simple fix is to create an IAM user and grant it assumeRole access. (We've granted these IAM users AdministratorAccess and that seems to work fine.) To follow AWS security best practices, we've also added an externalId (contextId) to the sts:assumeRole call to disallow access from unknown contexts. Currently, the dataset's createDatetime is set as the externalId.
- Call the write / error destination APIs with these access credentials to get the data. The stream details such as the streamName and the error bucketName are in the dataset json.
- A sample implementation of the STS assumeRole call is in STSUtil.java - this can be used for the Kinesis and S3 destinations. For the Kafka destination, we use AWS's aws-msk-iam-auth library, which uses the same methodology to connect securely to the Kafka cluster. We did make a private fix to this library - you'll need to download our custom version of the jar to access the Kafka cluster. For those interested, see the GitHub issue and the fix that we made. A minimal consumer configuration sketch follows this list.
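Since the Kafka consumer setup isn't spelled out above, here is a rough sketch of what an IAM-authenticated consumer can look like. The security.protocol, sasl.mechanism, sasl.jaas.config, and sasl.client.callback.handler.class settings are the aws-msk-iam-auth library's documented properties (remember to put our custom version of the jar on the classpath); the broker endpoint, group id, session name, and topic name are placeholders to fill in from the dataset json:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaReaderSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker endpoint - the actual stream details are in the dataset json
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "b-1.example.kafka.us-east-1.amazonaws.com:9098");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "letsdata-reader"); // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        // IAM auth settings from the aws-msk-iam-auth library
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "AWS_MSK_IAM");
        props.put("sasl.jaas.config",
                "software.amazon.msk.auth.iam.IAMLoginModule required "
                        + "awsRoleArn=\"<customerAccessRoleArn>\" "
                        + "awsRoleSessionName=\"letsdata-kafka-reader\";");
        props.put("sasl.client.callback.handler.class",
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("<topicName>")); // topic name from the dataset json
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(10));
            for (ConsumerRecord<String, String> record : records) {
                System.out.println(record.value());
            }
        }
    }
}
```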
Here are the details for each of these steps - the STS assumeRole API, Kinesis Reader, S3 Reader, Kafka Reader, IAM User with AdministratorAccess and the CLI driver Main class. You can view these code examples in their entirety at the letsdata-writeconnector-reader GitHub repo.
- Simple implementation creates an STS client using the IAM User's credentials
- Calls the assumeRole API with the roleArn and policy texts
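As a hedged sketch of those two bullets (this is not the repo's STSUtil.java; the class name, region, and session name below are assumptions), the flow with the AWS SDK for Java v2 looks like:

```java
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.AwsSessionCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
import software.amazon.awssdk.services.sts.model.Credentials;

public class AssumeRoleSketch {
    public static AwsSessionCredentials assumeCustomerAccessRole(
            String iamUserAccessKeyId, String iamUserSecretKey,
            String customerAccessRoleArn, String datasetCreateDatetime) {
        // Build an STS client with the IAM user's credentials - per the caveat
        // above, the assumeRole call must run as an IAM user, not the account root.
        StsClient stsClient = StsClient.builder()
                .region(Region.US_EAST_1) // assumed region - use the dataset's region
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create(iamUserAccessKeyId, iamUserSecretKey)))
                .build();

        // The dataset's createDatetime is passed as the externalId, per the doc above.
        // The bullets also mention policy texts - an optional session policy can be
        // attached via .policy(policyText) on the request builder.
        AssumeRoleRequest request = AssumeRoleRequest.builder()
                .roleArn(customerAccessRoleArn)
                .roleSessionName("letsdata-customer-access") // hypothetical session name
                .externalId(datasetCreateDatetime)
                .build();

        Credentials creds = stsClient.assumeRole(request).credentials();
        return AwsSessionCredentials.create(
                creds.accessKeyId(), creds.secretAccessKey(), creds.sessionToken());
    }
}
```

The returned session credentials can then be handed to a KinesisClient or S3Client via StaticCredentialsProvider to read the write / error destinations.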