
CloudFormation Custom Resources


CloudFormation is the Amazon Web Services (AWS) way of creating repeatable infrastructure as code: you write templates that describe the resources to be built in AWS. Some things were never implemented in CloudFormation, and others need logic beyond what CloudFormation can express. For those cases, AWS lets you add macros and custom resources by connecting Lambda functions to CloudFormation.

Over the past little while, I have built a few of these Custom Resources to do various things:

  • Build an AMI (Amazon Machine Image) from an instance
  • Create a set of security groups that are pulled from a VPN provider
  • Compute an SMTP password for an IAM user in multiple regions to provide access to SES via SMTP, and put them into Secrets Manager
  • Connect to other external services like Vault

First I want to talk about the simplest of these, which is the AMI builder. For some reason, the creation of an AMI is missing from CloudFormation, so to automate it you would normally resort to building an instance and then running some other script or CLI (Command Line Interface) command to generate the new AMI.

We use CloudFormation, and the Custom Resource concept extends it, so I figured that I could create a CloudFormation Custom Resource to build an AMI. That would let me build a stack that creates the instance and the AMI all in one atomic operation.

The first step in building a custom resource is to decide which language you will use for your Lambda. I have become a bit of a fan-boy of Go for its elegance, simplicity, and speed, not to mention its ability to run lots of concurrent tasks (although that part is not really important here). So I started down the road of using #golang.

Go (Golang) application

To start your Go application, you need a main package whose main function uses cfn.LambdaWrap to turn the incoming request into things you can process in your code. The empty skeleton looks like this:

package main

import (
   "context"
   "github.com/aws/aws-lambda-go/cfn"
   "github.com/aws/aws-lambda-go/lambda"
)

func LambdaAddAMIs(ctx context.Context, event cfn.Event) (physicalResourceID string, data map[string]interface{}, err error) {
   return
}

func main() {
   lambda.Start(cfn.LambdaWrap(LambdaAddAMIs))
}

LambdaAddAMIs is the part of the code that handles the actual request, so the first thing we need to do is understand what comes in from CloudFormation and decide what parameters we might need in the template to build our AMI. In this case we’ll need at least an instance ID.

Usually the first thing I do is to learn about the inputs and outputs that we’ll have to work with. For a Custom Resource, we will get the Context and Event as inputs. The function returns the physical resource ID, any attributes (in the data structure), and the errors to CloudFormation.
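To make those three return values concrete, here is a minimal sketch of the JSON body that gets sent back to the event's ResponseURL on our behalf. The field names follow the documented custom resource response format, but the response type and the buildResponse helper are made up for illustration; they are not the types aws-lambda-go uses internally.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// response mirrors the JSON body CloudFormation expects from a custom
// resource. It is a local illustration, not a type from aws-lambda-go.
type response struct {
	Status             string                 `json:"Status"` // "SUCCESS" or "FAILED"
	Reason             string                 `json:"Reason,omitempty"`
	PhysicalResourceID string                 `json:"PhysicalResourceId"`
	StackID            string                 `json:"StackId"`
	RequestID          string                 `json:"RequestId"`
	LogicalResourceID  string                 `json:"LogicalResourceId"`
	Data               map[string]interface{} `json:"Data,omitempty"`
}

// errDemo is a made-up error used to show the failure case.
var errDemo = errors.New("instance not found")

// buildResponse (hypothetical helper) maps the handler's return values onto
// the response: a non-nil error becomes FAILED with the error text as Reason.
func buildResponse(stackID, requestID, logicalID, physicalID string,
	data map[string]interface{}, err error) response {
	r := response{
		Status:             "SUCCESS",
		PhysicalResourceID: physicalID,
		StackID:            stackID,
		RequestID:          requestID,
		LogicalResourceID:  logicalID,
		Data:               data,
	}
	if err != nil {
		r.Status = "FAILED"
		r.Reason = err.Error()
	}
	return r
}

func main() {
	attrs := map[string]interface{}{"Name": "demo-AMI"}
	for _, err := range []error{nil, errDemo} {
		r := buildResponse("stack-id", "request-id", "AMITest", "ami-0123456789abcdef0", attrs, err)
		body, _ := json.MarshalIndent(r, "", "  ")
		fmt.Println(string(body))
	}
}
```

The Data map is what later becomes available through !GetAtt in the template, and the physical resource ID is what !Ref returns.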

To get started I usually just add some debug code whose output will show up in the CloudWatch log for the Lambda. I add a few log statements like:

log.Printf("Lambdahandler(%#v, %#v)", ctx, event)

All this will do is print out the name of the method and the arguments to it. Alternatively you could split those out to make the log easier to read like:

log.Printf("Context: %#v", ctx)
log.Printf("Event: %#v", event)

That will log the data from the context object and the event, which helps in debugging. Most of what we need will come from the Event. We can examine this by looking at the Event structure in the cfn package:

type Event struct {
   RequestType           RequestType            `json:"RequestType"`
   RequestID             string                 `json:"RequestId"`
   ResponseURL           string                 `json:"ResponseURL"`
   ResourceType          string                 `json:"ResourceType"`
   PhysicalResourceID    string                 `json:"PhysicalResourceId,omitempty"`
   LogicalResourceID     string                 `json:"LogicalResourceId"`
   StackID               string                 `json:"StackId"`
   ResourceProperties    map[string]interface{} `json:"ResourceProperties"`
   OldResourceProperties map[string]interface{} `json:"OldResourceProperties,omitempty"`
}

The RequestType will tell us what CloudFormation is trying to do (Create/Update/Delete) and the ResourceType will give us what the resource is that called our Lambda (not all that interesting since it is basically just the name of the type from CloudFormation):

2021/09/09 16:57:33 RequestType: Create
2021/09/09 16:57:33 ResourceType: "Custom::AMIBuilder"
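If you want to poke at the event shape without deploying anything, a sketch like the following decodes a sample payload locally. The event type here is a trimmed local copy of a few of the fields above, and the sample JSON is made up for illustration (a real event also carries the ResponseURL, StackId, and so on).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// event is a trimmed local copy of a few cfn.Event fields, so the sketch
// runs without the aws-lambda-go module.
type event struct {
	RequestType        string                 `json:"RequestType"`
	ResourceType       string                 `json:"ResourceType"`
	LogicalResourceID  string                 `json:"LogicalResourceId"`
	PhysicalResourceID string                 `json:"PhysicalResourceId,omitempty"`
	ResourceProperties map[string]interface{} `json:"ResourceProperties"`
}

// sampleEvent is an invented payload in the shape of a Create request.
const sampleEvent = `{
  "RequestType": "Create",
  "ResourceType": "Custom::AMIBuilder",
  "LogicalResourceId": "AMITest",
  "ResourceProperties": {
    "InstanceId": "i-0123456789abcdef0",
    "Name": "demo-AMI",
    "Tags": [{"Key": "Name", "Value": "demo-AMI"}]
  }
}`

// decodeEvent unmarshals a raw JSON payload into the local event type.
func decodeEvent(raw string) (event, error) {
	var e event
	err := json.Unmarshal([]byte(raw), &e)
	return e, err
}

func main() {
	e, err := decodeEvent(sampleEvent)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("RequestType: %v\n", e.RequestType)
	fmt.Printf("ResourceType: %#v\n", e.ResourceType)
	fmt.Printf("ResourceProperties: %#v\n", e.ResourceProperties)
}
```

Note how the nested Tags list comes back as []interface{} holding map[string]interface{} values, which is why the property handling needs type assertions.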

More important is ResourceProperties, which is where CloudFormation passes the properties of our Custom Resource. It is a map from string keys to interface{} values, so each value needs a type assertion before use. We grab the instance ID from there, along with anything else we might want to use to build our AMI. I decided I want to have a Name, Description and Tags, so I add this logic:

instanceId, _ := event.ResourceProperties["InstanceId"].(string)
amiName, _ := event.ResourceProperties["Name"].(string)
region, _ := event.ResourceProperties["Region"].(string)

amiDescription, _ := event.ResourceProperties["Description"].(string)
myTags, _ := event.ResourceProperties["Tags"].([]interface{})

for i, x := range myTags {
   myTag, ok := x.(map[string]interface{})
   if !ok {
      // Skip anything that is not a Key/Value map instead of reading from a nil map
      log.Printf("Tag %d is not a Key/Value map: %#v", i, x)
      continue
   }
   key, _ := myTag["Key"].(string)
   value, _ := myTag["Value"].(string)
   log.Printf("%d - %#v - %#v", i, key, value)
}
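The snippet above discards the ok value from each type assertion, so a missing or mistyped property silently becomes an empty string. A minimal sketch of a stricter alternative is below; stringProp is a hypothetical helper name, not part of the cfn package.

```go
package main

import "fmt"

// stringProp (hypothetical helper) returns the named property as a string,
// or an error if it is missing, not a string, or empty.
func stringProp(props map[string]interface{}, key string) (string, error) {
	v, ok := props[key]
	if !ok {
		return "", fmt.Errorf("missing required property %q", key)
	}
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("property %q must be a string, got %T", key, v)
	}
	if s == "" {
		return "", fmt.Errorf("property %q must not be empty", key)
	}
	return s, nil
}

func main() {
	// Stand-in for event.ResourceProperties.
	props := map[string]interface{}{
		"InstanceId": "i-0123456789abcdef0",
		"Name":       42, // wrong type on purpose
	}
	for _, key := range []string{"InstanceId", "Name", "Description"} {
		value, err := stringProp(props, key)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s = %s\n", key, value)
	}
}
```

Returning the error from the handler makes the stack fail with a readable reason, rather than failing later inside the EC2 call with an empty instance ID.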

Test CloudFormation Custom Resources template

That gives me the structures from the template. Now to actually test this. I need a quick template, so I create one that has my Custom Resource, and those values:

AMITest:
  Type: Custom::AMIBuilder
  Properties:
    ServiceToken: !Sub arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:AMIBuilder
    InstanceId: !Ref Instance
    Name: !Sub "${AWS::StackName}-AMI"
    Description: AMITest description
    Region: !Ref AWS::Region
    Tags:
      - Key: "Name"
        Value: !Sub "${AWS::StackName}-AMI"

Note in my example template I only added one tag because I just want to test the logic. Later in my final version I add any additional tags that are required for an AMI. I also add logic that will “merge” the tags on the resource with the CloudFormation stack tags. The Custom Resource only has to add tags that are in addition to what the stack has.
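The merge itself can be sketched in a few lines. This is only an illustration of the idea, not the code from my final version: the tag type and mergeTags name are invented here, and letting a resource tag win when both sides define the same key is an assumption you may want to reverse.

```go
package main

import (
	"fmt"
	"sort"
)

// tag is a local stand-in for an EC2 or CloudFormation tag.
type tag struct {
	Key   string
	Value string
}

// mergeTags combines stack tags with the tags declared on the custom
// resource. Assumption: on a duplicate key the resource tag wins.
// The result is sorted by key so the output is deterministic.
func mergeTags(stackTags, resourceTags []tag) []tag {
	merged := make(map[string]string, len(stackTags)+len(resourceTags))
	for _, t := range stackTags {
		merged[t.Key] = t.Value
	}
	for _, t := range resourceTags {
		merged[t.Key] = t.Value
	}
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]tag, 0, len(keys))
	for _, k := range keys {
		out = append(out, tag{Key: k, Value: merged[k]})
	}
	return out
}

func main() {
	stackTags := []tag{{Key: "Owner", Value: "platform-team"}, {Key: "Name", Value: "demo"}}
	resourceTags := []tag{{Key: "Name", Value: "demo-AMI"}}
	for _, t := range mergeTags(stackTags, resourceTags) {
		fmt.Printf("%s=%s\n", t.Key, t.Value)
	}
}
```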

The other important parts to notice above are that the Type of the Custom Resource is Custom::AMIBuilder, which is our Lambda name prefixed with Custom::, and that the ServiceToken is the actual ARN of our Lambda (so it’s important to have the Lambda in each account and region where you will be running the stack).

Deploy the CloudFormation Custom Resources

We need a couple more things before we can test the stack as it stands. One is to grant the Lambda the permissions the builder will need. The other is to make sure we return a physicalResourceId to the stack. We’ll need a role that grants us the rights to query the CloudFormation stack tags and to build the AMI. This is probably easier to do with serverless.com. For this example though, I decided to just use a quick bash script to create the Lambda. I authenticate, set my AWS_PROFILE and AWS_REGION, and run the following (depending on how you access your AWS account, you may not need to do that part):

#!/usr/bin/env bash
# To get the session, export the AWS_PROFILE and AWS_REGION and then run ../../../ci/awsSetup.sh
account_id=$( aws sts get-caller-identity | jq -r '.Account')
echo "Deploying to account ${account_id}"

aws iam create-role --role-name AMIBuilderRole --assume-role-policy-document file://./trust-policy.json

aws iam attach-role-policy --role-name AMIBuilderRole --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
aws iam attach-role-policy --role-name AMIBuilderRole --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess
aws iam attach-role-policy --role-name AMIBuilderRole --policy-arn arn:aws:iam::aws:policy/AWSCloudFormationReadOnlyAccess

env GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o /tmp/main AMIBuilder.go
zip -j /tmp/main.zip /tmp/main

aws lambda create-function --function-name AMIBuilder \
    --runtime go1.x \
    --role arn:aws:iam::${account_id}:role/AMIBuilderRole \
    --handler main --zip-file fileb:///tmp/main.zip \
    --timeout 300

Lambda Policy

The above creates a role with a basic policy that trusts the Lambda service to assume it, then attaches some AWS managed policies. Note that for real use these permissions should be scoped down to follow least privilege. Once the role is created, the script compiles our code and uploads it to our account. For completeness, the trust-policy.json is this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "lambda.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
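As a starting point for scoping the permissions down, here is a rough sketch that prints a narrower policy document in place of AmazonEC2FullAccess and AWSCloudFormationReadOnlyAccess. The action list is an assumption about what an AMI builder like this one calls (create, describe, tag and deregister images, plus reading the stack for its tags); check it against what your own code actually uses. AWSLambdaBasicExecutionRole would still be attached for logging.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statement and policy model just enough of an IAM policy document
// for this illustration.
type statement struct {
	Effect   string   `json:"Effect"`
	Action   []string `json:"Action"`
	Resource string   `json:"Resource"`
}

type policy struct {
	Version   string      `json:"Version"`
	Statement []statement `json:"Statement"`
}

// scopedPolicy returns a narrower policy. The actions listed are an
// assumption about what the AMI builder needs, not a verified minimum.
func scopedPolicy() policy {
	return policy{
		Version: "2012-10-17",
		Statement: []statement{
			{
				Effect: "Allow",
				Action: []string{
					"ec2:CreateImage",
					"ec2:DescribeImages",
					"ec2:DeregisterImage",
					"ec2:CreateTags",
				},
				Resource: "*",
			},
			{
				Effect:   "Allow",
				Action:   []string{"cloudformation:DescribeStacks"},
				Resource: "*",
			},
		},
	}
}

func main() {
	doc, err := json.MarshalIndent(scopedPolicy(), "", "    ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(doc))
}
```

The printed JSON could be saved to a file and attached with aws iam put-role-policy instead of the two broad managed policies.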

Now that the script to deploy is ready I just add a dummy value for physicalResourceId like:

func LambdaAddAMIs(ctx context.Context, event cfn.Event) (physicalResourceID string, data map[string]interface{}, err error) {
    physicalResourceID = "test-value"
    return
}

Run the script

Then run the script to create the role and Lambda:

Robs-Mac-Pro:blog robweaver$ ./create.sh 
Deploying to account xxxxxxxxxxxxxxxxx
{
    "Role": {
        "Path": "/",
        "RoleName": "AMIBuilderRole",
        "RoleId": "ARXXXXXXXXXXXX",
        "Arn": "arn:aws:iam::xxxxxxxxxxxx:role/AMIBuilderRole",
        "CreateDate": "2021-09-11T15:57:13+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "lambda.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
    }
}
  adding: main (deflated 60%)
{
    "FunctionName": "AMIBuilder",
    "FunctionArn": "arn:aws:lambda:us-west-2:xxxxxxxxxxxx:function:AMIBuilder",
    "Runtime": "go1.x",
    "Role": "arn:aws:iam::xxxxxxxxxxxx:role/AMIBuilderRole",
    "Handler": "main",
    "CodeSize": 2385483,
    "Description": "",
    "Timeout": 300,
    "MemorySize": 128,
    "LastModified": "2021-09-11T15:57:27.423+0000",
    "CodeSha256": "Oa9grltN/Euuo12F4bGeWPU9VqmDGWiVMCWgAaiXOYA=",
    "Version": "$LATEST",
    "TracingConfig": {
        "Mode": "PassThrough"
    },
    "RevisionId": "7cead9e9-8ff2-4c16-8f57-94ade9cac2c9",
    "State": "Active",
    "LastUpdateStatus": "Successful",
    "PackageType": "Zip"

}

Now we can try our CloudFormation Custom Resources template; going to the console and uploading it works for testing. I’m also going to use this to show one of the coolest things the CloudFormation team added recently: the ability to fix and retry a failed build. In the console I choose “Create with new resources” and upload the template:

Upload CloudFormation Template for CloudFormation Custom Resource

Then add tags (I always add at least the owner so somebody can figure out who built this):

Configure CloudFormation Stack Options on CloudFormation Custom Resource

Then set the stack not to roll back on failure (this is useful during initial development):

Specify stack details for CloudFormation Custom Resource

Stack Failure and debug

At this point wait for the CloudFormation Custom Resources stack to build. If you have a default VPC, everything will go fine, and it will build. In my case I forgot that we kill the default VPC and use a shared subnet. So it failed and gave me some options to continue.

This prompt lets you retry the failed resources, upload a fixed template, or roll back everything. It can save you from building things for hours and then having to delete and recreate everything from scratch just to get back to the point where the failure occurred, which is what you used to have to do. That makes debugging and testing stacks so much easier, and makes it my favorite new feature:

CloudFormation Custom Resources Stack with repair options shown

Fix the template

To fix this I add some more bits to my template, telling it to build my test instance in the right VPC and subnet (in this case using parameters):

Parameters:
  # The AMI ID we start from (pulled from SSM)
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-arm64-gp2
    Description: AMI ID from SSM

  SubnetId:
    Type: AWS::EC2::Subnet::Id
    Description: Subnet ID to put the temporary instance in

  VPCID:
    Type: AWS::EC2::VPC::Id
    Description: VPC ID for temporary instance

And to the instance code itself:

Resources:

  Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: t4g.nano
      SubnetId: !Ref SubnetId
      SecurityGroupIds:
        - !GetAtt InstanceSecurityGroup.GroupId

  # Need a security group so we can build the instance
  InstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      VpcId: !Ref VPCID
      GroupDescription: >-
        Fake security group
      Tags:
        - Key: "Name"
          Value: !Sub "${AWS::StackName}-AMI"

Verify fixed template

I fix the CloudFormation Custom Resources template, hit “Retry”, and upload the new template, which now lets me choose a valid subnet to finish the build. I haven’t really built anything yet, but I can see the physicalResourceId under the Resources tab in my stack:

Retry CloudFormation Stack for CloudFormation Custom Resources

I delete the stack because we are going to update that value to be the ID of the AMI.

Flesh out the Lambda code

Next we add some code to handle each type of action, along with some more code to get the values we’ll need.

	log.Printf("Context: %#v", ctx)
	log.Printf("Event: %#v", event)
	log.Printf("StackId: %#v", event.StackID)
	log.Printf("RequestType: %v", event.RequestType)
	log.Printf("ResourceType: %#v", event.ResourceType)
	log.Printf("ResourceProperties: %#v", event.ResourceProperties)
	lc, _ := lambdacontext.FromContext(ctx)

	log.Printf("AWSRequestID: %#v", lc.AwsRequestID)
	log.Printf("ClientContext: %#v", lc.ClientContext)
	log.Printf("Identity: %#v", lc.Identity)
	log.Printf("InvokedFunctionArn: %#v", lc.InvokedFunctionArn)

	//InstanceId: 	Instance ID
	//Name: 		Name of the AMI
	//Description: 	Description of the AMI

	instanceId, _ := event.ResourceProperties["InstanceId"].(string)
	amiName, _ := event.ResourceProperties["Name"].(string)
	region, _ := event.ResourceProperties["Region"].(string)

	amiDescription, _ := event.ResourceProperties["Description"].(string)
	myTags, _ := event.ResourceProperties["Tags"].([]interface{})

	for i, x := range myTags {
		myTag, ok := x.(map[string]interface{})
		if !ok {
			// Skip anything that is not a Key/Value map instead of reading from a nil map
			log.Printf("Tag %d is not a Key/Value map: %#v", i, x)
			continue
		}
		key, _ := myTag["Key"].(string)
		value, _ := myTag["Value"].(string)
		log.Printf("%d - %#v - %#v", i, key, value)
	}

After parsing the ResourceProperties, we add a switch statement to handle the RequestType sent by CloudFormation:

	switch event.RequestType {

	// Create the AMI
	case "Create":
		physicalResourceID, err = CreateAMI(event, instanceId, amiName, amiDescription)

	// Do the delete (for cleanup)
	case "Delete":
		_, err = DeleteAMI(event, physicalResourceID)

	// Anything else is an Update: replace the old AMI with a new one
	default:
		// Stop here if the delete fails, so its error is not overwritten by the create
		if _, err = DeleteAMI(event, physicalResourceID); err != nil {
			return
		}
		physicalResourceID, err = CreateAMI(event, instanceId, amiName, amiDescription)

	}

We’ll also return some attributes from our CloudFormation Custom Resource for convenience later:

	data = map[string]interface{}{
		"Location":    fmt.Sprintf(`https://console.aws.amazon.com/ec2/v2/home?region=%v#Images:visibility=owned-by-me;search=%v`, region, physicalResourceID),
		"Name":        amiName,
		"Description": amiDescription,
		"ARN":         physicalResourceID,
	}


Since we’re setting physicalResourceId to the AMI ID on our CloudFormation Custom Resource, the delete and create logic is pretty simple:

// DeleteAMI deregisters the AMI whose ID is stored as the physical resource ID.
// The ID always comes from the event, so the value passed in is overwritten.
func DeleteAMI(event cfn.Event, physicalResourceID string) (result *ec2.DeregisterImageOutput, err error) {
	log.Printf("PhysicalResourceId %v", event.PhysicalResourceID)
	physicalResourceID = event.PhysicalResourceID
	// To get the AMI Id, we need the physical_resource_id
	result, err = awshandler.DeleteAMI(&physicalResourceID, ec2api)
	if err != nil {
		log.Printf("Error deleting %#v: %v", physicalResourceID, err)
	}

	return
}

// CreateAMI builds the AMI from the instance, tags it with the stack tags,
// and returns the new AMI ID to use as the physical resource ID.
func CreateAMI(event cfn.Event, instanceId string, amiName string, description string) (physicalResourceID string, err error) {
	tags, err := awshandler.MakeEC2TagsFromStack(event, cfnsvc)
	if err != nil {
		log.Printf("Error tagging AMI %#v: %v", amiName, err)
		return
	}
	createImageOutput, err := awshandler.CreateAMI(&instanceId, &amiName, &description, ec2api, tags)
	if err != nil {
		log.Printf("Error creating AMI %v: %v", amiName, err)
		return
	}
	physicalResourceID = *createImageOutput.ImageId
	log.Printf("PhysicalResourceID: %v", physicalResourceID)
	return
}
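One thing worth guarding against: the physical resource ID that arrives with a Delete is not always an AMI ID. It could be the dummy "test-value" from earlier, or whatever ID was recorded when a Create failed. A small check before deregistering lets those deletes succeed as a no-op. This is only a sketch; looksLikeAMIID is a hypothetical helper and the check is a simple shape test, not a lookup against EC2.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeAMIID (hypothetical helper) reports whether id has the shape of
// an AMI ID: the "ami-" prefix followed by one or more lowercase hex digits.
func looksLikeAMIID(id string) bool {
	if !strings.HasPrefix(id, "ami-") {
		return false
	}
	suffix := strings.TrimPrefix(id, "ami-")
	if suffix == "" {
		return false
	}
	for _, c := range suffix {
		if !strings.ContainsRune("0123456789abcdef", c) {
			return false
		}
	}
	return true
}

func main() {
	ids := []string{"ami-0123456789abcdef0", "test-value", ""}
	for _, id := range ids {
		if looksLikeAMIID(id) {
			fmt.Printf("%q: deregister the image\n", id)
		} else {
			fmt.Printf("%q: nothing to delete, report success\n", id)
		}
	}
}
```

In the Delete case of the handler, the idea would be to return early with a nil error when the check fails, so CloudFormation can finish removing the stack.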

Update script

We run another simple script that just recompiles and updates the Lambda (since the role already exists):

#!/usr/bin/env bash

env GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o /tmp/main AMIBuilder.go
zip -j /tmp/main.zip /tmp/main

aws lambda update-function-code \
    --function-name AMIBuilder \
    --zip-file fileb:///tmp/main.zip

And to prove our attributes work, we add an Outputs section to the CloudFormation template:

Outputs:
  AMI:
    Value: !Ref AMITest
    Export:
      Name: !Sub "${AWS::StackName}-AMI"

  AMIName:
    Value: !GetAtt AMITest.Name
    Export:
      Name: !Sub "${AWS::StackName}-AMIName"

  AMILocation:
    Value: !GetAtt AMITest.Location
    Export:
      Name: !Sub "${AWS::StackName}-AMILocation"

  AMIDescription:
    Value: !GetAtt AMITest.Description
    Export:
      Name: !Sub "${AWS::StackName}-AMIDescription"

Test the CloudFormation Custom Resource

Then we build the stack again. It will take a bit longer now because it is really creating the AMI. You will see a dot being added every few seconds in the CloudWatch log for the Lambda:

CloudWatch output
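Those dots come from a wait loop that polls until the AMI is ready. The real loop lives in my helper package and is not shown here, so the following is only a sketch of the pattern, with invented names (waitUntil, pollDemo) and a fake check function standing in for the EC2 call. It stops when the context is done, which in a Lambda is tied to the function timeout.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitUntil calls check every interval until it reports done, returns an
// error, or ctx is cancelled or times out. It prints a dot per pending check.
func waitUntil(ctx context.Context, interval time.Duration, check func() (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		fmt.Print(".")
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// pollDemo runs waitUntil against a fake check that succeeds on the
// succeedAfter-th call, with a short overall timeout.
func pollDemo(succeedAfter int) (calls int, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	err = waitUntil(ctx, 5*time.Millisecond, func() (bool, error) {
		calls++
		return calls >= succeedAfter, nil
	})
	return calls, err
}

func main() {
	calls, err := pollDemo(3)
	fmt.Printf("\nfinished after %d checks, err = %v\n", calls, err)

	_, err = pollDemo(1000000) // never ready within the timeout
	fmt.Printf("\ngave up, err = %v\n", err)
}
```

Returning the context error when time runs out gives CloudFormation a failure response instead of leaving the stack waiting on a Lambda that was killed mid-loop.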

Once built, you will see the AMI ID in the outputs (and the resources tab) on the stack:

CloudFormation Stack output

Conclusion

To fully test the Custom Resource, you can delete the stack, and it should fully delete the AMI and the stack. I have run into some corner cases where the delete fails (mostly because I haven’t fleshed out the error handling fully, so it can fail for things like using the same name on an AMI, or timing out).

I hope this helps you understand the concept a bit more, and shows you that CloudFormation Custom Resources can be useful for some interesting use cases.

Hi, I’m Rob Weaver