Deleting AWS Glacier Vaults via AWS CLI using a Lightsail Instance

December 30, 2018

Amazon Web Services (AWS) offers some very affordable archive storage via its S3 Glacier service. I've used this on a backup account in the past to store archives, and have decided it's time to clear down this account (oh, and save $0.32 a month in doing so).

The main challenge is that, unlike objects in S3, archives stored directly in S3 Glacier (rather than in the Glacier storage tier within S3) can't be deleted from the AWS console; you need the AWS CLI (or the API/SDKs). And to delete a Glacier vault, you first have to delete every archive in it.

This account has some wild spending. $4.90 a month!

In this post I'll spin up a Lightsail box and wipe out the pesky Glacier objects through the AWS CLI. This doesn't require any changes on your local PC, but will require some patience.

And a whole $0.32 of that is Glacier.

So, for my $0.32 a month, I store ~80GB of archives. Conveniently, these are ancient, so we're well outside Glacier's 90-day early deletion fee period.

~80GB and >20k archives - let's just delete the vault!

You might assume you can just delete the vault via the AWS GUI and avoid the CLI altogether, but you'd be wrong.

"Vault not empty or recently written to..." - so we need to empty it first

So, my weapon of choice to get the AWS CLI working is going to be a Lightsail machine. Why? Low cost, and low complexity!

1 Spin up a Lightsail Instance

We're going to create a Lightsail instance based on Amazon Linux (you could also use an EC2 box).

For type, Amazon Linux under Linux/Unix is ideal

As for the size, it doesn't really matter. All we're doing is running a few CLI commands, nothing crazy.

$3.50 looks good - I'll only be running it for a few hours anyway

Once the server is created (it may take a few minutes), you can hit the console icon to log into the machine via the browser

Logging in gives you direct access without requiring anything on your local machine. Great!

2 Check what's installed

We can run a few commands to see what's installed, following the AWS CLI Linux install guide. Amazon Linux includes the AWS CLI by default, but we may need to update it.

To check what version of the AWS CLI is installed:

$ aws --version
aws-cli/1.14.9 Python/2.7.14 Linux/4.14.62-65.117.amzn1.x86_64 botocore/1.8.13

This shows version 1.14.9, which is a little old. To update:

$ sudo pip install --upgrade awscli
Successfully installed PyYAML-3.13 awscli-1.16.81 botocore-1.12.71 colorama-0.3.9 docutils-0.14 futures-3.2.0 jmespath-0.9.3 pyasn1-0.4.4 python-dateutil-2.7.5 rsa-3.4.2 s3transfer-0.1.13 six-1.12.0 urllib3-1.24.1

Let's verify the version installed:

$ aws --version
aws-cli/1.16.81 Python/2.7.14 Linux/4.14.62-65.117.amzn1.x86_64 botocore/1.12.71

Ok, we're good to go.

3 Create an IAM user (if you don't have one)

I created a new IAM user in the IAM Console solely for this activity. Notably, I gave it full access to Glacier and nothing else; "AmazonGlacierFullAccess" is an AWS managed policy, so there's nothing to write yourself.

A new user, "awscli" with "Programmatic access - with an access key" and one policy "AmazonGlacierFullAccess"

4 Configure AWS CLI

Now that we have the vehicle (the Lightsail instance) and the driver (the IAM user), let's configure the AWS CLI!

This is as simple as typing "aws configure" and following the guided prompts.

$ aws configure
AWS Access Key ID [None]: [account access id]
AWS Secret Access Key [None]: [account secret access key]
Default region name [None]: eu-west-2  
Default output format [None]: json
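
Before moving on, it can be worth confirming that the keys were picked up. The following is a minimal sketch, assuming the standard `aws sts get-caller-identity` call (which needs no IAM permissions, so it works even with a Glacier-only user). `current_account_id` is a hypothetical helper name, not part of the CLI. It prints the account ID, which is the value used as [accountid] in the rest of this post.

```shell
# current_account_id: hypothetical helper that prints the 12-digit account ID
# behind the configured credentials, or fails if the keys are not valid.
current_account_id() {
    aws sts get-caller-identity --query Account --output text
}

# Usage (on the Lightsail box, after running "aws configure"):
#   current_account_id
```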

5 Querying the AWS CLI to get the file list

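We can query the CLI to check that access is working and that we're in the right region. Note that [accountid] is the 12-digit ID of this AWS account; Glacier commands also accept a single hyphen (-) in its place, meaning the account that owns the credentials.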

$ aws glacier list-vaults --account-id [accountid]
{
    "VaultList": [
        {
            "SizeInBytes": 85593505088, 
            "VaultARN": "arn:aws:glacier:eu-west-2:[accountid]:vaults/Backup_Local", 
            "LastInventoryDate": "2017-01-27T11:59:55.235Z", 
            "VaultName": "Backup_Local", 
            "NumberOfArchives": 20111, 
            "CreationDate": "2017-01-17T09:38:35.075Z"
        }
    ]
}

That looks good. Next we need an inventory of all archives in the vault, which we request by starting a new job against the vault name:

$ aws glacier initiate-job --account-id [accountid] --vault-name Backup_Local --job-parameters '{"Type": "inventory-retrieval"}'
{
    "location": "/[accountid]/vaults/Backup_Local/jobs/FDBG0Gbry7uNuE20BHiEWXz8uYCwKgDlFPDqNqxDILY88tA9_iuSVkI7pX80Iw3XzaZ-oPL-GpznrI_k-D6keHOUqmf3",
    "jobId": "FDBG0Gbry7uNuE20BHiEWXz8uYCwKgDlFPDqNqxDILY88tA9_iuSVkI7pX80Iw3XzaZ-oPL-GpznrI_k-D6keHOUqmf3"
}
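
The job ID is long and awkward to copy out of a browser terminal, so one option is to capture it directly. This is only a sketch: `start_inventory_job` is a hypothetical helper name, and it relies on the CLI's global `--query` and `--output text` options to print just the `jobId` field shown in the response above.

```shell
# start_inventory_job: hypothetical helper that starts an inventory-retrieval
# job on the given vault and prints only the job ID.
# "--account-id -" means "the account that owns the configured credentials".
start_inventory_job() {
    local vault_name="$1"
    aws glacier initiate-job --account-id - \
        --vault-name "$vault_name" \
        --job-parameters '{"Type": "inventory-retrieval"}' \
        --query jobId --output text
}

# Usage:
#   job_id=$(start_inventory_job Backup_Local)
#   echo "$job_id" > job_id.txt   # keep a copy in case the terminal is closed
```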

We can list all the current jobs on that vault using list-jobs, to see whether it's finished:

$ aws glacier list-jobs --account-id [accountid] --vault-name Backup_Local
{
    "JobList": [
        {
            "InventoryRetrievalParameters": {
                "Format": "JSON"
            },
            "VaultARN": "arn:aws:glacier:eu-west-2:[accountid]:vaults/Backup_Local",
            "Completed": false,
            "JobId": "FDBG0Gbry7uNuE20BHiEWXz8uYCwKgDlFPDqNqxDILY88tA9_iuSVkI7pX80Iw3XzaZ-oPL-GpznrI_k-D6keHOUqmf3",
            "Action": "InventoryRetrieval", 
            "CreationDate": "2018-12-29T15:24:04.808Z", 
            "StatusCode": "InProgress"
        }
    ]
}

This will take several hours to progress to "StatusCode": "Succeeded".
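
Rather than re-running list-jobs by hand, the wait can be scripted. The sketch below assumes the `aws glacier describe-job` command and its `StatusCode` field (the same field shown in the list-jobs output above). `wait_for_job` and `POLL_SECONDS` are hypothetical names, and the 15-minute default interval is an arbitrary choice.

```shell
# wait_for_job: hypothetical helper that polls a Glacier job until it ends.
# Returns 0 when the job reports "Succeeded", 1 on "Failed" or a CLI error.
wait_for_job() {
    local vault_name="$1" job_id="$2" job_status
    while true; do
        job_status=$(aws glacier describe-job --account-id - \
            --vault-name "$vault_name" --job-id "$job_id" \
            --query StatusCode --output text) || return 1
        echo "$(date -u +%H:%M:%S) job status: ${job_status}"
        case "$job_status" in
            Succeeded) return 0 ;;
            Failed)    return 1 ;;
        esac
        # POLL_SECONDS is an assumed knob; defaults to 15 minutes.
        sleep "${POLL_SECONDS:-900}"
    done
}

# Usage:
#   wait_for_job Backup_Local "$job_id" && echo "Inventory is ready"
```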

6 Iterating over files to delete them

Once we have the archive inventory, a short bash script run on our server will remove the archives. This approach is essentially a copy of this gist, but I wanted to run it directly on the command line to delete the 20k archives in the vault.

There are only two prerequisites: a) building an inventory as shown above, and b) installing jq, a command-line JSON parser that will help us read the inventory.

$ sudo yum install jq
Installed:
  jq.x86_64 0:1.5-1.2.amzn1                                                               

Dependency Installed:
  jq-libs.x86_64 0:1.5-1.2.amzn1           oniguruma.x86_64 0:5.9.1-3.1.2.amzn1          

Complete!

With that done, we can save the inventory to a file (assuming the job has succeeded).

$ aws glacier get-job-output --account-id [accountid] --vault-name Backup_Local --job-id "FDBG0Gbry7uNuE20BHiEWXz8uYCwKgDlFPDqNqxDILY88tA9_iuSVkI7pX80Iw3XzaZ-oPL-GpznrI_k-D6keHOUqmf3" output.json
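
Before deleting anything, a quick sanity check is to count the archives in the saved file and compare the result with the NumberOfArchives value reported by list-vaults earlier (20111 here). A minimal sketch using jq; `inventory_archive_count` is a hypothetical helper name.

```shell
# inventory_archive_count: hypothetical helper that prints how many archives
# are listed in a saved Glacier inventory file.
inventory_archive_count() {
    jq '.ArchiveList | length' "$1"
}

# Usage:
#   inventory_archive_count output.json
```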

We can now copy and paste this script into the browser terminal window. The parentheses on either side ensure it's entered as a single command (run in a subshell) rather than as a set of separate lines.

(
## Set config values, then paste this script into the console, ensure the ( and ) are copied at the start and end of the script.
vAWSAccountId=[accountid]
vAWSVaultName=Backup_Local
vAWSInventoryFile='./output.json'


## Parse inventory file
vArchiveIds=$(jq '.ArchiveList[].ArchiveId' < "${vAWSInventoryFile}")
vFileCount=1

## Echo out to start
echo Starting remove task

for vArchiveId in ${vArchiveIds}; do
    echo "Deleting Archive #${vFileCount}: ${vArchiveId}"
    aws glacier delete-archive --archive-id=${vArchiveId} --vault-name ${vAWSVaultName} --account-id ${vAWSAccountId}
    let "vFileCount++"
done

## Echo out to finish
## vFileCount starts at 1, so subtract 1 to report the number actually processed
echo "Finished remove task on $((vFileCount - 1)) archives"
)

This should iterate through all archives to leave you with an empty Glacier vault. This is not an asynchronous process, so it deletes one archive at a time. You may need to re-inventory or wait a day before the vault can be deleted.

On my system, it ran at about 2 archives a second - so it took around 3.5 hours end to end
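
With a run this long, a dropped connection or a throttled request part-way through is plausible, and the loop above does not record which deletions failed. The following is a sketch of one way to handle that, not the gist's method: `delete_archives` is a hypothetical helper that reads raw archive IDs (one per line, as produced by `jq -r`) from standard input and writes any that fail to a file so they can be retried. The `--archive-id=` form with an equals sign is used because archive IDs can start with a hyphen.

```shell
# delete_archives: hypothetical helper. Reads one archive ID per line on stdin,
# deletes each from the given vault, and appends failed IDs to a retry file.
# Returns 0 only if every deletion succeeded.
delete_archives() {
    local vault_name="$1" failed_file="$2" archive_id deleted=0 failed=0
    : > "$failed_file"    # start with an empty retry file
    while IFS= read -r archive_id; do
        [ -n "$archive_id" ] || continue    # skip blank lines
        # </dev/null stops the CLI from consuming the ID list on stdin
        if aws glacier delete-archive --account-id - \
               --vault-name "$vault_name" \
               --archive-id="$archive_id" < /dev/null; then
            deleted=$((deleted + 1))
        else
            echo "$archive_id" >> "$failed_file"
            failed=$((failed + 1))
        fi
    done
    echo "Deleted ${deleted}, failed ${failed}"
    [ "$failed" -eq 0 ]
}

# Usage:
#   jq -r '.ArchiveList[].ArchiveId' output.json | delete_archives Backup_Local failed_ids.txt
#   # copy failed_ids.txt aside before retrying; the helper truncates its retry file
#   cp failed_ids.txt retry_ids.txt && delete_archives Backup_Local failed_ids.txt < retry_ids.txt
```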

If you run another inventory retrieval (as per step 5 above), the resulting JSON file looks like the below; note there are no archives in the ArchiveList array.

$ cat output_after_deleting.json 
{"VaultARN":"arn:aws:glacier:eu-west-2:[accountid]:vaults/Backup_Local","InventoryDate":"2018-12-30T02:07:22Z","ArchiveList":[]}

The AWS GUI also reports a null size and # of archives

Success! We can now delete the vault via the CLI or via the AWS GUI.
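
For the CLI route, a minimal sketch is below. It assumes the `aws glacier describe-vault` and `aws glacier delete-vault` commands; `delete_vault_if_empty` is a hypothetical helper name. Note that NumberOfArchives reflects the vault's last inventory, so it can lag behind the deletions, and the helper simply refuses to delete until it reads 0.

```shell
# delete_vault_if_empty: hypothetical helper that deletes a vault only when
# its last inventory reports zero archives.
delete_vault_if_empty() {
    local vault_name="$1" count
    count=$(aws glacier describe-vault --account-id - \
        --vault-name "$vault_name" \
        --query NumberOfArchives --output text) || return 1
    if [ "$count" != "0" ]; then
        echo "Vault ${vault_name} still reports ${count} archives; not deleting" >&2
        return 1
    fi
    aws glacier delete-vault --account-id - --vault-name "$vault_name"
}

# Usage:
#   delete_vault_if_empty Backup_Local && echo "Vault deleted"
```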

Other options and references

There were a bunch of other options dotted around which had code written in different languages. I used a few of these as references while trying to keep it all to something I can paste into the command line:


From Dave, who writes to learn things. Thoughts and views are his own.

© 2024, withdave.