HOW TO RUN AWS CLI COMMANDS WITHIN A RUBY APP RUNNING ON HEROKU

By: Saurav

2020-07-12 06:07:00 UTC

Let's say you have to run AWS CLI commands from inside a Ruby app. For the sake of simplicity, I will take this sync command:

aws s3 sync s3://source-bucket/source_folder s3://destination-bucket/destination_folder

Here source-bucket and destination-bucket could live in the same AWS account or in two different accounts.

Note: If the buckets are in different AWS accounts, I suggest creating an IAM user in the destination account, attaching a user policy that allows access to the source account's folder, and adding a bucket policy on source-bucket that grants that destination-account user access to the folder.

This link might help: https://aws.amazon.com/premiumsupport/knowledge-center/copy-s3-objects-account/
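The linked article has the full walkthrough; just to make the shape concrete, here is a minimal sketch of the kind of bucket policy you'd attach to source-bucket (the account ID, user name, and paths are placeholders of mine; adapt them from the article above):

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::DESTINATION_ACCOUNT_ID:user/sync-user" },
    "Action": ["s3:ListBucket", "s3:GetObject"],
    "Resource": [
      "arn:aws:s3:::source-bucket",
      "arn:aws:s3:::source-bucket/source_folder/*"
    ]
  }]
}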

Note: If your Ruby app is deployed on Heroku and you want AWS CLI commands to work from inside the Ruby code, you first need to add a Heroku buildpack that installs the AWS CLI on the dyno.
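I believe the community AWS CLI buildpack from Heroku's buildpack registry does the job; if you use it, adding it would look something like this (the buildpack identifier here is my assumption, so double-check the registry):

heroku buildpacks:add heroku-community/awscli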

The AWS CLI is a command-line tool that you would normally run from a shell, such as bash on macOS. So, as you can expect, we need to run a shell command from a Ruby method. RubyGuides gives a good overview of the options available to accomplish this here: https://www.rubyguides.com/2018/12/ruby-system/

Note: I did try AWS Lambda invocations triggered by S3 file additions to the folder. For that to work, you can set up a similar CLI call inside the Lambda, but I felt it was a more tedious process, plus the Lambda would be called for every new file added to the specified folder. This caused way too many invocations and hence increased cost. I wanted this sync to run once a night, after the day's load of file changes had finished (see the scheduler sketch below).
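On Heroku, one common way to get that once-a-night cadence is a rake task kicked off by the Heroku Scheduler add-on. A minimal sketch, assuming a Rails app; the task name is my own, and run_aws_sync is the method we build later in this post:

# lib/tasks/s3_sync.rake -- schedule "rake s3:sync" nightly via Heroku Scheduler
namespace :s3 do
  desc 'Nightly S3 folder sync'
  task sync: :environment do
    run_aws_sync  # the sync method built below
  end
end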

Let's get back to making it work through ruby and heroku.

First, I tried the system call to execute the command:

def run_aws_sync(env_vars)
  # env_vars: hash of AWS environment variables for the child process (see note below)
  system(env_vars, 'aws s3 sync s3://source-bucket/source_folder s3://destination-bucket/destination_folder')
end

This method comes from the Kernel module.
Note: If you are syncing within the same AWS account, you don't need to pass the environment hash shown above.
Otherwise you need to provide AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION in the hash.
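To make that concrete, here is a sketch of the kind of hash I mean; the SOURCE_* config var names are placeholders of mine, and the region is an example:

env_vars = {
  'AWS_ACCESS_KEY_ID'     => ENV['SOURCE_AWS_ACCESS_KEY_ID'],
  'AWS_SECRET_ACCESS_KEY' => ENV['SOURCE_AWS_SECRET_ACCESS_KEY'],
  'AWS_DEFAULT_REGION'    => 'us-east-1'
}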

Sadly, this command didn't work for me on Heroku.
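When system fails like this, its return value at least tells you where to look: it returns true on success, false on a non-zero exit status, and nil when the command could not be executed at all. A quick check you can run from a Heroku console:

# nil here means the aws binary isn't installed on the dyno at all
puts system('aws', '--version').inspect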

My second approach was to try backticks:

def run_aws_sync
 `aws s3 sync s3://source-bucket/source_folder s3://destination-bucket/destination_folder`
end

This didn't work for the case where I had to sync across AWS accounts, since backticks give you no way to pass a credentials hash.
To get around that, I issued two more commands before the one shown above (via aws configure set) to set up the environment variables. But each backtick call runs in its own shell, so the variables were gone by the time the sync command ran; that didn't work either.
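The isolation is easy to demonstrate: every backtick call spawns its own shell, so state set in one call does not survive into the next.

`export FOO=bar`          # FOO is set in a throwaway shell that exits immediately
puts `echo $FOO`.inspect  # => "\n" -- the variable did not survive into this new shell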

Lastly, what worked for me was the spawn method:

def run_aws_sync(env_vars)
  # env_vars: the same credentials hash shown earlier
  spawn(env_vars, 'aws s3 sync s3://source-bucket/source_folder s3://destination-bucket/destination_folder')
end

spawn executes the specified command inside a child process and returns its PID. This method comes from the Process module (it is also available via Kernel) and runs shell commands well.
Supplying the hash as the first argument sets the necessary config variables for the child process only, without touching the parent app's environment.

The only caveat is that you should always call Process.wait with the PID at the end; otherwise, the child will become a zombie process.

This method is similar to Kernel#system but it doesn't wait for the command to finish.
The parent process should use Process.wait to collect the termination status of its child or use Process.detach to register disinterest in their status; otherwise, the operating system may accumulate zombie processes.
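For completeness, the Process.detach route mentioned there looks like this, if you genuinely don't care about the outcome (same placeholder hash as before):

pid = spawn(env_vars, 'aws s3 sync s3://source-bucket/source_folder s3://destination-bucket/destination_folder')
Process.detach(pid)  # a reaper thread collects the child, so no zombie is left behind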

More details are in the Ruby docs for Kernel#spawn. Putting it together with Process.wait:

def run_aws_sync(env_vars)
  pid = spawn(env_vars, 'aws s3 sync s3://source-bucket/source_folder s3://destination-bucket/destination_folder')
  Process.wait pid
end
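One optional refinement: Process.wait sets the $? global to a Process::Status for the finished child, so you can check whether the sync actually succeeded. A small sketch extending the method above:

def run_aws_sync(env_vars)
  pid = spawn(env_vars, 'aws s3 sync s3://source-bucket/source_folder s3://destination-bucket/destination_folder')
  Process.wait pid
  raise 'aws s3 sync failed' unless $?.success?  # the CLI exited non-zero
end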

Worked like a charm. Let me know if it works for you, or share if you found a better way. Thanks!
