Building a data pipeline: uploading external data to AWS S3

[This article was first published on Stories Data Speak, and kindly contributed to R-bloggers].

Introduction

Recently, I stepped into the AWS ecosystem to learn and explore its capabilities. I’m documenting my experiences in this series of posts. Hopefully, they will serve as a reference point for me in the future, or for anyone else following this path. The objective of this post is to understand how to create a data pipeline. Read on to see how I did it. There are certainly more efficient ways, and I hope to find them too. If you know of better methods, please suggest them in the comments section.

How to upload external data to Amazon S3

Step 1: In the AWS S3 management console, click on your bucket name.

Step 2: Use the upload tab to upload external data into your bucket.
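Steps 1 and 2 can also be scripted rather than clicked through. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials are already configured; the helper name upload_csv, the bucket name and the file name are hypothetical placeholders, not values from this post.

```python
from pathlib import Path


def upload_csv(s3_client, local_path, bucket, key=None):
    """Upload a local file to S3 and return the object key that was used."""
    local_path = Path(local_path)
    # Default the object key to the file name when none is given.
    key = key or local_path.name
    # upload_file(Filename, Bucket, Key) is the boto3 S3 client's managed upload.
    s3_client.upload_file(str(local_path), bucket, key)
    return key


# Usage (assumed names; needs boto3 and configured credentials):
#   import boto3
#   upload_csv(boto3.client("s3"), "data.csv", "my-example-bucket")
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before pointing it at a real bucket.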

Step 3: Once the data is uploaded, click on it. At the bottom of the Overview tab, you’ll see the Object URL. Copy this URL and paste it into a text editor such as Notepad.
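For reference, the object URL follows a predictable pattern, so it can be rebuilt from the bucket name, region and object key. This sketch assumes the virtual-hosted-style form that S3 currently documents; the console may display a different variant depending on the bucket's age and region, and the function name and example values are illustrative only.

```python
from urllib.parse import quote


def object_url(bucket, region, key):
    """Build the virtual-hosted-style URL for an S3 object."""
    # Percent-encode the key but keep "/" so folder-like prefixes stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


print(object_url("my-example-bucket", "us-east-1", "data/iris.csv"))
# https://my-example-bucket.s3.us-east-1.amazonaws.com/data/iris.csv
```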

Step 4: Now click on the Permissions tab.

Under the Public access section, click the Everyone radio button. A window will open.

Put a checkmark on Read object permissions under Access to this object's ACL. This grants read access to the data at the object URL.

Note: Do not grant write object permissions. Also, if read access is not granted, SageMaker cannot read the data through the object URL.
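The read-only grant from Step 4 can be sketched in code as well. This assumes boto3 and a bucket whose settings still allow object ACLs and public access; on buckets where ACLs are disabled or public access is blocked, the put_object_acl call may be rejected. The helper names are hypothetical.

```python
# Group URI that S3 uses in ACL grants to mean "everyone".
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def make_public_read(s3_client, bucket, key):
    """Grant read-only access to everyone via the canned 'public-read' ACL."""
    s3_client.put_object_acl(Bucket=bucket, Key=key, ACL="public-read")


def is_publicly_readable(acl_response):
    """Return True if a get_object_acl response grants READ to AllUsers."""
    for grant in acl_response.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("URI") == ALL_USERS_URI and grant.get("Permission") == "READ":
            return True
    return False


# Usage (assumed names; needs boto3 and configured credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   make_public_read(s3, "my-example-bucket", "data.csv")
#   acl = s3.get_object_acl(Bucket="my-example-bucket", Key="data.csv")
#   print(is_publicly_readable(acl))
```

The canned public-read ACL grants read access only, which matches the note above about not granting write permissions.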

AWS SageMaker for consuming S3 data

Step 5

Step 6

Step 7

Accessing data in an S3 bucket with Python

There are two methods to access the data file:

  1. The Client method
  2. The Object URL method
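The two methods might look roughly like the sketch below. It is not the code from the linked notebook: it assumes the client method means a boto3 S3 client, that the file is a UTF-8 CSV, and it parses with the standard library rather than pandas. All function, bucket and file names are illustrative.

```python
import csv
import io
from urllib.request import urlopen


def parse_csv(raw_bytes, encoding="utf-8"):
    """Decode raw CSV bytes into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_bytes.decode(encoding))))


def read_with_client(s3_client, bucket, key):
    """Client method: fetch the object through the S3 API with the caller's credentials."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return parse_csv(response["Body"].read())


def read_with_object_url(object_url):
    """Object URL method: plain HTTPS GET, which needs the object to be publicly readable."""
    with urlopen(object_url) as response:
        return parse_csv(response.read())


# Usage (assumed names; the client method needs boto3 and credentials):
#   import boto3
#   rows = read_with_client(boto3.client("s3"), "my-example-bucket", "data.csv")
#   rows = read_with_object_url("<object URL copied in Step 3>")
```

The client method authenticates with the caller's AWS credentials, while the object URL method is an unauthenticated download and therefore depends on the read permission set in Step 4.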

See this IPython notebook for details.

AWS Data Pipeline

To build an AWS Data Pipeline, the following steps need to be followed:

Note: To be continued
