Building a Data Warehouse with Amazon Web Services

Big Data Çağrı ŞİŞMAN 21.6.2017

If you are interested in data warehouse projects, Amazon Web Services could be one of the best options for you.

Amazon Web Services (AWS) provides services for many different tasks, such as data storage, data migration, machine learning and databases. I'd like to give you some information about the services that are useful for data warehouse design and development.

REDSHIFT

Redshift is a petabyte-scale data warehouse that is easy to manage. Thanks to columnar data storage and query optimization, you can run complex queries and analyses against big data.

Amazon Redshift
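
To give a feeling for why columnar storage helps analytical queries, here is a toy sketch in Python. It is not how Redshift is implemented internally; the table, the column names and the "values scanned" counters are made up for illustration only. The idea it shows is that an aggregate over one column only has to read that column when the data is stored column by column.

```python
# Toy illustration only: the sample table and the helper names below are
# hypothetical and do not reflect Redshift's real storage engine.
rows = [
    {"order_id": 1, "customer": "a", "amount": 10.0},
    {"order_id": 2, "customer": "b", "amount": 25.5},
    {"order_id": 3, "customer": "a", "amount": 4.5},
]

# Column-oriented copy of the same data: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def values_scanned_row_store(records):
    """A row store reads whole records, so every field of every row is read."""
    return sum(len(record) for record in records)


def values_scanned_column_store(cols, wanted):
    """A column store reads only the columns that the query asks for."""
    return sum(len(cols[name]) for name in wanted)


# Equivalent of: SELECT SUM(amount) FROM orders
total = sum(columns["amount"])
row_store_reads = values_scanned_row_store(rows)
column_store_reads = values_scanned_column_store(columns, ["amount"])

print(total, row_store_reads, column_store_reads)
```

With three columns the difference is small, but on wide tables with many rows the same effect means far less data has to be read for a typical reporting query.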

EMR (Elastic MapReduce)

EMR lets you run big data workloads on distributed frameworks such as Hadoop and Spark. If you want to implement machine learning algorithms or ETL jobs on your big data, you can use Spark on EMR cost-effectively. The screenshot below shows the technologies that EMR provides.

Technologies provided by EMR

S3 (Simple Storage Service)

S3 is a storage service for storing and retrieving big data from anywhere. You can use it as a data repository or for recovery. Both EMR and Redshift can access data stored in S3 easily.

In our scenario, we have multiple data sources that are stored in different places. We will load the data into the data warehouse with ETL (Extract-Transform-Load) steps.

If you want to run ETL processes against big data, you can use the EMR service with Spark. Spark SQL allows you to read and transform data from different databases such as MS SQL Server and MySQL. The processed data is written to S3 and then loaded into Redshift with a COPY command executed by the EMR job.
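
As a rough sketch of the last step, the EMR job could assemble the COPY statement along these lines. The helper name `build_copy_statement`, the table name, the bucket path and the IAM role ARN are all placeholders invented for this example, and the sketch assumes the processed data was written to S3 as Parquet or CSV files. Running the statement against Redshift (for example through a database driver) is left out here.

```python
def build_copy_statement(table, s3_path, iam_role_arn, data_format="PARQUET"):
    """Return a Redshift COPY statement that loads files under s3_path into table.

    Hypothetical helper for illustration. Values are interpolated directly
    into the SQL text, so they must come from trusted configuration only.
    """
    fmt = data_format.upper()
    # Only formats that need no extra COPY options are handled in this sketch.
    if fmt not in {"PARQUET", "CSV"}:
        raise ValueError(f"unsupported format in this sketch: {data_format}")
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {fmt};"
    )


# Placeholder values: replace with your own table, bucket and role.
statement = build_copy_statement(
    table="sales_fact",
    s3_path="s3://example-dwh-staging/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(statement)
```

Loading from S3 with COPY lets Redshift read the files in parallel, which is generally much faster than inserting the rows one by one from the Spark job.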

The EMR approach is a good option for big data, but if you only load and transform small data with EMR, the process could be expensive. For small data, another option is to run the ETL processes with Talend Open Studio (TOS). It is open source, easy to use and works with many data sources. TOS supports Redshift connections and commands through a simple user interface.