The purpose of this assignment was to apply cloud ETL skills to big data from two of Amazon’s public product-review datasets. The goal was to perform the entire ETL process in the cloud and upload a DataFrame to an RDS instance.
This project used Amazon Web Services (AWS), Relational Database Service (RDS), pgAdmin, PySpark, and Google Colab.
We started by creating an AWS RDS instance to connect to pgAdmin.
We then registered our AWS RDS server in pgAdmin, which displays our databases for both datasets.
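pgAdmin is configured through its GUI, but the PySpark notebook in Colab has to reach the same RDS endpoint over JDBC. The sketch below shows the connection details in the shape PySpark's `DataFrameWriter.jdbc` accepts. It assumes a PostgreSQL engine on the default port 5432; the endpoint, database name, and credentials are placeholders, and `build_jdbc_config` is a hypothetical helper, not part of PySpark.

```python
def build_jdbc_config(endpoint: str, database: str, user: str, password: str,
                      port: int = 5432) -> tuple[str, dict[str, str]]:
    """Return (url, properties) for a PostgreSQL RDS instance.

    Hypothetical helper: assumes the RDS engine is PostgreSQL and that the
    PostgreSQL JDBC driver jar is available to Spark.
    """
    url = f"jdbc:postgresql://{endpoint}:{port}/{database}"
    properties = {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",  # PostgreSQL JDBC driver class
    }
    return url, properties
```

The returned pair would be passed as the `url=` and `properties=` arguments of `df.write.jdbc(...)`. Keeping the credentials in one place avoids repeating them for each table that is loaded.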
Before loading, we created the schema for our loading tables in pgAdmin.
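The schema step can be sketched as plain `CREATE TABLE` statements. The table names `review_id_table` and `vine_table` and their column split are hypothetical; the column names are assumed to follow the header of the Amazon customer-reviews files. An in-memory `sqlite3` database stands in for PostgreSQL on RDS so the sketch runs anywhere; the same DDL is also valid PostgreSQL and could be pasted into pgAdmin's query tool.

```python
import sqlite3

# Hypothetical loading-table schema. Column names are assumed to match the
# Amazon reviews dataset header; the table layout itself is illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS review_id_table (
    review_id      TEXT PRIMARY KEY NOT NULL,
    customer_id    INTEGER,
    product_id     TEXT,
    product_parent INTEGER,
    review_date    DATE
);
CREATE TABLE IF NOT EXISTS vine_table (
    review_id         TEXT PRIMARY KEY,
    star_rating       INTEGER,
    helpful_votes     INTEGER,
    total_votes       INTEGER,
    vine              TEXT,
    verified_purchase TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the loading tables (sqlite3 stands in for PostgreSQL here)."""
    conn.executescript(SCHEMA)
    conn.commit()
```

Creating the tables before the load means the PySpark write can use `mode="append"` and land in columns with known types, instead of letting Spark create the tables itself.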
Once the schema was created, we moved on to the loading process.
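The project performs this step with PySpark in Colab. Because a Spark session and a live RDS endpoint cannot be assumed here, the following is a standard-library stand-in for the same transform-then-load flow: `csv` parses tab-separated review rows (the source files are assumed to be TSV with a header row), the transform keeps a subset of columns and drops duplicate `review_id`s, and an in-memory `sqlite3` database stands in for the RDS table. The helpers `transform` and `load` and the table name `review_id_table` are hypothetical. In PySpark, the corresponding calls would be `df.select(...)`, `df.dropDuplicates(["review_id"])`, and `df.write.jdbc(url=..., table=..., mode="append", properties=...)`.

```python
import csv
import io
import sqlite3

# Columns kept for the hypothetical review_id_table (assumed dataset header names).
KEEP = ["review_id", "customer_id", "product_id", "product_parent", "review_date"]


def transform(tsv_text: str) -> list[tuple[str, ...]]:
    """Parse TSV review rows, keep the KEEP columns, drop duplicate review_ids."""
    # QUOTE_NONE: review text may contain stray quote characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    seen: set[str] = set()
    rows = []
    for rec in reader:
        if rec["review_id"] in seen:
            continue  # equivalent of dropDuplicates(["review_id"])
        seen.add(rec["review_id"])
        rows.append(tuple(rec[col] for col in KEEP))
    return rows


def load(conn: sqlite3.Connection, rows: list[tuple[str, ...]]) -> int:
    """Append rows to review_id_table (stand-in for the JDBC write to RDS)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS review_id_table ("
        "review_id TEXT PRIMARY KEY, customer_id INTEGER, product_id TEXT, "
        "product_parent INTEGER, review_date DATE)"
    )
    conn.executemany("INSERT INTO review_id_table VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Deduplicating on `review_id` before the write matters because the key is a primary key: a duplicate would make the insert (or the JDBC append) fail partway through.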
Checking in pgAdmin that the load was successful (one example).
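A simple way to confirm a load is to compare the row count in the database with the row count of the DataFrame that was written (in PySpark, `df.count()`). In pgAdmin this is a `SELECT COUNT(*) FROM <table>;` query. The sketch below wraps that check; `verify_load` is a hypothetical helper, and `sqlite3` again stands in for the PostgreSQL connection.

```python
import sqlite3


def verify_load(conn: sqlite3.Connection, table: str, expected_rows: int) -> bool:
    """Return True if `table` holds exactly `expected_rows` rows."""
    # Table names cannot be bound as SQL parameters, so reject anything that
    # is not a plain identifier before formatting it into the query.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return count == expected_rows
```

A count match does not prove every value arrived intact, but it quickly catches partial or repeated loads; spot-checking a few rows, as in the pgAdmin example, covers the rest.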