Data Engineering
The Data Engineering program is delivered in approximately 100 hours over a sixteen-week period.
In this course, Upgradata focuses not only on the technical aspects but also on the behavioural aspects, giving a holistic approach to upskilling for self-development.
Course Content
Data Manipulation |
Data Cleaning |
Data Mapping |
Exploratory Data Analysis using Excel |
Advanced Excel - Add-ins and analysis tabs of Excel, Array functions, Conditional formatting, Charts & Graphs |
Introduction to Python |
Data, Datatypes and Variables |
Object-Oriented Programming using Python |
Classes and Interfaces |
Class method vs Static method in Python |
Class or Static Variables in Python |
Changing Class Members in Python |
Constructors in Python |
Destructors in Python |
Inheritance in Python |
Types of Inheritance in Python |
Encapsulation in Python |
Polymorphism in Python |
Abstract Classes in Python |
Pandas - Dataframe, Series, Lists, Dictionaries |
Numpy - Multidimensional arrays |
Scipy |
Datetime objects and time series data |
Matplotlib for data visualization |
Classes, functions & exception handling |
PEP 8 standards in Python programming |
Modular Coding - parameterized code, configuration parameters |
Standardizing code for migration |
Data, Datatypes and Variables |
Data Visualization - Plots, histograms, heatmaps |
Data Pre-processing - Missing values, Outliers, Encoding |
Scaling and Normalization |
Multivariate Analysis |
Mentoring Session |
Graded Assessment |
Introduction to database |
Types of data models |
DBMS operations |
Advantages of DBMS |
Introduction to RDBMS |
RDBMS vs Traditional approach |
Normalization |
Types of Normalization |
SQL Basics |
Data types and constraints |
DDL |
DML |
DCL and TCL |
SQL Operators |
SQL Functions |
Set Operators |
SQL Joins part 1 |
SQL Joins part 2 |
Connect to MySQL |
Perform basic CRUD operations |
Graded Assessment |
Mentoring Session |
Overview of Power BI / Tableau |
Overview of Power BI Desktop / Tableau Public |
Importing Data |
Data Structuring |
Introduction to Data Modelling |
Relationship between Two Variables (legend) |
Summarization Techniques |
Creating Calculations/ Parameters |
Use of Logical Calculation |
Basic Steps to Data Visualization |
Charts |
Cross Filter |
What-If Analysis |
Table Creation and Conditional Formatting |
Tool Tips |
Basics of Dashboard Creation |
Dashboard Design and Storytelling |
Storytelling |
Graded Assessment |
Mentoring Session |
What are NoSQL databases, and the challenges with RDBMS |
Different RDBMS clustering approaches and methods to scale them |
Features and Types of NoSQL databases. |
Advantages and disadvantages of NoSQL databases. |
CAP Theorem |
Key-value databases, document databases, columnar databases and graph databases |
Introduction to MongoDB |
Graded Assessment |
Mentoring Session |
Big Data Touch |
Hadoop framework: Stepping into Hadoop |
Working on HDFS |
MapReduce: A programming paradigm |
Hadoop 3.0 - What’s new? |
Apache Hive: Teasing the honeybee |
External tables in Hive |
Loading different file formats in Hive |
Query operations on Hive tables |
Querying complex structures |
Views |
Introduction to Apache Spark |
Apache Spark Ecosystem |
Installing PySpark |
Reading data with PySpark |
EDA with PySpark |
Grouping and sorting data |
Aggregating datasets |
Joining Functions |
Saving data with PySpark |
Visualization using PySpark |
What Kafka is and why it is needed |
Kafka Architecture |
Kafka Workflow |
Configuring Kafka Cluster |
Performing Basic Topic Operations |
Performance tuning in Kafka |
Applications of Kafka |
Integration of Apache Spark with Kafka |
Graded Assessment |
Mentoring Session |
Overview of data management |
Importance of data management at enterprises |
Multidimensional Data Representation and Manipulation |
Design practices and Methodologies |
Understanding Data Warehouse, Data Lake, Lakehouse |
Data Management Architecture |
Data warehouse design approaches |
Database Architecture |
Data Processing Techniques |
ETL & Data Pipelines: Tools and Techniques |
ETL vs ELT concept and techniques |
Building Data Pipelines |
Managing & orchestrating workflows |
Quality Check - How to build a quality check pipeline |
Governance - How to create governance layer in pipeline |
Migration - What are the checklists for migration |
Data Security - How to track security and lineage |
Data quality testing |
ETL Testing & Data Quality Management |
Metadata testing |
Functional Database Testing |
Source to target count testing |
Source to target data testing |
Performance testing |
Data transformation testing |
Data integration testing |
Graded Assessment |
Developing API interfaces to share data infrastructure |
Building REST APIs |
Flask |
WebSocket API |
Deployment using FastAPI |
Microservices Architecture - How to design |
Microservices Deployment |
CI/CD |
Containers |
Container management & Kubernetes |
Testing Microservices - Unit test, Integration test, End-to-End tests |
Service Mesh |
Logging & Monitoring |
Docker images & Containers - Build, push, pull, delete, share images, etc. |
Building Multi-container Applications |
Docker compose |
Working in utility containers and executing commands in containers |
Deploying Docker containers |
Managing data & volumes with Kubernetes |
Kubernetes Deployment |
Kubernetes Networking |
Graded Assessment |
Mentoring Session |
Azure Networking Components |
Azure Virtual Machines |
Availability Sets |
Virtual Machine Scale Sets |
Blob Storage |
File Storage |
Storage Replication Options |
Azure SQL |
Cosmos DB |
Azure Messaging Services |
Azure Service Bus |
Azure Event Hub |
Azure Event Grid |
Azure Functions |
Azure Data Factory |
Azure Databricks |
Azure Logic Apps |
Azure Synapse Analytics |
Azure HDInsight |
Azure Analysis Services |
Azure ML service |
Security |
Introduction |
Data processing in Azure Databricks |
Work with Data Frames |
Platform architecture |
Security and data protection |
Streaming data |
Production workloads |
Graded Assessment |
Mentoring Session |
The business team will provide inputs for the capstone project