DATAMARCOS

Data Engineering

The Data Engineering program is delivered in approximately 100 hours over sixteen weeks.
In this course, Upgradata focuses not only on the technical aspects but also on the behavioural aspects, giving a holistic approach to upskilling and self-development.

Course Content

Data Manipulation

Data Cleaning

Data Mapping

Exploratory Data Analysis using Excel

Advanced Excel - Add-ins and analysis tabs of Excel, Array functions, Conditional formatting, Charts & Graphs

Introduction to Python

Data, Datatypes and Variables

Object-Oriented Programming using Python

Classes and Interfaces

Class method vs Static method in Python

Class or Static Variables in Python

Changing Class Members in Python

Constructors in Python

Destructors in Python

Inheritance in Python

Types of Inheritance in Python

Encapsulation in Python

Polymorphism in Python

Abstract Classes in Python

Pandas - DataFrame, Series, Lists, Dictionaries

NumPy - Multidimensional arrays

SciPy

Datetime objects and time series data

Matplotlib for data visualization

Classes, functions & exception handling

PEP 8 standards in Python programming

Modular Coding - parameterized code, configuration parameters

Standardizing code for migration

Data, Datatypes and Variables

Data Visualization - Plots, histograms, heatmaps

Data Pre-processing - Missing values, Outliers, Encoding

Scaling and Normalization

Multivariate Analysis

Mentoring Session

Graded Assessment

Introduction to database

Types of data models

DBMS operations

Advantages of DBMS

Introduction to RDBMS

RDBMS vs Traditional approach

Normalization

Types of Normalization

SQL Basics

Data types and constraints

DDL

DML

DCL and TCL

SQL Operators

SQL Functions

Set Operators

SQL Joins part 1

SQL Joins part 2

Connect to MySQL

Perform basic CRUD operations

Graded Assessment

Mentoring Session

Overview of Power BI / Tableau

Overview of Power BI Desktop / Tableau Public

Importing Data

Data Structuring

Introduction to Data Modelling

Relationship between Two Variables (legend)

Summarization Techniques

Creating Calculations / Parameters

Use of Logical Calculation

Basic Steps to Data Visualization

Charts

Cross Filter

What-If Analysis

Table Creation and Conditional Formatting

Tool Tips

Basics of Dashboard Creation

Dashboard Design and Storytelling

Storytelling

Graded Assessment

Mentoring Session

What are NoSQL databases, and challenges with RDBMS

Different RDBMS clustering approaches and methods to scale them

Features and Types of NoSQL databases.

Advantages and disadvantages of NoSQL databases.

CAP Theorem

Key-value databases, document databases, columnar databases and graph databases

Introduction to MongoDB

Graded Assessment

Mentoring Session

Big Data Touch

Hadoop framework: Stepping into Hadoop

Working on HDFS

MapReduce: A programming paradigm

Hadoop 3.0 - What’s new?

Apache Hive: Teasing the honeybee

External tables in Hive

Loading different file formats in Hive

Query operations on Hive tables

Querying complex structures

Views

Introduction to Apache Spark

Apache Spark Ecosystem

Installing PySpark

Reading data with PySpark

EDA with PySpark

Grouping and sorting data

Aggregating datasets

Joining Functions

Saving data with PySpark

Visualization using PySpark

What is Kafka and why it is needed

Kafka Architecture

Kafka Workflow

Configuring Kafka Cluster

Performing Basic Topic Operations

Performance tuning in Kafka

Applications of Kafka

Integration of Apache Spark with Kafka

Graded Assessment

Mentoring Session

Overview of data management

Importance of data management at enterprises

Multidimensional Data Representation and Manipulation

Design practices and Methodologies

Understanding the Data Warehouse, Data Lake, and Lakehouse

Data Management Architecture

Data warehouse design approaches

Database Architecture

Data Processing Techniques

ETL & Data Pipelines: Tools and Techniques

ETL vs ELT concept and techniques

Building Data Pipelines

Managing & orchestrating workflows

Quality Check - How to build a quality check pipeline

Governance - How to create governance layer in pipeline

Migration - What are the checklists for migration

Data Security - How to track security and lineage

Data quality testing

ETL Testing & Data Quality Management

Metadata testing

Functional Database Testing

Source to target count testing

Source to target data testing

Performance testing

Data transformation testing

Data integration testing

Graded Assessment

Developing API interfaces to share data infrastructure

Building REST APIs

Flask

WebSocket API

Deployment using FastAPI

Microservices Architecture - How to design

Microservices Deployment

CI/CD

Containers

Container management & Kubernetes

Testing Microservices - Unit test, Integration test, End-to-End tests

Service Mesh

Logging & Monitoring

Docker images & Containers - Build, push, pull, delete, sharing images etc.

Building Multi-container Applications

Docker compose

Working in utility containers and executing commands in containers

Deploying Docker containers

Managing data & volumes with Kubernetes

Kubernetes Deployment

Kubernetes Networking

Graded Assessment

Mentoring Session

Azure Networking Components

Azure Virtual Machines

Availability Sets

Virtual Machine Scale Sets

Blob Storage

File Storage

Storage Replication Options

Azure SQL

Cosmos DB

Azure Messaging Services

Azure Service Bus

Azure Event Hub

Azure Event Grid

Azure Functions

Azure Data Factory

Azure Databricks

Azure Logic Apps

Azure Synapse Analytics

Azure HDInsight

Azure Analysis Services

Azure ML service

Security

Introduction

Data processing in Azure Databricks

Working with DataFrames

Platform architecture

Security and data protection

Streaming data

Production workloads

Graded Assessment

Mentoring Session

Business team to give inputs for the capstone project
