Azure Data Factory | Overview
Hiya there!
This is my first article here at AzureTar Blog and I am really excited to contribute with Data stuffs on Microsoft Azure! But let’s stop stalling and go straight to the point?
If you are familiar with SSIS (SQL Server Integration Services) or do not even know what that is but know at least do a Select query on SQL Server Management Studio (in short, SSMS), this article hopefully will not be hard to understand.
For completing its relational data service engine, Microsoft offers not only the well-known SQL Server, but also several services that help to manage different aspects of the way we manage, integrate, report and analyse data.
SSMS (SQL Server Management Server):
This service is dedicated to managing and configuring any SQL infrastructure by running SQL queries, stored procedures and views on the databases.
SSIS (SQL Server Integration Services):
This service is used to execute a wide range of migration tasks like data extraction, transformation (cleaning, aggregating, joining…) and loading. It basically moves relational data in different formats like Excel files, Oracle, DB2, and .csv into a SQL Server Database.
SSRS (SQL Server Reporting Services):
This service allows us to create various interactive reports using graphs, maps, tabular reports, drill-down reports, click-through reports, and many more.
SSAS (SQL Server Analysis Services):
It provides us with a great view of what is happening on our database using online analytical processing and data mining. In simple terms, we can use SSAS to create cubes using data from a data warehouse for deeper and faster data analysis.
To be fair, I have only used one of the four services and there was SSMS. I would load .txt files into a SQL Server table using BULK INSERT or an Excel file using OPENROWSET so then, after having the database clean and ready to be used, I would use either Excel or Tableau to create my reports and dashboards.
As we can see, Microsoft offers us softwares and solutions for each step in our data process. Not too different from on-premise, MS also helps us with a bunch of cloud-based alternatives and this platform is known as Azure.
The Azure cloud platform is more than 200 products and cloud services designed to help you bring new solutions to life – to solve today’s challenges and create the future. Build, run and manage applications across multiple clouds, on-premises and at the edge, with the tools and frameworks of your choice.
Microsoft Azure
Basically, for those who are coming from SSIS or have played with SQL Queries before, Azure Data Factory is all about extraction, manipulation and saving the data. It is the first step to be taken when we initiate our journey in data workload. Unless we already have everything set into our Database, which is unlikely to happen in real life. The terms that define better what ADF performs are more known as Extraction-Transformation-Load (the famous ETL).
Putting it in an example, let’s imagine we work for a video streaming company and they want us to inspect some marketing analysis: customer behaviors, customer demographics and their preferences in terms of movies. We have petabytes of video logs data in the cloud, customer reviews on Hadoop and their latest watched movies on MongoDB.
We can just export the files from their sources and upload them into SQL Server Database and do the job there. However, it would take us forever to export, load and transform the data. Big data needs a scalable service that can ORCHESTRATE the data movement and AUTOMATE the data transformation. In our case, we have raw and unorganized data relational and non-relational coming from different storage systems. So that is where the ADF comes in to show us its power!
At a high level of collecting from sources, processing to transform, publishing into a warehouse for analysts, and monitoring and managing daily schedules, Azure is a set of interconnected systems and gives us an end-to-end platform for our data engineering.