Data Factory
  • 16 Feb 2019

Article Summary

In this section we will look at Data Factory.

What is it?

Data Factory is Microsoft Azure's data integration service. It lets you extract data from both on-premises and cloud sources, transform it at cloud scale, and load it into a destination.

Azure Data Factory is a cloud-scale product aimed at ETL-type use cases.

Features

  • Control Flow, which lets you use workflow-like shapes within your pipeline to control how the data movement and transformation steps execute
  • Code-free developer experience for some use cases
  • Git and Azure DevOps support
  • Lift and shift of SSIS from on-premises to the cloud
  • Mapping of data
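To make the Control Flow idea more concrete, here is a minimal sketch in Python. It assumes a pipeline can be described as a list of activities, each naming the activities it depends on, loosely modeled on the shape of a pipeline definition. The pipeline name, the activity names and types, and the `execution_order` helper are all illustrative assumptions, not part of any Data Factory SDK.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline definition. Names and types are illustrative only.
pipeline = {
    "name": "CopyThenTransform",
    "properties": {
        "activities": [
            {"name": "CopyFromSource", "type": "Copy", "dependsOn": []},
            {
                "name": "TransformData",
                "type": "Transform",
                "dependsOn": [
                    {"activity": "CopyFromSource", "dependencyConditions": ["Succeeded"]}
                ],
            },
            {
                "name": "LoadToDestination",
                "type": "Copy",
                "dependsOn": [
                    {"activity": "TransformData", "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}


def execution_order(pipeline_definition):
    """Return activity names in an order that respects every dependsOn entry."""
    graph = {
        activity["name"]: {dep["activity"] for dep in activity["dependsOn"]}
        for activity in pipeline_definition["properties"]["activities"]
    }
    # static_order() yields each node only after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(pipeline))
```

The sketch only resolves ordering. A real pipeline would also act on the dependency condition (for example, running a step only when the previous one succeeded).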

Strengths

The key strength of Data Factory is that, for core use cases, you can get data transformation pipelines up and running quickly with minimal infrastructure requirements.

From those simple use cases, Data Factory can scale up to very large data movements as your requirements grow.

Weaknesses

There are a few weaknesses and limitations of Data Factory which you need to be aware of:

  1. The main weakness at present is the limited support for 3rd party tooling on the SSIS runtime. Many customers use successful 3rd party SSIS tools, and until these are fully supported on the SSIS runtime this will remain a limitation. We expect that in due course they will be fully supported and this will no longer be an issue.

  2. Some customers struggle a bit with the ALM (application lifecycle management) side of Data Factory solutions. We think this is mainly because Data Factory V2 is still fairly new and the focus so far has been on core features rather than on mature usability. We expect that, as with other products, this will improve in due course.

Dependencies

Azure Data Factory is fairly self-contained, and most of the dependencies you will have relate to the connectors you choose to use.

Hosting

Data Factory has three hosting options.

  • Azure Integration Runtime is set up and managed by Azure, and you configure resources to run in it. The Azure Integration Runtime only has access to public resources.
  • Self-Hosted Integration Runtime means hosting the runtime yourself on a machine that is typically connected to a network giving it access to resources such as those on-premises.
  • Azure SSIS Runtime is an Azure-managed runtime which allows you to run SSIS packages in the cloud.
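The choice between these options can be sketched as a simple decision function. This is only an illustration of the reasoning in the list above. The function name and its two parameters are assumptions made for this example, and real decisions will usually involve more factors.

```python
def choose_integration_runtime(runs_ssis_packages: bool, needs_private_network: bool) -> str:
    """Pick a hosting option using the rules of thumb from the list above."""
    if runs_ssis_packages:
        # SSIS packages need the SSIS-specific managed runtime.
        return "Azure SSIS Runtime"
    if needs_private_network:
        # On-premises or otherwise private resources need a runtime you host yourself.
        return "Self-Hosted Integration Runtime"
    # Public resources only: the Azure-managed runtime is enough.
    return "Azure Integration Runtime"


print(choose_integration_runtime(runs_ssis_packages=False, needs_private_network=True))
```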

Costs

The cost of Data Factory is made up of two parts. The first is a typical cloud consumption-based cost, where you pay per activity executed in your pipelines.

The second is the cost of the runtime for your pipelines/SSIS components, which can be charged per run or per hour depending on the options you choose.

Note that there are some differences between Data Factory V2 and V1 pricing which you may need to be aware of.
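As a rough illustration of how the two cost components combine, the sketch below adds a per-activity charge to a per-hour runtime charge. The rates are placeholder numbers chosen for the arithmetic, not actual Azure prices, and the function and parameter names are hypothetical. Check the current pricing page before estimating real costs.

```python
def estimate_monthly_cost(
    activity_runs: int,
    runtime_hours: float,
    price_per_1000_runs: float,
    price_per_runtime_hour: float,
) -> float:
    """Combine the consumption-based charge with the runtime charge."""
    orchestration_cost = activity_runs / 1000 * price_per_1000_runs
    runtime_cost = runtime_hours * price_per_runtime_hour
    return orchestration_cost + runtime_cost


# Placeholder rates, NOT real prices.
total = estimate_monthly_cost(
    activity_runs=50_000,
    runtime_hours=120,
    price_per_1000_runs=1.00,
    price_per_runtime_hour=0.25,
)
print(f"{total:.2f}")  # 50.00 for activity runs + 30.00 for runtime = 80.00
```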

SSIS

Data Factory and SSIS perform a very similar job. SSIS is a mature part of the SQL Server estate for doing server-based ETL. SSIS is so mature and widely adopted that Microsoft decided to offer the ability to run SSIS within Data Factory as a feature, which helps customers lift and shift SSIS to the cloud rather than having to rewrite their packages.

Learn More

TBC

Product Recommendation

Recommendation
Data Factory is a green-light technology in which Microsoft is investing a lot. There are lots of good use cases it can help you with today, and although a few things could make the technology easier to use, we believe these are temporary issues while the product matures.
