powertools-data

Microsoft Fabric Analytics Engineer (DP-600)

Notes from the Microsoft ESI session for Fabric DP-600 training, along with additional reference materials and links captured while preparing for the DP-600 exam.

1. Administer Microsoft Fabric

2. Get started with lakehouses

3. Use Apache Spark in Microsoft Fabric

4. Work with Delta Lake tables

5. Secure a Fabric lakehouse

6. Ingest data with Dataflows Gen2 in Microsoft Fabric

7. Use Data Factory Pipelines in Microsoft Fabric

8. Ingest data with Spark and Microsoft Fabric notebooks

9. Organize a lakehouse using medallion architecture

10. Get started with data warehouses in Microsoft Fabric

11. Load data into a warehouse

12. Query a warehouse

13. Monitor a warehouse

14. Secure a warehouse

15. Understand scalability in PBI

16. Create model relationships

17. Use tools to optimize PBI performance

18. Enforce PBI model security

Exam Preparation - Quick Reference:

| Method | When to Consider | When Not to Consider |
| --- | --- | --- |
| Dataflow | ETL/ELT tool. 1. To use the 150+ external connectors. 2. For a no/low-code solution. 3. For accessing on-premises data. 4. Can do Extract, Transform AND Load. 5. When you need to get more than one dataset at a time and combine them (although you might want to space this out to allow data validation). | 1. Difficult to implement data validation. 2. Currently struggles with large datasets (although Fast Copy has recently been introduced, which should speed up your ETL). |
| Data Pipeline | Primarily an orchestration tool (do this, then do that); can also be used to get data into Fabric via the Copy Data activity (and others). 1. Large datasets (although Dataflow now has Fast Copy, so performance should be comparable between the two). 2. Importing 'cloud' data (e.g. data in Azure). 3. When you need control-flow logic. 4. Triggering a wide variety of actions in (and outside of) Fabric: Dataflows, Notebooks, Stored Procedures, KQL scripts, Webhooks, Azure Functions, Azure ML, Azure Databricks. | 1. Can't do the Transform natively (but can embed notebooks or dataflows). 2. No ability to upload local files. 3. Does not work cross-workspace. |
| Notebook | General-purpose coding notebook that can bring data into Fabric by connecting to APIs or using client Python libraries. 1. Extraction from APIs (using the Python requests library or similar; see the sketch below). 2. To use client libraries (e.g. Azure libraries, or the HubSpot client library in Python to access HubSpot data). 3. Good for code re-use (and can be parameterized). 4. For data validation and data-quality testing of incoming data. 5. The fastest in terms of data ingestion (and the most efficient for CU spend). | 1. When you don't have Python capability in your organisation. |
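
A minimal sketch of the Notebook row's API-extraction pattern: pull JSON from a REST endpoint with `requests`, run a basic data-quality check, and land the result as a Delta table in the lakehouse. The endpoint, field names, and table name are hypothetical placeholders; in a Fabric notebook a `SparkSession` is already available as `spark`, so the `getOrCreate()` line is only there to keep the snippet self-contained.

```python
import requests
from pyspark.sql import SparkSession

# Hypothetical REST endpoint -- substitute your own source.
API_URL = "https://api.example.com/v1/orders"

resp = requests.get(API_URL, timeout=30)
resp.raise_for_status()   # fail fast on HTTP errors
records = resp.json()     # assumed: a list of flat JSON objects

# Basic validation of incoming data before landing anything.
required = {"order_id", "amount"}   # hypothetical required fields
bad = [r for r in records if not required <= r.keys()]
if bad:
    raise ValueError(f"{len(bad)} records missing required fields")

# In Fabric notebooks `spark` is pre-defined; getOrCreate() returns it.
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(records)

# saveAsTable writes a Delta table to the lakehouse attached to the notebook.
df.write.format("delta").mode("append").saveAsTable("bronze_orders")
```

The same landing step can be done without code in a Dataflow or a pipeline Copy Data activity, so the choice usually comes down to connector coverage, validation needs, and the team's Python skills.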