A few weeks ago I decided to take the 70-475 exam (Designing and Implementing Big Data Analytics Solutions). Even though I had been working with Data Factory, Machine Learning and HDInsight for a year, I wanted to make sure I didn't miss any details.
Make sure you understand the general concepts behind Big Data in Azure
To get familiar with them, go through the introductory videos about Big Data, HDInsight, Machine Learning and Data Factory on Channel 9 and Microsoft Virtual Academy. Then dive into hands-on labs with the services listed below.
Data Factory
- JSON structure of Data Factory components
- Additional properties in datasets and activities (e.g. availability, policy, scheduler)
- Activity types
- Data Management Gateway (for on-premises data sources)
- Custom activities
- UI in Portal
- Alerts, thresholds and notifications
- Development and deployment tools
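To make the JSON structure concrete, here is a minimal sketch of a Data Factory (v1-era) output dataset and a pipeline with one Copy activity, written as Python dicts. All names (`OutputBlobDataset`, `StorageLinkedService`, `CopyPipeline`, etc.) are hypothetical, and the property layout is reproduced from memory, so check it against the current docs. The helper `scheduler_matches_outputs` is also hypothetical. It encodes a rule worth remembering for the exam: an activity's optional `scheduler` must match the `availability` of its output datasets.

```python
import json

# Hypothetical names; layout follows the Data Factory (v1) JSON schema as
# remembered here, so verify property names against the documentation.
output_dataset = {
    "name": "OutputBlobDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {"folderPath": "adfoutput/"},
        # Dataset slices are produced hourly.
        "availability": {"frequency": "Hour", "interval": 1},
    },
}

pipeline = {
    "name": "CopyPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyBlobToBlob",
                "type": "Copy",
                "inputs": [{"name": "InputBlobDataset"}],
                "outputs": [{"name": "OutputBlobDataset"}],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "BlobSink"},
                },
                # Optional; if present it must match the output availability.
                "scheduler": {"frequency": "Hour", "interval": 1},
                "policy": {"concurrency": 1, "retry": 3, "timeout": "01:00:00"},
            }
        ],
        "start": "2017-01-01T00:00:00Z",
        "end": "2017-01-02T00:00:00Z",
    },
}


def scheduler_matches_outputs(pipeline, datasets):
    """Return True if every activity scheduler equals its output datasets' availability."""
    for activity in pipeline["properties"]["activities"]:
        scheduler = activity.get("scheduler")
        if scheduler is None:  # scheduler is optional on an activity
            continue
        for output in activity["outputs"]:
            availability = datasets[output["name"]]["properties"]["availability"]
            if (scheduler["frequency"], scheduler["interval"]) != (
                availability["frequency"],
                availability["interval"],
            ):
                return False
    return True


print(json.dumps(output_dataset, indent=2))
print(scheduler_matches_outputs(pipeline, {"OutputBlobDataset": output_dataset}))
```

Changing the dataset to `"frequency": "Day"` while the activity stays hourly makes the check fail. That kind of mismatch is a common trap in exam questions about scheduling.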
HDInsight
- Understand Hadoop, Spark, HBase and Storm components (region servers, ZooKeeper, etc.)
- Which ones are for batch and which for real-time processing?
- Lambda architecture (which technology belongs in which layer)
- File formats
- Blob storage, Azure SQL, DocumentDB, Azure Table storage, Azure Data Lake: general concepts and when to use which
- Metastore and custom scripts (Script Actions)
- Supported storage, security keys
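The lambda architecture question comes down to three layers. The batch layer recomputes views from the full master dataset. The speed layer covers events that arrived since the last batch run. The serving layer merges the two at query time. The toy page-view counter below is a minimal sketch of that split. The technologies named in the comments are typical choices rather than the only valid ones, and every function and class name is hypothetical.

```python
from collections import Counter

# Toy page-view counter illustrating the three lambda layers.
# Technology names in comments are typical examples, not requirements.


def build_batch_view(master_dataset):
    """Batch layer: recompute the view from the full, immutable master dataset
    (on HDInsight this might be a Hive or Spark job over Blob storage or Data Lake)."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events that arrived after the last batch run
    (for example a Storm topology or Spark Streaming job reading from Event Hubs)."""

    def __init__(self):
        self.view = Counter()

    def process(self, event):
        self.view[event["page"]] += 1

    def reset(self):
        # Call once a new batch view already includes these events.
        self.view.clear()


def query(page, batch_view, speed_layer):
    """Serving layer: merge the precomputed batch view with the real-time view
    (the serving store could be HBase, Azure SQL Database, etc.)."""
    return batch_view[page] + speed_layer.view[page]


master = [{"page": "home"}, {"page": "home"}, {"page": "cart"}]
batch = build_batch_view(master)
speed = SpeedLayer()
speed.process({"page": "home"})  # arrived after the batch job ran
print(query("home", batch, speed))  # → 3
```

Keeping the batch view recomputable from raw data is the design point. If the speed layer miscounts, the next batch run corrects the result.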
Azure Machine Learning
- Custom R and Python scripts (Execute R Script and Execute Python Script modules)
- Custom graphs
- Machine Learning flow
- Dealing with missing data
- Web service deployment steps
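For missing data, the Clean Missing Data module in Azure Machine Learning Studio offers strategies such as replacing with the mean, median or mode, using a custom substitution value, or removing the entire row. The plain-Python sketch below only mimics a subset of those options so you can compare what each one does to a column. It is not the module's implementation, and `clean_missing` plus its strategy names are hypothetical.

```python
from statistics import mean, median, mode


def clean_missing(rows, column, strategy="mean", custom_value=None):
    """Return a new list of rows with missing (None) values in `column` handled.

    Strategies loosely mirror some options of the Clean Missing Data module:
    'mean', 'median', 'mode', 'custom' (fixed substitution value) and
    'remove_row'. Plain-Python illustration only, not the module's code.
    """
    present = [row[column] for row in rows if row.get(column) is not None]
    if strategy == "remove_row":
        return [row for row in rows if row.get(column) is not None]
    if strategy == "custom":
        fill = custom_value
    elif strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "mode":
        fill = mode(present)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Build new row dicts so the caller's data is not mutated.
    return [
        {**row, column: fill} if row.get(column) is None else dict(row)
        for row in rows
    ]


data = [{"age": 20}, {"age": None}, {"age": 40}, {"age": 40}, {"age": 50}]
print(clean_missing(data, "age", "mean"))
print(clean_missing(data, "age", "remove_row"))
```

Mean substitution is sensitive to outliers, and median or mode are more robust choices. Removing rows keeps only real values but can shrink or bias the training set.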
Real time analytics
- Event Hubs partition count, Storm node count
- Windowing (tumbling, hopping and sliding windows)
- Inputs and outputs
- Spark supported languages and tools
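Windowing questions usually hinge on the difference between tumbling and hopping windows. Tumbling windows are fixed-size and non-overlapping, so each event lands in exactly one window. Hopping windows have a size and a hop, and when the hop is smaller than the size they overlap, so one event can be counted in several windows. The sketch below counts events per window using integer timestamps in seconds, assuming windows are aligned to time 0 and labelled by their start time. Both assumptions are simplifications, and the function names are hypothetical.

```python
from collections import defaultdict


def tumbling_counts(timestamps, size):
    """Count events per tumbling window [start, start + size), keyed by start."""
    counts = defaultdict(int)
    for t in timestamps:
        counts[(t // size) * size] += 1  # each event falls in exactly one window
    return dict(sorted(counts.items()))


def hopping_counts(timestamps, size, hop):
    """Count events per hopping window [start, start + size); starts are multiples of hop."""
    counts = defaultdict(int)
    for t in timestamps:
        # t is in a window iff t - size < start <= t; find the first such multiple of hop.
        first = ((t - size) // hop + 1) * hop
        for start in range(max(first, 0), t + 1, hop):
            counts[start] += 1  # an event may belong to several overlapping windows
    return dict(sorted(counts.items()))


events = [1, 4, 7, 12, 14]
print(tumbling_counts(events, size=5))
print(hopping_counts(events, size=10, hop=5))
```

With a 10-second window hopping every 5 seconds, the event at t=7 is counted in both the [0, 10) and [5, 15) windows. A tumbling window never double-counts an event like that.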