Here is my monthly list of latest findings: things I enjoy or ponder. I look into new tools and releases from Microsoft, its competitors, and others. I collect and share informative materials about data, AI, and cloud. Last but not least, I add futuristic content that I’ve dug up in different places.
New releases (updates from Microsoft, Databricks, and others)
- New MLflow Model Registry to simplify model management. During the Spark + AI Summit, Databricks announced new features for MLflow: a central place to share ML models, collaborate on moving them from experimentation to testing and production, and implement approval and governance workflows.
Credit: Databricks
- Power BI announced the general availability of automated machine learning. Feature recommendations, model explainability, control over training time, and improved training reports make this a solid step towards democratizing machine learning for all users.
Credit: Microsoft
- Azure Data Factory supports Azure Machine Learning service pipelines as a step. Finally, one can execute Azure ML service pipelines without workarounds.
- Azure Data Share supports both structured data (from Azure SQL and SQL DW) and unstructured data (from Azure Data Lake Store and Blob storage) with centralized management and governance. You can share tables and views, and data consumers can receive the data in the supported Azure data store of their choice.
- Netflix open sourced Polynote notebooks, an alternative to Jupyter. In earlier posts, Netflix shed some light on how they work with ML and productionize notebooks: "Beyond Interactive: Notebook Innovation at Netflix" and "Part 2: Scheduling Notebooks at Netflix".
- Microsoft open sources SandDance, a visual data exploration tool. It is available as an extension to both Visual Studio Code and Azure Data Studio and has also been re-released as a Power BI Custom Visual.
Interesting materials on Big Data, Machine Learning
- Continuous Delivery for Machine Learning by Martin Fowler. A solid read for all Data Science & Data Engineering professionals!
Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles. - Martin Fowler
- Spark + AI in Amsterdam: European Summit Recap, Keynote Videos, & Announcements. One of the hottest Big Data & Data Science conferences in Europe took place in October in Amsterdam.
- Productionizing Machine Learning: From Deployment to Drift Detection. A good explanation of concept drift and data drift, with ways to detect and protect against them.
- Microsoft tames the “wild west” of big data with modern data management. A super interesting read on how Microsoft used its own tools (Azure Data Platform) to enable predictive and prescriptive analytical capabilities.
Credit: Microsoft
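To make the drift detection item above a bit more concrete, here is a minimal sketch of one common way to spot data drift: compare the distribution of a feature at training time with its distribution in production using the two-sample Kolmogorov-Smirnov statistic. This is my own illustration, not code from the linked article. The function names and the 0.3 threshold are assumptions picked for the example, and a real setup would tune the threshold or use a proper significance test.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs
    of the two samples (0.0 = identical, 1.0 = fully separated).
    """
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds a (hypothetical) threshold."""
    return ks_statistic(reference, current) > threshold


if __name__ == "__main__":
    training_values = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
    production_values = [2.0, 2.1, 2.2, 2.3, 2.4, 2.5]
    print(ks_statistic(training_values, production_values))
    print(has_drifted(training_values, production_values))
```

Note that this only covers data drift (the inputs change). Concept drift, where the relationship between inputs and the target changes, usually needs monitoring of model quality against fresh labels.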
Futurism
- Google claims quantum supremacy. Google has announced that its 53-qubit “Sycamore” processor has achieved quantum supremacy, performing a specific task in 200 seconds that would take the world’s best supercomputers 10,000 years to complete.
I hope the most popular comment under that video is just a joke :)
- AI allows paralyzed person to “handwrite” with his mind. The brain activity helped train a computer model known as a neural network to interpret the commands, tracing the intended trajectory of his imagined pen tip to create letters.
- 20 new moons of Saturn. Astronomers report the discovery of 20 new natural satellites of Saturn – taking the planet’s known number to 82, surpassing Jupiter, and pushing the total count for the Solar System above 200.
Credit: NASA/JPL-Caltech/Space Science Institute. Starry background courtesy of Paolo Sartorio