A typical big data or analytics team consists of various personas. Naming may vary from project to project and from company to company, but the core is always the same: statistics, domain knowledge, and technology skills. Unfortunately, finding a single person with all the necessary skills is like finding a unicorn. Consequently, a complementary team is needed, and team members get assigned to one of two groups: Data Science and Data Engineering. What is the difference between them, then? Do you really have to put a label on them?
I won’t elaborate on the skills, since you can find many articles here, here, or here. However, the skillset and technologies are just the tip of the iceberg. The two groups often come into conflict, not because of different opinions about Hive or Spark, but because of the completely different mindsets that distinguish them from each other.
Today I picked up an amazing quote while watching a fictional documentary called Voyage To The Planets. After a very long voyage through our solar system, and on the way back to Earth, the spaceship’s crew discussed whether or not they should attempt a landing on a comet. The engineers were anxious and listed countless risks, while the scientists were enthusiastic about it.
[Comets] are unpredictable. It’s the whole point. That’s the difference between engineers and scientists. Engineers hate surprises, we love them.
This quote makes a lot of sense to me now. Data Scientists are eager to find hot new correlations and prove their crazy hypotheses: they love surprises. Data Engineers, on the other hand, want to play with a technology, master it, automate it, and forget about it: there is no place for surprises.
P.S. I know, comparing a simple analytics project to space travel is “slightly” ridiculous :)