Millions of records added every day, a bag full of missing values, table-join hell, documentation that lives in people’s heads instead of on paper, inconsistent reports... and all this while a brand-new AI company comes in through upper management to pitch an all-in-one, end-to-end, world-peace-solving AI solution.
Contrary to popular belief, AI and data science don’t come close to what’s being pitched in sales meetings or in popular, head-in-the-clouds LinkedIn posts about how AI can solve every business’s KPI nightmares.
In over 3 years of working as a data scientist, I went from the classroom to the dirty fields of big data, tackling nice data projects with fellow data scientists, senior management and business people.
Below, I’d like to share the 3 most important lessons I’ve learned.
1. Being a Data Scientist, you’re an educator
“No, AI can’t do that” is not a good way of making it clear that Mr. Boss’s vision of AI and machine learning closely resembles the storyboards of science fiction films or CGI. Make no mistake! There is nothing wrong with getting all excited about AI through multi-million corporate PR and marketing... and getting it all wrong, especially if you’re not the one crunching data and modeling in Python all day.
As a data scientist, know your craft and know your process. Know what’s possible and what still needs to be researched, and explain this to the business in ELI5 fashion. Explain what’s needed to get somewhere. Explain how 100% model accuracy is akin to immortality, and set expectations at a baseline level. Explain how a 5% increase in performance could result in millions in savings over a year, but don’t bullshit these metrics either. Don’t bother explaining your model evaluation metrics; instead, focus on tangible and intuitive measures of performance, such as the lift or the savings per capita compared to the previous state of affairs.
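To make the idea of tangible measures a bit more concrete, here is a minimal Python sketch of how lift and savings could be computed. The function names and every number in it (response rates, cost per case, yearly volume, customer count) are made up for illustration, so treat it as a rough sketch and plug in whatever your own project actually measures.

```python
def lift(targeted_rate: float, baseline_rate: float) -> float:
    """How many times better the model-selected group performs than the average."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return targeted_rate / baseline_rate


def yearly_savings(cost_per_case: float, cases_per_year: int,
                   relative_improvement: float) -> float:
    """Money saved per year if the model avoids a given share of costly cases."""
    return cost_per_case * cases_per_year * relative_improvement


def savings_per_capita(total_savings: float, customers: int) -> float:
    """Spread the yearly savings over the customer base."""
    if customers <= 0:
        raise ValueError("customers must be positive")
    return total_savings / customers


# Hypothetical figures, for illustration only.
example_lift = lift(targeted_rate=0.12, baseline_rate=0.04)
example_savings = yearly_savings(cost_per_case=200.0,
                                 cases_per_year=500_000,
                                 relative_improvement=0.05)
example_per_capita = savings_per_capita(example_savings, customers=1_000_000)

print(f"Lift over the current approach: {example_lift:.1f}x")
print(f"Estimated yearly savings: {example_savings:,.0f}")
print(f"Savings per customer: {example_per_capita:.2f}")
```

With these invented inputs, a 5% improvement on 500,000 cases costing 200 each works out to roughly 5 million a year, which is the kind of sentence the business can actually work with.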
Before doing all of the above though, please... first get on the same page with the business and put yourself second in the conversation. We’re here to help them. At the end of the day, they’re the ones who free up the budget and give us the opportunity to advance our knowledge in this exciting field. Don’t just sit there and listen. Understand their problem, and have a meaningful discussion. After a meeting, try to find other means to an end by tackling the problem from your understanding, not theirs.
The business has a kite-high view of its problem. We data scientists are the moles that dig deep, find little, meaningful, dirty stones, polish them up and present them to the business.
2. Data Scientists are essentially Mr. Clean & Dora The Explorer
Who hasn’t taken Andrew Ng’s Machine Learning course on Coursera and felt the immense power of AI crushing their skull from the inside? Nothing can stop you and the untamable power of AI, right? Elon Musk might be right after all: the singularity is approaching at light speed!
Too bad the real world isn’t a controlled lab. How often has Google Maps told you it’s a 48-minute drive home, only for it to turn into a 2-hour traffic jam party because of some accident? In essence, a data scientist dives head-first into what could be described as unpredictable data generated by human behavior, tries to make sense of the mess, filters out everything that supposedly isn’t needed and, heroic as ever, creates a mathematical model in an attempt to understand and predict the human world.
As a data scientist, make no mistake: exploring the vast landscapes of the unknown and making sense of them is pretty much why we’re still needed in the first place. Otherwise, an algorithm might have done that for us. It takes creativity, no less, to solve data science problems. But creativity is brutally murdered by being analytical, and the ultimate challenge for a data scientist is to be both, preferably at the same time! It’s the reason why data scientists, just like artists and writers, exist in masses, while really good ones are rare.
From my understanding of human psychology, being both creative and analytical (or critical) at the same time is impossible. Instead, have a system in place where you switch between the 2 states every 1-2 hours. Find what works for you and your team. Work in iterations. Make “Any results are better than none” your mantra. Make a bad model first before revisiting your data and creating a better one. Understand that knowing 5% of the required data and getting started on modeling right off the bat is sometimes better than trying to wrap your head around everything at once. Document your results and your understanding.
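As one possible reading of “make a bad model first”, here is a small sketch of the dumbest model I can think of: always predict the most common label. It is plain Python, and the churn labels and the `tenure` field are invented for the example. The point is only that every later, fancier iteration has a documented number to beat.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Build a 'model' that ignores its input and always predicts the most common label."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return lambda _row: most_common_label


def accuracy(predict, rows, labels):
    """Share of rows for which the prediction matches the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(labels)


# Hypothetical toy data, for illustration only.
train_labels = ["stay", "stay", "churn", "stay"]
test_rows = [{"tenure": 3}, {"tenure": 40}, {"tenure": 12}]
test_labels = ["stay", "churn", "stay"]

baseline = majority_baseline(train_labels)
baseline_accuracy = accuracy(baseline, test_rows, test_labels)

# Write this number down: iteration 2 has to do better than this.
print(f"Baseline accuracy: {baseline_accuracy:.2f}")
```

If a later model can’t clearly beat this, that is a result too, and it usually means it’s time to go back to the data rather than to the algorithm.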
Make your code streamlined, reusable and understandable to an outsider. Why? Because after a weekend of partying, you become one. I seriously cannot emphasize this enough: STRUCTURE YOUR CODE AND NOTEBOOKS. Nothing kills the creative and analytical performance of a problem-solver more than ugly, undocumented and unstructured notebooks.
Finally, please laugh a lot, and take productive breaks regularly to let your subconscious solve your problems for you.
3. Define the dots first, then connect them
So we have SAS, SPSS, Python or even IBM’s #cognitive to use for predictions, but that GraphDB that RT recently started using could be a much better way of doing this or that. Or maybe you need to ask IT to install a total of 4 new GPUs in SLI to train neural networks faster using TensorFlow? Or better yet, what about creating a full-fledged real-time predictive analytics infrastructure running on liquid nitrogen to speed up computing and provide the company with real-time, on-demand, advanced, NLP-chatbot-served forecasts and insights?
Stop the discussion right there. What the fuck are you even trying to do? Does it have any merit to discuss how to do things before knowing what to do in the first place? As a data scientist, you’ll encounter this a lot, especially in business meetings. Everybody gets all hyped up about technology’s endless possibilities and potential, yet somehow nothing gets done: projects fail or get delayed, and eventually the results are not delivered.
If you can’t explain in a tweet what you’re trying to do exactly, get right back to the drawing board, or the meeting board for that matter. In Data Science and AI, the whole company is involved in solving the predictive problem, and it should be discussed as such when talking to coworkers.
- What data is important, or could potentially be used for the case?
- What, realistically, is possible in the short-term?
- What is already in place that can be leveraged today?
- What is the potential outcome of the exercise?
- What, in essence, is being modeled? Common human behavior, or random effects?
Knowing the what really well will automatically point you to the potential hows (or to the people who know the how by heart). Make this your principle in everything that you do, not only in data science, and your life will change.
First, agree on the what, then discuss the how.
So there you have it: the 3 most important things I learned from working as a data scientist.
Summarizing, we discussed how:
- Being a Data Scientist, you’re an educator. How things that are obvious to you are not so obvious to others.
- Data Scientists are data cleaners, data explorers and data analyzers. How the perfect data scientist/programmer navigates the path between creativity and analytics really well.
- Knowing what you’re after, defining your target and discussing the what before the how is essential, not only in the field of data science.
Also published on Medium.