The Data Scientist’s Guide to Career Excellence: How to Thrive and Become a Top Contributor by Overcoming Industry Challenges in the Evolving Field of AI

Claire Longo
7 min read · Dec 24, 2023


Image created by author using Dall-E 3

Recently, Data Scientists have been getting hit with mass layoffs across tech companies. This is a startling development, because just 10 years ago the Data Scientist was the sexiest job of the 21st century, and just recently we confirmed it still kinda was!

So why are we all getting laid off?

Well, the answer is quite complicated. It has a lot to do with stuff out of our hands, like the economy, COVID, and a company’s AI strategy.

But there is stuff we can do as Data Scientists to protect our projects from failure, and to make our skillsets stand out in the industry.

First, let’s take a peek at what it looks like when Data Science projects fail, or rather when Data Scientists fail their projects. Here are just a few motivating examples.

Amazon had gender bias in their hiring algorithm. They trained a model to predict whether a candidate would be a successful hire, using a dataset at hand that largely contained resumes from men. The model learned biased patterns from a biased dataset, and then perpetuated those patterns at scale. It started making predictions against hiring women based on words correlated with women that it found in the resumes, such as “women’s chess club” or the name of an all-women’s college. This was a long time ago, and it was shut down.

https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G/

The Zillow pricing model didn’t work. Back in 2021, the online real estate giant tried to break into the iBuying space. One guy sold his home to Zillow based on a price recommended by the algorithm, then quickly bought it back for a profit. The algorithm just couldn’t adjust to changing market conditions, and it was overvaluing and undervaluing homes left and right. Zillow eventually folded this part of their business.

https://insidebigdata.com/2021/12/13/the-500mm-debacle-at-zillow-offers-what-went-wrong-with-the-ai-models/

It looks like United Healthcare might be making a lil misstep with an algorithm used to help decide when to deny care to elderly patients.

https://arstechnica.com/health/2023/11/ai-with-90-error-rate-forces-elderly-out-of-rehab-nursing-homes-suit-claims/

And General Motors is coming in hot using LLMs for a helpful customer service chatbot that was willing to sell a customer a 2024 Chevy Tahoe for just $1. Thanks AI!

https://gmauthority.com/blog/2023/12/gm-dealer-chat-bot-agrees-to-sell-2024-chevy-tahoe-for-1/

And it’s worth noting that AI projects can also fail silently and less dramatically by simply not adding value to the business, or perhaps you forgot to measure it 😉. Many experimental projects go in this direction.

Across these examples and more, I see some recurring trends. There are three reasons Data Science projects fail:

  • No tangible business value. No matter how cool the tech or research is, if it’s not monetizable by the business, the project will not succeed. We get here when we’re spending time developing complex or expensive tech that is ill-fitted or overkill for the business problem we’re solving, or by simply not measuring the ROI of the project properly.
  • Quality issues. The model and/or software is plagued with quality issues. This is a software problem. We get here when it is not easy to trace back why the algorithm gave the output it did, and the algorithmic error is not easily quantifiable. This typically comes down to bad MLOps practices. As a good software engineer, you wouldn’t deploy software without automated preemptive quality monitoring in place, yet many AI systems today are out in the wild with little or no proper monitoring, and they’re going off the rails (here is a solution; a minimal sketch of one such check follows this list).
  • Bias and Fairness issues. When the AI picks up harmful or unfair patterns that exist in the data, the model ends up perpetuating these patterns, and we end up automating bias at scale. There are some things that should not be automated today.
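
To make the monitoring point a bit more concrete, here is a minimal sketch of one preemptive quality check: comparing the distribution of a single feature in production traffic against the training data with a two-sample Kolmogorov-Smirnov test. The feature name, data, and threshold are all made up for illustration, and real monitoring tooling does far more than this.

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_alert(train_values: np.ndarray, prod_values: np.ndarray, p_threshold: float = 0.01) -> bool:
    """Flag a feature whose production distribution has drifted from training.

    Uses a two-sample Kolmogorov-Smirnov test; a small p-value means the two
    samples are unlikely to come from the same distribution.
    """
    statistic, p_value = ks_2samp(train_values, prod_values)
    return p_value < p_threshold

# Illustrative usage: pretend an "income" feature shifted upward in production
rng = np.random.default_rng(0)
train_income = rng.normal(50_000, 10_000, size=5_000)
prod_income = rng.normal(58_000, 10_000, size=1_000)

if drift_alert(train_income, prod_income):
    print("Feature 'income' has drifted -- investigate before trusting model outputs")
```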

(I’ll just say as a caveat right now — there are some application areas where I will not apply AI today. This is just because of the current limitations of AI and the state of Responsible AI and Governance today. The three verticals are Healthcare, Finance, and Hiring. These areas need more research, and I believe Responsible AI is the new frontier in AI. It will take a combination of policy and tech that we do not have today to get to a working solution to the bias and fairness issues in AI.)

Okay, moving on. There is a lot you can do as a Data Scientist to ensure your projects succeed and don’t end up on my sh!t list above 💩. And these same skills will also make you stand out as top talent in the industry.

Spoiler alert: most of these tips are NOT technical skills. Instead, they get down to your mindset, your personal working style, and your project and time management skills. I’m not going to tell you that you need to know more complex modeling techniques or that you need to improve your Python skills. You can still tank a company and your own career even with those great technical skillsets.

So let’s dive in. In this article, I’m giving you some tips and tricks on how not to be a sh!tty Data Scientist, or actually, the recipe book on how to be a fantastic one. So consider this therapy for Data Scientists. I’m the Data Science therapist! (Full disclosure: I’m not a licensed therapist — but I can totally be your career coach.)

Ownership

Outstanding Data Scientists have an outstanding sense of ownership for what they are building. They feel personally responsible for the quality of what they produce, and how it is used. As a result, they take ownership of the project end-to-end. They are not going to simply train a good model with a sufficient performance metric, throw it over the fence to the engineers to deploy and maintain, and expect the product manager to figure out how to derive business value from their work for them.

These Data Scientists work with their teams to properly scope the project. They understand the business problem and can translate that into a Data Science solution. They want to understand the use case and will spend time in the shoes of their customers, and with Subject Matter Experts, to truly understand the vertical they are building for. They care about a lot more than just the modeling piece of a project. Responsible AI is part of ownership, and this is the next frontier in AI imho.

Know the Math

The best Data Scientists understand how the models work under the hood, and that means understanding the math. They can read and understand research papers so they can stay up to date with the latest approaches in AI. To them, these models should NOT be a black box. They know exactly how the algorithm functions, and this helps them choose the right modeling approach for the problem at hand.

Sure, it’s possible to brute-force finding a good model with a hyperparameter search and such, and we still should, but we also need to understand the actual mathematical theory to even have a hypothesis of which model types are appropriate for the problem, and what gaps or shortcomings we’ll need to watch for in production. That actually takes some expertise.
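
For context, the brute-force part really is the easy bit. Here is a hedged sketch of a standard scikit-learn hyperparameter search, where the model family, the grid, and the toy dataset are placeholders rather than recommendations: the search itself is mechanical, while choosing which model family and which parameter ranges even belong in the grid is where the math comes in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Toy dataset standing in for a real problem
X, y = make_classification(n_samples=2_000, n_features=20, random_state=42)

# The search loop is mechanical; deciding that a boosted-tree model fits the
# problem, and which ranges are worth searching, is the part that needs theory.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 3, 5],
    "learning_rate": [0.01, 0.1],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```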

Code Quality

These fantastic Data Scientists are capable of writing code that can survive outside of a Jupyter notebook. Even if they are not the ML Engineer on the project, they understand coding best practices that will ensure the quality of what they produce, and help them collaborate with the Engineers to operationalize their models. These Data Scientists deliver models like they are delivering software.
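
As a small, hypothetical illustration of delivering models like software: pull notebook logic into a pure, typed function with a unit test, rather than leaving it inline in a cell. The column names and the feature itself are invented for this example.

```python
import pandas as pd

def add_engagement_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive model features from raw event counts.

    Pure function: takes a DataFrame in, returns a new one out, so it can be
    unit-tested and reused by both the training pipeline and the serving code.
    """
    out = df.copy()
    out["clicks_per_session"] = out["clicks"] / out["sessions"].clip(lower=1)
    return out

def test_add_engagement_features():
    raw = pd.DataFrame({"clicks": [10, 0], "sessions": [5, 0]})
    features = add_engagement_features(raw)
    assert list(features["clicks_per_session"]) == [2.0, 0.0]
```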

Measure the AI ROI

You’ll find a great Data Scientist spending a significant amount of their time with their business stakeholders. They measure their models’ performance beyond the traditional model performance metrics like AUC, F1 score, etc. And it’s often the case that it’s not possible to use the business metrics that measure the project’s success as the loss function. So these excellent Data Scientists spend time developing a deep understanding of how their projects are measured for success, and I can assure you these success metrics are not going to be the same metrics Data Scientists use to measure the model performance directly.

If possible, the best thing to do is to directly measure the business metric, and this can be done through A/B testing. If the metric is not so easy to measure and directly correlate to the model, then creating a baseline model may be a good approach. We can use a simple linear regression or heuristic model as a baseline to answer the question of “how much better can I solve this problem using AI?”
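
Here is a minimal sketch of that baseline comparison, with toy data and stand-in models (a logistic regression as the simple baseline, a random forest as the candidate). In practice you would report the lift on a business-relevant metric rather than AUC, but the shape of the comparison is the same.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy data standing in for the real problem
X, y = make_classification(n_samples=5_000, n_features=15, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

# Baseline: the simplest reasonable model
baseline = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
# Candidate: the fancier model we want to justify
candidate = RandomForestClassifier(random_state=7).fit(X_train, y_train)

baseline_auc = roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1])
candidate_auc = roc_auc_score(y_test, candidate.predict_proba(X_test)[:, 1])
print(f"baseline AUC: {baseline_auc:.3f}")
print(f"candidate AUC: {candidate_auc:.3f}  (lift: {candidate_auc - baseline_auc:.3f})")
```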

When a Data Scientist can deliver a project that actually moves a business metric, they can start to justify their salary.

Don’t Get Analysis Paralysis

For many Data Scientists and Researchers coming from academia, the switch from an academic mindset to the mindset needed to deliver value quickly in an industry setting is a difficult one.

The Data Scientists who successfully make the switch have actually learned how to think like an engineer. They have learned to maniacally scope things. They ship lean prototypes using simple methods, and then iterate. They don’t overcomplicate things.

They also know when to come up for air. They don’t work alone in a silo. They have learned that having conversations about imperfect work will only facilitate collaboration and help them move faster.



Written by Claire Longo

Full Stack Data Scientist/Machine Learning Engineer, Recommender Systems Specialist, ML Platform Builder, Central ML Team Advocate, bad Poker Player (try me)
