4 Common Machine Learning Pitfalls and How to Avoid Them

Machine learning is one of the hottest topics in technology today—and for good reason.

It has tremendous potential to automate or semi-automate some of the tedious tasks faced by knowledge workers – and major tech companies are already beginning to realize the greater potential.

For example, machine learning can help reduce manual labor on the following tasks by 50% or more:

We are on the cusp of unlocking this value as machine learning applications become more widespread. An Algorithmia study revealed that 76% of enterprises planned to prioritize artificial intelligence (AI) and machine learning (ML) over other IT initiatives in 2021.

Yet, most machine learning initiatives fail. (Also Read: The Promises and Perils of Machine Learning.)

While there are myriad reasons why ML pilots never take off, the most pressing problems can be traced back to four main pitfalls:

  1. Lack of business alignment.
  2. Poor machine learning training practices.
  3. Data quality issues.
  4. Deployment complexities.

Let’s explore each of these and suggest some solutions for data teams and organizations to avoid them.

1. Lack of business alignment

The original sin of machine learning is how most of these projects are born.

Often, a group of data scientists approaches a machine learning project by thinking, “This data is interesting. Wouldn’t it be cool if…”

That way of thinking turns ML projects into science experiments.

A model in this type of project may still be able to produce something of value – but if the project doesn’t address an urgent and painful need, it won’t get the time or attention it needs from business stakeholders. Or worse, it could become something closer to blockchain: cool technology in search of a problem. (Also Read: An Introduction to Blockchain Technology.)

Machine learning projects should start by looking at the most urgent business priorities and assessing the resources needed to address them—instead of starting with the clean data at hand and then trying to figure out the problem they can solve.

Good questions to ask before starting a machine learning project include:

  • Is this issue urgent? According to whom?
  • Why is machine learning the right solution for this problem?
  • How do we define success?

2. Poor machine learning training practices

Let’s say your project has a really hard and valuable business problem in its sights. The next step is collecting enough clean data to train the model.

Therein lies the paradox of data scientists: to eliminate labor for others, they must immerse themselves in it.

According to Anaconda, data scientists spend about 45% of their time on data preparation tasks, including loading and cleaning data.

After all this work, there is still a significant chance that there will not be enough relevant or representative training data. And, as with any manual task, there is a risk of human error. (Also Read: Automation: The Future of Data Science and Machine Learning?)

Optimizing your ML model can also be challenging. It can overfit, where it learns the training data too closely, or underfit, where it learns too little from it.

How can a machine learning model learn too well, you ask?

There is a famous example of a model trained to distinguish between huskies and wolves. It was very accurate during training, but began to fail in production. The problem? All the pictures of the wolves had snow in the background and the huskies did not. It was a snow detection model—not a wolf detection model.
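The failure mode in that story can be sketched with a toy example. Everything below is made up for illustration: the feature names, the rows, and the "training" step, which simply picks whichever single feature best matches the labels. It is not how the original model worked, just a small picture of a model latching onto the background instead of the animal.

```python
# Toy illustration of a model learning a shortcut. All feature names and
# data rows are hypothetical; they are not taken from the husky/wolf study.

def accuracy(rows, feature):
    """Fraction of rows where one binary feature equals the label."""
    return sum(row[feature] == row["is_wolf"] for row in rows) / len(rows)


def pick_best_feature(rows, features):
    """'Train' by choosing the single feature that best predicts the label."""
    return max(features, key=lambda f: accuracy(rows, f))


FEATURES = ["snow_in_background", "pointed_snout"]

# Training pictures: every wolf is in snow, every husky is not.
train = [
    {"snow_in_background": 1, "pointed_snout": 1, "is_wolf": 1},
    {"snow_in_background": 1, "pointed_snout": 1, "is_wolf": 1},
    {"snow_in_background": 1, "pointed_snout": 0, "is_wolf": 1},
    {"snow_in_background": 0, "pointed_snout": 0, "is_wolf": 0},
    {"snow_in_background": 0, "pointed_snout": 0, "is_wolf": 0},
    {"snow_in_background": 0, "pointed_snout": 1, "is_wolf": 0},
]

# Production pictures: huskies in snow, wolves without snow.
production = [
    {"snow_in_background": 1, "pointed_snout": 0, "is_wolf": 0},
    {"snow_in_background": 1, "pointed_snout": 0, "is_wolf": 0},
    {"snow_in_background": 0, "pointed_snout": 1, "is_wolf": 1},
    {"snow_in_background": 0, "pointed_snout": 1, "is_wolf": 1},
]

chosen = pick_best_feature(train, FEATURES)
train_accuracy = accuracy(train, chosen)
production_accuracy = accuracy(production, chosen)

print(f"feature chosen in training: {chosen}")
print(f"training accuracy: {train_accuracy:.0%}")
print(f"production accuracy: {production_accuracy:.0%}")
```

The snow feature looks perfect on the training rows, so it wins, and then it gets every production picture wrong.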

Unfortunately, machine learning training is probably one test where you don’t want to score 100%.
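One rough way to catch this early is to compare the score on training data with the score on data the model has never seen. The sketch below assumes you already have both numbers; the function name and both thresholds are arbitrary placeholders for illustration, not industry standards.

```python
# Minimal sketch: flag likely overfitting or underfitting from two scores.
# The 0.10 gap and 0.70 floor are assumed values; tune them per project.

def diagnose_fit(train_score, validation_score, max_gap=0.10, min_score=0.70):
    """Return a rough label based on training and held-out scores (0 to 1)."""
    if train_score < min_score:
        # The model cannot even fit the data it was trained on.
        return "possible underfit"
    if train_score - validation_score > max_gap:
        # Great on training data, much worse on unseen data.
        return "possible overfit"
    return "ok"


print(diagnose_fit(1.00, 0.60))
print(diagnose_fit(0.55, 0.54))
print(diagnose_fit(0.90, 0.87))
```

A perfect training score paired with a much lower held-out score is the pattern to worry about.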

3. Data quality issues

In training or deployment, it is almost impossible to have an effective machine learning model with bad data. As they say, garbage in, garbage out.

The challenge is that machine learning models are data-hungry. They always want more data—as long as it’s reliable.

However, bad data can be introduced into good data pipelines in an almost infinite number of ways. Sometimes it is a loud, obvious inconsistency, where the error is caught quickly; other times it is a gradual case of data drift that erodes your model's accuracy over time. Either way, it’s bad.
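Gradual drift is easy to miss by eye, so a simple numeric check can help. The sketch below is one basic approach among many: it measures how far the mean of a recent window has moved from a baseline window, in units of the baseline's standard deviation. The function names, the sample values, and the threshold of 2.0 are all assumptions for illustration.

```python
# Minimal drift check: compare a recent window of a numeric feature with a
# baseline window. Sample values and the threshold are hypothetical.
from statistics import mean, stdev


def drift_score(baseline, recent):
    """Shift of the recent mean, measured in baseline standard deviations."""
    spread = stdev(baseline)
    shift = abs(mean(recent) - mean(baseline))
    if spread == 0:
        # A constant baseline: any movement at all counts as maximal drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def has_drifted(baseline, recent, threshold=2.0):
    """True when the recent mean moved more than `threshold` deviations."""
    return drift_score(baseline, recent) > threshold


baseline = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2, 9.8]
stable_week = [10.1, 9.9, 10.3, 9.7]
drifted_week = [12.0, 12.5, 11.8, 12.2]

print(has_drifted(baseline, stable_week))
print(has_drifted(baseline, drifted_week))
```

A check like this only looks at the mean of one feature; real pipelines usually track several statistics per column.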

You built this model to automate or address a painful business problem, so when accuracy drops, trust drops too, and the consequences are severe. For example, a colleague of mine talked to a financial company that was using a machine learning model to buy bonds that met certain criteria. Bad data took it offline, and it was weeks before the model was trusted enough to go back into production. (Also Read: The Future of FinTech: AI and Digital Assets in Financial Institutions.)

The data infrastructure that supports machine learning models must be continuously tested and monitored, ideally in a scalable, automated way.
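As a sketch of what an automated check might look like, the function below scans a batch of records for missing values and out-of-range numbers before the batch reaches a model. The field names, ranges, and the 5% null-rate limit are hypothetical, loosely inspired by the bond example above rather than taken from any real system.

```python
# Minimal batch validation sketch. Field names, ranges, and the null-rate
# limit are assumptions chosen for illustration.

def check_batch(rows, required_fields, ranges, max_null_rate=0.05):
    """Return a list of human-readable problems found in a batch of records."""
    if not rows:
        return ["batch is empty"]
    problems = []
    # Missing-value check: too many nulls in a required field.
    for field in required_fields:
        nulls = sum(1 for row in rows if row.get(field) is None)
        rate = nulls / len(rows)
        if rate > max_null_rate:
            problems.append(
                f"{field}: null rate {rate:.0%} exceeds {max_null_rate:.0%}"
            )
    # Range check: non-null values must fall inside the allowed interval.
    for field, (low, high) in ranges.items():
        bad = sum(
            1 for row in rows
            if row.get(field) is not None and not low <= row[field] <= high
        )
        if bad:
            problems.append(f"{field}: {bad} value(s) outside [{low}, {high}]")
    return problems


good_batch = [
    {"price": 99.5, "yield_pct": 3.1},
    {"price": 101.2, "yield_pct": 2.8},
]
bad_batch = [
    {"price": None, "yield_pct": 3.0},
    {"price": 100.0, "yield_pct": 250.0},
]

required = ["price", "yield_pct"]
allowed = {"yield_pct": (0, 20)}

good_problems = check_batch(good_batch, required, allowed)
bad_problems = check_batch(bad_batch, required, allowed)
print(good_problems)
print(bad_problems)
```

Run on a schedule or on every new batch, a check like this turns silent data problems into visible alerts.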

4. Deployment complexities

It turns out that deploying and maintaining machine learning in production takes a lot of resources. Who knew?

Well, Gartner did. By 2025, AI is projected to be the top category driving infrastructure decisions, with compute requirements growing tenfold as the AI market matures.

This requires a lot of support from business stakeholders, which is why business alignment is so important. For example, former Uber data product manager Atul Gupte led a project to improve the organization’s data science workbench, which data scientists used to facilitate collaboration.

At the time, data scientists were working to automate the process of validating and verifying the worker documents required when applying to join the Uber platform. This was a great use case for machine learning and deep learning, but the data scientists routinely hit the limits of available compute.

Gupte researched multiple solutions and identified virtual GPUs (then an emerging technology) as a possible solution. While the price was high, Gupte justified the cost to leadership. The plan wasn’t going to save the company millions, but it supported a key competitive differentiator.

In another example, Netflix never moved its award-winning recommendation algorithm into production, opting instead for a simpler solution that was easier to integrate. (Also Read: How AI is personalizing entertainment.)

How to avoid these pitfalls

Don’t let these challenges stop you from starting your machine learning initiative.

Mitigate these risk factors by:

  • Getting stakeholder buy-in early and aligning often.
  • Iterating in a DevOps style.
  • Ensuring you have proper training data and monitoring its quality before and after production.
  • Keeping production resource constraints in mind.

As Tom Hanks says in “A League of Their Own,” “It’s supposed to be hard. If it wasn’t hard, everyone would do it. The hard is what makes it great.”