The Amazing Ways Verizon Uses AI And Machine Learning To Improve Performance

Verizon’s Fios fiber-optic broadband keeps millions of US homes online. However, monitoring its stability and reacting to the faults and outages that affect customer experience takes a huge amount of resources.

Until recently, Verizon primarily relied on customer feedback to understand when the speed and quality of its service were falling short of expectations.

In recent years, however, following a large investment in analytics and AI-driven technology such as machine learning – in part absorbed through the company’s 2017 acquisition of Yahoo! and its research units – a different approach is bringing impressive results.

Now its predictive analytics algorithms monitor 3GB of data every second, streaming from millions of network interfaces – from customers’ routers to an array of sensors gathering temperature and weather data, and software which “listens in” on operational data such as billing records.

Verizon’s director of network performance and analytics, Matt Tegerdine, told me that in 2017 this analytics infrastructure allowed them to predict 200 “customer impacting” events before they happened and take steps to prevent them occurring.

He tells me “Essentially what we’re trying to do is listen to all of our network elements … there’s a tremendous wealth of data that we have coming from the different elements and we want to listen to them, translate them, run them through [predictive] models and ensure that there’re no interruptions to our customers.”

The strategy has been designed to be customer-focused from the ground up – with customer dissatisfaction caused by poor service as the problem to be overcome.

It works by using machine learning algorithms to first establish the “normal” behaviors expected on the network. The system then identifies “outlier” data that falls outside this threshold of normal behavior and attempts to recognize the events that led to those outliers emerging.
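
As a rough illustration of that “learn normal, then flag outliers” idea, the sketch below builds a rolling baseline for a single metric and flags readings that drift too far from it. The metric, window size and threshold are illustrative assumptions, not details of Verizon’s actual models.

```python
# Minimal sketch of the "learn normal, flag outliers" approach described above.
# The metric, window size and threshold are illustrative, not Verizon's pipeline.
import pandas as pd

def flag_outliers(samples: pd.Series, window: int = 288, k: float = 3.0) -> pd.Series:
    """Flag readings that sit outside a rolling baseline of 'normal' behavior.

    samples: time-indexed readings of one metric (e.g. CPU usage on a router).
    window:  number of past readings that define the baseline.
    k:       how many standard deviations from the baseline counts as an outlier.
    """
    # Learn "normal" only from readings that came before the current one,
    # so a sudden spike cannot inflate its own baseline.
    history = samples.shift(1).rolling(window, min_periods=window // 2)
    baseline_mean = history.mean()
    baseline_std = history.std()
    return (samples - baseline_mean).abs() > k * baseline_std

# Example: hourly CPU readings from one (hypothetical) network element.
cpu = pd.Series(
    [22, 24, 23, 25, 21, 96, 24],
    index=pd.date_range("2018-01-01", periods=7, freq="H"),
)
print(flag_outliers(cpu, window=4, k=2.0))  # only the spike to 96 is flagged
```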

“The beauty of this is that we don’t just look at one singular data source like interface statistics – we’re also going out and collecting things like environmental statistics, CPU usage on routers … we use machine learning to learn what ‘normal’ is.

“It’s a very complex ecosystem of different data sources, and it’s that combination that drives a lot of insights and is where the value of analytics increases.”

The strategy will become increasingly important as Verizon moves towards its goal of deploying the first residential and mobile 5G networks at the end of this year.

On its home networks, Verizon runs automated testing on a sample of 60,000 in-home routers every two hours, to ensure that customers are receiving the speed of service they are paying for.
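
Purely to illustrate that kind of sampled, periodic testing cadence, here is a small sketch of the loop involved. The inventory loader and run_speed_test() helper are hypothetical stand-ins, not Verizon’s actual test harness.

```python
# Illustrative sketch of a sampled, periodic speed-check loop.
# load_router_inventory() and run_speed_test() are hypothetical stand-ins.
import random
import time

SAMPLE_SIZE = 60_000
TEST_INTERVAL_SECONDS = 2 * 60 * 60  # every two hours

def load_router_inventory() -> list[str]:
    """Hypothetical: return the IDs of all in-home routers on the network."""
    return [f"router-{i:06d}" for i in range(100_000)]  # small stand-in fleet

def run_speed_test(router_id: str) -> float:
    """Hypothetical: trigger a remote test and return downstream speed in Mbps."""
    return random.uniform(700.0, 1000.0)  # simulated measurement

def test_cycle(routers: list[str]) -> list[tuple[str, float]]:
    """Test a random sample of routers and return their measured speeds."""
    sample = random.sample(routers, min(SAMPLE_SIZE, len(routers)))
    # Each measurement would be compared against the speed tier the customer
    # is paying for and logged for the performance team to review.
    return [(router_id, run_speed_test(router_id)) for router_id in sample]

if __name__ == "__main__":
    inventory = load_router_inventory()
    while True:
        results = test_cycle(inventory)
        print(f"tested {len(results)} routers this cycle")
        time.sleep(TEST_INTERVAL_SECONDS)
```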

As often happens with Big Data projects, though, the insights are proving useful in ways beyond those for which they were originally intended: Verizon has found it can also use them to drive business decisions. Testing showed that the home routers were consistently able to operate at higher speeds than previously thought, which meant the business could market its service as a 1-gigabit connection where it had previously been advertised at 750 megabits. This led to a huge upsurge in sales.

Improving service for existing customers remains the focus though – “Performance is our team’s main charter,” Tegerdine says, “We’re here as a silent advocate for the customer, behind the scenes, and our job is to work in that area.

“If a leak occurs or a router goes down hard, those are very easy to detect – what we want to know is could we have detected this? Could we have gotten ahead of this before the failure? Was it at all possible?”

They have also been able to detect manufacturing or production defects in the third-party hardware and software the network is built on. Microfractures in chips or operating system bugs often lead to faults or errors which, while non-fatal, nevertheless degrade service or cause annoyance to customers. These are traditionally far harder to detect than terminal errors, which will have customers reporting in their thousands that they can’t get online.

“It may not be a total outage but there are circumstances where out of a group of 1,000 customers, perhaps 100 are experiencing buffering and their applications aren’t working smoothly … it’s not the premium network we want to provide.

“That’s kind of where my team lives,” Tegerdine tells me.

Verizon’s AI and Big Data infrastructure is built largely from open-source components. The team relies heavily on Spark and Kafka for their ability to handle fast-moving streams of network data in real time.

“If you think about it, it makes sense,” says Tegerdine, “the data never stops flowing so we need real time processing to respond to it.”

The platform sits on Hadoop, and development work is carried out in Python and Java.
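
To make that stack a little more concrete, below is a minimal PySpark Structured Streaming sketch that reads telemetry from a Kafka topic and rolls it up per network interface. The broker address, topic name and record schema are assumptions for illustration, not details of Verizon’s pipeline.

```python
# Minimal Kafka-to-Spark streaming sketch (requires the spark-sql-kafka package).
# Broker address, topic name and JSON schema are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json, window
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("network-telemetry-sketch").getOrCreate()

# Hypothetical shape of one telemetry record published to Kafka.
schema = StructType([
    StructField("interface_id", StringType()),
    StructField("metric", StringType()),
    StructField("value", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the raw telemetry stream from Kafka and parse the JSON payload.
telemetry = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker address
    .option("subscribe", "network-telemetry")           # assumed topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("r"))
    .select("r.*")
)

# Aggregate each metric per interface over one-minute windows -- the kind of
# rolled-up view a downstream anomaly-detection model could consume.
per_minute = (
    telemetry
    .withWatermark("event_time", "5 minutes")
    .groupBy(window(col("event_time"), "1 minute"), col("interface_id"), col("metric"))
    .agg(avg("value").alias("avg_value"))
)

query = per_minute.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```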

Another core strategy has been the deployment of “incubation teams” of specialists in different areas of data technology: data scientists, data engineers, data architects and, crucially, a data translator.

Data translators in particular play an increasingly critical role – and demand for workers equipped with this particular skill set is forecast to grow across all industries.

“The data translators are a very powerful and unique layer. They can speak the data science language but they also know the business – typically these are people we have pulled out of business functions.

“They become very important because, how do you get the insights from the data? Data scientists speak a particular language but data translators make it real. They’re the glue that ties it all together.”

As far as the future goes, Tegerdine is confident that AI (and machine learning in particular) will play an increasingly vital role in protecting and ensuring performance as networks become bigger, faster and more complex.

“But it’s something we will get to through iteration,” he tells me.

“You don’t just wake up one morning and say, ‘We’ve developed artificial intelligence’, but that’s our north star … that’s the path we’re on because we want to get bigger, faster and more automated.

“Another name for it would be our Big Hairy Audacious Goal. We’ll fix what we can today and focus on the customer at every stage, but everything we do should be aligned towards that ultimate goal … how do we combine all these insights and automate them, and get them down to real-time, millisecond response times – and build self-healing networks – that’s our ultimate goal.”
