If you’ve been having trouble getting onto Facebook and Instagram lately, you’re not alone. 

Since the start of the year, Meta’s services have already experienced 33 outages – including two of the biggest ones since 2022.

While it is easy to imagine that Facebook could be under siege by malicious hackers, the truth might actually be worse for the company. 

Speaking to MailOnline, tech experts have revealed that Meta may have created a system that is now too complex to keep running – particularly as the company continues to cut staff.

Worryingly, one expert has warned that the problems are only going to get worse, describing the outages as ‘existential’ for Meta. 

Cybersecurity experts told MailOnline that Meta’s service outages are due to the company creating a system so complex that it can no longer be properly maintained

Meta’s biggest service outages

December 2020

  • Facebook, Messenger, and Instagram went down for two hours due to an unexplained technical issue.  

October 2021

  • Meta’s biggest service outage in recent years in which Facebook, Instagram, and WhatsApp went down for five to seven hours. 

July 2022

  • Some Meta services were down for around two hours. 

October 2022

  • A ‘configuration change’ caused Facebook and Instagram to become inaccessible for about two hours. 

March 2024

  • A global outage left users unable to log in to Facebook, Instagram, WhatsApp, Threads, and even Meta Quest VR services for two hours.

April 2024 

  • Instagram, WhatsApp and Facebook went down for users across the globe for two hours. 

Meta’s issue, as cybersecurity expert Dr Junade Ali told MailOnline, is something called ‘technical debt’.

This, essentially, refers to the fact that big tech companies like Meta have built very complex pieces of the internet on the back of old-fashioned systems that don’t quite work.

Dr Ali says: ‘What happens is that there are these “legacy systems” which people don’t have the time to fix.’

As Meta has grown and swallowed up services like Instagram and WhatsApp, it has had to make more things work on the back of this technical debt.

Each of these thousands of different systems and services speaks to the others using something called an API, or Application Programming Interface.

These let the complex system work as a whole, but if something goes wrong in one API, the consequences can quickly spread to lots of different services.

This means that issues with routine updates and new features can trigger cascading effects that lead to outages big enough for users to notice.  
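To illustrate how a single fault can spread, here is a minimal Python sketch. The service names and the dependency map are hypothetical and are not taken from Meta’s real architecture; the sketch simply assumes each service lists the services it calls, then walks outwards from the one that fails.

```python
from collections import deque

# Hypothetical dependency map (not Meta's real architecture):
# each service lists the services whose APIs it calls.
DEPENDS_ON = {
    "auth": [],
    "facebook": ["auth"],
    "instagram": ["auth"],
    "whatsapp": ["auth"],
    "threads": ["instagram"],
    "ads": [],
}


def affected_by(failed, depends_on):
    """Return every service that breaks, directly or indirectly, when `failed` goes down."""
    # Invert the map so we can ask "who calls this service?"
    dependents = {}
    for service, deps in depends_on.items():
        dependents.setdefault(service, [])
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk from the failed service to everything that relies on it.
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(affected_by("auth", DEPENDS_ON)))
# ['auth', 'facebook', 'instagram', 'threads', 'whatsapp']
print(sorted(affected_by("ads", DEPENDS_ON)))
# ['ads']
```

In this toy example a fault in the shared login service reaches four other services, while a fault in a service nothing else depends on stays contained.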

Dr Ali says: ‘When you work on a computer system like Meta you’re always releasing new features and doing maintenance…the key thing is to be able to recover quickly.

‘But when you aren’t able to keep on top of that housekeeping, then things start to become a lot more noticeable.’

On March 5 and April 3 this year, Meta services including Facebook, WhatsApp and Instagram all went down for about two hours. 

The issue appeared to be spread widely across Meta’s services with Threads and even Meta Quest VR-headset users being affected.  

At the time, Meta acknowledged that services were down and attributed the outages to a ‘technical issue’.

However, closer analysis can narrow down exactly what this ‘technical issue’ might have been.

Although these disruptions were branded as ‘server outages’, Meta’s servers never actually went down and the site remained live the entire time. 

On March 5, users were unable to log into Meta services including Facebook, Instagram, WhatsApp, and Threads due to an authentication error


On April 3, another service outage led 714 people to report on Down Detector that they were unable to access Facebook


Neither is it very likely that Meta’s servers had been targeted by cybercriminals, although this can’t be ruled out entirely.

In the immediate aftermath of the service outage on March 5, the hacker group Anonymous appeared to claim responsibility for a cyberattack against the company. 

However, Angelique Medina, head of internet intelligence at Cisco Thousand Eyes, told MailOnline that human error was a more likely cause. 

A cyberattack such as a Distributed Denial of Service (DDoS) attack, in which a company’s systems are overwhelmed by vast numbers of requests, would leave a clear trace.

Ms Medina explains: ‘If it’s something like a DDoS attack where you’re seeing lots of traffic that is flooding a particular service, you’re going to see the ripple effects across lots of different ISPs [internet service providers].’

In her analysis of network traffic around Meta services Ms Medina didn’t find any evidence of these ripple effects. 

Hacktivist group Anonymous seemed to claim responsibility for the outage, but it is common for hackers to falsely claim attacks in order to sow disinformation and bolster their credibility


These diagrams show connections to Meta's servers during the April 3 service outage. As the green colours indicate, all of the servers remained active, indicating the issue was on Meta's backend


What is a DDoS attack?

DDoS stands for Distributed Denial of Service.

In this attack, individuals bombard a website with so much traffic that its intended users can’t access it.

Some hackers will use networks of hijacked computers, called botnets, to generate even more traffic.

The biggest attacks can be so strong they take whole sections offline.

They are a common tool of ‘hacktivists’ and less sophisticated cybercriminals. 


This makes it far more likely that Meta’s developers released an update which interacted poorly with the rest of the infrastructure.

Ms Medina explains: ‘Typically, what we see with these kinds of outages is there might have been some kind of update that was being done to the application or underlying infrastructure.

‘The types of outages are, for lack of a better word, self-inflicted,’ she said.

While these two outages were the most noticeable, they are far from the only times Meta has seen service disruption this year.

In fact, Meta’s service disruptions appear to be getting worse over time.

There were 33 instances of ‘performance degradation’ between January 1 and April 5 – a 154 per cent increase on the same time period the year before. 
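The arithmetic behind that figure can be checked with a short Python sketch. The year-earlier count is not given in the article, so the value of 13 below is inferred from the 33 incidents and the 154 per cent rise, and should be read as an assumption.

```python
def implied_baseline(current, pct_increase):
    """Work backwards from a current count and a percentage rise to the earlier count."""
    return current / (1 + pct_increase / 100)


def percentage_increase(previous, current):
    """Percentage rise from `previous` to `current`."""
    return (current - previous) / previous * 100


# 33 incidents and a 154 per cent rise imply roughly 13 incidents a year earlier.
print(round(implied_baseline(33, 154)))      # 13
# Going the other way: 13 -> 33 is a rise of about 154 per cent.
print(round(percentage_increase(13, 33)))    # 154
```

In other words, a 154 per cent increase means incidents were roughly two and a half times as frequent as in the same period a year earlier.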

Because total outages are so expensive for the company, they are usually fixed quickly.

When they are not, the results can be nearly catastrophic, as seen in October 2021 when Meta’s services disappeared for between five and seven hours.

The resulting loss in ad revenue was estimated at $100 million (£80m), and the outage wiped five per cent off the company’s share price.

This makes the fact that Meta has just seen two global outages, each lasting around two hours, all the more troubling.

Dr Ali says: ‘Normally you would try to get mean recovery time down into the minutes.

‘Going into a few hours is quite concerning because it means something has gone wrong with the detection process or there’s been something going wrong in the recovery.’ 
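As a rough illustration of the ‘mean recovery time’ Dr Ali describes, the Python sketch below averages the approximate durations from the list of Meta’s biggest outages earlier in this article. Several parts are assumptions: the October 2021 outage is entered as six hours (the midpoint of ‘five to seven’), the function name is illustrative, and a real recovery metric would cover every incident rather than only the largest.

```python
from statistics import mean

# Approximate outage durations in minutes, taken from the article's list of
# major outages. October 2021 uses 360 (midpoint of five to seven hours),
# which is an assumption.
OUTAGE_MINUTES = {
    "December 2020": 120,
    "October 2021": 360,
    "July 2022": 120,
    "October 2022": 120,
    "March 2024": 120,
    "April 2024": 120,
}


def mean_recovery_minutes(durations):
    """Average time to recover across a set of outage durations (in minutes)."""
    return mean(durations)


print(mean_recovery_minutes(OUTAGE_MINUTES.values()))  # 160
```

On these figures the average recovery for a major outage is 160 minutes, well beyond the ‘minutes’ Dr Ali says teams would normally aim for.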

If service outages for Meta's products like Facebook and Instagram continue to get worse, this could cost the company millions in lost advertising revenue and harm the share price


More worryingly, some experts are not optimistic that these systems will improve in the future.  

Cybersecurity expert James Bore told MailOnline he expects these problems to become ‘existential’ for Meta.  

He says: ‘They’re not going to improve, it’s never going to be better than it is now.

‘It’s going to keep on growing out of control and keep on decaying and becoming more fragile with more failures…people will lose faith and, eventually, I suspect it will just vanish.’ 

Speaking to Mr Bore, a Meta insider allegedly said: ‘They’ve got no control internally. 

‘They’re keeping it running mostly, and that’s about all they can do.’

The biggest problem, Mr Bore claims, is that the system has grown too large and does too many things for Meta to keep everything working.   

‘We get these systems which become more and more complex, with more and more shoddy code so they’re harder to work with and, as time goes on, you have to throw more and more resources at it,’ he says.

This also appears to be an issue Meta has been aware of for some time. 

In a 2019 internal Facebook meeting that was later leaked to The Verge, Meta CEO Mark Zuckerberg said that Facebook’s outages were becoming more severe.

In a leaked audio recording, Meta CEO Mark Zuckerberg told employees as far back as 2019 that the complexity of the company’s systems meant that small issues were causing ‘systems to fall over’

Mr Zuckerberg told employees: ‘It’s not that there’s one technical thing, except that just the complexity of the systems is growing.

‘So things that previously would have just been a blip are now things that are causing systems to fall over.’

Things are now getting worse, Mr Bore and Dr Ali both told MailOnline, because companies like Meta are trying to cut back on costs like staffing. 

In May last year, Meta laid off more than 10,000 staff on top of a prior 11,000 in November 2022, losing about 10 per cent of its total staff each time.

Mr Bore says: ‘We know from experience that if you take people away from a system, it rarely gets more stable. 

‘Even if they weren’t particularly good you’ve just lost 10 per cent of your hands on keyboards trying to keep this system running.’

This post first appeared on Dailymail.co.uk
