The Internet is losing its memory, and remedies are needed to stem the loss

Be careful what you put online, because once something is on the Internet, it stays there forever. That was the warning, and we grew used to thinking twice before posting content, to avoid negative consequences later. In reality, however, not everything on the Internet is preserved; on the contrary, the Internet is slowly losing its memory.

The alarm has been sounded by a Pew Research Center study, which found that 25% of web pages that existed at some point between 2013 and 2023 are no longer accessible today. The researchers checked a sample of just under one million web addresses from Common Crawl, an archive service that collects snapshots of millions of web pages catalogued by date, and found that as many as one in four is no longer available.

38% of web pages in 2013 no longer exist

As you might expect, the percentage of missing pages increases the further back in time you go: looking at 2013 alone, 38% of the pages that existed ten years ago now return an error. Among pages from 2021, one in five is unavailable, and among pages from 2023 the share of inaccessible content falls to 8%.

To understand this phenomenon, it helps to look at why URLs disappear. Some pages are inaccessible even though they belong to an active domain; other addresses are inaccessible because the entire domain no longer exists. As a result, attempts to connect to these sites return different types of error: from the more familiar 404 (Not Found) and 502 (Bad Gateway) to the less familiar 410 (Gone), 500 (Internal Server Error), 501 (Not Implemented), 503 (Service Unavailable) and 523 (Origin Is Unreachable).
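The distinction between a page that fails on a live domain and an address that gets no answer at all can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the study: the function names `classify` and `fetch_status`, the labels they return, and the choice of a HEAD request with a 10-second timeout are all hypothetical. The status codes and their names are the ones listed above.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional

# Error codes named in the article, with their reason phrases.
# 523 is a non-standard code used by Cloudflare.
ERROR_CODES = {
    404: "Not Found",
    410: "Gone",
    500: "Internal Server Error",
    501: "Not Implemented",
    502: "Bad Gateway",
    503: "Service Unavailable",
    523: "Origin Is Unreachable",
}


def classify(status: Optional[int]) -> str:
    """Turn an HTTP status (or None when nothing answered) into a label."""
    if status is None:
        # No HTTP response at all: the domain may no longer exist.
        return "no response"
    if status in ERROR_CODES:
        # A server answered, so the domain is active, but the page failed.
        return f"page unavailable: {status} {ERROR_CODES[status]}"
    if 200 <= status < 400:
        return "reachable"
    return f"other status: {status}"


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server replied with an error status such as 404 or 503.
        return err.code
    except (urllib.error.URLError, socket.timeout, ConnectionError):
        # DNS failure, refused connection, timeout and similar problems.
        return None
```

Calling `classify(fetch_status(url))` on each address in a list gives a rough picture of which links are broken and why. A real survey would need more care, since some servers reject HEAD requests or answer with a normal page that merely says the content is gone.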

Digging deeper into the analysis reveals some worrying details. For example, 23% of news web pages contain at least one broken link, with a slightly lower percentage (21%) for pages on government sites. Wikipedia, the primary source of information for many people worldwide, is a case of its own: according to the study, more than half of the 50,000 entries examined contain one or more broken links.

Internet losing its memory

Social media is not spared either. Take X, the leading platform for sharing news: one in five posts is no longer visible just a few months after publication. The worst record belongs to Arabic and Turkish language content, which in four out of ten cases is no longer available after just 90 days. To give an idea of the scale, of the 4.8 million tweets that the Washington-based research centre collected between 8 March and 27 April 2023, 18% were no longer visible by 15 June of that year, in most cases because the account that posted the message no longer existed.

An eternal present that leaves no trace

There are several reasons for the disappearance of web pages: deleted addresses, moved content, pages hosted on domains that their owners no longer renew and that are therefore inactive, and direct links to deleted documents. The situation has been aggravated by the way we use social media, which has hastened the shift away from the static web towards content created to capture a fleeting moment, only to be forgotten when the next moment, immortalised in the next post, replaces it. This trend explains the success of Stories on Instagram and, more generally, of short-lived clips.

The Pew Research Center study suggests that we are completely immersed in a virtual world that lives in an eternal present and leaves no trace for the future. This is something to think about, so that effective solutions can be found without wasting too much time.

Alessio Caprodossi is a technology, sports, and lifestyle journalist. He navigates between three areas of expertise, telling stories, experiences, and innovations to understand how the world is shifting. You can follow him on Twitter (@alecap23) and Instagram (Alessio Caprodossi) to report projects and initiatives on startups, sustainability, digital nomads, and web3.