Do you think most of the people around you know at least a few HTTP status codes? Unfortunately, those who know any are likely to be familiar with at least 404 Not Found and 500 Internal Server Error. If you spend enough time on the web, you may encounter these errors quite often, if not daily. The Internet Archive’s new Wayback Machine extension for Firefox and Chrome can help you get to the page you were looking for even after it has been removed from the web.
The Internet Archive have been archiving copies of webpages for almost 15 years. They don’t have every webpage ever created in their collection, but if you’re looking for a page that was publicly available for some time and was of any interest to anyone, there’s a fair chance that it will be part of their collection.
The whole collection of webpages is available to anyone through the Wayback Machine, where you can look up any page by its URL. I do this at least a dozen times per month when I encounter pages or whole websites that have disappeared from the public web.
When links stop working and pages disappear from the web, we call it “link rot”. A large portion of the blame falls on webmasters who don’t properly care for their old links when moving, restructuring, or changing publishing platforms. Server decay, bankruptcies, an aging online population, and marketers who advise their clients to “delete all old pages!” or attempt to rewrite their company’s history all contribute to link rot as well.
One thing is for sure: link rot is a real problem that affects us all, and it’s only going to get worse over time.
Dead link revival built into the web browser
The Internet Archive and Mozilla started working together on integrating the Wayback Machine into Firefox around . Mozilla was evaluating whether it could deliver a better web experience, without link rot and with access to long-dead pages, by partnering with the Internet Archive and baking the feature directly into Firefox. In dreadful institutionalized language:
The result of this cooperation was the No More 404s experiment, part of Firefox’s Test Pilot program. Users who’ve opted in to the experiment would see a dialog offering to serve an archived version of the current page if the page was unavailable on the web but a copy existed in the Internet Archive’s collection.
The original version of the plugin only detected 404 Not Found error pages and offered archived versions for them. I contributed a small patch that expanded its coverage to common server-side, cache-proxy, and temporary errors as well. This greatly increased the plugin’s usefulness, as it now covers more of the reasons pages become unavailable. The experiment’s original name lost its meaning in the process, however.
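The idea behind that patch can be sketched in a few lines of Python: look at the HTTP status code of the page that just loaded and decide whether to offer an archived copy. The exact codes the extension checks aren’t listed here, so the grouping below into not-found, server-side, cache-proxy, and temporary errors is a hypothetical example, and `should_offer_archive` is an illustrative name rather than the plugin’s actual code.

```python
from http import HTTPStatus

# Hypothetical trigger sets, one per category mentioned above. The real
# extension's list of status codes may differ.
NOT_FOUND = {HTTPStatus.NOT_FOUND, HTTPStatus.GONE}
SERVER_SIDE = {HTTPStatus.INTERNAL_SERVER_ERROR}
CACHE_PROXY = {HTTPStatus.BAD_GATEWAY, HTTPStatus.GATEWAY_TIMEOUT}
TEMPORARY = {HTTPStatus.REQUEST_TIMEOUT, HTTPStatus.SERVICE_UNAVAILABLE}

TRIGGER_CODES = NOT_FOUND | SERVER_SIDE | CACHE_PROXY | TEMPORARY


def should_offer_archive(status: int, original_only: bool = False) -> bool:
    """Return True if a page with this status should prompt an archive lookup.

    original_only=True mimics the first version of the plugin, which only
    reacted to 404 Not Found.
    """
    if original_only:
        return status == HTTPStatus.NOT_FOUND
    # HTTPStatus members are IntEnums, so plain ints match them in a set.
    return status in TRIGGER_CODES
```

With the expanded set, a 503 from an overloaded server now triggers the offer (`should_offer_archive(503)` is true), while the 404-only behavior (`original_only=True`) would have ignored it.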
The Test Pilot experiment is still being evaluated in Firefox, but a new version of the plugin (which runs practically the same WebExtension code as the Firefox experiment) was renamed Wayback Machine and released as an extension for Google Chrome . The plugin is only offered through Test Pilot for Firefox and isn’t available as a stand-alone extension in the Firefox Add-ons Catalog.
In just one month, and without any marketing push from the browser vendor, the Chrome extension gained 22 % as many users as the Firefox extension had gathered after 5.5 months. If Mozilla decides to bake Wayback Machine lookups into Firefox, we might see a lot more interest in the feature from Firefox users and privacy researchers.
Update (): The Wayback Machine extension is no longer available in the Google Chrome Store. There are other extensions available with similar functionality.
You can get the Wayback Machine extension in the Google Chrome Store or from the Firefox Add-Ons Directory.
Using the plugin has some implications for your browsing privacy. The address of any broken page, as identified by its HTTP status code, will be sent to the Internet Archive so that they can determine whether they have an archived copy of the page available. The Internet Archive have a long-winded and archaic privacy policy, not updated since , that covers access to their collections. The plugin uses HTTPS to communicate with the archive, and all archived pages are retrieved over HTTPS.
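To give a rough idea of what such a lookup involves, here is a minimal Python sketch against the Internet Archive’s public Wayback Availability API (`https://archive.org/wayback/available`). Whether the extension uses this exact endpoint and request format is an assumption, and the function names are hypothetical. The sketch builds the HTTPS query carrying the page address, picks the closest snapshot from the JSON response, and upgrades the snapshot link to HTTPS before it is opened.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public Availability API; the extension may use another endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query_url(page_url: str) -> str:
    """Build the HTTPS query asking whether page_url has an archived copy."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest available snapshot URL from an API response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    snapshot = closest["url"]
    # Retrieve the archived page over HTTPS even if the API returns http://.
    if snapshot.startswith("http://"):
        snapshot = "https://" + snapshot[len("http://"):]
    return snapshot


def lookup(page_url: str, timeout: float = 10.0) -> Optional[str]:
    """Send the broken page's address to the archive (network call)."""
    with urlopen(availability_query_url(page_url), timeout=timeout) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Note that the whole address, including any path and query string, leaves your browser in the `url` parameter, which is exactly the privacy trade-off described above.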