WebWord.com > Moving WebWord > Web Sites That Heal (10-March-2002)


If you want to know when new articles go online,
subscribe to the WebWord.com Usability Newsletter!

Web Sites That Heal

Article by John S. Rhodes


Abstract

The first purpose of this article is to explain the true causes of linkrot. The second purpose is to outline a new way to solve the linkrot problem.


Understanding Linkrot

Users are frustrated with linkrot. Web sites are updated, links are changed, and pages are eliminated. People cannot find what they want because pages are missing, or seem to be missing. This is a serious problem. It has also reached a point where it is downright silly. While doing research for this article, I found a page that defined linkrot. Near the bottom of that page, there were links to some articles on linkrot. I decided to follow one of the links. When I reached that page (screenshot), I was given the following message:

"Can't find what you were looking for? Some older articles published on ZDNet have moved, or are no longer available. For up-to-date reviews on hardware and software for your business, browse this site or use the search form above. For all current and past PC Magazine articles, you can subscribe to our Tech InfoBase library."

Even pages about linkrot suffer from it.

Jakob Nielsen wrote an article about 4 years ago that described how to fight linkrot. I think a problem with that article is that Nielsen doesn't really talk about the causes of linkrot. He makes the leap from defining linkrot to eliminating it. This might seem like a good idea, but it skips the conversation about the true causes of linkrot. It doesn't make sense to jump to a linkrot solution without understanding the root causes.

Now that I have thrown down the gauntlet, what causes linkrot? 

  • Content management systems (CMS), which almost always rely on databases and server-side scripting, generate nasty URLs. Those nasty URLs are terrible to look at and they are often long. This is a problem if you are trying to send your friends the URL. But more importantly, those URLs are apt to change. If you change your CMS, your URLs will probably change.
     
  • Poor information architecture is another problem. When web sites are small and when they are maintained by just a few people, the architecture is relatively stable. However, as soon as you introduce workflow and complex content management systems, the architecture is probably going to change. As a web site grows, it gets more complex. Most architectures simply can't absorb that growth, and URLs often change to reflect it.
     
  • Web sites aren't tested for usability. They are designed and deployed without consideration for how people will actually use them. When sites are not properly tested, missing URLs are not caught. Worse, some web sites are tested for both usability problems and broken links, yet the sites are not updated. Just because people are aware of usability problems, it doesn't mean that those problems get fixed. Usability problems are usually the last to get fixed.
     
  • Some web site owners simply don't care about linkrot. There are many sites that suffer from this apathy.
     
  • It is hard to inform other web site owners that your site has changed. This is important. The idea is that if I change my one web page, I would need to contact every other web site that links to that page. It can be hard enough to change your own pages let alone every other site that links to them.
     
  • Telling people to update their bookmarks is poor usability. It is not instant and it is not driven by the site. Users are forced to do work because the web site has made a change. This is what I'll call bookmark linkrot. It is the result of general linkrot.
     
  • Setting up page redirections is not 100% simple and easy. While it makes sense to reduce linkrot by offering page redirections, it is not something that happens automatically when a web site changes. Webmasters and web site developers usually have other issues to deal with, such as deadlines, editing, security, and debugging code. Once again, usability plays second fiddle.
     
  • Content is posted openly but is subsequently hidden behind registration systems and security.
     
  • Content is seen as being temporary. For example, contests usually are open for a limited time. Things like tradeshows and conferences are also dependent on a particular date. When the time of these activities passes, web site owners often yank the material which causes linkrot.
     
  • Government intervention will cause some web sites to remove pages. And, in places like China and Saudi Arabia, content is filtered all of the time. Furthermore, the U.S. government has found ways to shut out people from some web sites if they contain information that they want. This kind of activity leads to what I'll call censorship linkrot.
     
  • URLs wrap in email messages. Many users send other users URLs. When the URLs are too long, they wrap in email clients. When the users try to cut and paste the URL, they don't get it in its entirety and this yields a broken page. This is not a typical case of linkrot, but it is quite similar.
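
To make the redirection bullet above a little more concrete, here is a minimal sketch in Python of what a site-maintained redirect table might look like. The table contents and the function name resolve are assumptions made up for illustration; they are not taken from any particular web server. The point is only that a moved page can be followed through several moves, as long as someone keeps the table up to date and guards against loops.

```python
# Hypothetical table of moved pages: old path -> new path.
# The entries are invented for illustration only.
REDIRECTS = {
    "/articles/linkrot.html": "/moving/linkrot.html",
    "/moving/linkrot.html": "/moving/healing.html",
}


def resolve(path, redirects=REDIRECTS, max_hops=10):
    """Follow redirect entries until reaching a path with no further entry.

    Returns (final_path, hops). Raises ValueError if the table contains
    a loop or the chain is longer than max_hops.
    """
    seen = {path}
    hops = 0
    while path in redirects:
        path = redirects[path]
        hops += 1
        if path in seen or hops > max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(path)
    return path, hops
```

Even in this toy form, the table has to be edited by hand every time a page moves, which is exactly the kind of chore that tends to get skipped.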


How Bad is Linkrot?

Nielsen's ideas about fighting linkrot make sense: Never let URLs die and set up page redirects. However, I think these ideas are dated. In 1998 Jakob Nielsen stated the following:

"...linkrot contributes to dissolving the very fabric of the Web: there is a looming danger that the Web will stop being an interconnected universal hypertext and turn into a set of isolated info-islands. Anything that reduces the prevalence and usefulness of cross-site linking is a direct attack on the founding principle of the Web."

Linkrot is still a problem, but it is hardly dissolving the fabric of the web. 

First, users are rarely stopped by linkrot. If they want to find information on a topic they can almost always find it. People are incredible. They can route around the hardest problems. We're good at solving them. In many ways, linkrot is merely an inconvenience.

Second, it is rare that only one web page has just the right information a person needs. If the idea is important, many other web sites will contain that information. Some technical journal articles might be excluded from this generalization, but most people don't want to read highly technical information. Again, linkrot is an inconvenience. It is a usability problem.

Linkrot is bad and it is a problem. However, it is not destroying the web. On the other hand, can URLs really live forever? Can we always easily set up redirections? The answer to both questions is no.

The core problem is that people expect stability but the web is not stable.


Google?

I think Google has solved some problems associated with linkrot. 

  • First, they offer cached versions of web pages. The cache doesn't always last long but it does give people an idea if they are on the right track. 
     
  • Second, Google does a great job of indexing web sites. If you can't find something in your bookmarks, or if a page moved, Google has probably found that page. Google finds what is lost.
     
  • Third, Google offers a similar pages function. If you find the page you want on Google but it is broken (yes, they do index broken pages sometimes) then you can find similar pages and route around the linkrot. This is not always elegant, but it does work for many people. 

In short, Google is fighting linkrot and helping a lot of people. But is Google enough? Further, do we really want to trust Google to become the gatekeeper of the web? Will Google last? Can we depend on one company to help people get around the problems of linkrot? I really like Google, but I don't think they are the ultimate solution. 


Web Services and the Semantic Web to the Rescue

Web sites need to heal each other. 

My proposal revolves around web services and the semantic web. Simply stated, web sites need to talk to each other as they are changed. If my web site changes, it should attempt to locate all other web sites that link to mine. At the primitive end of things, my site could at least send the webmasters of those sites the information without my help. A more sophisticated solution would be to have my web server talk to other web servers and ask them to (automatically) change their links. 
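
As a rough sketch of the primitive end of this idea, the Python below drafts one plain-text notice per linking site. Everything here is an assumption for illustration: the function name build_notices, the wording of the notice, and the example.org style URLs in the usage note. How the notice would actually be delivered (email, a web service call, something else) is left open.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_notices(inbound_links, old_url, new_url):
    """Group linking pages by host and draft one notice per host.

    inbound_links: iterable of URLs of pages that link to old_url.
    Returns a dict mapping each linking host to its notice text.
    """
    pages_by_host = defaultdict(list)
    for page in inbound_links:
        pages_by_host[urlsplit(page).netloc].append(page)

    notices = {}
    for host, pages in pages_by_host.items():
        lines = [
            "A page you link to has moved.",
            "Old URL: " + old_url,
            "New URL: " + new_url,
            "Pages on your site that link to the old URL:",
        ]
        lines += ["  " + page for page in sorted(pages)]
        notices[host] = "\n".join(lines)
    return notices
```

Calling build_notices(["http://a.example/links.html"], "http://example.org/old.html", "http://example.org/new.html") would return a single notice keyed by a.example.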

Why do humans have to be involved in the communication about changing URLs? Webmasters are human and want to do what they do best. They are not good at maintaining links, which is a manual process in most cases. It is boring and tedious. Web servers don't mind doing this work, and they are good at it. The process could take place entirely behind the scenes. It could potentially work with sites created with HTML or XML. The idea works either way. In effect, it is a sophisticated search and replace exercise that can take place between two or more servers.
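
The search and replace half of the idea can be sketched in a few lines of Python. This is a deliberately simple illustration, assuming links appear as quoted href attributes and that the receiving server has been handed a mapping of old URLs to new ones; the name rewrite_links is hypothetical, and a real tool would want a proper HTML parser rather than a regular expression.

```python
import re

# Matches href attributes whose value is wrapped in single or double quotes.
HREF = re.compile(r'''(href\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def rewrite_links(html, moved):
    """Return html with every href found in `moved` replaced by its new URL.

    moved: dict mapping old URL -> new URL. Other links are left untouched.
    """
    def swap(match):
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        return prefix + quote + moved.get(url, url) + quote

    return HREF.sub(swap, html)
```

Only exact matches are rewritten, so a link that differs by a trailing slash or a query string would be left alone. That is the conservative choice for changes made to someone else's pages.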

I propose building a simple open source web server tool to solve the linkrot problem. Perhaps the idea would be to build it right into web servers (e.g., Apache). The software could do reverse lookups on links to your pages using your referrer logs or even by piggybacking on search engines. Once the incoming links to your broken page(s) are found, the software would attempt to reach the web site that is linking to those pages. A handshake could occur between the servers and then your server could send the repair information to the other server. It can either be queued up to be authorized by a human or it could happen automatically if the handshake is secure.
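
The reverse lookup step could start from something as modest as the sketch below, written in Python. It assumes the server writes its access log in the common "combined" format, where each line carries the request, the status code, and the referrer. The function name find_inbound_links and the host names in the test data are hypothetical. The handshake and repair steps are not shown, since they would depend on a protocol that does not exist yet.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Picks the request path, status code, and referrer out of a
# combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def find_inbound_links(log_lines, own_host):
    """Map each missing path (status 404) to the external pages linking to it.

    Requests with no referrer, or with a referrer on own_host, are skipped.
    """
    broken = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match.group("status") != "404":
            continue
        referrer = match.group("referrer")
        if referrer in ("", "-") or urlsplit(referrer).netloc == own_host:
            continue
        broken[match.group("path")].add(referrer)
    return dict(broken)
```

Referrer logs only reveal links that somebody has actually followed, which is why piggybacking on a search engine might be needed to find the rest.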

I challenge someone to solve the linkrot problem by writing this software. I would be willing to help develop a more detailed project specification if someone would write the code and make it open source. Just think of the implications.

While I am waiting, I promise not to change the URL of this page.

 

Comments?  

Please send them to me:  john@webword.com  I want to know what you think about this article.

 



 

Contact John S. Rhodes, the WebWord.com Editor and Webmaster

URL: http://webword.com/moving/healing.html

© 2002 by WebWord.com. All rights reserved.
Do not reproduce or redistribute any material from this document,
in whole or in part, without explicit written permission from WebWord.com.