Web Sites That Heal
Article by John S. Rhodes
Abstract
The first purpose of this article is to explain the true causes of linkrot. The
second purpose is to outline a new way to solve the linkrot problem.
Understanding Linkrot
Users are frustrated by linkrot. Web sites are updated, links are changed, and
pages are eliminated. People cannot find what they want because pages are
missing, or seem to be missing. This is a serious problem, and it has also
reached the point where it is downright silly. While doing research for this
article, I found a page that defined linkrot. Near the bottom of that page were
links to some articles on linkrot. I decided to follow one of them. When I
reached that page (screenshot), I was given the following message:
"Can't find what you were looking for? Some older articles published on ZDNet have moved, or are no longer available. For up-to-date reviews on hardware and software for your business, browse this site or use the search form above. For all current and past PC Magazine articles, you can subscribe to our Tech InfoBase library."
Even pages about linkrot suffer from it.
Jakob Nielsen wrote an article about four years ago that described how to fight
linkrot. I think a problem with that article is that Nielsen doesn't really talk
about the causes of linkrot. He leaps from defining linkrot to explaining how to
eliminate it. This might seem sensible, but it leaves out any discussion of the
true causes of linkrot. It doesn't make sense to jump to a linkrot solution
without understanding the root causes.
Now that I have thrown down the gauntlet, what causes linkrot?
- Content management systems (CMS), which almost always rely on databases and
server-side scripting, generate nasty URLs. Those URLs are terrible to look at
and often long, which is a problem if you are trying to send one to your
friends. More importantly, those URLs are apt to change: if you change your CMS,
your URLs will probably change too.
- Poor information architecture is another problem. When web sites are small and
maintained by just a few people, the architecture is relatively stable. However,
as soon as you introduce workflow and complex content management systems, the
architecture is probably going to change. As a web site grows, it gets more
complex. Most architectures simply can't accommodate that growth, so URLs often
change to reflect it.
- Web sites aren't tested for usability. They are designed and deployed without
consideration for how people will actually use them. When sites are not properly
tested, missing URLs are not caught. Worse, some web sites are tested for both
usability problems and broken links, yet the sites are not updated. Being aware
of usability problems doesn't mean those problems get fixed; usability problems
are usually the last to get fixed.
- Some web site owners simply
don't care about linkrot. There are many sites that suffer from this apathy.
- It is hard to inform other web site owners that your site has changed. This is
important. If I change even one web page, I would need to contact every other
web site that links to that page. It can be hard enough to change your own
pages, let alone every other site that links to them.
- Telling people to update
their bookmarks is poor usability. It is not instant and it is
not driven by the site. Users are forced to do work because the web site
has made a change. This is what I'll call bookmark linkrot. It is
the result of general linkrot.
- Setting up page redirections
is not 100% simple and easy. While it makes sense to reduce linkrot by
offering page redirections, it is not something that happens
automatically when a web site changes. Webmasters and web site
developers usually have other issues to deal with, such as deadlines,
editing, security, and debugging code. Once again, usability plays second
fiddle.
- Content is posted openly but
is subsequently hidden behind registration systems and security.
- Content is seen as being temporary. For example, contests are usually open for
a limited time. Things like tradeshows and conferences are also tied to a
particular date. When these events pass, web site owners often yank the
material, which causes linkrot.
- Government intervention causes some web sites to remove pages. In places like
China and Saudi Arabia, content is filtered all of the time. Furthermore, the
U.S. government has found ways to shut people out of some web sites because of
the information those sites contain. This kind of activity leads to what I'll
call censorship linkrot.
- URLs wrap in email messages. Many users email URLs to one another. When the
URLs are too long, they wrap in email clients. When recipients try to cut and
paste a wrapped URL, they don't get it in its entirety, and the result is a
broken page. This is not a typical case of linkrot, but it is quite similar.
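To make the redirection point concrete, here is a minimal Python sketch of what
keeping a redirect table involves. It is not how Apache or any particular server
does it; the table contents, the `MOVED` name, and the `resolve` function are
hypothetical. It maps old paths to new ones, collapses a chain of moves into a
single permanent (301) redirect, and refuses to serve a loop.

```python
from typing import Dict, Optional, Tuple

# Hypothetical redirect table: old path -> new path. A real site might have its
# CMS write an entry like this every time a page moves.
MOVED: Dict[str, str] = {
    "/articles/linkrot.html": "/usability/linkrot.html",
    "/usability/linkrot.html": "/archive/2002/linkrot.html",  # moved a second time
}


def resolve(path: str, moved: Dict[str, str], max_hops: int = 10) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location) for a requested path.

    200 with no Location if the path was never moved, 301 with the final
    destination if it was, and 500 if the table contains a loop or a chain
    longer than max_hops.
    """
    if path not in moved:
        return 200, None
    seen = {path}
    current = path
    for _ in range(max_hops):
        current = moved[current]
        if current not in moved:
            # Send the visitor straight to the final location instead of
            # making the browser follow every intermediate hop.
            return 301, current
        if current in seen:
            break  # the table points back at a page we already visited
        seen.add(current)
    return 500, None
```

For example, `resolve("/articles/linkrot.html", MOVED)` returns a 301 pointing
at `/archive/2002/linkrot.html`. Collapsing chains saves round trips, but someone
still has to add an entry every time a page moves, which is exactly the step that
tends to get skipped.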
How Bad is Linkrot?
Nielsen's ideas about fighting linkrot make sense: never let URLs die, and set up
page redirects. However, I think these ideas are dated. In 1998, Jakob Nielsen
stated the following:
"...linkrot contributes
to dissolving the very fabric of the Web: there is a looming danger that
the Web will stop being an interconnected universal hypertext and turn
into a set of isolated info-islands. Anything that reduces the prevalence
and usefulness of cross-site linking is a direct attack on the founding
principle of the Web."
Linkrot is still a problem, but
it is hardly dissolving the fabric of the web.
First, users are rarely stopped by linkrot. If they want to find information on
a topic, they can almost always find it. People are incredible. We can route
around the hardest problems; we're good at solving them. In many ways, linkrot
is merely an inconvenience.
Second, it is rare that only
one web page has just the right information a person needs. If the idea is
important, many other web sites will contain that information. Some
technical journal articles might be excluded from this generalization, but
most people don't want to read highly technical information. Again, linkrot
is an inconvenience. It is a usability problem.
Linkrot is bad, and it is a problem. However, it is not destroying the web. At
the same time, can URLs really live forever? Can we always easily set up
redirections? The answer to both questions is no.
The core problem is that people
expect stability but the web is not stable.
Google?
I think Google has solved some
problems associated with linkrot.
- First, they offer cached versions of web pages. The cache doesn't always last
long, but it does give people an idea of whether they are on the right track.
- Second, Google does a great job of indexing web sites. If you can't find
something in your bookmarks, or if a page has moved, Google has probably found
that page. Google finds what is lost.
- Third, Google offers a similar pages function. If you find the page you want
on Google but it is broken (yes, they do sometimes index broken pages), you can
find similar pages and route around the linkrot. This is not always elegant, but
it does work for many people.
In short, Google is fighting
linkrot and helping a lot of people. But
is Google enough? Further, do we really want to trust Google to become the
gatekeeper of the web? Will Google last? Can we depend on one company to
help people get around the problems of linkrot? I really like Google, but I
don't think they are the ultimate solution.
Web Services and the Semantic Web to the Rescue
Web sites need to heal each other.
My proposal revolves around web services and the Semantic Web. Simply stated,
web sites need to talk to each other as they change. If my web site changes, it
should attempt to locate all other web sites that link to mine. At the primitive
end of things, my site could at least send the webmasters of those sites the
change information without my help. A more sophisticated solution would be to
have my web server talk to other web servers and ask them to change their links
automatically.
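If servers are going to ask each other to change links, the receiving server
needs some way to trust the request. One possible approach, which is my
assumption and not part of any existing standard, is to sign each repair request
with an HMAC using a key the two servers agreed on beforehand. The message fields
and the function names below are hypothetical; only the `hmac`, `hashlib`, and
`json` calls are real Python standard library.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret. How two servers would agree on it is left open.
SHARED_KEY = b"example-shared-secret"


def _canonical(body: dict) -> bytes:
    # Sorted keys and fixed separators so both servers sign identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_repair_request(old_url: str, new_url: str, linking_page: str, key: bytes) -> dict:
    """Build a hypothetical 'please update your link' message and sign it."""
    body = {"old_url": old_url, "new_url": new_url, "linking_page": linking_page}
    signature = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_repair_request(message: dict, key: bytes) -> bool:
    """Return True only if the signature matches the body under the shared key."""
    expected = hmac.new(key, _canonical(message["body"]), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, message["signature"])
```

A request that fails verification could simply be dropped, while one that passes
could be applied automatically or queued for the webmaster. Agreeing on the key
in the first place is the hard part and is not shown here.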
Why do humans have to be involved in communicating URL changes? Webmasters are
human and want to do what they do best. They are not good at maintaining links,
which is a manual process in most cases. It is boring and tedious. Web servers
don't mind doing this work, and they are good at it. The process could take
place entirely behind the scenes, and it could potentially work with sites
created with either HTML or XML. In effect, it is a sophisticated search and
replace exercise that can take place between two or more servers.
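The search and replace step itself is simple. This Python sketch, built around a
hypothetical `repair_links` function, rewrites `href` attributes whose targets
appear in a map of old URL to new URL and reports how many links it changed. It
uses a regular expression for brevity, which assumes the pages use ordinary
quoted `href` attributes; a real tool would want a proper HTML or XML parser.

```python
import re
from typing import Dict, Tuple

# Matches href="..." or href='...'. A regex is a simplification for this sketch;
# it will miss unquoted attributes and can be fooled by unusual markup.
HREF = re.compile(r"""(href\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def repair_links(html: str, moved: Dict[str, str]) -> Tuple[str, int]:
    """Rewrite every link whose target appears in `moved` (old URL -> new URL).

    Returns the updated page and the number of links changed.
    """
    count = 0

    def swap(match: re.Match) -> str:
        nonlocal count
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        if url in moved:
            count += 1
            return f"{prefix}{quote}{moved[url]}{quote}"
        return match.group(0)  # leave links we know nothing about untouched

    new_html = HREF.sub(swap, html)
    return new_html, count
```

Returning the count lets the linking server log or report what it changed, which
matters if a human is expected to review the repairs.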
I propose building a simple open source web server tool to solve the linkrot
problem. Perhaps it could even be built right into web servers (e.g., Apache).
The software could do reverse lookups on links to your pages using your referrer
logs, or even by piggybacking on search engines. Once it finds the incoming links
to your broken page(s), the software would attempt to reach each web site that
links to those pages. A handshake could occur between the servers, and then your
server could send the repair information to the other server. The repair could
either be queued up for a human to authorize, or it could happen automatically if
the handshake is secure.
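The reverse lookup through referrer logs could start as something like this
Python sketch. It assumes the Apache "combined" log format, which records the
referring page for each request; the `find_incoming_links` function and its
`own_host` parameter are hypothetical. For each path that returned a 404, it
collects the external pages whose links led there.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set
from urllib.parse import urlparse

# Combined log format: host ident user [time] "request" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def find_incoming_links(lines: Iterable[str], own_host: str) -> Dict[str, Set[str]]:
    """Map each path that returned 404 to the external pages that linked to it."""
    broken: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") != "404":
            continue
        referrer = m.group("referrer")
        host = urlparse(referrer).netloc.lower()
        # Skip requests with no referrer ("-") and links from our own site,
        # which we can fix ourselves.
        if not host or host == own_host.lower():
            continue
        broken[m.group("path")].add(referrer)
    return dict(broken)
```

Referrers are optional and browsers sometimes omit them, so this only finds links
that visitors have actually followed; that is one reason the search engine route
mentioned above might complement it.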
I challenge someone to solve
the linkrot problem by writing this software. I would be willing to help
develop a more detailed project specification if someone would write the
code and make it open source. Just think of the implications.
While I am waiting, I promise
not to change the URL of this page.
Comments?
Please send them to me: john@webword.com
I want to know what you think about this article.