What does it do?
Checks for links to web addresses that don’t exist or that return an error. These are known as broken links.
Why is it important?
Broken links are very common and affect almost every website. They usually appear because one party changes or removes a page that another party links to, and the linking party doesn’t find out. Testing regularly with an automated tool is the quickest and easiest way to find and fix them.
How is it measured?
Conventional HTML links and Meta Refresh links are checked by this test. Each link within the website is tested to see whether it returns a valid response. There are five potentially ‘bad’ responses:
- Page not found (ungraceful) – when asked for the page, the webserver simply replied saying ‘that page was not found’ (an HTTP 404 response). No HTML was sent by the server, so the error is displayed by the user’s web browser, which is usually a relatively poor user experience.
- Page not found (graceful) – when asked for the page, the webserver replied with a full HTML page, but marked it as ‘not found’ (an HTTP 404 response). This is the best way to handle broken links, as the user sees something: ideally a professionally made and useful error message that explains the problem.
- Host not found – the website itself does not exist (the hostname could not be found). For example, a link to www.this-domain-does-not-even-exist.com would fail in this manner. Such errors are always handled by the user’s web browser; this cannot be avoided.
- Timed out – the page did not respond quickly enough, and Sitebeam assumed the page would never load.
- Broken 404 header – when asked for the page, the webserver simply replied saying ‘that page was not found’ (an HTTP 404 response), but when the page was downloaded it did exist. Technically, the server replied with a 404 to a HEAD request but with a valid response to a GET request. This usually means the code behind the website doesn’t handle HEAD requests properly: poor practice, but not disastrous.
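For site owners, the best-practice behaviour described above (a graceful 404, and HEAD requests that agree with GET) can be sketched with Python’s standard-library `http.server`. This is a minimal illustration only, not how any particular web server or CMS works: the `PAGES` dictionary, the `NOT_FOUND_PAGE` markup and the `GracefulHandler` class are hypothetical. The points it shows are that a missing page gets a real 404 status together with a helpful HTML body, and that HEAD returns exactly the same status and headers as GET, just without the body.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple

# Hypothetical site content: request path -> HTML page.
PAGES = {"/": "<html><body><h1>Home</h1></body></html>"}

# A helpful error page, sent with a real 404 status (the "graceful" case).
NOT_FOUND_PAGE = (
    "<html><body><h1>Page not found</h1>"
    '<p>Sorry, that page does not exist. Try the <a href="/">home page</a>.</p>'
    "</body></html>"
)


class GracefulHandler(BaseHTTPRequestHandler):
    def _resolve(self) -> Tuple[int, bytes]:
        """Decide the status and body once, so GET and HEAD always agree."""
        page = PAGES.get(self.path)
        if page is None:
            return 404, NOT_FOUND_PAGE.encode("utf-8")  # 404 status plus HTML
        return 200, page.encode("utf-8")

    def _send_headers(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()

    def do_GET(self) -> None:
        status, body = self._resolve()
        self._send_headers(status, body)
        self.wfile.write(body)

    def do_HEAD(self) -> None:
        # Same status and headers as GET, minus the body. Answering 404 here
        # while GET answers 200 would be the "broken 404 header" case.
        status, body = self._resolve()
        self._send_headers(status, body)
```

To try it locally, run `ThreadingHTTPServer(("127.0.0.1", 8000), GracefulHandler).serve_forever()` and request a path that is not in `PAGES`. Deriving both methods from the single `_resolve` method is the design choice that matters: it makes it hard for HEAD and GET to disagree.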
The score is based on a weighted balance of possible and definite broken links, plus links with header issues. For technical details, see below.
Links are found during the initial spidering process and are added to a list to be checked in parallel. Accordingly, there is some delay between a page being analysed and the links on it being checked, although this is almost always inconsequential.
- Sitebeam first makes a quick HTTP HEAD request for each linked page; if this returns a success code, the link is treated as working and no further checks are made.
- If the hostname is not found, that error is flagged.
- If the HEAD request returns an error status code (e.g. 404, 500, etc.), a second check is made with a GET request. Depending on the outcome and the HTML returned, the appropriate error is flagged.
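The HEAD-then-GET sequence above can be approximated in Python using only the standard library. This is a sketch under stated assumptions, not Sitebeam’s implementation: the function names `check_link` and `check_links`, the 10-second timeout, the pool of 8 threads, and the rule that a body containing `<html` counts as a graceful error page are all illustrative choices. An unknown host is recognised from the `socket.gaierror` raised by the DNS lookup, and 403 responses are treated as working, in line with the note on 403 responses. Links are checked in parallel with a thread pool, echoing the way links are queued during spidering.

```python
import socket
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Tuple


def _request(url: str, method: str, timeout: float) -> Tuple[int, str]:
    """Send one request and return (status code, body text).

    4xx/5xx statuses are returned rather than raised. The body is only read
    for GET, because HEAD responses carry no body.
    """
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            raw = resp.read() if method == "GET" else b""
            return resp.status, raw.decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        raw = err.read() if method == "GET" else b""
        return err.code, raw.decode("utf-8", "replace")


def check_link(url: str, timeout: float = 10.0) -> str:
    """Try a quick HEAD first; fall back to GET only if HEAD did not succeed."""
    try:
        head_status, _ = _request(url, "HEAD", timeout)
        if 200 <= head_status < 300:
            return "ok"  # HEAD succeeded, so no GET is needed
        get_status, body = _request(url, "GET", timeout)
    except urllib.error.URLError as err:
        if isinstance(err.reason, socket.gaierror):
            return "host not found"  # DNS lookup failed
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timed out"
        return f"connection error ({err.reason})"
    except (socket.timeout, TimeoutError):
        return "timed out"  # the server accepted the connection but never answered

    if get_status == 403:
        return "ok"  # 403 is not treated as a broken link
    if 200 <= get_status < 300:
        # HEAD said 404 but GET worked: the server mishandles HEAD requests.
        return "broken 404 header" if head_status == 404 else "ok"
    if get_status == 404:
        # Assumption: a body containing "<html" counts as a full HTML error page.
        if "<html" in body.lower():
            return "page not found (graceful)"
        return "page not found (ungraceful)"
    return f"http error {get_status}"


def check_links(urls: Iterable[str], workers: int = 8, timeout: float = 10.0) -> Dict[str, str]:
    """Check many links in parallel; returns {url: label}."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda u: check_link(u, timeout), url_list)
        return dict(zip(url_list, results))
```

For example, `check_links(["https://www.example.com/", "https://www.example.com/missing"])` returns a dictionary mapping each URL to a label. Note that `urlopen` follows redirects automatically, so the status examined is that of the final page in any redirect chain.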
An HTTP response of 403 is not considered a broken link. Some sites (e.g. Wikipedia) respond with a 403 when they receive large numbers of requests for broken links, so treating it as broken would produce false alarms.
Note that a very low percentage of broken links (say, 1%) can still result in a terrible score for this test. This is by design: broken links are damaging even in small doses, and they typically make up a very small percentage of overall link volume. Limits on the maximum score exist because some very large sites have a small percentage of broken links that are nevertheless significant. Awarding 9.99 (rounded up to 10) to a site with some minor flaws is less appropriate than awarding 9.9.
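To see why a small percentage of broken links can still produce a poor score, and why a cap such as 9.9 matters, here is a toy scoring model. Everything in it is hypothetical: Sitebeam does not document its weights, so the three weights, the steep `PENALTY` factor, the 0 to 10 formula and the mapping of link types to "definite" and "possible" are invented purely for illustration. What the sketch does show faithfully is the rounding arithmetic: without a cap, a raw score of 9.99875 would display as 10.0.

```python
# Every constant here is HYPOTHETICAL: Sitebeam does not publish its weights.
WEIGHT_DEFINITE = 1.0   # assumed: e.g. 404s and unknown hosts
WEIGHT_POSSIBLE = 0.5   # assumed: e.g. timeouts
WEIGHT_HEADER = 0.25    # assumed: broken 404 headers
PENALTY = 50.0          # assumed: steep, so 1% weighted breakage halves the score
MAX_SCORE_IF_FLAWED = 9.9


def link_score(total: int, definite: int, possible: int, header: int) -> float:
    """Toy 0-10 score from link counts, rounded to one decimal place."""
    if total == 0:
        return 10.0
    weighted = (
        WEIGHT_DEFINITE * definite
        + WEIGHT_POSSIBLE * possible
        + WEIGHT_HEADER * header
    )
    raw = 10.0 * max(0.0, 1.0 - PENALTY * weighted / total)
    if weighted > 0:
        # Cap flawed sites so that, e.g., 9.99875 cannot round up to 10.0.
        raw = min(raw, MAX_SCORE_IF_FLAWED)
    return round(raw, 1)
```

With these made-up constants, `link_score(1000, 10, 0, 0)` gives 5.0 (1% definite broken links halve the score), while `link_score(100_000, 0, 0, 1)` is held at 9.9 instead of rounding up to 10.0.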
Sitebeam says I have a broken link, but it works fine for me
- The link may have been broken when Sitebeam tested it (e.g. if the website was temporarily down).
- The webpage may be returning an HTTP 404 status code, which effectively says “Page not found”, even though it looks like a valid webpage. This is a technical problem with that website which would negatively impact SEO and should be fixed.
- The link is to a page that you can access but Sitebeam cannot. For example, the page may only be available on your corporate network.
Sitebeam did not find a broken link
- Sitebeam may not have tested the page containing the broken link. Click the xxx pages were tested link at the top left of your report, then Advanced options, and search for the URL of the page containing the broken link to confirm whether it was tested. If it wasn’t, your spidering settings may be wrong. See What to do if your website won’t test.
- The link is to an external website, and testing of external broken links is disabled. To check, view the website settings, click Configuration, and review the Use default broken link settings? option. Make sure that Don’t test external links is not checked.
- The website itself may be saying the page is OK, even if it is clearly broken. A missing webpage should return an HTTP 404 response, but some badly behaved webservers return something else (such as a 200 OK response), which prevents Sitebeam from detecting the broken link. You can see what Sitebeam sees when it views a page by using the Test URL feature, under Account > Test URL.
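One way to check whether a website disguises missing pages like this is to request a URL that almost certainly does not exist and see what status comes back. A minimal sketch, assuming Python’s standard library; the function name `returns_real_404s` and the random-path probe are illustrative and not a Sitebeam feature:

```python
import urllib.error
import urllib.parse
import urllib.request
import uuid


def returns_real_404s(site: str, timeout: float = 10.0) -> bool:
    """Probe `site` with a random path that should not exist.

    Returns True if the server answers with a 404, and False if it answers
    with anything else (e.g. 200 OK), i.e. it disguises missing pages.
    """
    # A random 32-character hex path is, for practical purposes, never a real page.
    probe_url = urllib.parse.urljoin(site, "/" + uuid.uuid4().hex)
    try:
        # urlopen follows redirects, so a redirect to the home page ends in 200.
        with urllib.request.urlopen(probe_url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses are raised as HTTPError
    return status == 404
```

For example, `returns_real_404s("https://www.example.com/")` returns `False` if the server answers the random path with 200, including after a redirect to the home page. In that case, broken links pointing at that site cannot be told apart from working ones by their status code.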
How to improve this score
Review the list of broken links and fix each one, either by removing the link or by pointing it to the correct address.
How to use this test effectively
This test should be run regularly and used as a key quality control mechanism.