5. Checking a complete site

By now you will have an idea of what Xenu can do for you, and what steps are involved in creating a linkreport, either in web form or in Excel.

Checking a complete site is even easier!

First change the depth you want to scan from 1 to 999 under:

Then copy the URL of the top-level index page of the site you want to scan into the field under:

Note: if your site uses a frame-based domain name redirect service, don't start with this index page; use the “true” start page of your site instead.

Xenu will now scan the complete site for you. This may take a while: larger websites typically have over 10,000 links!

Two caveats are in order here:

The index page may have very few links, so Xenu cannot reach the deeper levels of the site, and/or most links to the deeper levels are embedded in JavaScript in the navbar.

Since Xenu cannot “read” JavaScript, these links are missed, and the site is not fully spidered. A solution for this issue will be offered in Chapter 7. Checking Javascript links.
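To picture why such links are missed, here is a minimal Python sketch. It is not how Xenu works internally; it only assumes a link checker that collects the href attributes of ordinary anchor tags. The names HrefCollector, find_links and SAMPLE are made up for this illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_links(html):
    collector = HrefCollector()
    collector.feed(html)
    return collector.links


# Hypothetical navbar: one plain link, one link hidden inside JavaScript.
SAMPLE = """
<a href="/about.html">About</a>
<span onclick="window.location='/products.html'">Products</span>
"""

print(find_links(SAMPLE))
```

Only /about.html is found. The products page is reachable for a visitor who clicks, but a checker that reads plain HTML never sees its address.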

Three things should be noted when spidering a complete site:

What does that mean?

Checking of “external” links is optional

When a start URL is entered into Xenu, say www.nrc.nl, only pages within this website are spidered. Links to other sites are considered to be “external”. These are checked, but there the spidering process stops.

Otherwise, Xenu would not know when to stop and would spider all of the Internet!
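The idea of "checked, but not followed" can be sketched in a few lines of Python. This is only an illustration under assumptions: the PAGES table is a made-up link graph standing in for real pages, www.example.org is a placeholder external site, and comparing host names is one simple way to decide what counts as external.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph: page URL -> links found on that page.
PAGES = {
    "http://www.nrc.nl/": [
        "http://www.nrc.nl/nieuws.html",
        "http://www.example.org/",
    ],
    "http://www.nrc.nl/nieuws.html": ["http://www.nrc.nl/"],
    "http://www.example.org/": ["http://www.example.org/deeper.html"],
}


def spider(start_url):
    """Return the URLs that get checked, in the order they are visited."""
    start_host = urlparse(start_url).netloc
    checked = []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        checked.append(url)  # every URL that is found gets checked
        if urlparse(url).netloc != start_host:
            continue  # external: checked, but its own links are not followed
        for link in PAGES.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return checked


print(spider("http://www.nrc.nl/"))
```

The external start page http://www.example.org/ appears in the result, but the page behind it (deeper.html) does not, because the spidering stops there.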

If you are not interested in checking external links, you can unselect this option under Check external links:

You may have noticed that some settings are connected to the URL in the top field. Xenu “remembers” these settings in a text file (xenu.ini) that is generated on your desktop. Always be sure the correct URL is visible in the field before you change the settings.

As noted before, excluding external links is not possible when you use a text file as start page. However, Xenu will not spider beyond the root of the URLs listed in the text file.

Some external links can be considered as “internal”

If you want to scan two sites at once, you can tell Xenu to see the links of the second site as “internal”.

Some internal links can be excluded from the linkscan

Alternatively, you can exclude pages from your linkscan that are part of the site you want to scan.

These URLs will be marked as “User skip” in the linkreport.

For example:

Always use absolute URLs, starting with http://!
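Both options (extra “internal” sites and excluded pages) can be pictured as two lists of URL beginnings. The Python sketch below is an illustration only: the function classify, its parameter names, and the sample URLs are made up, and matching on the start of the URL is an assumption about how the lists are applied.

```python
def classify(url, start_url, also_internal=(), excluded=()):
    """Label a URL as "User skip", "internal" or "external"."""
    # Excluded beginnings win: these URLs are not scanned at all.
    if url.startswith(tuple(excluded)):
        return "User skip"
    # The start URL plus any extra sites count as internal.
    if url.startswith((start_url, *also_internal)):
        return "internal"
    return "external"


start = "http://www.nrc.nl/"

# A second (hypothetical) site treated as internal:
print(classify("http://www.example.org/a.html", start,
               also_internal=["http://www.example.org/"]))

# A (hypothetical) part of the own site excluded from the scan:
print(classify("http://www.nrc.nl/archief/1999.html", start,
               excluded=["http://www.nrc.nl/archief/"]))
```

Because the comparison is done on the beginning of the full address, a relative entry such as /archief/ would never match, which is why absolute URLs are needed.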

Monitoring progress

When Xenu is running, you can monitor its progress by looking at the bottom right of the status bar.

You will see for example:

2097 of 2481 URLs (84%) done

This means Xenu has detected 2481 URLs so far, 84% of which have been scanned. The remaining URLs are marked as

Both figures keep increasing until the scan is complete (100% done).
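The percentage is simply the number of scanned URLs divided by the number of detected URLs. As a small worked example in Python (the function name progress_line is made up, and rounding down to a whole percent is an assumption that happens to match the figures above):

```python
def progress_line(done, total):
    """Format a status line like the one shown in the status bar."""
    # Integer division rounds down; guard against an empty scan.
    percent = done * 100 // total if total else 0
    return f"{done} of {total} URLs ({percent}%) done"


print(progress_line(2097, 2481))  # 2097 / 2481 = 0.845..., shown as 84%
```

Since newly scanned pages keep adding URLs to the total, the percentage can stall or even drop for a while during a scan.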

Broken links are marked as “not found” in the report. To see them all at a glance:

You will see all broken links displayed in red. (Press CTRL-B again to display the complete list of URLs.)

Watching the broken links drop in during a scan (with CTRL-B enabled) is fun: the longer you see a white screen, the better your site is!

For whatever reason, some links get an “Operation cancelled” or “Connection aborted” status message. This can be due to a temporary hiccup of the Internet. In those situations use:

All “broken” links are rescanned, which usually decreases the number of broken links. You can even do this several times, until all “operation cancelled” or “timeout” messages have been resolved.
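The effect of rescanning can be sketched as a simple loop. This is an illustration, not Xenu's own logic: the function recheck_broken, the status text "ok", and the check callback are all made up for the example.

```python
def recheck_broken(statuses, check, max_rounds=3):
    """Re-run check() on every URL whose status is not "ok".

    statuses maps URL -> status text; check(url) returns a new status.
    """
    results = dict(statuses)  # leave the caller's report untouched
    for _ in range(max_rounds):
        broken = [url for url, status in results.items() if status != "ok"]
        if not broken:
            break  # nothing left to retry
        for url in broken:
            results[url] = check(url)
    return results


# Hypothetical report: one good link and one that timed out.
report = {
    "http://www.example.org/good.html": "ok",
    "http://www.example.org/flaky.html": "timeout",
}

# Stand-in checker: still fails on the first retry, succeeds on the second.
answers = iter(["timeout", "ok"])
print(recheck_broken(report, lambda url: next(answers)))
```

Links that are truly broken stay broken however often they are rescanned; only the temporary failures disappear.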

If that does not help, lower the number of threads to about 10, under:


