This section analyzes the data presented in the traffic report. The analysis is broken up by the report headings.
An AWStats traffic report contains many useful pieces of information.
Something to consider when analyzing web stats is to take into account any regional, national or international holidays, festivals, religious events, major sporting events and seasonal trends.
This is the first place I head. It is amazing how many broken links exist on websites. Be sure to visit AWStats: Page not found to see a list of problems and causes.
Note that if you have password-protected your awstats directory (recommended), you may see many "not found" references to icons and images in the awstats directory.
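If you have not yet protected it, here is a minimal sketch of password-protecting the awstats directory with HTTP Basic authentication in an .htaccess file (the password-file path is a hypothetical example):

```
# Minimal sketch: require a login for the awstats directory.
# The AuthUserFile path below is hypothetical; point it at your
# own password file, created with the htpasswd utility.
AuthType Basic
AuthName "AWStats Reports"
AuthUserFile /home/youruser/.htpasswd
Require valid-user
```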
This is where you can discover spambots and bandwidth hogs.
The Hosts report has five columns: host, pages, hits, bandwidth, and last visit.
If pages and hits are the same or nearly the same number, this is an indication that the host is grabbing only one or two pages from your site, either to hotlink your content or, in the case of a splogger, to steal it via an RSS feed. Typical users request not only the web page but also supporting items such as images and CSS files.
Check out the IP address to see if it is one you want to ban. We use a whois domain tool for IP or domain analysis. The one listed in the example below was from an unknown source in China; over the course of one day it consumed 10 MB of bandwidth.
We ban any unknown IP or domain which uses a bandwidth of more than 2 MB. Make sure you do not ban your own IP address!
The example below shows how to block IP addresses or domains in an .htaccess file (used on Apache servers).
```
#START Blocking
order allow,deny
deny from 60.12.136.66
deny from .mydomain.com
allow from all
#END Blocking
```
The host analysis spreadsheet resource is a helpful tool for IP analysis.
Contains the keywords and phrases that people typed in to reach your website. This provides valuable insight into how the keywords you expect compare with the ones the public actually uses.
Shows the top robots and spiders. Make sure the important ones, such as Googlebot and Yahoo! Slurp, are visiting. If not, they may be blocked or your site may be banned.
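If a major crawler has stopped appearing in this report, it is worth checking your .htaccess files for leftover rules that shut bots out. The following is only an illustrative sketch (using Apache's mod_setenvif, with Googlebot as the example) of the kind of rule that would silently block a crawler:

```
# Sketch of a rule that would accidentally block Googlebot.
# If a major crawler vanishes from the robots report, look for
# leftover rules like this.
BrowserMatchNoCase Googlebot bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```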
The referring URL is missing from the web page request. This can occur if someone types the URL into the address bar or follows a link from an email. Since there is no referrer URL, you cannot easily find the origin.
You can tell which search engines are sending the most traffic by the number of pages. If you don't see a major search engine, for example Yahoo!, your site may be banned or have other problems.
The links displayed are links that someone clicked on or invoked (such as an ad) to reach your site. There are two columns of numbers: pages and hits. Pages are the number of pages clicked on your site, while hits are the number of "hits" on your site (see Hits vs. Pages below).
If you see odd URLs, they are probably referer (sic) spam. DO NOT click on them...that is their intention. Also, since some sites publish their server logs, or the logs are publicly (i.e. search engine) accessible, the inbound link to their site gives them slight link juice from your site - at no cost to them and without your permission.
To rid your logs and site of referrer spam you can modify the htaccess file on an Apache server:
```
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://(www\.)?spammersite1\.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?spammersite2\.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?spammersite3\.com.*$ [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://(www\.)?spammersite4\.com.*$ [NC]
RewriteRule .* - [F,L]
```
Change spammersite1.com through spammersite4.com to the actual bad referrer sites. The bad sites will be given a Forbidden error and will not appear in your server logs.
For a detailed explanation, the web page stop referrer spam shows how to block bad referrer URLs. This is for Apache servers only.
In the full view mode, do you have entries with no page values (i.e. only hit values)? If so, then something other than the web page itself (usually images) is being served. One possibility is that another site is hot-linking (stealing) your images; in other words, they display your image on their site. It can also be a search engine indexing your images.
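If hot-linking is the cause, Apache's mod_rewrite can refuse image requests whose referrer is another site. This is a sketch only; mydomain.com is a placeholder, and you should adjust the image extensions to match your site:

```
# Sketch of hotlink protection: serve images only when the referrer
# is empty (direct requests) or your own domain (a placeholder here).
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?mydomain\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F]
```

The first condition lets requests with an empty referrer through, so direct visitors and some proxies are not blocked.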
Shows the clicks on internal links on your website.
The top 10 web pages, ordered by views, are shown. There are four columns: viewed, entry, exit, and average size.
The number of times the page was viewed. The most popular are at the top.
Clicking on the entry link will order the pages by entry; that is, the first page that someone clicked to. This is very valuable information since it shows what people see when they first visit your site. They get there through search engine clicks, email link clicks, social bookmarks, and other routes.
This shows the *last* page someone viewed before leaving your site.
Other bad codes should be tracked down, too.
This indicates someone trying to access a page or graphic in an area that requires a login.
Definitely check out the 404 (document not found) codes. They are an indication of broken links, missing pages, missing graphics, or an attack attempt.
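When a 404 turns out to be a page you moved, a permanent redirect on an Apache server fixes the broken link for both visitors and search engines. Both paths in this sketch are hypothetical:

```
# Sketch: permanently redirect a moved page so old links stop
# generating 404 errors. Replace both paths with your own URLs.
Redirect permanent /old-page.html http://www.mydomain.com/new-page.html
```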
This is an estimated value of how many times someone has added one of your website pages to their bookmarks. AWStats looks at the number of requests for the favicon.ico file, so this indicator is nebulous.
A web site's home page is actually a group of files - for example: one text file (index.html), one style sheet to indicate formatting (CSS), six image files (GIF, ICO, and PNG), and some dynamic client-side logic (JavaScript) stored in two separate files on the web server. Calling up the home page will result in ten file requests to the web server, and thus ten hits.
Along with bandwidth consumption, hits can be useful as an input for server sizing and capacity planning. While people make much of hits to tout the success of a site, hits have no intrinsic business value. Representations to the contrary probably indicate a lack of understanding of how futile hits are as a useful business measure.
As the internet has matured, attention has turned from hits to the more sophisticated measure of pages. Unfortunately, there is no standard definition of a page. A web server log file simply contains information on objects requested from the web server. It is up to the web server log file analysis software to give semantic meaning to those objects.
AWStats works by exclusion in defining a page. By default, any object accessed by a user on your web server is a page unless it is listed in the NotPageList parameter. You must explicitly add any other objects you do not want to count as pages in AWStats reports. For example, add ZIP archives and Flash animation files to this list by adding their suffixes to the NotPageList directive in the AWStats configuration file:
```
NotPageList="css js class gif jpg jpeg png bmp ico swf zip tgz gz tar rss xml rdf"
```
Then AWStats will count everything but the following as pages:
| Suffix | Description |
|---|---|
| css | Cascading Style Sheet formatting instruction files |
| js | JavaScript files |
| class | Java program files |
| gif, jpg, jpeg, png, and bmp | Various image/photo formats |
| ico | An image icon file; many sites have a company logo saved as favicon.ico; many browsers use this in bookmarks (favorites) and tabs |
| swf | Shockwave Flash animation |
| zip, tgz, gz, and tar | Archive formats created by PKZip, WinZip, tar, gzip, and others |
| rss | RSS news feed |
| xml | XML file |
| rdf | Resource Description Framework file |
One advantage of this approach is that if you are using a CGI to generate dynamic pages, you do not have to worry about whether each CGI query counts as a page; this happens automatically.
Bandwidth consumption is of interest to technical staff, as there is usually an economic cost associated with its use. At a more granular level, large individual file sizes can indicate performance issues, especially for dial-up users.