Tweaking GoAccess for Analytics

GoAccess is an open source server-side web log analyzer. Server-side means that it processes web server logs and compiles the results into a real-time graphical view. According to GoAccess’ website, it works with nearly all web log formats, such as Apache, CloudFront, and Nginx.
By default GoAccess provides a picture of the traffic to your server, including crawlers, bots, and various HTTP requests. A complete graphical overview can help you spot patterns or quirks in your stack. We can make some changes to GoAccess’ configuration to make it behave a bit more like a web analytics tool. Check out Matomo and Fathom for examples.
You can view this server’s analytics here.
“I believe tracking visitors at the client level deflates the actual number of visitors. On the other hand, server-side tracking gives you a more accurate number at the cost of not knowing for sure if the client is a human behind a browser.”
GoAccess accepts command-line flags and shell piping, but we’ll do most of the work from a central goaccess.conf file. You can see the default config file here.
Let’s start off by enabling real-time HTML output for live updates through a WebSocket connection.
# Enable real-time HTML output.
real-time-html true
# Set output HTML path.
output /srv/http/goaccess/index.html
The backend web server is nginx, whose default access log format is combined, so set the log format to COMBINED and set the access log path.
# Set log format.
log-format COMBINED
# Specify the path to the input log file.
log-file /var/log/nginx/access.log
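Before pointing GoAccess at the log, it can help to confirm that nginx really writes the combined format. The sketch below is a rough shape check, not a parser: the function name `looks_combined` and the regex are my own assumptions. It only tests the general layout of a line (address, identity, user, bracketed time, quoted request, status, size, quoted referrer and user agent).

```shell
#!/bin/sh
# Rough shape check for the "combined" log format.
# looks_combined is a hypothetical helper; the regex is an approximation
# and succeeds if any line read from stdin matches.
looks_combined() {
    grep -Eq '^[^ ]+ [^ ]+ [^ ]+ \[[^]]+] "[^"]*" [0-9]{3} ([0-9]+|-) "[^"]*" "[^"]*"'
}

# Example: check the most recent line of the access log.
# tail -n 1 /var/log/nginx/access.log | looks_combined && echo "combined"
```

If the check fails on real log lines, nginx is probably using a custom `log_format`, and COMBINED will not parse it.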
Exclude localhost so that GoAccess does not count internal requests as unique visitors. We can exclude multiple public IPv4 and IPv6 addresses here as well.
# Exclude an IPv4 or IPv6 address from being counted.
exclude-ip 127.0.0.1
exclude-ip xx.xx.xx.xx
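The default goaccess.conf notes that exclude-ip also accepts IPv6 addresses and a start-end range written with a dash. Here is a sketch that appends such entries with a heredoc. The CONF path defaults to a scratch file in the current directory, and 192.0.2.0/24 is a documentation range standing in for your own addresses.

```shell
#!/bin/sh
# Append extra exclude-ip entries to a goaccess.conf.
# CONF defaults to a scratch file; point it at your real config deliberately.
CONF="${CONF:-./goaccess.conf}"

cat >> "$CONF" <<'EOF'
# IPv6 localhost.
exclude-ip ::1
# A start-end range (192.0.2.0/24 is a documentation range; replace it).
exclude-ip 192.0.2.1-192.0.2.254
EOF
```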
Ignore crawlers so they are not counted. This should make the unique visitors count more accurate.
# Ignore crawlers from being counted.
ignore-crawlers true
You can further refine the output by adding more crawlers to ignore. This can be done by setting a browsers-file. An example is provided in the GoAccess repository. This file must be tab-delimited.
# Include an additional delimited list of browsers/crawlers/feeds etc.
browsers-file /opt/goaccess/config/browsers.list
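Because the file must be tab-delimited, an editor that converts tabs to spaces can silently break it. The following minimal sketch appends an entry with a real tab via printf, then flags non-comment lines that lack a tab. The LIST path, the crawler name `ExampleBot`, and the `Crawlers` category are placeholders, and `check_tabs` is a hypothetical helper. Check the example file in the repository for the categories it actually uses.

```shell
#!/bin/sh
# Path to the custom list; a scratch file by default (placeholder path).
LIST="${LIST:-./browsers.list}"

# Append "<name><TAB><category>"; printf guarantees a literal tab.
printf '%s\t%s\n' 'ExampleBot' 'Crawlers' >> "$LIST"

# Report non-empty, non-comment lines that contain no tab; exit 1 if any.
check_tabs() {
    awk -F '\t' '!/^#/ && NF > 0 && NF < 2 { print "line " NR ": no tab"; bad = 1 }
                 END { exit bad }' "$1"
}

check_tabs "$LIST" && echo "ok: $LIST is tab-delimited"
```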
GoAccess does not anonymize IP addresses by default, so let’s do that.
# IP address anonymization
anonymize-ip true
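GoAccess’ documentation describes anonymization as zeroing the last octet of an IPv4 address (IPv6 addresses lose their last 80 bits). The helper below only mimics the IPv4 case, so you can see what ends up in the report. `anon_ipv4` is a hypothetical name, and it does no input validation.

```shell
#!/bin/sh
# Mimic IPv4 anonymization: replace the last octet with 0.
# anon_ipv4 is a hypothetical helper for illustration only.
anon_ipv4() {
    printf '%s.0\n' "${1%.*}"
}

anon_ipv4 192.168.20.100   # prints 192.168.20.0
```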
By default, GoAccess does not add 4xx client errors to the unique visitors count; setting the option explicitly keeps it that way.
# Do not add 4xx client errors to the unique visitors count.
4xx-to-unique-count false
We can also ignore specific HTTP response codes entirely, which keeps them out of the visitor count as well.
# Ignore parsing and displaying one or multiple status code(s)
ignore-status 301
ignore-status 302
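To decide which status codes are worth ignoring, it helps to see how often each one appears. This rough sketch assumes the combined format, where the status is the ninth whitespace-separated field. A request line with unusual spacing could shift that field. `status_counts` is a hypothetical helper name.

```shell
#!/bin/sh
# Count requests per HTTP status code in a combined-format log.
# Assumes the status is field $9 (true for typical combined-format lines).
LOG="${LOG:-/var/log/nginx/access.log}"

status_counts() {
    awk '{ count[$9]++ } END { for (code in count) print count[code], code }' "$1" |
        sort -rn
}

# Only run against the real log if it is readable.
if [ -r "$LOG" ]; then
    status_counts "$LOG"
fi
```

Each output line is "count code", most frequent first.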
Referrer spam can screw with the logs real good. We can ignore visitors from a list of referrer spammers by using ignore-referer. My personal preference is to use Matomo’s list. There is a hard limit of 64 ignored entries; to accommodate larger lists, adjust settings.h accordingly and recompile. We can use a systemd or cron timer to refresh the list periodically.
# Ignore referrer from being counted.
ignore-referer www.example.com
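The refresh job can be a small script that turns a plain list of domains (one per line) into ignore-referer lines, capped at the 64-entry limit. Anything past the limit is silently dropped. This is a sketch: the Matomo list URL, the output path, and the `to_ignore_referer` name are assumptions. The generated lines still have to end up in goaccess.conf, for example by rebuilding the config from a base file plus this output.

```shell
#!/bin/sh
# Turn a list of spam domains (one per line) into goaccess.conf lines.
# Assumed source: Matomo's referrer spam list on GitHub (verify the URL).
SPAM_URL="https://raw.githubusercontent.com/matomo-org/referrer-spam-list/master/spammers.txt"
MAX_REFERERS=64   # GoAccess' compiled-in limit, per settings.h

to_ignore_referer() {
    # Strip CRs, drop blank lines and comments, de-duplicate,
    # keep at most MAX_REFERERS entries, then add the option name.
    tr -d '\r' | grep -v -e '^[[:space:]]*$' -e '^#' | sort -u |
        head -n "$MAX_REFERERS" | sed 's/^/ignore-referer /'
}

# Example (needs network access and curl; output path is hypothetical):
# curl -fsSL "$SPAM_URL" | to_ignore_referer > /opt/goaccess/config/referers.conf
```

A weekly crontab entry such as `0 4 * * 1 /usr/local/bin/refresh-referers.sh` (a hypothetical script path) or an equivalent systemd timer can run it.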
Sort the most important panels by visitor count in descending order.
# Sort panel on initial load by visitors.
sort-panel REQUESTS,BY_VISITORS,DESC
sort-panel REQUESTS_STATIC,BY_VISITORS,DESC
sort-panel HOSTS,BY_VISITORS,DESC
sort-panel OS,BY_VISITORS,DESC
sort-panel BROWSERS,BY_VISITORS,DESC
sort-panel REFERRERS,BY_VISITORS,DESC
sort-panel REFERRING_SITES,BY_VISITORS,DESC
sort-panel GEO_LOCATION,BY_VISITORS,DESC
Change the theme and table settings on the page by using a JSON string. The theme is set to dark blue, with 20 items per page. The visitors and visit time panels are set to use bar charts instead of line charts.
# Set default HTML preferences.
html-prefs {"theme":"darkBlue","perPage":20,"visitors":{"plot":{"chartType":"bar"}},"visit_time":{"plot":{"chartType":"bar"}}}
Make sure that all static files, including files with a query string, are categorized under the static files panel.
# Include static files that contain a query string in the static files panel.
all-static-files true
Show statistics based on country by loading in a GeoIP database.
# Set GeoIP database path.
geoip-database /usr/share/GeoIP/GeoLiteCity.dat
By default, everything runs in memory. Enable Tokyo Cabinet to store the results as a database on the file system. Check the configure options to compile with Tokyo Cabinet support, for example: ./configure --enable-utf8 --enable-geoip=legacy --enable-tcb=btree --disable-zlib --disable-bzip
### GoAccess version 1.3
# Persist parsed data into disk.
keep-db-files true
# Load previously stored data from disk.
load-from-disk true
# Path where the on-disk database files are stored.
db-path /tmp/
Version 1.4 uses a different syntax and no longer requires specific configure options.
### GoAccess version 1.4
# Persist parsed data into disk.
persist true
# Load previously stored data from disk.
restore true
# Path where the on-disk database files are stored.
db-path /tmp
Now stream the logs into GoAccess using our souped-up config. GoAccess will process nginx’s rotated logs in addition to the current access log specified in goaccess.conf.
zcat -f /var/log/nginx/access.log-* | goaccess --config-file=/opt/goaccess/config/goaccess.conf -
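The -f flag lets the glob mix compressed and uncompressed rotations: `zcat -f` passes files that are not gzip-compressed through unchanged. That matters if your logrotate setup uses delaycompress, which leaves the newest rotated file as plain text. Below is a self-contained demonstration with scratch files in a temporary directory; the file names only imitate logrotate’s date-suffixed naming.

```shell
#!/bin/sh
# Demonstrate that `zcat -f` reads gzip and plain files alike.
DIR=$(mktemp -d)
printf 'older entry\n' > "$DIR/access.log-20200429"
gzip "$DIR/access.log-20200429"                       # now access.log-20200429.gz
printf 'newer entry\n' > "$DIR/access.log-20200430"   # left uncompressed

# Same glob shape as the real command: the .gz is decompressed,
# the plain file is passed through as-is.
zcat -f "$DIR"/access.log-*
```

This prints "older entry" followed by "newer entry".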
Updated 1 May 2020