Tweaking GoAccess for Analytics

GoAccess is an open source server–side web log analyzer. Server side means that it processes web server logs and compiles the results into a real–time graphical view. According to GoAccess’ website, it works with nearly all web log formats, such as Apache, CloudFront, and Nginx.
By default goaccess will provide a complete picture of the traffic to your server. This includes crawlers, bots, and various HTTP requests. A complete graphical overview can allow you to spot patterns or quirks in your stack. We can make some changes to goaccess’ configuration to make it behave a bit more like a web analytics tool. (For comparison, check out Matomo and Fathom. You can also view this server’s goaccess analytics page.)
“I believe tracking visitors at the client level deflates the actual number of visitors. On the other hand, server–side tracking gives you a more accurate number at the cost of not knowing for sure if the client is a human behind a browser.”
GoAccess allows command line flags and shell piping, but we’ll do most of the work from a central goaccess.conf file. The default configuration file can be viewed at the goaccess code repository.
Let’s start off by enabling real–time HTML for live updates through the socket connection.
# Enable real-time HTML output.
real-time-html true
# Set output HTML path.
output /srv/http/goaccess/index.html
The backend web server is nginx, so enable the combined log format and set the access log path.
# Set log format.
log-format COMBINED
# Specify the path to the input log file.
log-file /var/log/nginx/access.log
Exclude localhost so that goaccess does not count internal requests as unique visitors. We can exclude multiple public IPv4 and IPv6 addresses here as well.
# Exclude an IPv4 or IPv6 address from being counted.
exclude-ip 127.0.0.1
exclude-ip xx.xx.xx.xx
Do not count crawlers. This should make the unique visitors count more accurate.
# Ignore crawlers from being counted.
ignore-crawlers true
You can further refine the output by adding more crawlers to ignore. This can be done by setting a browsers-file. An example is provided in the repository. This file must be tab delimited.
# Include an additional delimited list of browsers/crawlers/feeds etc.
browsers-file /opt/goaccess/config/browsers.list
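Since the file has to be tab delimited, it is easy to break an entry by typing spaces in an editor. The sketch below prints an entry with a real tab between the two columns. The helper name add_browser and the bot name ExampleBot are made up for illustration, and the assumption that the second column should read Crawlers for ignore-crawlers to apply is based on the example list in the repository, so check it against your goaccess version.

```shell
#!/bin/sh
# add_browser NAME TYPE
# Print one browsers-file entry: NAME, a literal tab, then TYPE.
# (Hypothetical helper, not part of goaccess.)
add_browser() {
    printf '%s\t%s\n' "$1" "$2"
}
```

For example, add_browser "ExampleBot" "Crawlers" >> /opt/goaccess/config/browsers.list appends one entry to the file configured above.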
Let’s enable IP address anonymization. In future versions of goaccess you’ll be able to set the level of IP address anonymization with the command line flag --anonymize-level and the configuration option anonymize-level.
# IP address anonymization
anonymize-ip true
# Pedantic IP address anonymization
anonymize-level 3
By default goaccess does not add client errors to the unique visitors count.
# Do not add 4xx client errors to the unique visitors count.
4xx-to-unique-count false
We can also exclude specific HTTP response codes from the visitors count.
# Ignore parsing and displaying one or multiple status code(s)
ignore-status 429
Referrer spam inflates and skews the log data. Exclude visitors from a list of spam referrers by using ignore-referer. My personal preference is to use Matomo’s list. We can use a systemd or cron timer to refresh the list periodically.
There is a hard limit of 64 ignored entries. To accommodate larger lists, adjust settings.h accordingly.
# Ignore referrer from being counted.
ignore-referer www.example.com
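Writing these lines by hand gets tedious for a longer list. The sketch below converts a list of domains into ignore-referer lines, assuming the list is a plain text file with one domain per line. The function name to_ignore_referer and its optional limit argument are illustrative. The default of 64 mirrors the hard limit mentioned above.

```shell
#!/bin/sh
# to_ignore_referer [LIMIT]
# Read domains on stdin (one per line) and print ignore-referer lines.
# Blank lines and lines starting with # are skipped. At most LIMIT
# entries are printed (default 64, the stock goaccess limit).
# (Hypothetical helper, not part of goaccess.)
to_ignore_referer() {
    awk -v limit="${1:-64}" '
        { sub(/\r$/, "") }              # tolerate CRLF line endings
        /^[ \t]*(#|$)/ { next }         # skip comments and blank lines
        n < limit { print "ignore-referer " $1; n++ }
    '
}
```

With a local copy of the list saved as spammers.txt (a placeholder file name), to_ignore_referer < spammers.txt prints lines that can be added to goaccess.conf. A cron or systemd timer could re-run the same step after refreshing the list.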
Sort the most important panels by visitor count, data, and bandwidth in descending order.
# Sort panels on initial load by visitors, data, and bandwidth.
sort-panel BROWSERS,BY_VISITORS,DESC
sort-panel CACHE_STATUS,BY_VISITORS,DESC
sort-panel GEO_LOCATION,BY_VISITORS,DESC
sort-panel HOSTS,BY_VISITORS,DESC
sort-panel KEYPHRASES,BY_VISITORS,DESC
sort-panel MIME_TYPE,BY_VISITORS,DESC
sort-panel NOT_FOUND,BY_BW,DESC
sort-panel OS,BY_VISITORS,DESC
sort-panel REFERRERS,BY_VISITORS,DESC
sort-panel REFERRING_SITES,BY_VISITORS,DESC
sort-panel REMOTE_USER,BY_VISITORS,DESC
sort-panel REQUESTS,BY_VISITORS,DESC
sort-panel REQUESTS_STATIC,BY_BW,DESC
sort-panel STATUS_CODES,BY_VISITORS,DESC
sort-panel TLS_TYPE,BY_VISITORS,DESC
sort-panel VIRTUAL_HOSTS,BY_VISITORS,DESC
sort-panel VISITORS,BY_DATA,DESC
sort-panel VISIT_TIMES,BY_DATA,DESC
Change the theme and table specifications on the page by using a string of JSON. The theme is set to dark blue, with 20 results per table. The visitors and visit time graphs are set to use bar charts instead of line charts.
# Set default HTML preferences.
html-prefs {"theme":"darkBlue","perPage":20,"visitors":{"plot":{"chartType":"bar"}},"visit_time":{"plot":{"chartType":"bar"}}}
Make sure that all static files, including files with a query string, are categorized under the static files table.
# Include static files that contain a query string in the static files panel.
all-static-files true
Show statistics based on country by loading in a GeoIP database. You can install a database from your Linux distribution of choice.
# Set GeoIP database path.
geoip-database /usr/share/GeoIP/GeoLiteCity.dat
Everything runs in memory by default. To keep data between runs, use Tokyo Cabinet and store the results as a database on the file system. Check the configure options to compile with Tokyo Cabinet support. For example:
./configure --enable-utf8 --enable-geoip=legacy --enable-tcb=btree --disable-zlib --disable-bzip
### GoAccess version <= 1.3
# Persist parsed data into disk.
keep-db-files true
# Load previously stored data from disk.
load-from-disk true
# Path where the on-disk database files are stored.
db-path /tmp/
Newer versions use a different syntax and do not require any specific configure options.
### GoAccess version >= 1.4
# Persist parsed data into disk.
persist true
# Load previously stored data from disk.
restore true
# Path where the on-disk database files are stored.
db-path /tmp
Now stream the logs into goaccess using our souped–up config. GoAccess will process the rotated logs of nginx in addition to the current access log stipulated in goaccess.conf.
zcat --force /var/log/nginx/access.log-* | goaccess --config-file=/opt/goaccess/config/goaccess.conf -
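The command above relies on zcat --force passing uncompressed files through unchanged, which is how GNU gzip behaves. It also relies on the glob matching at least one file. As a more explicit alternative, here is a minimal sketch that does the same job by hand. The function name stream_logs is made up, and it assumes rotated logs are named access.log-* with compressed ones ending in .gz.

```shell
#!/bin/sh
# stream_logs [DIR]
# Print every rotated access log in DIR (default /var/log/nginx) to
# stdout, decompressing the ones that end in .gz.
# (Hypothetical helper, not part of goaccess.)
stream_logs() {
    dir="${1:-/var/log/nginx}"
    for f in "$dir"/access.log-*; do
        [ -e "$f" ] || continue       # the glob matched nothing
        case "$f" in
            *.gz) gzip -dc -- "$f" ;; # compressed rotation
            *)    cat -- "$f" ;;      # plain-text rotation
        esac
    done
}
```

It slots into the same pipeline: stream_logs | goaccess --config-file=/opt/goaccess/config/goaccess.conf -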