I often need to get some analytics out of the Apache or Nginx logs, like where the most 403’s (or whatever other status code) came from, from which IPs, with which user agents and so on. So here’s a cheatsheet on how to do it for several scenarios.
I’ll be focusing on the Apache log, but it’s also applicable to Nginx or any other web server log, bearing in mind that you need to know in which column, or after which separator, the parameter of interest appears.
I’ll be referring to access_log throughout the text, but you should replace it with the real name and path (if needed) of the access log you want to run this on.
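For example, with the default “combined” log format (Apache and Nginx use a very similar one), awk’s default whitespace splitting puts the client IP in field 1 and the status code in field 9, so a quick sanity check on a single line will tell you whether the columns match:
head -n 1 access_log | awk '{print $1, $9}'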
So, here are a few tricks:

Count the number of 403’s in the Apache log:
grep -c ' 403 ' access_log
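
If the literal string ' 403 ' could also match something else on a line (a byte count, part of a referer), a stricter variant, assuming the combined format with the status code in field 9, is to test that field directly:
awk '$9 == 403' access_log | wc -l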

List the 403’s in the Apache log, counted per IP address and sorted by count:
grep ' 403 ' access_log |awk '{ print $1 }' |sort |uniq -c |sort -nr
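
The sort before uniq -c matters, since uniq only counts adjacent duplicates. If you prefer, the counting can also be done in awk alone (again assuming the client IP is in field 1), leaving only the final ordering to sort:
grep ' 403 ' access_log | awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' | sort -nr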

List the 403’s in the Apache log, counted per AgentID (User-Agent) string and sorted by count; here I’m running it over a set of dated SSL access logs (you can also pipe to |head -n 10 to list the top 10):
grep ' 403 ' ssl_access_log-202009*|awk -F\" '{print $6}' |sort|uniq -c|sort -nr

List the 403’s in the Apache log, counted per AgentID string and sorted by count, filtered for GET requests only (you can also pipe to |head -n 10 to list the top 10):
grep ' 403 ' access_log |awk -F\" '($2 ~ "^GET /"){print $6}' |sort|uniq -c|sort -nr

So, the command above goes through the access_log, splits each line on the field separator (which I chose to be the double quote " – look at your access log), checks whether the 2nd field (the request line) begins with “GET /” (this can be tricky, you could be searching for POST or other HTTP methods too), and then prints the 6th field, which in my case is the Agent ID string.
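
To make those field numbers concrete, here is a made-up combined-format line; when awk splits it on ", field 2 is the request line (GET /admin HTTP/1.1), field 4 is the referer and field 6 is the User-Agent string:
203.0.113.5 - - [10/Sep/2020:14:02:11 +0200] "GET /admin HTTP/1.1" 403 199 "-" "Mozilla/5.0 (compatible; ExampleBot/1.0)"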

Get a top list of requests per hour from a log file:
cat access_log | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":00"}' | sort -n | uniq -c | sort -nr
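
The same idea combines nicely with the earlier grep, so you can see at which hours the 403’s cluster; just feed the pipeline from the filter instead of cat (assuming the same timestamp format):
grep ' 403 ' access_log | cut -d[ -f2 | cut -d] -f1 | awk -F: '{print $2":00"}' | sort -n | uniq -c | sort -nr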
