Dipesh Majumdar

Blog and Paintings

Some useful Linux Commands - 1 (helpful in getting useful information from access and error logs)

April 1, 2015

We will mostly cover find commands in this section.

  1. Many a time a string, say pqrs, appears inside files with a certain extension, say .xyz, and you have to find all .xyz files that contain pqrs. This is what you do (in the first command the backslash makes the dot literal and $ anchors .xyz at the end of the path):
    find . -type f |grep '\.xyz$' |xargs grep -i "pqrs"
    find . -iname "*.xyz" |xargs grep -i "pqrs"
    Remember that -i in the above commands makes the match case INSENSITIVE. It's my personal preference to always use -i with grep, because it takes upper/lower case out of the equation entirely.
    But if we want to get rid of lines like this from the grep output:
    grep: blah-blah: No such file or directory
    we can add -s, which suppresses error messages about nonexistent or unreadable files:
    find . -type f |xargs grep -s "pattern_in_file"

    Similar results can be achieved by searching recursively:
    grep -r -i matching_pattern *
    If you also want to drop unwanted results, for example lines whose file name or text contains some string, filter them out with grep -v:
    grep -r -i matching_pattern * |grep -v -i 'dontwant_this' |grep -v -i 'dontwantthis_also'
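A common source of the stray "No such file or directory" lines is that plain xargs splits file names on whitespace, so a file called "b c.xyz" reaches grep as two bogus names. A minimal sketch, assuming GNU find and grep and a throwaway demo directory with hypothetical file names, that passes names NUL-separated instead and also shows grep's --include filter for the recursive variant:

```shell
#!/usr/bin/env bash
# Sketch: find *.xyz files containing "pqrs", safe for file names with spaces.
set -euo pipefail

demo=$(mktemp -d)                              # hypothetical scratch directory
printf 'line with PQRS\n' > "$demo/a.xyz"
printf 'pqrs again\n'     > "$demo/b c.xyz"    # name contains a space
printf 'pqrs in txt\n'    > "$demo/d.txt"      # wrong extension, must be skipped

# -print0 and xargs -0 separate names with NUL bytes, so "b c.xyz" stays one argument.
# grep -l prints only the names of the files that match.
matches=$(find "$demo" -type f -name '*.xyz' -print0 | xargs -0 grep -il 'pqrs' | sort)
echo "$matches"

# The same search with grep alone: --include limits the recursive search to *.xyz.
recursive=$(grep -ril --include='*.xyz' 'pqrs' "$demo" | sort)
echo "$recursive"

rm -rf "$demo"
```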

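  2. Find 2 files at once (the parentheses make -type f apply to both names, because -o binds more loosely than the implicit AND):
    find /usr/java/jdk1.8.0_74/ -type f \( -iname "US_export_policy.jar" -o -iname "local_policy.jar" \)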
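In find, -o binds more loosely than the implicit AND between tests, so without parentheses -type f only guards the first -iname. A minimal sketch on a throwaway directory (a hypothetical stand-in for the JDK path) where a directory happens to be called local_policy.jar, comparing the ungrouped and grouped forms:

```shell
#!/usr/bin/env bash
# Sketch: why the -o alternatives should be grouped with \( ... \) in find.
set -euo pipefail

demo=$(mktemp -d)                        # hypothetical stand-in for the JDK directory
touch "$demo/US_export_policy.jar"       # a regular file
mkdir "$demo/local_policy.jar"           # a DIRECTORY with a matching name

# Ungrouped: parsed as (-type f AND -iname US...) OR (-iname local...),
# so the directory slips through.
ungrouped=$(find "$demo" -type f -iname 'US_export_policy.jar' -o -iname 'local_policy.jar' | wc -l)

# Grouped: -type f applies to both names, so only the regular file is listed.
grouped=$(find "$demo" -type f \( -iname 'US_export_policy.jar' -o -iname 'local_policy.jar' \) | wc -l)

echo "ungrouped: $ungrouped, grouped: $grouped"
rm -rf "$demo"
```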
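  3. Total size of a set of log files. First list them; then print the size column (5th field), strip the M suffix and add up the numbers with awk: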
    pqrs@hostname:~/scripts> ls -al --block-size=M /tmp/*2016-09-01
    -rw-r--r-- 1 pqrs pqrs_group 103M Sep 20 13:17 /tmp/access.log.2016-09-01
    -rw-r--r-- 1 pqrs pqrs_group 2M Sep 20 13:17 /tmp/audit.log.2016-09-01
    -rw-r--r-- 1 pqrs pqrs_group 70M Sep 20 13:17 /tmp/error.log.2016-09-01
    -rw-r--r-- 1 pqrs pqrs_group 2M Sep 20 13:17 /tmp/replication.log.2016-09-01
    -rw-r--r-- 1 pqrs pqrs_group 75M Sep 20 13:17 /tmp/request.log.2016-09-01
    pqrs@hostname:~/scripts> ls -al --block-size=M /tmp/*2016-09-01 |awk -F ' ' '{print $5}'
    pqrs@hostname:~/scripts> ls -al --block-size=M /tmp/*2016-09-01 |awk -F ' ' '{print $5}' |awk -F'M' '{print $1}'
    pqrs@hostname:~/scripts> ls -al --block-size=M /tmp/*2016-09-01 |awk -F ' ' '{print $5}' |awk -F'M' '{print $1}' |awk '{sum+=$1} END {print sum}'
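With --block-size=M, ls shows every size as a whole number of megabytes, so the awk sum above is only an approximation (the two 2M files, for instance, could be much smaller). A sketch, assuming GNU coreutils, that sums exact byte counts with stat instead; the demo files and their sizes are made up:

```shell
#!/usr/bin/env bash
# Sketch: exact total size of a set of log files, summed in bytes.
set -euo pipefail

demo=$(mktemp -d)                                        # hypothetical stand-in for /tmp
head -c 1500 /dev/zero > "$demo/access.log.2016-09-01"   # 1500 bytes of real data
head -c 2500 /dev/zero > "$demo/error.log.2016-09-01"    # 2500 bytes of real data

# stat -c %s prints each file's size in bytes (GNU coreutils); awk adds them up.
total=$(stat -c %s "$demo"/*2016-09-01 | awk '{sum += $1} END {print sum}')
echo "total bytes: $total"

# Convert the exact total to megabytes once, instead of rounding per file.
awk -v b="$total" 'BEGIN {printf "total MB: %.2f\n", b / 1024 / 1024}'

# du -c adds a grand-total line; note du reports disk usage, not exact byte counts.
du -ch "$demo"/*2016-09-01 | tail -n 1

rm -rf "$demo"
```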
  4. Suppose that from the current directory we need to search files for some matching pattern, but we don't want results from a certain path. For example, find returns a very big list and we want to filter out everything under *this_path_dont_want*. This is the way to do it:
    find . -type f -not -path "*this_path_dont_want*" |xargs grep -i "matching_pattern"
    As already mentioned above, -i makes the search case INSENSITIVE.
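-not -path only filters names after find has already walked into the unwanted directory. A sketch, assuming GNU find and a throwaway tree with hypothetical names, comparing it with -prune, which stops find from descending there at all and can be much faster on big trees:

```shell
#!/usr/bin/env bash
# Sketch: skip an unwanted path while searching file contents.
set -euo pipefail

demo=$(mktemp -d)                                   # hypothetical scratch tree
mkdir -p "$demo/keep" "$demo/this_path_dont_want"
echo 'matching_pattern here' > "$demo/keep/a.txt"
echo 'matching_pattern here' > "$demo/this_path_dont_want/b.txt"

# -not -path filters names, but find still walks through the unwanted directory.
filtered=$(find "$demo" -type f -not -path '*this_path_dont_want*' -print0 \
             | xargs -0 grep -il 'matching_pattern')

# -prune stops find from descending into the unwanted directory;
# everything else falls through to the -o branch and is printed.
pruned=$(find "$demo" -path '*this_path_dont_want*' -prune -o -type f -print0 \
           | xargs -0 grep -il 'matching_pattern')

echo "$filtered"
echo "$pruned"
rm -rf "$demo"
```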

  5. Consider this case: an application is supposed to be run by the application user, but someone ran it as root by mistake, so files were created in the CQ5 repository with owner root. That's wrong. How do you find out whether any file in the repository is owned by root? The command below lists all files owned by root:
    find . -type f -user root
    If you want everything, files as well as directories, drop -type f:
    find . -user root
    The command below lists all files whose owner is not root:
    find . -type f ! -user root
    If files under /abc/xyz/pqr/ owned by some other user and group need to be handed over to the application user and application group, the following command can be used. Here xargs feeds the file list straight into chown, which saves a lot of time and effort:
    find /abc/xyz/pqr/ -type f ! -user appuser |xargs chown appuser:appgroup
    Use this if you want to include directories as well (I didn't test this command, so execute it at your own risk):
    find /abc/xyz/pqr/ ! -user appuser |xargs chown -R appuser:appgroup
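Before running a bulk chown it is worth previewing it. A minimal sketch, assuming GNU find: putting echo in front of chown prints the command instead of running it, and -exec ... {} + batches many files into one chown call, much like xargs, while coping with spaces in names. For the demo, the current user's numeric ID stands in for the wrong owner (root in the story above), and appuser:appgroup are the post's placeholder names, so nothing is actually changed:

```shell
#!/usr/bin/env bash
# Sketch: preview an ownership fix with a dry run before applying it.
set -euo pipefail

demo=$(mktemp -d)                         # hypothetical stand-in for /abc/xyz/pqr/
mkdir -p "$demo/sub dir"
touch "$demo/a.txt" "$demo/sub dir/b.txt"

wrong_uid=$(id -u)                        # demo only: the current user plays the "wrong" owner

# Nothing here belongs to anyone else, so this count is 0.
others=$(find "$demo" ! -user "$wrong_uid" | wc -l)

# Dry run: echo prints the chown command instead of executing it.
# {} + passes all matching files to a single command, like xargs does.
preview=$(find "$demo" -type f -user "$wrong_uid" -exec echo chown appuser:appgroup {} +)
echo "$preview"

# To apply it for real (as root), drop the echo:
#   find /abc/xyz/pqr/ -type f ! -user appuser -exec chown appuser:appgroup {} +

rm -rf "$demo"
```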
  6. How do you delete files older than a fixed number of days?
    Answer: the example below is for 10 days. First list the matching files:
    find /correct/path/of/files -type f -mtime +10 |xargs ls -ltra
    Make sure the output above contains only the files you want to delete in /correct/path/of/files, and then issue the command below:
    find /correct/path/of/files -type f -mtime +10 |xargs rm
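GNU find can also delete the matches itself with -delete, which avoids the xargs word-splitting problem. Put -delete last: find evaluates the expression left to right, so a -delete placed before the tests would remove every file it reaches. A sketch on a throwaway directory, using GNU touch -d to back-date a hypothetical old file:

```shell
#!/usr/bin/env bash
# Sketch: review, then delete, files older than 10 days.
set -euo pipefail

demo=$(mktemp -d)                          # hypothetical stand-in for /correct/path/of/files
touch -d '15 days ago' "$demo/old.log"     # back-dated modification time
touch "$demo/new.log"                      # modified just now

# Step 1: review exactly what matches.
to_delete=$(find "$demo" -type f -mtime +10)
echo "would delete: $to_delete"

# Step 2: same expression with -delete appended as the last action.
find "$demo" -type f -mtime +10 -delete

remaining=$(ls "$demo")
echo "left: $remaining"
rm -rf "$demo"
```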
  7. How do you get only unique values in the output of grep? Say you first want to see what is generating 301 responses in the access log for a particular path, say /abc/pqr/stu/:
    grep "/abc/pqr/stu/" access.log |grep 301
    In the output, the string you are after is in the 6th column, and the same values are repeated many, many times.
    The requirement is to get only this string, and only its unique values, so the command below can be used:
    grep "/abc/pqr/stu/" access.log |grep 301 |cut -d ' ' -f 6 |sort -u
    Now, to save this output to a file, do this (>> appends to the file; use > to overwrite it instead):
    grep "/abc/pqr/stu/" access.log |grep 301 |cut -d ' ' -f 6 |sort -u >> file_for_unique_301_response
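grep 301 matches those three digits anywhere on the line, for example inside a byte count. A sketch that compares whole fields with awk instead, assuming the log is in Common Log Format (URL in field 7, status in field 9); the sample lines are made up, and the field numbers need adjusting if your log puts the wanted value elsewhere, such as the 6th column mentioned above:

```shell
#!/usr/bin/env bash
# Sketch: unique URLs under /abc/pqr/stu/ that returned 301, matching the status field exactly.
set -euo pipefail

log=$(mktemp)     # hypothetical access log in Common Log Format (URL = field 7, status = field 9)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Sep/2016:10:00:00 +0000] "GET /abc/pqr/stu/a HTTP/1.1" 301 0
10.0.0.2 - - [01/Sep/2016:10:00:01 +0000] "GET /abc/pqr/stu/a HTTP/1.1" 301 0
10.0.0.3 - - [01/Sep/2016:10:00:02 +0000] "GET /abc/pqr/stu/b HTTP/1.1" 301 0
10.0.0.4 - - [01/Sep/2016:10:00:03 +0000] "GET /abc/pqr/stu/c HTTP/1.1" 200 13010
EOF

# grep 301 also hits the last line, because its byte count 13010 contains "301".
loose=$(grep "/abc/pqr/stu/" "$log" | grep -c 301)

# awk compares the whole 9th field, so only real 301 responses are kept.
unique_urls=$(awk '$7 ~ "^/abc/pqr/stu/" && $9 == 301 {print $7}' "$log" | sort -u)
echo "$unique_urls"

rm -f "$log"
```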
  8. How do you count the number of occurrences of a particular word or string in Linux?
    First example: count the files under /xyz/ whose name starts with "start". The search is case-insensitive because we used -iname and not -name.
    find /xyz/ -iname "start*" |wc -l
    Second example: count the lines of a file that contain a word or string (grep -c counts matching lines, so a line with two hits is counted once):
    grep -c "word-or-string" file_name
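Because grep -c counts lines rather than hits, a sketch of the difference may help: grep -o prints each match on its own line, so piping it into wc -l counts every occurrence, and -w restricts matches to whole words. The sample file content is made up:

```shell
#!/usr/bin/env bash
# Sketch: matching lines versus total occurrences versus whole-word occurrences.
set -euo pipefail

file=$(mktemp)                                  # hypothetical sample file
printf 'error errors\nerror\nall good\n' > "$file"

# grep -c counts LINES that contain at least one match.
lines=$(grep -c 'error' "$file")

# grep -o prints every match on its own line, so wc -l counts every occurrence
# (including the "error" inside "errors").
occurrences=$(grep -o 'error' "$file" | wc -l)

# -w only accepts whole words, so "errors" no longer counts.
whole_words=$(grep -ow 'error' "$file" | wc -l)

echo "lines: $lines, occurrences: $occurrences, whole words: $whole_words"
rm -f "$file"
```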

  9. Not exactly a find command, but if you want to see the SSH server configuration on Linux, you can check it here: /etc/ssh/sshd_config
    Similarly, another important file is sssd.conf, which you can look for with find:
    find -type f |grep sssd.conf
    or with locate (locate searches a prebuilt database, so very recent files may not show up until updatedb has run):
    locate sssd.conf
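Piping find into grep matches the string anywhere in the path, so backups such as sssd.conf.bak show up too. A sketch, assuming GNU find and a throwaway directory standing in for /etc (the file names are hypothetical), that matches the exact file name with -name instead:

```shell
#!/usr/bin/env bash
# Sketch: look a config file up by exact name instead of grepping find's output.
set -euo pipefail

demo=$(mktemp -d)                              # hypothetical stand-in for /etc
mkdir -p "$demo/sssd"
touch "$demo/sssd/sssd.conf" "$demo/sssd/sssd.conf.bak"

# grep matches the string anywhere in the path, so the backup shows up too.
by_grep=$(find "$demo" -type f | grep -c 'sssd.conf')

# -name matches the whole file name exactly.
by_name=$(find "$demo" -type f -name 'sssd.conf')

echo "grep hits: $by_grep"
echo "exact: $by_name"

# On a real system, searching from / prints "Permission denied" for unreadable
# directories; 2>/dev/null hides those messages:
#   find / -type f -name 'sssd.conf' 2>/dev/null

rm -rf "$demo"
```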
  10. Suppose you want to list all files with xyz in their name that were modified in the last day. You can use the command below:
    find / -iname "*xyz*" -mtime -1
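-mtime -1 means "modified less than 24 hours ago", and GNU find's -mmin does the same in minutes. A sketch on a throwaway directory, using GNU touch -d to back-date hypothetical files, comparing the last day with the last hour:

```shell
#!/usr/bin/env bash
# Sketch: files with "xyz" in the name modified in the last day, or in the last hour.
set -euo pipefail

demo=$(mktemp -d)                              # hypothetical scratch directory
touch -d '2 days ago'  "$demo/old_xyz.txt"
touch -d '3 hours ago' "$demo/recent_XYZ.txt"
touch                  "$demo/fresh_xyz.txt"

# -mtime -1: modified less than 24 hours ago (-iname makes XYZ match too).
last_day=$(find "$demo" -type f -iname '*xyz*' -mtime -1 | sort)

# -mmin -60: modified less than 60 minutes ago.
last_hour=$(find "$demo" -type f -iname '*xyz*' -mmin -60)

echo "$last_day"
echo "$last_hour"
rm -rf "$demo"
```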
  11. Now this is going to be very useful: it will really help you analyse your access, error and request logs and get more meaning out of them. In a certain access log file, /xyz/pqr/some_access_log_file.log, some URLs may return HTTP 404. You want the top 20 URLs that generate a 404 most often, together with the number of times each URL generates it. You can use the command below.
    Please note: awk '{print $7}' prints the URL, which is the seventh space-separated column. |sort |uniq -c counts how many times each URL occurs in the file; it's very much like a GROUP BY column_name in an SQL query. sort -n -k 1 sorts numerically on column 1 in ascending order, so tail -20 gives the 20 most frequent URLs. I wanted the result in descending order, hence the final |sort -n -r. This may not be the most efficient pipeline, but it works for me.
    Top twenty 404 requests:
    cat /xyz/pqr/some_access_log_file.log |grep 404 |awk '{print $7}' |sort |uniq -c |sort -n -k 1 |tail -20 |sort -n -r
    It can also be written this way (without the final reverse sort, so the most frequent URL comes last):
    grep " 404" /xyz/pqr/some_access_log_file.log |cut -d ' ' -f 7 |sort |uniq -c |sort -n -k 1 |tail -20
    You can also use a keyword, for example errorMessage, as the field separator and print the 2nd field, i.e. the text after it, to list the useful error messages and the number of times each occurs:
    grep -i "error" error.log|awk -F 'errorMessage' '{print $2}' |sort |uniq -c|sort -n -k 1 |tail -20
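The sort/tail/sort steps can be folded into one: sort -rn orders the uniq -c counts in descending order, so head gives the top N directly. The sketch below also compares the status field exactly instead of grepping for 404 anywhere on the line. It assumes Common Log Format (URL in field 7, status in field 9), and the sample lines are made up:

```shell
#!/usr/bin/env bash
# Sketch: top URLs returning 404, most frequent first, matching the status field exactly.
set -euo pipefail

log=$(mktemp)    # hypothetical sample in Common Log Format (URL = field 7, status = field 9)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Sep/2016:10:00:00 +0000] "GET /a HTTP/1.1" 404 0
10.0.0.1 - - [01/Sep/2016:10:00:01 +0000] "GET /b HTTP/1.1" 404 0
10.0.0.1 - - [01/Sep/2016:10:00:02 +0000] "GET /a HTTP/1.1" 404 0
10.0.0.1 - - [01/Sep/2016:10:00:03 +0000] "GET /c HTTP/1.1" 404 0
10.0.0.1 - - [01/Sep/2016:10:00:04 +0000] "GET /a HTTP/1.1" 404 0
10.0.0.1 - - [01/Sep/2016:10:00:05 +0000] "GET /b HTTP/1.1" 404 0
10.0.0.1 - - [01/Sep/2016:10:00:06 +0000] "GET /d HTTP/1.1" 200 404
EOF

# awk keeps only lines whose status is 404 (the last line has 404 as its size, not its status).
# sort -rn puts the highest counts from uniq -c first, so head -n 20 is the top twenty.
top=$(awk '$9 == 404 {print $7}' "$log" | sort | uniq -c | sort -rn | head -n 20)
echo "$top"

rm -f "$log"
```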
  12. Do you know how to use character classes (wildcard-like patterns) in a grep command? If not, read on. Say you want to find out at what times the annoying 500 (internal server error), 503 (service unavailable) and 504 (gateway timeout) responses occur in request.log.
    You can use a pattern like 5[0-9][0-9] to match a response line containing, for example, <- 500, which is the 500 response. -B1 also prints the line before each match, i.e. the request line, and grep POST keeps only the POST requests. The result can then be split by awk on the + sign, and the first field is the timestamp at which these 5xx errors occurred.
    Remember: -Bn prints n lines before each match, -An prints n lines after it, and -Cn prints n lines on both sides.
    grep -B1 "<- 5[0-9][0-9]" request.log |grep POST |awk -F '+' '{print $1}' |sort
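The grep -B1 approach assumes each response line directly follows its request line. If your request.log interleaves concurrent requests, the line before a 5xx may belong to a different request. A sketch, assuming a request.log layout where a request line carries an [id] and a -> marker and its response line carries the same [id] and <- (the sample lines are made up), that pairs them by id in awk instead; the method is printed, so you can still grep POST afterwards:

```shell
#!/usr/bin/env bash
# Sketch: timestamps of requests that ended in a 5xx response, pairing lines by request ID.
set -euo pipefail

log=$(mktemp)   # hypothetical request.log sample; the "[id] -> / [id] <-" layout is an assumption
cat > "$log" <<'EOF'
01/Sep/2016:10:00:00 +0000 [1] -> POST /content/a.html HTTP/1.1
01/Sep/2016:10:00:00 +0000 [2] -> GET /content/b.html HTTP/1.1
01/Sep/2016:10:00:01 +0000 [1] <- 503 text/html 900ms
01/Sep/2016:10:00:01 +0000 [2] <- 200 text/html 5ms
EOF

# Here the line just before "<- 503" is request [2], not [1],
# so grep -B1 alone pairs the error with the wrong request and finds no POST.
naive=$(grep -B1 "<- 5[0-9][0-9]" "$log" | grep -c POST || true)

# awk remembers each request line by its [id] and prints it when that id gets a 5xx.
paired=$(awk '$4 == "->" { req[$3] = $1 " " $5 " " $6 }
              $4 == "<-" && $5 ~ /^5[0-9][0-9]$/ { print req[$3], $5 }' "$log")
echo "$paired"

rm -f "$log"
```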
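  13. And regarding how many AEM authors (users) appear in the access log: the 3rd field is the user ID, so this lists each one once (append |wc -l to get the number):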
    cut -d " " -f 3 access.log |sort -u
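To turn the unique list into a number, or to see how active each user is, a sketch with made-up sample lines, assuming (as above) that the user is the 3rd space-separated field:

```shell
#!/usr/bin/env bash
# Sketch: distinct users in the 3rd field of an access log, with request counts.
set -euo pipefail

log=$(mktemp)    # hypothetical sample lines; the user is assumed to be the 3rd space-separated field
cat > "$log" <<'EOF'
10.0.0.1 - alice 01/Sep/2016:10:00:00 +0000 "GET /a HTTP/1.1" 200 10
10.0.0.2 - bob 01/Sep/2016:10:00:01 +0000 "GET /b HTTP/1.1" 200 10
10.0.0.1 - alice 01/Sep/2016:10:00:02 +0000 "GET /c HTTP/1.1" 200 10
EOF

# Number of distinct users.
user_count=$(cut -d ' ' -f 3 "$log" | sort -u | wc -l)

# Requests per user, busiest first.
per_user=$(cut -d ' ' -f 3 "$log" | sort | uniq -c | sort -rn)

echo "distinct users: $user_count"
echo "$per_user"
rm -f "$log"
```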
  14. grep search for 2 matching patterns... OK, let me rephrase: how do you grep with two OR conditions?
    ls -ltra |grep -Ei '(act|asis)'
    -rwxr-xr-x 1 root root 10416 Apr 29 2013
    -rwxr-xr-x 1 root root 10416 Apr 29 2013
    Another example: grep -c "GET\|POST" request.log gives the number of lines containing GET or POST calls (\| is the OR operator in GNU grep's basic regex syntax; with -E you can write GET|POST).
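grep -c gives one combined number for both methods. A sketch with made-up request lines showing that the basic and -E forms agree, and how grep -o with uniq -c gives a separate count per method:

```shell
#!/usr/bin/env bash
# Sketch: one combined count for GET or POST, and a separate count per method.
set -euo pipefail

log=$(mktemp)    # hypothetical sample request lines
printf '%s\n' 'GET /a' 'POST /b' 'GET /c' 'PUT /d' > "$log"

# Basic regex with GNU's \| alternation, and the -E (extended regex) equivalent.
basic=$(grep -c 'GET\|POST' "$log")
extended=$(grep -cE 'GET|POST' "$log")

# -o prints each match on its own line, so uniq -c gives one count per method.
per_method=$(grep -oE 'GET|POST' "$log" | sort | uniq -c)

echo "combined: $basic (same as $extended)"
echo "$per_method"
rm -f "$log"
```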
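  15. How do you create a symlink in Linux?
    First go to the directory where you want the symlink, then run the command below. With a single argument, ln creates the link in the current directory, named after the target; -f replaces an existing file of that name: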
    ln -f -s /etc/httpd/modules/
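The general form is ln -s TARGET LINK_NAME. When you later repoint a link that points to a directory, GNU ln's -n flag matters: without it, ln follows the existing link and creates the new link inside the old target directory. A sketch on a throwaway directory with hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: create a symlink with an explicit name, then repoint it safely.
set -euo pipefail

demo=$(mktemp -d)                               # hypothetical scratch directory
mkdir "$demo/modules_v1" "$demo/modules_v2"

# General form: ln -s TARGET LINK_NAME
ln -s "$demo/modules_v1" "$demo/modules"

# Repointing: -n treats LINK_NAME as the link itself instead of following it
# into modules_v1, and -f replaces the old link.
ln -sfn "$demo/modules_v2" "$demo/modules"

target=$(readlink "$demo/modules")                     # where the link points now
stray=$(find "$demo/modules_v1" -mindepth 1 | wc -l)   # nothing should land inside v1

echo "modules -> $target"
rm -rf "$demo"
```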
  16. So I was executing this script:
    -bash-4.1$ ./
    -bash: ./ /bin/bash^M: bad interpreter: No such file or directory
    I had no idea why this error was coming. Google gave me the answer: the script had Windows (CRLF) line endings, and Linux could not DIGEST the extra carriage return (^M) at the end of the #!/bin/bash line. So with the command below I removed the unwanted characters:
    -bash-4.1$ sed -i -e 's/\r$//'
    -bash-4.1$ ./
    The command ran without any issue now, and so I solved the nagging error: /bin/bash^M: bad interpreter: No such file or directory
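To check a script for carriage returns before and after the sed fix, a sketch on a hypothetical script file written with CRLF endings (it runs the fixed script through bash explicitly, which also works if /tmp is mounted noexec):

```shell
#!/usr/bin/env bash
# Sketch: detect and strip Windows (CRLF) line endings from a script.
set -euo pipefail

script=$(mktemp)                                      # hypothetical script file
printf '#!/bin/bash\r\necho ok\r\n' > "$script"       # written with CRLF endings

# Count lines that contain a carriage return ($'\r' is bash syntax for the CR byte).
before=$(grep -c $'\r' "$script" || true)

# Strip the trailing CR from every line, in place (GNU sed -i).
sed -i -e 's/\r$//' "$script"
after=$(grep -c $'\r' "$script" || true)

# Run the cleaned script.
output=$(bash "$script")

echo "CR lines before: $before, after: $after, output: $output"
rm -f "$script"
```

If the dos2unix utility is installed, it performs the same conversion.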
