Paul Sutton


Bash scripting 12 – Files and Grep

Rather than make a video for this, I decided to write a blog post so that I could include downloadable, or at least copy-and-pasteable, components.

Grep takes its name from the ed editor command g/re/p (globally search for a regular expression and print). In essence, and among other things, it can read a file and report on its contents or, as in this post, find a specific string of text.

Lorem Ipsum is the printing industry's standard dummy text, used to fill space on a page. I have pasted an example below, and it just happens to explain further.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

If you copy and paste the above and save it in a text file called lorem.txt, we can do some neat stuff with grep and a few other commands. I am not an expert at this, so this is some of what I picked up while researching this post.

Firstly, we can find a specific word (or string) in the text with:

cat lorem.txt | grep the

We can also do:

grep the lorem.txt

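Both search the file lorem.txt and print every line that contains the string 'the'. Because the pasted paragraph is a single long line, the whole text is displayed. On many systems grep is set up to colour its output, in which case each 'the' is highlighted. The second form is simpler, as grep can read the file directly without cat.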
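To see the "prints matching lines" behaviour more clearly, here is a small sketch using a made-up four-line file. The file name sample.txt and its contents are my own assumptions for illustration, not part of lorem.txt. The options -n (show line numbers), -i (ignore case) and -w (match whole words only) are all described in man grep.

```shell
# Work in a temporary directory so nothing is left behind.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"   # hypothetical example file

# Four short lines: only some of them contain "the".
printf '%s\n' 'the cat sat' 'The dog barked' 'another line' 'no match here' > "$sample"

# Print only the lines containing "the", with their line numbers.
# "another" matches too, because grep looks for the string anywhere.
matches=$(grep -n the "$sample")
echo "$matches"

# Ignore case, so the line starting with "The" is also printed.
matches_nocase=$(grep -in the "$sample")
echo "$matches_nocase"

# Match "the" only as a whole word, so "another" is skipped.
matches_word=$(grep -nw the "$sample")
echo "$matches_word"

rm -r "$tmpdir"
```

The line 'no match here' is never printed, which shows that grep filters lines rather than displaying the whole file.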

This is great, so what else can we do?

In a short file, a word may appear fewer than 5 or 10 times, so we could just count manually. As discussed in a previous video, the command wc (word count) does what it says and counts the number of words. It can also count lines (with -l) and bytes (with -c).
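As a quick reminder of what wc can count, here is a minimal sketch. The file words.txt and its two lines are made up for this example. Reading the file through < means wc prints only the number, without the file name.

```shell
# Work in a temporary directory so nothing is left behind.
tmpdir=$(mktemp -d)
wordsfile="$tmpdir/words.txt"   # hypothetical example file

# Two lines, five words in total.
printf '%s\n' 'one two three' 'four five' > "$wordsfile"

lines=$(wc -l < "$wordsfile")   # number of lines
words=$(wc -w < "$wordsfile")   # number of words
bytes=$(wc -c < "$wordsfile")   # number of bytes, including newlines

echo "lines=$lines words=$words bytes=$bytes"

rm -r "$tmpdir"
```

Some versions of wc pad the number with leading spaces, so the exact spacing of the output may differ between systems.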

So I found the following:

cat lorem.txt | grep -o the | wc -l

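This gives the output 6, which is how many times 'the' appears in the text. The -o option makes grep print each match on its own line, and wc -l then counts those lines. Note that this counts the string 'the' wherever it occurs, including inside longer words such as 'other'.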
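The way you count can change the answer, so here is a rough sketch comparing a few variations. The file count.txt and its contents are assumptions made up for this example. The options used (-o, -w, -i and -c) are all standard grep options listed in man grep.

```shell
# Work in a temporary directory so nothing is left behind.
tmpdir=$(mktemp -d)
countfile="$tmpdir/count.txt"   # hypothetical example file

printf '%s\n' 'the cat and the other cat' 'The end' > "$countfile"

# Every occurrence of the string "the", including the one inside "other".
substrings=$(grep -o the "$countfile" | wc -l)

# Only whole-word matches, so "other" is no longer counted.
whole_words=$(grep -ow the "$countfile" | wc -l)

# Whole words, ignoring case, so "The" is counted as well.
any_case=$(grep -oiw the "$countfile" | wc -l)

# grep -c counts matching LINES, not individual matches.
matching_lines=$(grep -c the "$countfile")

echo "substrings=$substrings whole_words=$whole_words any_case=$any_case matching_lines=$matching_lines"

rm -r "$tmpdir"
```

With lorem.txt all six matches happen to be the whole word 'the', so -o and -ow should agree there, but on other text they can differ.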

As with other commands, there is a man page so

man grep

and

man wc

should provide useful information. You can also search for information with DuckDuckGo, and there are numerous tutorials online.

Hope this is useful.

Chat

I am on the Devon and Cornwall Linux User Group mailing list and also their Matrix channel as zleap. It is better to ask there, as that way others can answer too.

Tags

#Bash,#Bashscripting,#Files,#TextSearch,#StringSearch,#Grep,#wc,#WordCount


AI statement : Consent is NOT granted to use the content of this blog for the purposes of AI training or similar activity. Consent CANNOT be assumed, it has to be granted.
