emacs lisp tutorial: Count Words and Chars [Lisp]

From: Xah Lee on 24 Mar 2010 19:25

Emacs Lisp: Count Words, Count Chars, Count Region

Xah Lee, 2010-03-23

A little elisp tip. Here's a short elisp function i have been using since about
2006. It reports the number of words and chars in a text selection.

(defun count-region (beginning end)
  "Print number of words and chars in region."
  (interactive "r")
  (message "Counting ...")
  (save-excursion
    (let (wCnt charCnt)
      (setq wCnt 0)
      (setq charCnt (- end beginning))
      (goto-char beginning)
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq wCnt (1+ wCnt)))

      (message "Words: %d. Chars: %d." wCnt charCnt)
      )))

This code is largely from Introduction to Programming in Emacs Lisp by
Robert J Chassell, which i was reading sometime in 2005. That
tutorial is for people who have never programmed. It was quite frustrating
to read, because for every sentence that teaches you something about emacs
lisp, you have to scan some 20 pages of things you already know about
programming, such as what variables, assignment, and syntax are. In the
end, i didn't really read that book. This function is about the only
thing i got out of it.

--------------------------
How It Works

Now let's explain how this function works.

The function has this skeleton:

(defun count-region (beginning end)
  "..."
  (interactive "r")
  ; ...
  )

This means, when you call the function with M-x, the region's beginning
position is passed as an integer to the argument “beginning”, and the
region's end position is passed to the argument “end”, automatically.
This is caused by the line “(interactive "r")”.
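
As a small illustration of what (interactive "r") hands to a function, here is a sketch of a command that only reports the two positions it receives. The name my-show-region-bounds is made up for this example and is not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs: shows what (interactive "r") passes in.
(defun my-show-region-bounds (beginning end)
  "Report the region's start and end positions and return them as a list."
  (interactive "r")
  ;; With the "r" code, Emacs passes point and mark, smaller position first.
  (message "Region runs from %d to %d." beginning end)
  (list beginning end))
```

Called from lisp code instead of M-x, the two positions have to be given explicitly, for example (my-show-region-bounds 3 10).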

The next part of the function is this:

(save-excursion
  (let (var1 var2 ...)
    (setq var1 ...)
    (setq var2 ...)
    ...
    ))

The “let” is lisp's way to have a block of local variables. We are
going to be doing some cursor moving and searching. However, when the
function count-region ends, the cursor should return to wherever it
was when the user called our function. This is what
“save-excursion” does. Quote from its inline doc:

(save-excursion &rest body)

Save point, mark, and current buffer; execute body; restore those
things.
...
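
To see the effect of save-excursion in isolation, here is a minimal sketch that moves the cursor inside a temporary buffer and records where it is before, during, and after. The function name my-save-excursion-demo is an assumption made for this example.

```elisp
;; Hypothetical demo: point moves inside save-excursion and is restored afterwards.
(defun my-save-excursion-demo ()
  "Return a list (POINT-BEFORE POINT-INSIDE POINT-AFTER) from a scratch buffer."
  (with-temp-buffer
    (insert "one two three")
    (goto-char 5)
    (let ((before (point))
          inside)
      (save-excursion
        ;; Move to the end of the buffer; this move is undone on exit.
        (goto-char (point-max))
        (setq inside (point)))
      (list before inside (point)))))
```

Evaluating (my-save-excursion-demo) gives (5 14 5): the cursor went to the end of the 13-char text and came back to position 5.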

Now, to count the chars, just take the difference between the ending and
beginning positions of the region. So, it is simple, like this:

(setq charCnt (- end beginning))

Now, we move the cursor to the beginning of the region, like this:
“(goto-char beginning)”. The next part counts the words, like this:

(while (and (< (point) end)
            (re-search-forward "\\w+\\W*" end t))
  (setq wCnt (1+ wCnt)))

The “(< (point) end)” is for checking that the cursor hasn't reached
the end of the region yet.

The “(re-search-forward "\\w+\\W*" end t)” means, keep moving the
cursor forward by regex searching for a word pattern. The “end” argument
there means don't search beyond the end of the region. And the “t” there
means don't signal an error if nothing is found; return nil instead.

search-forward and re-search-forward are very important functions in
elisp. I use them in almost all of my text processing scripts. If you
are not familiar with them, look up their inline doc. (use
describe-function)

So, the above “while” block basically means keep moving the cursor and
counting words, until the cursor is at the end of the region.
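
To try the search loop outside of an interactive command, here is a sketch that runs the same “while” form over a string placed in a temporary buffer. The helper name my-count-words-in-string is made up for this example, and what counts as a word depends on the buffer's syntax table.

```elisp
;; Hypothetical helper: runs the same search loop over a string in a temporary buffer.
(defun my-count-words-in-string (text)
  "Return the number of words in TEXT, using the loop from count-region."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((end (point-max))
          (count 0))
      ;; Each successful search consumes one word plus any trailing non-word chars.
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq count (1+ count)))
      count)))
```

For example, (my-count-words-in-string "hello big world") returns 3, and an empty string returns 0 because the loop body never runs.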

Finally, the program just prints out the result, by:

(message "Words: %d. Chars: %d." wCnt charCnt)

Exercise

Try to write a version so that, when there is a text selection, it counts
the words and chars in the text selection, but if there's no text selection,
it just counts the current line. You might want to read Emacs Lisp Idioms
to refresh your memory about emacs's technical meaning of “region”, “active
region”, and transient-mark-mode.
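
One possible answer, as a sketch only: it assumes use-region-p is a good enough test for “there is a text selection”, and the name my-count-region-or-line is made up. Besides printing the message, it also returns the two counts so it can be checked from lisp code.

```elisp
;; One possible answer to the exercise. The name my-count-region-or-line is made up.
(defun my-count-region-or-line ()
  "Count words and chars in the active region, or in the current line if none."
  (interactive)
  (let* ((beginning (if (use-region-p) (region-beginning) (line-beginning-position)))
         (end (if (use-region-p) (region-end) (line-end-position)))
         (wCnt 0)
         (charCnt (- end beginning)))
    (save-excursion
      (goto-char beginning)
      ;; Same word-counting loop as in count-region.
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq wCnt (1+ wCnt))))
    (message "Words: %d. Chars: %d." wCnt charCnt)
    (list wCnt charCnt)))
```

With no active region and the cursor on a line containing “alpha beta”, it reports 2 words and 10 chars.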

--------------------------
original url:

• Emacs Lisp: Count Words, Count Chars, Count Region
http://xahlee.org/emacs/elisp_count-region.html

Xah
http://xahlee.org/

From: Awhan Patnaik on 8 Apr 2010 06:39

many thanks! different ppl learn differently and i happen to learn
most effectively when somebody writes code and explains what the
individual bits and parts do. so thanks for the tut.

From: Mirko on 8 Apr 2010 08:38

Thanks for the post, Xah.

Since I don't do much elisp programming, I am curious why you did not
initialize wCnt and charCnt with the `let' statement

(let ((wCnt 0)
      (charCnt (- end beginning)))

instead of

(let (wCnt charCnt)
  (setq wCnt 0)
  (setq charCnt (- end beginning))

and why not use incf

(incf wCnt) instead of (setq wCnt (1+ wCnt)) ?

By way of statistics, these two modifications reduce the word
count from 54 to 48 and character count from 442 to 392 :-)

Mirko

From: Xah Lee on 8 Apr 2010 10:02

On Apr 8, 5:38 am, Mirko <mirko.vuko...(a)gmail.com> wrote:
> Since I don't do much elisp programming, I am curious why you did not
> initialize wCnt and charCnt with the `let' statement
>
> (let ((wCnt 0)
>       (charCnt (- end beginning)))
>
> instead of
>
> (let (wCnt charCnt)
>   (setq wCnt 0)
>   (setq charCnt (- end beginning))

No serious reason. The latter form is just easier for beginners to
understand.
I tend to always use the latter myself, unless all my vars are
constants.

> and why not use incf
>
> (incf wCnt) instead of (setq wCnt (1+ wCnt)) ?

incf is in the Common Lisp (CL) package. Use of the CL package is
somewhat controversial among GNU emacs developers, and the package is a
bit complex to understand.
I tend to stick with pure emacs lisp myself whenever possible for now.

> By way of statistics, these two modifications reduce the word
> count from 54 to 48 and character count from 442 to 392 :-)

:) Thanks for the comment, and thanks also to Awhan Patnaik.

Xah Lee
