From: Wes James
On Wed, Feb 24, 2010 at 5:03 AM, Jonathan Fine <J.Fine(a)open.ac.uk> wrote:
> Hi
>
> Does anyone know of a collection of regular expressions that will break a
> TeX/LaTeX document into tokens?  Assume that there are no verbatim or other
> category code changes.

I'm not sure how plasTeX does its tokenizing internally, but its documentation might help:

http://plastex.sourceforge.net/plastex/sect0025.html
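
If you would rather roll your own, here is a rough sketch of a single regular expression that splits TeX source into tokens under the default category codes. It is only my illustration and not how plasTeX does it. The names TOKEN_RE and tokenize are made up for this example, and it makes some simplifying assumptions: no verbatim, no catcode changes, no ^^ notation, comments are dropped, and spaces after a control word are swallowed.

```python
import re

# Hypothetical sketch: one alternation per token class, tried in order.
# Assumes default category codes only (letters are A-Z and a-z).
TOKEN_RE = re.compile(r"""
    (?P<comment>%[^\n]*\n?)              # percent sign to end of line
  | (?P<control_word>\\[A-Za-z]+[ \t]*)  # backslash + letters, trailing blanks skipped
  | (?P<control_symbol>\\.)              # backslash + one non-letter character
  | (?P<par>[ \t]*\n[ \t]*\n\s*)         # a blank line acts like \par
  | (?P<space>\s+)                       # any other whitespace run is one space
  | (?P<special>[{}$&#^_~])              # grouping, math shift, alignment, etc.
  | (?P<char>.)                          # every other character is its own token
""", re.VERBOSE | re.DOTALL)


def tokenize(source):
    """Return a list of (kind, text) pairs for a TeX source string."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "comment":
            continue                      # comments produce no tokens
        if kind == "control_word":
            text = text.rstrip(" \t")     # keep just the name, e.g. \textbf
        elif kind == "par":
            text = "\\par"
        elif kind == "space":
            text = " "
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for token in tokenize(r"\textbf{a} $x^2$ % note"):
        print(token)
```

The order of the alternatives matters: control sequences are tried before the single-character cases so that something like \% is read as a control symbol and not as the start of a comment.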

-wes