02-19-2008, 09:20 PM
Robert Komar
 
Re: OT How to count occurances in a text file

buck <buck@private.mil> wrote:
>
>>grep -e '^[^\"]*\"[^\"]*$' filename
>>grep -e '^[^\"]*\"[^\"]*\"[^\"]*\"' filename

>
> Thank you. These say I have no unmatched quotes. The first returns
> nothing and the second returns lines where there are 4 " characters.
>
> Please "teach me to fish". Say in words what one or the other of the
> above lines does.
>
> What does that first ^[
>
> mean?
>
> Doesn't the ^\"
>
> right after that mean to find a line that _begins_ with a quote?
>
> What is the purpose of the first ]
>
> Perhaps a better question is "where do I find documentation for
> (extended?) regular expressions so I can understand these two lines?"
>
> Color me perplexed but greatful. Thx again
> buck


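A pair of brackets denotes a set of characters, any one of which may match
at that position. A circumflex "^" immediately after the opening bracket
inverts the logic, so the expression matches any one character that is not
in the set. So, "[^\"]" means any character other than " (the backslash is
there to quote the " for the command-line shell, although the single quotes
around the whole pattern already do that, and inside brackets grep takes the
backslash literally, so the set excludes backslashes as well). Adding the
"*" immediately after means any number of repetitions of the preceding item,
including zero of them.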

Outside of the brackets, "^" means the start of the line. "$" means the
end of the line.
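
To see the pieces in isolation, here is a small sketch you can paste into a
shell. The three sample lines are made up for illustration, and I've left the
backslash out of the brackets, since the single quotes already keep the shell
away from the ".

```shell
# Three made-up sample lines: no quote, one quote, two quotes.
sample='no quotes here
one " quote
two " quotes "'

# [^"]* on its own can match the empty string, so every line matches.
all=$(printf '%s\n' "$sample" | grep -c -e '[^"]*')

# Anchored with ^ and $, the whole line must consist of non-" characters.
none=$(printf '%s\n' "$sample" | grep -c -e '^[^"]*$')

echo "lines matching unanchored: $all"
echo "lines with no quote at all: $none"
```

The unanchored pattern is nearly useless by itself because "*" allows zero
repetitions; it is the "^" and "$" that force the set to cover the whole line.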


  ^       [^\"]*        \"      [^\"]*        $

start   some number    1 "    some number    end
 of      of not-"              of not-"       of
line    characters            characters     line


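So only a line with exactly one " somewhere on it will return a match. The
3 or more " regexp is only a little more complicated than that: it repeats
the [^\"]*\" piece three times and leaves off the "$", so anything at all
(including more " characters) may follow the third ".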
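
If you want to convince yourself, something like the following should do it.
This is only a sketch: the temporary file and its contents are invented for
the test, and the patterns are the two from earlier in the thread with the
backslashes dropped (inside single quotes they aren't needed, and newer GNU
greps may warn about a stray backslash).

```shell
# Hypothetical sample file; contents are made up for illustration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
no quotes at all
say "hello" twice
an "unmatched quote
three " quotes " here "
four "a" and "b"
EOF

# Lines with exactly one " on them.
one=$(grep -e '^[^"]*"[^"]*$' "$tmp")

# Lines with three or more " (no $ anchor, so more may follow the third).
three_plus=$(grep -e '^[^"]*"[^"]*"[^"]*"' "$tmp")

printf 'exactly one:\n%s\n' "$one"
printf 'three or more:\n%s\n' "$three_plus"
rm -f "$tmp"
```

The second pattern also picks up the line with four " characters, which is
why a line with an even number of quotes can still show up in its output.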

I don't know a good on-line reference for regular expressions, but I'm sure
google will turn something up. I learned about them in Sobell's book "A
Practical Guide to the Unix System", and in an earlier edition of O'Reilly's
camel book on Perl. There's a nice overview in Kernighan and Pike's book
"The Unix Programming Environment", and I seem to recall that the small
book on awk and sed programming had a section on regular expressions, as
well. They are common and confusing enough that they have been covered
in a lot of books on Unix.

Cheers,
Rob Komar