Processing stdin in blocks [Shell]

From: Ed Morton on 2 Mar 2010 08:45

On 3/2/2010 7:34 AM, Ed Morton wrote:
> On 3/2/2010 3:09 AM, pk wrote:
>> Janis Papanagnou wrote:
>>
>>>> works, but I have the impression that I'm overcomplicating it.
>>>> However, I
>>>> cannot find a simpler way. Any suggestion?
>>>
>>> awk '{ print | "command" }
>>> /^END$/ { close("command") }'
>>
>> Yes, thanks (and to Bill). I was thinking of something more shell-ish
>> rather
>> than calling external commands in awk, but that'll do.
>>
>> Thank you!
>>
>
> How about something like (untested, but I know you know awk...):
>
> awk -v RS="END" -v ORS="\n" -v FS="\n" -v OFS="^L" '{$1=$1}1' file |
> while IFS= read -r block; do
> echo "$block" | tr '^L' '\n' | command
> done
>
> where the ^L is control-L or some other control character that's not in
> your input.
>
> Regards,
>
> Ed.

Actually, you could use "END" instead of control-L as the OFS: RS="END"
consumes every "END" in the input, so you know none can appear inside a
record.

Ed.
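A minimal sketch of the pipeline above, with the field separator set via -v FS. It assumes an awk that treats a multi-character RS as a regular expression (gawk and mawk do; POSIX leaves this unspecified). The names split_blocks and process_block are hypothetical, and process_block merely stands in for "command". Form feed (\f, i.e. control-L) is assumed absent from the input. Note that the sketch also strips newlines at the edges of each block, which the original one-liner does not do.

```shell
#!/bin/sh
# Hypothetical stand-in for "command": reports how many lines it was fed.
process_block() {
    printf 'block with %s line(s)\n' "$(wc -l | tr -d ' ')"
}

# Read END-delimited blocks on stdin and run process_block once per block.
split_blocks() {
    awk -v RS='END' -v FS='\n' -v OFS='\f' '
        {
            gsub(/^\n+|\n+$/, "")   # drop newlines left around the END marker
            if ($0 == "") next      # skip the empty record after the last END
            $1 = $1                 # rebuild the record with OFS between lines
            print
        }' |
    while IFS= read -r block; do
        # Turn the form feeds back into newlines before handing the block over.
        printf '%s\n' "$block" | tr '\f' '\n' | process_block
    done
}

printf 'a\nb\nEND\nc\nd\ne\nEND\n' | split_blocks
```

Each block travels through the pipe as a single line, which is what lets a plain "read -r" loop pick it up whole.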

From: pk on 2 Mar 2010 09:26

Ed Morton wrote:

>> How about something like (untested, but I know you know awk...):
>>
>> awk -v RS="END" -v ORS="\n" -v FS="\n" -v OFS="^L" '{$1=$1}1' file |
>> while IFS= read -r block; do
>> echo "$block" | tr '^L' '\n' | command
>> done
>>
>> where the ^L is control-L or some other control character that's not in
>> your input.
>>
>> Regards,
>>
>> Ed.
>
> Actually, you could use "END" instead of control-L as the OFS since you
> know there won't be any "END"s in the current records since the RS="END"
> is taking care of that.

Yes, that's a clever solution. I prefer the ^L as separator however (or any
other single character), as it's easier to turn into a "\n" with tr.

Thanks!
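To illustrate the trade-off, here is a rough sketch of the "END as OFS" variant. Because tr maps single characters only, turning the string "END" back into newlines needs something like awk's gsub instead. The function names join_blocks and restore_block are hypothetical, and a regex-capable RS (gawk, mawk) is assumed.

```shell
#!/bin/sh
# Join each END-delimited block onto one line, using "END" itself as OFS.
join_blocks() {
    awk -v RS='END' -v FS='\n' -v OFS='END' '
        { gsub(/^\n+|\n+$/, ""); if ($0 == "") next; $1 = $1; print }'
}

# tr cannot expand a three-character string, so use gsub to restore newlines.
restore_block() {
    awk '{ gsub(/END/, "\n"); print }'
}

printf 'a\nb\nEND\nc\nEND\n' | join_blocks
```

Inside the read loop, "tr '^L' '\n'" would then become a call to restore_block, which costs one extra awk process per block.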

From: Janis on 2 Mar 2010 09:57

On 2 Mar, 15:26, pk <p...(a)pk.invalid> wrote:
> Ed Morton wrote:
> >> How about something like (untested, but I know you know awk...):
>
> >> awk -v RS="END" -v ORS="\n" -v FS="\n" -v OFS="^L" '{$1=$1}1' file |
> >> while IFS= read -r block; do
> >> echo "$block" | tr '^L' '\n' | command
> >> done
>
> >> where the ^L is control-L or some other control character that's not in
> >> your input.
>
> >> Regards,
>
> >> Ed.
>
> > Actually, you could use "END" instead of control-L as the OFS since you
> > know there won't be any "END"s in the current records since the RS="END"
> > is taking care of that.
>
> Yes, that's a clever solution. I prefer the ^L as separator however (or any
> other single character), as it's easier to turn into a "\n" with tr.

In such situations I sometimes just use SUBSEP for convenience, since it is
predefined and a control character.
(If you do shell post-processing you would of course have to know what
SUBSEP actually is.)
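A small sketch of what "knowing what SUBSEP is" could look like in practice. gawk documents the default as "\034"; other awks may differ, so the first function asks the local awk rather than assuming. The function names are hypothetical, and a regex-capable RS (gawk, mawk) is assumed.

```shell
#!/bin/sh
# Print the octal code of the local awk's SUBSEP (034 with gawk and mawk).
subsep_octal() {
    awk 'BEGIN { printf "%s", SUBSEP }' | od -An -to1 | tr -d ' \n'
}

# Join blocks with SUBSEP inside awk; the shell side must use the literal value.
join_with_subsep() {
    awk -v RS='END' -v FS='\n' '
        BEGIN { OFS = SUBSEP }
        { gsub(/^\n+|\n+$/, ""); if ($0 == "") next; $1 = $1; print }'
}

subsep_octal; echo
printf 'a\nb\nEND\n' | join_with_subsep | tr '\034' '\n'
```

The tr call hard-codes \034, so it only matches if subsep_octal really reports 034 on the system at hand.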

>
> Thanks

BTW, I wonder why you said upthread

>>> "I was thinking of something more shell-ish [...]"

and yet prefer shell loops and, in this case, quite bulky shell code.

Janis

From: pk on 2 Mar 2010 10:10

Janis wrote:

> BTW, I wonder why you said upthread
>
>>>> "I was thinking of something more shell-ish [...]"
>
> and prefer shell loops and in this case quite bulky shell code.

Don't get me wrong: awk is perfectly fine (no, I don't want to start the
debate "shell loops vs. dedicated tools" again).

But at the point I described in my first post I just felt like I was
overlooking some more "natural" shell way (i.e., involving pipelines, file
descriptors, IFS or other trickery) to complete the task.

Efficiency is not a concern here as the input is just a few hundred lines in
the worst case, and it's semi-throwaway code anyway, i.e. it will be used for a
limited time only as part of a bigger data migration task.

But it turns out it was just a wrong feeling.

Thanks again.
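For reference, a minimal sketch of the awk-only approach quoted upthread (print to a command, close it at each END line). Unlike the quoted one-liner, this variation does not pass the END line itself to the command. The function name per_block is hypothetical, and "wc -l" is only a stand-in for the real per-block command.

```shell
#!/bin/sh
# Feed each END-terminated block to a fresh run of a command, all inside awk.
per_block() {
    awk -v cmd='wc -l' '
        /^END$/ { close(cmd); next }   # end of block: finish the current run
        { print | cmd }                # first print after close() starts a new run
    '
}

printf 'a\nb\nEND\nc\nd\ne\nEND\n' | per_block
```

close() waits for the command to exit, so the output of each run appears before the next block starts.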

From: Ed Morton on 2 Mar 2010 10:10

On Mar 2, 8:57 am, Janis <janis_papanag...(a)hotmail.com> wrote:
> On 2 Mar, 15:26, pk <p...(a)pk.invalid> wrote:
> > Ed Morton wrote:
> > >> How about something like (untested, but I know you know awk...):
>
> > >> awk -v RS="END" -v ORS="\n" -v FS="\n" -v OFS="^L" '{$1=$1}1' file |
> > >> while IFS= read -r block; do
> > >> echo "$block" | tr '^L' '\n' | command
> > >> done
>
> > >> where the ^L is control-L or some other control character that's not in
> > >> your input.
>
> > >> Regards,
>
> > >> Ed.
>
> > > Actually, you could use "END" instead of control-L as the OFS since you
> > > know there won't be any "END"s in the current records since the RS="END"
> > > is taking care of that.
>
> > Yes, that's a clever solution. I prefer the ^L as separator however (or any
> > other single character), as it's easier to turn into a "\n" with tr.
>
> In such situations I sometimes just take SUBSEP for convenience; being
> predefined and a control character.
> (If you do shell post-processing you would of course have to know what
> SUBSEP actually is.)

Yeah, I thought about that, but I don't like to use SUBSEP outside of
awk for the reason you stated so I'd only use that if the solution was
going to be:

awk -v RS="END" -v ORS="\n" -v FS="\n" 'BEGIN{OFS=SUBSEP}{$1=$1}1' file |
while IFS= read -r block; do
awk -v block="$block" 'BEGIN{gsub(SUBSEP,"\n",block); print block; exit}' | command
done

which just seemed unnecessarily complicated compared to the tr
solution. You're as likely to be able to use any other absent
control character as you are the SUBSEP one, and you
might have to know what it is anyway to be sure it can't appear in
your input.

Ed.
