The inverse of .join [Python]

Prev: Efficiency/style issues of import <module> vs. from <module> import <name>, ...
Next: Running a program from another program.

From: Steven D'Aprano on 17 Jun 2010 23:01

On Thu, 17 Jun 2010 17:45:41 +0000, Neil Cerutti wrote:

> What's the best way to do the inverse operation of the .join function?

str.join is a many-to-one function, and so it doesn't have an inverse.
You can't always get the input back unchanged:

>>> L = ["a", "b", "c|d", "e"]
>>> s = '|'.join(L)
>>> s
'a|b|c|d|e'
>>> s.split('|')
['a', 'b', 'c', 'd', 'e']

There's no general way of getting around this -- if split() takes input
"a|b|c", there is no way even in principle for it to know which of these
operations it should reverse:

"|".join(["a", "b", "c"])
"|".join(["a|b", "c"])
"|".join(["a", "b|c"])
"|".join(["a|b|c"])
"b".join(["a|", "|c"])

The behaviour with the empty string is just a special case of this.
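
As a quick check of the point above, here is a small sketch (the names `candidates` and `results` are only illustrative, not from the thread) showing that all five calls produce the same string, and that the empty-string case is the same ambiguity in miniature:

```python
# Each (separator, list) pair below joins to the same string, so no
# function of that string alone can recover which list it came from.
candidates = [
    ("|", ["a", "b", "c"]),
    ("|", ["a|b", "c"]),
    ("|", ["a", "b|c"]),
    ("|", ["a|b|c"]),
    ("b", ["a|", "|c"]),
]
results = {sep.join(parts) for sep, parts in candidates}
print(results)  # {'a|b|c'}

# The empty-string case: [] and [""] both join to "", and split()
# has to pick one of them when going the other way.
empty_list_joined = "|".join([])
one_empty_item_joined = "|".join([""])
print(empty_list_joined == one_empty_item_joined)  # True
print("".split("|"))  # ['']
```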

--
Steven

From: Steven D'Aprano on 18 Jun 2010 01:00

On Thu, 17 Jun 2010 20:44:41 +0100, MRAB wrote:

> Should .split grow an additional keyword argument to specify the desired
> behaviour?

Please no.

> (Although it's simple enough to define your own function.)

Exactly.

--
Steven

From: Steven D'Aprano on 18 Jun 2010 01:02

On Thu, 17 Jun 2010 20:03:42 +0000, Neil Cerutti wrote:

> I'm currently using the following without problems, while reading a data
> file. One of the fields is a comma separated list, and may be empty.
>
> f = rec['codes']
> if f == "":
>     f = []
> else:
>     f = f.split(",")
>
> I just wondered if something smoother was available.

Seems pretty smooth to me. What's wrong with it? I assume you've put it
into a function for ease of use and reduction of code duplication.

You could also use the ternary operator, in which case it's a mere
two-liner and short enough to inline wherever you need it:

f = rec['codes']
f = f.split(",") if f else []
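
If you would rather have it as a function, a minimal sketch might look like the following. The name `split_list` and the sample `rec` are made up for illustration. Note that testing `if field` rather than `field == ""` also turns None into an empty list, which may or may not be what you want.

```python
def split_list(field, sep=","):
    """Split a separator-delimited field, treating an empty field as no items."""
    # "".split(",") would give [''], so the empty case is handled explicitly.
    return field.split(sep) if field else []


# Hypothetical record, standing in for one row of the data file.
rec = {"codes": "A1,B2,C3", "empty_codes": ""}

print(split_list(rec["codes"]))        # ['A1', 'B2', 'C3']
print(split_list(rec["empty_codes"]))  # []
```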

--
Steven

From: Neil Cerutti on 18 Jun 2010 08:54

On 2010-06-18, Steven D'Aprano <steve(a)REMOVE-THIS-cybersource.com.au> wrote:
> On Thu, 17 Jun 2010 20:03:42 +0000, Neil Cerutti wrote:
>> I'm currently using the following without problems, while
>> reading a data file. One of the fields is a comma separated
>> list, and may be empty.
>>
>> f = rec['codes']
>> if f == "":
>>     f = []
>> else:
>>     f = f.split(",")
>>
>> I just wondered if something smoother was available.
>
> Seems pretty smooth to me. What's wrong with it? I assume
> you've put it into a function for ease of use and reduction of
> code duplication.

The part that's wrong with it, and it's probably my fault, is
that I can never think of it. I had to go dig it out of my code
to remember what the special case was.

> You could also use the ternary operator, in which case it's a
> mere two-liner and short enough to inline wherever you need
> it:
>
> f = rec['codes']
> f = f.split(",") if f else []

That's pretty cool.

Thanks to everybody for their thoughts.

--
Neil Cerutti

From: Jon Clements on 18 Jun 2010 11:35

On 17 June, 21:03, Neil Cerutti <ne...(a)norwich.edu> wrote:
> On 2010-06-17, Robert Kern <robert.k...(a)gmail.com> wrote:
>
> > On 6/17/10 2:08 PM, Neil Cerutti wrote:
> >> On 2010-06-17, Ian Kelly<ian.g.ke...(a)gmail.com> wrote:
> >>> On Thu, Jun 17, 2010 at 11:45 AM, Neil Cerutti
> >>> <ne...(a)norwich.edu> wrote:
> >>>> What's the best way to do the inverse operation of the .join
> >>>> function?
>
> >>> Use the str.split method?
>
> >> split is perfect except for what happens with an empty string.
>
> > Why don't you try it and find out?
>
> I'm currently using the following without problems, while reading
> a data file. One of the fields is a comma separated list, and may
> be empty.
>
> f = rec['codes']
> if f == "":
>     f = []
> else:
>     f = f.split(",")
>
> I just wondered if something smoother was available.
>
> --
> Neil Cerutti

In terms of behaviour and 'safety', I'd go for:

>>> import csv
>>> rec = { 'code1': '1,2,3', 'code2': '' }
>>> next(csv.reader([rec['code1']]))
['1', '2', '3']
>>> next(csv.reader([rec['code2']]))
[]
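
The 'safety' part is that csv also copes with separators inside items, provided the field was written with csv quoting in the first place. A rough sketch of a round trip, where `join_codes` and `parse_codes` are names invented here and the field is assumed to contain no embedded newlines:

```python
import csv
import io


def join_codes(items):
    """Join items into one comma-separated field, quoting where needed."""
    buf = io.StringIO()
    # lineterminator="" keeps the result to a single field-sized string.
    csv.writer(buf, lineterminator="").writerow(items)
    return buf.getvalue()


def parse_codes(field):
    """Parse one comma-separated field; an empty field gives []."""
    # csv.reader expects an iterable of lines, so wrap the field in a list.
    return next(csv.reader([field]))


items = ["1", "2,3", "4"]
field = join_codes(items)
print(field)               # 1,"2,3",4
print(parse_codes(field))  # ['1', '2,3', '4']
print(parse_codes(""))     # []
```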

hth
Jon.
