building a hash path index? [Perl]

From: bugbear on 7 Jul 2010 04:48

I have a large array of hashes, and each hash
has several fields.

I would like to be able to group
the hashes by some of the fields,
so I thought of creating a hash so
that I find an array of selected hashes via:

$index->{field1}->{field2}->{field3}

I would like to create a method which could
be called like this:

my $hash_index = make_index($list_of_hashes, [ 'date', 'name' ]);

resulting in $hash_index being a hash such that

$hash_index->{"jul-10"}->{paul} is an array of all hashes
with the corresponding date and name.

This is easy to do with a fixed field list,
but I can't see a clear road to parameterising
it as per my example call.

It's the variable length of the index-name-array
that causes me difficulty.

BugBear

From: bugbear on 7 Jul 2010 05:05

Here's my inelegant code; I suspect there's a MUCH
more elegant solution to be had:

sub _mk_index {
    my ($dst, $hash, $fields) = @_;
    if(scalar(@$fields) == 0) {
        if(!defined($dst)) {
            $dst = [];
        }
        push @$dst, $hash;
    } else {
        if(!defined($dst)) {
            $dst = {};
        }
        my $key = $hash->{$fields->[0]};
        my @tail = @$fields;
        shift @tail;
        $dst->{$key} = _mk_index($dst->{$key}, $hash, \@tail);
    }
    return $dst;
}

sub mk_index {
    my ($list, $fields) = @_;
    my $index;
    foreach my $h (@$list) {
        $index = _mk_index($index, $h, $fields);
    }
    return $index;
}

BugBear

From: sln on 8 Jul 2010 18:50

On Wed, 07 Jul 2010 10:05:54 +0100, bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:

>bugbear wrote:
>> I have a large (array) of hashes, and each hash
>> has several fields.
>>
>> I would like to be able to group
>> the hashes by some of the fields,
>> so I thought of creating a hash so
>> that I find an array of selected hashes via:
>>
>> $index->{field1}->{field2}->{field3}

Is this what you mean?
my $index = {
    'field1' => {
        'field2' => {
            'field3' => {},
        },
    },
};

It would be better if you post a working example.
I imagine you call mk_index() from the main code?

-sln

From: bugbear on 9 Jul 2010 04:10

sln(a)netherlands.com wrote:
>
> It would be better if you post a working example.
> I imagine you call mk_index() from the main code?

Of course - it's a library-style utility method.

Here's my "test":

use Data::Dumper;

my $data = [
    {
        a => 10,
        b => 20,
    },
    {
        a => 10,
        b => 25,
        c => "thing",
    },
    {
        a => 10,
        b => 25,
        c => "other thing",
    },
    {
        a => 12,
        b => 25,
    },
];

my $index = mk_index($data, [ 'a', 'b' ]);
print Dumper($index);

And the desired result:

$VAR1 = {
          '10' => {
                    '25' => [
                              {
                                'c' => 'thing',
                                'a' => 10,
                                'b' => 25
                              },
                              {
                                'c' => 'other thing',
                                'a' => 10,
                                'b' => 25
                              }
                            ],
                    '20' => [
                              {
                                'a' => 10,
                                'b' => 20
                              }
                            ]
                  },
          '12' => {
                    '25' => [
                              {
                                'a' => 12,
                                'b' => 25
                              }
                            ]
                  }
        };

What's annoying is how trivial this is for a fixed-length field list,
e.g. two fields:

sub mk_2_index {
    my ($list, $fields) = @_;
    my $index;
    foreach my $h (@$list) {
        push @{$index->{$h->{$fields->[0]}}->{$h->{$fields->[1]}}}, $h;
    }
    return $index;
}
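
The same push can be generalised to any number of fields if the chain of hash lookups is replaced by a loop that carries a reference to the current slot in the tree. This is only a sketch: the name mk_n_index is made up for illustration, and it relies on Perl's autovivification to create the intermediate hashes and the leaf array.

```perl
use strict;
use warnings;

# Hypothetical generalisation of mk_2_index to any number of fields.
sub mk_n_index {
    my ($list, $fields) = @_;
    my $index;
    foreach my $h (@$list) {
        # $slot is a reference to the current position in the tree;
        # it starts out pointing at the root variable itself.
        my $slot = \$index;
        # Step one level down per field. Taking a reference to the
        # hash element autovivifies the hash (and the element).
        $slot = \$$slot->{ $h->{$_} } for @$fields;
        # $$slot is now the leaf; push autovivifies it into an array.
        push @$$slot, $h;
    }
    return $index;
}
```

Called as mk_n_index($data, [ 'a', 'b' ]) it should build the same shape as mk_2_index; with an empty field list it simply collects every record into one array.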

BugBear

From: Ted Zlatanov on 9 Jul 2010 09:47

On Fri, 09 Jul 2010 09:10:20 +0100 bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:

b> What's annoying is how trivial this is for fixed-length field lists
b> e.g. 2:

b> sub mk_2_index {
b>     my ($list, $fields) = @_;
b>     my $index;
b>     foreach my $h (@$list) {
b>         push @{$index->{$h->{$fields->[0]}}->{$h->{$fields->[1]}}}, $h;
b>     }
b>     return $index;
b> }

That's the right direction, but by condensing it and using so many
shortcuts you've robbed yourself of the chance to see the general
solution.

An alternative would have been to use Hash::Merge: construct each
entry's tree (e.g. { 10 => { 20 => [ { a => 10, b => 20 } ] } })
individually and merge them all into one hash. But since that will be
less efficient (I think), I went with the recursive standalone version
below. It produces the results you want and will work as long as all
the entries have the keys required.
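
For comparison, the merge-based idea can be sketched without the module before the recursive version that follows. The helper names entry_tree, merge_trees and mk_index_by_merge are hypothetical, and merge_trees is a hand-written stand-in (hashes merged key by key, arrays concatenated), not Hash::Merge's actual interface.

```perl
use strict;
use warnings;

# Build the one-branch tree for a single record, e.g.
# { 10 => { 20 => [ $record ] } } for fields ('a', 'b').
sub entry_tree {
    my ($record, $fields) = @_;
    my $tree = [ $record ];
    # Wrap from the innermost field outwards.
    for my $field (reverse @$fields) {
        $tree = { $record->{$field} => $tree };
    }
    return $tree;
}

# Hand-written stand-in for a library merge: hashes are merged
# key by key, arrays are concatenated. Modifies and returns $left.
sub merge_trees {
    my ($left, $right) = @_;
    return $right unless defined $left;
    if (ref $left eq 'ARRAY') {
        push @$left, @$right;
        return $left;
    }
    for my $key (keys %$right) {
        $left->{$key} = merge_trees($left->{$key}, $right->{$key});
    }
    return $left;
}

# Merge every record's one-branch tree into a single index.
sub mk_index_by_merge {
    my ($list, $fields) = @_;
    my $index;
    $index = merge_trees($index, entry_tree($_, $fields)) for @$list;
    return $index;
}
```

This builds and discards one small tree per record, which is presumably the inefficiency alluded to above.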

#!/usr/bin/perl

use warnings;
use strict;
use Data::Dumper;

my $data = [
    {
        a => 10,
        b => 20,
    },
    {
        a => 10,
        b => 25,
        c => "thing",
    },
    {
        a => 10,
        b => 25,
        c => "other thing",
    },
    {
        a => 12,
        b => 25,
    },
];

my $index = mk_index($data, [ 'a', 'b' ]);
print Dumper($index);

sub mk_index
{
    my $data = shift @_;
    my $fields = shift @_;

    return $data unless scalar @$fields;

    my @fields = @$fields;
    my $field = shift @fields;

    my %uniques;
    foreach my $entry (@$data)
    {
        push @{$uniques{$entry->{$field}}}, $entry;
    }

    my %h;

    foreach my $unique (keys %uniques)
    {
        $h{$unique} = mk_index($uniques{$unique}, \@fields);
    }

    return \%h;
}
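
The caveat about every entry needing the required keys could be relaxed along these lines. This is a sketch under one stated assumption: records that lack a field (or have it undefined) are silently dropped from the index, which may or may not be what a given application wants. The name mk_index_checked is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical variant of the recursive indexer that tolerates
# records missing one of the index fields.
sub mk_index_checked {
    my ($data, $fields) = @_;

    # No fields left: the current group of records is the leaf.
    return $data unless @$fields;

    my ($field, @rest) = @$fields;

    my %groups;
    foreach my $entry (@$data) {
        # Assumption: skip records that cannot be filed under $field.
        next unless defined $entry->{$field};
        push @{ $groups{ $entry->{$field} } }, $entry;
    }

    # Index each group by the remaining fields.
    my %h;
    $h{$_} = mk_index_checked($groups{$_}, \@rest) for keys %groups;
    return \%h;
}
```

Skipping also avoids the "uninitialized value" warnings that an undefined hash key would otherwise trigger under use warnings.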
