Frequent Pattern Mining [Matlab]

From: singlepoint on 27 Feb 2010 08:34

Hey Guys,
I have to extract the frequent item sets using matlab. Let me give a
flavor of the problem. I have data in the following format

transaction_id, item_id
example,
1,1
2,1
3,1
4,1
This means that item_id 1 was purchased in transactions 1, 2, 3 and 4. I
have to maintain, for each item_id, a list of its transaction_ids,
for example:
for item_id = 1
transaction_id = 1,2,3,4 (all those in which this item was purchased)
and similarly for the other item_ids. I believe there are some set
operators in matlab which can help with this kind of task. This is
basically an implementation of the vertical data format (a frequent pattern
mining algorithm) in matlab. Any ideas guys?
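
The vertical data format described above can be sketched roughly as follows. This is only a minimal illustration, not a full mining algorithm: the sample rows for item 2, the names data, tidLists, support and common, and the minSupport threshold are all assumptions made up for the example. It groups transaction ids per item with unique and accumarray, then uses intersect (one of the set operators) to find the transactions shared by two items.

```matlab
% Columns: transaction_id, item_id (rows for item 2 are made-up sample data).
data = [1 1; 2 1; 3 1; 4 1; 2 2; 4 2];

% Map each distinct item id to a group number 1..numel(items).
[items, ~, grp] = unique(data(:,2));

% Vertical format: one sorted row vector of transaction ids per item.
tidLists = accumarray(grp, data(:,1), [], @(x) {sort(x).'});

% Support of a single item = number of transactions that contain it.
support = cellfun(@numel, tidLists);

% Support of the itemset {items(1), items(2)} = size of the intersection
% of the two transaction-id lists.
common = intersect(tidLists{1}, tidLists{2});

minSupport = 2;  % hypothetical threshold
isFrequentPair = numel(common) >= minSupport;
```

Here tidLists{k} holds the transactions for items(k), so larger itemsets can be tested by intersecting more lists in the same way.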

From: us on 27 Feb 2010 08:50

it is not clear...
you have to learn how to post in this NG (CSSM)...
rather than using a lot of descriptive words, come up with a parsimonious(!) set of
input data
and
output data
you'd like to see...
look through CSSM and see how others post...

us

From: ImageAnalyst on 27 Feb 2010 09:19

singlepoint:
Why not just go through in a for loop? This demo I wrote will even
handle missing itemIDs and missing transaction numbers. It's
intuitive and easy to understand (although long because I expanded it
to be explicit, added abundant comments, and had it print to the
command window.) If you wait long enough, I'm sure someone can
provide a one or two liner (though it may be more cryptic and harder
to understand - or maybe not). Anyway, this should be fast unless you
have millions of records.

Be sure to join any lines split apart by the newsreader!!!

% Startup code.
clc; % Clear command window.
clear all; % Get rid of variables from prior run of this m-file.
workspace; % Show panel with all the variables.

% Generate some sample data
% with missing itemIDs and missing transaction numbers.
transaction_Record = [...
    1,1;...
    2,1;...
    3,1;...
    4,1;...
    22,1;...
    33,1;...
    24, 3;...
    12, 1;...
    16, 4;...
    11, 15;...
    8, 2;...
    7, 15;...
    17, 23;...
    20, 21]
transactions = transaction_Record(:,1);
itemIDs = transaction_Record(:, 2);
% Loop through finding all transaction #'s
% for each item ID.
maxItemID = max(itemIDs)
% Preallocate the cell array that holds the transaction list of each item.
caTransactionsForItem = cell(1, maxItemID);
for item = 1 : maxItemID
    indexesForItem = find(itemIDs == item);
    transactionsForItem = transactions(indexesForItem);
    % Check to see if there is a transaction for this item.
    if ~isempty(transactionsForItem)
        % There is a transaction for this item.
        % Log it.
        caTransactionsForItem{item} = transactionsForItem;
        % Print transactions to the command window.
        message1 = sprintf('\nItem %d occurred in transaction(s):', item);
        message2 = sprintf('%d, ', transactionsForItem);
        % Pass the messages as data, not as the format string.
        fprintf(1, '%s%s', message1, message2);
    else
        % There is NO transaction for this item.
        % State that in the command window.
        fprintf(1, '\nNo transactions for item %d', item);
    end
end

The results:
Item 1 occurred in transaction(s):1, 2, 3, 4, 22, 33, 12,
Item 2 occurred in transaction(s):8,
Item 3 occurred in transaction(s):24,
Item 4 occurred in transaction(s):16,
No transactions for item 5
No transactions for item 6
No transactions for item 7
No transactions for item 8
No transactions for item 9
No transactions for item 10
No transactions for item 11
No transactions for item 12
No transactions for item 13
No transactions for item 14
Item 15 occurred in transaction(s):11, 7,
No transactions for item 16
No transactions for item 17
No transactions for item 18
No transactions for item 19
No transactions for item 20
Item 21 occurred in transaction(s):20,
No transactions for item 22
Item 23 occurred in transaction(s):17,

From: us on 27 Feb 2010 10:01

one of the other solutions

t=[
1 2 3 4 22 33 24 12 16 11 8 7 17 20 % <- transaction
1 1 1 1 1 1 3 1 4 15 2 15 23 21 % <- item
].';
tu=unique(t(:,2));
ar=arrayfun(@(x) t(ismember(t(:,2),x),1).',tu,'uni',false);
% look at transactions for item #1
ar{1}
% ans = 1 2 3 4 22 33 12

% to beautify output, only
am=max(cellfun(@numel,ar));
ar=cellfun(@(x,y) [x,y,zeros(1,am-numel(y))],num2cell(tu),ar,'uni',false);
ar=cat(1,ar{:});
disp(ar);
%{
1 1 2 3 4 22 33 12
2 8 0 0 0 0 0 0
3 24 0 0 0 0 0 0
4 16 0 0 0 0 0 0
15 11 7 0 0 0 0 0
21 20 0 0 0 0 0 0
23 17 0 0 0 0 0 0
%}

us

From: singlepoint on 27 Feb 2010 11:11

Thank you very much guys for your replies. They have been helpful. One more
quick question: can anyone tell me how to get the indices of the entries
in a sparse matrix?
The sparse matrix is of the form
(45,33) 1
(343,433) 1
(34343,333) 1
I need to get the values of the indices. Loops are of no use. If I
have to run a loop, I have to go through the whole matrix (including
zero values) so efficiency sucks and of course this is not the purpose
of a sparse matrix.
If anyone has a problem understanding my question, please accept my
apologies in advance as I am new to matlab and I really don't know how
to ask a question.

Regards.
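
For the sparse matrix question above, one possible approach is the three-output form of find, which returns the row indices, column indices and values of the nonzero entries without looping over the zeros. A minimal sketch, assuming the three entries shown in the post (the variable names S, rowIdx, colIdx and vals are made up for the example):

```matlab
% Rebuild the example matrix from the post: three nonzero entries.
S = sparse([45 343 34343], [33 433 333], [1 1 1]);

% Row indices, column indices and values of the nonzero entries only.
% Entries come back in column-major order (sorted by column, then row).
[rowIdx, colIdx, vals] = find(S);

% Number of stored nonzeros, for reference.
numEntries = nnz(S);
```

Because find reports entries in column-major order, the pair (34343,333) is listed before (343,433) here.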
