From: singlepoint on 27 Feb 2010 08:34

Hey guys,

I have to extract the frequent item sets using MATLAB. Let me give a flavor of the problem. I have data in the following format:

transaction_id, item_id

For example,

1,1
2,1
3,1
4,1

means that item_id 1 was purchased in transactions 1, 2, 3 and 4. For each item_id I have to maintain a list of the transaction_ids in which it appears. For example, for

item_id = 1
transaction_id = 1,2,3,4 (all those in which this item was purchased)

and similarly for the other item_ids. I believe there are some set operators in MATLAB which can help with this kind of task. This is basically an implementation of the vertical data format (a frequent pattern mining algorithm) in MATLAB. Any ideas, guys?
From: us on 27 Feb 2010 08:50

singlepoint <singlepoint(a)gmail.com> wrote in message <2bdc726d-8951-478a-8bf8-44ee8748f58d(a)u15g2000prd.googlegroups.com>...
> ...
> Any ideas guys?

it is not clear...
you have to learn how to post in this NG (CSSM)...
rather than using a lot of descriptive words, come up with a parsimonious(!) set of input data and output data you'd like to see...
look through CSSM and see how others post...

us
From: ImageAnalyst on 27 Feb 2010 09:19

singlepoint:
Why not just go through in a for loop? This demo I wrote will even handle missing itemIDs and missing transaction numbers. It's intuitive and easy to understand (although long because I expanded it to be explicit, added abundant comments, and had it print to the command window). If you wait long enough, I'm sure someone can provide a one or two liner (though it may be more cryptic and harder to understand - or maybe not). Anyway, this should be fast unless you have millions of records.

Be sure to join any lines split apart by the newsreader!!!

% Startup code.
clc;        % Clear command window.
clear all;  % Get rid of variables from prior run of this m-file.
workspace;  % Show panel with all the variables.

% Generate some sample data
% with missing itemIDs and missing transaction numbers.
transaction_Record = [...
    1,1;...
    2,1;...
    3,1;...
    4,1;...
    22,1;...
    33,1;...
    24, 3;...
    12, 1;...
    16, 4;...
    11, 15;...
    8, 2;...
    7, 15;...
    17, 23;...
    20, 21]
transactions = transaction_Record(:,1);
itemIDs = transaction_Record(:, 2);

% Loop through finding all transaction #'s
% for each item ID.
maxItemID = max(itemIDs)
for item = 1 : maxItemID
    indexesForItem = find(itemIDs == item);
    transactionsForItem = transactions(indexesForItem);
    % Check to see if there is a transaction for this item.
    if ~isempty(transactionsForItem)
        % There is a transaction for this item.
        % Log it.
        caTransactionsForItem{item} = transactionsForItem;
        % Print transactions to the command window.
        message1 = sprintf('\nItem %d occurred in transaction(s):', item);
        message2 = sprintf('%d, ', transactionsForItem);
        fprintf(1, [message1, message2]);
    else
        % There is NO transaction for this item.
        % State that in the command window.
        fprintf(1, '\nNo transactions for item %d', item);
    end
end

The results:

Item 1 occurred in transaction(s):1, 2, 3, 4, 22, 33, 12,
Item 2 occurred in transaction(s):8,
Item 3 occurred in transaction(s):24,
Item 4 occurred in transaction(s):16,
No transactions for item 5
No transactions for item 6
No transactions for item 7
No transactions for item 8
No transactions for item 9
No transactions for item 10
No transactions for item 11
No transactions for item 12
No transactions for item 13
No transactions for item 14
Item 15 occurred in transaction(s):11, 7,
No transactions for item 16
No transactions for item 17
No transactions for item 18
No transactions for item 19
No transactions for item 20
Item 21 occurred in transaction(s):20,
No transactions for item 22
Item 23 occurred in transaction(s):17,
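For larger inputs, the loop in the demo above rescans the whole item column once per possible item ID. A minimal sketch of a single-pass grouping, assuming the same two-column sample data, could use `unique` together with `accumarray`. The names `uniqueItems`, `groupIdx` and `tidLists` are illustrative and do not appear in the original post.

```matlab
% Sketch only: group transaction IDs by item ID in one pass.
% Same sample data as the demo above.
transaction_Record = [ 1,1;  2,1;  3,1;  4,1; 22,1; 33,1; 24,3; 12,1; ...
                      16,4; 11,15; 8,2;  7,15; 17,23; 20,21];
transactions = transaction_Record(:, 1);
itemIDs      = transaction_Record(:, 2);

% Map each item ID to a compact group number 1..numel(uniqueItems),
% so missing item IDs produce no empty slots.
[uniqueItems, ~, groupIdx] = unique(itemIDs);

% Collect the transactions of each group into one cell per item.
% sort() is applied because accumarray does not promise the order
% in which values are handed to the function.
tidLists = accumarray(groupIdx(:), transactions, [], @(x) {sort(x)});

% Print one line per item that actually occurs.
for k = 1 : numel(uniqueItems)
    fprintf('Item %d: %s\n', uniqueItems(k), mat2str(tidLists{k}.'));
end
```

Unlike the demo, this version lists each item's transactions in ascending order and skips item IDs that never occur, so it is a variation on the approach and not a drop-in replacement.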
From: us on 27 Feb 2010 10:01

ImageAnalyst <imageanalyst(a)mailinator.com> wrote in message <3edbc0b2-e843-4358-8f20-1bf8cdd1c0b6(a)m37g2000yqf.googlegroups.com>...
> singlepoint:
> Why not just go through in a for loop?
> ...

one of the other solutions

     t=[
          1 2 3 4 22 33 24 12 16 11  8  7 17 20 % <- transaction
          1 1 1 1  1  1  3  1  4 15  2 15 23 21 % <- item
     ].';
     tu=unique(t(:,2));
     % one cell per unique item, holding its transactions as a row vector
     ar=arrayfun(@(x) t(ismember(t(:,2),x),1).',tu,'uni',false);
% look at transactions for item #1
     ar{1}
% ans = 1 2 3 4 22 33 12

% to beautify output, only
     am=max(cellfun(@numel,ar));
     % prepend the item id and zero-pad each row to a common length
     ar=cellfun(@(x,y) [x,y,zeros(1,am-numel(y))],num2cell(tu),ar,'uni',false);
     ar=cat(1,ar{:});
     disp(ar);
%{
     1     1     2     3     4    22    33    12
     2     8     0     0     0     0     0     0
     3    24     0     0     0     0     0     0
     4    16     0     0     0     0     0     0
    15    11     7     0     0     0     0     0
    21    20     0     0     0     0     0     0
    23    17     0     0     0     0     0     0
%}

us
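The original question mentions set operators and the vertical data format. Once per-item transaction lists exist, the support of an item pair is the number of transactions the two lists share, which `intersect` can compute. The sketch below is only an illustration: the data set `t` and the threshold `minSupport` are hypothetical and were chosen so that some transactions contain more than one item, which the sample data in this thread does not.

```matlab
% Hypothetical data (not from the thread): rows are [transaction_id, item_id].
t = [1 1; 1 2; 2 1; 2 3; 3 1; 3 2; 4 2; 4 3];
minSupport = 2;   % assumed threshold, chosen for illustration only

% Build the vertical format: one sorted transaction list per item.
[items, ~, g] = unique(t(:, 2));
tid = accumarray(g(:), t(:, 1), [], @(x) {sort(x)});

% Support of a pair = number of transactions containing both items.
for a = 1 : numel(items) - 1
    for b = a + 1 : numel(items)
        common = intersect(tid{a}, tid{b});
        if numel(common) >= minSupport
            fprintf('{%d, %d} support %d\n', items(a), items(b), numel(common));
        end
    end
end
```

Larger itemsets could be handled the same way by intersecting the `common` list of a frequent pair with the list of a further item, which is the basic step of vertical-format miners.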
From: singlepoint on 27 Feb 2010 11:11
On Feb 27, 8:01 pm, "us " <u...(a)neurol.unizh.ch> wrote:
> one of the other solutions
> ...

Thank you very much, guys, for your replies. They have been helpful. One more quick question: can anyone tell me how to get the indices of the entries in a sparse matrix? The sparse matrix is of the form

(45,33) 1
(343,433) 1
(34343,333) 1

I need to get the values of the indices. Loops are of no use: if I have to run a loop, I have to go through the whole matrix (including the zero values), so efficiency sucks, and of course that is not the purpose of a sparse matrix. If anyone has a problem understanding my question, please accept my apologies in advance, as I am new to MATLAB and I really don't know how to ask a question.

Regards.
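One built-in that appears to fit this question is `find`, which returns the row indices, column indices and values of the nonzero entries without an explicit loop over the zeros. The sketch below rebuilds a sparse matrix from the three entries listed in the question; the variable names `S`, `rowIdx`, `colIdx` and `vals` are illustrative.

```matlab
% Build a sparse matrix holding the three entries from the question.
S = sparse([45 343 34343], [33 433 333], [1 1 1]);

% With three outputs, find returns the row index, column index and
% value of every nonzero entry, in column-major order.
[rowIdx, colIdx, vals] = find(S);

% One row per nonzero entry: [row, column, value].
disp([rowIdx, colIdx, vals]);
```

Because the results come back in column-major order, the entry in column 333 is listed before the entry in column 433.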