Prev: Additive Creation Dirac's new radioactivities predicts why lithium, beryllium, boron are so rare Chapt 5 #174; ATOM TOTALITY
Next: MATHEMATICIANS GIVE UP YOUR ANCIENT METHODS --LIKE MEASURING" PI" BY TAPE MEASURE--- ADOPT -1 TANGENT AS THE LEAST MEASURE
From: Mathematisch on 18 Jun 2010 04:28 Hi, I hope somebody understands the problem here and can come out with some good ideas/suggestions. What would be a good binary relationship definition in this following scenario: I do a pairwise comparison in "a set of items" to find out how many "features" a given pair of items share. The number of items in the set is about ~55000. So the number of computations goes into > 1 billion. After the computation I get a result table which shows how many features each pair of items share with each other. On the basis of this table I would like to create clusters using the transitive closure (If you have suggestions for any other better suited method, please do tell). At the moment I do not know what would be the best choice for the criterion on: "the minimum number of features a pair of items need to share, so that this pair is considered as related". Is there any mathematical method to specify an optimal number or a sound approximation for that minimum number? Does it make sense to create/ guess that minimum number on the basis of e.g. average number of items shared between all the item pairs? Thanks a lot. Regards, M.
From: William Elliot on 18 Jun 2010 05:02 On Fri, 18 Jun 2010, Mathematisch wrote: > What would be a good binary relationship definition in this following > scenario: > > I do a pairwise comparison in "a set of items" to find out how many > "features" a given pair of items share. > > After the computation I get a result table which shows how many > features each pair of items share with each other. > f(a,b) = f(b,a) = n > On the basis of this table I would like to create clusters using the > transitive closure (If you have suggestions for any other better > suited method, please do tell). > What binary relation? The result table is a symmetric two place function. > At the moment I do not know what would be the best choice for the > criterion on: "the minimum number of features a pair of items need to > share, so that this pair is considered as related". Is there any > mathematical method to specify an optimal number or a sound > approximation for that minimum number? No, those are heuristic decisions. > Does it make sense to create/guess that minimum number on the basis of > e.g. average number of items shared between all the item pairs? > Why not? That's the usual, average rule of thumb. What are you trying to do? Order the items by similarity? For that you'll need a base line. How much are you offering for technical consultation?
From: Chip Eastham on 18 Jun 2010 11:18
On Jun 18, 4:28 am, Mathematisch <mathemati...(a)gmail.com> wrote: > Hi, > > I hope somebody understands the problem here and can come out with > some good ideas/suggestions. > > What would be a good binary relationship definition in this following > scenario: > > I do a pairwise comparison in "a set of items" to find out how many > "features" a given pair of items share. The number of items in the set > is about ~55000. So the number of computations goes into > 1 billion. > > After the computation I get a result table which shows how many > features each pair of items share with each other. > > On the basis of this table I would like to create clusters using the > transitive closure (If you have suggestions for any other better > suited method, please do tell). > > At the moment I do not know what would be the best choice for the > criterion on: "the minimum number of features a pair of items need to > share, so that this pair is considered as related". Is there any > mathematical method to specify an optimal number or a sound > approximation for that minimum number? Does it make sense to create/ > guess that minimum number on the basis of e.g. average number of items > shared between all the item pairs? > > Thanks a lot. > Regards, M. To say that you will "create clusters using the transitive closure" after forming the table counting shared features doesn't give much information about why you want to do this & whether another approach would be better suited. Are there a finite number of "features"? (If so this obviously bounds the number that any pair of items can share.) Should three items that share the _same_ ten features be treated the same as three items that share pairwise ten features but have no features in common to all three? However the method you suggest can certainly be tried to see if it suits your purpose. It seems to me that while no criterion has been proposed that makes one threshold "optimal" or "sound", it is clearly the case that starting with a higher threshold & reducing it will decrease monotonically the number of equivalence classes. You might be interested in this approach to clustering: [Hierarchical clustering] http://www.ics.uci.edu/~eppstein/280/tree.html regards, chip |