From: Rob on 6 Mar 2010 23:35

On Sat, 06 Mar 2010 22:46:24 -0500, "Mr. Arnold" <Arnold(a)Arnold.com> wrote:

>Rob wrote:
>> When selecting records by forcing one field to be unique, is it
>> necessary to use the .Distinct(...) function with an external
>> comparator?
>>
>> EX: Say you had a list of purchase records...multifields. You want to
>> get -just- a list of people who had bought items. The 'purchaser'
>> field must be unique, so you don't just get back the entire list, but
>> you want to retrieve all fields in the record, so you can't just use
>> .Distinct(). (with no args)
>>
>> I've seen examples that require coding an external class, creating an
>> instance, then passing that object to .Distinct(...) but this seems
>> unduly complex. Is there a simpler way? I have Calvert's book, but
>> that doesn't seem to be covered. I thought that this would be built
>> in.
>
>I don't know. If it were me, I might try to this.
>
>
>var list = new List<T>();
>
>var distinctlist= (from a in resultset.distinct() select
>a.purchaser).tolist();
>
>foreach(var distinct in distinctlist)
>{
> var dist = (from a in resultset.Where(a => a.purchaser == distinct
>select a).first();
>
>list.add(dist);
>}

Ah, so my original query was understandable. Yeah, I believe that will work! I should have thought of that, but was thinking in terms of a single step.

And apparently there is no built-in function for enforcing uniqueness of a given field. I'm surprised.

Thanks, Mr. Arnold.
From: Peter Duniho on 7 Mar 2010 00:15

Mr. Arnold wrote:
> [...]
> var list = new List<T>();
>
> var distinctlist= (from a in resultset.distinct() select
> a.purchaser).tolist();
>
> foreach(var distinct in distinctlist)
> {
> var dist = (from a in resultset.Where(a => a.purchaser == distinct
> select a).first();
>
> list.add(dist);
> }

Even not counting the compile errors, I don't see how the above is useful. It will result in a new enumeration that has the same number of elements as the original, but with each element in the enumeration simply being a reference to the first element found for each key.

For example, a collection that looks something like this:

{
  { "One", 1 }, { "One", 2 },
  { "Two", 3 }, { "Two", 4 }, { "Two", 5 }, { "Two", 6 },
  { "Three", 7 }, { "Three", 8 }, { "Three", 9 }
}

will get converted to this:

{
  { "One", 1 }, { "One", 1 },
  { "Two", 3 }, { "Two", 3 }, { "Two", 3 }, { "Two", 3 },
  { "Three", 7 }, { "Three", 7 }, { "Three", 7 }
}

I'm almost certain the OP doesn't want that. It's hard for me to see why anyone would.

A somewhat more realistic interpretation of the original question, though one I'm still skeptical of, is that the original example above should map to this:

{ { "One", 1 }, { "Two", 3 }, { "Three", 7 } }

In other words, each unique key is represented once, with the row being an arbitrarily chosen (e.g. first) element from each matching key.

If we take that as the goal, then the code above could be fixed by passing an appropriate IEqualityComparer<T> to the Distinct() method to filter the results only on the key, rather than the whole record. But if that's the goal, the above code is not very efficient (this is especially true if the source is a real database, where each key results in a whole new query to the database).
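To make the critique above concrete, here is a minimal sketch of Mr. Arnold's loop with only the compile errors fixed (method-name casing and the unbalanced parenthesis), run against Peter's sample data. `Purchase` and `ArnoldLoopDemo` are hypothetical names standing in for the thread's purchase records. Because `Purchase` doesn't override `Equals()`, the parameterless `Distinct()` compares references and removes nothing; with anonymous types it would compare every field, which in this data also removes nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type standing in for a "purchase" row.
public sealed class Purchase
{
    public string Purchaser { get; private set; }
    public int Id { get; private set; }

    public Purchase(string purchaser, int id)
    {
        Purchaser = purchaser;
        Id = id;
    }
}

public static class ArnoldLoopDemo
{
    // Peter's sample data: { "One", 1 }, { "One", 2 }, { "Two", 3 } ... { "Three", 9 }.
    public static List<Purchase> SampleData()
    {
        string[] keys = { "One", "One", "Two", "Two", "Two", "Two", "Three", "Three", "Three" };
        return keys.Select((k, i) => new Purchase(k, i + 1)).ToList();
    }

    // Mr. Arnold's loop, compile errors fixed, logic unchanged.
    public static List<Purchase> Run(List<Purchase> resultset)
    {
        var list = new List<Purchase>();

        // Distinct() with no comparer compares whole records (references here),
        // so every purchaser name survives, duplicates included.
        var distinctlist = (from a in resultset.Distinct() select a.Purchaser).ToList();

        foreach (var distinct in distinctlist)
        {
            // Each lookup re-scans the source and returns the FIRST row for that
            // name, once per duplicate name.
            var dist = resultset.Where(a => a.Purchaser == distinct).First();
            list.Add(dist);
        }
        return list;
    }
}
```

`ArnoldLoopDemo.Run(ArnoldLoopDemo.SampleData())` returns nine rows whose `Id` values are 1, 1, 3, 3, 3, 3, 7, 7, 7: the same shape as the second listing above.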
The following should work, and more importantly should be much more efficient:

var result = resultset
    .GroupBy(a => a.purchaser)
    .Select(d => d.First());

or alternatively:

var result = from d in (from a in resultset group a by a.purchaser)
             select d.First();

or (yet another alternative):

var result = from a in resultset
             group a by a.purchaser into grouped
             select grouped.First();

Of course any of the above results in selecting an arbitrary, essentially random "representative" from each group of records, which, while a reasonable guess as to the original question's intent, still doesn't seem that useful to me.

Hopefully the OP can clarify their question, so that we can stop guessing as to what he's really trying to do. :)

Pete
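Here is a runnable sketch of the first and third GroupBy() forms above, using a hypothetical `Order` class and the same sample data as Peter's example. For an in-memory sequence, `Enumerable.GroupBy()` is documented to yield groups in the order each key first appears and to keep each group's elements in source order, so `First()` picks the earliest row per purchaser. A database query provider isn't necessarily bound by that ordering.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type for this sketch.
public sealed class Order
{
    public string Purchaser { get; private set; }
    public int Id { get; private set; }

    public Order(string purchaser, int id)
    {
        Purchaser = purchaser;
        Id = id;
    }
}

public static class GroupByFirstDemo
{
    // Same sample data as in the thread: "One" x2, "Two" x4, "Three" x3, Ids 1..9.
    public static List<Order> SampleData()
    {
        string[] keys = { "One", "One", "Two", "Two", "Two", "Two", "Three", "Three", "Three" };
        return keys.Select((k, i) => new Order(k, i + 1)).ToList();
    }

    // Method syntax: one pass over the source, one row per purchaser.
    public static List<Order> FirstPerPurchaser(IEnumerable<Order> resultset)
    {
        return resultset
            .GroupBy(a => a.Purchaser)
            .Select(g => g.First())
            .ToList();
    }

    // Query syntax with a continuation ("into"), equivalent to the method form.
    public static List<Order> FirstPerPurchaserQuery(IEnumerable<Order> resultset)
    {
        var result = from a in resultset
                     group a by a.Purchaser into grouped
                     select grouped.First();
        return result.ToList();
    }
}
```

Both methods return the rows with `Id` 1, 3 and 7. In .NET 6 and later, `Enumerable.DistinctBy(a => a.Purchaser)` expresses the same one-row-per-key idea in a single call.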
From: Peter Duniho on 7 Mar 2010 00:29
Rob wrote:
> [...]
> And apparently there is no built-in function for enforcing uniqueness
> of a given field. I'm surprised.

There is. It's the Distinct() overload that has an IEqualityComparer<T> parameter.

I think that GroupBy() really is what you want. It would be much more efficient than repeatedly querying the original collection.

However, if you insist on using the Distinct() method, you may find this little helper class useful:

using System;
using System.Collections.Generic;

static class EqualityComparerFactory
{
    // Takes the sequence only so the compiler can infer T (required for anonymous types).
    public static IEqualityComparer<T> ComparerForEnumerable<T>(IEnumerable<T> e,
        Func<T, T, bool> equals, Func<T, int> hash)
    {
        return new CustomEqualityComparer<T>(equals, hash);
    }

    // For named types: call with the type parameter specified explicitly.
    public static IEqualityComparer<T> Comparer<T>(Func<T, T, bool> equals, Func<T, int> hash)
    {
        return new CustomEqualityComparer<T>(equals, hash);
    }

    // Adapts a pair of delegates to the IEqualityComparer<T> interface.
    private class CustomEqualityComparer<T> : IEqualityComparer<T>
    {
        private Func<T, T, bool> _equals;
        private Func<T, int> _hash;

        public CustomEqualityComparer(Func<T, T, bool> equals, Func<T, int> hash)
        {
            _equals = equals;
            _hash = hash;
        }

        #region IEqualityComparer<T> Members

        public bool Equals(T t1, T t2)
        {
            return _equals(t1, t2);
        }

        public int GetHashCode(T t)
        {
            return _hash(t);
        }

        #endregion
    }
}

Here's some sample code showing how to use it (incorporating the basic code provided by Mr. Arnold, just adding the comparer):

static void Main(string[] args)
{
    Random rnd = new Random();
    string[] rgstr = { "One", "Two", "Three" };

    // 20 anonymous-type records with randomly chosen keys.
    var source = (from i in Enumerable.Range(0, 20)
                  select new { Key = rgstr[rnd.Next(rgstr.Length)], Value = i }).ToArray();

    var result0 = GetEmptyList(source);

    // Compare and hash on Key only, ignoring Value.
    var comparer = EqualityComparerFactory.ComparerForEnumerable(source,
        (x1, x2) => x1.Key.Equals(x2.Key), x => x.Key.GetHashCode());

    // With the key-only comparer, Distinct() yields each key once.
    var distinctlist = (from a in source.Distinct(comparer) select a.Key).ToList();

    foreach (var distinct in distinctlist)
    {
        // Still one extra pass over source per key; GroupBy() avoids that.
        var dist = (from a in source.Where(a => a.Key == distinct) select a).First();
        result0.Add(dist);
    }
}

// The argument exists only so the compiler can infer T (here, an anonymous type).
static List<T> GetEmptyList<T>(IEnumerable<T> enumerable)
{
    return new List<T>();
}

(The sample also needs "using System.Linq;" for the query operators.)

The first factory method, ComparerForEnumerable(), is written to take the IEnumerable<T> you're going to be working with, so that the generic type can be inferred for the method arguments and the IEqualityComparer<T> to return. This is necessary when dealing with anonymous types. Of course, if you are always dealing with named types, you can just use the Comparer() factory method instead, specifying the type parameter explicitly.

Pete
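For named types, the equals and hash delegates passed to a factory like the one above are usually both derived from the same key. A variation, assuming you only ever compare on a single key: build the comparer from one key selector and let `EqualityComparer<TKey>.Default` supply both equality and hashing for that key, so the two can't drift apart. `KeyEqualityComparer`, `Sale` and `KeyComparerDemo` are hypothetical names for this sketch, not framework types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two items are equal when their selected keys are equal.
public sealed class KeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;
    private readonly IEqualityComparer<TKey> _keyComparer = EqualityComparer<TKey>.Default;

    public KeyEqualityComparer(Func<T, TKey> keySelector)
    {
        if (keySelector == null) throw new ArgumentNullException("keySelector");
        _keySelector = keySelector;
    }

    public bool Equals(T x, T y)
    {
        return _keyComparer.Equals(_keySelector(x), _keySelector(y));
    }

    public int GetHashCode(T obj)
    {
        // Hash only the key, so items with equal keys always get equal hashes.
        TKey key = _keySelector(obj);
        return key == null ? 0 : _keyComparer.GetHashCode(key);
    }
}

// Hypothetical named record type.
public sealed class Sale
{
    public string Purchaser { get; private set; }
    public int Id { get; private set; }

    public Sale(string purchaser, int id)
    {
        Purchaser = purchaser;
        Id = id;
    }
}

public static class KeyComparerDemo
{
    public static List<Sale> DistinctPurchasers(IEnumerable<Sale> sales)
    {
        // Type arguments spelled out explicitly, as is possible with named types.
        var comparer = new KeyEqualityComparer<Sale, string>(s => s.Purchaser);
        return sales.Distinct(comparer).ToList();
    }
}
```

`DistinctPurchasers()` returns one `Sale` per purchaser. The `Distinct()` documentation describes its result as unordered; LINQ to Objects in practice keeps the first occurrence of each key, but that is not a documented guarantee, whereas the ordering of `GroupBy()` is documented.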