Welcome to Keen Software House Forums! Log in or Sign up to interact with the KSH community.
  1. You are currently browsing our forum as a guest. Create your own forum account to access all forum functionality.

Remove duplicates from list

Discussion in 'Programming Questions and Suggestions' started by yokmp, Oct 14, 2015.

Thread Status:
This last post in this thread was made more than 31 days old.
  1. yokmp Trainee Engineer

    Messages:
    40
    Because the PB cant use LINQ (so Distinct().ToList() is not an option) I tried to compare items by a loop:
    Code:
    // sort the list first so we get dupes next to each other
        list.Sort();
    // now loop through list
        for (int i = 0; i < list.Count; i++)
        {
    // if there is only 1 item in it or we would exceed the index by adding 1 break the loop
            if (list.Count == 1 || list.Count < i+1)
                break;
    // compare each pair and if true remove the index and lower i by 1
            if (list[i] == list[i+1])
            {
                list.RemoveAt(i);
                i--;
            }
        }
    It seems to be an easy task but i always get 'Failed to compare two elements of the array.'

    Whats the issue here or how can i get rid of duplicates?
     
  2. d4rky1989 Apprentice Engineer

    Messages:
    332
    Hi, modifying a list while iterating is IMO always a bad practice as it is prone to error.
    This is a much cleaner way which also does not change the order of the elements of the list (may be required in some cases):
    Code:
    public static void Distinct <T> (List<T> list) {
      var elems = new HashSet<T> ();
      list.RemoveAll (item => !elems.Add (item));
    }
     
    Last edited: Oct 23, 2015
  3. yokmp Trainee Engineer

    Messages:
    40
    In my case the order is no point to care about (i guess). So I came up with this:
    Code:
    // after few conditions fill a blacklist
    List<IMyCubeGrid> gridlist = new List<IMyCubeGrid>();
    for (int i = 0; i < blockGrid.Count; i++)
    {
        IMyShipConnector CurrentItem = ((IMyShipConnector)blockGrid[i]);
        if (!gridlist.Contains(CurrentItem.CubeGrid))
        {
            gridlist.Add(CurrentItem.CubeGrid);
        }
    }
    
    // and later on remove items from another list which are on the blacklist
    for (int i = 0; i < currentList.Count; i++)
    {
        if (gridBlacklist.Contains(currentList[i].CubeGrid))
        {
            currentList.RemoveAt(i);
        }
    }
    This code is, more or less, the same as i wrote in my last post but it works. Do you see any pitfalls beside a changed order?
     
  4. d4rky1989 Apprentice Engineer

    Messages:
    332
    Well, yes. The first one does work. But it has a bad complexity (O(n^2) if I'm correct). In the case of space engineers where the total amount of blocks is fairly small it shouldn't be a problem.
    My shown code should have a linear complexity assuming HashTaple.Add has a complexity of O(1) and List.RemoveAt has a complexity of O(n).
    If the order is not a problem you can remove duplicates this way, too:
    Code:
    gridList = new List<IMyCubeGrid> (new HashSet<IMyCubeGrid> (blockList.ConvertAll (b => b.CubeGrid)));
    The second is indeed wrong. You are skipping items as you modify the list while iterating it.
    Code:
    currentList.RemoveAll (curItem => gridBlackList.Contains (curItem));
    If gridBlackList is a HasSet this operation should perform faster.
     
    Last edited: Oct 17, 2015
  5. Malware Master Engineer

    Messages:
    9,862
    @yokmp
    Code:
    for (int i = currentList.Count - 1; i >= 0; i--) {
      if (gridBlacklist.Contains(currentList[i]).CubeGrid))
    {
    currentList.RemoveAt(i);
    }
    }
    
    This should be the most performant way to remove items from a List, because it doesn't have to move items within the list, all it does is decrease the item count. Also I agree with @d4rky1989 that a HashSet for the blacklist is the most performant way to do the comparison.
     
  6. d4rky1989 Apprentice Engineer

    Messages:
    332
    Just did a short test as I was interested in the performance differences between different approaches:
    - 32767 duplicated integers (65534 items in total)
    - For-loop (i++): ~220 ms
    - For-loop (i--): ~420 ms
    - RemoveAll: ~5 ms

    These are very interesting results. The only explanation I can think of why the RemoveAll method is so fast is that it may be implemented using a low-level language (f.i. asm or C/C++).

    The bad performace of the loop (i--) vs (i++) could be that iterating a list backwards counters the hardware prefetching and caching mechanisms. Or the reason can be found in the implementation of the list together with its RemoveAt method.

     
    Last edited: Oct 23, 2015
  7. yokmp Trainee Engineer

    Messages:
    40
    Maybe RemoveAt() doesnt keep the List in memory so it had to load it over and over again or RemoveAll() is just implemented in a low lvl lang.
    At least my stuff works now, you can test it here if you're interested:
    <Grid based Item Sorting>
     
  8. ephraimdov Trainee Engineer

    Messages:
    2
    List<String> duplicates = lst.GroupBy(x => x) .Where(g => g.Count() > 1) .Select(g => g.Key) .ToList();

    More about...C# List

    Dov
     
  9. Malware Master Engineer

    Messages:
    9,862
  10. Regn Trainee Engineer

    Messages:
    74
    In PHP, I only use foreach and basic if arguments to iterate through what I don't want in the loop. In fact, I created my own website/login system, the whole shebang, and I only ever used "for" loop once to generate a unique ID that could be stored in the database. Very simple, and efficient. Decidedly the fastest, most secure and easiest login system and forum I've ever used was one of my own design. In C#, however, foreach is a maze to avoid at all cost as it spew errors at the first minor inconsistency it finds. What I've discovered is that C sharp's OOP makes it "easy" to write ugly redundant code, for no real benefit what so ever, at the expense of being strict where it shouldn't. Couldn't possibly detest a language more than C#, to be frank. This is, of course, just my personal opinion, as I'm sure a lot of people prefer C# over anything else.
     
Thread Status:
This last post in this thread was made more than 31 days old.