I have a list that has duplicate items. I'd like to get a count of how many duplicates there are of each item in the list, then remove any duplicates.
For example list = {"a","b","b","a","c","a"} would give me something like {{"a",3}, {"b",2}, {"c",1}}
I've already done this using a double loop, but it's terribly inefficient. Is there some simple way to do this using prebuilt functions from ETO?
My list will have upwards of 1000 items, which is why efficiency matters so much. A double loop would mean a million comparisons. (1000 items each compared 1000 times)
Solved! Go to Solution.
Try this...
Rule myList1 As List = {"a","b","b","a","c","a"}
Rule myList2 As List = Partition(myList1)
Rule myList3 As List
Dim myList As Any
For Each myList In myList2
myList3 = myList3 + {{first(myList), length(myList)}}
Next myList
End Rule
'To remove duplicates from original lst...
Rule myList5 As List = RemoveDuplicates(myList1)
Hope that helpd!
Regards,
Luc
While partition is a very handy function, my experience has shown it is extremely slow with large datasets. I try to avoid it when I can.
You have presented an elegant solution, but I believe the following should be 4-5x faster. You wont notice much difference on smaller datasets, but when ran with 1,000 plus elements, you will gain a lot of time by not using partition.
Uncached Rule qwe As List
Dim tLst As List = {"a","b","b","a","c","a"}
Dim nLst As List = {}
Dim tLstLen As Integer = length(tLst)
Dim uniques As List = removeDuplicates(tLst)
Dim unique As Any
For Each unique In uniques
nLst = nLst + {{unique, tLstLen - length(removeAllOccurrencesOf(unique, tLst))}}
Next
Return nLst
End Rule
Edit:
I wanted to make sure I wasn't remembering incorrectly so I went ahead and ran a test with/without partition. Using the same 10,000 element list of single character strings and caching that list before running either test, I came up with 48623 milliseconds using partition and 8338 without partition.
Yeah for what it's worth, if you're doing a lot of large list processing, it can be faster to write a .net function and call it. Do your list processing in .net and then pass back the value/s.
Wow this is much faster than my double loop method. In C# I can write fairly efficient code, but with Intent I never know what features the language has.
Thanks!
Another possible way could be to sort the list first and then count the repeating items in the sorted list.
Parameter Rule InputList As List
Dim inList As List={"a", "d", "b", "c", "aa", "d", "e", "c", "a", "c", "b"}
Return inList
End Rule
Rule SortList As List
SortList=sort(InputList, :Ascending)
End Rule
Rule CountofItems As List
Dim sortedList As List=SortList
Dim countList As List={}
Dim lengthSortedList=length(sortedList)
Dim prevItem As Any=first(sortedList)
Dim currItem As Any
Dim count As Number = 1
Dim itemcount As Number = 0
While count <= lengthSortedList
currItem=nth(count, sortedList)
If prevItem=currItem
itemcount=itemcount+1
Else
countList=countList+{prevItem, itemcount}
prevItem=currItem
itemcount=0
count=count-1
End If
count=count+1
End While
countList=countList+{prevItem, itemcount}
Return countList
End Rule
I'm fairly new to Intent so I can't compare this to other methods. Generally speaking though, sorting a list and iterating through it should be nlogn complexity (assuming the Sort function is nlogn) even otherwise I would think it would be possible to implement something like a QuickSort to sort the items.
Using a map (to use a C++ STL term) would be another way - iterating through the items in the list once and saving the count of each item in a map (can be accessed in constant time) but again since I'm not completely familiar with Intent, I don't know how to implement something that would be equivalen to a map (maybe child lists).
Here is another method that uses a map to store the occurrence of items in the input list. This can be comparable to using a sorted list and could be useful if the data set is large.
Parameter Rule InputList As List
Dim inList As List={"a","d","b","c","aa","d","e","c","a","c","b"}
Return inList
End Rule
'Count items in a list - using a map
Rule CountofItems_Map As List
Dim countList As List={}
Dim createMap? As Boolean = defineMap(:countMap, reset?:=True)
Dim lengthInputList As Number=length(InputList)
Dim currItem As Any
Dim count As Number = 1
While count <= lengthInputList
currItem=nth(count, InputList)
Dim currItemName As Name=MakeName(currItem)
Dim currItemCount As Integer=getMapValue(:countMap, currItemName)
If currItemCount=NoValue Then
setMapValue(:countMap, currItemName,1)
Else
setMapValue(:countMap, currItemName,currItemCount+1)
End If
count=count+1
End While
For Each currItem In getMapKeys(:countMap)
countList=countList+{{currItem,getMapValue(:countMap,currItem)}}
Next
Return countList
End Rule
Can't find what you're looking for? Ask the community or share your knowledge.