do fast intersections in python
November 14th, 2007 by Lawrence David
let’s say you’ve got two lists creatively named: list1 & list2, and you’d like their intersection.
an easy way to get this intersection would be to iterate through both lists:
intersect = filter(lambda i: i in list1,list2)
this is slow though for large lists, as it takes roughly M X N steps, where M and N are the lengths of list1 and list2 respectively.
a much more efficient approach is to use hashes (or dicts, in python-speak):
dict1 = {}
dict2 = {}
for ele in list1:
dict1[ele] = 1
for ele in list2:
if ele in dict1:
dict2[ele] = 1
intersect = dict2.keys()
by using dictionaries, this approach only takes M + N steps.
Actually, there’s the set datatype that’s been around for a few versions, too. It behaves like a dictionary that has keys but no values.
>>> sentence_one = “let’s see what words these two sentences have in common”
>>> sentence_two = “these two sentences might have a few of the same words”
>>> list1, list2 = sentence_one.split(‘ ‘), sentence_two.split(‘ ‘)
>>> list1, list2
(["let's", 'see', 'what', 'words', 'these', 'two', 'sentences', 'have', 'in', 'common'], ['these', 'two', 'sentences', 'might', 'have', 'a', 'few', 'of', 'the', 'same', 'words'])
>>> set1, set2 = set(list1), set(list2)
>>> set1 | set2
set(['a', 'what', 'words', 'these', 'of', 'sentences', 'two', 'same', "let's", 'few', 'see', 'common', 'have', 'in', 'the', 'might'])
>>> set1 & set2
set(['these', 'words', 'two', 'have', 'sentences'])
>>> list(set1 & set2)
['these', 'words', 'two', 'have', 'sentences']
yeah sets is even faster