Feed on
Posts
Comments

let’s say you’ve got two lists creatively named: list1 & list2, and you’d like their intersection.

an easy way to get this intersection would be to iterate through both lists:

intersect = filter(lambda i: i in list1,list2)

this is slow though for large lists, as it takes roughly M X N steps, where M and N are the lengths of list1 and list2 respectively.

a much more efficient approach is to use hashes (or dicts, in python-speak):

dict1 = {}

dict2 = {}

for ele in list1:

dict1[ele] = 1

for ele in list2:

if ele in dict1:

dict2[ele] = 1

intersect = dict2.keys()

by using dictionaries, this approach only takes M + N steps.


Bookmark and Share

if that was helpful ...

check out the other tips and tricks i've compiled on these pages. you might learn something else interesting!

2 Responses to “do fast intersections in python”

  1. on 14 Nov 2007 at 5:37 pm Jason Scheirer

    Actually, there’s the set datatype that’s been around for a few versions, too. It behaves like a dictionary that has keys but no values.

    >>> sentence_one = “let’s see what words these two sentences have in common”
    >>> sentence_two = “these two sentences might have a few of the same words”
    >>> list1, list2 = sentence_one.split(‘ ‘), sentence_two.split(‘ ‘)
    >>> list1, list2
    (["let's", 'see', 'what', 'words', 'these', 'two', 'sentences', 'have', 'in', 'common'], ['these', 'two', 'sentences', 'might', 'have', 'a', 'few', 'of', 'the', 'same', 'words'])
    >>> set1, set2 = set(list1), set(list2)
    >>> set1 | set2
    set(['a', 'what', 'words', 'these', 'of', 'sentences', 'two', 'same', "let's", 'few', 'see', 'common', 'have', 'in', 'the', 'might'])
    >>> set1 & set2
    set(['these', 'words', 'two', 'have', 'sentences'])
    >>> list(set1 & set2)
    ['these', 'words', 'two', 'have', 'sentences']

  2. on 09 Feb 2009 at 8:06 am bob

    yeah sets is even faster

Did I get this wrong? Let me know!

Trackback URI | Comments RSS

More blogs about http://desk.stinkpot.org:8080/tricks.