Thorny SQL Set Operation

UPDATE 2: I posted a full script that contains both working solutions in a related post.

UPDATE: Hans developed a query that works. See the comments. It works even if my group is (Female + French), but my Person is (Female + French + Politician) – e.g., extra categories not part of the group are ignored.

Thanks for all the help!

********************************

I’ve been working for a bit on a (possibly) hard SQL query and thought I would write out my problem here, in the hopes that (a) I’ll solve it, and (b) I’ll make the solution available to others via the magic of Teh Google.

Here’s the problem: I have a concept of “category”. Example categories: Man, Woman, Movie Star, Politician, American, French.

I have a concept of “person”. Persons can belong to one or more categories.

I have a concept of “group”. A Group is merely a listing of one or more categories. For example, “Male Movie Stars” is a Group (Man + Movie Star). “Female French Politicians” is a group (Woman + Politician + French).

What I want to do in my query is this: Given a Group, find all Persons who belong to that Group.

pseudo-code query:

select * from Person p
join PersonCategory pc on (p.personid = pc.personid)
join GroupCategory gc ON (pc.categoryid = gc.categoryid)
where gc.groupid = @groupid

However, normal JOIN syntax using the IN operator does an “OR”-type operation, when what I really want is an “AND”-type operation. Using the “Female French Politicians” example from above, this query will return French OR Woman OR Politician. Not what I want.

Thinking…

Advertisements

11 Responses to “Thorny SQL Set Operation”


  1. 1 Tom Music May 28, 2008 at 7:41 am

    No one-query solutions come to mind for me. At least nothing that isn’t really a two-query solution mashed together into a single one.

    The way I would do this would be to run one real fast query that gets the number of categories contained in the specific group, and call this $category_count.

    Then I would use a second query that returned person_ids and the number of category_id matches from that specific group for each person_id. At the end of the query I would stick on a HAVING statement to limit the query results to only show people who matched exactly $category_count number of categories.

    I look forward to seeing what other approaches people recommend!

  2. 2 Hans May 28, 2008 at 8:23 am

    For the AND, I think you’ll need to self join PersonCategory. Not sure how to do it in a single query though. If you know the category_ids for the group in advance, you could use something like:

    SELECT p.*
    FROM Person p, PersonCategory pc1, PersonCategory pc2
    WHERE pc1.person_id = pc2.person_id
    AND pc1.person_id = p.id
    AND pc1.category_id = @category_id_1
    AND pc2.category_id = @category_id_2;

    For each additional category, you’d have to add another self join. Pretty ugly.

  3. 3 Hans May 28, 2008 at 9:00 am

    Something like this might also do the trick:

    select * from Person p
    join PersonCategory pc on (p.personid = pc.personid)
    where pc.personid in (
    select pc.personid from PersonCategory pc
    where pc.categoryid in (
    select cg.categoryid from GroupCategory gc
    where gc.groupid = @groupid
    )
    group by pc.personid
    having count(*) = (
    select count(*) from GroupCategory gc
    where gc.groupid = @groupid
    )
    and pc.categoryid in (
    select cg.categoryid from GroupCategory gc
    where gc.groupid = @groupid
    );

  4. 4 Brian Dorsey May 28, 2008 at 9:24 am

    Another approach using ‘group by’ and ‘having’:
    http://pastebin.com/f69dd5434

    The core query is:

    — Female, French, Politicians
    select p.id, p.full_name, count(*) from person p
    join person_category pc on pc.person_id = p.id
    join category c on c.id = pc.category_id
    where c.id in (2,6,4) — Woman, French, Politician
    group by p.id, p.full_name
    having count(*) = 3 — the number of categories
    order by p.id

  5. 5 Anthony Stevens May 28, 2008 at 9:28 am

    @Hans: thanks! This works. I had just implemented a bitmasking solution that relied on a scalar function IsMember(personid, groupid), but this looks more direct.

  6. 6 Brian Dorsey May 28, 2008 at 9:28 am

    Re-reading Tom’s comment, my solution is the same. (I didn’t get the group info dynamically, but that could be done with sub-selects.)

  7. 7 Anthony Stevens May 28, 2008 at 9:39 am

    @tom music, @brian dorsey: Thanks for your help. The problem with the COUNT method is that extra categories on the person that are not part of the group will throw it off. (I think). So if you have

    Person = (Female + French + Politician)

    and

    Group = (Female + Politician)

    Then the COUNT method will exclude that person because 3 != 2.

  8. 8 Tom Music May 28, 2008 at 10:17 am

    @Anthony: There shouldn’t be the “3 != 2” situation (due to extra categories in the Person but not the Group) because the only Categories that are counted are the ones the Person has that are part of the specific Group being queried.

    Since we’d only count when P.CategoryID = G.CategoryID…

    Person = (Female + French + Politician)
    – and –
    Group = (Female + Politician)

    would only result in a count of 2, since there are only two in common. This would then be compared via HAVING to the count of 2 that comes from the outside statement (counting the CategoryIDs in the specific Group), and voila!

  9. 9 Anthony Stevens May 28, 2008 at 12:18 pm

    @Tom: you’re absolutely correct. I wrote out the version of the query that you and Brian pointed to and posted a full solution at https://xidey.wordpress.com/2008/05/28/comparing-sql-queries-that-solve-my-thorny-sql-set-operation-problem/


  1. 1 Bitmasking Solution For Thorny SQL Set Operation « The Pursuit of a Life Trackback on May 28, 2008 at 9:51 am
  2. 2 Comparing SQL Queries That Solve My Thorny SQL Set Operation Problem « The Pursuit of a Life Trackback on May 28, 2008 at 12:17 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s





%d bloggers like this: