Usually, trees in SQL are created this way (postgresql syntax is used here:

create table tree (
   id int not null primary key,
   parent_id int,
   sortorder int not null,
   foreign key (parent_id) references tree(id) on delete cascade
);

-- enforce uniqueness
create unique index tree_unique_idx1 on tree(parent_id, sortorder) where parent_id is not null;
create unique index tree_unique_idx2 on tree(sortorder) where parent_id is null;

Sortorder defines the ordering for all childs-items with the same parent. Is there a fast solution for querying all children from a specific node down? In PostgreSQL there are extensions that make tree-handling easier, however - they are not portable. Is there a indexable solution using only standard-sql (and maybe stored procedures, triggers), but no vendor specific extension (e.g. HierarchyID in ms-sql)?

asked Feb 09 '11 at 09:53

Neutrino's gravatar image

Neutrino
71115


One Answer:

hi!

The only thing that comes to my mind is to manually implement something like HierarchyID (trigger) and then search using an indexed LIKE.

In your particular case, there is a constraint making parent + sortorder unique (even for the top level nodes). That means that the sortorder of the hierarchy along is enough to make an entry unique.

The "hierarchy" built be the trigger does therefore only need to concatenate the sortorder. That string can then server as LIKE filter (because of the uniqueness) and for the ORDER BY.

Assuming PostgreSQL:

create table tree (
   id int not null primary key,
   parent_id int,
   sortorder int not null,
   hierarchy varchar(1000000),
   foreign key (parent_id) references tree(id) on delete cascade
);

create unique index tree_unique_idx1 
                 on tree(parent_id, sortorder) where parent_id is not null;
create unique index tree_unique_idx2 on tree(sortorder) where parent_id is null;
create index tree_hierarchy on tree(hierarchy varchar_pattern_ops);

create or replace function getHierarchy() returns trigger as $$
declare
    h varchar;
begin
    IF (TG_OP = 'INSERT' AND NEW.hierarchy is not null) THEN
        RAISE EXCEPTION 'Hierarchy is set by a trigger, just leave it NULL';
    END IF;
    IF (TG_OP = 'UPDATE' AND (NEW.ID <> OLD.ID OR NEW.SORTORDER <> OLD.SORTORDER)) THEN
        RAISE EXCEPTION 'holy shit. skipping that cases for this sample';
    END IF;
    IF (NEW.SORTORDER < 0) THEN
        RAISE EXCEPTION 'negative sortorder not supported';
    END IF;

    select into h coalesce(hierarchy || '/' , '') from tree where id = NEW.parent_id;
    NEW.hierarchy := coalesce(h, '') || lpad(to_hex(NEW.sortorder), 8, '0');
    return NEW;
end
$$ language plpgsql;

create trigger setHierarchy
   before insert or update
   on tree for each row execute procedure getHierarchy();

So, that's the easy part ;)

Now, some test-data (lower the amount as you like):

insert into tree values (1, null, 1); -- root node
insert into tree values (2, 1, 1);
insert into tree values (3, 1, 3);
insert into tree values (4, 1, 2);
insert into tree values (5, 1, 4);
insert into tree values (6, 3, 1);
insert into tree values (7, 3, 3);
insert into tree values (8, 3, 2);
insert into tree select generate_series, floor(generate_series * 0.5), generate_series from generate_series(9, 100000);

The query is now very straight, for example:

explain analyze
 select *
   from tree
  where hierarchy like '00000001/00000002/00000009/00000012/00000024/00000048/00000090%' 
  order by hierarchy using ~<~;

------- Simplified: 
Index Scan using tree_hierarchy on tree  (actual time=0.036..1.979 rows=1023 loops=1)
   Index Cond: (((hierarchy)::text ~>=~ '00000001/[...]'::text) 
           AND ((hierarchy)::text ~<~ '00000001/[...]'::text))
   Filter: ((hierarchy)::text ~~ '00000001/[...]%'::text)

Note that there is no Sort operation. Fetching the hierarchy pattern is easy using the PK:

select hierarchy from tree where id = 144;

But now, all the issues with that:

  • sub-selecting the pattern doesn't work. Well, it works, from a functional perspective, but it will not use the index on hierarchy. The reason is that PostgreSQL doesn't know that the returned LIKE expression has a selective prefix. So, it assumes that it doesn't has one (e.g., like it would start with '%') and performs an full table scan (Seq Scan). That's yet another case where using bind parameters (and similar constructs) prevents proper index usage. Other databases might behave different.
  • you can save space by "compressing" the hierarchy. e.g. by using a higher numeric base (I just took hex, because to_hex was at hand).
  • The tree_hierarchy index uses varchar_pattern_ops so that the LIKE matching works, regardless of the locale.
  • The ORDER BY needs the USING ~<~ to work in connection with the varchar_pattern_ops index
  • A very similar approach is sill possible, if the constraints don't make the sortorder-hierarchy unique--But it would be required to encode the ID into the hierarchy as well.
  • A similar approach should work in other DB's as well, However, if the tree grows deep, the hierachy might get long. So you might reach certain limitations--Such as Oracles infamous ORA-01450
  • This sample implementation is probably buggy.

answered Feb 09 '11 at 15:44

Markus%20Winand's gravatar image

Markus Winand ♦♦
93651120

edited Feb 13 '11 at 09:44

The above queries work, though the result is not really orderded, at best it can be ordered to have parent first, then childs. The real challenge is to get the result in correct order: Suppose you have a tree like this:

A(ID=1, sortorder=1)
.C(ID=2, sortorder=1)
.B(ID=3, sortorder=2)

(in that order). How to get the result correctly ordered?

(Feb 10 '11 at 03:49) Neutrino
1

@Neutrino Update. Hope that's what you expected.

(Feb 13 '11 at 09:33) Markus Winand ♦♦

Works fine, and performance is good. Though I'm a bit disappointed, because this solution is well known to me (I used it alot in a project and hoped to see other approaches). One of the drawbacks of this approach is that updating sortorder of a node or moving nodes requires to do much more housekeeping in the hierarchy field (you already made prepartions for this in the trigger)

Regarding compression, I used base 64 here, that helped a bit to keep the hierarchy indexable (Postgres is limiting this to 255 chars).

(Feb 16 '11 at 14:44) Neutrino

I don't think that there is another solution. The only possible improvement is to keep the string short. e.g. not mapping the complete 'int' range (e.g., if you know you don't have more than X childs per parent), removing the separator ('/') which is not needed because the SORTORDER must be stored at a fixed length. Updating the SORTORDER is obviously lot of work for the DB because all the records must be physically moved in the index. However, the SQL should be rather straight--I guess--and can use the index as well.

(Feb 17 '11 at 02:25) Markus Winand ♦♦