Grouper groups DDL v7

The goals are:

  • Optimize for SELECT/join speed
  • Compact storage

Here is a suggestion for how grouper data will be stored in the database (not all tables accounted for)


  • Take out individual privileges, individual privs can be internally represented as groups in a private stem.  Recommend not to use individual privs.  How would GrouperAll work?  Same way?  Maybe make another type of group "group/role/entity/privilege".   Or it could be "internal" group, if we think we want to use internal groups for other things too...
  • Each Grouper data structure (group, stem, etc), will have a "core" table, and a 1-to-1 "more" table (optional). 
    • The core table should follow the 80/20 rule and have what is generally queried to make things fast. 
    • If the objects are going to the UI or being edited, the "more" table will be needed.
  • Tables which are currently shared for multiple reasons (e.g. memberships and privileges, or attributes and permissions) will be split out for single purpose
    • An exception to this is the generic metadata table (description, etc), which is 1-to-1 with the "owner" object but does not need SQL foreign key
  • If a table has multiple mutually exclusive foreign key cols, these can be split out into an intermediate table (e.g. privilege owners and attribute owners)
  • Indexes will be scrutinized and only added as needed.  Indexes can be smaller since fewer large primary key and foreign key cols
  • "Enum" type columns will be generally one single CHAR type using capital letter (e.g. which type of privilege, or if enabled or not).  If there are more than 62 options (alphanumeric cap/lower/digit), or if we want to make things more clear, then do two or three chars.
  • The APIs for inserting/updating/deleting objects in Grouper will be batchable (can take N tasks and do them with database batches)
  • Linkage tables (e.g. "attribute assign" tables do not need primary keys or other standard cols).  A daemon can clean things up
  • The Grouper primary key data type will be as follows.  Note, this id is not exposed via web service or provisioning or exporting and is internal only.  This will be used for foreign keys instead of uuid.
  • Use the id_index stragey for primary key ints where a certain number are reserved per JVM.  Some might be wasted but thats ok
  • TODO take out metadata table and add "more" table to other tables
Grouper primary key
DatabaseTypeNotes
Oraclenumber(38)
Postgresbigint
Mysqlbigint
  • The Grouper UUID data type will be binary (risk of text compare issues with institutions which store uuids).  The membership all view could have a composite key of two bigints or a 16byte col from a function based on the two 8 byte bigints.
Grouper UUID
DatabaseTypeNotes
Oracleraw(16)
Postgresbytea
Mysqlbinary(16)
  • Grouper timestamps are timestamp type
Grouper timestamp
DatabaseTypeNotes
Oracletimestamp
Postgrestimestamp
Mysqltimestamp
  • id_index 



grouper_v3_group_core table
ColumnTypeNotes
id_intGrouper primary keyFrom grouper_v3_group_seq
Primary key of this table
Foreign key to other tables
uuid

Grouper UUID

UUID binary data
Functions will convert to hex (but could be case issues)
id_indexBigintNumeric ID that can be used outside of Grouper
parent_stem_id_intGrouper primary key bigintForeign key to id_int in grouper_v3_stem_core.id_int
enabledChar(1)T | F
extensionVarchar(255)
nameVarchar(1024)
alternate_nameVarchar(1024)
type_of_groupChar(1)
last_updatedBigintmillis since 1970 for optimistic locking... increment by 1 if same as before


Group indexes.  Note, do we need a type_of_group index?  Note, what type of index is group_enabled_idx (bitmap)?

grouper_v3_group_core indexes
NameColumnsNotes
group_id_int_idxid_int


group_uuid_idxuuid
group_alternate_name_idxalternate_name
group_name_idxname
group_parent_and_ext_idxparent_stem_id_int, extension
group_parent_and_dispext_idxparent_stem_id_int, display_extension
group_enabled_idxenabled
group_enabled_timestamp_idxenabled_timestamp_millis
group_disabled_timestamp_idxdisabled_timestamp_millis
group_last_membership_idxlast_membership_change
group_last_imm_membership_idxlast_imm_membership_change


Note: this could be in the core table with references to the dictionary table

grouper_v3_group_more table
ColumnTypeNotes
group_id_intGrouper primary key

Foreign key to grouper_v3_group_core, 1-to-1
Primary key of this table

last_mship_change_millisGrouper timestamp bigint
last_imm_mship_change_millisGrouper timestamp bigint
display_extensionVarchar(255)
display_nameVarchar(1024)
descriptionVarchar(4000)
enabled_timestamp_millisGrouper timestamp bigint
disabled_timestamp_millisGrouper timestamp bigint
metadata_id_intBigintSame value as id_int in grouper_v2_metadata table
Foreign key
last_updatedBigint

We do not need this if descriptions are in the dictionary table

grouper_v3_metadata table
ColumnTypeNotes
id_intGrouper primary keyFrom grouper_v3_metadata_seq
descriptionVarchar(1024)
the_typeChar(1)M (membership), etc
Just so we know when "foreign key like" went awry
last_updatedBigint


Convert all composites to JEXL scripted groups?

grouper_v3_composite table
ColumnTypeNotes
id_intGrouper primary keyFrom grouper_v3_metadata_seq
owner_group_id_intGrouper primary key bigintValue from grouper_v3_group.id_int
left_factor_group_id_intGrouper primary key bigintValue from grouper_v3_group.id_int
right_factor_group_id_intGrouper primary key bigintValue from grouper_v3_group.id_int
typeChar(1)I (intersection), C (complement), U (union), X (exclusive or)
last_updatedBigint



grouper_v3_object_field table
ColumnTypeNotes
internal_idGrouper primary keyPrimary key of this table
Foreign key to other tables
stem_internal_idBigintForeign key to stem internal_id
group_internal_idBigintForeign key to group internal_id
attribute_def_internal_idBigintForeign key to attribute def internal_id
field_internal_idBigintForeign key to field internal_id
last_updatedTimestamp


grouper_v3_membership table
ColumnTypeNotes
internal_idGrouper primary keyPrimary key of this table
Foreign key to other tables
uuid

Grouper UUID

UUID binary data
Functions will convert to hex (but could be case issues)
member_internal_idBigintForeign key to member internal_id
object_field_internal_idBigintForeign key to object_field internal_id
enabled_timestamp_millisTimestamp
disabled_timestamp_millisTimestamp
enabledChar(1)T | F
metadata_id_intBigintSame value as id_int in grouper_v3_metadata table


Note, the "self group sets" will still be in the grouper_v3_group_mship_set table so you can query memberships with no outer join or union.  Perhaps there is no unique key so there could be multiples that a full sync could deal with

grouper_v3_group_mship_set table
ColumnTypeNotes
id_intGrouper primary key
owner_object_field_internal_idGrouper primary key bigintForeign key to object_field internal_id
member_object_fie_internal_idGrouper primary key bigintForeign key to object_field internal_id


grouper_v3_gr_attr_asgn table (composite key)
ColumnTypeNotes
group_id_intGrouper primary key bigintForeign key to id_int in grouper_v3_group_core.id_int
attr_asgn_id_intGrouper primary key bigintForeign key to id_int in grouper_v3_attr_def_name_core.id_int


grouper_v3_attr_asgn_asgn table (composite key)
ColumnTypeNotes
attr_asgn_owner_id_intGrouper primary key bigintForeign key to id_int in grouper_v3_attr_asgn.id_int
attr_asgn_asgn_id_intGrouper primary key bigintForeign key to id_int in grouper_v3_attr_asgn.id_int


grouper_v3_attr_asgn table
ColumnTypeNotes
id_intGrouper primary keyFrom grouper_v3_attr_asgn_seq
Primary key of this table
Foreign key to other tables
uuid

Grouper UUID

UUID binary data
Functions will convert to hex (but could be case issues)
attribute_def_name_id_intGrouper primary key bigintForeign key to id_int in grouper_v3_attr_def_name_core.id_int
enabled_timestamp_millisGrouper timestamp bigint
disabled_timestamp_millisGrouper timestamp bigint
enabledChar(1)T | F
metadata_id_intBigintSame value as id_int in grouper_v3_metadata table
the_typeChar(1)G (group), H (group assign), S (stem), T (stem assign) etc
Just so we know when foreign key like went awry
last_updatedBigint


grouper_v3_attr_val_st table
ColumnTypeNotes
id_intGrouper primary keyFrom grouper_v3_attr_val_seq
Primary key of this table
Foreign key to other tables
attr_asgn_id_intGrouper primary key bigintForeign key to id_int in grouper_v3_attr_asgn.id_int
last_updatedBigint
value_stringVarchar(4000)
metadata_id_intBigintSame value as id_int in grouper_v3_metadata table
grouper_v3_attr_val_in table
ColumnTypeNotes
id_intGrouper primary keyFrom grouper_v3_attr_val_seq
Primary key of this table
Foreign key to other tables
attr_asgn_id_intGrouper primary key bigintForeign key to id_int in grouper_v3_attr_asgn.id_int
last_updatedBigint
value_integerBigint
metadata_id_intBigintSame value as id_int in grouper_v3_metadata table
grouper_v3_audit table
ColumnTypeNotes
id_intGrouper primary keyFrom grouper_v3_audit_seq
Primary key of this table
Foreign key to other tables
member_id_intMember primary key bigintForeign key to id_int in grouper_v3_member.id_int
timestamp_millisBigintMillis since 1970 when this happened
actionBigint

Foreign key to action table which has category and action

First two chars are category, last one is action

MEI - membership insert
MED - membership delete
MEU - membership update 
GRI - group insert
COI - configuration insert
GHE - GSH template execute
GPI - group privilege insert
etc

ip_address_id_intbigintforeign key to grouper_v3_audit_ipaddress
enginechar(1)U - UI
W - WS
G - GSH
D - Daemon
host_id_intbigintForeign key to grouper_v3_audit_host
grouper_v3_audit_host table
ColumnTypeNotes
id_intBigint
hostnamevarchar(50)Hostname
grouper_v3_audit_ipaddress table
ColumnTypeNotes
id_intBigint
ip_address_v6byte(16)ipv6
ip_address_v4byte(4)ipv4
grouper_v3_audit_group table
ColumnTypeNotes
audit_id_intAudit primary key bigintForeign key to grouper_v3_audit.id_int
group_id_intBigintForeign key to grouper_v3_group.id_int
relationchar(1)

Describe the relationship to the audit entry, e.g.

O - owner of the group where membership changed
L - left factor of composite

grouper_v3_audit_member table
ColumnTypeNotes
audit_id_intAudit primary key bigintForeign key to grouper_v3_audit.id_int
member_id_intBigintForeign key to grouper_v3_member.id_int
relationchar(1)

Describe the relationship to the audit entry, e.g.

M - member added to group where membership changed

grouper_v3_audit_dictionar_rel table
ColumnTypeNotes
audit_id_intAudit primary key bigintForeign key to grouper_v3_audit.id_int
audit_dictionary_idbigintForeign key to grouper_v3_audit_dictionary
relationchar(1)For any given action there can be dictionary terms associated.  e.g.
P - privilege type
grouper_v3_audit_dictionary table
ColumnTypeNotes
id_int

Grouper primary key

positive int

From grouper_v3_audit_seq
Primary key of this table
Foreign key to other tables
termvarchar(4000)

Some term (unique) e.g.
R - READ privilege

grouper_v3_audit_dictionary_large table
ColumnTypeNotes
id_int

Grouper primary key

negative int

From grouper_v3_audit_seq
Primary key of this table
Foreign key to other tables
termclob

For values larger than 4k

term_size_bytesint

Notes

Oracle: use hextoraw and rawtohex to get UUIDs
Postgres: decode('DEADBEEF', 'hex');
Mysql: CONV(string, 16, 2)

Batch updates

For example, create a bunch of stems, all at once

  • Create a stem
    • (1) insert into grouper_stems
    • (2) insert into grouper_privileges
    • (3) insert into grouper_audit_log
  • Create a stem
    • (11) insert into grouper_stems
    • (12) insert into grouper_privileges
    • (13) insert into grouper_audit_log
  • Create a stem
    • (21) insert into grouper_stems
    • (22) insert into grouper_privileges
    • (23) insert into grouper_audit_log

Process these changes

In a transaction, do three batched SQLs

  • (1) insert into grouper_stems, (11) insert into grouper_stems, (21) insert into grouper_stems
  • (2) insert into grouper_privileges, (12) insert into grouper_privileges, (22) insert into grouper_privileges
  • (3) insert into grouper_audit_log, (13) insert into grouper_audit_log, (23) insert into grouper_audit_log

If there is a problem, then do each unit of work in a transaction and report the error

Note, the loader would not need threads anymore, just do batches of work

Note, rules and hooks would need to be honored.  If a create is vetoed still continue with others.  Hooked would be able to add work

An interface could tell Grouper which object types are batchable

Get rid of hooks?

Change possibilities of rules?

Database ideas (legacy wiki)

Lets look at the data design of Grouper and see what gives us the most benefits and is doable in a reasonable amount of time.

Goals are to improve performance, reduce the amount of storage required in the database, reduce the amount of memory required in JVMs

https://www.percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-performance-lets-discuss/

https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439

IdeaDescriptionNotes
Database function library

Can each database support a basic grouper function library

  1. Decode uuid binary to string
  2. Convert string uuid to binary
  3. Decode basic codes to and from (e.g. field int to name)
If we can have a function library we can 
convert views to be the exact same (perhaps)
and reduce database storage and foreign keys
Remove dashes in uuidsOld uuids have dashes, lets remove

Need to change all foreign keys and values.  Find uuids in attribute values (rules) etc and convert.

This is so we can convert to and from binary.  APIs and WS should be backwards compatible

UUIDs to binary

UUIDs to binary would change from 34 chars (40 varchar) to 16 bytes

Not sure if a char is always a byte, might be more.

Need to see what queries, joins, and ORM would look like.

Database functions to convert these and Java helper methods to convert in consistent way

UUIDs to base64?

If we cant change from UUID varchar to binary, maybe go to base64.

Goes from 34 chars (40 varchar) to 22 chars

Note this is not needed if we can go to binary
UUIDs more sequential?Indexes on UUID might do better if somewhat incrementingMaybe first 2? bytes could be number of days since 1970, and they would generally keep the index 
in the same page for a given period of time
Add id index on all tables
(bigint 8 bytes)
Some tables have id index right now.  All can.  Same algorithm as now where we reserve ids?  Or database function?
Change primary key to big int id indexInstead of UUID
Change foreign keys to id indexInstead of UUID
Do all tables need a uuid?See where uuid is needed on all tables.  Maybe a bigint would workNote we want to keep web services backwards compatible, keep uuid for all major
outward facingtables.  E.g. do audits and PIT need uuids?
Do we need audit cols?
Other cols to remove?
Do we need created_on, created_by, last_updated and updated_by?
Maybe just last updated in a 4 byte number minutes since 1970 gives rough idea of age?
Should auditing tables be responsible for capturing that?  Make sure audit purging doesnt delete this.
Refactor audit tablesWhat needs to be stored?  Make tables for group or object data?  Index tables?Make queries efficient.  Index data.
Can we remove indexes?Can we remove any indexes?
Convert label keys to one char or inte.g. If there is a small list of enums, consider just using one char or small int
Remove lookup tables?

e.g. do we need grouper_field or can that be handled by DB functions and tinyint for col?

e.g. grouper_field_convert_id_name(5) → readers
grouper_field_convert_name_id('readers') → 5
grouper_field_is_access_by_id(5) → true


Can we use more bitmap indexes?Do attribute values need bitmaps?
Harmonize on time columnsDate/Timestamp type or millis/micros from 1970?  Be consistentDB functions can help convert.  Does this matter or just for new cols?
Move fields to other tables?Or have multiple ORM mappings.  e.g. we dont need group description
and friendly name when calculating memberships

Example: grouper_memberships

ColumnCurrent size (bytes)New size (bytes)Description
id3416from varchar to binary
id_index08new primary key
member_id348from varchar uuid to bigint id index
owner_id348from varchar uuid to bigint id index
field_id341from varchar uuid to a smallint
owner_group_id
owner_stem_id
owner_attr_def_id
348

from varchar uuid to bigint id index

these are mutually exclusive so counted once

via_composite_id348from varchar uuid to bigint id index
enabled11same
enabled_timestamp88same
disabled_timestamp88same
mship_type121theres only a few types, just represent in one char
creator_id340not needed, this is in audits
create_time80not needed, this is in audits
hibernate_version_number88same
context_id3416from varchar uuid to bigint id index
TOTAL385 bytes99 bytes74% of the storage eliminated
indexes smaller and faster

Example: grouper_attribute_assign

ColumnCurrent size (bytes)New size (bytes)Description
id3416from varchar to binary
id_index08new primary key
owner_group_id
owner_stem_id
owner_attr_def_id
etc
348

from varchar uuid to bigint id index

these are mutually exclusive so counted once

attribute_assign_action_id348from varchar uuid to bigint id index
attribute_def_name_id348from varchar uuid to bigint id index
notes1010can be more but generally blank so
estimate 10.  Maybe move to another table?
attribute_assign_delegatable151theres only a few types, just represent in one char
attribute_assign_type151theres only a few types, just represent in one char
disallowed11same
enabled11same
enabled_timestamp88same
disabled_timestamp88same
last_updated80not needed
mship_type121theres only a few types, just represent in one char
creator_id340not needed, this is in audits
create_time80not needed, this is in audits
hibernate_version_number88same
context_id3416from varchar uuid to bigint id index
TOTAL298 bytes87 bytes71% of the storage eliminated
indexes smaller and faster

Example: grouper_group_set

ColumnCurrent size (bytes)New size (bytes)Description
id3416from varchar to binary
id_index08new primary key
owner_group_id
owner_stem_id
owner_attr_def_id
etc
348

from varchar uuid to bigint id index

these are mutually exclusive so counted once

owner_group_id_null
owner_stem_id_null
owner_attr_def_id_null
etc
348

from varchar uuid to bigint id index (not null)

these are mutually exclusive so counted once

member_group_id
member_stem_id
member_attr_def_id
etc
348

from varchar uuid to bigint id index

these are mutually exclusive so counted once

member_id348from varchar uuid to bigint id index
field_id341from varchar uuid to a smallint
member_field_id341from varchar uuid to a smallint
owner_id348from varchar uuid to bigint id index
mship_type161from varchar uuid to a smallint
depth44same
via_group_id348from varchar uuid to a smallint
parent_id348from varchar uuid to a smallint
creator_id340not needed, this is in audits
create_time80not needed, this is in audits
hibernate_version_number88same
context_id3416from varchar uuid to bigint id index
TOTAL512 bytes95 bytes91% of the storage eliminated
indexes smaller and faster
  • No labels

1 Comment

  1. We might wish to consider sequences for doing the IDs.  99% of my DB experience is Postgresql, so please forgive me I point out things that are unique to that system.  Sequences are used for IDs because unlike all(?) other operations, the increment of a sequence is immediately visible outside of the transaction, prior to commit.  So you don't get into locking situations.  However, if transaction A gets a value, then B gets one, and then A rolls back, then an value will be skipped.  Since sequences are 64 bit, running out isn't generally an issue.  This is easy enough for you to see by creating a sequence, starting transactions in two windows and incrementing the sequence and getting the value.  A sequence can be reset if needed, too.


    create sequence test;
    begin;
    select * from test;
    select nextval( 'test' );
    ..
    rollback;

    You'll see that the sequence will not roll back to the original value.  Postgresql has the data type SERIAL which is just something like:


    create table test2(
        id bigint not null default nextval( 'test' ),
        name varchar(256) not null default ''
    );
    insert into test2( name ) values( 'test' );
    insert into test2( name ) values( 'test1' );
    insert into test2( name ) values( 'test2' )
    select * from test2;
    id | name
    ----+-------
    4 | test
    5 | test1
    6 | test2
    (3 rows)

    In oracle, SERIAL can be emulated with triggers.  I looked this up for mariahdb as well, and there are a way to do it.  We could just use triggers..

    pg_dump (dumps a database as SQL, more or less) handles sequences appropriately.  It sets the last known value and the reload has the actual id value in the data, so the default is not taken.

    Also, you probably want to make sure UNIQUE is set on every column that uses a SERIAL or emulated SERIAL.  You can do

    insert into test2( id, name ) values( 7, 'test2' );

    which would set id to 7, but have no impact on the underlying sequence.  If the sequence was less than seven, eventually, it would get there, and if past, you might create other bad situations.  I suppose this is pretty obvious since any id column is bound to be a foreign key and indexed anyhow.


    Mike