5.4.3. Multi Join (High-Speed clip0090 action)

<< Click to Display Table of Contents >>

Navigation:  5. Detailed description of the Actions > 5.4. Join Tables >

5.4.3. Multi Join (High-Speed clip0090 action)

 

Icon:   clip0097

 

Function: MultiJoin
 
Property window:
 

clip0103

 

Short description:

Join several tables.

You can have many different KEY columns inside the MASTER table.

 

Long Description:

This Action joins several tables based on different keys. Some definition:

The first input pin contains the MASTER table.

The other input pins (second, third, etc.) contain the SLAVE tables.

During the join, the SLAVE tables are added to (i.e. “joined with”) the MASTER table.

 

 
The clip0097 MultiJoin Action only computes “Left Outer Joins”.

 

Please refer to the section “5.1.7. Composite Primary Keys”, to know how to handle composite primary keys.

 

It is assumed that the column used as key inside the SLAVE tables have no duplicates (i.e. it can be used as primary key). Unexpected results may occur if that’s not the case. Anatella checks for the uniqueness of the keys in the SLAVE tables. This check might take some time and you can thus de-activate it but it’s strongly not recommended. If the check fails (i.e. if there are some duplicate keys inside one of the slave tables), you can investigate this data quality issue using the clip0104 NaïveDeDuplicate Action.

 

 

The clip0097 MultiJoin Action first loads all the SLAVE tables into RAM Memory and then, once the loading is finished, it starts processing the MASTER table row-by-row. Thus, we have the following:
 

The MASTER table is processed row-by-row and can thus have any size (i.e. it can be as big as you want without increasing RAM memory consumption).

The RAM Memory consumption of the clip0097 MultiJoin Action is proportional to the size of the SLAVE tables. If your SLAVE tables are large, you’ll have a large RAM Memory consumption. There are three ways to reduce RAM memory consumption (to be able to handle bigger SLAVE tables):

If the same SLAVE table S is used several times in different joins, connect the SLAVE table S one time to only one of the input pin of the clip0097 MultiJoin Action.

 

For example: We want to know the longitude & latitude of the two individuals A and B, each time they call each other. We have two tables:
 

clip0105

 

This is the best way of performing the 2 joins:

 

clip0106

 

Use a memory-efficient data-type to store the data inside your columns (see section 5.1.2 about data-types). To remind you: The most memory-efficient data type is “Key” (4 bytes per cell) and after “Float” (8 bytes per cell).

You should only select, inside the slave tables, the columns required inside the output table of the clip0097 MultiJoin Action because only these columns will be loaded into RAM memory (thus reducing the memory required to store the SLAVE table). By default, ALL the columns of the SLAVE tables are loaded into RAM memory (this is thus very bad from the RAM memory consumption point-of-view).

 

ANATEL~3_img252

 

         To select some particular columns from the SLAVE tables:

 

1.untick the checkboxes here:

 

clip0107

 

2.click here to select some particular column:

 

clip0108

         

 

Despite the above optimizations, the clip0097 MultiJoin Action might still consume a very large amount of RAM memory: Please also refer to section “5.3.2.7. Main RAM Memory Consumption” to know more about this subject.

 

Typically, the clip0097 MultiJoin Action is used to de-normalize (i.e. put everything inside a single table) the databases that are in “Star Schema”. For example, we have:

 

clip0109

 

It should be obvious, from the look of the above schema, to guess why this schema is named “Star” schema. Inside the database literature, you’ll find that:
 

The large table “in the center” of the schema (i.e. the table named TRANSACTION, in green) is referred as the “FACT” table in the literrature.
 

The small tables “on the border” of the schema (i.e. the blue tables) are referred as the “DIMENSION” tables in the litterature.

 

 
To de-normalize “Star Schema” databases, you will:
 

set as MASTER table of the clip0097 MultiJoin Action the FACT table (in the example above: the TRANSACTION table).

set as SLAVE tables of the clip0097 MultiJoin Action all the DIMENSION tables (in the example above: the blue tables).

 

 

We’ll have the following:

 

ANATEL~3_img257

   

Using column-name-prefixes is important to avoid any “collision”. In the above example, the output table from the clip0097 MultiJoin Action contains the columns: CUSTOMER_Name, PRODUCT_Name, SHOP_Name, SALESMAN_Name, PROMOTION_Name. If we forgot to set any prefixes (Warning: This is the default Anatella behavior!), all these different columns ends up with exactly the same name (that is “Name”) and we have many “column name collisions”. Collisions are detected at runtime inside the clip0110 CSVFileWriter Action and the clip0111 GenericODBCWriter Action.