ETL PROCESS TO A RESIDENCE SYSTEM (PART ONE)

Adonis Tejeda Rivera
4 min readNov 9, 2020

Hello everyone… I recently started in this world called Business Intelligence thanks to an elective chosen at my university, I’m very excited about everything that I am learning every day that is why I decided to write a series of blogs to show you my knowledge acquired.

So before read continued this blog, I recommend this interesting article of Analytics Vidyha about the difference between OLTP and OLAP.

Figure 1

As we can see in Figure 1. the relational model of the residence system consists of 6 tables, this transactional system is stored in PostgreSQL. What you first want to do is convert this relational model to a multidimensional model, all this is achieved thanks to the requirements previously defined with the users, which are the following:

  • Total payments made for apartments
  • Number of payments made for apartments
  • Total payments made by owners
  • Number of payments made by owners
  • Total collected for the fees of each apartment
  • Number of payments made per rate for each apartment
  • Increase in the rate of apartments per years
  • Percentage of increase in the rate of the apartments per year
  • Average rate for apartments
  • Number of apartments owned by the owners
Figure 2

In figure 2 we can see the list of requirements, where column number 3 refers to the name of the measures that will be present in our fact tables, while column number 4 refers to the dimensions that satisfy those measures previously defined. The table in Figure 2 was made with the simple fact of facilitating the construction of the multidimensional model.

Figure 3

It should be noted that to build the multidimensional model we must take into account the denormalization process, that is why the torres table passes as one more field to the apartamentos table. In figure 3 all the previously defined measures are found horizontally, while vertically all the dimensions that will be part of our multidimensional model are found. Now, the next step would be to relate each of the measures with the dimensions (with the help of the graph in Figure 2) in order to build our fact tables, for example, in Figure 3 it can be seen that all the dimensions satisfy the measures total_recaudos and cantidad_pagos. The dimensions dim_tiempo, dim_propietarios and dim_apartamentos satisfy the measures incremento, porcentaje_inc and tarifa_prom. Finally the dimensions dim_tiempo and dim_propietarios satisfy the measures total_pagos, cantidad_pagos_p and cantidad_apto. In the above we can find something in common, and that is that the dimension dim_tiempo satisfies all the measures, this dimension always has to be in our multidimensional model. Remember one principal characteristics of Data Warehouse is deals with historical data.

Based on all the above, we can clearly visualize 3 fact tables, where the measures total_recaudos and cantidad_pagos will belong to a fact table that we will call fac_pagos, the measures incremento, porcentaje_inc and tarifa_prom will belong to fac_tarifas and finally the measures total_pagos, cantidad_pagos_p and cantidad_apto will belong to fac_cuentas.

Figura 4

Having all the dimensions and the fact tables identified, we proceed to create our multidimensional model as shown in Figure 4. Each of the dimensions has an internal code which is unique. If we appreciate the dimension dim_apartamentos the field nom_torre refers to the table torres that is present in our relational model, this thanks to the denormalization process that was mentioned previously. It can also be seen that in each of the fact tables there are the internal codes of each of the dimensions that satisfy them. With our multidimensional model ready, the next step will be to build our Data Warehouse, our Data Warehouse will be stored in Oracle.

In the second part of this series of blogs we will see how the process of building our Data Warehouse will be and how we will populate it using Pentaho. See you later…

--

--