H SR HO CHSCHULE FÜR TECHNI K RA PPERSW I L FHO Fachhoch sch ul e Ost sch w ei z Materialized Views in PostgreSQL Experiments around Jonathan Gardner's Proposal Seminar Database Systems Master of Science in Engineering Major Software and Systems HSR Hochschule für Technik Rapperswil www.hsr.ch/mse Supervisor: Prof. Stefan Keller Author: Reto Guadagnini Rapperswil, December 2011 Materialized Views in PostgreSQL Abstract Materialized Views act like a cache between the client who wants two query a view and the view. The client queries the Materialized View instead of the ordinary view, which improves the query performance. Current versions of PostgreSQL do not have built in support for Materialized Views. Jonathan Gardner proposed in a paper how Materialized Views could be handcrafted in PostgreSQL. This paper investigates his proposal. After a short introduction to Materialized Views it describes two experiments around J. Gardner’s proposal. In the first experiment the query performance of an Eager Materialized View was compared with the query performance of an ordinary view. We could show that the Materialized View really improves the query performance. In the second experiment we applied Gardner’s proposal to a “real” database to show that it is also applicable to real databases. We were able to apply Gardner’s proposal successfully to a more realistic example database. Keywords: materialized views, views, PostgreSQL, SQL, relational databases. R. Guadagnini 2 Materialized Views in PostgreSQL Table of Contents 1 Introduction ......................................................................................................................... 4 2 What are Materialized Views? ............................................................................................ 5 2.1 3 Types of Materialized Views ...................................................................................... 6 2.1.1 Snapshot Materialized Views................................................................................. 6 2.1.2 Eager Materialized Views ...................................................................................... 7 2.1.3 Very Lazy Materialized Views ................................................................................ 8 2.1.4 Lazy Materialized Views ........................................................................................ 8 Experiments........................................................................................................................ 9 3.1 Experiment 1: J. Gardner’s Eager Materialized View example ................................. 9 3.1.1 Measurements ..................................................................................................... 10 3.1.2 Discussion............................................................................................................ 11 3.2 Experiment 2: Eager Materialized View on the “World” database ........................... 12 3.2.1 Import of the “World” database ............................................................................ 12 3.2.2 Creating basic tables/functions for the Materialized View ................................... 12 3.2.3 Implement the Materialized View ......................................................................... 13 3.2.4 Test the implementation of the Materialized View ............................................... 15 3.2.5 Discussion............................................................................................................ 15 4 Conclusion ........................................................................................................................ 16 5 References ....................................................................................................................... 17 6 Appendix ........................................................................................................................... 18 6.1 Figures ..................................................................................................................... 18 6.2 Experiment 1 ............................................................................................................ 19 6.2.1 SQL script to create the example schema........................................................... 19 6.2.2 Random test data generation function ................................................................. 23 6.2.3 Benchmark operations ......................................................................................... 24 6.2.4 Raw measurement results ................................................................................... 25 6.3 Experiment 2 ............................................................................................................ 26 6.3.1 Basic tables and functions for the Materialized View .......................................... 26 6.3.2 Materialized View implementation ....................................................................... 27 6.3.3 Test of our Eager Materialized View implementation .......................................... 30 R. Guadagnini 3 Materialized Views in PostgreSQL 1 Introduction The main purpose of so called “Materialized Views” is to speed up time consuming queries on views. Actual versions of the Oracle Database and the Microsoft SQL Server have built in support for Materialized Views [4] [5]. Microsoft calls the Materialized Views of his SQL Server “Indexed Views” [5]. At the moment PostgreSQL has no built in support for Materialized Views, but as shown in a user survey, their support is highly requested by the PostgreSQL users [6]. Jonathan Gardner suggests in [1] how Materialized Views could be handcrafted in PostgreSQL. The goal of this work was to show that his suggestions are applicable to a “real” example database and that Materialized Views really improve the performance of queries on views. In the following we will first give a short introduction to Materialized Views in PostgreSQL according to [1]. Then we will describe an experiment, which we have performed to show that Materialized Views really improve the performance of queries and a second experiment to show the applicability of Gardner’s suggestions to a “real” example database. R. Guadagnini 4 Materialized Views in PostgreSQL 2 What are Materialized Views? A view represents a named query, which can be used in other queries like a normal table. Thus a view represents the relation which results, when the underlying query is performed. In the case of a normal view this query is performed again and again each time when the view is accessed. Depending on the underlying query, accessing the view can be very expensive and take a long time. A “Materialized View” instead is a real table which represents a view. A Materialized View contains the content of a view and for this reason the underlying query has not to be performed each time when someone accesses the Materialized View. The Materialized View acts like a cache between the view and its clients: Client Dataflow Queries Query Materialized View View Table 1 Table 2 … Table n Figure 1: Basic concept of a “Materialized View” Figure 1 shows the basic concept behind a Materialized View. We have a set of tables (table 1 to table n), called “Underlying Tables” and a view which is based on these tables. The Materialized View is based on the view and represents the relation of the view, which is created by the view when accessing it, in form of a real table. The Client who wants to query the view does not run its query on the view. He runs his query on the Materialized View instead. Thus the query of the client should be performed faster, because it runs directly on a real table (the Materialized View) and not on a view, which has to create the relation to query first by performing a query on the underlying tables. Materialized Views have also some drawbacks. One of them follows directly from their definition: Materialized Views require memory to store the content of the view, which they represent, thus they need clearly more space than ordinary views. R. Guadagnini 5 Materialized Views in PostgreSQL We just introduced the basic notion of Materialized Views and now we will have a look at the different types of Materialized Views. 2.1 Types of Materialized Views There are different types of Materialized Views. They differ in the way the Materialized View is created from the view and the strategies used to keep the Materialized View in synch with the underlying tables. Jonathan Gardner distinguishes in his paper four types of Materialized Views [1]: Snapshot Eager Lazy Very Lazy We will present these different types of Materialized Views in more detail and describe how they could be implemented in PostgreSQL according to [1]. 2.1.1 Snapshot Materialized Views Snapshot Materialized Views are updated only on explicit request. This update operation could be implemented as a PL/pgSQL function “refresh_matview” which first deletes the content of the Materialized View, queries the view for the new content and then fills the Materialized View again with the new content [1]: Materialized View Function: refresh_matview View Figure 2: Snapshot Materialized View with function "refresh_matview" The Snapshot Materialized Views are the easiest ones to implement, because we just need to create the table for the Materialized View and write the “refresh_matview” function, which refreshes this table on request, to implement one of them. R. Guadagnini 6 Materialized Views in PostgreSQL 2.1.2 Eager Materialized Views Eager Materialized Views will be updated immediately, if the content of the underlying tables changes. J. Gardner suggests in [1] the following scheme for their implementation in PostgreSQL: Materialized View mv_refresh_row(PK) mv_refresh() View Table 1 Table 2 Triggers: Insert Update Delete Triggers: Insert Update Delete … Table n Triggers: Insert Update Delete Figure 3: Implementation scheme for an Eager Materialized View in PostgreSQL An Eager Materialized View is refreshed by two functions “mv_refresh_row” and “mv_refresh”. On each of the underlying tables three triggers are defined. There is a trigger for each of the basic data manipulation operations: Insert, Update and Delete. If necessary, these triggers call the function “mv_refresh_row” in an appropriate way to keep the Materialized View in synch with the underlying tables and thus in synch with the view. The “mv_refresh_row” function expects the Primary Key (PK) of the view as parameter, thus we have to identify the columns of the view which form the Primary Key of the view to implement the “mv_refresh_row” function and the triggers. It is possible, that the view contains data from some mutable functions like “now()”. To reflect changes of this data in the Materialized View the “mv_refresh” function is used. The way the functions and triggers have to be implemented depends on the relationship between the tuples in the Materialized View and the tuples in the underlying tables. There are five different kinds of relationships [1]: One-to-one One tuple in the Materialized View depends on exactly one tuple in an underlying table. R. Guadagnini 7 Materialized Views in PostgreSQL Many-to-one Many tuples in the Materialized View depend on one tuple in an underlying table. This relation typically results, if the underlying tables are joined together to create the view. If we have for example two underlying tables “employee” and “town” which are joined together to form an employee view, many employee tuples in the view have to change if the postal number of a single town is changed, thus there is a many-to-one relation between the tuples in the Materialized View and the tuples in an underlying table. Many-to-many Many tuples in the Materialized View depend on many tuples in an underlying table. Such relations result if the view is created by a join of the underlying tables and an aggregation of their tuples. Other It is also possible, that we have data in the view which does not come from any of the underlying tables. The data could come for example from a function like “now()”. 2.1.3 Very Lazy Materialized Views Very Lazy Materialized Views are similar to Snapshot Materialized Views. The Very Lazy Materialized View will be updated only on request (by a call to a function like “matview_refresh”). To speed up the update process all the changes on the underlying tables will be logged. To update the Very Lazy Materialized View the list of changes since the last update will be read from the log and the Materialized View will be updated according to this list. 2.1.4 Lazy Materialized Views Lazy Materialized Views are similar to Very Lazy Materialized Views with the difference that they get updated on commit instead of update on request like the Very Lazy ones. J. Gardner states in [1] that he currently does not know how to implement them in PostgreSQL, so we did not cover them in our experiments. R. Guadagnini 8 Materialized Views in PostgreSQL 3 Experiments In our experiments we concentrated only on one type of Materialized Views, the eager one. We performed two experiments. The first experiment compares the query performance of Materialized Views with ordinary views and the second experiment shows how an Eager Materialized View could be implemented on more realistic example database. 3.1 Experiment 1: J. Gardner’s Eager Materialized View example In the first experiment we rebuilt the example, which is used in [1] to explain the implementation of Eager Materialized Views, in PostgreSQL. Then we measured the performance gain of the Materialized View compared to the ordinary view in this example using some self-generated test tuples. The example looks as follows: b_mv b_mv_refresh_row b_mv_refresh b_v a Triggers: b_mv_it b_mv_ut b_mv_dt 1 n b Triggers: b_mv_it b_mv_ut b_mv_dt 1 n c Triggers: b_mv_it b_mv_ut b_mv_dt Figure 4: Gardner’s example for an Eager Materialized View The example consists of three underlying tables “a”, “b” and “c” and a table “b_mv” which represents the Materialized View. Table “a” is in a 1 to n relation to table “b”. Table “b” is in a 1 to n relation to table “c”. On these three tables the view “b_v” is defined as follows: CREATE VIEW b_v AS SELECT b.b_id AS b_id, a.v AS a_v, b.v AS b_v, sum(c.v) AS sum_c_v FROM a JOIN b USING (a_id) JOIN c USING (b_id) WHERE (b.expires IS NULL OR b.expires >= now()) GROUP BY b.b_id, a.v, b.v; Listing 1: Definition of the view “b_v” [1] R. Guadagnini 9 Materialized Views in PostgreSQL As we can see from Listing 1, “b_v” represents the result of a join of the three tables a, b and c. The primary key “b_id” of table “b” identifies each tuple in the view “b_v” in a unique manner, thus it is used as parameter for the “b_mv_refresh_row” function. Gardner uses in his examples a special table called “matview” to keep track of the Materialized Views which have been created [1]. Furthermore he defines the functions “create_matview”, “drop_matview” and “refresh_matview” which perform operations as expected by their names and keep track of their operations in the “matview” table. 3.1.1 Measurements To compare the performance of queries on a Materialized View and on an ordinary view we implemented the example on our machine. The SQL scripts which we have used can be found in the appendix (under chapter 6.2). For our measurements we used a system (laptop) with the following specifications: CPU: Intel Core 2 Duo T9550 @ 2.66 GHz RAM: 4 GB DDR2 SDRAM @ 1066 MHz Hard drive: Hitachi HTS 250GB @ 7200 rpm Operating System: Windows 7 Professional SP 1 (32 bit) DBMS: PostgreSQL 9.0.5-1 with pgAdminIII 1.12.3 To fill the example database with test tuples we wrote a PL/pgSQL function “generate_test_data()” (see chapter 6.2.2), which generates the desired number of test tuples and fills the database with them. For each number of test tuples we measured three times the query time and calculated arithmetic mean of the measurement results to compensate outliers. The measurements were performed as follows: 1. We performed the query “SELECT generate_test_data(n);” to fill the database with n randomly chosen tuples. 2. We performed three times the query “SELECT * FROM b_v;” on the ordinary view “b_v” and measured each time the runtime of the query. 3. We performed three times the query “SELECT * FROM b_mv;” on the Materialized View “b_mv” and measured each time the runtime of the query. R. Guadagnini 10 Materialized Views in PostgreSQL The following diagram shows the results (arithmetic mean) of our measurements for different numbers of tuples: Performance Measurements Query on view b_v Query on materialized view b_mv 180'000 160'000 Query time [ms] 140'000 120'000 100'000 80'000 60'000 40'000 20'000 0 Number of tuples Figure 5: Query time: Materialized View versus ordinary view (raw data in chapter 6.2.4) 3.1.2 Discussion Our measurements, depicted in Figure 5, give just a trend, because there are many factors (software version, system workload etc.) which affect them. But the trend shows that queries on Materialized Views are indeed faster than queries on ordinary views especially if there are many tuples (more than 100’000 in our experiment) in the view. For two million tuples the query time was reduced in our experiment by factor using the Materialized View compared to the ordinary view. While using our “generate_test_data()” function we noticed that (as expected) the performance of the INSERT operation is diminished by the Materialized View. We had to add commands to our “generate_test_data()” function which disable the triggers of the Materialized View before the test data insertion and enable them again afterwards to get an acceptable performance. This shows that the improved query performance of Materialized Views does not come for free. The INSERT, UPDATE and DELETE operations on the underlying tables become more expensive if we use a Materialized View. R. Guadagnini 11 Materialized Views in PostgreSQL Experiment 2: Eager Materialized View on the “World” database 3.2 For the second experiment we used the “World” sample database from PgFoundry [8] to show how an Eager Materialized View could be implemented on a more realistic database than the database in experiment 1. In the following we describe how we implemented an Eager Materialized View on the “World” database. 3.2.1 Import of the “World” database First we created a new database “World” in PostgreSQL on our machine and performed then the following command to create the schema of the “World” database: psql -d world -U postgres -f 1_world.sql The “World” database consists of three tables “city”, “country” and “countrylanguage”: city 1 1 country 1 n countrylanguage Figure 6: Tables of the “World” database The Primary Keys (PK) of the tables are: “city.id” for city “country.code” for country countrylanguage has a primary key composed of “countrylanguage.countrycode” and “countrylanguage.language” 3.2.2 Creating basic tables/functions for the Materialized View Second we created the “matviews” table and the associated functions “create_matview”, “drop_matview” and “refresh_matview” which were also used in the example from experiment 1 and were published in [1]. The “matviews” table keeps track of the Materialized Views which have been created. The function “create_matview(v)” creates a Materialized View for the view “v”, “drop_matview(mv)” deletes the Materialized View “mv” and “refresh_matview(mv)” fills the Materialized View “mv” with the actual content of its associated view. R. Guadagnini 12 Materialized Views in PostgreSQL 3.2.3 Implement the Materialized View Now our database is ready for the creation of a view and a corresponding Materialized View. We defined a view “country_v” on the tables “city”, “country” and “countrylanguage” as follows: CREATE VIEW country_v AS SELECT city.id AS city_id, city.name AS city_capital, country.code AS c_code, country.capital AS c_capital_id, country.name AS c_name, countrylanguage.countrycode AS cl_code, countrylanguage.language AS cl_language, countrylanguage.percentage FROM country, city, countrylanguage WHERE city.id = country.capital AND country.code = countrylanguage.countrycode; Listing 2: The view “country_v” defined on the “World” database Now we can create the corresponding Materialized View “country_mv” using the “create_matview” function: SELECT create_matview('country_mv', 'country_v'); To implement the triggers for the Materialized View “country_mv” we must identify the primary key of the associated view “country_v”. The primary key of “country_v” is composed of the columns “cl_code” and “cl_language”. We define the “country_mv_refresh_row” function as follows: CREATE FUNCTION country_mv_refresh_row(country_mv.cl_code%TYPE, country_mv.cl_language%TYPE) RETURNS VOID SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN DELETE FROM country_mv WHERE cl_code = $1 AND cl_language = $2; INSERT INTO country_mv SELECT * FROM country_v WHERE cl_code = $1 AND cl_language = $2; RETURN; END '; Listing 3: The “mv_refresh_row” function for our Materialized View The “mv_refresh_row” function expects the primary key of the view as parameter and updates the tuple in the Materialized View which is identified by this key. We do not need to define the function “mv_refresh”, which was used in the example of experiment 1, because our view “country_v” does not depend on mutable functions. R. Guadagnini 13 Materialized Views in PostgreSQL Finally we implement the triggers for our Materialized View. We create an INSERT, UPDATE and DELETE trigger for each of the underlying tables. To show how such a trigger is implemented we look at the INSERT trigger of the city table: CREATE FUNCTION country_mv_city_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM country_mv_refresh_row(code, language) FROM (SELECT * FROM country, countrylanguage WHERE countrycode) AS x WHERE capital = NEW.id; RETURN NULL; END '; CREATE TRIGGER country_mv_it AFTER INSERT ON city FOR EACH ROW EXECUTE PROCEDURE country_mv_city_it(); code = Listing 4: The INSERT trigger for the “city” table The INSERT trigger is called each time when a tuple is inserted in the “city” table. He calls the function “country_mv_city_it()” to update the Materialized View according to the inserted tuple. The function “country_mv_city_it()” determines the primary key of all rows of the materialized view which are affected by the insertion of the new tuple and calls the function “country_mv_refresh_row(PK)” with the primary key of these rows as parameter to update them in the Materialized View. The whole implementation of our Materialized View can be found in the appendix (chapter 6.3.2). The Materialized View we have created in our second experiment could be depicted as follows: country_mv country_mv_refresh_row country_v city 1 Triggers: country_mv_it country_mv_ut country_mv_dt 1 country 1 n Triggers: country_mv_it country_mv_ut country_mv_dt countrylanguage Triggers: country_mv_it country_mv_ut country_mv_dt Figure 7: Eager Materialized View on the “World” database R. Guadagnini 14 Materialized Views in PostgreSQL 3.2.4 Test the implementation of the Materialized View Even if we the “World” database is quite simple the implementation of a Materialized View on it is error prone. To test our implementation we used the commands described in chapter 6.3.3. We have tested if our implementation behaves as expected when we perform data manipulation operations (INSERT, UPDATE and DELETE) on the underlying tables of our Materialized View. 3.2.5 Discussion As showed we could realize a Materialized View as proposed by J. Gardner in [1] on a more realistic database. The “World” database we used is quite simple but even with such a simple database the implementation of a Materialized View is not really easy. For more complex databases the implementation of a Materialized View could be really difficult. The steps to create an Eager Materialized View could be summarized as follows: 1. Create the view if it is not already created. 2. Identify the primary key of the view. 3. Create the basic tables/functions “matviews”, “create_matview()”, “drop_matview()” and “refresh_matview()” published in Gardner’s paper [1]. 4. Create the table for the Materialized View using the “create_matview()” function. 5. Implement the “mv_refresh_row()” function with the primary key of the view as argument. 6. If the view bases on some mutable functions like “now()” implement the function “mv_refresh()”. 7. Implement all INSERT, UPDATE and DELETE triggers on the underlying tables. 8. Finally test if the Materialized View behaves as expected when INSERT, UPDATE and DELETE operations are performed on the underlying tables. R. Guadagnini 15 Materialized Views in PostgreSQL 4 Conclusion We showed that J. Gardner’s approach to realize Eager Materialized Views in PostgreSQL is applicable to “real” databases and that the query performance could be improved by using a Materialized View (up to factor in our experiment). The main drawback of this kind of Materialized Views is, that they are not easy to implement. Materialized Views improve on one side the query performance but on the other side they decrease the performance of data manipulation operations on the underlying tables. Another drawback of the Materialized Views is that they consume more memory than ordinary views. These drawbacks should be kept in mind when applying Materialized Views. Other database management systems like the Oracle Database or the Microsoft SQL Server have built in support for Materialized Views [4][5], which simplifies the creation of Materialized Views. They also offer even more sophisticated features. For example the Microsoft SQL Server supports data manipulation operations on Materialized Views [5] while the Materialized Views described in this paper can only be used in a read only fashion. We think that built in support for Materialized Views would be a good feature for PostgreSQL as it could make things a lot easier. [3] gives some ideas how the support for Materialized Views could be built into a database management system. R. Guadagnini 16 Materialized Views in PostgreSQL 5 References [1] Jonathan Gardner: PostgreSQL/Materialized Views. Retrieved September 20, 2011, from http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views [2] Dan Chak: Materialized Views that Really work. PGCon 2008. Retrieved November 10, 2011, from http://www.pgcon.org/2008/schedule/events/69.en.html [3] Ashish Gupta, Inderpal Singh Mumick: Materialized Views: Techniques, Implementations, and Applications. The MIT Press. First Edition, 1999. [4] Oracle Database Data Warehousing Guide, 10g Release 2. Retrieved November 28, 2011, from http://docs.oracle.com/cd/B19306_01/server.102/b14223/basicmv.htm#g1028195 [5] Eric Hanson, Susan Price, et. al.: Improving Performance with SQL Server 2008 Indexed Views. Microsoft Developer Network. Retrieved November 28, 2011, from http://msdn.microsoft.com/en-us/library/dd171921%28v=sql.100%29.aspx [6] http://wiki.postgresql.org/wiki/Materialized_Views. Retrieved November 10, 2011. [7] http://www.if-not-true-then-false.com/2010/postgresql-select-a-random-number-in-arange-between-two-numbers/. Retrieved November 25, 2011. [8] http://pgfoundry.org/projects/dbsamples/. Retrieved November 28, 2011. R. Guadagnini 17 Materialized Views in PostgreSQL 6 Appendix 6.1 Figures Figure 1: Basic concept of a “Materialized View” ...................................................................... 5 Figure 2: Snapshot Materialized View with function "refresh_matview" .................................... 6 Figure 3: Implementation scheme for an Eager Materialized View in PostgreSQL .................. 7 Figure 4: Gardner’s example for an Eager Materialized View ................................................... 9 Figure 5: Query time: Materialized View versus ordinary view (raw data in chapter 6.2.4) .... 11 Figure 6: Tables of the “World” database ................................................................................ 12 Figure 7: Eager Materialized View on the “World” database ................................................... 14 R. Guadagnini 18 Materialized Views in PostgreSQL 6.2 6.2.1 Experiment 1 SQL script to create the example schema The following script creates the Eager Materialized View example from Gardner’s paper. It is copied together from various listings published in Gardner’s paper [1]: CREATE TABLE matviews ( mv_name NAME NOT NULL PRIMARY KEY , v_name NAME NOT NULL , last_refresh TIMESTAMP WITH TIME ZONE ); CREATE OR REPLACE FUNCTION create_matview(NAME, NAME) RETURNS VOID SECURITY DEFINER LANGUAGE plpgsql AS ' DECLARE matview ALIAS FOR $1; view_name ALIAS FOR $2; entry matviews%ROWTYPE; BEGIN SELECT * INTO entry FROM matviews WHERE mv_name = matview; IF FOUND THEN RAISE EXCEPTION exists.'', matview; END IF; ''Materialized view ''''%'''' already EXECUTE ''REVOKE ALL ON '' || view_name || '' FROM PUBLIC''; EXECUTE ''GRANT SELECT ON '' || view_name || '' TO PUBLIC''; EXECUTE ''CREATE TABLE '' || matview || '' AS SELECT * FROM '' || view_name; EXECUTE ''REVOKE ALL ON '' || matview || '' FROM PUBLIC''; EXECUTE ''GRANT SELECT ON '' || matview || '' TO PUBLIC''; INSERT INTO matviews (mv_name, v_name, last_refresh) VALUES (matview, view_name, CURRENT_TIMESTAMP); END '; RETURN; CREATE OR REPLACE FUNCTION drop_matview(NAME) RETURNS VOID SECURITY DEFINER LANGUAGE plpgsql AS ' DECLARE matview ALIAS FOR $1; entry matviews%ROWTYPE; BEGIN SELECT * INTO entry FROM matviews WHERE mv_name = matview; IF NOT FOUND THEN RAISE EXCEPTION ''Materialized view % does not exist.'', matview; END IF; R. Guadagnini 19 Materialized Views in PostgreSQL EXECUTE ''DROP TABLE '' || matview; DELETE FROM matviews WHERE mv_name=matview; END '; RETURN; CREATE OR REPLACE FUNCTION refresh_matview(name) RETURNS VOID SECURITY DEFINER LANGUAGE plpgsql AS ' DECLARE matview ALIAS FOR $1; entry matviews%ROWTYPE; BEGIN SELECT * INTO entry FROM matviews WHERE mv_name = matview; IF NOT FOUND THEN RAISE EXCEPTION ''Materialized view % does not exist.'', matview; END IF; EXECUTE ''DELETE FROM '' || matview; EXECUTE ''INSERT INTO '' || matview || '' SELECT * FROM '' || entry.v_name; UPDATE matviews SET last_refresh=CURRENT_TIMESTAMP WHERE mv_name=matview; END '; RETURN; CREATE TABLE a ( a_id INT PRIMARY KEY, v INT ); CREATE TABLE b ( b_id INT PRIMARY KEY, a_id INT REFERENCES a, v INT, expires TIMESTAMP ); CREATE TABLE c ( c_id INT PRIMARY KEY, b_id INT REFERENCES b, v INT ); CREATE VIEW b_v AS SELECT b.b_id AS b_id, a.v AS a_v, b.v AS b_v, sum(c.v) AS sum_c_v FROM a JOIN b USING (a_id) JOIN c USING (b_id) WHERE (b.expires IS NULL OR b.expires >= now()) GROUP BY b.b_id, a.v, b.v; SELECT create_matview('b_mv', 'b_v'); CREATE FUNCTION b_mv_refresh_row(b_mv.b_id%TYPE) RETURNS VOID SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN R. Guadagnini 20 Materialized Views in PostgreSQL DELETE FROM b_mv WHERE b_id = $1; INSERT INTO b_mv SELECT * FROM b_v WHERE b_id = $1; RETURN; END '; CREATE FUNCTION b_mv_refresh() RETURNS VOID SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(b_id) FROM b, matviews WHERE matviews.mv_name = ''b_mv'' AND b.expires >= matviews.last_refresh AND b.expires < now(); UPDATE matviews SET last_refresh = now() WHERE mv_name = ''b_mv''; END '; RETURN; -- a triggers CREATE FUNCTION b_mv_a_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.a_id = NEW.a_id THEN PERFORM b_mv_refresh_row(b.b_id) FROM b WHERE b.a_id = NEW.a_id; ELSE PERFORM b_mv_refresh_row(b.b_id) FROM b WHERE b.a_id = OLD.a_id; PERFORM b_mv_refresh_row(b.b_id) FROM b WHERE b.a_id = NEW.a_id; END IF; RETURN NULL; END '; CREATE TRIGGER b_mv_ut AFTER UPDATE ON a FOR EACH ROW EXECUTE PROCEDURE b_mv_a_ut(); CREATE FUNCTION b_mv_a_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(b.b_id) FROM b WHERE b.a_id = OLD.a_id; RETURN NULL; END '; CREATE TRIGGER b_mv_dt AFTER DELETE ON a FOR EACH ROW EXECUTE PROCEDURE b_mv_a_dt(); CREATE FUNCTION b_mv_a_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(b.b_id) FROM b WHERE b.a_id = NEW.a_id; RETURN NULL; END '; CREATE TRIGGER b_mv_it AFTER INSERT ON a FOR EACH ROW EXECUTE PROCEDURE b_mv_a_it(); -- b triggers CREATE FUNCTION b_mv_b_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.b_id = NEW.b_id THEN PERFORM b_mv_refresh_row(NEW.b_id); ELSE R. Guadagnini 21 Materialized Views in PostgreSQL PERFORM b_mv_refresh_row(OLD.b_id); PERFORM b_mv_refresh_row(NEW.b_id); END IF; RETURN NULL; END '; CREATE TRIGGER b_mv_ut AFTER UPDATE ON b FOR EACH ROW EXECUTE PROCEDURE b_mv_b_ut(); CREATE FUNCTION b_mv_b_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(OLD.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_dt AFTER DELETE ON b FOR EACH ROW EXECUTE PROCEDURE b_mv_b_dt(); CREATE FUNCTION b_mv_b_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(NEW.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_it AFTER INSERT ON b FOR EACH ROW EXECUTE PROCEDURE b_mv_b_it(); -- c triggers CREATE FUNCTION b_mv_c_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.b_id = NEW.b_id THEN PERFORM b_mv_refresh_row(NEW.b_id); ELSE PERFORM b_mv_refresh_row(OLD.b_id); PERFORM b_mv_refresh_row(NEW.b_id); END IF; RETURN NULL; END '; CREATE TRIGGER b_mv_ut AFTER UPDATE ON c FOR EACH ROW EXECUTE PROCEDURE b_mv_c_ut(); CREATE FUNCTION b_mv_c_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(OLD.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_dt AFTER DELETE ON c FOR EACH ROW EXECUTE PROCEDURE b_mv_c_dt(); CREATE FUNCTION b_mv_c_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM b_mv_refresh_row(NEW.b_id); RETURN NULL; END '; CREATE TRIGGER b_mv_it AFTER INSERT ON c FOR EACH ROW EXECUTE PROCEDURE b_mv_c_it(); R. Guadagnini 22 Materialized Views in PostgreSQL 6.2.2 Random test data generation function We wrote for our little benchmark a PL/pgSQL function which fills the example database with random datasets, so that there are n tuples in the view “b_v” and thus also n tuples in the corresponding materialized view “b_mv”: -- Function: generate_test_data(integer) -- DROP FUNCTION generate_test_data(integer); CREATE OR REPLACE FUNCTION generate_test_data(n integer) RETURNS void AS $BODY$ DECLARE rnd_nr integer; rnd_a_for_b integer; rnd_c_for_b integer; index_of_curr_tpl_in_c integer := 1; BEGIN -- Disable triggers ---------------------------------------------RAISE NOTICE 'Disabling all triggers which update the matview...'; ALTER TABLE a DISABLE TRIGGER b_mv_dt; ALTER TABLE a DISABLE TRIGGER b_mv_it; ALTER TABLE a DISABLE TRIGGER b_mv_ut; ALTER TABLE b DISABLE TRIGGER b_mv_dt; ALTER TABLE b DISABLE TRIGGER b_mv_it; ALTER TABLE b DISABLE TRIGGER b_mv_ut; ALTER TABLE c DISABLE TRIGGER b_mv_dt; ALTER TABLE c DISABLE TRIGGER b_mv_it; ALTER TABLE c DISABLE TRIGGER b_mv_ut; -- Delete the old test data contained in the tables a, b and c. -RAISE NOTICE 'Cleaning tables...'; -- We use truncate because it is much faster than DELETE FROM TABLE. TRUNCATE a, b, c; RAISE NOTICE 'Tables cleaned.'; -- Fill the tables a,b and c with new random data. ---------------- Fill table a with n datasets: RAISE NOTICE 'Generate % new test datasets...', n; RAISE NOTICE 'Filling table a...'; FOR i in 1 .. n LOOP SELECT INTO rnd_nr get_random_number(0, 99); INSERT INTO a (a_id, v) VALUES (i, rnd_nr); END LOOP; -- Fill table b with n datasets: RAISE NOTICE 'Filling table b...'; FOR i in 1 .. n LOOP SELECT INTO rnd_nr get_random_number(0, 99); -- Give the current tuple of b a random tuple of a -- (Some tuples of a will not get a tuple of b, but this isn't a -- problem for the benchmark) SELECT INTO rnd_a_for_b get_random_number(1,n); INSERT INTO b (b_id, a_id, v) VALUES (i, rnd_a_for_b, rnd_nr); END LOOP; -- Fill table c with n datasets: RAISE NOTICE 'Filling table c...'; FOR i in 1 .. n LOOP -- Give eache tuple of b between 1 and 5 tuples of c SELECT INTO rnd_c_for_b get_random_number(1,5); FOR j in 1 .. rnd_c_for_b LOOP R. Guadagnini 23 Materialized Views in PostgreSQL -- RAISE NOTICE 'i: %., j: %.', i, j; SELECT INTO rnd_nr get_random_number(0, 99); INSERT INTO c (c_id, b_id, v) VALUES (index_of_curr_tpl_in_c, i, rnd_nr); index_of_curr_tpl_in_c := index_of_curr_tpl_in_c + 1; j := j+1; END LOOP; END LOOP; -- Refresh the matview after testdata generation and reenable all triggers -RAISE NOTICE 'Refreshing matview...'; PERFORM refresh_matview('b_mv'); RAISE NOTICE 'Reenabling all triggers matview...'; ALTER TABLE a ENABLE TRIGGER b_mv_dt; ALTER TABLE a ENABLE TRIGGER b_mv_it; ALTER TABLE a ENABLE TRIGGER b_mv_ut; which update the ALTER TABLE b ENABLE TRIGGER b_mv_dt; ALTER TABLE b ENABLE TRIGGER b_mv_it; ALTER TABLE b ENABLE TRIGGER b_mv_ut; ALTER TABLE c ENABLE TRIGGER b_mv_dt; ALTER TABLE c ENABLE TRIGGER b_mv_it; ALTER TABLE c ENABLE TRIGGER b_mv_ut; RAISE NOTICE 'Testdata generation finished.'; RETURN; END $BODY$ LANGUAGE plpgsql VOLATILE SECURITY DEFINER COST 100; ALTER FUNCTION generate_test_data(integer) OWNER TO postgres; Our function calls “get_random_number” which we have taken from [7]: CREATE OR REPLACE FUNCTION get_random_number(INTEGER, INTEGER) RETURNS INTEGER AS $$ DECLARE start_int ALIAS FOR $1; end_int ALIAS FOR $2; BEGIN RETURN trunc(random() * (end_int-start_int) + start_int); END; $$ LANGUAGE 'plpgsql' STRICT; 6.2.3 Benchmark operations To get our performance measurements we executed the following three SQL statements for various numbers n and noticed their runtime: -- Fill the database with test datasets SELECT generate_test_data(n); -- Query the view b_v SELECT * FROM b_v; -- Query the materialized view b_mv SELECT * FROM b_mv; R. Guadagnini 24 Materialized Views in PostgreSQL 6.2.4 Raw measurement results R. Guadagnini 25 Materialized Views in PostgreSQL 6.3 6.3.1 Experiment 2 Basic tables and functions for the Materialized View For our own example we reused the “matviews” table and the functions “create_matview”, “drop_matview” and “refresh_matview” from [1]: CREATE TABLE matviews ( mv_name NAME NOT NULL PRIMARY KEY , v_name NAME NOT NULL , last_refresh TIMESTAMP WITH TIME ZONE ); CREATE OR REPLACE FUNCTION create_matview(NAME, NAME) RETURNS VOID SECURITY DEFINER LANGUAGE plpgsql AS ' DECLARE matview ALIAS FOR $1; view_name ALIAS FOR $2; entry matviews%ROWTYPE; BEGIN SELECT * INTO entry FROM matviews WHERE mv_name = matview; IF FOUND THEN RAISE EXCEPTION exists.'', matview; END IF; ''Materialized view ''''%'''' already EXECUTE ''REVOKE ALL ON '' || view_name || '' FROM PUBLIC''; EXECUTE ''GRANT SELECT ON '' || view_name || '' TO PUBLIC''; EXECUTE ''CREATE TABLE '' || matview || '' AS SELECT * FROM '' || view_name; EXECUTE ''REVOKE ALL ON '' || matview || '' FROM PUBLIC''; EXECUTE ''GRANT SELECT ON '' || matview || '' TO PUBLIC''; INSERT INTO matviews (mv_name, v_name, last_refresh) VALUES (matview, view_name, CURRENT_TIMESTAMP); END '; RETURN; CREATE OR REPLACE FUNCTION drop_matview(NAME) RETURNS VOID SECURITY DEFINER LANGUAGE plpgsql AS ' DECLARE matview ALIAS FOR $1; entry matviews%ROWTYPE; BEGIN SELECT * INTO entry FROM matviews WHERE mv_name = matview; IF NOT FOUND THEN RAISE EXCEPTION ''Materialized view % does not exist.'', matview; END IF; EXECUTE ''DROP TABLE '' || matview; DELETE FROM matviews WHERE mv_name=matview; R. Guadagnini 26 Materialized Views in PostgreSQL END '; RETURN; CREATE OR REPLACE FUNCTION refresh_matview(name) RETURNS VOID SECURITY DEFINER LANGUAGE plpgsql AS ' DECLARE matview ALIAS FOR $1; entry matviews%ROWTYPE; BEGIN SELECT * INTO entry FROM matviews WHERE mv_name = matview; IF NOT FOUND THEN RAISE EXCEPTION ''Materialized view % does not exist.'', matview; END IF; EXECUTE ''DELETE FROM '' || matview; EXECUTE ''INSERT INTO '' || matview || '' SELECT * FROM '' || entry.v_name; UPDATE matviews SET last_refresh=CURRENT_TIMESTAMP WHERE mv_name=matview; END ' 6.3.2 RETURN; Materialized View implementation The following script contains the definition of our own Eager Materialized View example based on the “World” database: -- View country_v -------------------------------------------------CREATE VIEW country_v AS SELECT city.id AS city_id, city.name AS city_capital, country.code AS c_code, country.capital AS c_capital_id, country.name AS c_name, countrylanguage.countrycode AS cl_code, countrylanguage.language AS cl_language, countrylanguage.percentage FROM country, city, countrylanguage WHERE city.id = country.capital AND country.code = countrylanguage.countrycode; -- Materialized view country_mv -----------------------------------SELECT create_matview('country_mv', 'country_v'); -- country_mv_refresh_row function --------------------------------CREATE FUNCTION country_mv_refresh_row(country_mv.cl_code%TYPE, country_mv.cl_language%TYPE) RETURNS VOID SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN DELETE FROM country_mv WHERE cl_code = $1 AND cl_language = $2; R. Guadagnini 27 Materialized Views in PostgreSQL INSERT INTO country_mv SELECT * FROM country_v WHERE cl_code = $1 AND cl_language = $2; RETURN; END '; -- triggers --------------------------------------------------------- city triggers CREATE FUNCTION country_mv_city_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.id = NEW.id THEN PERFORM country_mv_refresh_row(code, language) FROM (SELECT * FROM country, countrylanguage WHERE code = countrycode) AS x WHERE capital = NEW.id; ELSE PERFORM b_mv_refresh_row(code, language) FROM (SELECT * FROM country, countrylanguage WHERE code = countrycode) AS x WHERE capital = OLD.id; PERFORM country_mv_refresh_row(code, language) FROM (SELECT * FROM country, countrylanguage WHERE code = countrycode) AS x WHERE capital = NEW.id; END IF; RETURN NULL; END '; CREATE TRIGGER country_mv_ut AFTER UPDATE ON city FOR EACH ROW EXECUTE PROCEDURE country_mv_city_ut(); CREATE FUNCTION country_mv_city_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM country_mv_refresh_row(code, language) FROM (SELECT * FROM country, countrylanguage WHERE countrycode) AS x WHERE capital = OLD.id; RETURN NULL; END '; CREATE TRIGGER country_mv_dt AFTER DELETE ON city FOR EACH ROW EXECUTE PROCEDURE country_mv_city_dt(); CREATE FUNCTION country_mv_city_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM country_mv_refresh_row(code, language) FROM (SELECT * FROM country, countrylanguage WHERE countrycode) AS x WHERE capital = NEW.id; RETURN NULL; END '; CREATE TRIGGER country_mv_it AFTER INSERT ON city FOR EACH ROW EXECUTE PROCEDURE country_mv_city_it(); code = code = -- country triggers CREATE FUNCTION country_mv_country_ut() RETURNS TRIGGER R. Guadagnini 28 Materialized Views in PostgreSQL SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.code = NEW.code THEN PERFORM country_mv_refresh_row(countrycode, language) FROM countrylanguage WHERE countrycode = NEW.code; ELSE PERFORM country_mv_refresh_row(countrycode, language) FROM countrylanguage WHERE countrycode = OLD.code; PERFORM b_mv_refresh_row(countrycode, language) FROM countrylanguage WHERE countrycode = NEW.code; END IF; RETURN NULL; END '; CREATE TRIGGER country_mv_ut AFTER UPDATE ON country FOR EACH ROW EXECUTE PROCEDURE country_mv_country_ut(); CREATE FUNCTION country_mv_country_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM country_mv_refresh_row(countrycode, language) FROM countrylanguage WHERE countrycode = OLD.code; RETURN NULL; END '; CREATE TRIGGER country_mv_dt AFTER DELETE ON country FOR EACH ROW EXECUTE PROCEDURE country_mv_country_dt(); CREATE FUNCTION country_mv_country_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM country_mv_refresh_row(countrycode, language) FROM countrylanguage WHERE countrycode = NEW.code; RETURN NULL; END '; CREATE TRIGGER country_mv_it AFTER INSERT ON country FOR EACH ROW EXECUTE PROCEDURE country_mv_country_it(); -- countrylanguage triggers CREATE FUNCTION country_mv_countrylanguage_ut() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN IF OLD.countrycode = NEW.countrycode AND OLD.language = NEW.language THEN PERFORM country_mv_refresh_row(NEW.countrycode, NEW.language); ELSE PERFORM country_mv_refresh_row(OLD.countrycode, OLD.language); PERFORM country_mv_refresh_row(NEW.countrycode, NEW.language); END IF; RETURN NULL; END '; CREATE TRIGGER country_mv_ut AFTER UPDATE ON countrylanguage R. Guadagnini 29 Materialized Views in PostgreSQL FOR EACH ROW EXECUTE PROCEDURE country_mv_countrylanguage_ut(); CREATE FUNCTION country_mv_countrylanguage_dt() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM country_mv_refresh_row(OLD.countrycode, OLD.language); RETURN NULL; END '; CREATE TRIGGER country_mv_dt AFTER DELETE ON countrylanguage FOR EACH ROW EXECUTE PROCEDURE country_mv_countrylanguage_dt(); CREATE FUNCTION country_mv_countrylanguage_it() RETURNS TRIGGER SECURITY DEFINER LANGUAGE 'plpgsql' AS ' BEGIN PERFORM country_mv_refresh_row(NEW.countrycode, NEW.language); RETURN NULL; END '; CREATE TRIGGER country_mv_it AFTER INSERT ON countrylanguage FOR EACH ROW EXECUTE PROCEDURE country_mv_countrylanguage_it(); 6.3.3 Test of our Eager Materialized View implementation To test our own Eager Materialized View implementation for the “World” database we used the following SQL statements: -- Both queries should return the empty relation SELECT * FROM country_v WHERE c_code = 'WON'; SELECT * FROM country_mv WHERE c_code = 'WON'; -- Test INSERT triggers -------------------------------------------INSERT INTO city (id, name, countrycode, district, population) VALUES (5000, 'Caprica', 'WON', 'qwertz', 100); INSERT INTO country (code, continent, region, name, capital, surfacearea, population, localname, governmentform, code2) VALUES ('WON', 'Europe', 'Western Europe', 'Wonderland', 5000, 100, 40, 'test', 'Monarchy', 'WO'); INSERT INTO countrylanguage (countrycode, language, isofficial, percentage) VALUES ('WON', 'German', TRUE, 60.5); INSERT INTO countrylanguage (countrycode, language, isofficial, percentage) VALUES ('WON', 'Italian', TRUE, 20.0); INSERT INTO countrylanguage (countrycode, language, isofficial, percentage) VALUES ('WON', 'Other', TRUE, 19.5); -- Both queries should return the same relation with 3 tuples SELECT * FROM country_v WHERE c_code = 'WON'; SELECT * FROM country_mv WHERE c_code = 'WON'; -- Test UPDATE triggers -------------------------------------------UPDATE city SET name = 'Entenhausen' WHERE id = 5000; UPDATE country SET name = 'Mittelerde' WHERE capital = 5000; UPDATE countrylanguage SET percentage = 20 WHERE countrycode = 'WON' AND language = 'Other'; UPDATE countrylanguage SET percentage = 19.5 WHERE countrycode = 'WON' AND language = 'Italian'; -- Both queries should return again the same relation SELECT * FROM country_v WHERE c_code = 'WON'; SELECT * FROM country_mv WHERE c_code = 'WON'; -- Test DELETE triggers -------------------------------------------DELETE FROM countrylanguage WHERE countrycode = 'WON' AND language = 'German'; DELETE FROM countrylanguage WHERE countrycode = 'WON' AND language = 'Italian'; DELETE FROM countrylanguage WHERE countrycode = 'WON' AND language = R. Guadagnini 30 Materialized Views in PostgreSQL 'Other'; DELETE FROM country WHERE code = 'WON'; DELETE FROM city WHERE id = 5000; -- Both queries should return the empty relation SELECT * FROM country_v WHERE c_code = 'WON'; SELECT * FROM country_mv WHERE c_code = 'WON'; R. Guadagnini 31