ZopeMag's mascot the ZOPE fish


Article Finder
People
Issue 3 - Revision 5  /   April 9, 2003 


 
  ZopeMag Links:
Latest Issue
Credits
About the Fish
Issue 10
Issue 09
Issue 08
Issue 06
Issue 05
Issue 04
Issue 03
Issue 02
Issue 01
 
 
Downloads
     
  Letter from the Editor:
   Here We GO

Interviews:
Each issue we interview important people in the Zope world.

 Paul Everitt

Articles:
Throughout the quarter we cover topics of interest to Zope developers, designers, and users.

  ZPT for Beginners

  Community Site

  PostgreSQL Transactions

  Zope 3 UI

  Overview of Zope 3

Product Review:
Every two weeks we review a new Zope Product

  RDF Summary

  NeoBoard

  Neo Portal

 
 
Downloads
     
  Downloads:
Products we talk about in this issues Articles and Reviews

 
     

Illustration by Brendan Davis
tutorial
Transactions by example:

Transactions by example: (Part I of II)
- An Online Registration Desk using Zope and PostgreSQL
- - - - - - - - - - - -

By Samuel Sotillo | March 27, 2003

print

Abstract

This article is the first of two articles describing how to use PostgreSQL transactions with Zope and Python. In the first article we will give an overview of some basic elements of database design followed by a explanation of the need for transactions and how they are implemented. The second article describes how we use the database structure in Zope. A basic online registration desk for a fictitious College is used as a sample application. The whole database design process--from the data model to the functional specification--is briefly described. Sample Zope, Python, and SQL code is shown to illustrate the implementation.

____
 
 
Sidebar - Transactions
A sequence of SQL commands can be considered a transaction if it is
Atomic: a transaction can only be carried out in full or not at all.
Consistent: if the transaction fails the system must be restored to the state prior to initiation of the transaction.
Isolated: transactions cannot be affected by the execution of other concurrent transactions
Permanent: once a transaction is committed, its results cannot be altered by the failure of subsequent transactions.
 
____

Introduction

Transactions are one of the most confusing database-related topics. Some people consider them to be a kind of stored procedure (i.e., an SQL procedure stored on the server). However, transactions are something more than that. In this article we are going to review some basic questions with respect to transactions while developing a simple case study –-- a college's online registration desk –-- with Zope and PostgreSQL.

Currently, most colleges and universities provide some sort of online registration. This service offers students a set of basic functions that helps them with registering for new classes, updating personal information, and so on. Our registration desk will provide this kind of service for a fictional Virtual College.

In order to implement this Virtual College's basic registration desk, I will specify first a data model for the registration by describing the database objects we need and their relationships with each other. Then I will describe our functional model -- that is, the way we will query our database to handle the data required by our application -- which I will implement by using both Python and Zope SQL methods. Finally, I will describe our user interface and implement it using Zope. The main goals in our description are simplicity and clarity.

We assume a basic understanding of Zope, Python, and SQL as a prerequisite for understanding this article.

Database design overview

Designing a database comprises several stages. The first stage involves planning and analysis. In this context, planning means defining our goals while--at the same time--anticipating the resources we will need to achieve them. Analysis means determining the data requirements of the database. The next stage is modeling of the data. A data model represents our data structure as entities (or database objects) and depicts their relationships with each other. The data model uses the information collected during the planning-and-analysis stage and represents it in a way that is familiar to end-users.

In this context, an entity or database object means a "thing", "concept" or, indeed, anything that contains descriptive information about the data stored in the database—-- for instance, a phone book is an entity that describes, so to speak, ''links'' to people because it is an arrangement of information (we will call this information ”''attributes''” from now on) which describes how to reach the people listed in the book through their phones or their mail addresses.

Entities use attributes to describe ”''objects''”. Several types of attributes are defined for database objects. Key attributes are the first type: they are used to identify an entity unequivocally. For example, in our registration model each class in the list of available classes has a unique identifier – a ''class_id'', so to speak – which is the key attribute that is used to differentiate a particular class from the rest. Non-key attributes are a second type: they are used to describe an entity's properties. A good example of a non-key attribute is the name of a course: it may happen that the same course name is use for different classes (which may or may not be held at different times); therefore, the ''course_name'' attribute would not be unique. Derived attributes are a third type: they are attributes that can be derived from other attributes--after some mathematical manipulation, for instance. After a student completes their registration, the system must calculate the student's balance, which is determined from the fee that applies to each class the student has registered for .This balance is a good example of a derived attribute. Attributes should be validated according to rules intended to guarantee atomicity--that is, that they present an indivisible fact and describe entities without ambiguity.

A relationship represents the way entities associate with each other. In database language a relationship can be characterized by its cardinality, optionality, direction, and dependence. In our case, cardinality is the most important characteristic: it denotes how many instances of one entity are related to a single instance of another. Examples of cardinality are one-to-one, one-to-many, and so on. All this information then gets put together in a set of documents. These documents should include a detailed description of each object used by our model, including an accurate and complete definition of each one of them. Examples of such documents are Entity-Entity and Entity-Attribute matrices—-- see Figure 1 for an example of an EE matrix. Our EE matrix only presents information about the cardinality of the relationship between pairs of database objects. It is useful as a starting point for developing the ER diagram, which will be explained next.

Once the entities, attributes, and relationships have been specified and documented, the next step is to generate our Entity-Relationship diagram. The ER diagram is a graphic depiction of each database object and its relationship with every other database object. The point of such a diagram is to ensure that our data model describes a hierarchy of data that must be preserved and followed by our functional model.

ER data model

Figure 2 below shows an example of an ER diagram we will use for our project. Only those objects relevant to our case study are depicted. The cardinalities of all relationships involved are ONE-TO-MANY (1:M). This means that in each case one instance of an object relates to many instances of another object. For example, in the case of the object students, the same student (represented by a unique student_id number) can be found in several instances of the objects sit_list, registration_list, and invoice_list (sit_list gives the seat assignment of a student in the courses he has registered for; registration_list gives the courses a student has registered for; invoice_list gives the bills sent to a student for the courses he has registered for). These relationships will help us to determine the correct sequence of events that our transactions must accomplish in order to guarantee absolute database consistency--as we will see later when we consider our functional model. For now, let us describe the objects depicted in Figure 2 in detail.

The objects faculty and students are self-explanatory. Both objects are described by certain common attributes that both teachers and students as human beings share--for instance, last and first name, Social Security number, physical address, phone number, and so on. The object catalog specifies all courses included in the College curriculum--regardless of whether they are being offered during the present term or not. On the other hand, class_list describes all those courses available in a particular term--it is a subset of catalog and it includes temporal and geographical information relevant to a specific term or semester. catalog and class_list are related through course_id. This relationship (of cardinality 1:M) links a specific course with one or more available classes--for example, ENG-111 (Expository Writing) is offered in different rooms and at different times, in the same term. Although class_list's contents usually change from one term to the next, catalog's is usually fixed since the introduction of completely new courses is not so frequent. Additionally, class_list is related to faculty through the faculty_id attribute because each class requires an instructor. All these relationships are expressed by a set of primary and foreign keys that define a generalized hierarchy--a collection of objects that share some common attributes. This hierarchy will help us to describe our functional model later on. In fact, we just navigated one branch of the hierarchy, the one that relates the students and faculty objects with the college's internal academic processes as represented by the catalog, class_list, and sit_list objects.

The rest of Figure 2 shows another branch of the hierarchy that starts with the students object and goes all the way down to the account_receivables object. This branch relates the student's academic records--explained above--to the financial statement associated with the college's billing process.

It is at this point that it will be appropriate to introduce the concept of transactions into our model. Both branches of our ER diagram involve two different sequences of actions--transactions--which it will be necessary to follow for updating the information stored in the database. The first transaction guarantees a seat to the registering student--it checks whether there is still room left in the desired class and then proceeds to reserve a seat for the student while decrementing by one the number of seats available to other students. The second transaction is a financial transaction--it describes a student's purchase of a seat in a given course from the college. It generates an invoice and updates the relevant business information. However, the second transaction cannot be initiated until the first one is completed—committing the first one is obviously a prerequisite for initiating the second one since a student should only receive a bill for registration in a class (i.e. for being assigned a seat in that class) after the registration has been confirmed. He must not be billed for a seat he or she did not get. For this reason, we will consider both transactions as dependant components of a unique, more general sequence of actions that is necessary to guarantee the success of the whole registration process.

The basic idea behind a transaction is that the steps comprising it cannot be carried out partially -- this is called "atomicity" (see the sidebar "Transactions" for a short description of basic transaction concepts). Once a transaction is begun, the whole sequence of steps comprising it must be followed through to the end. If any of the steps fails, the whole sequence must be rolled back in such a way that the original state of the whole system is restored - that is, the state held by the system just before the transaction was started (this is called "consistency"). How this is achieved is described below. The reader is invited to check the resources listed at the end for in-depth descriptions of the SQL schema defined for our database. For now, let's review the definition of transactions from the programmer's point of view.

A naive definition of transactions

Because we are dealing with databases -- specifically with PostgreSQL -- I will define a transaction or multi-statement transaction as a group of SQL queries that must be executed as a unit -- that is, all the queries in the transaction must be completed or the transaction as a whole is aborted. Transactions have several interesting features –-- the sidebar "Transactions" reviews some of them. Transactions are a powerful mechanism for ensuring data consistency and integrity. They permit us to deal with undesirable behaviors such as dirty reads, non-repeatable reads, and phantom records. Dirty reads occur when a transaction reads data which has been modified by another non-committed transaction -- if the latter transaction fails, the data used by the former one is then flawed. Non-repeatable reads occur when a transaction reads a record and then tries to read it a second time after another transaction has deleted it; the first transaction will not be able to find the record. Finally, phantom records occur when a transaction reads a group of rows that satisfy a certain condition and then another transaction adds more records that satisfy the same condition: the result here is that the first read becomes defective. Our functional model must deal with these kinds of problems, as we will see in the next sections.

Functional Model

Figure 3 below shows an example of a multi-statement transaction executed from the PostgreSQL command-line interface. The example depicts the sequence of SQL queries necessary to reserve a seat. Here we assume that our student -- with Social Security Number '999-99-9999' -- wants to register for a particular class: "INTRODUCTION TO PROGRAMMING" or "CIS-115" (course number). We will use the student's Social Security number as the authenticating ID ("999-99-9999"). Our first step requires, therefore, querying the database to get the student's unique identifier (student_id), because we use this identifier as our primary key in students. Additionally, students.student_id is the ancestor of sit_list.student_id -- the foreign key used to relate some instances of sit_list (the descendant node in the hierarchy) to a particular instance of students (the ancestor node) -- see the data model above (Figure 2) and how the hierarchy proceeds from left to right.

Now we need to verify that this student has not previously reserved a seat for the same class. At the same time, we need to check how many students are already registered in "CIS-115" and verify that the seat limit for this class has not been reached. Lines 3 to 10 show how we verify all this. First, we find out which class_id corresponds to "CIS-115" -- in this case the class_id "12" corresponds to course_id "4", which corresponds in turn to course_code "CIS-115", as shown in lines 3 to 6. Next, we find out how many students are already registered in this class -- line 7 and 8 show that we have 10 students registered already. Then, we need to find out how many students can be registered in this class –-- lines 9 and 10 show that the seat limit is 16. Finally, our last check is to find out if our student is already registered in this class. Lines 11 and 12 show that "None" student with the same student_id (2) has reserved a seat in class "CIS-115".

Figure 3

. Figure 3: Sample sequence of independent SQL statements that may be used to reserve a seat from the PostgreSQL's command line processor.

At this point, after all these validations have been carried out, we can proceed to reserve a seat for our student. Lines 13 to 16 show how we enter a new record with the relevant information into sit_list and registration_list.

Note that we have not yet employed a transaction: Figure 3 does not show a real multi-statement transaction but a sequence of independent SQL queries or individual transactions (PostgreSQL considers an individual SQL query as a "single-statement transaction"). PostgreSQL executes each one of these queries as a single unit but anything might happen during the execution of the queries. For instance, if some other student interested in the same course initiates the same sequence of queries concurrently with the first student, it could happen that the seat limit for the class has been reached with or more students being nonetheless registered for the class after all the seats had been assigned -- that is, phantom records would have been added that could not be taken into consideration at the time the first student was carrying out his registration. This would happen if another student reserved the last seat just after step 7 is carried out by the first student and while the first student is still working on steps 13 and 15. On completing step 15 the first student would be registered for the course although there would not be a seat for him .This would mean that database integrity would not be maintained. This problem arises because this whole sequence of SQL queries is not executed atomically, i.e. as an indivisible unit. Additionally, sit_list data is not kept from being modified until after the transaction commits. An adequate level of isolation is critical for ensuring database integrity.

Atomicity and isolation

Fortunately, PostgreSQL offers several ways of dealing with the problems just mentioned. To guarantee the atomicity of a query sequence, we need to explicitly indicate this at the beginning and at the end of our sequence. Figure 4 shows how to do this. In this case, we add two new statements to our sequence--BEGIN WORK at the beginning and COMMIT WORK at the end. These two statements tell PostgreSQL that all SQL queries between them must be executed as a single unit.

Figure 4

. Figure 4:Sample multi-statement transaction with isolation level SERIALIZABLE

As for the isolation problem, we have two options. The first one is to serialize our transactions. By default, all PostgreSQL transactions are executed in a COMMITTED READ mode (this is called the "isolation level" of the transaction). In this mode, however, users can see other transactions commit while their transactions are still open or uncommitted -- this means that they can see changes made to the database as a consequence of other committed transactions while they are still within their uncommitted transactions. Changing the isolation level of the transaction to SERIALIZABLE can prevent this.

Isolation level is a very important issue because it affects the visibility of data changed by uncommitted transactions. We must always prevent a transaction from seeing undesirable or uncommitted changes that can affect the integrity of the whole database. For example, one student may initiate a new transaction that aborts just after changing the sit_list object – this means that the number of available seats actually remains unaltered because this first transaction gets rolled back. However, another student may try to do the same thing concurrently before the first transaction aborts. The transaction the second student initiated would see that the first student had (supposedly) reserved the last seat; therefore, when the second transaction committed its owner (the second student) would go without a seat despite the fact that the last seat ultimately had not been reserved at all.

In the mode SERIALIZABLE a transaction in progress cannot see any change committed by other transactions. However, although this solves the so-called visibility problem, there is still a problem when updating the database as required by lines 17 and 19 in Figure 4. In the SERIALIZABLE mode, transactions are executed serially--this means that if another user wants to reserve a seat in the same class one wants--assuming that there is still enough room -- their transaction will abort unless ours has committed. While this isolation level guarantees complete isolation, it turns out to be too restrictive. In fact, the SERIALIZABLE mode is useful for SELECT-only transactions (i.e., transactions that can only read data from the database without altering it) because no change is carried out in the database by this kind of transaction.

Figure 5

. Figure 5: Sample multi-statement transaction with LOCK in table sit_list

A second method for controlling the isolation level in PostgreSQL transactions is by using locks. Write-locks provide a fine-grained mechanism for preventing other users from modifying a row or even a whole table while a transaction is being executed. One such mechanism is a SELECT...FOR UPDATE statement that locks a row for update until the end of the transaction. Another mechanism is the LOCK statement. With the LOCK statement one can prevent concurrent access to a table or row for the duration of a transaction. The level of isolation can be set by reserving a table in an EXCLUSIVE, ACCESS EXCLUSIVE, SHARE, or ACCESS SHARE mode. In EXCLUSIVE mode, no other lock can be applied to the table. SHARE mode allows users to share a lock on a table or row, preventing any of them from applying an EXCLUSIVE lock. More information about this can be found in the PostgreSQL manual. The PostgreSQL default lock mode is ACCESS EXCLUSIVE: it prevents a concurrent operation from being performed anywhere in the whole table or object. In our case, we use LOCK TABLE sit_list IN ACCESS EXCUSIVE MODE as shown in Figure 5 to guarantee that our sit_list table cannot be modified until the transaction commits. Note, however, that this lock does not shield the table from concurrent reads.

This concludes our introduction to transactions, which are a very interesting and tricky subject. A detailed and careful study is required to understand the various aspects of this topic - of course, it is not our goal to do that here. We refer our reader to the PostgreSQL manual or the links listed below.

Further Resources

Information about PostgreSQL may be found at this URL:
http://www.postgresql.org

An interesting introduction to database modeling may be found at:
http://www.utexas.edu/its/work/review/windows/database/datamodeling/index.html

Information about the necessary modules and adapters to access PostgreSQL from Python and Zope may be found at:
http://www.zope.org/Members/aho/oio_utils/ZPyGreSQL_to_ZPsycopg

The following URL links to the chapter in the PostgreSQL manual that describes transactions and locks:
http://www.guides.sk/pgsql/chapters/transactions/


Samuel Sotillo:

Samuel Sotillo is a free-lance writer from Venezuela. His interests are: Zope, transactional systems, Python, PostgreSQL, MySQL, Web services, and Latin-American literature and history.


shim
shim  ZopeMag is committed to bringing you the best in Zope Documentation. shim
shim


Home   Subscribe   FAQ   Contact   Write for us   Privacy Policy   Weekly News   PyZine   opensourcexperts.com  

Reproduction of material from any of ZopeMag's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 ZopeMag Zope/Plone hosting by Nidelven IT