ZopeMag's mascot the ZOPE fish


Article Finder
People
Issue 2 - Revision 4  /   January 16, 2003 


 
  ZopeMag Links:
Home Page
About the Fish
Issue 10
Issue 09
Issue 08
Issue 07
Issue 06
Issue 05
Issue 04
Issue 03
Issue 02
Issue 1
 
 
Downloads
     
  Letter from the Editor:


Interviews:
Each issue we interview important people in the Zope world.

 Eric Peter Germain

Articles:
Throughout the quarter we cover topics of interest to Zope developers, designers, and users.

  Customized User Folders Part II

  Using Zope Sessions

  Debugging ZPTs

  Storing Binary Data

Product Review:
Every two weeks we review a new Zope Product

  ZShrink Revisited

  External Editor

 
 
Downloads
     
  Downloads:
Products we talk about in this issues Articles and Reviews

  ZShrink
  BinaryDataExample
  DBUserFolder
 
     

Illustration by Brendan Davis
tutorial
Storing Binary Data

Storing Binary Data in a Relational SQL Data Database and Publishing it from Zope
There are good reasons for storing images in a relational database
- - - - - - - - - - - -

By Nico Grubert | January 15, 2003

print

There are three ways of storing images so that they are accessible in Zope. Firstly, one can store them in the ZODB; secondly, they can be stored in the local file system; thirdly, they can be stored in a relational database. There are good reasons – as described below – for choosing the third alternative. This article presents how to store images and other binary data in a relational database so that they are accessible to Zope. The database used as an example here is PostgreSQL but with minor changes other databases such as MySQL or Oracle can be used.

The Idea

When I started to use Zope with relational databases such as PostgreSQL and MySQL I only developed applications where normal text data (strings, integers, etc.) was stored in the database. Binary data, such as images or other data files, I either uploaded into the ZODB or I used Zope products like "ExtFile" or "LocalFS" to store files in the local file system, from which I would then access them in Zope.

After I had developed a few applications I wondered how I could combine content from different sources to publish a single document (which I call "MyDocument" in the context of this article) with a more complex structure than a simple document with several text-fields. I wanted to store meta-information as well as images and binary data files (e.g. *.tgz archives) to get a complex "MyDocument" document that would be publishable via the Web.

Originally, I had decided to store the meta-data of "MyDocument" in a PostgreSQL database table. The images and binary files were uploaded to the ZODB -- the URL of an uploaded object was stored with the document's meta-data in the database table. This worked fine for a while until somebody who had access to the management screen deleted the Zope folder "PDF" with more than 1000 stored PDF data files! I took a look at the size of the ZODB and knew why he had decided to delete the folder: the ZODB size was 7 Gigabyte -- this is much too large! Of course, my application no longer worked because the objects stored in the "PDF" folder could not be found. Instead of a functioning product I had a bloated ZODB, with data inconsistency.

After this experience, I decided that in the future I would store all data, i.e. meta-data and binary data, in the PostgreSQL database on a different server. This way my ZODB would be small and fast and the danger of data inconsistency would be minimized.

To get an idea of what my "MyDocument" pages looked like, take a look at Figure 1 below.

Introduction

Zope Page Templates (ZPTs) will play an integral part in Zope's future. Using ZPTs enables Web designers and application developers to work better together. With ZPTs, the Web designer can create the layout of a Webpage using a presentation tool such as Adobe GoLive™ or Dreamweaver™ and then give it to application developers who use the Template Attribute Language (TAL), the TAL Expression Syntax (TALES) and the Macro Expansion for TAL (METAL) to program dynamic content. Since TAL and METAL are used as attributes within existing HTML tags and do not use extra tags, as DTML does, the pages are not messed up when they are opened in a presentation tool and the presentation tools do not mess up the source code. Moreover, you can easily include sample data as a placeholder in your code for dynamically created content when using Zope Page Templates.

This article, however, is not about the pros and cons of Zope Page Templates but about how to debug your templates with or without using a presentation tool to edit them. Although it is much easier to edit ZPTs in a presentation tool instead of using the Zope Management Interface (ZMI), for debugging you will often also need to open the page templates in the ZMI.

Fig. 1 - Two documents whose meta-data and binary data come from a PostgreSQL database
Fig. 1 - Two documents whose meta-data and binary data come from a PostgreSQL database

Getting started

On my machine I am running Redhat Linux 7.2, PostgreSQL 7.1, Zope 2.5.0 and the database adapter for PostgreSQL, called PscycopgDA 1.3.

Before I created my database tables, I had to be clear about the various document types I wanted to publish. I decided I needed to publish two document types (which I call "ProductInfo" and "Download"). On a "ProductInfo" page I wanted to publish an image and the-meta data of the associated article; on a "Download" page, I wanted to publish the meta-data and a link to a downloadable data file. Remember: our goal was to store the images, the data files and the meta-data in a PostgreSQL database. The meta-data goes into an SQL table called "tbl_metadata", the images go into an SQL table called "tbl_productinfo" and data files go into an SQL table called "tbl_download". Figure 2 will help clarify the structure of the data in the "ProductInfo" and the "Download" documents.

Fig. 2 - Three SQL tables in the PostgreSQL database
Fig. 2 - Three SQL tables in the PostgreSQL database

The SQL table tbl_metadata gives the meta-data for an article; tbl_productinfo contains an image belonging to the article; and tbl_download contains a downloadable file whose description is given in "description" in tbl_metadata.

After I created these three SQL tables in the database, I added a DTML Document "addDocumentForm" providing an input-form for adding new "ProductInfo" and "Download" documents.

Adding new documents

In Figure 3 you can see the form used for adding the meta-data of a new document. Depending on which "Group" one chooses (Productinfo Document or Download Document) another data entry page is opened (after one clicks on "Add Metadata") and an image or a file for downloading can be chosen.

Fig. 3 - The input form “addDocumentForm“ used to add meta-data
Fig. 3 - The input form “addDocumentForm“ used to add meta-data

There are 3 steps to adding a new document:

  1. Entering the meta-data in the input-form "addDocumentForm" and choosing a document type
  2. Calling – by means of clicking on "Add Metadata" – the DTML Method "addDocumentMethod", which in turn calls the Z SQL Method "sql_insertMetadata" used to insert the meta-data into the SQL table. In order to add the Z SQL Method "sql_insertMetadata" I had to add a "Z Psychopg Database adapter" object and enter the connection string consisting of the database name, a username and a password.
  3. Choosing an image or a file over its filename in the browse window opened by "Add Document Method". Submitting the form calls an external method to insert the data into the SQL tables "tbl_productinfo" or "tbl_download", respectively. The obj_id value of the record that goes into "tbl_metadata" is the same obj_id value for the record that goes into "tbl_productinf" or"tbl_download". This ensures that a given image is uniquely associated with a record in the tbl_metadata table.
Inserting binary data into the database table

Inserting the meta-data may be easy, but how is binary data inserted into a database table? First, let's take a look at the SQL table "tbl_productinfo".

CREATE TABLE tbl_productinfo ( 
  obj_id                INTEGER, 
  imagename             VARCHAR (200),
  imagedata             bytea

To insert binary data into a database table I used the fieldtype "bytea" instead of "TEXT" (in MySQL and Oracle the corresponding field-type is "blob" for "binary large object."). I did not use a Z SQL method to insert the binary data. Inserting binary data into a "bytea" field is not as easy as inserting normal "TEXT'" or "INTEGER" values into an SQL table. This requires an External Method, which we describe next.

Using External Methods

To insert binary data into the database table "tbl_productinfo", I created an External Method named "addFileDataMethod", which is called from the DTML Method "addDocumentMethod" when I click on the [Upload] button. In the <ZopeHome>/Extension directory file path I created a new Python File "addFileData.py". Click here to view the source code of "addFileData.py". To insert data into SQL tables from a Python method without calling a Z SQL method 4 steps are necessary:

  1. Open a database connection
  2. Create a cursor for the database connection
  3. Execute the SQL query
  4. Commit on the database connection

When the External Method "addFileDataMethod" is called the add_filedata() method is executed. To insert the binary data, 5 steps are necessary:


. Click Here for the code example

The first step opens a database connection in order to insert the data into the SQL Database. To open a database connection, I call the method connect(), which is defined in the Psycopg module I have imported. (The Psycopg module is the PostgreSQL adapter which enables Zope and Python to communicate with the PostgreSQL server.) The connect() method expects a connection string (name of the database, username) in order to be able to connect to the database.

How to get the connection string ?

Since I did not want to hardcode the connection string because I might want to change the database name, username or password at some point in the future, I needed to get the connection string that is set in the Z Psycopg Database adapter object "PsycopgDA_CMSDB".This is used by the Z SQL method "sql_insertMetadata" which inserts the meta-data into the table "tbl_metadata". But first I have to get the connection id attribute of the Z SQL method "sql_insertMetadata" in order to know which Z Psycopg Database adapter object is used to insert data into "tbl_metadata". We write the id of the Z Psycopg Database adapter object into a variable called db_con. Once I know the id of the Z Psycopg Database adapter object I can get the object itself by writing self.aq_acquire(db_con). I assigned the database adapter object to dba (see Part 1 of the code for the External Method 'addFileDataMethod'.) so I can write dba.connection_string to retrieve the connection string which is passed to the connect() method.

The most important step is 4. The c.execute expression takes two parameters.
The first parameter is the SQL query where the variables obj_id, img_title and data are inserted. These variables come from the second parameter – a dictionary that contains precisely these three variables:

{'obj_id':obj_id, 'img_title':title, 'data':psycopg.Binary(file.read())}

This is a more elegant way to build dynamic SQL statements in Python than splitting the "INSERT INTO tbl_productinfo (obj_id, …)…" statement as follows:

"INSERT INTO"+tbl_name+"(obj_id, …)…"

The values for the keys 'obj_id' and 'img_title' are the parameters passed to the method add_fileobject () – the value for 'data' is built by calling the Binary class which is defined in the psycopg module. Then the bytecode of the data file is read by file.read() and all the bytecode is passed to the Binary class. There is no need to use the Python Binary class since the psycopg module has its own Binary class.

When one adds a new "ProductInfo" document the meta-data is inserted into "tbl_metadata" and the binary data and the name of the image are inserted into "tbl_ productinfo". To see what the binary data in the database looks like, I can query the database by the following simple SQL query. This is done by entering the query into the text area one gets on clicking on the Test tab of the object "PsycopgDA_CMSDB".

SELECT * FROM tbl_productinfo

The result is shown in Figure 4:

Fig. 4 - -  Result of the SQL SELECT statement, which returns binary data
Fig. 4 - - Result of the SQL SELECT statement, which returns binary data

How one can see the actual image is described below under "Publishing Images".

Adding new documents of type "Download" works similarly to adding "ProductInfo" documents, with the difference, of course, that the binary data and the filename are inserted into the table called "tbl_download". When one adds a new "Download" document one has to check whether the filename one entered already exists in the database table "tbl_download". The filenames must be unique because later we call a Python script, "downloads" (see below for the code) and pass a filename to it. This unique filename is used to return the corresponding file-data from the SQL table "tbl_download".

Retrieving data from the database

How can the data be retrieved from the database, i.e. how can real images be published by retrieval of the image's binary data, or files belonging to a "Download" document be downloaded? To view a document all three SQL tables "tbl_metadata", "tbl_productinfo" and "tbl_download" (see Figure 2) must be queried. The following Z SQL Method "sql_getDocumentbyId" serves this purpose::

Z SQL Method "sql_getDocumentbyId"

Arguments:	obj_id

SELECT 
  a.*,
  b.imagename,
  c.filename
FROM 
  (tbl_metadata a LEFT JOIN tbl_productinfo b ON a.obj_id = b.obj_id)
                  LEFT JOIN tbl_download c ON a.obj_id = c.obj_id
WHERE  
  a.obj_id = <dtml-sqlvar obj_id type=int>

With the Z SQL method "sql_getDocumentbyId" I retrieve the meta-data of a document and the name of the image or file. To retrieve the binary data I will use another Z SQL Method, which I will create later.

Publishing Images

To publish an image I use a DTML Document "viewDocument":

(snippet)

<img src="query_image?img_id=<dtml-var obj_id >">

"query_image" is a DTML Method that calls the Z SQL Method "sql_getImageData" to retrieve the binary data of an image.
The DTML Method "query_image":

<dtml-var "sql_getImageData(image_id=img_id)[0][0]">
 

The Z SQL Method "sql_getImageData":
      
Arguments:	image_id

SELECT 
  imagedata
FROM
  productinfo
WHERE
     obj_id = <dtml-sqlvar image_id type=int>

Remember, the first string in the image data-field is "GIF" (see Fig. 3 above), so the browser knows to set the content-type to "image/gif". Calling the DTML Method "query_image" and passing, for example, img_id=8 to it, will display the image in Fig. 4, which has the object ID 8. For this reason, in the DTML Document "viewDocument", I can write: <img src="query_image?img_id=<dtml-var obj_id>"> to publish the image.

Downloading files

To download a file I need to have a Z SQL Method to retrieve the binary data of the data file I uploaded into the database table "tbl_download". I have called this Z SQL Method "sql_getFileData". In the DTML Document "viewDocument" I would like to have a hyperlink showing the filename. When I click on this hyperlink, I am prompted: "Save File as…".

In the DTML Document "viewDocument" there are the following lines:

<dtml-if filename>
<a href="downloads/<dtml-var filename>"><dtml-var filename></a> < dtml-else > no file </dtml-if>

Moving the mouse cursor over the hyperlink will show a URL in the status bar of your browser, for example:

http:///BinaryDataExample/downloads/myFile.tgz

This URL looks clear to knowledgeable end-users who will probably think that the file "myFile.tgz" comes from a global download archive ("downloads"), but YOU (as programmer) know that it comes from a database. What then is "downloads" ? The answer is given in the next few paragraphs.

The object "downloads" is a Python Script which returns the filename and the file-data so that clicking on a hyperlink brings up the "Save File as …" window containing the filename. Let's take a look at this Script:

Python Script "downloads"


. Click Here for the code example

First, this Python Script checks to see if a filename was passed to it, which means whether somebody has clicked on the hyperlink, for example <PATH>/downloads/myFile.tgz. The argument traverse_subpath splits up the pathname into a list containing the single object names in the URL path. The "traverse_subpath[0]" expression returns the last element in the URL path -- that is, the first element in the list, which is simply the filename (here: "myFile.tgz").

The filename is passed to the Z SQL method "sql_getFileData", which returns the file-data corresponding to a given filename. Since the filename is unique (see the last paragraph in the Section "How to get the connection string"), the proper file-data is always returned. In order to get a downloadable file, the script has to set the Content-Type to "application/octet-stream".

Note: The expression container.sql_getFileData( filename = filename )[0][0] will return the value in the image-data field which represents the binary data.

The expression container.sql_getFileData( filename = filename ) would return something like "<Shared.DC.ZRDB.Results.Results instance at 0x8aa111c>" which is an instance of the class Results. This instance is created whenever a Z SQL Method is called. The syntax used is as follows: container.ZSQL_Method( argument = filename )[index_of_the_record][index_of_the_column]

container.sql_getFileData( filename = filename )[first_record][imagedata]
The browser realizes the proper content type and chooses the application assigned to the file extension because of the first string in the binary field.

When I download a *.tgz or *.zip file a download window appears and the application assigned to the file extensions *.tgz or *.zip is displayed -- e.g., downloading the "myFile.tgz" file will show "WINZIP", or something similar.

Conclusion

This is one way of storing binary data files and combining them with the normal content coming from an SQL table. If you are looking for a way of storing your binary data in an SQL database and retrieving the data from the database tables to publish it you can try the method described in this article by downloading it here.


Nico Grubert: Nico Grubert: was born in Germany and grew up in a small village in eastern Germany. In 1997, he began computer science studies. After his University graduation he moved to Berlin to work with beehive. He also co-authored the Zope book "Zope: How to Build and Deliver Web Applications". When Nico's not thinking about Zope you can find him producing and playing electronic music.


shim
shim  ZopeMag is committed to bringing you the best in Zope Documentation. shim
shim


Home   Subscribe   FAQ   Contact   Write for us   Privacy Policy   Weekly News   PyZine   opensourcexperts.com  

Reproduction of material from any of ZopeMag's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 ZopeMag Zope/Plone hosting by Nidelven IT