Issue 5 - Revision 9 / October 4, 2003
Building a Community Site - Step-by-Step (Part II)

By Milos Prudek | August 27, 2003

Overview

This is the second article in a series about practical experience with the development of an advanced community site in Zope. In this article I would like to present short snippets of code that solve various tasks typical of community sites.

Mirroring Websites

New articles are hard to come by if the community in question is new. orl.cz started with 5 members, and only one of them wrote new articles. The idea of mirroring external content presented itself, inspired mostly by the notorious Google Cache. (It is notorious among Web developers because, when you search with Google, next to each link there is an additional link called “Cached”. This link leads to a copy of the page even if the original was deleted from its site long ago. People have been upset that pages they have deleted from their own sites are still available on Google.)

It was decided that the mirrored content would be stored indefinitely; therefore it could not rely on the availability of the original Web page. We needed a routine that takes a URL, grabs the Web page at that URL including images, and modifies IMG SRC tags so that they point to local copies of these images. Although such a routine could be written from scratch in Python, there are proven and established programs available for free, such as pavuk or wget, that can serve this purpose. We chose wget for two reasons: it did exactly what was needed for this task, and its executable is much smaller than pavuk's. Size matters, because wget would be loaded into memory each time it was used, consuming precious CPU cycles.

Venerable wget is not able to store data in the ZODB. Instead, it stores all cached Web pages in the local filesystem. Therefore, it would be great if Zope could safely display files located in the filesystem.
Fortunately, Zope can do this thanks to an external product: the LocalFS product for Zope allows a directory in the local filesystem to be linked into the ZODB.

Originally I wanted to have a single LocalFS instance in the Zope root that would point to a directory containing all the mirrored Web pages. This approach soon revealed several pitfalls related to URL manipulation. Mirrored articles needed to appear in arbitrary folders, and the <BASE HREF> HTML tag added automatically by Zope to most pages proved to be a hard obstacle in this context. Features available in HTML 4.0, such as inline frames and the <object> HTML tag, were considered as a workaround for the BASE HREF problem, but as they added internal scrollbars they were deemed unsatisfactory. Reluctantly, it was decided that creating a new LocalFS instance for each mirrored Web page would be the most transparent solution.

Since it is not possible to use Python Scripts to make changes in the filesystem, the whole mirroring process takes place in a small External Method, called by a Python Script. This script, not displayed here in full for the sake of brevity, creates a Folder instance with user attributes and populates these attributes with values from a dictionary named Dct. One of the Dct values is the URL to be mirrored, stored under the dictionary key murl, as illustrated in the following snippet.
The first line simply calls an External Method with an murl parameter. This External Method calculates a relative URL, which is needed elsewhere in the Python Script, and saveDir, the sequentially numbered filesystem location of the mirrored Website. It also returns the whole output of wget, which could be processed to see whether the operation was free of errors. The second line simply creates a new LocalFS instance with appropriate parameters. The mirrorWeb External Method contains the single subroutine mirror:
Let me walk you through this External Method. wget stores all mirrored Websites in directories under savePath, as defined in line 3. These directories are named sequentially, and lines 4-11 calculate the next available directory name. The process is uneventful and easy to follow. The command line for wget, constructed in line 13, deserves some justification for the plentitude of parameters employed.
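The sequential directory naming described above can be sketched in a few lines. This is only an illustration, not the original External Method: the function name next_save_dir is hypothetical, and the sketch assumes that the directories under savePath carry purely numeric names starting at 1.

```python
import os


def next_save_dir(save_path):
    """Return the next unused numeric directory name under save_path.

    Assumption: mirrored sites live in subdirectories named "1", "2", "3", ...
    Anything that is not a purely numeric directory name is ignored.
    """
    existing = [
        int(name)
        for name in os.listdir(save_path)
        if name.isdigit() and os.path.isdir(os.path.join(save_path, name))
    ]
    # With no numeric directories present, numbering starts at 1.
    return str(max(existing, default=0) + 1)
```

Taking the maximum rather than counting entries keeps the numbering monotonic even if an older mirror has been deleted by hand.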
Although external command execution is possible with a simpler call to os.system(), the os.popen command family provides easier access to standard output. The os.popen3() call used in line 15 is the best way to run external commands from Python, especially if you are interested in the error output. And I bet that you are, because stupid wget (as of version 1.8.2) unfortunately feeds both standard messages and logging to stderr! That's why line 17 concatenates both outputs.

Now on to the relURL calculation. relURL, or relative URL, is a URL that we can store in a document to provide a hyperlink to our mirrored content. If wget did not rename mirrored files, we could have derived a relative URL using a simple slice. Alas, things aren't that easy. wget may rename mirrored content rather wildly, thanks to the -k and -E parameters (see above). Rather than trying to guess what wget decided, we simply analyze the wget log in lines 18-20.

Line 18 extracts the first mirrored filename, but also some unnecessary cruft - everything up to the next “->” string, possibly quite a few lines of text. Line 19 goes one step further and makes sure that nothing beyond the first line remains in the analyzed log fragment. To better understand these steps you can run wget manually and watch its logging output.

Line 20 has as many nested expressions as one can decently use without compromising readability, at least in my opinion. Rather than waste space explaining it, let me show you this line in action using the interactive Python interpreter:

Snippet 3
>>> chunk='"here3/detail.htm?id=30750.html" [1]'
>>> relURL = chunk[chunk.rfind('/')+1:chunk.rfind('"')]
>>> relURL
'detail.htm?id=30750.html'
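For readers on a current Python, the two steps described above (capturing wget's output and pulling the first saved filename out of its log) might be sketched as follows. os.popen3() no longer exists in Python 3, so the sketch uses the subprocess module instead. The function names run_and_capture and rel_url_from_log are hypothetical, and the sketch assumes, as the discussion above implies, that each saved file appears in the log after a "->" marker as a quoted path. The slicing is a small variation on line 20 that also copes with a filename that has no directory part.

```python
import subprocess


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return stdout and stderr as one string."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, where wget logs
        universal_newlines=True,   # return text rather than bytes
    )
    return result.stdout


def rel_url_from_log(log):
    """Return the bare filename of the first saved file, or None if none is found."""
    start = log.find("->")
    if start == -1:
        return None
    chunk = log[start + 2:]
    # Drop everything from the next "->" onwards.
    next_arrow = chunk.find("->")
    if next_arrow != -1:
        chunk = chunk[:next_arrow]
    # Keep only the first line of what is left.
    chunk = chunk.strip().split("\n")[0]
    # Take the text between the quotes, then strip the directory part.
    quoted = chunk[chunk.find('"') + 1:chunk.rfind('"')]
    return quoted[quoted.rfind("/") + 1:]
```

Feeding rel_url_from_log() the chunk from Snippet 3, prefixed with "->", yields the same 'detail.htm?id=30750.html'.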
Do not hesitate to stress-test the relURL calculation in the Python interpreter by trying different chunk values. To recap, given murl = http://www.yahoo.com/one/two/, the final chunk variable will be calculated as chunk = "4/index.html" [1]; consequently, relURL = "index.html". And this should work for any relative URL that wget creates.

Rating articles

Community sites often provide a way to rate an article. Open communities must use cookies to prevent cheating (voting multiple times), and cookies, of course, can be defeated. In a closed community, where only members may rate articles, cookies are not needed and the server can keep a list of people who rated the article in an 'article' property called, say, “examiners”, with type :lines. A list of examiners may be displayed by the rating form value_add_form.
... where glean_name_from_login() is a trivial two-line Python Script that simply pulls titles, name and surname from an SQL table via a ZSQL method. Note the proper way to check whether the 'examiners' property is empty: _.len(examiners[0])>0. The form then prompts for new_value in a series of radio buttons. This is processed by the following value_add DTML Method:
...where value_sum and value_count are 'article' properties which may be used to calculate an average rating (in percent) by simple division value_sum/value_count*100. The link to the rating form value_add_form should not be visible if a user has already rated the article. A simple in operator in the page that provides the link to the value_add_form Method will take care of this:
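Independently of DTML, the bookkeeping described in this section can be illustrated in plain Python. This is a sketch and not the site's code: the Article class and the function names are hypothetical stand-ins for the 'article' object and its examiners, value_sum and value_count properties, and the sketch assumes that each new_value is a fraction between 0 and 1, so that the formula above yields a percentage. It also refuses a second vote on the server side, not only in the page that shows the link.

```python
class Article:
    """Hypothetical stand-in for an article object with rating properties."""

    def __init__(self):
        self.examiners = []    # logins of members who have already rated
        self.value_sum = 0.0   # sum of all submitted ratings
        self.value_count = 0   # number of submitted ratings


def has_rated(article, login):
    """The 'in' check: True if this member already rated the article."""
    return login in article.examiners


def add_rating(article, login, new_value):
    """Record a rating; return False and change nothing on a repeated vote."""
    if has_rated(article, login):
        return False
    article.examiners.append(login)
    article.value_sum += new_value
    article.value_count += 1
    return True


def average_rating_percent(article):
    """Average rating in percent, or None while nobody has rated yet."""
    if article.value_count == 0:
        return None
    return article.value_sum / article.value_count * 100
```

Returning None for an unrated article avoids a division by zero when value_count is still 0.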
However, from the security viewpoint it is not enough to hide the rating form. A malicious registered user could conceivably try to access the value_add URL directly. Enriching Snippet 5 with a proper <dtml-if> to prevent this is left as an exercise for the reader.

Enhanced search

The Z Search Interface instance provides basic search capability for any Zope-searchable object (such as ZSQL methods and ZCatalogs). However, the Z Search Interface is nothing more than a factory that creates two DTML Methods (or two ZPT templates in recent Zope versions) which usually need to be customized by hand.

Default Z Search Interface methods generate a form and a report. The report is the form's action. A Web user must use the 'Back' button of their browser if they want to repeat their search with different values. Users resent extra clicks, and many Websites therefore now display the results together with a new form for entering a new search. This could be called a “recursive” search form. The following method, search, written from scratch, illustrates a recursive search form implemented with the form's action hyperlinking back to the form.
Line 3 is necessary in this recursive form, because without it Zope would throw the “global name 'keywords' is not defined” exception in line 10.

The ORL.CZ Web site stores articles published by the Web site users in a ZClass object called "Article". Each Article has a :lines property called keywords. A :lines property is like a mathematical set of words. Each Article can have multiple keywords, and the same keyword may appear in multiple Articles.

The keywords variable in line 10 is used to query a ZCatalog instance Catalog_Art, which has a KeywordIndex for the keywords property. The uniqueValuesFor() function is an ideal tool to dynamically extract a list of current keywords and thus form the set of <option ...> HTML tags. Line 11 makes sure that the item selected in the previous call is also displayed as selected in the current call of this DTML Method.

The DTML Method search_rep called in line 20 is not displayed here, since it is simply the general Report method generated by the Z Search Interface. Line 19 stipulates that search_rep is not called when the search Method runs for the first time. Otherwise it would display all Articles, which is unnecessary since the user has not entered any search data yet.
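The option-list logic described for lines 10 and 11 can be illustrated outside Zope in plain Python. This is only a sketch of the idea, not the DTML Method itself: render_keyword_options is a hypothetical name, the keyword list stands in for whatever uniqueValuesFor() returns, and selected=None plays the role of the first call, when no keyword has been submitted yet.

```python
from html import escape


def render_keyword_options(all_keywords, selected=None):
    """Build <option> tags, marking the previously chosen keyword as selected."""
    lines = []
    for keyword in sorted(set(all_keywords)):
        # Re-select the keyword submitted in the previous call, if any.
        marker = " selected" if keyword == selected else ""
        lines.append(
            '<option value="%s"%s>%s</option>'
            % (escape(keyword, quote=True), marker, escape(keyword))
        )
    return "\n".join(lines)
```

Calling render_keyword_options(["zope", "python"], "zope") produces two option tags with only the zope entry marked as selected, which is what lets the form redisplay itself with the user's last choice intact.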
Reproduction of material from any of ZopeMag's pages without prior written permission is strictly prohibited. Copyright 2003 - 2005 ZopeMag