################################################################
$RCSfile: README.txt,v $
Authors: Chip Morris and Craeg Strong, Ariel Partners LLC
         Philipp von Weitershausen, philiKON Valley
$Date: 2005/09/06 02:08:59 $
################################################################

Contents

  1. Successor project to XMLTransform
  2. Quick Start
  3. Prerequisites
  4. Description
  5. Known Limitations
  6. Contributions
  7. Schema Migration
  8. XSLT Processor Support Status
  9. Notes
  10. Unit Testing on Win32

Synopsis: ZopeXMLMethods 1.1.1 released

  ZopeXMLMethods provides methods to apply to Zope objects for XML/XSLT processing. XSLTMethod associates XSLT transformers with XML documents. It features file-system caching and works with many XML/XSLT libraries. Version 1.1.1 is upgraded for compatibility with the latest Zope (2.8.1) and 4Suite (CVS) versions.

Successor Project to XMLTransform

  ZopeXMLMethods release 1.0 represents a fundamental change in paradigm. Rather than pointing directly to both the XML source and the XSLT source, an XSLTMethod now points at the XSLT source and is *applied* to the XML source like a Python script or DTML method. This is good news, as it makes XSLTMethod much more Zope-ish and completely obviates the XML Transformer Registry. The downside, however, is that there is no easy upgrade path from previous releases.

  Summary of changes

    - Removed DTML GUI pages, replaced with Zope Page Templates

    - Zope Page Templates now required

    - XMLTransform module: renamed to ZopeXMLMethods

    - XMLTransform class: enhanced and renamed to XSLTMethod

    - TransformerRegistry class: removed

    - FourSuite/Pyana/LibXslt/SabPythProcessor classes: upgraded to support newer releases, but otherwise unchanged

    - CacheManager class: enhanced to use the transformation path (URL) as the caching key

Quick Start

  **Don't forget to read the Prerequisites section below!**

  ZopeXMLMethods consists of a set of methods that can be applied to Zope objects to perform various types of XML processing. In general, each type of method is applied to a Zope object in the same manner that a standard DTML Method or Python Script might be applied. The only requirement for the source Zope object is that it must somehow produce XML. Currently, this XML must be in ASCII form, but in the future additional formats such as DOM or SAX events will be supported.

  ZopeXMLMethods includes a Cache Manager that is specialized to notice changes to the XML source files and to store cached contents in files in the filesystem, rather than in the Zope object database.

  ZopeXMLMethods is the successor project to XMLTransform. Unfortunately, it is not backward compatible due to the change in usage paradigm and increased project scope. Some upgrade hints are included in the documentation, however. The XMLTransform project is now considered obsolete.

  ZopeXMLMethods is now hosted on SourceForge. The project page is located "here":http://zopexmlmethods.sourceforge.net

  As of today, ZopeXMLMethods includes XSLTMethod, which enables Zope users to associate an XSLT transformer with an XML document and automatically renders the result of the transformation when called. It is applied to another Zope object in the same manner that a DTML Method or Python Script is applied.

  The XSLTMethod can behave like a number of standard Zope objects, so that the output of a transformation can be used in place of a normal Zope object. There are no constraints on the type of Zope objects used for the XML or XSLT. In fact, the content may be cobbled together from multiple sources, as long as the final XML and the final XSLT can each be obtained as well-formed XML from a single object.

  Future releases of ZopeXMLMethods will add methods supporting other XML standards such as XPath, XPointer, XML Schema, RDF, XQuery, XUpdate, etc.

  ZopeXMLMethods features a pluggable architecture that makes it possible to dynamically choose between different XML/XSLT processors at runtime. It currently works with any of the following: 4Suite (0.11.1 and current), GNOME libxml2/libxslt, Pyana (a Python wrapper on top of Xalan/C), and SabPyth (a Python wrapper on top of Sablotron). The library is designed so that it should be relatively straightforward to support additional processors in the future.

  It is even possible to ZSync from one Zope instance to another where the two Zope instances use *different* XML libraries. This makes it much easier to test or upgrade your XML processing library, or to support heterogeneous platforms.

  The ZopeXMLMethods product adds two separate objects to the "Add" menu in the Zope Management Interface:

    - XSLT Method

    - XML Method Cache Manager

  The quickest way to get started with ZopeXMLMethods is to read the description below, then follow the directions in TUTORIAL.txt, then re-read the description below :-) The tutorial includes examples of increasing complexity that should cover most normal uses of the product.

  **Don't forget to read the Prerequisites section below!**

  ZopeXMLMethods features a pluggable architecture that makes it possible to dynamically choose between different XSLT processors at runtime.
  It currently works with any of the following combinations:

    - "PyXML 0.6.6":http://www.sourceforge.net/projects/pyxml and "4Suite 0.11.1":http://www.4suite.org/. For me, on either my Win2K machine or my Red Hat Linux 8.0 machine, that means downloading and installing the following::

        PyXML-0.6.6.tar.gz
        4Suite-0.11.1.tar.gz

      **DON'T FORGET TO SPECIFY THE --without-xslt --without-xpath OPTIONS FOR PYXML**

    - "PyXML 0.8.2":http://www.sourceforge.net/projects/pyxml/ and "4Suite 1.0a":ftp://ftp.4suite.org/pub/4Suite/

      For me, on either my Win2K machine or my Red Hat Linux 8.0 machine, that means downloading and installing the following::

        PyXML-0.8.2.tar.gz
        4Suite-1.0a1.tar.gz

      **DON'T FORGET TO SPECIFY THE --without-xpath OPTION FOR PYXML**

      Note that this release of ZopeXMLMethods is *not* compatible with 4Suite 0.12.0a3, but it should be compatible with the upcoming 1.0 beta and final releases.

    - "libxml2 2.5.5":http://www.xmlsoft.org/ and "libxslt 1.0.28":http://www.xmlsoft.org/ (*Python* bindings). For me, on my Red Hat Linux 8.0 machine, that means downloading and installing the following::

        libxml2-2.5.5-1.i386.rpm
        libxml2-python-2.5.5-1.i386.rpm
        libxslt-1.0.28-1.i386.rpm
        libxslt-python-1.0.28-1.i386.rpm

      "Here":http://www.zlatkovic.com/projects/libxml/index.html is a site with a Win2K port of the base libraries, and you can get the Python bindings "here":http://users.skynet.be/sbi/libxml-python/

      My efforts at installing libxslt on Windows have been unsuccessful so far. *If someone has successfully set up libxslt on Win2K, please email me the recipe and I will update this document.*

    - "Pyana 0.6":http://sourceforge.net/projects/pyana/. This is a Python wrapper around the Apache XML parser xercesC++, version 1.4, and the XSLT processor xalanC++, version 2.1. **These should not be confused with the Java products xalanJ and xercesJ. They are something totally different.** For me, on my Red Hat Linux 8.0 machine, that means downloading and installing the following::

        Pyana-0.6.0.linux-i686-extras.tar.gz
        Pyana-0.6.0.linux-i686-py2.1.tar.gz

      On my Win2K machine, that means downloading and installing::

        Pyana-0.6.0.win32-py2.1.exe

    - "SabPyth 0.52":http://www.gingerall.com/charlie/ga/xml/x_sabpyth.xml. This is a Python wrapper around the Sablotron XSLT processor, version 0.97. For me, on my Red Hat Linux 8.0 machine, that means downloading and installing the following::

        js-1.5rc4-2.i386.rpm
        sablotron-0.97-1.i386.rpm
        sablotron-devel-0.97-1.i386.rpm
        Sab-pyth-0.52.linux-i686.tar.gz

      Unfortunately, I have not been able to get Sablotron working on Red Hat Linux 8.0. However, I did get it working on Win2K by downloading and installing the following::

        Sablot-Win-0.97-FullPack.zip
        Sab-pyth-0.52-win32-py2.1.exe
        expat_win32bin_1_95_6.exe (from expat.sourceforge.net, for libexpat.dll)

  One should be able to get ZopeXMLMethods working with a Java-based XSLT processor via XML-RPC without too much trouble. See 'ZOPE\lib\python\Products\ZopeXMLMethods\IXSLTProcessor.py' for more details.

  Contributions are gratefully accepted. Support Open Source!

Prerequisites

  This product requires the presence of at least one XSLT processor. It features a pluggable architecture that makes it possible to dynamically choose between different XSLT processors at runtime. Today, the product offers support for 4Suite, Pyana, SabPyth, and GNOME libxml2/libxslt out of the box, but it should be straightforward to add support for another library if your favorite is not on that list.
  You can either do it yourself, or ask for help on the ZopeXMLMethods developers list at mailto:zopexmlmethods-devel@lists.sourceforge.net

  Java-based XSLT processors (for example, "Saxon":http://saxon.sourceforge.net or "XalanJ":http://xml.apache.org/xalan-j) can be supported as well via XML-RPC, perhaps using EIONET's "XMLRPC":http://www.zope.org/Members/EIONET/XMLRPC product. This requires a little extra work and more maintenance, but shouldn't be too bad. Supporting URI resolution to local Zope resources might be difficult, however.

  Below are some quick instructions for how to set up the various alternatives. *Please be sure to read the installation instructions for the package you are installing.*

  ZopeXMLMethods should work fine for Zope releases 2.4 and above, provided Page Templates are installed. We have done most of our testing on releases 2.5.1 and 2.6.1 under Linux.

    1. Install Zope.

    2. Ensure that you are starting Zope with *at least two threads* (by default Zope starts with 4 threads, unless you change it via the -t option).

    3. Install your XML/XSLT processor libraries.

    4. Install ZopeXMLMethods (see 'ZOPE\lib\python\Products\ZopeXMLMethods\INSTALL.txt').

    5. Ensure that ZopeXMLMethods is registered to use the particular XSLT processor you want (if you installed more than one).

  You can use the automated test suite to check the installation. Here's what you do:

    1. Download and install "ZopeTestCase":http://www.zope.org/Members/shh/ZopeTestCase/ carefully, following its instructions. Be sure to install it into the 'lib/python/Testing' area, *not* the 'lib/python/Products' area! We used ZopeTestCase version 0.6.2.

    2. From a shell or DOS window prompt, do the following, where 'ZOPE' represents the directory in which you installed Zope. UNIX users substitute '/' for '\'::

         cd ZOPE\lib\python\Products\ZopeXMLMethods\tests
         ZOPE\bin\python alltests.py

    3. As long as you are running a version of 4Suite, you should see lots of messages followed by 'OK'. That means all 45 tests ran successfully. You are in, baby!::

         Ran 45 tests in 14.712s

         OK

       If you are running libxslt, several of the test cases will time out, but it should all be done within a minute or two. You should see lots of messages followed by 'FAILED'. Seven test cases fail, but that is expected, because the current version of libxslt does not yet offer support for URI resolver hooks in Python. But 38 out of 45 ain't bad!::

         Ran 45 tests in 253.918s

         FAILED (failures=7)

       If you are running Pyana on Linux, you should see lots of messages followed by 'FAILED'. Seven test cases fail, but that is expected, because the current version of Pyana/Linux does not yet offer support for URI resolver hooks in Python. But 38 out of 45 ain't bad!::

         Ran 45 tests in 15.294s

         FAILED (failures=7)

       If you are running Sablotron on Win32, you should see lots of messages followed by 'FAILED'. Three test cases fail, but that is expected, because the current version of Sablotron does not yet offer support for XML Catalogs. But 42 out of 45 ain't bad!::

         Ran 45 tests in 13.259s

         FAILED (failures=3)

Description

  XSLTMethod

    An XSLTMethod is a Zope object that links an XML document to a desired XSLT script. The XSLTMethod automatically runs the XSLT transformation and renders the results when accessed through DTML, through page templates, or through the web.

    An XSLTMethod represents a single XSLT stylesheet. By applying the XSLTMethod to an XML source object, you apply the underlying XSLT. However, an XSLTMethod object directly contains neither the XML source document nor the XSLT transformer. Instead, it obtains the XSLT transformer from a Zope object whose ID is recorded as a property. It is *applied* to another Zope object that provides the XML source in the same way that a PythonScript or DTMLMethod might be applied to a Zope object (i.e. via acquisition). In this way, an XSLTMethod object represents an XSLT script that can be applied to any number of XML source objects.
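    The transformer lookup just described leans on Zope acquisition. The following Python sketch is a rough, hypothetical model of that lookup, not ZopeXMLMethods code and not the real Acquisition package: it searches only the containment hierarchy (real acquisition also searches the context built up during URL traversal), and 'Node', 'ensure', 'acquire' and 'lookup' are names invented for this illustration. It reproduces the four examples given under "Obtaining the XSLT":

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a Zope object that may contain other objects."""

    name: str
    parent: Optional[Node] = field(default=None, repr=False)
    children: Dict[str, Node] = field(default_factory=dict)

    def path(self) -> str:
        """Absolute containment path, e.g. '/root/a/myXSLT'."""
        names = []
        node: Optional[Node] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(names))


def ensure(root: Node, path: str) -> Node:
    """Create (if missing) and return the object at an absolute path like '/root/a/x'."""
    names = path.strip("/").split("/")
    if names[0] != root.name:
        raise ValueError(f"{path!r} does not start at {root.name!r}")
    node = root
    for name in names[1:]:
        if name not in node.children:
            node.children[name] = Node(name, parent=node)
        node = node.children[name]
    return node


def acquire(start: Node, ref: str) -> Optional[Node]:
    """Resolve an ID property such as 'myXSLT' or 'a/b/c/d/myXSLT'.

    The first path segment is searched for in `start`, then in each
    enclosing folder up to the root (nearest match wins); any remaining
    segments must be direct children of the object found.
    """
    first, *rest = ref.split("/")
    node: Optional[Node] = start
    while node is not None:
        if first in node.children:
            found: Optional[Node] = node.children[first]
            for segment in rest:
                found = found.children.get(segment) if found is not None else None
            return found
        node = node.parent
    return None


def lookup(existing: List[str], method_path: str, id_property: str) -> Optional[str]:
    """Build a fresh tree holding `existing`, then resolve `id_property` on
    behalf of the XSLTMethod at `method_path`, starting from its container."""
    root = Node("root")
    for path in existing:
        ensure(root, path)
    method = ensure(root, method_path)
    found = acquire(method.parent, id_property)
    return found.path() if found is not None else None


# The four rows of the "Obtaining the XSLT" examples:
print(lookup(["/root/myXSLT"], "/root/myXForm", "myXSLT"))                  # /root/myXSLT
print(lookup(["/root/myXSLT"], "/root/a/b/c/d/e/myXForm", "myXSLT"))        # /root/myXSLT
print(lookup(["/root/a/b/c/d/myXSLT"], "/root/myXForm", "a/b/c/d/myXSLT"))  # /root/a/b/c/d/myXSLT
print(lookup(["/root/a/myXSLT"], "/root/myXForm", "myXSLT"))                # None
```

    The search only ever climbs toward the root, so a transformer in a subfolder of the XSLTMethod's container (the last example) is found only if its path is spelled out in the ID property, as in the third example.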

    Any number of XSLTMethod objects can be applied to a single XML source object, and a single XSLTMethod can be applied to any number of XML source objects. This flexibility differentiates ZopeXMLMethods from other XML/XSLT-based Zope products: it recognizes that there is often a many-to-many relationship between XML documents and XSLT transformers.

    NOTE: it is possible to hard-code the name of a default XSLT stylesheet inside an XML document using the 'xml-stylesheet' processing instruction (as described in this "W3C":http://www.w3.org/TR/xml-stylesheet/ recommendation). While there are various pros and cons to using the xml-stylesheet processing instruction in this way, as of today ZopeXMLMethods provides *no* particular support for this approach.

    The XML source to which an XSLTMethod is applied can come from nearly anywhere, for example:

      - Content retrieved from an SQL database and converted to XML format

      - A DTMLDocument that is an XML file, with portions grabbed from other objects via DTML tags

      - An XMLFile instance (XMLFile is part of the "XMLKit":http://www.zope.org/Members/haqa/XMLKit Zope product)

      - A CVSFile object that points to an external XML document in a CVS repository (CVSFile is part of the "CVSFile":http://www.zope.org/Members/arielpartners/CVSFile Zope product)

      - An ExternalFile object that points to an external XML document in the file system (ExternalFile is part of the "ExternalFile":http://www.zope.org/Members/arielpartners/ExternalFile Zope product)

      - A Page Template, a DTML Method, a File object, etc.

      - A ParsedXML object (ParsedXML is part of the "ParsedXML":http://www.zope.org/Members/karl/ParsedXML/ParsedXML Zope product)

      - A Zope File object

    In this way, XSLTMethod can be used to form "pipelines," where the output of one object becomes the input of the next. This approach is more modular: each kind of object performs only one task, and can be tested and/or replaced on an individual basis.
    Refer to the pipeline examples in the tutorial for more details.

    Obtaining the XSLT

      XSLTMethod obtains the XSLT transformer via acquisition. Here are some examples::

        XSLTMethod                 XSLT                    XSLT Transformer ID property   Result
        /root/myXForm              /root/myXSLT            myXSLT                         OK
        /root/a/b/c/d/e/myXForm    /root/myXSLT            myXSLT                         OK
        /root/myXForm              /root/a/b/c/d/myXSLT    a/b/c/d/myXSLT                 OK
        /root/myXForm              /root/a/myXSLT          myXSLT                         NOT OK: myXForm CANNOT FIND myXSLT

    Applying it to the XML source

      An XSLTMethod is applied to an XML object like a DTMLMethod. Here are some examples::

        XSLTMethod            XML                  Browser URL                                      Result
        /root/myXForm         /root/myXML          http://localhost:8080/root/myXML/myXForm         OK
        /root/myXForm         /root/a/b/c/myXML    http://localhost:8080/root/a/b/c/myXML/myXForm   OK
        /root/a/b/c/myXForm   /root/myXML          http://localhost:8080/root/myXML/a/b/c/myXForm   OK

      As you can see, acquisition is incredibly powerful. Use it wisely :)

    Rendering the Output: behave_like

      An XSLTMethod can behave_like any number of standard Zope objects, including:

        - DTMLDocument

        - PageTemplate

        - DTMLMethod

        - File

        - ParsedXML

      For example, if an XML source file were transformed via XSLT into HTML that included some TAL attributes (that is, the output was actually a Zope Page Template), the template would automatically be resolved, assuming 'behave_like' was set to "PageTemplate". Refer to the examples in the tutorial for more details.

  CacheManager

    XML Method Cache Manager is a Zope object that caches the results of processing done by a set of xxxMethod objects (only XSLTMethod as of today). A particular Cache Manager serves the set of xxxMethod instances within its parent folder and all subfolders (its clients).

    In contrast to Zope's built-in caching objects, XML Method Cache Manager is specialized for use with xxxMethod objects.
    That is, it will notice when either an XML source object or an XSLT transformer object has changed, and invalidate the cache file representing the results of that particular transformation. Also, the CacheManager stores cached content in the filesystem, *not* in the Zope object database.

    By default, XSLTMethod never caches its results. However, XSLT processing is expensive, and in the real world there is generally no need to re-run a transformation every time. The rare exception is XML content that changes dynamically, such as content retrieved from a database on the fly. In all other circumstances, caching can improve performance considerably. This is what CacheManager is for.

    An xxxMethod object searches for a CacheManager instance via acquisition. That is, it looks in its parent folder, then its parent's parent, and so on up the tree to the root folder, stopping at the first one it finds.

    A CacheManager is purely optional. Removing one will *never* break anything: it is purely an optimization, because XSLT transformations (or other XML processing) can be expensive.

    Every xxxMethod object has a "caching behavior" property. This controls whether caching should be done *if* a CacheManager is present. If a CacheManager is *not* present, caching will never be done, regardless of the setting of the property. The property is there to guard against situations where caching is *never* appropriate, such as when dynamic content is obtained from a database as described above.

    If an xxxMethod's "caching" property is set to "on" *and* a CacheManager is present, caching will be done. Under these conditions, once a CacheManager has stored the results of a transformation on behalf of an XSLTMethod object, the XSLTMethod will thereafter always retrieve its results from the CacheManager rather than re-running the transformation, unless one of the following occurs:

      1. The source XML document to which it is applied is modified.

      2. It is applied to a different source XML document.

      3. The XSLT transformer it points to is modified.

      4. Its XSLT transformer reference is changed to point to a different transformer altogether.

    Typically, cached content is stored in the system temporary directory: 'c:\tmp' on Windows platforms and '/tmp' on UNIX platforms. The placement of cache files is controlled by the Cache Manager's 'cachePrefix' property.

    The CacheManager includes some convenience functionality in the Cache tab of its Zope Management Interface. A CacheManager operates on the set of method objects that exist in its containing folder and all subfolders: its clients. From the Cache tab, it is possible to perform batch operations on all of its clients, such as:

      1. Turn caching on or off

      2. List the filenames for all of the currently cached content

      3. Remove all the files representing the currently cached content

    Note that the files that contain the cached content may be manually removed from the disk without any ill effects. CacheManager is robust enough to notice this and simply re-cache (replace) the missing files the next time the relevant content is requested.

  XSLT Transformers

    We typically organize our XSLT transformers in a hierarchy, for convenience. It is useful to regard each transformer as belonging to a *family* of transformers. The family is determined by the format of the XML file to be transformed. For example, there might be a DocBook family that understands the popular XML DocBook format, a "Resume" family that understands a homegrown Resume format, and others.

    For each family, there may be several individual transformers, one for each kind of output desired. Standard examples of different outputs include XML (to convert XML of one format into another), browsable HTML, printable HTML, Structured Text (STX), FO (formatting objects), VXML, WAP, etc. Coupled with an FO processor like "FOP":http://xml.apache.org/fop, one could churn out many more output types such as PDF, PCL, PS, AWT, etc.
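    The cache invalidation rules listed in the CacheManager section can be illustrated with a small model. This is a hypothetical Python sketch, not the product's CacheManager: 'SketchCacheManager' and 'render' are invented names, and naming cache files after a SHA-256 hash of the URL and storing them as JSON are assumptions. The 'stamp' arguments stand for whatever value changes when an object is modified. As in the product, the cache key is the transformation URL; recording the source and transformer identities alongside each entry covers the "different source" and "different transformer" cases:

```python
import hashlib
import json
import os
import tempfile
from typing import Callable, Optional


class SketchCacheManager:
    """Hypothetical file-system cache keyed by the transformation URL.

    Each entry also records which XML source and which XSLT transformer
    produced it, together with their modification stamps; any mismatch
    counts as a miss, so the transformation is re-run and re-cached.
    """

    def __init__(self, cache_prefix: Optional[str] = None) -> None:
        # Default to the system temporary directory, as the README describes.
        self.cache_prefix = cache_prefix or tempfile.gettempdir()

    def filename(self, url: str) -> str:
        """Map a transformation URL to a cache file path (assumed naming scheme)."""
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_prefix, "xmlmethod-" + digest + ".json")

    def get_or_transform(self, url: str, source: str, source_stamp: str,
                         xslt: str, xslt_stamp: str,
                         transform: Callable[[], str]) -> str:
        stamp = {"source": [source, source_stamp], "xslt": [xslt, xslt_stamp]}
        path = self.filename(url)
        try:
            with open(path, encoding="utf-8") as fh:
                entry = json.load(fh)
            if entry["stamp"] == stamp:
                return entry["result"]  # nothing relevant changed: cache hit
        except (OSError, ValueError, KeyError, TypeError):
            pass  # missing (e.g. deleted by hand), unreadable or corrupt: re-cache
        result = transform()
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"stamp": stamp, "result": result}, fh)
        return result


def render(url: str, caching_on: bool, manager: Optional[SketchCacheManager],
           source: str, source_stamp: str, xslt: str, xslt_stamp: str,
           transform: Callable[[], str]) -> str:
    """Cache only if the method's caching property is on AND a manager was found."""
    if manager is None or not caching_on:
        return transform()
    return manager.get_or_transform(url, source, source_stamp,
                                    xslt, xslt_stamp, transform)


if __name__ == "__main__":
    runs = []

    def transform() -> str:
        runs.append(1)  # count how often the "expensive" XSLT really runs
        return "<p>rendered</p>"

    with tempfile.TemporaryDirectory() as tmp:
        cm = SketchCacheManager(tmp)
        url = "http://localhost:8080/root/myXML/myXForm"
        render(url, True, cm, "/root/myXML", "v1", "/root/myXSLT", "v1", transform)
        render(url, True, cm, "/root/myXML", "v1", "/root/myXSLT", "v1", transform)  # hit
        render(url, True, cm, "/root/myXML", "v2", "/root/myXSLT", "v1", transform)  # XML modified
        print(len(runs))  # 2
```

    Because the key is the URL alone, this sketch shares the pipeline limitation described under Known Limitations: two intermediate results produced under the same URL overwrite each other's cache file.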

    Typically, we create a subfolder for each family, then create a transformer for each output type as an object within the subfolder. The transformers are then arranged in an intuitive way, such as:

      - resume/html

      - resume/fo

      - article/html

      - article/fo

    But you could put them all in a base folder:

      - resumeToHTML

      - ArticleToSTX

    Or even have multiple sublevels:

      - Role/Print/HTML

      - Role/Print/FO

      - Role/Browse/STX

      - Article/Print/HTML

    The possibilities are endless! Our advice is to start simple and add structure as you add more and more transformers.

    There are no requirements on transformer files other than that they be well-formed XSLT documents. They need not produce stand-alone HTML pages (pages with '<html>' tags), but can produce HTML fragments, XML fragments, or plain text output.

  XSL Parameters

    Parameters may be passed to the XSLT transformation of an XSLTMethod. If parameters are needed, create a property in the current context named *XSLparameters*. This property should be of type "lines", that is, it should return a (Python) sequence of strings. It may be defined on the XSLTMethod object itself or acquired from the XSLTMethod's context.

    Each name in the list is itself looked up in the current context. If its value is a scalar, then the pair 'name=value' will be supplied to the XSLT transformer as a parameter specification. If the value is an object, then the pair 'name=url' is supplied instead, where url is the absolute URL of the object.

    For example, suppose the *XSLparameters* value is '["properties", "color"]'. Suppose a Zope object with id='properties' exists in the context of the current XSLTMethod, say at 'localhost:8080/test/foobar'. Suppose the XSLTMethod itself has a property named 'color' with value 'blue'. Then the following parameters will be "passed on the command line" to the XSLT transformer::

      properties=localhost:8080/test/foobar
      color=blue

    See the tutorial for some examples of how parameters might be used.
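    The 'XSLparameters' rules above boil down to a small mapping step. In this hypothetical Python sketch, a plain dictionary stands in for the acquisition context, 'FakeZopeObject' stands in for a Zope object (real Zope objects do provide an 'absolute_url()' method), and which Python types count as scalars is an assumption. It reproduces the 'properties'/'color' example:

```python
from typing import Any, Dict, Mapping


class FakeZopeObject:
    """Stand-in for a Zope object; real Zope objects provide absolute_url()."""

    def __init__(self, url: str) -> None:
        self._url = url

    def absolute_url(self) -> str:
        return self._url


# Assumption: these Python types count as "scalars".
SCALAR_TYPES = (str, int, float, bool)


def xsl_parameters(context: Mapping[str, Any]) -> Dict[str, str]:
    """Build XSLT parameters from the 'XSLparameters' lines property.

    `context` is a flat mapping standing in for Zope acquisition: every
    name listed in XSLparameters is looked up in it. A missing name
    raises KeyError here; the README does not say what the product does.
    """
    params: Dict[str, str] = {}
    for name in context.get("XSLparameters", ()):
        value = context[name]
        if isinstance(value, SCALAR_TYPES):
            params[name] = str(value)            # scalar: name=value
        else:
            params[name] = value.absolute_url()  # object: name=url
    return params


context = {
    "XSLparameters": ["properties", "color"],
    "properties": FakeZopeObject("localhost:8080/test/foobar"),
    "color": "blue",
}
for name, value in xsl_parameters(context).items():
    print(f"{name}={value}")
# properties=localhost:8080/test/foobar
# color=blue
```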

  URN namespaces and reusable content

    In certain circumstances, it is desirable to "genericize" content, so that it is independent of the particular context in which it is currently being used. For example, some documents may include portions of other documents. Certain documents may be created out of reusable "boilerplate" pieces: there may be standard legal clauses, headers/footers, or other pieces of content that are used in multiple places. Even XSLT transformers themselves might be created out of reusable pieces (for example, a family of XSLT transformers with multiple output flavors might include several reusable templates).

    In circumstances such as these, URI resolution may be used to avoid hardcoding document references. For example, the URN "urn:acme:legal/header_boilerplate" might refer to a header that is included in all legal documents. URI resolvers and XML Catalog technology were invented to provide a way to map such URNs to actual URLs. By plugging in a different map, you can reuse the URN in many different situations. Fortunately, the 4Suite XML libraries support both custom URI resolution and XML Catalogs.

    URNs consist of two pieces, an NID and an NSS::

      NID: namespace ID
      NSS: namespace specific string

    For example::

      URN: urn:acme:legal/header_boilerplate
      NID: acme
      NSS: legal/header_boilerplate

    In order to do URN lookup, there must be a string variable named *URNnamespaces* in the context in which a new XSLTMethod is created. It may be defined on the XSLTMethod object itself or acquired from the XSLTMethod's context. This variable defines the different namespaces that can be used in URNs. URNs may be used to look up XSLT transformers or XML documents from XPath expressions (for example, with the XSLT 'document()' function) or via the 'xsl:import' and 'xsl:include' elements.

    The namespaces themselves can either be the names of folders in the ZODB, or string properties that give base URIs.
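    The NID/NSS split and the two kinds of namespace described above can be sketched as follows. This is a hypothetical Python illustration, not the product's URI resolver: 'split_urn' and 'resolve_urn' are invented names, 'URNnamespaces' is treated as a list of namespace names, a folder namespace resolves to a path relative to the folder named after the NID, and the way a base URI is joined to the NSS is an assumption:

```python
from typing import Mapping, Optional, Sequence, Tuple


def split_urn(urn: str) -> Tuple[str, str]:
    """Split 'urn:NID:NSS' into (NID, NSS); the 'urn' prefix is case-insensitive."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or not parts[1]:
        raise ValueError(f"not a URN: {urn!r}")
    return parts[1], parts[2]


def resolve_urn(urn: str, urn_namespaces: Sequence[str],
                base_uris: Optional[Mapping[str, str]] = None) -> str:
    """Turn a URN into a Zope-relative path or a URL (illustrative only)."""
    nid, nss = split_urn(urn)
    if nid not in urn_namespaces:
        raise LookupError(f"namespace {nid!r} is not listed in URNnamespaces")
    if base_uris and nid in base_uris:
        # Assumption: the NSS is simply appended to the base URI.
        return base_uris[nid].rstrip("/") + "/" + nss.lstrip("/")
    # Folder case: a path relative to the folder named after the NID,
    # which would itself be found via acquisition from the XSLTMethod.
    segments = [s for s in nss.split("/") if s]
    return "/".join([nid] + segments)


print(split_urn("urn:acme:legal/header_boilerplate"))
# ('acme', 'legal/header_boilerplate')
print(resolve_urn("urn:acme:legal/header_boilerplate", ["acme"]))
# acme/legal/header_boilerplate
```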

    If the namespace is the name of a ZODB folder, the NSS will be interpreted as a list of Zope contexts relative to that folder. In the example above, the URNnamespaces property would contain a single string, "acme". You would create a folder called "acme", obtainable via acquisition from the XSLTMethod, with a subfolder called "legal". The "legal" subfolder would contain an object with the ID "header_boilerplate"::

      Root_Folder/acme/legal/header_boilerplate

    See the tutorial for some examples of URN namespaces in action.

Known Limitations

  There are issues relating to caching the intermediate results of XSLT pipelines. The problem is that cached results are associated with URLs, but every intermediate result in an XSLT pipeline gets the same URL. That means that each intermediate result is written to the same cache file, on top of the previous one. This can lead to some strange results. For example, consider a ZPT with the following content::