Case study: EuskalKultura.com. Improving the performance of a Plone Site

September 17, 2008 by erral

We published EuskalKultura.com in early June. EuskalKultura is a website for Basque people living far away from their homeland, mainly in the Americas (both South and North America). It’s just a Plone 2.5 site with some custom products, such as birthday greetings, and one or two other custom archetypes with 3 or 4 fields.

But the main work before publishing it was importing the content of the old website. The old site was a PHP-based website with lots and lots of items (mainly news items, but also events, restaurant information, interviews, …), all of them multilingual, mainly in Basque and Spanish. So we had to write some scripts to pull the data from the MySQL database and create the content in Plone through invokeFactory, with the usual UnicodeDecodeErrors :)
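The decoding side of such an import can be sketched as follows. This is a minimal illustration in present-day Python, not the actual import script: the column names, the fallback encoding and the `to_text` and `import_row` helpers are assumptions, and only `invokeFactory` is the real Plone call.

```python
def to_text(raw, encodings=("utf-8", "latin-1")):
    """Decode bytes coming from the old database, trying each encoding in turn."""
    if isinstance(raw, str):
        return raw
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Reached only if every listed encoding failed (latin-1 itself accepts
    # any byte): keep the item and mark the undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def import_row(folder, row):
    """Create one News Item from a database row (hypothetical column names)."""
    folder.invokeFactory(
        "News Item",
        id=str(row["id"]),
        title=to_text(row["title"]),
        text=to_text(row["body"]),
    )
```

Decoding everything at one single point, before it reaches Plone, is what keeps the UnicodeDecodeErrors from popping up later, at display time.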

After some testing we managed to import all the data and create all those news items and events, all of them properly linked thanks to LinguaPlone, so as to have a fully translated website.

Shortly after publishing the website, we discovered that it was consuming a lot of RAM. We hosted it in a memory-limited FreeBSD account at HighSpeedRails, and we had both excessive RAM use and constant restarts. HighSpeedRails provides some scripts that monitor the memory consumption of your Zope applications and restart them when they pass the established limit. We also tried raising the memory limit, but our Plone just started to eat all the memory available in the hosting service.

So we decided to reproduce the situation locally, fix it if possible, and publish the fixed website again.

Thanks to zc.buildout, reproducing the environment was quite easy: I just had to check out the corresponding buildout from our svn server and run ./bin/buildout. We downloaded the 1.5 GB Data.fs, and Lur wrote a Python script to reproduce the server’s load by parsing the Apache logs. In the meantime, I wrote a harder test plan for JMeter, also based on the Apache logs, taking many ideas from the Plone Performance Sprint 2007.
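A sketch of the log-replaying idea, not Lur’s actual script: it assumes the common or combined Apache log format, replays only successful GET requests, and uses hypothetical function names. It is written in present-day Python.

```python
import re
from urllib.request import urlopen

# Request part of a common/combined log line: "GET /path HTTP/1.1" 200
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def paths_to_replay(log_lines):
    """Return the paths of successful GET requests, in log order."""
    paths = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group("method") == "GET" and match.group("status") == "200":
            paths.append(match.group("path"))
    return paths


def replay(base_url, paths):
    """Request every path against a local instance, ignoring failures."""
    for path in paths:
        try:
            with urlopen(base_url + path) as response:
                response.read()
        except OSError:
            # URLError and HTTPError are OSError subclasses; for a load
            # test only the request itself matters, not its outcome.
            pass
```

Something like `replay("http://localhost:8080/mysite", paths_to_replay(open("access.log")))` would then drive the local instance with the same pages real visitors asked for; the URL and file name are placeholders.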

In our initial tests, we easily got our Plone site to consume 700 MB of RAM (and growing) after running it for half an hour (or less).

It was our idea, and our client’s, to be able to select the content featured on the home page, so we used CompositePack for that. We created a new layout for it and wrote a browser view to keep featured news items from also appearing in the news listing. The code under the hood proved to be totally inefficient and, after some analysis of the website, we realized that the home page was in fact automatic; I mean, our client wasn’t using that feature.

Our client also used news items (all of them saved inside one folder) to create different kinds of news: short articles, featured news items, common news items, … and wanted to show on the home page all news items except the ones keyworded as XXXX. Again, the code to get those news items was highly inefficient.

So we decided to get rid of CompositePack for the home page and to use AdvancedQuery, which lets us make NOT queries against the Plone catalog.

Those minor changes proved to be great: navigation on the website improved a lot, it was noticeably faster, and the response time of the home page decreased notably.

Another bottleneck was found in the keyword portlet. Our client uses keywords to tag news items and events, and wanted a list of all the keywords used in news items and in events, each one in the corresponding section. We quickly created a view getting those keywords from the catalog (with a catalog query and an inefficient algorithm :) ) and, using a strategy copied from Quills, we created a traversal adapter to have a view with all the news items tagged with the selected keyword. Getting the list of needed keywords was slow, so we replaced it with a static list of keywords, updated daily by a cron job.
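The static-list idea can be sketched like this. It is only an illustration in present-day Python: the JSON file and the function names are assumptions, and the only thing taken from Plone is that catalog results expose an item’s keywords in their `Subject` metadata.

```python
import json
import os
import tempfile


def collect_keywords(brains):
    """Return the sorted, de-duplicated keywords of the given catalog results."""
    keywords = set()
    for brain in brains:
        # 'Subject' holds the keywords of the item the brain stands for.
        keywords.update(brain.Subject)
    return sorted(keywords)


def write_keyword_list(keywords, path):
    """Write the list to a temporary file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, encoding="utf-8"
    ) as tmp:
        json.dump(keywords, tmp)
    # os.replace swaps the file in one step, so a page rendered while the
    # cron job runs never reads a half-written list.
    os.replace(tmp.name, path)


def read_keyword_list(path):
    """What the portlet would call on each request: one cheap file read."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

The expensive walk over the catalog then happens once a day, while every page view only pays for reading a small file.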

We also moved all news items stored in plain Plone folders to Plone Large Folders (based on BTrees, and disabled out of the box in Plone 2.5). We had no ordering requirements in the news items or events folders, because their main view is a properly configured Topic/Collection, so the change wasn’t dramatic, just time-consuming. It took hours to cut-and-paste and recatalog all those news items and events.

We also took a look at all the customized templates and made some improvements to avoid common Plone problems: avoid calling getObject again and again, use existing views instead of home-made scripts, …

Finally, we also removed some unneeded products from our buildout, improved the packaging of our base products and set the zodb-object-cache option back to the default (5000 objects).

After running the Python script and the JMeter test plan together again, we found that our Plone site wasn’t consuming more than 300 MB of RAM, even with the script running all night. Incredible!!!

We made some more tests, changing the zodb-object-cache option of our Zope instance to a larger value (following Hector Velarde’s advice). We tried 10000, 20000, 30000 and 50000, but we didn’t get any improvement either in memory consumption or in CPU usage according to top (OK, perhaps this is not the way to monitor a process, but that’s the way we were taught ;) ), so we decided to leave the value at 5000.

So we downloaded the live Data.fs on Friday (thanks, cron), ran the news items and events cut-and-paste scripts on Saturday and Sunday, made all the changes on Monday and uploaded the Data.fs again on Monday at midnight. On Tuesday we just made the last configuration changes and got it running.

After 8 hours running, it’s consuming just 340 MB of RAM: somewhat more than in our local tests, but far from the 700 MB after 5 minutes we had in the previous situation.

My Plone doesn’t show translated msgids located in a locales directory !!

April 16, 2008 by erral

In a project I’m currently working on, I have written a product following the new way of writing Plone products: creating an egg and putting its translations in a locales directory.

After translating the msgids, creating the po files and reading Maurits’ post again and again (although I’m on Plone 2.5), I was always seeing the same translated strings, always in the language my browser was configured for, no matter how I changed the language with Plone’s language selector.

After googling a bit, I found this Five and i18n tutorial in which Phillip says:

The goal of the sprint was to allow both fallback translation services (PTS, Localizer) and Zope 3 translation domains come to the same conclusion regarding which language should be chosen. The use case is that you have a site running Localizer or PTS and a bunch of “old” products using either one of those for translation. Now you have an additional, “new” Five-based product using Zope 3 translation domains. Most of the time, a page contains user messages from more than one domain, so you would all domains be translated to the same language.

So, I just have to add an overrides.zcml file to my product (and load it in my buildout), with the following content:

<adapter
for="zope.publisher.interfaces.http.IHTTPRequest"
provides="zope.i18n.interfaces.IUserPreferredLanguages"
factory="Products.Five.i18n.PTSLanguages"
/>

Now the Five and Zope 3 components of my Plone site negotiate the language just like PTS does, so I have correctly translated msgids in my site.

Painless big XML file parsing in Python

March 5, 2008 by codesyntax

Parsing big (some 60 MB is already big for me) XML files in Python was a bit painful until now. I used to import minidom and sometimes sax.

The problem with minidom is that the whole XML file loads into memory and you can’t do anything else until you process the file. If you do it with sax, you have to work detecting every element start and end.

Today I learnt a better solution from Erral: using lxml. Here are just a couple of lines so that you can see how to convert an XML file into a list of dicts:

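from lxml import etree

# Parse the file; each child of the root element is one record.
coords = etree.parse("/path/to/your/xml/file").getroot()
coords_list = []
for coord in coords:
    # Map each child tag of the record to its text content.
    this = {}
    for child in coord:
        this[child.tag] = child.text
    # Append once per record, after all its children have been read.
    coords_list.append(this)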

Quite straightforward, isn’t it? It’s already in Kelpi: XML to list of dict parsing
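When even that is too much for the available memory, a streaming variant is possible. The sketch below is not the lxml code above but a different technique, `iterparse` from the standard library’s `xml.etree.ElementTree`; the `iter_records` name and the record tag passed to it are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def iter_records(xml_path, record_tag):
    """Yield one dict per <record_tag> element while the file is being read."""
    for _event, element in ET.iterparse(xml_path, events=("end",)):
        if element.tag == record_tag:
            # The "end" event fires once the element and all of its
            # children have been parsed completely.
            yield {child.tag: child.text for child in element}
            # Drop the children of the record we have just handed out.
            element.clear()
```

It is used as `for record in iter_records("/path/to/your/xml/file", "coord"): ...`, so each record can be processed as soon as it has been read instead of waiting for the whole list.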

(posted by Gari)

Paster is your friend

February 26, 2008 by erral

If you are starting a new Plone project, paster is definitely your friend. Your necessary and helpful friend.

paster can be seen as a code generation tool, and perhaps as something to avoid, but it helps, and helps a lot, with all that boilerplate code you need to write each time you start a Plone project: the buildout.cfg file (now that zc.buildout seems to be the de facto standard for managing both development and deployment of Plone projects); skin, CSS and JavaScript registration in a so-called theme product; or new profile and content-type registration in a content-type or archetype product.

One of the things I like the most about paster is the ‘--svn-repository’ option. Before using paster, I often found myself importing incomplete projects into our svn repository, or deleting products and checking them out again later. Now, each time I create a new product or egg, I only have to add ‘--svn-repository=http://url-to-my-svn’ and I’m done: paster creates the trunk, branches and tags structure, checks out the trunk, adds the files, and everything is set to start working.

paster would be nothing for Plone if ZopeSkel didn’t exist. ZopeSkel is a collection of paster templates you can use to create your products. For example, there are templates to create a theme product, a buildout file or an Archetypes-based product. To use one, you just have to invoke paster with the name of the template:

erral@lindari:/tmp$ paster create -t plone3_buildout myproject --svn-repository=http://myurl

Answer just a couple of questions and you’ll have a ready-to-go buildout configuration file in your repository.

The archetype template and the support for local ZopeSkel commands (as explained by Mustapha) help you to create a new Archetypes-based content type. But it doesn’t only create the base boilerplate code. Thanks to the local-command support, you can add new content types, new portlets or even new browser views at any time after creating the project. You can add a browser view today and a new content type tomorrow. You just have to take care to invoke the correct `paster` command; everything else (adding configure.zcml lines, new Generic Setup profile configuration files, etc.) is done by paster:

erral@lindari:/tmp$ paster create -t archetype my.content
...
erral@lindari:/tmp$ cd my.content/
erral@lindari:/tmp/my.content$ ls
erral@lindari:/tmp/my.content$ paster addcontent --list
Available templates:
  atschema:     A handy AT schema builder
  contenttype:  A content type skeleton
  portlet:      A Plone 3 portlet
  view:         A browser view skeleton
  zcmlmeta:     A ZCML meta directive skeleton
erral@lindari:/tmp/my.content$ ls my/content/
browser    configure.zcml  __init__.py    portlets  tests
config.py  content         interfaces.py  profiles  tests.py
erral@lindari:/tmp/my.content$ paster addcontent contenttype
...
erral@lindari:/tmp/my.content$ ls my/content/content
configure.zcml  __init__.py  mynewcontenttype.py

Awesome !!!

Many people don’t like code generation tools. I don’t know whether I should call paster a code generation tool or a helpful-boilerplate-writing-avoider-tool. It is something you will want to keep using after trying it for the first time.

Using Plone4ArtistsAudio in a zc.buildout based Plone installation

January 22, 2008 by erral

For two recently launched websites (Karkara.com and Eibar.org) I have used Plone4ArtistsAudio to easily create an audio album with podcasting capabilities without creating a new content type. P4AAudio is a product for Plone that exposes some Python libraries to reuse common Plone content types, like Files, as an audio album.

For example, Gari has a folder called podcast in his blog, where he uploads his radio programs, broadcast by Euskadi Irratia every two weeks. I added Plone4ArtistsAudio to this Plone installation and enabled media in this folder, and only in this folder, and all audio files uploaded as standard Plone files were automagically audio-enhanced to have this pretty look and feel.

Gari’s podcast folder

This is pretty cool: I had to add no custom content-type (so, no new portal types) to Plone. Some zope3 concepts like interfaces, adaptation or event subscribers are used by Plone4ArtistsAudio to create this product.

But everything has its cons. The installation of Plone4ArtistsAudio was a pain. In our latest projects we have used zc.buildout to deploy our sites, following Martin’s tutorial on Plone.org and many other suggestions from the Plone community.

Plone4ArtistsAudio is just a bundle of Python modules: p4a.audio, p4a.ploneaudio, p4a.common, p4a.z2utils and plone.app.form. It ships all these modules inside a folder called pythonlib and adds this folder to sys.path when loading.
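The general technique looks roughly like this. It is a sketch of the pattern, not the code Plone4ArtistsAudio actually ships, and `add_pythonlib` is a hypothetical name.

```python
import os
import sys


def add_pythonlib(product_dir):
    """Put <product_dir>/pythonlib on sys.path if it is not already there."""
    pythonlib = os.path.join(product_dir, "pythonlib")
    if pythonlib not in sys.path:
        sys.path.append(pythonlib)
    return pythonlib
```

A product would call it from its own `__init__.py` with `os.path.dirname(__file__)`. Note that the path is only extended once that module has been imported.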

For some reason, this way of including the modules in the PYTHONPATH is not compatible with the buildout way of working, because when starting, zope complained about plone.app.form not being found.

After some searching on the Internet, I found that something similar happened with PloneGetPaid, but I couldn’t fix it. So, after some help from Tim Terlegård, I decided to leave Plone4ArtistsAudio in my products folder, but add the libraries it ships as develop eggs. Be careful: you cannot add these libraries directly as installable eggs from PyPI, because some of them are marked as zip_safe eggs, which they aren’t, because they need the ZCML files inside them to be loaded by Zope.

So, my buildout.cfg file is now something like this:

[buildout]
newest = false
index = http://download.zope.org/ppix
versions = versions

parts =
    plone
    zope2
    productdistros
    cachefu
    instance
    zopepy

eggs =
    elementtree
    p4a.audio
    p4a.common
    p4a.fileimage
    p4a.ploneaudio
    p4a.z2utils
    plone.app.form

develop =
    src/p4a.audio
    src/p4a.common
    src/p4a.fileimage
    src/p4a.ploneaudio
    src/p4a.z2utils
    src/plone.app.form

[...]

And svn:externals set to the following URLs inside src directory:

p4a.audio http://www.plone4artists.org/svn/projects/p4a.audio/tags/release-1.0.1/
p4a.ploneaudio http://www.plone4artists.org/svn/projects/p4a.ploneaudio/tags/release-1.0.1/
p4a.common http://www.plone4artists.org/svn/projects/p4a.common/tags/release-1.0/
p4a.z2utils http://www.plone4artists.org/svn/projects/p4a.z2utils/tags/release-1.0/
p4a.fileimage http://www.plone4artists.org/svn/projects/p4a.fileimage/tags/release-1.0/
plone.app.form -r 18164 https://svn.plone.org/svn/plone/plone.app.form/branches/plone-2.5/

So, Plone4ArtistsAudio is a great product: it provides a nice interface without adding new content types, uses Zope 3 concepts like interfaces, adapters and subscribers, and also provides a podcast RSS feed. But its installation is not as easy as dropping the product into the Products folder.

Wow, this first post was quite long, but I hope it will be helpful for those trying to install Plone4ArtistsAudio in their buildout based Plone.

Happy Ploning and happy buildouting.. ;)