Parsing big XML files in Python (some 60 MB is already big for me) has been a bit painful until now. I used to use minidom and sometimes sax.
The problem with minidom is that it loads the whole XML file into memory, and you can’t do anything with the data until the entire file has been parsed. With sax, you have to do the work of detecting the start and end of every element yourself.
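To make the sax complaint concrete, here is a rough sketch of what building a list of dicts looks like with the standard library's xml.sax. The sample document, the tag names (coords, coord, lat, lon) and the CoordHandler class are made up for illustration, and the sketch assumes a flat root / record / field layout:

```python
import xml.sax

# Made-up sample data, only for illustration.
SAMPLE = (
    b"<coords>"
    b"<coord><lat>43.3</lat><lon>-1.98</lon></coord>"
    b"<coord><lat>43.26</lat><lon>-2.93</lon></coord>"
    b"</coords>"
)


class CoordHandler(xml.sax.ContentHandler):
    """Hypothetical handler: tracks nesting depth by hand."""

    def __init__(self):
        super().__init__()
        self.coords_list = []
        self.depth = 0
        self.current = None
        self.text = []

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:      # a record element opens
            self.current = {}
        elif self.depth == 3:    # a field inside the record opens
            self.text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self.depth == 3:
            self.text.append(content)

    def endElement(self, name):
        if self.depth == 3:      # field closes: store its text
            self.current[name] = "".join(self.text)
        elif self.depth == 2:    # record closes: keep the dict
            self.coords_list.append(self.current)
        self.depth -= 1


handler = CoordHandler()
xml.sax.parseString(SAMPLE, handler)
sax_result = handler.coords_list
```

All the bookkeeping (depth, current record, text chunks) is on you, which is exactly the tedious part.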
Today I learnt a better solution from Erral: using lxml. Here are just a few lines to show how to convert an XML file into a list of dicts:
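from lxml import etree

# Parse the file and get the root element
coords = etree.parse("/path/to/your/xml/file").getroot()
coords_list = []
for coord in coords:
    # One dict per child of the root: tag -> text
    this = {}
    # Iterating the element directly replaces the deprecated getchildren()
    for child in coord:
        this[child.tag] = child.text
    coords_list.append(this)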
Quite straightforward, isn’t it? It’s already in Kelpi: XML to list of dict parsing
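One caveat: etree.parse still builds the whole tree in memory. If a file ever gets too big for that, a possible variation is etree.iterparse, which hands you elements as they are completed so you can drop them once processed. The following is only a sketch under assumptions: iter_records, record_tag and the sample data (coord, lat, lon) are hypothetical names, and it falls back to the standard library's xml.etree.ElementTree, which offers the same iterparse call, when lxml is not installed:

```python
import io

try:
    from lxml import etree
except ImportError:
    # The standard library exposes a compatible iterparse.
    import xml.etree.ElementTree as etree


def iter_records(source, record_tag):
    """Yield one dict per record_tag element (hypothetical helper)."""
    # "end" events fire once an element and its children are fully parsed.
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield {child.tag: child.text for child in elem}
            # Drop the element's children and text to free memory.
            elem.clear()


# Made-up sample data, only for illustration.
SAMPLE = (
    b"<coords>"
    b"<coord><lat>43.3</lat><lon>-1.98</lon></coord>"
    b"<coord><lat>43.26</lat><lon>-2.93</lon></coord>"
    b"</coords>"
)

records = list(iter_records(io.BytesIO(SAMPLE), "coord"))
```

With a real file you would pass its path (or an open binary file) instead of the BytesIO object, and loop over the generator rather than building a list if memory is the concern.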
(posted by Gari)
Tags: python xml kelpi