
This work is licensed under a Creative Commons Attribution-Share Alike 2.0 France License.
The other day I needed a small xml parser to convert an xml document into a different format. First I tried xml-light. This is a simple parser all written in ocaml that stores the parser xml document in an ocaml data structure. This data structure can be user to access various fields of the xml document. It does not offer a dom-like interface, but actually I consider this a feature. Unfortunately xml-light is terribly slow. To parse 30K-plus lines of xml it takes far too long to be considered for my application.
The next logic choice was to try Expat, that is a event-based parser and it is extremely fast. Since using an event based parser can be a bit cumbersome (and I already had written of bit of code using xml-light), I decided to write a small wrapper around expat to provide a xml-light interface to it.
The code is pretty simple and the main idea is taken from the cduce xml loader.
First we provide a small data structure to hold the xml document as we examine it. Nothing deep here. Notice that we use Start and String as we descend the tree and Element we we unwind the stack.
Then we need to provide expat handlers to store xml fragments on the stack as we go down. Note that we have an handler for cdata, but not an handler for pcdata as it is the default.
At the end we just register all handlers with the expat parser and we return the root of the xml document.
I've copied the xml-light methods and to access the document in a different file. I've also made everything lazy to save a bit of computing time if it is only necessary to access a part of a huge xml document.
The complete code can be found here: git clone https://www.mancoosi.org/~abate/repos/xmlparser.git
(this repo might disappear in the future ...)
Recent comments
49 weeks 6 days ago
1 year 1 week ago
1 year 14 weeks ago
1 year 16 weeks ago
1 year 18 weeks ago
1 year 20 weeks ago
1 year 22 weeks ago
1 year 51 weeks ago
2 years 3 weeks ago