NSXMLParser memory use

written by mfazekas on May 24th, 2009 @ 10:33 AM

Cocoa on Mac OS X provides two XML parsers: NSXMLDocument for tree-based parsing and NSXMLParser for event-driven (or streaming) parsing.

iPhone has only NSXMLParser. One would expect this (event-based) parser to have a minimal memory footprint, so I was very upset when I ran into out-of-memory issues parsing an 8 MB XML document. This article describes the issues with NSXMLParser and shows some workarounds/alternatives.

NSXMLParser in theory

A quote from Apple's documentation on NSXMLParser:

Event-driven parsing—because it deals with only one XML construct at a time and not all of them at once—consumes much less memory than tree-based parsing. It is ideal for situations where performance is a goal and modification of the parsed XML is not.

This is basically DOM vs SAX. Streaming/SAX parsers can (and should) work with minimal memory, as they don't have to keep the whole XML document in memory.
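
To make the difference concrete, here is a minimal sketch of event-driven parsing. It uses Python's standard xml.sax module purely as a stand-in (it is not NSXMLParser's API, and the ElementCounter class and the sample document are made up for illustration). The handler keeps only a small tally, so its memory use does not grow with the size of the document:

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler: keeps a running tally, never builds a tree."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag; nothing about the element is retained
        # except the updated counter.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_bytes):
    """Parse xml_bytes with SAX callbacks and return {tag name: occurrences}."""
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


# Made-up sample document for illustration only.
SAMPLE = b"<feed><item>a</item><item>b</item><item>c</item></feed>"

if __name__ == "__main__":
    print(count_elements(SAMPLE))
```

A tree-based parser would instead hold an object for every element until the whole document had been read.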

NSXMLParser in practice

But let's take a look at what's going on with NSXMLParser in a real-world application. The following object alloc output was captured from a program that downloads an XML document and then parses it with NSXMLParser initWithData:.

The XML was 8 MB. As you can see, NSXMLParser initWithData: allocated an extra 22 MB to parse that 8 MB document. On the iPhone, 22 MB is pretty significant. But it got worse: during parsing, NSXMLParser kept allocating memory until the program reached 54 MB total usage and was killed by the OS before the parse finished. My estimate is that for this 8 MB document NSXMLParser used at least 37 MB of extra memory.

AQXMLParser to the rescue

AQXMLParser is an NSXMLParser replacement. And just by replacing the word NSXMLParser with AQXMLParser in the code, the memory use got much better. It could finish the parse of the document with a peak memory use of 19 MB. The extra memory used by AQXMLParser was less than 1 MB.

Summary: issues with NSXMLParser

  1. it is not a real streaming parser: initWithURL: will download the full XML before processing it. For memory use this is bad, as it has to allocate memory for the full XML, which can't be reclaimed until the end of the parse. For performance it's also bad, as you cannot interleave the IO-intensive part of downloading with the CPU-intensive part of parsing.
  2. it provides no forms of input other than NSData and NSURL. For example, if you send your authentication information in the NSURLRequest body, then your only option is to download the data yourself and use initWithData:
  3. it will not release memory. It seems that strings/dictionaries created during parsing are kept around until the end of the parse. I've tried to improve it with creative use of NSAutoreleasePool, but without any success.
  4. it is slow. The XMLPerformance demo shows that NSXMLParser is up to 2x slower than parsing with libxml2. This is mostly the overhead of creating Objective-C strings and dictionaries.

Workarounds/alternatives

AQXMLParser is a drop-in replacement that solves all of these issues except the performance overhead of the Objective-C interface:

  1. it is a streaming parser: it starts parsing as soon as the first chunk arrives and does not need to download the full document first.
  2. input can be NSInputStream and NSURLRequest in addition to NSURL and NSData.
  3. it will release the strings/dictionaries created by it as soon as possible.
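
The first point is the important one for memory use. As a rough illustration of what feeding a parser chunk by chunk looks like, here is a sketch using the incremental interface of Python's standard xml.sax parser. It is only a stand-in and not AQXMLParser's actual API; the TitleCollector class, the chunk boundaries, and the sample data are assumptions made up for this example:

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None  # list of text pieces while inside <title>, else None

    def startElement(self, name, attrs):
        if name == "title":
            self._buf = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._buf = None


def parse_in_chunks(chunks):
    """Feed chunks to the parser one at a time, as they would arrive from IO.

    Returns the collected titles and how many titles were complete after
    each chunk had been fed.
    """
    handler = TitleCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    progress = []
    for chunk in chunks:
        parser.feed(chunk)  # the chunk is not needed by the caller after this
        progress.append(len(handler.titles))
    parser.close()  # signals end of input
    return handler.titles, progress


# Made-up chunks that deliberately split text and tags across boundaries.
CHUNKS = [b"<feed><title>fir", b"st</title><title>second</ti", b"tle></feed>"]

if __name__ == "__main__":
    titles, progress = parse_in_chunks(CHUNKS)
    print(titles, progress)
```

Each chunk can be dropped as soon as it has been fed, so downloading and parsing are interleaved instead of buffering the whole document first.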

If you need maximum performance, you can use libxml2 directly, as demonstrated by Apple in XMLPerformance. But that means you have to rewrite your parsing code.

Getting AQXMLParser

To use it, add AQXMLParser.h and AQXMLParser.m to your project. Add libxml2 to your headers/libraries, and add the CFNetwork framework to your project. See the readme for details.
