Writing XML with libxml and iconv

In the past, I’ve always hacked my own XML output functions. The result wasn’t always good XML, and it took a lot of fprintf()-massaging.

Then I needed a DOM parser for C (not C++ or C#), and the only one I really liked was libxml. It’s got the proper license for me to use it, it’s simple to use, and it has both DOM and SAX parsers.

Here’s a libxml example of how to write your own XML output, taken from the Eressea II sources (you’ll find examples for making your own parser everywhere):

    #include <libxml/tree.h>

    xmlChar* xml_s(const char * str); /* defined in the iconv example below */

    int main(int argc, char** argv)
    {
      xmlDocPtr doc = xmlNewDoc(BAD_CAST "1.0");
      xmlNodePtr node = xmlNewNode(NULL, BAD_CAST "eressea");
      xmlNewProp(node, BAD_CAST "game", xml_s("Ümläutß"));
      /* xmlAddChild(node, xmlNewNode(NULL, BAD_CAST(
         the rest of this call is missing from the listing */
      xmlDocSetRootElement(doc, node);
      xmlKeepBlanksDefault(0);
      xmlSaveFormatFile(argv[1], doc, 1); /* argv[1] is the output file name */
      xmlFreeDoc(doc);
      return 0;
    }

That BAD_CAST is just a macro that casts a char* to xmlChar*; it does not convert anything. You write it whenever you believe that your input is already good UTF-8 and are too lazy to convert it. Please read Joel’s article on Unicode first. For places where I don’t have that guarantee, my code uses iconv, a character set conversion library, to convert the internal char* to UTF-8. Here’s an iconv example for the xml_s function used above:

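    #include <stdio.h>
    #include <string.h>
    #include <iconv.h>
    #include <libxml/xmlstring.h>

    iconv_t utf8;

    xmlChar* xml_s(const char * str)
    {
      static char buffer[1024]; /* it's enough */
      char * inbuf = (char *)str; /* iconv() takes char**, but leaves the input untouched */
      char * outbuf = buffer;
      /* +1 so that the terminating NUL is converted as well */
      size_t inbytes = strlen(str)+1, outbytes = sizeof(buffer);
      iconv(utf8, &inbuf, &inbytes, &outbuf, &outbytes);
      return (xmlChar*)buffer;
    }

    int main(int argc, char** argv)
    {
      utf8 = iconv_open("UTF-8", ""); /* "" asks for the locale's encoding (GNU iconv behaviour) */
      puts((const char*)xml_s("ä߀"));
      iconv_close(utf8);
      return 0;
    }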
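The xml_s above never looks at what iconv() returns, so an invalid byte or a string longer than the static buffer goes unnoticed. As a rough sketch of what a checked variant could look like (convert_checked is a hypothetical name, not something from the Eressea sources, and the caller-supplied buffer is my assumption):

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated string `str` through the
 * open descriptor `cd` into `out`, which holds `outsize` bytes.
 * Returns 0 on success, or -1 with errno set (by iconv() for EILSEQ and
 * EINVAL, and E2BIG when the result does not fit). */
int convert_checked(iconv_t cd, const char *str, char *out, size_t outsize)
{
    char *inbuf = (char *)str;  /* POSIX declares char**; the input is not modified */
    size_t inleft = strlen(str);
    char *outbuf = out;
    size_t outleft;

    if (outsize == 0) {
        errno = E2BIG;
        return -1;
    }
    outleft = outsize - 1;      /* keep one byte for the terminator */

    /* put the descriptor back into its initial state before reusing it */
    iconv(cd, NULL, NULL, NULL, NULL);

    if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) == (size_t)-1)
        return -1;

    *outbuf = '\0';
    return 0;
}
```

With a descriptor from iconv_open("UTF-8", "ISO-8859-1"), the single byte 0xE4 (ä) should come out as the two bytes 0xC3 0xA4, and a two-byte output buffer should fail with E2BIG because there is no room left for the terminator.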

That’s so much more fun than fprintf-wrangling.

2 thoughts on “Writing XML with libxml and iconv”

  1. Most of the time I spent writing this blog entry was wasted on trying to get nucleus to make preformatted text not look like such a disaster…

  2. Ah, I really want to program something in C#, just to be programming again. I’m so out of practice. Only, I’m short on real ideas. By the way, the C code in this post is the most tech-geeky thing I’ve come across in a blog so far. =)

    And I’ve just noticed that I’m once again writing a German comment on an English post. Funny that I don’t even notice anymore when I’m reading English text.
