Wednesday, July 2, 2008

Compiling documents the OpenXML way

Today I received an RFC for the automatic generation of a document. The document should be a printout of a thesaurus stored in a SQL database. It should run server-side and produce the document at the push of a button.
Besides the standard list of words, the document should contain page numbers. In the past, this was done by writing out HTML, saving it with a .doc extension and sending the file with the right MIME headers. With that approach, however, putting a page number on each page is quite a burden (if it is possible at all)... so enter OpenXML.

The alternative, mail merge, is not an option: I want the merge to take place on the server, not on the client.

The code here is plain XML programming with the System.IO.Packaging namespace, introduced in .NET 3.0. I haven't tried the OpenXML SDK yet, so you need to know what you're doing when adding XML parts and their relationships to the package.
The solution is quite straightforward, and it won't be difficult to adapt it to your own needs.

We start by referencing the System.IO.Packaging namespace, which ships in WindowsBase.dll (installed in the GAC). Add a reference to WindowsBase and you can build your own Package from scratch:

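using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging; // WindowsBase.dll
using System.Xml;

// documentPath: where the generated .docx is written
private void Load(string documentPath)
{
    // The Package is the ZIP container; it maintains [Content_Types].xml itself
    using (Package pkgOutputDoc = Package.Open(documentPath, FileMode.Create, FileAccess.ReadWrite))
    {
        // The main document part
        Uri uri = new Uri("/word/document.xml", UriKind.Relative);
        PackagePart partDocumentXML = pkgOutputDoc.CreatePart(uri,
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");

        // Load the document.xml template, add the thesaurus and write it into the part
        XmlDocument xdoc = new XmlDocument();
        xdoc.Load(@"C:\work\document.xml");
        FillDocument(xdoc);
        using (Stream partStream = partDocumentXML.GetStream(FileMode.Create, FileAccess.Write))
        {
            xdoc.Save(partStream);
        }

        // Package-level relationship that marks document.xml as the main document
        pkgOutputDoc.CreateRelationship(uri, TargetMode.Internal,
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument",
            "rId1");

        // The settings, footer and styles parts are added here with AddPart.
    } // disposing the package flushes and closes it
}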
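Since a .docx is just a ZIP file, a quick way to check what Package produced is to list its entries. A minimal sketch, assuming .NET 4.5 or later for System.IO.Compression.ZipArchive (not available in the .NET 3.0 days of this post); DocxCheck and MissingParts are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DocxCheck
{
    // Entries a minimal WordprocessingML package needs.
    // Package writes the first two itself; the third is our main part.
    static readonly string[] Required =
    {
        "[Content_Types].xml",
        "_rels/.rels",
        "word/document.xml"
    };

    // Returns the required entries that are missing from the .docx stream.
    public static List<string> MissingParts(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            var names = new HashSet<string>(zip.Entries.Select(e => e.FullName));
            return Required.Where(r => !names.Contains(r)).ToList();
        }
    }
}
```

Opening the generated file with File.OpenRead and passing it to MissingParts should return an empty list; a non-empty list points at a part or relationship that never got written.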


I've added all the XML fragments I use as embedded resources to my DLL, which makes the solution easy to version and deploy. The document.xml template, the main content of the .docx file, is expanded with custom XML generated from the database content.
I read my database content into a dictionary and generate one WordprocessingML paragraph per keyword.
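FillDocument relies on a RelatedKeyword class and a ThesaurusList.GetThesaurus() method that the post doesn't show. A minimal hypothetical stand-in, assuming the query returns flat (term, relation, related term) rows; the class shapes are guesses matched to how FillDocument uses them (Relation, Keyword), and the rows are hard-coded instead of read from SQL.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: one related entry of a thesaurus term,
// e.g. Relation = "UF" (used for), Keyword = "Term B".
public class RelatedKeyword
{
    public RelatedKeyword(string relation, string keyword)
    {
        Relation = relation;
        Keyword = keyword;
    }

    public string Relation { get; private set; }
    public string Keyword { get; private set; }
}

public static class ThesaurusList
{
    // Groups flat (term, relation, related term) rows, as a SQL query might
    // return them, into the dictionary FillDocument iterates over.
    public static Dictionary<string, List<RelatedKeyword>> Build(
        IEnumerable<Tuple<string, string, string>> rows)
    {
        var thesaurus = new Dictionary<string, List<RelatedKeyword>>();
        foreach (var row in rows)
        {
            List<RelatedKeyword> related;
            if (!thesaurus.TryGetValue(row.Item1, out related))
            {
                related = new List<RelatedKeyword>();
                thesaurus.Add(row.Item1, related);
            }
            related.Add(new RelatedKeyword(row.Item2, row.Item3));
        }
        return thesaurus;
    }

    // Stand-in for the database call: hard-coded rows instead of a query.
    public static Dictionary<string, List<RelatedKeyword>> GetThesaurus()
    {
        return Build(new[]
        {
            Tuple.Create("Term", "SN", "scope note for Term"),
            Tuple.Create("Term", "UF", "Term B"),
            Tuple.Create("Term", "RT", "Term C")
        });
    }
}
```

With a real database, the rows would come from a SqlDataReader instead. Dictionary does not guarantee enumeration order, so if the thesaurus must come out alphabetically, sorting the keys (or using a SortedDictionary) is safer than relying on an ORDER BY alone.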

<w:p>
  <w:r>
    <w:t>Term</w:t>
  </w:r>
  <w:r>
    <w:br />
  </w:r>
  <w:r>
    <w:tab />
    <w:t>SN</w:t>
  </w:r>
  <w:r>
    <w:tab />
    <w:t>scope note for Term</w:t>
  </w:r>
  <w:r>
    <w:br />
  </w:r>
  <w:r>
    <w:tab />
    <w:t>UF</w:t>
  </w:r>
  <w:r>
    <w:tab />
    <w:t>Term B</w:t>
  </w:r>
  <w:r>
    <w:br />
  </w:r>
  <w:r>
    <w:tab />
    <w:t>RT</w:t>
  </w:r>
  <w:r>
    <w:tab />
    <w:t>Term C</w:t>
  </w:r>
  <w:r>
    <w:br />
  </w:r>
</w:p>

This XML fragment is built by the following code:

private void FillDocument(XmlDocument xdoc)
{
    string wNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    XmlNamespaceManager nsMgr = new XmlNamespaceManager(xdoc.NameTable);
    nsMgr.AddNamespace("w", wNamespace);

    XmlNode wBody = xdoc.SelectSingleNode("/w:document/w:body", nsMgr);
    // Begin inserting after the last paragraph of the template, so a trailing
    // w:sectPr stays the last child of the body
    XmlNode lastParagraph = wBody.SelectSingleNode("w:p[last()]", nsMgr);

    Dictionary<string, List<RelatedKeyword>> thesaurus = ThesaurusList.GetThesaurus();

    foreach (KeyValuePair<string, List<RelatedKeyword>> entry in thesaurus)
    {
        // One w:p per keyword, each inserted after the previous one
        XmlElement thesaurusTerm = xdoc.CreateElement("w", "p", wNamespace);
        wBody.InsertAfter(thesaurusTerm, lastParagraph);
        lastParagraph = thesaurusTerm;

        // Run with the keyword itself
        XmlElement thesaurusRterm = xdoc.CreateElement("w", "r", wNamespace);
        XmlElement thesaurusText = xdoc.CreateElement("w", "t", wNamespace);
        thesaurusText.InnerText = entry.Key;
        thesaurusRterm.AppendChild(thesaurusText);
        thesaurusTerm.AppendChild(thesaurusRterm);

        // Run with a line break
        XmlElement thesaurusRbreak = xdoc.CreateElement("w", "r", wNamespace);
        thesaurusRbreak.AppendChild(xdoc.CreateElement("w", "br", wNamespace));
        thesaurusTerm.AppendChild(thesaurusRbreak);

        foreach (RelatedKeyword relatedKeyword in entry.Value)
        {
            // Run: tab + relation code (SN, UF, RT, ...)
            XmlElement termTypeR = xdoc.CreateElement("w", "r", wNamespace);
            XmlElement termTypeT = xdoc.CreateElement("w", "t", wNamespace);
            termTypeT.InnerText = relatedKeyword.Relation;
            termTypeR.AppendChild(xdoc.CreateElement("w", "tab", wNamespace));
            termTypeR.AppendChild(termTypeT);
            thesaurusTerm.AppendChild(termTypeR);

            // Run: tab + related keyword
            XmlElement termDescriptionR = xdoc.CreateElement("w", "r", wNamespace);
            XmlElement termDescriptionT = xdoc.CreateElement("w", "t", wNamespace);
            termDescriptionT.InnerText = relatedKeyword.Keyword;
            termDescriptionR.AppendChild(xdoc.CreateElement("w", "tab", wNamespace));
            termDescriptionR.AppendChild(termDescriptionT);
            thesaurusTerm.AppendChild(termDescriptionR);

            // Run: line break
            XmlElement termBreakR = xdoc.CreateElement("w", "r", wNamespace);
            termBreakR.AppendChild(xdoc.CreateElement("w", "br", wNamespace));
            thesaurusTerm.AppendChild(termBreakR);
        }
    }
}
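For comparison, the same paragraph can be built more compactly with LINQ to XML (System.Xml.Linq, .NET 3.5) instead of XmlDocument. This is a swapped-in technique, not the code used above; ThesaurusParagraphs and BuildParagraph are hypothetical names, and the related entries are passed as key/value pairs (relation code, related term) to keep the sketch independent of RelatedKeyword.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ThesaurusParagraphs
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds one w:p for a term: the term, a line break, then for each related
    // entry a tab + relation code, a tab + related term, and a line break.
    // Key = relation code (e.g. "UF"), Value = related term.
    public static XElement BuildParagraph(string term,
        IEnumerable<KeyValuePair<string, string>> related)
    {
        return new XElement(W + "p",
            new XElement(W + "r", new XElement(W + "t", term)),
            new XElement(W + "r", new XElement(W + "br")),
            related.SelectMany(rel => new[]
            {
                new XElement(W + "r", new XElement(W + "tab"), new XElement(W + "t", rel.Key)),
                new XElement(W + "r", new XElement(W + "tab"), new XElement(W + "t", rel.Value)),
                new XElement(W + "r", new XElement(W + "br"))
            }));
    }
}
```

The XElement constructor flattens the sequence returned by SelectMany, so each related entry contributes three runs, which matches the XML fragment shown earlier.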


Now I need to add the other XML parts to the package. The settings, footer and styles parts are added with a generic AddPart function, called from Load before the package is closed. For the settings part, the call looks like this:

AddPart(pkgOutputDoc, uri, partDocumentXML,
"application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml",
"http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings",
"rId2",
"/word/settings.xml",
"settings.xml");
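The footer and styles calls differ only in their arguments. A minimal sketch that collects all three as data, assuming embedded files named footer.xml and styles.xml and the relationship ids rId3 and rId4 (all hypothetical choices; ids only need to be unique within word/_rels/document.xml.rels). The content types and relationship types are the standard WordprocessingML ones.

```csharp
// Hypothetical: the arguments AddPart needs for one embedded part.
public sealed class PartDefinition
{
    public PartDefinition(string embeddedFile, string partPath, string contentType,
        string relationshipType, string relationId)
    {
        EmbeddedFile = embeddedFile;
        PartPath = partPath;
        ContentType = contentType;
        RelationshipType = relationshipType;
        RelationId = relationId;
    }

    public string EmbeddedFile { get; }
    public string PartPath { get; }
    public string ContentType { get; }
    public string RelationshipType { get; }
    public string RelationId { get; }
}

public static class WordParts
{
    const string Ct = "application/vnd.openxmlformats-officedocument.wordprocessingml.";
    const string Rel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/";

    // rId1 is already used by the package-level officeDocument relationship in
    // _rels/.rels; these ids live in the document part's own .rels file.
    public static readonly PartDefinition[] All =
    {
        new PartDefinition("settings.xml", "/word/settings.xml", Ct + "settings+xml", Rel + "settings", "rId2"),
        new PartDefinition("footer.xml", "/word/footer1.xml", Ct + "footer+xml", Rel + "footer", "rId3"),
        new PartDefinition("styles.xml", "/word/styles.xml", Ct + "styles+xml", Rel + "styles", "rId4")
    };
}
```

Inside Load, a loop over WordParts.All that passes each entry's ContentType, RelationshipType, RelationId, PartPath and EmbeddedFile to AddPart would replace the three individual calls.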

/// <summary>
/// Copies an embedded XML file into a new part of the package and relates it
/// to the main document part.
/// </summary>
/// <param name="package">The package being built.</param>
/// <param name="documentUri">URI of the main document part (/word/document.xml).</param>
/// <param name="partDocumentXML">The main document part that owns the relationship.</param>
/// <param name="contentType">Content type of the new part.</param>
/// <param name="relationshipType">Relationship type URI, e.g. .../relationships/settings.</param>
/// <param name="relationId">Relationship id, unique within the document part's relationships.</param>
/// <param name="partPath">Part URI inside the package, e.g. /word/settings.xml.</param>
/// <param name="embeddedFile">Name of the embedded resource file.</param>
private void AddPart(Package package, Uri documentUri, PackagePart partDocumentXML,
    string contentType, string relationshipType, string relationId, string partPath,
    string embeddedFile)
{
    Uri uriPart = new Uri(partPath, UriKind.Relative);
    PackagePart part = package.CreatePart(uriPart, contentType);

    // Copy the embedded XML into the new part, closing both streams
    XmlDocument xdoc = new XmlDocument();
    using (Stream contentStream = GetEmbeddedXml(embeddedFile))
    {
        xdoc.Load(contentStream);
    }
    using (Stream partStream = part.GetStream(FileMode.Create, FileAccess.Write))
    {
        xdoc.Save(partStream);
    }

    // The relationship belongs to document.xml, so its target is relative to that part
    Uri relativePartUri = PackUriHelper.GetRelativeUri(documentUri, uriPart);
    partDocumentXML.CreateRelationship(relativePartUri, TargetMode.Internal, relationshipType, relationId);
}
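AddPart calls GetEmbeddedXml, which isn't shown in the post. A minimal sketch of such a helper, assuming the XML files are compiled into the same assembly as embedded resources; the EmbeddedXml class name is hypothetical. Manifest resource names are prefixed with the project's default namespace and folder, so the sketch matches on the file-name suffix rather than guessing that prefix.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class EmbeddedXml
{
    // Opens an XML file embedded in this assembly, e.g. "settings.xml".
    // The caller is responsible for disposing the returned stream.
    public static Stream GetEmbeddedXml(string fileName)
    {
        Assembly assembly = typeof(EmbeddedXml).Assembly;
        foreach (string name in assembly.GetManifestResourceNames())
        {
            // Resource names look like "<DefaultNamespace>.<Folder>.settings.xml"
            if (name.Equals(fileName, StringComparison.OrdinalIgnoreCase) ||
                name.EndsWith("." + fileName, StringComparison.OrdinalIgnoreCase))
            {
                return assembly.GetManifestResourceStream(name);
            }
        }
        throw new FileNotFoundException("Embedded resource not found: " + fileName);
    }
}
```

Throwing when the resource is missing makes a forgotten "Embedded Resource" build action fail loudly instead of producing a package with an empty part.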


The footer can show the page number with this XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:ftr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p>
    <w:pPr>
      <w:jc w:val="right"/>
    </w:pPr>
    <w:fldSimple w:instr=" PAGE \* MERGEFORMAT ">
      <w:r>
        <w:t>1</w:t>
      </w:r>
    </w:fldSimple>
  </w:p>
</w:ftr>
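Word only shows this footer if the section properties in document.xml point at it: the w:sectPr at the end of w:body needs a w:footerReference whose r:id is the id of the footer relationship on the document part. If the document.xml template doesn't already contain one, a minimal sketch with XmlDocument could add it before the document is saved. The FooterWiring class name and the id "rId3" are assumptions; the id must match whatever is passed to AddPart for the footer.

```csharp
using System;
using System.Xml;

public static class FooterWiring
{
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    const string R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Adds <w:footerReference w:type="default" r:id="..."/> to the body's sectPr.
    // Does not check for an existing footerReference.
    public static void AddFooterReference(XmlDocument xdoc, string relationshipId)
    {
        var nsMgr = new XmlNamespaceManager(xdoc.NameTable);
        nsMgr.AddNamespace("w", W);

        XmlElement body = (XmlElement)xdoc.SelectSingleNode("/w:document/w:body", nsMgr);
        if (body == null)
            throw new InvalidOperationException("document.xml has no w:body");

        XmlElement sectPr = (XmlElement)body.SelectSingleNode("w:sectPr", nsMgr);
        if (sectPr == null)
        {
            // The body-level sectPr must be the last child of w:body
            sectPr = xdoc.CreateElement("w", "sectPr", W);
            body.AppendChild(sectPr);
        }

        XmlElement footerRef = xdoc.CreateElement("w", "footerReference", W);
        XmlAttribute type = xdoc.CreateAttribute("w", "type", W);
        type.Value = "default";
        footerRef.Attributes.Append(type);
        XmlAttribute id = xdoc.CreateAttribute("r", "id", R);
        id.Value = relationshipId;
        footerRef.Attributes.Append(id);

        // Header and footer references come before the other section properties
        sectPr.PrependChild(footerRef);
    }
}
```

Called as FooterWiring.AddFooterReference(xdoc, "rId3") at the end of FillDocument, the reference is saved into the document part together with the thesaurus paragraphs.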

