
Archive for July 20th, 2009

iTextSharp – PDF to HTML – Cleaning HTML


The last prerequisite step before actually converting our HTML into PDF is to clean up the HTML.

The method I use takes advantage of the XML parser in .NET, but to use it the HTML has to be XHTML compliant, that is, well-formed XML.
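To make the requirement concrete, here is a minimal sketch in Python rather than the .NET code the post is built on. Python's `xml.etree.ElementTree` stands in for the .NET XML parser, and `is_well_formed` is a hypothetical helper name. It shows that a strict XML parser accepts markup whose tags are all closed but rejects ordinary HTML such as an unclosed `<br>`.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (a rough stand-in for XHTML compliance)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Unclosed, mismatched, or badly nested tags all end up here.
        return False


# Well-formed: every tag is closed, and <br /> is self-closing.
print(is_well_formed("<p>Hello <b>world</b><br /></p>"))  # True
# Typical hand-written HTML: <br> is never closed, so the XML parser rejects it.
print(is_well_formed("<p>Hello <b>world</b><br></p>"))    # False
```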

For this exercise, what I am most concerned about is that every HTML tag has a matching closing tag, that the tags are properly nested in a hierarchical structure, and that all tags are lowercase.
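These three conditions can be checked mechanically. The sketch below is again Python, with a hypothetical `check_markup` helper rather than the author's code. It relies on the XML parser to catch missing closing tags and bad nesting, then walks the parsed tree to flag any tag name that is not lowercase. Because XML tag names are case-sensitive, `<P>...</p>` already fails to parse, while `<P>...</P>` parses but gets flagged.

```python
import xml.etree.ElementTree as ET


def check_markup(markup: str) -> list[str]:
    """Return a list of problems; an empty list means all three conditions hold."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        # Missing closing tags and improper nesting both surface as a parse error.
        return [f"not well-formed: {err}"]
    problems = []
    # root.iter() visits the root element itself and every descendant.
    for elem in root.iter():
        if elem.tag != elem.tag.lower():
            problems.append(f"tag not lowercase: <{elem.tag}>")
    return problems


print(check_markup("<div><p>ok</p></div>"))  # []
print(check_markup("<DIV><p>ok</p></DIV>"))  # ['tag not lowercase: <DIV>']
```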

Some of this, like properly nesting the tags, we will have to rely on the user to provide. But some of it we can attempt to clean up in our code. If you know you will have complete control over your HTML, you might be able to skip this step, but I think the code is simple enough that you'll want to add it anyway.
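One naive way to handle the parts code can fix is a regular-expression pass that lowercases tag names and rewrites HTML void elements such as `<br>` and `<img>` as self-closing XML tags. This is a hypothetical Python sketch, not the cleanup routine from the full post. The `VOID_TAGS` set and the `clean_tags` name are assumptions, attribute names are left untouched, a `>` inside an attribute value will confuse the pattern, and it does nothing about improper nesting, which still has to come from the user.

```python
import re

# HTML elements that never take a closing tag; XML needs them written as <br />.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}

# Groups: optional "/" for closing tags, tag name, attributes, optional trailing "/".
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*?)(/?)>")


def clean_tags(html: str) -> str:
    """Lowercase tag names and self-close void elements. Does NOT fix nesting."""
    def fix(match: re.Match) -> str:
        closing, name, attrs, self_close = match.groups()
        name = name.lower()
        if closing:
            return f"</{name}>"
        if name in VOID_TAGS:
            # Normalize <br>, <br/>, and <br /> to the same self-closing form.
            return f"<{name}{attrs.rstrip()} />"
        return f"<{name}{attrs}{self_close}>"

    return TAG_RE.sub(fix, html)


print(clean_tags("<P>Hello<BR>world</P>"))  # <p>Hello<br />world</p>
```

Running the result back through an XML parser is a quick way to confirm that the cleanup actually produced well-formed markup.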
