
iTextSharp – PDF to HTML – Cleaning HTML


The last prerequisite before we actually convert our HTML into PDF is to clean up the HTML.

The method I use takes advantage of the XML parser in .NET. To use that parser, though, the HTML has to be XHTML compliant, which means it must be well-formed XML.
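To see why ordinary HTML trips up an XML parser, here is a minimal sketch of a well-formedness check. It is written in Python rather than the post's .NET code, with the standard library's `xml.etree.ElementTree` standing in for the .NET XML parser (in .NET, `XmlDocument.LoadXml` rejects malformed input in a similar way). The helper name `is_well_formed` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup as-is.

    ElementTree is a stand-in for the .NET XML parser; both refuse
    markup that is not well-formed XML.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical hand-written HTML: uppercase <P>, unclosed <br> and <li> tags.
sloppy = "<div><P>Hello<br>world</p><ul><li>one<li>two</ul></div>"
# The same content written as XHTML-style markup.
clean = "<div><p>Hello<br />world</p><ul><li>one</li><li>two</li></ul></div>"

print(is_well_formed(sloppy))  # False
print(is_well_formed(clean))   # True
```

The sloppy version fails at `</p>` because, to an XML parser, `<br>` is still open and `<P>` is a different element from `p`.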

For this exercise, what I care about most is that every HTML tag has a matching closing tag, that the tags are nested in a proper hierarchy, and that all tags are lowercase.

Some of this we have to rely on the user to provide, such as properly nested tags. But some of it we can clean up in our code. If you know you will have complete control over your HTML, you might be able to skip this step, but the code is simple enough that you will probably want to add it anyway.
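The kind of cleanup that can be done in code might look something like this minimal sketch. Again it is Python, not the author's .NET implementation, and the `XhtmlCleaner` class, `clean_html` function and `VOID_ELEMENTS` set are hypothetical names. It builds on the standard library's `html.parser.HTMLParser`, which already lowercases tag and attribute names, and adds self-closing void elements, quoted attribute values, escaped text, and closing tags for anything left open.

```python
import xml.etree.ElementTree as ET
from html import escape
from html.parser import HTMLParser

# Elements that never have content; XHTML writes them as <br />, <img ... />.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class XhtmlCleaner(HTMLParser):
    """Re-emit HTML as XHTML-style markup.

    HTMLParser already lowercases tag and attribute names. This class
    self-closes void elements, quotes and escapes attribute values, escapes
    text, drops stray closing tags and closes tags that were left open.
    Comments and doctypes are dropped because no handler is defined for them.
    Badly nested tags are forced into a well-formed shape, which may not
    match what the author of the HTML meant.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the cleaned markup
        self.open_tags = []  # stack of currently open non-void elements

    @staticmethod
    def _attrs(attrs):
        # Bare attributes such as CHECKED become checked="checked".
        return "".join(
            f' {name}="{escape(name if value is None else value)}"'
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or tag not in self.open_tags:
            return  # stray closing tag, e.g. </br>: drop it
        # Close any elements still open inside this one, then this one.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Escape &, < and > so the text is valid XML character data.
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()  # flush any buffered input first
        while self.open_tags:  # close whatever is still open at the end
            self.out.append(f"</{self.open_tags.pop()}>")


def clean_html(markup):
    cleaner = XhtmlCleaner()
    cleaner.feed(markup)
    cleaner.close()
    return "".join(cleaner.out)


sloppy = "<div><P>Hello<BR>world</p><ul><li>one<li>two</ul></div>"
cleaned = clean_html(sloppy)
print(cleaned)
# <div><p>Hello<br />world</p><ul><li>one<li>two</li></li></ul></div>
ET.fromstring(cleaned)  # now parses without raising ParseError
```

Notice that the two `<li>` elements come out nested rather than as siblings. The result parses as XML, but the sketch does not know HTML's implied end-tag rules, which is one reason correct nesting still has to come from whoever writes the HTML.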
