Archive for July 20th, 2009

iTextSharp – PDF to HTML – Cleaning HTML


The last prerequisite step before we actually convert our HTML into PDF is to clean up the HTML.

The method I use takes advantage of the XML parser in .NET, but to use that parser, the HTML has to be XHTML-compliant, that is, well-formed XML.

For this exercise, what I am most concerned about is that every HTML tag has a matching closing tag, that the tags are properly nested in a hierarchical structure, and that all the tags are lowercase.
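
The post's own .NET code is not part of this excerpt. As a rough illustration of the three requirements, here is a minimal sketch that uses Python's standard-library `xml.etree.ElementTree` in place of the .NET parser. If parsing succeeds, every tag was closed and properly nested; a walk over the resulting tree then flags any tag name that is not lowercase. The function name `check_xhtml` is hypothetical, and the sketch assumes the input is a fragment, wrapping it in a `<root>` element so that several top-level tags still parse.

```python
import xml.etree.ElementTree as ET


def check_xhtml(markup: str) -> list[str]:
    """Return a list of problems; an empty list means the markup passed."""
    try:
        # The parser rejects any tag without a matching close tag and any
        # tags that overlap instead of nesting. Wrapping the input in a
        # single <root> element lets fragments with several top-level tags parse.
        root = ET.fromstring(f"<root>{markup}</root>")
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    # Parsing succeeded, so closing tags and nesting are fine; now check case.
    return [
        f"tag not lowercase: <{element.tag}>"
        for element in root.iter()
        if element.tag != element.tag.lower()
    ]


print(check_xhtml("<p>Hello <b>world</b></p>"))  # []
print(check_xhtml("<P>Shout</P>"))               # ['tag not lowercase: <P>']
print(check_xhtml("<p><b>bad</p></b>"))
```

Keep in mind that XML predefines only five named entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`), so an HTML entity such as `&nbsp;` also causes a parse failure unless it is replaced with a numeric reference like `&#160;`.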

Some of this we will have to rely on the user to provide, such as properly nesting the tags. But some of it we can attempt to clean up in our code. If you know you will have complete control over your HTML, you might be able to skip this step, but I think the code is simple enough that you’ll want to add it anyway.
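
A minimal sketch of the kind of cleanup that can be automated, again in Python rather than the author's .NET code: a regular expression lowercases every tag name and rewrites a partial list of HTML void elements (`<br>`, `<img>`, and so on) as self-closing tags. The names `clean_html`, `VOID_TAGS`, and `TAG_RE` are hypothetical. The sketch deliberately leaves nesting alone, since that is the part we rely on the user to get right.

```python
import re

# A partial list of HTML void elements, which never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

# Matches an opening, closing, or self-closing tag:
# group 1 = "/" for a closing tag, group 2 = tag name,
# group 3 = attributes, group 4 = "/" for a self-closing tag.
# Comments and <!DOCTYPE> are not matched because "!" is not a letter.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*?)(/?)>")


def clean_html(markup: str) -> str:
    """Lowercase tag names and self-close void elements."""

    def fix(match: re.Match) -> str:
        closing, name, attrs, self_close = match.groups()
        name = name.lower()
        if name in VOID_TAGS and not closing:
            # <BR>, <br/>, and <br > all become <br />
            return f"<{name}{attrs.rstrip()} />"
        return f"<{closing}{name}{attrs}{self_close}>"

    return TAG_RE.sub(fix, markup)


print(clean_html("<P>Line one<BR>Line two</P>"))  # <p>Line one<br />Line two</p>
```

A regular expression is a blunt tool here: it does not lowercase attribute names, quote unquoted attribute values, or cope with a `>` inside a quoted attribute value, so it is still worth passing the result through the XML parser before going further.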
