
iTextSharp – HTML to PDF – Writing the PDF


Last week we parsed the HTML and wrote code to keep track of the attributes we will need when we create the PDF. Today we will finish the code and create the Elements that we can add to our PDF document.

One thing to keep in mind as we write out the PDF is that the font characteristics we have pushed onto our stack may overlap.
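Here is a minimal sketch of one way to cope with overlapping characteristics. It is written in Python as a language-neutral illustration, not the .NET code from this series. It assumes each stack entry holds the complete effective font state (the `FontState` fields and the `TAG_EFFECTS` mapping are hypothetical). Pushing a tag copies the current state and changes only what that tag affects, and popping restores whatever the enclosing tags had set.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class FontState:
    # Hypothetical attributes; a real implementation tracks its own set.
    bold: bool = False
    italic: bool = False
    size: float = 12.0


# Hypothetical mapping from tag name to the font attributes it changes.
TAG_EFFECTS = {
    "b": {"bold": True},
    "strong": {"bold": True},
    "i": {"italic": True},
    "em": {"italic": True},
    "big": {"size": 16.0},
}


class FontStack:
    """Each entry is the full effective state, so overlapping tags resolve on pop."""

    def __init__(self) -> None:
        # The bottom entry is the document's default font and is never popped.
        self._stack: List[FontState] = [FontState()]

    def push(self, tag: str) -> None:
        # Copy the current state, overriding only what this tag affects.
        effects = TAG_EFFECTS.get(tag, {})
        self._stack.append(replace(self._stack[-1], **effects))

    def pop(self) -> None:
        # Restore the enclosing state; ignore pops that would remove the default.
        if len(self._stack) > 1:
            self._stack.pop()

    @property
    def current(self) -> FontState:
        return self._stack[-1]
```

With this approach, a `<b>` nested inside `<strong>` does not switch bold off when it closes, because the entry underneath it still records bold.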


iTextSharp – HTML to PDF – Parsing HTML


Now that we have the HTML cleaned up, the next step is to parse it.

In my actual code, I parse the HTML and create the PDF in a single pass. For these posts, however, I will cover parsing the HTML here and deal with the PDF creation code later.
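As a rough illustration of parsing with an XML parser, the sketch below uses Python's `xml.etree.ElementTree` as a stand-in for the .NET parser. It turns well-formed markup into a flat list of start, text, and end events that PDF-writing code could consume in order. The names `walk` and `parse_events` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, List, Tuple


def walk(elem: ET.Element) -> Iterator[Tuple[str, str]]:
    """Yield ("start", tag), ("text", ...), and ("end", tag) events in document order."""
    yield ("start", elem.tag)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from walk(child)
        # ElementTree stores text that follows a child's closing tag in child.tail,
        # but that text belongs to the parent, so emit it after the child's end event.
        if child.tail:
            yield ("text", child.tail)
    yield ("end", elem.tag)


def parse_events(xhtml: str) -> List[Tuple[str, str]]:
    # fromstring raises ET.ParseError if the markup is not well-formed XML.
    root = ET.fromstring(xhtml)
    return list(walk(root))
```

For example, `<p>Hello <b>bold</b> world</p>` yields the start of `p`, the text `Hello `, the start of `b`, `bold`, the end of `b`, the text ` world`, and the end of `p`.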


iTextSharp – HTML to PDF – Cleaning HTML


The last step before we actually convert our HTML to PDF is to clean up the HTML.

The method I use relies on the XML parser in .NET. To use that parser, the HTML must be XHTML compliant, that is, well-formed XML.
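A quick way to check whether markup meets this requirement is to try parsing it. The Python sketch below uses `xml.etree.ElementTree` in place of the .NET XML parser. The helper name is hypothetical. The wrapping `<root>` element is an assumption of this example, added so that fragments with several top-level tags still parse.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (a proxy for XHTML compliance)."""
    try:
        # Wrap in a hypothetical root element: XML allows only one top-level element.
        ET.fromstring(f"<root>{markup}</root>")
        return True
    except ET.ParseError:
        return False
```

An unclosed `<br>` fails, and so does a `<P>` closed by `</p>`, because XML requires every element to be closed and tag names are case-sensitive. HTML named entities such as `&nbsp;` also fail, since plain XML does not define them.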

For this exercise, my main concerns are that every HTML tag has a matching closing tag, that the tags are properly nested, and that all tags are lowercase.

For some of this, such as proper nesting, we will have to rely on the user. Other problems we can attempt to clean up in our code. If you have complete control over your HTML, you might be able to skip this step, but the code is simple enough that you will probably want to add it anyway.
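Here is a minimal sketch of the kind of cleanup we can do in code, assuming a regular-expression pass is acceptable. It lowercases tag names and rewrites void elements such as `<br>` as self-closing `<br />`. The `VOID_TAGS` set and `clean_html` are illustrative, not taken from the posts, and nesting problems are left alone.

```python
import re

# Void HTML elements never have a closing tag; XHTML writes them as <br />.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

# Matches an opening or closing tag: group 1 is "/" for closing tags, group 2 is
# the tag name, and group 3 is the rest (attributes and an optional trailing "/").
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")


def clean_html(html: str) -> str:
    def fix(m):
        closing, name, rest = m.group(1), m.group(2).lower(), m.group(3)
        if name in VOID_TAGS:
            if closing:
                # Drop stray closing tags for void elements, such as </br>.
                return ""
            # Normalize any existing trailing "/" so we emit exactly one " />".
            rest = rest.rstrip()
            if rest.endswith("/"):
                rest = rest[:-1].rstrip()
            return f"<{name}{rest} />"
        return f"<{closing}{name}{rest}>"

    return TAG_RE.sub(fix, html)
```

Attribute names and values are left untouched, and a `>` inside an attribute value would confuse the regular expression. Treat this as a first pass rather than a full HTML parser.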
