sangam100
Participant
1639 Points
316 Posts
Convert pdf file into html in asp.net
May 26, 2009 07:42 AM|LINK
Hi all,
I searched extensively but could not find a satisfactory free PDF-to-HTML conversion tool, tip, or article, so I am starting a fresh discussion: how do you convert a PDF into an HTML file in ASP.NET?
How can I convert a PDF file into an HTML page? Should I first convert it into a Microsoft Office file and then into an HTML file?
Any code snippets, DLLs, or links to tips and tutorials are appreciated. Thanks in advance.
asp.net file conversion
Very useful visual studio shortcuts and more tips and tricks
5 Different ways to open new window in asp.net
qwe123kids
All-Star
48619 Points
7957 Posts
MVP
Re: Convert pdf file into html in asp.net
May 26, 2009 11:43 AM|LINK
Hi,
ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02pl3-win32.zip
http://aspdotnetcodebook.blogspot.com/2008/08/how-to-convert-pdf-file-to-text-in.html
http://forums.asp.net/p/1417395/3130077.aspx
Avinash Tiwari
Remember to click “Mark as Answer” on the post, if it helps you.
sangam100
Participant
1639 Points
316 Posts
Re: Convert pdf file into html in asp.net
May 27, 2009 04:49 AM|LINK
Hi qwe123kids ,
I tried converting PDF to HTML by executing pdftohtml.exe, but I suspect that running an .exe file on the server is not safe. Each time I run it, a DOS command window pops up and closes. Is there any remedy for this? Or is there another component for the same task? (I need a free one; a commercial one is not suitable for me.) That said, I like the output of this exe: each page of the PDF is converted to its own HTML file, and the main HTML file lists all those pages as links and displays them in the same page.
Now, could we set the background of each generated HTML page to plain white? And, importantly, could we stop the command window from popping up? With these workarounds, this would be a very useful PDF-to-HTML converter.
Thanks a lot!
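On the two questions above, a minimal sketch rather than a definitive answer. In .NET the console window can usually be suppressed by starting the tool through System.Diagnostics.Process with UseShellExecute = false and CreateNoWindow = true. The class and method names below (PdfToHtmlRunner, BuildStartInfo, Convert, WhitenBackground) are hypothetical, and the executable path is whatever applies on your server. The argument order assumes the usual pdftohtml usage of input PDF followed by an output name. The background fix assumes the generated pages set their color through a bgcolor attribute on the body tag, so check the actual output first.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class PdfToHtmlRunner
{
    // Start settings that keep the console window from appearing.
    public static ProcessStartInfo BuildStartInfo(string exePath, string pdfPath, string outputBase)
    {
        return new ProcessStartInfo
        {
            FileName = exePath,
            // Quote both paths so that spaces in them do not split the arguments.
            Arguments = "\"" + pdfPath + "\" \"" + outputBase + "\"",
            UseShellExecute = false,      // required for CreateNoWindow and redirection
            CreateNoWindow = true,        // no DOS window pops up
            RedirectStandardError = true  // capture error text for diagnostics
        };
    }

    // Runs the converter and waits for it to finish; throws on a non-zero exit code.
    public static void Convert(string exePath, string pdfPath, string outputBase)
    {
        using (Process process = Process.Start(BuildStartInfo(exePath, pdfPath, outputBase)))
        {
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException("Converter failed: " + errors);
        }
    }

    // Post-processes one generated page: rewrites an existing bgcolor
    // attribute on <body> to white. Pages without that attribute are unchanged.
    public static string WhitenBackground(string html)
    {
        return Regex.Replace(
            html,
            "(<body[^>]*?\\bbgcolor\\s*=\\s*)(\"[^\"]*\"|'[^']*'|[^\\s>]+)",
            "$1\"#FFFFFF\"",
            RegexOptions.IgnoreCase);
    }
}
```

After Convert returns, each generated .html file could be read with File.ReadAllText, passed through WhitenBackground, and written back. Whether running an external executable is acceptable on the server still depends on the permissions of the account the site runs under.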
qwe123kids
All-Star
48619 Points
7957 Posts
MVP
Re: Convert pdf file into html in asp.net
May 27, 2009 05:12 AM|LINK
using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;

namespace PDF_TO_HTML
{
    public partial class _Default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            CreatePDF(@"D:\VSS_TEST\Avinashtest.pdf");
        }

        public bool CreatePDF(string sFilePDF)
        {
            bool bRet = false;
            Response.Write(" My First Information");
            Document document = new Document();
            try
            {
                PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(sFilePDF, FileMode.Create));
                document.Open();
                iTextSharp.text.Table aTable = new iTextSharp.text.Table(2, 2);
                aTable.AddCell("Avinash");
                aTable.AddCell("Tiwari");
                aTable.AddCell("Is My");
                aTable.AddCell("Name");
                document.Add(aTable);
                bRet = true;
            }
            catch (DocumentException de)
            {
                Response.Write(de.Message);
            }
            catch (IOException ioe)
            {
                Response.Write(ioe.Message);
            }
            document.Close();
            if (bRet)
                Response.Write(sFilePDF + " has been created");
            return bRet;
        }
    }
}
Download the iText DLL from:
http://sourceforge.net/project/platformdownload.php?group_id=72954
sangam100
Participant
1639 Points
316 Posts
Re: Convert pdf file into html in asp.net
May 27, 2009 05:48 AM|LINK
Hi qwe123kids,
Thanks for the quick reply. But the function you posted creates a PDF document on the fly using the iTextSharp library. How does this help convert a PDF file into HTML? Besides, we already have the PDF file and don't need to generate another one. I am a little confused. Could you explain a little more, please? Thank you.
aditya1986
Contributor
4168 Points
760 Posts
Re: Convert pdf file into html in asp.net
May 27, 2009 05:53 AM|LINK
Have a look at the following links:
http://www.dot-net-search.com/asp.net+C%23+code+for+converting+html+to+pdf/
http://www.codeproject.com/KB/string/pdf2text.aspx
http://www.worldofasp.net/tut/GeneratePDF/Generate_pdf_from_html_with_Csharp_and_iTextSharp_265.aspx
http://www.sautinsoft.com/products/pdf-metamorphosis/index.php
http://www.velocityreviews.com/forums/t83456-pdf-conversion-using-c-and-aspnet.html
I hope this helps.
mudassarkhan
All-Star
78956 Points
13402 Posts
MVP
Re: Convert pdf file into html in asp.net
May 27, 2009 06:21 AM|LINK
http://www.codeproject.com/KB/string/pdf2text.aspx
sangam100
Participant
1639 Points
316 Posts
Re: Convert pdf file into html in asp.net
May 27, 2009 06:32 AM|LINK
Hi, aditya1986 and mudassarkhan,
Thanks for the responses. I visited all the links you both provided, but I couldn't find what I am looking for. Some of them cover converting HTML to PDF (just the opposite of what I need!) and some focus on extracting text from a PDF. I have used pdftotext before, and it gives me unformatted text only. Even if I wrap that text in HTML, it will not reproduce the original document's layout. What would be the next move? Thanks!
qwe123kids
All-Star
48619 Points
7957 Posts
MVP
Re: Convert pdf file into html in asp.net
May 27, 2009 07:08 AM|LINK
Hi,
http://www.ikvm.net/
//PDF BOX
http://sourceforge.net/project/showfiles.php?group_id=78314
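/*****CODE ************************/
using System;
using org.pdfbox.pdmodel;
using org.pdfbox.util;
namespace PDFReader
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the PDF through the IKVM-compiled PDFBox library.
            PDDocument doc = PDDocument.load("lopreacamasa.pdf");
            try
            {
                // Extracts plain text only; layout and formatting are lost.
                PDFTextStripper pdfStripper = new PDFTextStripper();
                Console.Write(pdfStripper.getText(doc));
            }
            finally
            {
                // Release the file handle held by the document.
                doc.close();
            }
        }
    }
}
/*******************************/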
Or
http://studentclub.ro/lucians_weblog/archive/2007/03/22/read-from-a-pdf-file-using-c.aspx
qwe123kids
All-Star
48619 Points
7957 Posts
MVP
Re: Convert pdf file into html in asp.net
May 27, 2009 07:15 AM|LINK
Hi,
/****************Check this code****************/
using System; // needed for EventArgs
using System.Data;
using System.Configuration;
using System.Collections;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using org.pdfbox.pdmodel;
using org.pdfbox.util;
namespace WebApplication1
{
    public partial class _Default : System.Web.UI.Page
    {
        protected void Page_Load(object sender, EventArgs e)
        {
            // Writes the extracted plain text straight into the response.
            Response.Write(parseUsingPDFBox(@"D:\VSS_TEST\70_536.pdf"));
        }

        private static string parseUsingPDFBox(string filename)
        {
            PDDocument doc = PDDocument.load(filename);
            try
            {
                // Extracts plain text only; layout and formatting are lost.
                PDFTextStripper stripper = new PDFTextStripper();
                return stripper.getText(doc);
            }
            finally
            {
                // Release the file handle held by the document.
                doc.close();
            }
        }
    }
}
/*****************************************************************/
Download the DLL files from the links provided in my previous post.
OR
http://www.codeproject.com/KB/string/pdf2text.aspx
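One caveat about writing the extracted text straight into the response: characters such as < and & in the PDF text would be interpreted as markup. A minimal sketch of a safer wrapper follows. The class and method names (PlainTextToHtml, Wrap) are hypothetical, and it still yields unformatted text only, so it does not reproduce the PDF's layout.

```csharp
using System;
using System.Web;

public static class PlainTextToHtml
{
    // HTML-encodes the extracted text and wraps it in <pre> so that
    // line breaks survive; fonts, images, and layout are not recovered.
    public static string Wrap(string text)
    {
        return "<pre>" + HttpUtility.HtmlEncode(text) + "</pre>";
    }
}
```

In the page above this would be used as Response.Write(PlainTextToHtml.Wrap(parseUsingPDFBox(path))), where path is the PDF's location.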