Contributor
3105 Points
2122 Posts
parse pdf file
Feb 21, 2015 11:21 PM|Xequence|LINK
I want to parse a pdf file and locate the xml inside of the document. Suggestions?
Credentials
CurbSmash
Star
14297 Points
5797 Posts
Re: parse pdf file
Feb 22, 2015 12:39 AM|gerrylowry|LINK
@xequence
via Google, or your favourite search engine,
c# read .pdf
will return a gazillion search results.
a program like PSPad (free) will let you examine the file in hexadecimal.
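If you would rather script that inspection than open a hex editor, here is a minimal sketch in Python (the same idea carries over to C# with File.ReadAllBytes). It dumps bytes in hex-editor style and lists the offsets where a `<?xml` marker appears. The sample bytes and the helper names `hex_dump` and `find_marker_offsets` are made up for illustration. One caveat: PDF streams are often compressed (for example with FlateDecode), in which case the XML will not be visible in the raw bytes at all.

```python
import tempfile
from pathlib import Path

def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns, and printable ASCII, like a hex editor."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)

def find_marker_offsets(data: bytes, marker: bytes = b"<?xml") -> list:
    """Return every byte offset at which the marker occurs in the raw data."""
    offsets = []
    position = data.find(marker)
    while position != -1:
        offsets.append(position)
        position = data.find(marker, position + 1)
    return offsets

# Hypothetical stand-in for a real PDF: a header plus one uncompressed XML stream.
sample = b"%PDF-1.4\n1 0 obj\nstream\n<?xml version=\"1.0\"?><form/>\nendstream\nendobj\n"
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.pdf"
    path.write_bytes(sample)
    raw = path.read_bytes()

print(hex_dump(raw[:32]))        # first 32 bytes, hex-editor style
print(find_marker_offsets(raw))  # offsets where "<?xml" starts
```

If the offsets list comes back empty on your real file, the XML is probably sitting inside a compressed stream and a PDF library is the better route.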
Nick, given that you've posted to forums.asp.net c. 2000 times, you're likely already exceptional with Google et al., so I'm wondering what it is I do not understand about your O.P.
Have you been to http://partners.adobe.com/public/developer/xml/topic.html ?
also: http://partners.adobe.com/public/developer/en/xml/AdobeXMLFormsSamples.pdf
is this a one time, one file event?
are you thinking about any generic .pdf file? a special .pdf file?
http://weblogs.asp.net/gerrylowry/clarity-is-important-both-in-question-and-in-answer
Contributor
3105 Points
2122 Posts
Re: parse pdf file
Feb 22, 2015 04:36 PM|Xequence|LINK
Thanks for the links. I only want to read the XML out of a 275-page .pdf file and then generate classes from it using xsd.exe (or something else that turns XML into code), which I can then use to build a hierarchical table structure with DbSet<T>.
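Before generating any classes, it can help to preview the shape of the XML: which elements exist, what attributes they carry, and which child elements they contain. The following is a rough Python sketch of that idea only. It is not what xsd.exe does internally, and the sample XML, its element names, and the helper `outline_hierarchy` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def outline_hierarchy(xml_text: str) -> dict:
    """Map each element name to the attribute names and child element names seen for it."""
    root = ET.fromstring(xml_text.strip())
    outline = defaultdict(lambda: {"attributes": set(), "children": set()})
    for element in root.iter():  # visits the root and every descendant
        entry = outline[element.tag]
        entry["attributes"].update(element.attrib)  # adds the attribute names
        entry["children"].update(child.tag for child in element)
    return dict(outline)

# Hypothetical sample; a real document would come out of the PDF.
sample_xml = """
<orders>
  <order id="1">
    <customer name="Ann"/>
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </order>
</orders>
"""

for tag, info in outline_hierarchy(sample_xml).items():
    print(tag, sorted(info["attributes"]), sorted(info["children"]))
```

Each element name that has children is a candidate for a parent class or table, and each child name is a candidate for a related one.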
Credentials
CurbSmash
Star
14297 Points
5797 Posts
Re: parse pdf file
Feb 22, 2015 06:04 PM|gerrylowry|LINK
@xequence TIMTOWTDI
Depending on how your data is structured, and given that this seems like a one-time event for you, you might be successful by simply exporting to text and then parsing your text file.
.pdf files look a bit messy inside.
There are products, for example https://bytescout.com/products/developer/pdfextractorsdk/index.html, some of which have demo versions, but it's possible that the demo versions are restricted in the number of pages that they can handle.
you might want to try searches like:
extract xml from pdf
msdn .net .pdf to .xml
et cetera
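For the export-to-text route, a minimal sketch in Python of the parsing step, using plain string searches, might look like this. It assumes the exported text contains one XML document whose root element name you already know, that the name appears only once, and that no other tag merely starts with the same name. The file contents and the helper `extract_xml_block` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional
import xml.etree.ElementTree as ET

def extract_xml_block(text: str, root_tag: str) -> Optional[str]:
    """Slice out the text from the first opening root tag to its first closing tag."""
    closing = f"</{root_tag}>"
    start = text.find(f"<{root_tag}")  # note: would also match a longer tag name
    if start == -1:
        return None
    end = text.find(closing, start)
    if end == -1:
        return None
    return text[start:end + len(closing)]

# Hypothetical export: page headers surrounding one XML document.
exported = (
    "Annual Report        Page 12\n"
    "<catalog>\n"
    "  <book isbn=\"111\">First title</book>\n"
    "  <book isbn=\"222\">Second title</book>\n"
    "</catalog>\n"
    "Annual Report        Page 13\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "export.txt"
    path.write_text(exported, encoding="utf-8")
    block = extract_xml_block(path.read_text(encoding="utf-8"), "catalog")

root = ET.fromstring(block)  # raises ParseError if the slice is not well-formed
print([book.get("isbn") for book in root.findall("book")])
```

If page headers or footers end up in the middle of the XML in your export, the parse step will fail and the text will need cleaning first.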
All-Star
45489 Points
7008 Posts
Microsoft
Re: parse pdf file
Feb 25, 2015 04:24 AM|Zhi Lv - MSFT|LINK
Hi xequence,
As for this issue, I suggest you try reading the PDF content with iTextSharp, then pulling the XML data out of the extracted text with string methods or Regex methods. Here are some relevant articles; please refer to them.
How to read PDF content using iTextSharp in .NET
How to Extract Text From PDF File Using C#.Net
String Methods
Regex Methods
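As an illustration of the Regex step only (not the iTextSharp part), here is a minimal sketch in Python; the .NET Regex class supports the same kind of pattern, including named backreferences. It assumes the PDF text has already been extracted into a string. The pattern, the sample text, and the helper `find_xml_fragments` are hypothetical, and the pattern is deliberately simple: it will not cope with an element nested inside another element of the same name, or with a self-closing root element.

```python
import re
import xml.etree.ElementTree as ET

# An opening tag, anything (non-greedy), then the closing tag with the same name.
FRAGMENT = re.compile(
    r"<(?P<tag>[A-Za-z_][\w.-]*)\b[^>]*>.*?</(?P=tag)>",
    re.DOTALL,
)

def find_xml_fragments(text: str) -> list:
    """Return parsed elements for every candidate fragment that is well-formed XML."""
    elements = []
    for match in FRAGMENT.finditer(text):
        try:
            elements.append(ET.fromstring(match.group(0)))
        except ET.ParseError:
            continue  # the regex matched something that is not valid XML; skip it
    return elements

# Hypothetical extracted text: prose and page residue mixed with XML fragments.
sample_text = (
    "Page 1 of 275\n"
    "<order id=\"1\"><item sku=\"A1\"/></order>\n"
    "some prose between fragments\n"
    "<order id=\"2\"><item sku=\"B7\"/></order>\n"
)

fragments = find_xml_fragments(sample_text)
print([element.get("id") for element in fragments])
```

Running each match through a real XML parser is the design choice that matters here: the regex only proposes candidates, and the parser decides whether they are actually XML.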
Best Regards,
Dillion