Last post Oct 09, 2017 06:43 AM by Billy Liu
Oct 06, 2017 06:36 AM | Muhammad Kashan Khan
I am using the iTextSharp PDF text extractor. I need to extract the text with exactly the spacing shown in the document. The attached image is for reference.
Currently, the returned string contains only one space wherever the document shows several.
How can I achieve this?
Any help would be appreciated.
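Some background on why the spaces collapse: a PDF generally positions text on the page rather than storing blank runs as space characters, so an extractor has to infer spaces from the horizontal gaps between pieces of text. As far as I know, iTextSharp's LocationTextExtractionStrategy inserts at most a single space at each word boundary, which matches what you are seeing. One way around it is to turn each gap into as many spaces as fit into it. The sketch below assumes you already have each text chunk's start and end x-position and the width of one space; in iTextSharp 5 a custom ITextExtractionStrategy could probably collect these from TextRenderInfo, but that wiring is an assumption and is not shown. TextChunk, SpacingRebuilder and JoinWithSpacing are hypothetical names, not part of iTextSharp or Spire.PDF.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical: one run of text on a single line, as a PDF extractor might
// report it (the string plus its left and right x-positions).
public sealed class TextChunk
{
    public string Text { get; }
    public float StartX { get; }
    public float EndX { get; }

    public TextChunk(string text, float startX, float endX)
    {
        Text = text;
        StartX = startX;
        EndX = endX;
    }
}

// Hypothetical helper, not an iTextSharp API.
public static class SpacingRebuilder
{
    // Joins the chunks of one line (already sorted left to right), inserting
    // round(gap / spaceWidth) spaces between neighbours instead of a single one.
    public static string JoinWithSpacing(IList<TextChunk> chunks, float spaceWidth)
    {
        if (spaceWidth <= 0) throw new ArgumentOutOfRangeException(nameof(spaceWidth));

        var sb = new StringBuilder();
        for (int i = 0; i < chunks.Count; i++)
        {
            if (i > 0)
            {
                float gap = chunks[i].StartX - chunks[i - 1].EndX;
                int spaces = (int)Math.Round(gap / spaceWidth);
                // Touching or overlapping chunks get no space at all.
                if (spaces > 0) sb.Append(' ', spaces);
            }
            sb.Append(chunks[i].Text);
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        var line = new List<TextChunk>
        {
            new TextChunk("Name:", 0f, 25f),
            new TextChunk("John", 45f, 65f), // 20-unit gap = 4 spaces of width 5
        };
        Console.WriteLine("[" + SpacingRebuilder.JoinWithSpacing(line, 5f) + "]");
        // prints "[Name:    John]"
    }
}
```

Rounding makes tiny kerning gaps disappear while wide gaps become proportional runs of spaces. With proportional fonts the result approximates the layout rather than copying it exactly.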
Oct 09, 2017 03:16 AM | Alexander Zandra
I am not sure whether iTextSharp can extract text from a PDF while preserving whitespace. What I have tried is the following method with Spire.PDF, and it works well for me.
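using System.IO;
using Spire.Pdf;

//create PdfDocument object
PdfDocument doc = new PdfDocument();
//load sample file ("sample.pdf" is a placeholder path)
doc.LoadFromFile("sample.pdf");
//get first page
PdfPageBase page = doc.Pages[0];
//get the extracted text
string text = page.ExtractText();
//write to file; the using blocks flush and close the writer and stream
using (FileStream fs = new FileStream("result.txt", FileMode.Create))
using (StreamWriter sw = new StreamWriter(fs))
{
    sw.Write(text);
}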
Oct 09, 2017 06:43 AM | Billy Liu
Hi Muhammad Kashan Khan,
Muhammad Kashan Khan wrote: "Attached image is for reference."
I can't see the image.
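Do you want to replace some value with two or more spaces and then display it on the page?
If so, try using "&nbsp;" (a non-breaking space): HTML collapses consecutive ordinary spaces into one, so plain spaces will not stay visible in a Label.
For example, the following code replaces “&&” with three non-breaking spaces: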
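using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

StringBuilder newcontent = new StringBuilder();
using (PdfReader reader = new PdfReader(Server.MapPath("/Doc/Hello.pdf")))
{
    for (int i = 1; i <= reader.NumberOfPages; i++)
    {
        // Use a fresh strategy per page; reusing one instance accumulates text from earlier pages
        ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
        string page = PdfTextExtractor.GetTextFromPage(reader, i, strategy);
        string[] lines = page.Split('\n');
        foreach (string line in lines)
        {
            // "&nbsp;" keeps all three spaces visible in HTML; <br /> keeps the line breaks
            newcontent.Append(line.Replace("&&", "&nbsp;&nbsp;&nbsp;")).Append("<br />");
        }
    }
}
Label1.Text = newcontent.ToString();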
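Whichever extractor produces the text, runs of spaces will still collapse once the string is shown in a Label, because the browser collapses consecutive whitespace. Below is a minimal sketch of a helper that keeps them visible, assuming ASP.NET Web Forms, where Label.Text is written to the page without HTML encoding. It HTML-encodes the text, turns leading spaces and every space after the first in a run into &nbsp;, and turns line breaks into <br />. HtmlSpacing and PreserveSpaces are hypothetical names.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper for showing extracted text in HTML with its spacing intact.
public static class HtmlSpacing
{
    public static string PreserveSpaces(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;

        // Encode <, >, &, quotes etc. first; WebUtility.HtmlEncode leaves spaces and line breaks alone.
        string encoded = WebUtility.HtmlEncode(text);
        var sb = new StringBuilder(encoded.Length);

        // True at the start of the text, at the start of each line and after a space,
        // so leading indentation and every extra space in a run become &nbsp;.
        bool useNbsp = true;
        foreach (char c in encoded)
        {
            if (c == ' ')
            {
                sb.Append(useNbsp ? "&nbsp;" : " ");
                useNbsp = true;
            }
            else if (c == '\n')
            {
                sb.Append("<br />");
                useNbsp = true;
            }
            else if (c != '\r') // drop the CR of CRLF line endings
            {
                sb.Append(c);
                useNbsp = false;
            }
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(HtmlSpacing.PreserveSpaces("Total:   42\nA&B"));
        // prints "Total: &nbsp;&nbsp;42<br />A&amp;B"
    }
}
```

In a page it would be used as `Label1.Text = HtmlSpacing.PreserveSpaces(text);`. Keeping the first space of each run as an ordinary space lets long lines still wrap at that point.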