My application allows users to upload files in a variety of formats (plain text, PDF, .doc, .docx), which are then saved to Blob Storage. I'm using the Sautinsoft DocumentCore class to load the documents and convert them to a Base64 string for rendering in the browser. I've got these working right now:
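// Needs: using System; using System.IO; using SautinSoft.Document;
// plus the CloudBlockBlob namespace for your Azure Storage SDK version.
// Reads the whole blob into memory (stands in for the ToBytes() extension method).
private static byte[] ReadBlobBytes(CloudBlockBlob blob)
{
    using (Stream blobStream = blob.OpenRead())
    using (var buffer = new MemoryStream())
    {
        blobStream.CopyTo(buffer);
        return buffer.ToArray();
    }
}
public static string WordDocxToHtml(CloudBlockBlob blob)
{
    using (var msInput = new MemoryStream(ReadBlobBytes(blob)))
    {
        var dc = DocumentCore.Load(msInput, new DocxLoadOptions());
        using (var msOutput = new MemoryStream())
        {
            // Fixed-layout HTML with inline CSS and embedded images keeps the page self-contained.
            dc.Save(msOutput, new HtmlFixedSaveOptions()
            {
                CssExportMode = CssExportMode.Inline,
                EmbedImages = true
            });
            return Convert.ToBase64String(msOutput.ToArray());
        }
    }
}
// PDF and plain-text blobs are passed through unchanged; the browser renders the raw bytes.
public static string PdfToHtml(CloudBlockBlob blob)
{
    return Convert.ToBase64String(ReadBlobBytes(blob));
}
public static string PlainTextToHtml(CloudBlockBlob blob)
{
    return PdfToHtml(blob);
}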
Just in case anyone reading this needs to do the same thing... this displays the result in the browser:
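As one possible approach, assuming the page shows the Base64 string through a data URI inside an iframe, the markup could be built as follows. DataUriHelper, ToDataUri, and ToIframe are illustrative names, not part of the original code, and the MIME type has to match what each converter returns (text/html for the HtmlFixedSaveOptions output, application/pdf for PDFs, text/plain for text files).

```csharp
using System;

public static class DataUriHelper
{
    // Wraps a Base64 payload in a data URI usable as the src of an <iframe> or <embed>.
    // Base64 only contains A-Z, a-z, 0-9, '+', '/' and '=', so no HTML escaping is needed.
    public static string ToDataUri(string base64, string mimeType)
    {
        if (string.IsNullOrEmpty(mimeType))
            throw new ArgumentException("A MIME type is required.", nameof(mimeType));
        return $"data:{mimeType};base64,{base64}";
    }

    // Builds an iframe tag for the payload; width and height are placeholder values.
    public static string ToIframe(string base64, string mimeType)
    {
        return $"<iframe src=\"{ToDataUri(base64, mimeType)}\" width=\"100%\" height=\"800\"></iframe>";
    }
}
```

Keep in mind that Base64 text is about a third larger than the bytes it encodes (4 characters per 3 bytes), so large documents produce correspondingly large pages.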
So all this works great, but if I try to load a .doc Word document instead of a .docx, Sautinsoft no longer has my back: it appears they only support loading .docx, .pdf, and .rtf documents. So I'd like to convert the .doc CloudBlockBlob to a .docx or .pdf in memory. Saving files to a temp folder is going to be costly in terms of speed. I'm grabbing the blob as a MemoryStream... is there a way to convert the format in memory?
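Because a .doc and a .docx can arrive with a wrong or missing extension, one option (an assumption, not part of the original code) is to check the blob's first bytes in memory before choosing a converter: legacy .doc files are OLE2 compound files starting with D0 CF 11 E0 A1 B1 1A E1, .docx files are ZIP packages starting with "PK" 03 04, and PDFs normally start with "%PDF". UploadSniffer and UploadKind are hypothetical names.

```csharp
using System;
using System.IO;

public enum UploadKind { Unknown, CompoundFile, ZipPackage, Pdf }

public static class UploadSniffer
{
    // OLE2 / Compound File Binary signature (legacy .doc, but also .xls, .ppt).
    private static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    // ZIP local file header "PK\x03\x04" (.docx and every other ZIP-based file).
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };
    // "%PDF"
    private static readonly byte[] PdfSignature = { 0x25, 0x50, 0x44, 0x46 };

    public static UploadKind Detect(byte[] header)
    {
        if (StartsWith(header, OleSignature)) return UploadKind.CompoundFile;
        if (StartsWith(header, ZipSignature)) return UploadKind.ZipPackage;
        if (StartsWith(header, PdfSignature)) return UploadKind.Pdf;
        return UploadKind.Unknown;
    }

    // Reads up to 8 bytes from a seekable stream, then restores its position
    // so the same stream can be handed to the converter afterwards.
    public static UploadKind Detect(Stream stream)
    {
        long start = stream.Position;
        var header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }
        stream.Position = start;
        Array.Resize(ref header, read);
        return Detect(header);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```

A CompoundFile result only says the upload is a legacy Office binary, and ZipPackage covers any ZIP, so the extension is still worth checking as a second signal.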
It's not how hard you push in life, but who you push, that makes the difference between success and running for your life.
It would be better if you could share a demo that reproduces the issue; at the moment we can't give suggestions effectively.
Best regards,
Sherry
MSDN Community Support
Please remember to click "Mark as Answer" the responses that resolved your issue.
If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.
I'm trying to convert the file in memory; saving it to a temp directory is the last resort, because that would really slow down my application. I've tried the method you suggested, and it works, but it would cause a bad user experience. Unfortunately, I can't create a public demo for this, but it's not that complex. I'm loading the documents with Sautinsoft's "DocumentCore.Load" method and then converting the contents of the file to a Base64 string for rendering in the browser. Since Sautinsoft doesn't support Word .doc files, I'm having trouble. I just need the Base64 string of the document, however it's produced. Since I've already got the .docx files working, I thought the easiest fix would be to convert the old Word docs to the newer format, but the only way I've found so far that works is what you suggested, and that will be really slow. Also, if I recall correctly, I'd need to install Word on the web server as well.
I've got things working, but I had to save a temp file (actually two temp files) to do it. That requires a temp folder with write access on the server, and Word installed on the server for no other reason than saving ".doc" files as ".docx". This is NOT an ideal solution for those reasons, and it's also a LOT slower than generating Base64 strings from ".docx" files.
// Needs: using System; using System.IO; using SautinSoft.Document;
// using Word = Microsoft.Office.Interop.Word; (Word must be installed on the server)
// tempDirectory is a writable folder defined elsewhere in the class.
public static string WordDocToHtml(CloudBlockBlob blob, string filename = "Default.doc")
{
    // A GUID prefix keeps concurrent requests from overwriting each other's temp files.
    var path = Path.Combine(tempDirectory, Guid.NewGuid().ToString("N") + "_" + Path.GetFileName(filename));
    // ChangeExtension also covers a filename saved without an extension.
    var newfilename = Path.ChangeExtension(path, ".docx");
    Word.Application word = null;
    try
    {
        using (var blobStream = blob.OpenRead())
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            blobStream.CopyTo(fs);
        }
        word = new Word.Application();
        var document = word.Documents.Open(path);
        document.SaveAs2(newfilename, Word.WdSaveFormat.wdFormatXMLDocument, CompatibilityMode: Word.WdCompatibilityMode.wdWord2010);
        document.Close();
        using (var msInput = new MemoryStream(File.ReadAllBytes(newfilename)))
        {
            var dc = DocumentCore.Load(msInput, new DocxLoadOptions());
            using (var msOutput = new MemoryStream())
            {
                dc.Save(msOutput, new HtmlFixedSaveOptions()
                {
                    CssExportMode = CssExportMode.Inline,
                    EmbedImages = true
                });
                return Convert.ToBase64String(msOutput.ToArray());
            }
        }
    }
    finally
    {
        // Always shut Word down and delete both temp files, even if a step above throws.
        word?.Quit(Word.WdSaveOptions.wdDoNotSaveChanges);
        if (File.Exists(path))
            File.Delete(path);
        if (File.Exists(newfilename))
            File.Delete(newfilename);
    }
}
I'm posting this here in case anyone has a similar issue and can use it as a starting point. As far as I can tell, short of a ton of work, it's not possible to save a ".doc" as a ".docx" without writing it to a file: the SaveAs2 method of the Document class only takes a file path as a parameter. If anyone knows how to get around this, please let me know.
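With Word interop the path-only SaveAs2 call can't be avoided, but the file round trip can at least be confined to one place so the rest of the pipeline stays byte-based. A minimal sketch, assuming a converter delegate that reads one path and writes another; ConvertViaTempFiles is a hypothetical helper, not part of any library:

```csharp
using System;
using System.IO;

public static class TempFileConversion
{
    // Runs a path-only converter while callers only deal with byte arrays.
    // 'convert' receives (inputPath, outputPath) and must write outputPath.
    // GUID-based names avoid collisions between concurrent requests, and both
    // temp files are deleted in 'finally', even if 'convert' throws.
    public static byte[] ConvertViaTempFiles(
        byte[] input, string inputExtension, string outputExtension,
        Action<string, string> convert, string tempDirectory = null)
    {
        if (string.Equals(inputExtension, outputExtension, StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Input and output extensions must differ.");

        string dir = tempDirectory ?? Path.GetTempPath();
        string baseName = Guid.NewGuid().ToString("N");
        string inputPath = Path.Combine(dir, baseName + inputExtension);
        string outputPath = Path.Combine(dir, baseName + outputExtension);
        try
        {
            File.WriteAllBytes(inputPath, input);
            convert(inputPath, outputPath);
            return File.ReadAllBytes(outputPath);
        }
        finally
        {
            if (File.Exists(inputPath)) File.Delete(inputPath);
            if (File.Exists(outputPath)) File.Delete(outputPath);
        }
    }
}
```

In WordDocToHtml, the Documents.Open, SaveAs2, and Close calls would become the body of the convert delegate, and the returned bytes could be wrapped in a MemoryStream for DocumentCore.Load. This does not make the conversion faster; it only keeps the temp-file handling and cleanup in one place.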
Convert Word .doc to Base64 or PDF
Sep 20, 2019 05:41 PM | WilliamSnell
Re: Convert Word .doc to Base64 or PDF
Sep 23, 2019 08:57 AM | Sherry Chen
Hi WilliamSnell,
For converting .doc to .docx, you could refer to this link, which may be helpful:
https://stackoverflow.com/a/34111839/10201850
Re: Convert Word .doc to Base64 or PDF
Sep 23, 2019 04:34 PM | WilliamSnell
Re: Convert Word .doc to Base64 or PDF
Sep 24, 2019 11:58 PM | WilliamSnell