Last post Mar 25, 2014 03:25 AM by smirnov
Mar 13, 2014 03:29 AM|dileepkumarkalva
I want to convert a Word document to an XML file, and the XML file should keep all the font characteristics (bold, size) of the Word document.
Any help is greatly appreciated!
Mar 13, 2014 03:57 AM|smirnov
Word 2007 and later already save documents in an XML-based format, hence the "x" in the .docx extension. Hope that meets your need. Simply rename the .docx file to .zip and unpack it with any unzip tool.
Mar 13, 2014 06:17 AM|dileepkumarkalva
Can you share any code with me? I am new to XML and I don't know how to convert a Word document to XML.
I am trying the code below, but it's not working. I don't want to use a third-party tool like Spire, etc.
String Whole_Data= File.ReadAllText(filename);
Mar 13, 2014 06:44 AM|smirnov
Take the .docx file and rename it to .zip.
Mar 13, 2014 08:30 AM|dileepkumarkalva
Thank you. It's converting into XML.
But how do we achieve this programmatically?
Mar 13, 2014 09:19 AM|smirnov
This does not convert anything. A .docx file is already an XML-based format: a zip archive containing XML files and binary parts such as images. If renaming and extracting is all you wanted to do, then just use File.Move() to rename and ExtractToDirectory() to extract
(.NET 4.5 is required)
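// Needs references to System.IO.Compression and System.IO.Compression.FileSystem.
string zipPath = @"c:\example\file.docx";
string extractPath = @"c:\example\xml";
// Rename the document so it carries a .zip extension, then unpack it.
File.Move(zipPath, zipPath + ".zip");
System.IO.Compression.ZipFile.ExtractToDirectory(zipPath + ".zip", extractPath);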
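For .NET 4 and older use other methods. Note that GZipStream (http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx) reads gzip streams, not zip archives, so it cannot unpack a .docx; System.IO.Packaging.Package from WindowsBase (.NET 3.0 and later) can open it instead.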
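The rename step is optional, since ZipFile.OpenRead opens an archive whatever its extension is. Here is a minimal sketch that reads the main document XML straight out of the .docx without touching the file on disk. It assumes .NET 4.5 and that the main part sits at word/document.xml, which is where Word normally writes it; the names DocxReader and ReadMainDocumentXml are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression; // .NET Framework: reference System.IO.Compression and System.IO.Compression.FileSystem

public static class DocxReader
{
    // Returns the raw XML text of the main document part of a .docx file.
    // Assumption: the part is stored as "word/document.xml" (Word's default location).
    public static string ReadMainDocumentXml(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            // StreamReader defaults to UTF-8 and honours a byte order mark if present.
            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling DocxReader.ReadMainDocumentXml(@"c:\example\file.docx") should hand back the same text you would find in word\document.xml after extracting by hand.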
Mar 25, 2014 02:38 AM|dileepkumarkalva
Replying late as I was involved in another project.
Your answer works fine, but I want to read the Word document's font size, type, fore color, bold/italic/underline, etc.
In the XML it shows only the font type, i.e. Times New Roman, and nothing else.
I am trying to create a Word parser that parses each and every line based on the font characteristics above.
Any help is greatly appreciated.
Mar 25, 2014 03:25 AM|smirnov
Create a simple document with the required characteristics, save it as .docx, unzip it and see where they are stored.
You should get markup similar to the example at http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
<w:rFonts w:ascii="Arial" w:h-ansi="Arial" w:cs="Arial" />
<wx:font wx:val="Arial" />
<w:kern w:val="32" />
<w:sz w:val="32" />
<w:sz-cs w:val="32" />
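where "Arial" is the typeface and "32" is the size in half-points (1/144 of an inch), which means the font size is 16 pt. The sample above is in the older Word 2003 XML style; in a .docx the same settings live in word/document.xml with slightly different names (w:hAnsi and w:szCs, and no wx:font). Fore color, bold, italic and underline are stored the same way, as sibling elements inside the run properties: w:color, w:b, w:i and w:u.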
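To pull those values out in code, something along these lines might work. It is only a sketch using LINQ to XML, and it assumes the input is the text of word/document.xml from a .docx, not the Word 2003 XML shown above. RunInfo, RunFormatting and Describe are hypothetical names. It reads only the formatting written directly on each run (w:rPr); anything inherited from a paragraph style or from styles.xml does not appear there, which may be why some properties seem to be missing when you look at the XML.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical container for the direct formatting of one run of text.
public sealed class RunInfo
{
    public string Text;
    public string Font;          // null when not set directly on the run
    public double? SizeInPoints; // null when not set directly on the run
    public bool Bold, Italic, Underline;
    public string Color;         // hex RGB such as "FF0000", "auto", or null
}

public static class RunFormatting
{
    // Main WordprocessingML namespace used inside .docx files.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads attribute w:<attribute> of the child <w:element> of rPr, or null if absent.
    static string Attr(XElement rPr, string element, string attribute)
    {
        XElement e = rPr == null ? null : rPr.Element(W + element);
        return e == null ? null : (string)e.Attribute(W + attribute);
    }

    // Toggles such as <w:b/> are on when present, unless w:val switches them off.
    static bool IsOn(XElement rPr, string element)
    {
        if (rPr == null || rPr.Element(W + element) == null) return false;
        string val = Attr(rPr, element, "val");
        return val == null || (val != "0" && val != "false" && val != "off");
    }

    // Yields one RunInfo per <w:r> element found in the document XML.
    public static IEnumerable<RunInfo> Describe(string documentXml)
    {
        XDocument doc = XDocument.Parse(documentXml);
        foreach (XElement run in doc.Descendants(W + "r"))
        {
            XElement rPr = run.Element(W + "rPr");
            string underline = Attr(rPr, "u", "val");

            // w:sz is stored in half-points, so divide by two to get points.
            int halfPoints;
            bool hasSize = int.TryParse(Attr(rPr, "sz", "val"), NumberStyles.Integer,
                                        CultureInfo.InvariantCulture, out halfPoints);

            yield return new RunInfo
            {
                Text = string.Concat(run.Elements(W + "t").Select(t => t.Value)),
                Font = Attr(rPr, "rFonts", "ascii"),
                SizeInPoints = hasSize ? halfPoints / 2.0 : (double?)null,
                Bold = IsOn(rPr, "b"),
                Italic = IsOn(rPr, "i"),
                // Treated as underlined only when w:val names a style other than "none".
                Underline = underline != null && underline != "none",
                Color = Attr(rPr, "color", "val")
            };
        }
    }
}
```

For a run whose w:sz is 32, SizeInPoints comes out as 16. A fuller parser would also have to resolve styles to report the formatting Word actually displays.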