None
0 Points
12 Posts
word to xml conversion
Mar 13, 2014 03:29 AM|dileepkumarkalva|LINK
Hi All,
I want to convert a Word document to an XML file, and the XML file should keep all the font characteristics (bold, size) of the Word document.
Any help is greatly appreciated !!
Thanks,
Dileep kumar
All-Star
35159 Points
9075 Posts
Re: word to xml conversion
Mar 13, 2014 03:57 AM|smirnov|LINK
The latest Word already saves files in an XML-based format, hence the .docx extension. Hope that covers your need. Simply rename the .docx to .zip and unpack it with unzip.
Re: word to xml conversion
Mar 13, 2014 06:17 AM|dileepkumarkalva|LINK
Hi smirnov,
Can you share any code with me, as I am new to XML? I don't know how to convert a Word document to XML.
I am trying the code below, but it's not working. I don't want to use a third-party tool like Spire, etc.
String Whole_Data= File.ReadAllText(filename);
File.WriteAllText(filename1, Whole_Data);
Thanks,
Dileep kumar
Re: word to xml conversion
Mar 13, 2014 06:44 AM|smirnov|LINK
Take the .docx file and rename it to .zip.
Unzip it.
Re: word to xml conversion
Mar 13, 2014 08:30 AM|dileepkumarkalva|LINK
Thank you. It's converting into XML.
But how do we achieve this programmatically?
Thanks,
Dileep kumar
Re: word to xml conversion
Mar 13, 2014 09:19 AM|smirnov|LINK
This does not convert anything. A .docx file is already stored in an XML-based format: it is a zip archive containing XML files and binaries. If renaming and extracting is all you want to do, then just use File.Move() to rename and ExtractToDirectory() to extract (.NET 4.5 is required).
Example
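// Needs "using System.IO;" and a reference to System.IO.Compression.FileSystem
string zipPath = @"c:\example\file.docx";
string extractPath = @"c:\example\xml";
// Rename the .docx to .zip (File.Move fails if the target already exists)
File.Move(zipPath, zipPath + ".zip");
// Unpack every part of the package (XML and binaries) into extractPath
System.IO.Compression.ZipFile.ExtractToDirectory(zipPath + ".zip", extractPath);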
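For .NET 4 and older, ZipFile is not available, so use other methods, e.g. http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx. Note that GZipStream handles a single gzip stream, not a multi-file zip archive, so on its own it will not unpack a .docx; System.IO.Packaging.Package (in WindowsBase) can open the package instead.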
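The rename step is not strictly needed, since the zip classes do not look at the file extension. Below is a minimal sketch that reads the main document part straight out of the .docx. It assumes .NET 4.5 or later with references to System.IO.Compression and System.IO.Compression.FileSystem; the class and method names (DocxReader, ReadDocumentXml) are made up for illustration, and it assumes the body lives in the entry word/document.xml, which is the usual location but is formally determined by the package relationships.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class DocxReader
{
    // Hypothetical helper: returns the raw XML of the main document part.
    public static string ReadDocumentXml(string docxPath)
    {
        // ZipFile.OpenRead ignores the extension, so the .docx opens as-is.
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // Assumption: the main part sits at the conventional path.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in package.");

            using (var reader = new StreamReader(entry.Open()))
                return reader.ReadToEnd();
        }
    }
}
```

Calling DocxReader.ReadDocumentXml(@"c:\example\file.docx") would return the XML as a string, leaving the original file untouched on disk.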
Re: word to xml conversion
Mar 25, 2014 02:38 AM|dileepkumarkalva|LINK
Hi,
I am replying late as I was involved in another project.
Your answer works fine, but I want to read the Word document's font size, typeface, fore color, bold/italic/underline, etc.
In the XML I can see only the typeface (i.e. Times New Roman) and none of the other properties.
I am trying to create a Word parser that parses each and every line based on the font characteristics above.
Any help is greatly appreciated.
Re: word to xml conversion
Mar 25, 2014 03:25 AM|smirnov|LINK
Create a simple document with required characteristics, save as docx, unzip and see where they are stored.
You should get a file similar to the examples at http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
There, "Arial" is the typeface and "32" is the size in half-points (1/144 of an inch), which means the font size is 16 pt. Fore color, bold/italic/underline, etc. are stored the same way.
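To make that concrete, here is a rough sketch of pulling those properties out of the unzipped XML with LINQ to XML. The element names (w:r, w:rPr, w:rFonts, w:sz, w:b, w:color, w:t) are the WordprocessingML ones used inside a .docx; the class name RunFormatting, the Describe helper, and the Sample fragment are invented for illustration. It only looks at formatting set directly on a run, so it ignores values inherited from styles and treats any w:b element as bold even if it carries w:val="0".

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

static class RunFormatting
{
    // WordprocessingML main namespace used by .docx files.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Illustrative fragment: one bold, red, 16 pt Arial run.
    public const string Sample =
        @"<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
            <w:body><w:p><w:r>
              <w:rPr>
                <w:rFonts w:ascii=""Arial"" w:hAnsi=""Arial""/>
                <w:b/><w:color w:val=""FF0000""/><w:sz w:val=""32""/>
              </w:rPr>
              <w:t>Hello</w:t>
            </w:r></w:p></w:body>
          </w:document>";

    // Hypothetical helper: one summary line per run (w:r) in the document.
    public static string[] Describe(string documentXml)
    {
        return XDocument.Parse(documentXml).Descendants(W + "r").Select(run =>
        {
            var rPr = run.Elements(W + "rPr");   // empty when the run has no direct formatting
            string text = string.Concat(run.Elements(W + "t").Select(t => t.Value));
            string font = rPr.Elements(W + "rFonts").Attributes(W + "ascii")
                             .Select(a => a.Value).FirstOrDefault() ?? "?";
            string color = rPr.Elements(W + "color").Attributes(W + "val")
                              .Select(a => a.Value).FirstOrDefault() ?? "?";
            string sz = rPr.Elements(W + "sz").Attributes(W + "val")
                           .Select(a => a.Value).FirstOrDefault();
            // w:sz is in half-points, so divide by two to get points.
            string sizePt = sz == null ? "?"
                : (int.Parse(sz, CultureInfo.InvariantCulture) / 2.0)
                      .ToString(CultureInfo.InvariantCulture);
            bool bold = rPr.Elements(W + "b").Any();   // simplification: ignores w:val="0"
            return string.Format("{0}: font={1}, size={2}pt, bold={3}, color={4}",
                                 text, font, sizePt, bold, color);
        }).ToArray();
    }
}
```

For the sample fragment, RunFormatting.Describe(RunFormatting.Sample)[0] gives "Hello: font=Arial, size=16pt, bold=True, color=FF0000". Italic (w:i) and underline (w:u) could be checked the same way as bold.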