Saturday, September 24, 2011

Word to CHM source code

I have just finished updating an old, buggy SourceForge project and decided to share the code and binaries. Moreover, I had written a nice post on why I had to update the project and how to use it to convert Word documents to MOBI files for the Kindle, but thanks to Windows Live Writer (which crashed :-( ) I lost it all!

Summary of previous (lost) post

  • Kindle (up to the Kindle 3) doesn’t support Word documents
  • I use calibre for file conversions (and ebook management)
  • Word -> PDF -> MOBI is no good for technical reports with lots of images and diagrams (most of the pictures end up displayed at random places)
  • Word -> RTF -> MOBI is better but does not work on large files
  • Word -> CHM -> MOBI gives pretty good results
  • The old project converted the doc to HTML via the Office interops, cleaned up the HTML using an HTMLTidy interop and saved it as XHTML. The output was fed to the Microsoft Help Compiler and voila, the CHM file.
  • The old project was buggy on an x64 platform (probably the dev had an x86 machine, used x86 interops, and forgot to change the compiler option from “AnyCPU” to x86)
  • The old project had an old HTMLTidy reference, which I replaced with the newer TidyManaged.
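
To make the last two steps of that chain a bit more concrete, here is a rough Python sketch (not the project's code, which is VB.NET and uses the Office and Tidy interops). It only builds a minimal HTML Help project file and the two command lines that would follow: one for the HTML Help compiler (hhc.exe) and one for calibre's ebook-convert. The function names and the file names are made up for illustration, and it assumes both tools are on the PATH.

```python
def build_hhp(title: str, chm_name: str, topics: list[str]) -> str:
    """Return the text of a minimal HTML Help project (.hhp) file.

    `topics` are the (X)HTML files produced by the earlier Word -> HTML ->
    Tidy steps; the first one is used as the default topic.
    """
    if not topics:
        raise ValueError("at least one HTML topic is required")
    lines = [
        "[OPTIONS]",
        f"Title={title}",
        f"Compiled file={chm_name}",
        f"Default topic={topics[0]}",
        "",
        "[FILES]",
        *topics,
    ]
    # Use Windows line endings, since the file is consumed by a Windows tool.
    return "\r\n".join(lines) + "\r\n"


def conversion_commands(hhp: str, chm: str, mobi: str) -> list[list[str]]:
    """Return the argument lists for the CHM and MOBI steps, in order.

    Assumes hhc.exe and ebook-convert can be found on the PATH.
    """
    return [
        ["hhc.exe", hhp],             # .hhp project -> .chm
        ["ebook-convert", chm, mobi], # .chm -> .mobi via calibre
    ]
```

Writing the result of build_hhp("Report", "report.chm", ["report.html"]) to report.hhp and passing each list from conversion_commands("report.hhp", "report.chm", "report.mobi") to subprocess.run would run the two steps. One thing to watch for: hhc.exe is widely reported to return a nonzero exit code even when it succeeds, so checking that the .chm file exists is probably safer than trusting the return code.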

Source code:


1 comment:

Andreas Botsikas said...

To support character encodings other than UTF-8, locate the comment "'Abot comment: Change this to support other character encodings" in the HTMLConvertor->HtmlConvertorBase.vb file.