E-Book and HTML Formatting Rules of Thumb
Version 1.2 (21 October 2010)

Do Not, repeat, Do Not “convert” your Great American Novel directly from Microsoft Word .doc format into HTML. Word's file format is a constantly moving target, and even Word's built-in export to HTML (which many ebook formats use as a base) is not that great.

Here's an example from a sort of RPG that I am currently playing on StarDestroyer.net. Unlike others who are putting some serious thought into original star nations, I'm simply playing my nation as a straight-up parody of the new Battlestar Galactica and various other SF franchises.

Original:

INT. ANNAPOLIS PORT HANGAR BAY

The port flight pod had been hurriedly evacuated of all but non-essential personnel – so important was it to maintain the fiction that there were no Bragulans on the Annapolis.

As the Cheney came into the hangar pod, the polyarc lights on the top of the hangar pod revealed that the once pristine olive drab paint covering the dropship had been scorched away over most of its surface area.

As it passed into the pressurized volume of the hangar, the radiation alarms began to chatter excitedly.

Chief of the Deck Tylenol stared at the radiation alarms with disbelief before taking charge in his characteristic way.

“Okay everyone get the fuck away from that dropship. It’s goddamned hot! Get the anti-rad foam out and spray it in teams so nobody goes over their allowable dose limit for the day!”

Microsoft Word 2010 Save As Web Page HTML Generated Code (22.5 kb)

(400 lines of garbage deleted)

<p class=MsoNormal><b style='mso-bidi-font-weight:normal'><span style='font-size:14.0pt;mso-bidi-font-size:11.0pt;line-height:115%'>INT. ANNAPOLIS PORT HANGAR BAY<o:p></o:p></span></b></p>

<p class=MsoNormal>The port flight pod had been hurriedly evacuated of all but non-essential personnel – so important was it to maintain the fiction that there were no <span class=SpellE>Bragulans</span> on the Annapolis.<o:p></o:p></p>

<p class=MsoNormal>As the <i style='mso-bidi-font-style:normal'>Cheney</i> came into the hangar pod, the <span class=SpellE>polyarc</span> lights on the top of the hangar pod revealed that the once pristine olive drab paint covering the <span class=SpellE>dropship</span> had been scorched away over most of its surface area.<o:p></o:p></p>

<p class=MsoNormal>As it passed into the pressurized volume of the hangar, the radiation alarms began to chatter excitedly.<o:p></o:p></p>

<p class=MsoNormal>Chief of the Deck Tylenol stared at the radiation alarms with disbelief before taking charge in his characteristic way.<o:p></o:p></p>

<p class=MsoNormal>“Okay everyone get the fuck away from that <span class=SpellE>dropship</span>. <span class=GramE>It’s</span> goddamned hot! Get the anti-rad foam out and spray it in teams so nobody goes over their allowable dose limit for the day!”</p>

Open Office 3.1.1 Writer/Web HTML Generated Code from a Copy and Paste from Word 2010 (1.31 kb)

<P><FONT SIZE=4><B>INT. ANNAPOLIS PORT HANGAR BAY</B></FONT></P>

<P>The port flight pod had been hurriedly evacuated of all but non-essential personnel &ndash; so important was it to maintain the fiction that there were no Bragulans on the Annapolis.</P>

<P>As the <I>Cheney</I> came into the hangar pod, the polyarc lights on the top of the hangar pod revealed that the once pristine olive drab paint covering the dropship had been scorched away over most of its surface area.</P>

<P>As it passed into the pressurized volume of the hangar, the radiation alarms began to chatter excitedly.</P>

<P>Chief of the Deck Tylenol stared at the radiation alarms with disbelief before taking charge in his characteristic way.</P>

<P>&ldquo;Okay everyone get the fuck away from that dropship. It&rsquo;s goddamned hot! Get the anti-rad foam out and spray it in teams so nobody goes over their allowable dose limit for the day!&rdquo;</P>

As you can see, the code Open Office generates from a copy and paste out of Word 2010 is significantly 'cleaner' and conforms more closely to plain, basic HTML. For example, “&ldquo;” is the HTML entity for a Unicode left double quotation mark. The cleaner your book's HTML is, the smoother the conversion to whatever ebook format (Kindle Mobi, EPUB, etc.) you choose will be, with fewer glitches caused by Word's overuse of markup for super-precise formatting.
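
If you are stuck with Word's HTML output, a small script can remove some of the clutter shown above. The following is only a rough Python sketch: the function name strip_word_markup is made up for illustration, it handles only the constructs visible in the sample (the <o:p> tags, the SpellE/GramE spans, the mso- style attributes and the MsoNormal class), and regular expressions are not a robust way to process HTML in general.

```python
import re


def strip_word_markup(html: str) -> str:
    """Remove a few Word-specific constructs seen in the sample above."""
    # Drop the Office namespace tags such as <o:p></o:p>.
    html = re.sub(r"</?o:p>", "", html)
    # Unwrap spelling/grammar-check spans, keeping the text inside them.
    html = re.sub(r"<span class=(?:SpellE|GramE)>(.*?)</span>", r"\1",
                  html, flags=re.DOTALL)
    # Remove single-quoted style attributes that contain mso-* declarations.
    html = re.sub(r"\s+style='[^']*mso-[^']*'", "", html)
    # Remove Word's default paragraph class.
    html = re.sub(r"\s+class=MsoNormal", "", html)
    return html


sample = ("<p class=MsoNormal>As the <i style='mso-bidi-font-style:normal'>"
          "Cheney</i> came past the <span class=SpellE>dropship</span>."
          "<o:p></o:p></p>")
print(strip_word_markup(sample))
```

Note that the style-attribute rule throws away the whole attribute, including any non-Word declarations such as font-size, so check the result by eye before converting it.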

I made some examples of how HTML formatting shows up on the Kindle 3 by using the undocumented SHIFT+ALT+G shortcut to take a screenshot of the Kindle's current screen. The Kindle stores each screenshot as a GIF file in its /documents folder.

Kindle 3 Example 1: Using the <BLOCKQUOTE>  </BLOCKQUOTE> series of tags to indent a paragraph and the alternate <P STYLE="margin-left: 0.79in">  </P> method that Open Office Writer/Web 3.1.1 generates when you use the INDENT PARAGRAPH feature.
Kindle 3 Example 2: Using the <H1>  </H1> series of tags (Header1 Style in Open Office).
Kindle 3 Example 3: This tests the <HR> style of Horizontal Line; the <STRIKE>  </STRIKE> tags for strike-through text; the <SUP>  </SUP> tags for superscript text; and the <SUB>  </SUB> tags for subscript text.
Kindle 3 Example 4: This tests the <CODE>  </CODE> tags, which produce remarkably good-looking typewriter-style text. Unfortunately, the <TT>  </TT> tags did not turn out as well, so you can disregard <TT> in your book formatting.

Replacing Straight (") Quotes with Curly (“ ”) Quotes

Microsoft Office: Make sure Word's “AutoFormat As You Type” option is set to replace straight quotes with smart quotes. Then hit CTRL-H and replace " with ". It sounds counter-intuitive, but each time Word makes the replacement, it substitutes a correctly oriented smart quote.
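
Outside of Word, the same idea can be approximated with a short script. This is a minimal Python sketch, not what Word does internally: the function name curl_double_quotes is hypothetical, and the rule it uses (a quote at the start of the text, or after whitespace or an opening bracket, is an opening quote; every other one is a closing quote) is a simple assumption that will get some edge cases wrong. Run it on plain text only, because it would also change the straight quotes around HTML attribute values.

```python
import re

LEFT_QUOTE = "\u201c"   # left double quotation mark
RIGHT_QUOTE = "\u201d"  # right double quotation mark


def curl_double_quotes(text: str) -> str:
    """Replace straight double quotes with curly ones using a simple rule."""
    # A quote at the start of the text, or right after whitespace or an
    # opening bracket, is treated as an opening quote.
    text = re.sub(r'(?:^|(?<=[\s(\[{]))"', LEFT_QUOTE, text)
    # Every straight quote that is left is treated as a closing quote.
    return text.replace('"', RIGHT_QUOTE)


print(curl_double_quotes('"Okay everyone," he said.'))
```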

Making Manual Page Breaks via hand edited Mobipocket Code

<mbp:pagebreak/>    This tag generates a hard page break in Kindle/Mobipocket format books. Generally, you want the page break to come BEFORE the tag that generates the text for a chapter header, e.g. <mbp:pagebreak/><H1>Chapter One</H1>.

Making Manual Page Breaks via Calibre

If you are using Calibre to convert from HTML into Kindle (Mobi) format, don't use <mbp:pagebreak/> in your HTML. Instead, use this: <DIV style="page-break-after:always"></DIV>.
If you want automatic page breaks whenever a certain style is used, such as a chapter header, create a stylesheet at the beginning of your HTML document:
<HEAD>
<STYLE TYPE="text/css">
<!--
H1 { page-break-before: always }
-->
</STYLE>
</HEAD>
With this, every H1 heading will have a page break before it. To apply the break to a different heading level, change H1 to H2, H3, etc.
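
If you already have a file full of hand-edited <mbp:pagebreak/> tags and want to run it through Calibre, the swap can be scripted. A minimal Python sketch follows; the function name mobi_breaks_to_css is invented for illustration, and the pattern assumes the tag is written as shown above (optionally with a space before the slash).

```python
import re

# The replacement recommended above for Calibre conversions.
CALIBRE_BREAK = '<DIV style="page-break-after:always"></DIV>'


def mobi_breaks_to_css(html: str) -> str:
    """Replace every <mbp:pagebreak/> tag with a CSS page-break DIV."""
    return re.sub(r"<mbp:pagebreak\s*/?>", CALIBRE_BREAK, html,
                  flags=re.IGNORECASE)


print(mobi_breaks_to_css("<mbp:pagebreak/><H1>Chapter One</H1>"))
```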

HTML Header Tag Examples:

This is Header 1 (Size 24)

This is Header 2 (Size 18)

This is Header 3 (Size 14)

This is Header 4 (Size 12)

This is Header 5 (Size 10)

This is Header 6 (Size 7)

Fixing Corrupted Text in HTML Documents or Books:

You may come across e-books that have a large amount of “garbage” in them, such as:

enemyâ€™s
instead of
enemy’s

and

â€œsafetyâ€
instead of
“safety”

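This kind of garbage usually means a UTF-8 file is being read as if it used a different encoding. A possible “easy” fix is to insert the following line inside the <HEAD> section of the HTML file using Notepad, which tells the reading software to treat the file as UTF-8: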

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
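
To see where this kind of garbage comes from, and why declaring the charset can help, here is a small Python sketch. It assumes the file is really UTF-8 and is being misread as Windows-1252, which is a common cause but not the only one. The function names garble and ungarble are made up for illustration. The reverse step is not guaranteed to work on every string: it raises an error when the garbled text no longer maps cleanly back to bytes, for example because Windows-1252 leaves some byte values undefined.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then misread the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def ungarble(text: str) -> str:
    """Reverse the misreading: recover the bytes, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


original = "enemy\u2019s"   # "enemy's" written with a curly apostrophe
broken = garble(original)   # the apostrophe turns into three junk characters
print(len(original), len(broken))
print(ungarble(broken) == original)
```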

Plaintext forms for advanced HTML:

“ = &ldquo;
” = &rdquo;
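
If you would rather not type these by hand, Python's standard html.entities module has a table of the named entities. The sketch below is one possible approach, not the only one: the function name to_named_entities is hypothetical, and it deliberately leaves plain ASCII characters (including < and &) untouched so that existing HTML tags survive.

```python
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters with named HTML entities where one exists."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name:
            out.append(f"&{name};")
        else:
            # ASCII characters, and characters without a named entity,
            # are kept as they are.
            out.append(ch)
    return "".join(out)


print(to_named_entities("<P>\u201cOkay everyone\u201d</P>"))
```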