vardomskiy (vardomskiy) wrote,

Converting prc/Mobipocket files to HTML on Linux, Unix or MacOS

This post is part of a general "fuck you" to Mobipocket Corporation (now owned by Amazon), whose Mobireader for Symbian S60 is brain damaged and doesn't allow one to select an arbitrary directory on the flash or SD card. It's also a general "fuck you" to the various publishers that, even though they sell a "DRM-free" book to paying customers, use a highly proprietary format to try to force adoption of the reader.


If you have a .prc file (MobiPocket MobiBook format, commonly used by the likes of Baen Free Library/Webscriptions, Tor publishing, Amazon, etc.) and a Linux/Unix or MacOS system, and you would like to convert the prc file to HTML, here are the steps that you need to take:



On the command line:

0a) Verify that you have developer tools installed on your system. Basically you need gcc, make and perl. If deploying from scratch, also compile wget (ftp://ftp.gnu.org/pub/wget/) and ncftp/ncftpget (ftp://ftp.ncftp.org/ncftp/), since perl will happily fetch modules using those tools until it gets LWP going.
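
A quick way to do the check from step 0a is sketched below. It only looks for the three tools named above on your PATH; the function name missing_tools is made up for this example, and nothing here verifies that the tools actually work.

```shell
#!/bin/sh
# Print the name of every tool from the argument list that is NOT on PATH.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools gcc make perl)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names end up on one line
    echo "Missing tools:" $missing
else
    echo "gcc, make and perl are all available."
fi
```

If anything is reported missing, install your platform's developer tools before going on to step 0b.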

0b) As root, start up perl with the CPAN module, run through the initial configuration if you have never done that before, and, if necessary, install the CPAN bundle.

root@macbook:~[10:18 am]# uname -a
Darwin macbook.local 9.5.0 Darwin Kernel Version 9.5.0: Wed Sep  3 11:29:43 PDT 2008; root:xnu-1228.7.58~1/RELEASE_I386 i386
root@macbook:~[10:18 am]# perl -MCPAN -e shell

/System/Library/Perl/5.8.8/CPAN/Config.pm initialized.


CPAN is the world-wide archive of perl resources. It consists of about
100 sites that all replicate the same contents all around the globe.
Many countries have at least one CPAN site already. The resources
found on CPAN are easily accessible with the CPAN.pm module. If you
want to use CPAN.pm, you have to configure it properly.

If you do not want to enter a dialog now, you can answer 'no' to this
question and I'll try to autoconfigure. (Note: you can revisit this
dialog anytime later by typing 'o conf init' at the cpan prompt.)

Are you ready for manual configuration? [yes] yes

[...]

The defaults that the CPAN module suggests are generally pretty safe choices. You might want to use /tmp/.cpan as temp space, however, so that it gets purged the next time you reboot.

[...]
commit: wrote /System/Library/Perl/5.8.8/CPAN/Config.pm
Terminal does not support AddHistory.

cpan shell -- CPAN exploration and modules installation (v1.7602)
ReadLine support available (try 'install Bundle::CPAN')

cpan> install Bundle::CPAN

You will be prompted 3 or 4 times during the installation of Bundle::CPAN. Generally the suggested defaults are fine.

Once Bundle::CPAN is installed, type

cpan> reload cpan

to reload the CPAN module.



1) Now install EBook::Tools.

At the time of writing, the latest version of EBook::Tools is 0.3.3.

cpan> install EBook::Tools                                                                                                 
Running install for module 'EBook::Tools'
Running make for A/AZ/AZED/EBook-Tools-0.3.3.tar.gz
Fetching with LWP:
  ftp://ftp.nrc.ca/pub/CPAN/authors/id/A/AZ/AZED/EBook-Tools-0.3.3.tar.gz
CPAN: YAML loaded ok (v0.68)
CPAN: Digest::SHA loaded ok (v5.47)
Fetching with LWP:
  ftp://ftp.nrc.ca/pub/CPAN/authors/id/A/AZ/AZED/CHECKSUMS
Checksum for /tmp/.cpan/sources/authors/id/A/AZ/AZED/EBook-Tools-0.3.3.tar.gz ok
EBook-Tools-0.3.3/
EBook-Tools-0.3.3/t/

[...]

EBook::Tools has a bunch of prerequisites that it will ask you about installing. Say yes.

Some of the prerequisites have prerequisites of their own. Luckily the CPAN shell figures it out for you, and you can just tell it that yes, you do want to install those as well.
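
Once the CPAN shell is done, you can check from the regular command line that perl really can load the module. The sketch below assumes nothing beyond perl being on your PATH; have_perl_module is a made-up helper name, and the one-liner printed in the "else" branch is the standard non-interactive way of driving CPAN.pm.

```shell
#!/bin/sh
# Succeeds if perl can load the named module, fails otherwise
# (also fails if perl itself is missing).
# have_perl_module is a hypothetical helper for this example.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module EBook::Tools; then
    echo "EBook::Tools is installed."
else
    echo "EBook::Tools is not installed; try: perl -MCPAN -e 'install EBook::Tools'"
fi
```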

Version 0.3.3 of EBook::Tools has a problem in the lib/EBook/Tools/Unpack.pm module: if there is no EXTH data in a prc file, it will croak. Zed Pobre, the author of the package, fixed it in the upcoming version 0.4 of the module, but in the meanwhile you have to edit Unpack.pm (on a Mac located somewhere along the lines of /Library/Perl/5.8.8/EBook/Tools/Unpack.pm, on a Linux box somewhere like /usr/local/share/perl/5.8.8/EBook/Tools/Unpack.pm) and find the line that reads
    my @mobiexth = @{$$self{datahashes}{mobiexth}};

It should be line 412 or so. Replace the above line with
    my @mobiexth;
    if(defined $self->{datahashes}{mobiexth})
    {
         @mobiexth = @{$self->{datahashes}{mobiexth}};
    }




The actual patch:
Index: lib/EBook/Tools/Unpack.pm
===================================================================
--- lib/EBook/Tools/Unpack.pm   (revision 301)
+++ lib/EBook/Tools/Unpack.pm   (working copy)
@@ -522,7 +522,11 @@
     my $subname = ( caller(0) )[3];
     debug(2,"DEBUG[",$subname,"]");
 
-    my @mobiexth = @{$$self{datahashes}{mobiexth}};
+    my @mobiexth;
+    if(defined $self->{datahashes}{mobiexth})
+    {
+        @mobiexth = @{$self->{datahashes}{mobiexth}};
+    }
     my $data;
     my %exthtypes = %EBook::Tools::Mobipocket::exthtypes;
     my %exth_is_int = %EBook::Tools::Mobipocket::exth_is_int;




Now you are ready to convert a prc file to HTML.

Find the ebook.pl file (/Library/Perl/5.8.8/EBook/ebook.pl on a Mac, or /usr/local/share/perl/5.8.8/EBook/ebook.pl on Linux), create a symlink to it somewhere convenient, and run it:


macbook:~ $ cd ~
macbook:~ $ ln -s /Library/Perl/5.8.8/EBook/ebook.pl ebook
macbook:~ $ ./ebook 
Failed to open /Users/user/.ebooktools/config.ini: No such file or directory at ./ebook line 49
Use of uninitialized value in string eq at ./ebook line 55.
No command specified.
Valid commands are: adddoc additem blank config dc downconvert fix genepub genmobi setmeta splitmeta splitpre stripscript tidyxhtml tidyxml unpack
macbook:~ $ 
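
If you are not sure which of the two locations applies to your box, a small loop can look for you. This is only a sketch: find_ebook_script is a made-up name, the two directories are simply the ones mentioned in this post, and your perl version directory may well differ. It prints the ln command rather than running it.

```shell
#!/bin/sh
# Print the first EBook/ebook.pl found under the given library directories;
# return 1 if none of them has it. (Hypothetical helper for this example.)
find_ebook_script() {
    for dir in "$@"; do
        if [ -f "$dir/EBook/ebook.pl" ]; then
            printf '%s\n' "$dir/EBook/ebook.pl"
            return 0
        fi
    done
    return 1
}

script=$(find_ebook_script /Library/Perl/5.8.8 /usr/local/share/perl/5.8.8) || script=""
if [ -n "$script" ]; then
    echo "Found: $script"
    echo "To link it: ln -s \"$script\" \"\$HOME/ebook\""
else
    echo "ebook.pl not found in the usual places"
fi
```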




Now run ebook.pl with the arguments unpack filename.prc (in the transcript below it is reachable as /usr/local/bin/ebook):

macbook:/prc/Tor/prc$ ls -la Scalzi_John*
-rw-r--r-- 1 user group 856032 2009-01-26 11:11 Scalzi_John-Old_Mans_War.prc
macbook:/prc/Tor/prc$ ebook unpack Scalzi_John-Old_Mans_War.prc 
Use of uninitialized value in string eq at /usr/local/bin/ebook line 55.
macbook:/prc/Tor/prc$ ls -la Scalzi_John*
-rw-r--r-- 1 user group 856032 2009-01-26 11:11 Scalzi_John-Old_Mans_War.prc

Scalzi_John-Old_Mans_War:
total 896
drwxr-xr-x 2 user group   4096 2009-01-30 13:58 .
drwxr-xr-x 3 user group   4096 2009-01-30 13:58 ..
-rw-r--r-- 1 user group  34668 2009-01-30 13:58 Old_Mans_War-0001.gif
-rw-r--r-- 1 user group   5868 2009-01-30 13:58 Old_Mans_War-0002.gif
-rw-r--r-- 1 user group   1128 2009-01-30 13:58 Old_Mans_War-0003.gif
-rw-r--r-- 1 user group   9520 2009-01-30 13:58 Old_Mans_War-0004.gif
-rw-r--r-- 1 user group 838839 2009-01-30 13:58 Scalzi_John-Old_Mans_War.html
-rw-r--r-- 1 user group   1343 2009-01-30 13:58 Scalzi_John-Old_Mans_War.opf
macbook:/prc/Tor/prc$ 


Now you should have a directory with the base name of the prc file, containing an HTML version of the prc book.
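
If you have a whole directory of prc files, the same unpack command can be wrapped in a loop. A minimal sketch, assuming the command is reachable as "ebook" the way it is in the transcript above; unpack_all and the EBOOK_CMD override are made up for this example.

```shell
#!/bin/sh
# Run "<command> unpack <file>" for every .prc file in a directory.
# EBOOK_CMD is a hypothetical override; it defaults to "ebook".
unpack_all() {
    dir=$1
    for f in "$dir"/*.prc; do
        [ -e "$f" ] || continue    # no matches: the glob stays literal, skip it
        # Work inside the directory so the output lands next to the prc file
        ( cd "$dir" && ${EBOOK_CMD:-ebook} unpack "$(basename "$f")" ) \
            || echo "failed: $f" >&2
    done
}

# Example: unpack_all /prc/Tor/prc
```

A file that fails to unpack is reported on stderr and the loop carries on with the next one.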


Update 2010-03-01: The current EBook::Tools is 0.4.6, and for the most part it works really well on .mobi files out of the box, so give that a try.

There is a bug: if the .mobi file has multibyte Unicode characters split across pdb section boundaries, a block of the output gets corrupted. This is not a common situation, and hopefully there will be a fix in the near future.
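
One rough way to spot affected books is to check whether the unpacked HTML still decodes cleanly. The sketch below rests on two assumptions that are not confirmed anywhere in this post: that the output is meant to be UTF-8, and that the corrupted block shows up as invalid byte sequences. Both function names are made up for this example, and a book in some other encoding will be flagged even if it is fine.

```shell
#!/bin/sh
# Succeeds only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Print every .html file under a directory that fails the check.
list_suspect_html() {
    find "$1" -type f -name '*.html' | while IFS= read -r f; do
        is_valid_utf8 "$f" || printf '%s\n' "$f"
    done
}

# Example: list_suspect_html Scalzi_John-Old_Mans_War
```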

Other than that, EBook::Tools is just perfect.
Tags: fuckyou