scrAPI on Snow Leopard

Posted by Marcus Wyatt on 3 November 2009

Today I needed to scrape some data from a website and tried to use the trusty old scrAPI to do the job. Grrrr, it's not working. It throws an error:

Scraper::Reader::HTMLParseError: Scraper::Reader::HTMLParseError: Unable to load /Library/Ruby/Gems/1.8/gems/scrapi-1.2.0/lib/scraper/../tidy/libtidy.dylib

After some time on Google I didn’t find any fixes for the issue, so I decided to build from source…

I grabbed assaf’s GitHub repository:

  • git clone git://github.com/assaf/scrapi.git

Then I tried the tests by running:

  • rake test

117 tests, 346 assertions, 0 failures, 44 errors

Nope, errors all over the show… Looking at the original exception message, I checked whether libtidy.dylib exists in the lib/tidy directory. Nope, not there…
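If you want to script that check, here is a minimal sketch. The function name has_libtidy and the SCRAPI_SRC variable are my own placeholders, not part of scrAPI; point SCRAPI_SRC at wherever you cloned the repository.

```shell
# has_libtidy is a hypothetical helper: it reports whether a scrAPI checkout
# contains lib/tidy/libtidy.dylib, the file named in the exception message.
has_libtidy() {
  src_dir="$1"
  if [ -f "$src_dir/lib/tidy/libtidy.dylib" ]; then
    echo "found"
  else
    echo "missing"
  fi
}

# SCRAPI_SRC is an assumed variable; it defaults to the current directory.
has_libtidy "${SCRAPI_SRC:-.}"
```

On a fresh clone this should print "missing", which matches the error above.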

So where do I get this library file….

MacPorts to the rescue… Install tidy from MacPorts using the following command:

  • sudo port install tidy

Now we need to find where MacPorts installed the files using the following port command:

  • port contents tidy

The result:

Port tidy contains:

Now all we need to do is copy the library file to our scrAPI source directory:

  • cp /opt/local/lib/libtidy.dylib [your source location]/lib/tidy/libtidy.dylib
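The lookup and copy can also be combined into one step. This is only a sketch: find_libtidy and SCRAPI_SRC are hypothetical names, and it assumes `port contents tidy` lists one installed path per line, with libtidy.dylib among them.

```shell
# find_libtidy (hypothetical helper) reads `port contents tidy`-style output
# on stdin and prints the first path that ends in /libtidy.dylib.
find_libtidy() {
  sed -n 's|^[[:space:]]*\(/.*/libtidy\.dylib\)[[:space:]]*$|\1|p' | head -n 1
}

# Only attempt the copy when MacPorts is available and SCRAPI_SRC
# (an assumed variable holding your scrAPI source location) is set.
if command -v port >/dev/null 2>&1 && [ -n "${SCRAPI_SRC:-}" ]; then
  lib_path=$(port contents tidy | find_libtidy)
  if [ -n "$lib_path" ]; then
    cp "$lib_path" "$SCRAPI_SRC/lib/tidy/libtidy.dylib"
  else
    echo "libtidy.dylib not found in port contents" >&2
  fi
fi
```

This saves hard-coding /opt/local, which is handy if your MacPorts prefix is different.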

OK, before we speed ahead, let's just run those tests again to check that all is fine:

117 tests, 474 assertions, 0 failures, 0 errors

Awesome, we are almost there. Next we need to build the gem using rake:

  • rake package

Make sure you get a ‘Successfully built RubyGem’ message. Now we are ready to install the newly built gem and test scrAPI again.

  • sudo gem install pkg/scrapi-1.2.1.gem

And there you go, scrAPI working again.


