Today I needed to scrape some data from a website and tried to use the trusty old scrAPI to do the job. Grrrr, it's not working. It throws an error:
Scraper::Reader::HTMLParseError: Scraper::Reader::HTMLParseError: Unable to load /Library/Ruby/Gems/1.8/gems/scrapi-1.2.0/lib/scraper/../tidy/libtidy.dylib
After some time on Google I didn’t find any fixes for the issue, so I decided to build from source…
I grabbed assaf’s GitHub repository:
- git clone git://github.com/assaf/scrapi.git
Then I ran the test suite:
117 tests, 346 assertions, 0 failures, 44 errors
Nope, errors all over the show… Looking at the original exception message, I checked whether libtidy.dylib exists in the lib/tidy directory. Nope, not there…
So where do I get this library file? I installed tidy with MacPorts:
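If you would rather script that check than eyeball the directory, here is a minimal Ruby sketch. The helper names `tidy_library_path` and `tidy_library_present?` are my own invention, not part of scrAPI, and the lib/tidy/libtidy.dylib layout is simply what the exception message points at.

```ruby
# Hypothetical helpers, not part of scrAPI.
# Builds the path where, going by the exception message, scrAPI expects
# the Tidy library inside a gem or source directory.
def tidy_library_path(source_root)
  File.join(source_root, "lib", "tidy", "libtidy.dylib")
end

# True when the library file is actually there.
def tidy_library_present?(source_root)
  File.exist?(tidy_library_path(source_root))
end
```

Calling `tidy_library_present?` with the directory you cloned into should return false at this point in the story.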
- sudo port install tidy
Now we need to find where MacPorts installed the files using the following port command:
- port contents tidy
Port tidy contains:
Now all we need to do is copy the library file to our scrAPI source directory:
- cp /opt/local/lib/libtidy.dylib [your source location]/lib/tidy/libtidy.dylib
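The find-and-copy steps can also be sketched in Ruby. This is only an illustration under a few assumptions: `find_libtidy` and `install_libtidy` are hypothetical names of mine, and I assume the `port contents tidy` listing prints one path per line (possibly indented), which you pass in as a string.

```ruby
require "fileutils"

# Hypothetical helper: picks the libtidy.dylib path out of a
# `port contents tidy` listing. Assumes one path per line; returns nil
# when no line ends in libtidy.dylib.
def find_libtidy(listing)
  listing.each_line.map { |line| line.strip }.find do |path|
    File.basename(path) == "libtidy.dylib"
  end
end

# Hypothetical helper: copies the library into lib/tidy under the scrAPI
# source directory, creating lib/tidy if it is missing, and returns the
# path of the copy.
def install_libtidy(dylib_path, source_root)
  target_dir = File.join(source_root, "lib", "tidy")
  FileUtils.mkdir_p(target_dir)
  target = File.join(target_dir, "libtidy.dylib")
  FileUtils.cp(dylib_path, target)
  target
end
```

Feed `find_libtidy` the captured output of the port command and hand the result to `install_libtidy` together with your source location. If `find_libtidy` returns nil, the listing did not contain the library and there is nothing to copy.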
OK, before we speed ahead, let's run those tests again to check that all is fine:
117 tests, 474 assertions, 0 failures, 0 errors
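If you want a script to decide whether a run like this is clean, a rough Ruby sketch could parse the summary line. `parse_test_summary` and `clean_run?` are hypothetical names, and the pattern assumes the summary looks exactly like the lines quoted in this post.

```ruby
# Matches a summary line such as "117 tests, 474 assertions, 0 failures, 0 errors".
SUMMARY_PATTERN = /(\d+) tests, (\d+) assertions, (\d+) failures, (\d+) errors/

# Hypothetical helper: turns a summary line into a hash of counts,
# or returns nil if the line does not look like a summary.
def parse_test_summary(line)
  match = SUMMARY_PATTERN.match(line)
  return nil unless match
  tests, assertions, failures, errors = match.captures.map { |n| n.to_i }
  { :tests => tests, :assertions => assertions, :failures => failures, :errors => errors }
end

# Hypothetical helper: true only when the summary parses and reports
# zero failures and zero errors.
def clean_run?(line)
  summary = parse_test_summary(line)
  !summary.nil? && summary[:failures].zero? && summary[:errors].zero?
end
```

With the two summaries from this post, the first run (44 errors) is not clean and the second one is.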
Awesome, we are almost there. Next we need to build the gem using rake:
- rake package
Make sure you get a ‘Successfully built RubyGem’ message. Now we are ready to install the newly built gem and test scrAPI again.
- sudo gem install pkg/scrapi-1.2.1.gem
And there you go, scrAPI is working again.