Today I needed to scrape some data from a website and tried to use the trusted old scrAPI to do the job. Grrrr, it's not working. It throws an error:
Scraper::Reader::HTMLParseError: Scraper::Reader::HTMLParseError: Unable to load /Library/Ruby/Gems/1.8/gems/scrapi-1.2.0/lib/scraper/../tidy/libtidy.dylib
After some time on Google I didn’t find any fixes for the issue, so I decided to build from source…
I grabbed assaf’s GitHub repository:
- git clone git://github.com/assaf/scrapi.git
Then I tried the tests by running:
- rake test
117 tests, 346 assertions, 0 failures, 44 errors
Nope, errors all over the show… Looking at the original exception message, I checked whether libtidy.dylib exists in the lib/tidy directory of the source. Nope, not there…
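That check can be sketched as a small shell function. This is only an illustration: the function name has_libtidy is made up for this post, and the path you pass in is wherever you cloned the repository.

```shell
# Report whether a scrAPI checkout contains the tidy library that the
# exception message complained about.
# has_libtidy is a hypothetical helper name, not part of scrAPI.
has_libtidy() {
    src_dir=$1                                # root of the scrAPI checkout
    lib="$src_dir/lib/tidy/libtidy.dylib"     # location named in the error
    if [ -f "$lib" ]; then
        echo "found: $lib"
    else
        echo "missing: $lib"
        return 1
    fi
}
```

Running `has_libtidy .` from inside the cloned directory should print the "missing" line in the situation described above.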
So where do I get this library file?
MacPorts to the rescue… Install tidy from MacPorts using the following command:
- sudo port install tidy
Now we need to find where MacPorts installed the files using the following port command:
- port contents tidy
The result:
Port tidy contains:
/opt/local/bin/tab2space
/opt/local/bin/tidy
/opt/local/include/buffio.h
/opt/local/include/fileio.h
/opt/local/include/platform.h
/opt/local/include/tidy.h
/opt/local/include/tidyenum.h
/opt/local/lib/libtidy-0.99.0.dylib
/opt/local/lib/libtidy.0.dylib
/opt/local/lib/libtidy.a
/opt/local/lib/libtidy.dylib
/opt/local/lib/libtidy.la
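If you would rather not pick the path out of that listing by eye, a filter along these lines should do it. It is a rough sketch: find_libtidy_path is a name I made up, and it assumes the listing has one path per line, possibly indented.

```shell
# Read `port contents tidy` output on stdin and print only the path of the
# unversioned libtidy.dylib.
# find_libtidy_path is a hypothetical helper name.
find_libtidy_path() {
    # Trim surrounding whitespace, then keep lines ending in /libtidy.dylib.
    # Versioned names such as libtidy.0.dylib do not match the pattern.
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | grep '/libtidy\.dylib$'
}
```

Used as `port contents tidy | find_libtidy_path`, it should print /opt/local/lib/libtidy.dylib for the listing above.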
Now all we need to do is copy the library file to our scrAPI source directory:
- cp /opt/local/lib/libtidy.dylib [your source location]/lib/tidy/libtidy.dylib
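For the cautious, here is the same copy wrapped in a function that also confirms the file arrived intact. Treat it as a sketch: copy_libtidy is a made-up name, and both arguments are placeholders for your own paths.

```shell
# Copy a tidy library into a scrAPI checkout and verify the copy.
# copy_libtidy is a hypothetical helper name.
copy_libtidy() {
    lib=$1                                    # e.g. /opt/local/lib/libtidy.dylib
    src_dir=$2                                # root of the scrAPI checkout
    dest="$src_dir/lib/tidy/libtidy.dylib"
    [ -f "$lib" ] || { echo "no such library: $lib" >&2; return 1; }
    mkdir -p "$src_dir/lib/tidy" || return 1
    # Plain cp (no -R) copies the file a symlink points to, so this still
    # produces a real file if the source path happens to be a symlink.
    cp "$lib" "$dest" || return 1
    # cmp -s exits 0 only when the two files are byte-for-byte identical.
    cmp -s "$lib" "$dest" && echo "copied to $dest"
}
```

For example, `copy_libtidy /opt/local/lib/libtidy.dylib .` run from the root of the checkout.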
OK, before we speed ahead, let's just run those tests again to check that all is fine:
- rake test
117 tests, 474 assertions, 0 failures, 0 errors
Awesome, we are almost there. Next we need to build the gem using rake:
- rake package
Make sure you get a ‘Successfully built RubyGem’ message. Now we are ready to install the newly built gem and test scrAPI again:
- sudo gem install pkg/scrapi-1.2.1.gem
And there you go, scrAPI is working again.