scrAPI on Snow Leopard
Posted by Marcus Wyatt on 3 November 2009
Today I needed to scrape some data from a website and tried to use the trusted old scrAPI to do the job. Grrrr, it’s not working. It throws an error:
Scraper::Reader::HTMLParseError: Scraper::Reader::HTMLParseError: Unable to load /Library/Ruby/Gems/1.8/gems/scrapi-1.2.0/lib/scraper/../tidy/libtidy.dylib
After some time on Google I didn’t find any fixes for the issue, so I decided to build from source…
I grabbed assaf’s GitHub repository:
- git clone git://github.com/assaf/scrapi.git
Then I tried the tests by running:
- rake test
117 tests, 346 assertions, 0 failures, 44 errors
Nope, errors all over the show… Looking at the original exception message, I checked whether libtidy.dylib exists in the lib/tidy directory. Nope, not there…
So where do I get this library file?
MacPorts to the rescue… Install tidy from MacPorts using the following command:
- sudo port install tidy
Now we need to find where MacPorts installed the files using the following port command:
- port contents tidy
The result:
Port tidy contains:
/opt/local/bin/tab2space
/opt/local/bin/tidy
/opt/local/include/buffio.h
/opt/local/include/fileio.h
/opt/local/include/platform.h
/opt/local/include/tidy.h
/opt/local/include/tidyenum.h
/opt/local/lib/libtidy-0.99.0.dylib
/opt/local/lib/libtidy.0.dylib
/opt/local/lib/libtidy.a
/opt/local/lib/libtidy.dylib
/opt/local/lib/libtidy.la
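If you only care about the shared libraries in that listing, a small filter can pick them out. This is a minimal sketch: the function name `list_dylibs` is made up for illustration, and it assumes the listing prints one absolute path per line, possibly indented, as shown above.

```shell
# Read a `port contents <name>` listing on stdin and print only the
# absolute paths that end in .dylib, with surrounding whitespace removed.
# `list_dylibs` is a hypothetical helper name, not part of MacPorts.
list_dylibs() {
    sed -n 's|^[[:space:]]*\(/.*\.dylib\)[[:space:]]*$|\1|p'
}

# Example usage (requires MacPorts):
#   port contents tidy | list_dylibs
```

The header line and the .a, .la and header files are dropped because they do not match the pattern.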
Now all we need to do is copy the library file to our scrAPI source directory:
- cp /opt/local/lib/libtidy.dylib [your source location]/lib/tidy/libtidy.dylib
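The copy step can also be wrapped in a small guard so that a wrong path fails loudly instead of silently leaving lib/tidy empty. This is only a sketch: `copy_libtidy` is a hypothetical helper, and both paths are whatever applies on your machine.

```shell
# Copy a libtidy shared library into a scrAPI checkout.
# $1: path to the library, e.g. /opt/local/lib/libtidy.dylib
# $2: root of your scrapi clone (your own path, not a fixed location)
# `copy_libtidy` is a hypothetical helper name used for illustration.
copy_libtidy() {
    src="$1"
    checkout="$2"

    # Stop early if the library is not where we expect it.
    if [ ! -f "$src" ]; then
        echo "libtidy not found at $src" >&2
        return 1
    fi

    # scrAPI looks for lib/tidy/libtidy.dylib, so create the directory if needed.
    mkdir -p "$checkout/lib/tidy" || return 1
    cp "$src" "$checkout/lib/tidy/libtidy.dylib"
}

# Example usage:
#   copy_libtidy /opt/local/lib/libtidy.dylib "$HOME/src/scrapi"
```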
OK, before we speed ahead, let’s just run those tests again to check that all is fine:
117 tests, 474 assertions, 0 failures, 0 errors
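If you would rather not eyeball that summary line, a tiny check can do it for you. This is a rough sketch that assumes the summary has exactly the shape shown above; `tests_clean` is a hypothetical name.

```shell
# Succeed only if the test summary on stdin reports zero failures and
# zero errors, e.g. "117 tests, 474 assertions, 0 failures, 0 errors".
# `tests_clean` is a hypothetical helper name used for illustration.
tests_clean() {
    grep -Eq '[0-9]+ tests, [0-9]+ assertions, 0 failures, 0 errors'
}

# Example usage (saves the output to a log file, then checks the summary):
#   rake test > test.log 2>&1
#   tests_clean < test.log && echo "safe to build the gem"
```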
Awesome, we are almost there. Next we need to build the gem using rake:
- rake package
Make sure you get a ‘Successfully built RubyGem’ message. Now we are ready to install the newly built gem and test scrAPI again.
- sudo gem install pkg/scrapi-1.2.1.gem
And there you go, scrAPI is working again.




Mike said
This worked like a charm, thanks man!
Marcus said
Great, thanks for sharing!
Jeremy Weathers said
I had to comment out line 3 of the scrapi Rakefile: “Gem::manage_gems” was deprecated and removed.
mike said
I also had to add a “require ‘test/unit’” in test/selector_test.rb when I was testing scrapi (running Ubuntu 10.04).
Dana Corcino said
Thank you, I have been seeking for info about this subject for ages and yours is the best I have found so far.