SCRAPING SOFTWARE

How do I get around the captcha on www.craigslist.com? Which scrapers are able to handle captchas? Which of them are free, and which are not? Please share. 🙂

solved 0
angelfun2002 2 years 4 Answers 685 views

Answers ( 4 )

  1. I found some Python packages that appear to be free to use. The links are given below:

    https://github.com/Nalipp/craiglist-jobs-scrapper
    https://pypi.org/project/craigslist-scraper/

    I don’t know whether they can bypass captchas, but here is how you would generally handle them:

    Sign up for a captcha solving service like 2captcha.
    Write your own web scraping script in Python, Node.js, or any other programming language.
    Use the captcha solving service’s API to solve a captcha whenever your script runs into one.
    The service solves the captcha for you, and you can then parse what you want from the page.
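
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a working Craigslist scraper: `looks_like_captcha`, `fetch_with_captcha_fallback`, and the `fetch` and `solve` callables are hypothetical names made up for this example, and the marker strings used to spot a captcha page are guesses. In a real script, `solve` would wrap the client of whichever paid service you sign up for, written according to that service's own documentation.

```python
from typing import Callable, Optional

# Assumed marker strings; a real site may signal a captcha differently.
CAPTCHA_MARKERS = ("g-recaptcha", "captcha")


def looks_like_captcha(html: str) -> bool:
    """Heuristic check: does the page body mention a captcha widget?"""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def fetch_with_captcha_fallback(
    url: str,
    fetch: Callable[[str, Optional[str]], str],
    solve: Callable[[str, str], str],
    max_attempts: int = 3,
) -> str:
    """Fetch `url`; if a captcha page comes back, get a token and retry.

    fetch(url, token) -> page HTML (token is None on the first request).
    solve(url, html)  -> token from the paid solving service (hypothetical wrapper).
    """
    html = fetch(url, None)
    for _ in range(max_attempts):
        if not looks_like_captcha(html):
            return html
        token = solve(url, html)  # this is the step the service charges for
        html = fetch(url, token)
    if looks_like_captcha(html):
        raise RuntimeError("captcha still present after retries")
    return html
```

Passing `fetch` and `solve` in as functions keeps the sketch independent of any one HTTP library or solving service, so either can be swapped without touching the retry logic.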

    NOTE: Captcha solving services are not free. You pay per thousand captchas solved.

    I hope that helps.

    • Shahriar Shovon : Thanks for your help. I’m not sure what you meant by “Sign up for a captcha solving service like 2captcha”.

      What is 2captcha? How do I sign up for it, and why would I want to? Thanks so much 🙂

      Angel 🙂

      • 2captcha provides APIs (Application Programming Interfaces) that you can use to solve captchas. They are used when scraping captcha-protected web pages.

        The 2captcha APIs are not free to use. You have to create an account there and pay for the service. That’s what I mean by “signing up for a captcha solving service”.

        You can learn more at the official website of 2captcha.

        Of course, there are other captcha solving services, such as DeathByCaptcha, Anti-Captcha, and CaptchaCoder.

        You’re welcome.

         

  2. Craigslist holds a lot of valuable information, but it is discouraging that it does not provide an API for people to get the data they need.
    Scrapers were written to work around the lack of an API. To the programmer’s dismay, however, a scraper often does not work as expected because of CAPTCHA challenges.
    Craigslist checks for IP addresses that send web page requests too frequently, and those are the ones most likely to be shown a CAPTCHA. One solution is to make requests through a tool like Scrapy with long intervals between requests, in the hope that the CAPTCHA does not come up.
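
The idea of spacing requests out can be sketched in plain Python. This is a minimal sketch using only the standard library rather than Scrapy (which has its own `DOWNLOAD_DELAY` setting for the same purpose); `fetch_page` and `crawl_slowly` are hypothetical names, and the delay values are assumptions, not thresholds known to satisfy Craigslist.

```python
import random
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def fetch_page(url: str) -> str:
    """Download one page with the standard library."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_slowly(
    urls: Iterable[str],
    min_delay: float = 20.0,  # assumed values; tune to what the site tolerates
    max_delay: float = 60.0,
    fetch: Callable[[str], str] = fetch_page,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) pairs, pausing a random interval between requests."""
    for i, url in enumerate(urls):
        if i > 0:
            # Randomized pauses look less like a fixed-rate bot.
            sleep(random.uniform(min_delay, max_delay))
        yield url, fetch(url)
```

Because it is a generator, each page is fetched only when the caller asks for the next one, so processing time adds to the gap between requests.
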
    The most dependable way to extract data from Craigslist is a paid scraper such as Octoparse or Import.io, since these can be configured to change the IP address at intervals so that the CAPTCHA is not triggered.
    A free but more technical solution would be to extract the CAPTCHA images and keep looping through them until you get one that OCR can read.
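
The OCR loop could look something like the sketch below. Everything here is an assumption made for illustration: `read_captcha_with_retries`, `get_image`, and `ocr` are hypothetical names, and the pattern for what counts as a "readable" result is a guess. The `ocr` callable could wrap an OCR engine such as Tesseract, and the approach only makes sense for captchas that are plain images of text.

```python
import re
from typing import Callable, Optional


def read_captcha_with_retries(
    get_image: Callable[[], bytes],
    ocr: Callable[[bytes], str],
    max_tries: int = 10,
    expected: str = r"[A-Za-z0-9]{4,8}",  # assumed shape of a valid answer
) -> Optional[str]:
    """Request fresh captcha images until OCR returns a plausible answer.

    get_image() -> raw bytes of a new captcha image (hypothetical).
    ocr(image)  -> text recognized in the image (hypothetical wrapper).
    Returns the recognized text, or None if no attempt looked readable.
    """
    for _ in range(max_tries):
        text = ocr(get_image()).strip()
        if re.fullmatch(expected, text):
            return text
    return None
```
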
    Another free, less technical solution is a Chrome extension called Instant Data Scraper. You can use it to scrape data from some websites, and Craigslist is one of them.
    However, a paid solution would be more productive, as it spares you the technical effort of writing a scraper from scratch, waiting out long intervals between requests, or writing code to find OCR-readable images.
    Some of these commercial data scrapers are at risk of being sued by Craigslist, which has taken scrapers to court in the past. You can read more on that here: Craigslist vs 3 Taps.
    That said, commercial scrapers are still worth a try.

    Best answer
