Building a scalable web scraper for a large number of different websites

The goal of the project is to build a scalable web scraper that initially scrapes data from more than a dozen different websites. Later on, it should be possible to scale the scraper up to a few thousand websites.

These websites are known and will be added to the scraper iteratively. Each website has a different structure, so the development and maintenance costs per site need to stay as low as possible. The aim is to scrape the websites weekly at first. Later on, the scraping interval should be shortened to daily or even less. The scraped data needs to be stored in a useful and efficient way in a cloud database. Furthermore, the scraper must be tolerant of changes in the design of the websites, and it must avoid being blocked.
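To illustrate what low per-site cost and tolerance of design changes could look like, here is a minimal sketch. It is not the existing scraper and not a required design. It assumes that everything site-specific is kept in a small configuration object and that each field has an ordered list of fallback patterns. The names `SiteConfig`, `extract`, and `EXAMPLE_SHOP` are hypothetical, and regular expressions merely stand in for whatever selector mechanism (CSS, XPath) is eventually chosen.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SiteConfig:
    """All site-specific knowledge lives here, so adding a site means adding data."""
    name: str
    start_url: str
    # Field name -> ordered patterns; later entries are fallbacks that are
    # tried when a redesign breaks the earlier ones.
    fields: Dict[str, List[str]] = field(default_factory=dict)


def extract(config: SiteConfig, html: str) -> Dict[str, Optional[str]]:
    """Return one record; a field is None if none of its patterns matched."""
    record: Dict[str, Optional[str]] = {}
    for field_name, patterns in config.fields.items():
        record[field_name] = None
        for pattern in patterns:
            match = re.search(pattern, html, re.DOTALL)
            if match:
                record[field_name] = match.group(1).strip()
                break
    return record


# Hypothetical site definition, for illustration only.
EXAMPLE_SHOP = SiteConfig(
    name="example-shop",
    start_url="https://example.com/products",
    fields={
        "title": [r'<h1 class="product-title">(.*?)</h1>', r"<h1[^>]*>(.*?)</h1>"],
        "price": [r'<span class="price">(.*?)</span>', r'data-price="([^"]+)"'],
    },
)

# The page below no longer uses the original class names, so the fallbacks apply.
html = '<h1 id="name"> Blue Widget </h1><div data-price="19.99"></div>'
record = extract(EXAMPLE_SHOP, html)
```

In a setup like this, a field that comes back as `None` is a natural signal for monitoring: it suggests the site has changed and its configuration needs attention, without any change to the shared code.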

Currently, a simple Python scraper exists that can scrape a few websites using the Selenium library. However, it does not need to be continued at all costs.

The following tasks are part of your engagement for the project:

o Developing a modular and scalable software architecture for the web scraping project (preferably with Python)

o Containerizing the program in Docker

o Deploying and managing the containers in the cloud, probably with AWS and Kafka

o Implementing different measures to prevent blacklisting and being blocked

o Setting up a SQL database, probably PostgreSQL with AWS
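As one example of the measures against being blocked mentioned in the list above, a scraper can slow down when a site signals overload instead of retrying immediately. The following is a minimal sketch of exponential backoff with jitter. It assumes a `fetch` callable that returns a `(status, body)` pair; that callable, the function names, and the choice of retryable status codes are all hypothetical and would be adapted to the HTTP client actually used.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Assumed "slow down" signals; the right set may differ per site.
RETRYABLE_STATUS = {429, 503}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Random delay in [0, min(cap, base * 2**attempt)] seconds."""
    source = rng if rng is not None else random
    return source.uniform(0.0, min(cap, base * (2 ** attempt)))


def fetch_with_retries(fetch: Callable[[str], Tuple[int, str]], url: str,
                       max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call fetch(url); on a retryable status, wait and try again."""
    status, body = fetch(url)
    for attempt in range(max_attempts - 1):
        if status not in RETRYABLE_STATUS:
            break
        sleep(backoff_delay(attempt))  # wait longer after each failure
        status, body = fetch(url)
    return status, body


# Usage with a fake fetcher that is rate-limited twice before succeeding.
responses = iter([(429, ""), (429, ""), (200, "ok")])
waits = []
result = fetch_with_retries(lambda url: next(responses),
                            "https://example.com/page",
                            sleep=waits.append)
```

The random jitter keeps many workers from retrying at the same moment. Other measures, such as per-domain rate limits or proxy rotation, would sit alongside this rather than replace it.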

The following tasks might be part of a further engagement:

o Implementing the web scrapers for a large number of different websites

o Maintaining and monitoring the scrapers for the websites

o Adding a web crawler to find additional websites

o Parsing the stored data and processing it into a more useful format

Your qualifications:

o Web Scraping (Importance: 9/10)

o Python (Importance: 7/10)

o Docker (Importance: 8/10)

o AWS (Importance: 5/10)

o Kafka or other Pipelining/Queuing Tools (Importance: 8/10)

o Cloud Databases (Importance: 6/10)

o PostgreSQL (Importance: 10/10)

You are expected to work closely together with our developer in Germany. The tasks above need to be coordinated and done in cooperation with him. Therefore, a willingness to work between 10 AM and 10 PM Central European Time is required.

We wish to get to know you first by working together in a limited project scope. If you are a fit for our team, we are willing to intensify our cooperation with you and hire you for future projects.

Skills: Web Scraping, Python, Docker, Amazon Web Services, PostgreSQL


About the employer:
(0 reviews) Germany

Project ID: #28930972

10 freelancers are bidding an average of €10/hour for this job


we are using python in scraping Please, contact me and send me the link to the site so I could make a FREE SAMPLE Hi there, I’ve read …

€8 EUR / hour
(72 reviews)

Hello there. I am very interested in your project. *** As web scraping and python expert ***. I can handle this and am confident of winning. So I have rich experience in scraping app development with python , seleni…

€10 EUR / hour
(4 reviews)

Hello. An experienced web extractor doing projects mainly in PHP but Python might also be an option. Thanks for considering Eugene

€15 EUR / hour
(6 reviews)

Hello, This is Amine from Malaysia, a full stack web developer, who has working 5 years of working experiences in this field. I am fully feeling comfortable working with Python, web Scraping, AWS, PostgreSQL.. I will …

€10 EUR / hour
(3 reviews)

Hi, there. Here is an expert web scraping and automation developer who is very familiar with python/Selenium. After checking your job description and skill set, I found this job suits me as well. I can work in the tim…

€12 EUR / hour
(4 reviews)

This project really caught my eyes. I have the required qualification to do this work. I will be working with python using scrapy framework. There are really javascript heavy website nowadays which really makes it diff…

€8 EUR / hour
(14 reviews)

Hello I have read job description carefully and understood your requirements. I have worked on two projects based on python/selenium/web scraping similar with yours in past few weeks. On first project, I have implement…

€12 EUR / hour
(1 review)

Hi Sir Nice to meet you i am expert in python with web scraping at high level. I agree with your time zone confidential level of skills you wrote above. Please come in chat and show me details

€12 EUR / hour
(1 review)

⭐⭐⭐⭐⭐ Hi, there. ⭐⭐⭐⭐⭐ You’re looking for a scraper and I can do it JS and Python. In the previous time, I made a code to catch the data from Italian site and also US sports betting sites. I can show my previous work …

€10 EUR / hour
(0 reviews)

I have strong experience on below, please give chance to work on this project. qualifications: o Web Scraping (Importance: 9/10) o Python (Importance: 7/10) o Docker (Importance: 8/10) o AWS (Importance: 5/10) o Kafka …

€6 EUR / hour
(0 reviews)