
Build an Online Store

min $50000 USD

Closed
Posted about 7 years ago


Paid on delivery
Large Scale Crawler

Looking for a developer (or company) to build a robust web crawler system. There are approximately 20,000+ websites that we want to crawl and extract data from, and we want to be able to extract this data within 3-6 months.

1. Design the architecture of the crawler, or use an existing open-source crawler as a template. Because we’re dealing with a large volume of data, the architecture needs to be:
• Robust and scalable
• Efficient and fast
• Able to use proxies (to bypass anti-scraping systems)

2. Create an admin dashboard where the admin can:
a. Add, edit, view, delete, stop, and search crawlers
b. Input the URL to crawl
c. Specify the data that needs to be extracted (e.g., Title, Title URL, etc.)
d. View, edit, and delete extracted data
e. Download the data as JSON, XML, or CSV
f. Access the data through an API (via authorization tokens or other means) for upload and integration
g. Manage users with an ACL (Access Control List): create, edit, view, and delete users

3. Data normalization and clean-up. The incoming data is unformatted and unstructured. For example, some sites list a location or city as "Houston, TX", while others list it as "Houston, Texas" or "USA-TX-Houston". Therefore, the location or city data needs to be formatted; we use Google Location.

4. Because the data on these 20,000+ websites changes daily, notifications need to be put in place to inform the system of changes (i.e., what’s been added and what’s been removed) and to update the data automatically.

5. Once the data is verified and cleansed, it will be available for search via Solr, Elasticsearch, or another recommended option.
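Requirement 3 can be illustrated with a small rule-based normalizer. This is a minimal offline sketch, not the client’s method: the posting says location data is formatted with Google Location, and a real pipeline would likely pass unrecognized strings to that service. The `normalize_location` function, the `Location` record, and the three-entry `STATE_NAMES` table are hypothetical and only cover the US "City, State" and "USA-State-City" patterns quoted above.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical excerpt of a US state lookup table; a real system would use a
# complete table or a geocoding service instead.
STATE_NAMES = {
    "texas": "TX",
    "california": "CA",
    "new york": "NY",
}
STATE_CODES = set(STATE_NAMES.values())


class Location(NamedTuple):
    city: str
    state: str    # two-letter US state code
    country: str


def _state_code(token: str) -> Optional[str]:
    """Map a state name or code (any case) to its two-letter code, or None."""
    t = token.strip()
    if t.upper() in STATE_CODES:
        return t.upper()
    return STATE_NAMES.get(t.lower())


def normalize_location(raw: str) -> Optional[Location]:
    """Normalize strings like 'Houston, TX', 'Houston, Texas', 'USA-TX-Houston'."""
    text = raw.strip()
    # Pattern 1: "<US|USA>-<STATE>-<CITY>"; the state part cannot contain '-',
    # so hyphenated city names such as 'Winston-Salem' stay in the city group.
    m = re.fullmatch(r"USA?-([A-Za-z ]+)-(.+)", text, flags=re.IGNORECASE)
    if m:
        state, city = m.group(1), m.group(2)
    else:
        # Pattern 2: "<CITY>, <STATE>"
        parts = [p.strip() for p in text.split(",")]
        if len(parts) != 2:
            return None  # unrecognized format: hand off to a geocoder
        city, state = parts
    code = _state_code(state)
    if code is None or not city.strip():
        return None
    return Location(city=city.strip().title(), state=code, country="US")
```

All three spellings from the posting collapse to the same record, e.g. `normalize_location("USA-TX-Houston")` returns `Location(city='Houston', state='TX', country='US')`, while anything the rules do not recognize returns `None` and can be routed to the geocoding step.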
Some of the technical challenges that need to be addressed from the beginning:
• Make sure the crawler compresses the data it fetches; otherwise it will use a huge amount of storage.
• There is no need to re-crawl a website every 1-2 days, because that would be a waste of resources; however, we do want the data every 1-2 days.
• Ways to prevent the crawler from causing a DoS (Denial of Service) on the crawled sites
• Ways to prevent the system from crashing or overloading, since so many crawlers will be running
• The system should scale to crawl 100,000-200,000 websites
• Queuing: does the crawler start right away, or does it run in batches at a certain time? How does it scale as we add more sites to crawl? For example:
Day 1: Admin adds 100 sites to crawl
Day 2: Admin adds 200 sites to crawl
Day 3: Admin adds 500 sites to crawl
Day 4: etc.
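The DoS and queuing questions above are often handled together with per-host politeness scheduling: every newly added URL joins a queue for its host, and a host is handed to a worker at most once per fixed delay, so load on any single site stays bounded no matter how many sites the admin adds on a given day. The sketch below is a minimal, single-process illustration under that assumption; the `PoliteScheduler` class, its `min_delay` parameter, and the explicitly passed clock are hypothetical, and at 100,000+ sites this state would more plausibly live in a shared store or message queue.

```python
import heapq
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple
from urllib.parse import urlsplit


class PoliteScheduler:
    """Per-host politeness queue (hypothetical sketch): a host is never handed
    out more than once per `min_delay` seconds. The current time is passed in
    explicitly (e.g. time.monotonic()) so the logic is easy to test."""

    def __init__(self, min_delay: float) -> None:
        self.min_delay = min_delay
        self._pending: Dict[str, Deque[str]] = {}   # host -> URLs waiting to be fetched
        self._next_allowed: Dict[str, float] = {}   # host -> earliest next fetch time
        self._heap: List[Tuple[float, str]] = []    # (ready_at, host) for hosts with pending URLs

    def add(self, url: str, now: float) -> None:
        host = urlsplit(url).hostname or ""
        queue = self._pending.get(host)
        if queue is not None:
            queue.append(url)  # host already scheduled; just extend its backlog
            return
        self._pending[host] = deque([url])
        # Respect the politeness window even if the host was fetched recently.
        ready_at = max(now, self._next_allowed.get(host, now))
        heapq.heappush(self._heap, (ready_at, host))

    def next_url(self, now: float) -> Optional[str]:
        """Return a URL that may be fetched at time `now`, or None if every
        host with pending work is still inside its politeness window."""
        if not self._heap or self._heap[0][0] > now:
            return None
        _, host = heapq.heappop(self._heap)
        queue = self._pending[host]
        url = queue.popleft()
        self._next_allowed[host] = now + self.min_delay
        if queue:
            heapq.heappush(self._heap, (now + self.min_delay, host))
        else:
            del self._pending[host]
        return url
```

A worker loop would call `next_url(time.monotonic())`, fetch the returned URL (through a proxy if required), and sleep briefly whenever it gets `None`. Because sites are added to a continuous queue rather than fixed batches, growing from 100 to 500 new sites a day changes only how long the backlog takes to drain, not how hard any one site is hit.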
Project ID: 13528239

About the project

12 bids
Remote project
Active 7 years ago

12 freelancers are bidding on average $53,931 USD for this job
Hello sir, I hope you are doing well. I have read your requirements carefully and I am very confident that I can execute them successfully. I am an expert in PHP, the Laravel framework, Magento, WordPress & WooCommerce, Drupal, Joomla, and website design, with 6+ years of experience in website design and development. Please have a look at my profile; I have successfully completed many projects. I work round the clock and am available for discussions anytime. I am available now and ready to start the project immediately. I can provide work samples in private chat. Message me for further discussion. Many thanks for the opportunity to bid on this project. Thanks & Regards, Gamdur Singh
$50,000 USD in 10 days
4.9 (164 reviews)
7.2
Hello, I would like to show you relevant demos and designs from previously completed projects similar to yours. To make sure I understand the requirements and customizations, I would like to discuss this project with you further in private chat. Let me know the best time for you to schedule a meeting. Feel free to message me at any time; I am usually online 24x7 on Freelancer, so you will probably get a quick response from me. My areas of expertise: 1) PHP with the CodeIgniter and Laravel frameworks 2) Node.js 3) AngularJS 4) Mobile app development. Thanks
$51,546 USD in 40 days
5.0 (20 reviews)
6.7
Hi mate, I’d be glad to assist with web development. I have read the description carefully, understand the requirements, and have planned how to proceed. I am excited about this opportunity and have a strong feeling that I could be the best fit for this job. I have 5+ years of experience with web development, proven experience in MySQL, HTML5, CSS3, JavaScript, Ajax, and strong jQuery skills, an excellent command of MVC frameworks, and good experience working on large projects. We can discuss the work in more detail on chat. Thanks, Vishal
$50,000 USD in 60 days
4.8 (68 reviews)
5.9

About the client

From the Netherlands
0.0
0
Member since Mar 26, 2017

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)