Software to scrape data (public information) from websites and web directories. The software must be installed on a web server and allow multiple users to scrape data from DIFFERENT websites. The source websites cover many countries.

Ideally, a custom script will be developed for each website so that extraction from that website is as accurate as possible. Occasionally, these website-specific scripts will need changing, so they must be easy to update or replace. The user first chooses which website to scrape, and the correct script is loaded. The user then defines a specific search and instructs the software to run the extraction script. To update data, the software must check existing data against the new search results and complete or update it with the latest website data.

The user must be able to set speed and proxies so as to comply with 'Net politeness' (not saturating the website, and also avoiding IP blacklisting).

Scraped data is exported either a) to files (.CSV), possibly encrypted, or b) directly into a local (same server) or remote database (located on another web server host machine).

Probable technology: asp.net, but I am open to any technology as long as it does a reliable, trouble-free job. Reliability and smooth operation are more important than speed. A user-friendly interface is essential because users have very limited knowledge of configuring scraping.

Website scripts can be delivered over several weeks or months, as long as we can get going with the initial scripts for the USA (such as yellow pages, superpages, and 3-5 others).

Other details and voice contact AFTER a Non-Disclosure Agreement (NDA), which might be required from shortlisted bidders.

ONE LAST THING: Perhaps you know of software that can meet my needs, or that you can customize for me? In this case, PLEASE do tell me, and I undertake to remunerate you with a good bonus based on the money this saves me. My goal is not to own software but to do scraping for myself and other clients. Thanks.
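For bidders, the workflow described above (choose a website, load its script, throttle requests, merge new results into existing data, export to CSV) could be sketched roughly as follows. This is only an illustration in Python, not a required design or technology. Every name in it (`SCRAPERS`, `register`, `Throttle`, `merge_records`, `run_job`, the `example-directory` site and its sample records) is hypothetical, and the scraper body is a placeholder that returns fixed data instead of fetching real pages. Proxy selection and encryption are left out.

```python
import csv
import io
import time
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

# Hypothetical registry: website name -> extraction script for that website.
SCRAPERS: Dict[str, Callable[[str], Iterable[Record]]] = {}


def register(site: str):
    """Decorator that adds a website-specific script to the registry."""
    def wrap(func):
        SCRAPERS[site] = func
        return func
    return wrap


@register("example-directory")
def scrape_example(query: str) -> List[Record]:
    # Placeholder: a real script would download and parse pages here.
    return [
        {"id": "1", "name": "Acme Plumbing", "phone": "555-0100"},
        {"id": "2", "name": "Best Bakery", "phone": "555-0199"},
    ]


class Throttle:
    """Enforces a user-configured minimum delay between requests."""

    def __init__(self, min_delay: float, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        if self._last is not None:
            remaining = self.min_delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def merge_records(existing: List[Record], fresh: Iterable[Record],
                  key: str = "id") -> List[Record]:
    """Update existing rows with fresh values and append rows not seen before."""
    merged = {row[key]: dict(row) for row in existing}
    for row in fresh:
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


def to_csv(rows: List[Record], fields: List[str]) -> str:
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def run_job(site: str, query: str, existing: List[Record]) -> List[Record]:
    """Load the script chosen by the user, run the search, merge the results."""
    scraper = SCRAPERS[site]
    return merge_records(existing, scraper(query))
```

In this sketch, replacing or updating a website script means changing only the one function registered under that website's name; the user-facing flow (`run_job`) stays the same. A real script would call `Throttle.wait()` before each page request.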
## Deliverables
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Worker in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the worker's Worker Legal Agreement).
## Platform
I believe the asp.net platform is often used for this type of software running in multi-user fashion on a web server. I am open to any technology or platform suggested by the software developer. If the solution offered runs only on a client computer, then it should definitely be Windows-based (XP or Windows 7), and perhaps we would run several separate instances of the software on our web server, one instance for each user doing a different scraping job. The server can be Unix or Windows; the browser can be IE or FF. The database can be MySQL or another free and widely used database.