Bulk Scanned PDFs doing selective area OCR Application Dev for vyadzmak

$250-750 USD

In progress
Posted over 11 years ago

Paid on delivery
Project Description: I have thousands of scanned form PDFs. (By "form" I don't mean editable or fillable PDFs; they are strictly rasterized, TIFF-based graphics.) The forms are of different types, and the scan quality of some of the PDFs is medium at best. I need someone to develop Win32/64 desktop software that OCRs specific areas of a form and saves the captured data to a database.

For picking out the areas of a form to OCR, I envision a Win32/64 desktop application (call it templateApp from now on). An administrator (generally only a select few users) who has rights to set up the capture and OCR specifics opens a PDF and marks multiple "areas of interest", the way we select an area to crop in MS Paint by dragging from the upper-left corner to the lower-right corner. This information is stored in the database to be used by the actual OCR application (call it ocrProcessingApp from now on).

Subsequently, ocrProcessingApp processes all incoming PDF files (of the same form format, in bulk hundreds or thousands), OCRs these multiple "areas of interest", and stores all text found in a MySQL database.

OCR is required for English text only in this project. If you can OCR additional languages, we can extend the project with a later phase that handles multiple languages. I don't have an OCR library of choice; please PM me the OCR library you intend to use to be seriously considered as a candidate. High OCR accuracy is very important when the incoming PDFs/images are clear. If you have done any customization such as pre-processing images/PDFs (e.g. scaling, de-noising) to make images clearer before feeding them into OCR engines like Tesseract or gocr, that would be a bonus. Experience with grid/table parsing based on Tesseract/Cuneiform would also be a bonus.

Some of the forms have a grid between the values; others don't. Please PM me how you plan to tackle this issue to make out the text we need to capture. Please keep in mind that I intend to create a general-purpose tool, i.e. not one geared towards a specific job only. You can assume all scanned PDFs are straight and not skewed.

You can use any programming language, but if it's not a .NET language, Java, C++ or Python, please check with me first. Please include in your quote, or PM me, the programming language you intend to use to develop the two applications (templateApp and ocrProcessingApp). The application must work on XP, Vista and Windows 7, on both 32-bit and 64-bit OS. Please make sure the application is bug free.

Please try not to use any third-party components for which I have to pay for licenses. If you must, please include the third-party component info and cost in your quote or PM me. Ultimately, cost, reliability, and licenses for royalty distribution are major factors. I need all source code, and rights to the source and binary code, in the end.

Thank you for your interest in bidding on this project. Follow-on projects are possible based on satisfactory work on this project. If you have any questions, please don't hesitate to ask. Thanks.

Skills required: .NET, C# Programming, Java, OCR, Visual Basic

Per our discussion previously via private messaging... Thanks.
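As an illustration only, the handoff between templateApp and ocrProcessingApp described above could be sketched roughly as follows. Everything named here is an assumption and not part of the posting: the `AreaOfInterest` class, the `from_drag`, `save_template` and `load_template` helpers, the `areas_of_interest` table and its columns, and the use of SQLite from the Python standard library as a stand-in for MySQL. The sketch stores each dragged rectangle as fractions of the page size, so the same template can be applied to scans of the same form at a different resolution. The OCR step itself is out of scope.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaOfInterest:
    """One marked rectangle, stored as fractions (0..1) of the page size."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def to_pixels(self, page_width, page_height):
        """Return (left, top, right, bottom) in pixels for a given page size."""
        return (
            round(self.left * page_width),
            round(self.top * page_height),
            round(self.right * page_width),
            round(self.bottom * page_height),
        )


def from_drag(name, start, end, page_width, page_height):
    """Build an area from a mouse drag; the drag may go in any direction."""
    left, right = sorted((start[0], end[0]))
    top, bottom = sorted((start[1], end[1]))
    if left == right or top == bottom:
        raise ValueError("area of interest must have non-zero width and height")
    return AreaOfInterest(
        name,
        left / page_width,
        top / page_height,
        right / page_width,
        bottom / page_height,
    )


def create_schema(conn):
    # Hypothetical table layout; a MySQL version would differ only in details.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS areas_of_interest (
               form_type TEXT NOT NULL,
               name      TEXT NOT NULL,
               left_f    REAL NOT NULL,
               top_f     REAL NOT NULL,
               right_f   REAL NOT NULL,
               bottom_f  REAL NOT NULL,
               PRIMARY KEY (form_type, name)
           )"""
    )


def save_template(conn, form_type, areas):
    """Replace all stored areas for one form type (the templateApp side)."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "DELETE FROM areas_of_interest WHERE form_type = ?", (form_type,)
        )
        conn.executemany(
            "INSERT INTO areas_of_interest VALUES (?, ?, ?, ?, ?, ?)",
            [(form_type, a.name, a.left, a.top, a.right, a.bottom) for a in areas],
        )


def load_template(conn, form_type):
    """Load the stored areas for one form type (the ocrProcessingApp side)."""
    rows = conn.execute(
        "SELECT name, left_f, top_f, right_f, bottom_f "
        "FROM areas_of_interest WHERE form_type = ? ORDER BY name",
        (form_type,),
    )
    return [AreaOfInterest(*row) for row in rows]
```

In such a design, ocrProcessingApp would call `load_template` once per form type, convert each area with `to_pixels` for the rendered page size, crop that region, and pass only the crop to whichever OCR engine is chosen.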
Project ID: 3993579

About the project

3 bids
Remote project
Active 11 years ago

About the client

From Canada
Scarborough, Canada
4.9
47
Member since Jan 7, 2009

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)