Proof of Concept, Parallel Distributed Processing client/server

Completed. Posted Mar 30, 2009. Payment upon delivery.

This project is to build a simple implementation of a PDP client/server system to test the performance potential of the concept. A server process will communicate with 1 to 4 clients to process input files in parallel. Clients must be separate processes with their own PID and memory. The server will parse an input file into sub-units and pass a minimum of 5,000 sub-units per minute to each of up to 4 clients (sub-units are generally smaller than 8 KB when stored as text). Because of the required throughput, I believe it is preferable for the server to parse the input files into shared memory and then pass each client a pointer to the individual sub-units.

The server will run from the command line with an input file argument. It must be possible to build the project under 32-bit Windows XP, 32-bit and 64-bit Windows Vista, 32-bit and 64-bit Linux, and Mac OS/BSD using free compilers such as gcc and g++ (under Cygwin on Windows). It is not necessary to build and test on all of these platforms for this project. Other solutions are possible, but please specify the software requirements in your bid.

I would prefer an implementation based on an existing IPC technology (an open-source API, etc.) rather than custom code written "from scratch". I have included a list of available IPC APIs and related tools in the detailed documentation.

Please feel free to make alternative suggestions or to ask questions. If you don't want to do the programming yourself, you may bid to write a white paper that outlines the requirements, the technology, and how the system would work. More than one bid may be accepted for this project in order to test alternative methods.

## Deliverables

The server process will parse an input file into sub-units and send each sub-unit to a client for processing. The input file is an .sdf file containing a number of chemical structures. There is an alternate input format in which each structure is contained in an individual file (.mol file). It may be simpler to process the individual files by sending a list of file names, or a directory of *.mol files, instead of a multi-structure input file that must be parsed into individual structures. I have included test input files in both formats, and the coder may decide which to implement.

The client will need to do a simple calculation on each sub-unit and return the results to the server. I would like a word count, a character count, and a mathematical transformation of these counts. In C++:

```cpp
// base_b() is not defined in this listing (presumably a base-b conversion of transform).
// The iteration count (1000) sets how long a client spends on each sub-unit.
for (int i = 0; i < 1000; i++) {
    transform = wordCount + charCount;
    base_b(transform, 2);
    base_b(transform, 3);
    base_b(transform, 5);
    base_b(transform, 7);
    base_b(transform, 11);
}
```

Here wordCount is the int count of words in the sub-unit, charCount is the int count of characters, and transform is the int transformation. The loop makes it possible to configure how long an individual client spends processing each sub-unit.

Output will be a text file giving the name of the sub-unit (the file name for .mol, or the "name" attribute for .sdf), wordCount, charCount, and transform as tab-delimited text. It is not necessary for the output to stay in the same order as the input at this point. It is preferable for the server to write each result as soon as it is received from the client.

This system must be very stable and able to run for days at a time while processing millions of sub-units. Failure of client processes is common and must not affect the server process. A client may fail with a (trapped) exception, may hang, or may fail in unexpected situations. This should not happen with any of the included test files, but the coder should treat it as a design requirement.
If the coder does not have a quad-core platform available, the application can be tested with 2 clients. The server should accept an argument specifying the number of clients (1-4) so that I can test it here on a quad-core machine. It is also possible to provide ssh access to a Windows quad-core machine.

It is highly preferred that IPC be managed using existing technology, of which plenty is available. I would much rather use a mature open-source API that has been tested on many platforms than start from scratch with new, untested code. The availability of a user support group is also a significant advantage. Some of the available technologies I have located are:

- RCF [url removed]
- DPC++ http://www-gppd.inf.ufrgs.br/projects/mcluster/dpc++/[url removed]
- OpenMP
- MPI [url removed]
- Something with sockets or shared memory is also possible: [url removed]~singh/CIS725/Fall99/programs/[url removed] [url removed]

There are several options in interpreted languages, which are not my first choice, but I will consider them if the coder feels they can provide the required performance:

- Hadoop pipes [url removed]
- Ruby starfish [url removed]
- Ruby skynet [url removed]

If possible, I am looking for a coder who has created a similar application and is familiar with the suggested solution. Please feel free to ask questions or make suggestions outside the parameters outlined here. I have included sample files with a small number of chemical structures. Feel free to duplicate these to create larger input files, or request a larger file set.

Engineering, Linux, Mac OS, Microsoft, MySQL, PHP, Project Management, Software Architecture, Software Testing, UNIX, Windows Desktop

Project ID: #3773095

About the project

1 bid. Remote project. Active Apr 2, 2009

Awarded to:

renardpaul

See private message.

$85 USD in 14 days
(115 reviews)
6.7