Hello, we are looking for a Scala developer with experience handling data in Parquet format on Spark clusters on Google Cloud Platform. The task is to read Parquet data from HDFS, query it for the relevant UIDs, fetch specific fields for those UIDs, compute parameters by performing mathematical computations on those fields, and store the processed values in a separate Parquet file on HDFS. Further aggregation then needs to be performed on the computed values, and the final summary file needs to be stored in MongoDB.
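The workflow described above (filter by UID, compute per-row values, aggregate per UID) can be sketched in plain Scala with in-memory collections. This is a minimal sketch under stated assumptions. The record fields (`fieldA`, `fieldB`), the product formula and the count/sum/mean summary are hypothetical placeholders, because the posting does not specify them. HDFS I/O, Spark and the MongoDB export are left out so that the example runs with the standard library alone.

```scala
// Hypothetical input row: the posting does not name the fields,
// so `fieldA` and `fieldB` are placeholders.
final case class Record(uid: String, fieldA: Double, fieldB: Double)

// One processed value per selected row (the "separate file" step).
final case class Processed(uid: String, value: Double)

// Per-UID summary (the kind of document that would be exported to MongoDB).
final case class Summary(uid: String, count: Int, total: Double, mean: Double)

object PipelineSketch {
  // Query step: keep only rows whose UID is in the target set.
  def selectUids(rows: Seq[Record], uids: Set[String]): Seq[Record] =
    rows.filter(r => uids.contains(r.uid))

  // Computation step: placeholder formula, since the real one is not specified.
  def compute(r: Record): Processed =
    Processed(r.uid, r.fieldA * r.fieldB)

  // Aggregation step: count, total and mean of the processed values per UID.
  def summarize(processed: Seq[Processed]): Seq[Summary] =
    processed
      .groupBy(_.uid)
      .map { case (uid, ps) =>
        val total = ps.map(_.value).sum
        Summary(uid, ps.size, total, total / ps.size)
      }
      .toSeq
      .sortBy(_.uid)

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Record("u1", 2.0, 3.0),
      Record("u2", 1.0, 5.0),
      Record("u1", 4.0, 0.5),
      Record("u3", 10.0, 10.0) // not in the target set, so it is dropped
    )
    val processed = selectUids(rows, Set("u1", "u2")).map(compute)
    // Prints Summary(u1,2,8.0,4.0) then Summary(u2,1,5.0,5.0)
    summarize(processed).foreach(println)
  }
}
```

On a Dataproc cluster, the same steps would typically map to Spark DataFrame operations: `spark.read.parquet` if the input is Parquet, then `filter` with `isin` on the UID column, `withColumn` for the computation, `write.parquet` for the intermediate file, and `groupBy(...).agg(...)` for the summary. The summary would then be written out through the MongoDB Spark Connector, whose exact options depend on the connector version.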
The technologies you need to be comfortable with: Dataproc on Google Cloud (cloud-native Hadoop and Spark), Airflow (for scheduling), Google Cloud Platform in general, Scala (for scripts), and MongoDB (for data export).
8 freelancers have bid on this project, with an average bid of ₹12156.
Hi, I have more than 3 years of experience with Hadoop technologies such as MapReduce, Spark, and HDFS. I can complete your project; contact me for more details.
Hi, I am a Hadoop, Spark, and NoSQL engineer with 6 years of experience. I can do this: I am comfortable with Spark, Scala, Airflow, and Google Cloud, and I can manage Dataproc.