Load Apache combined format log files into AWS Redshift for analysis
$30-250 USD
Closed
Posted more than 6 years ago
Paid on delivery
Hello,
How are you? I hope this finds you well.
I am looking for a small project with the following workflow:
We have Apache servers that handle more than 300 million requests each month. Every hour, the Apache server archives the log files into a .gz file. So I need a Python script that:
- Unzips the .gz file
- Reads the content of all the files
- Parses each line into: IP, date, query, target, id (example log line: [login to view URL] - - [28/Aug/2017:14:38:47 +0000] "GET /0.4/query?target=%2Be9ChyFL&id=3ac218e5787584c09d96d230ed563ceb267b59f4&nonce=9d33f05fda3a5b0d09d6bf73f4078e9c44673c2d&lang=ru-RU&version=chrome-20170609&auth=c6def3a8b002ecf04e7dd629460161fc516da4f9 HTTP/1.1" 200 1148 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" 571)
- Inserts all the variables above into the Redshift database (for example: insert into {table} values ('{ip}','{date}','{query}','{target}','{id}'))
- This process needs to be automatic: as soon as the script sees new archive files on the server, it should immediately start working on them.
- When our dev team queries the database, the results must come back fast, not take 2 hours. It's a big-data project indeed.
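The parsing step above can be sketched with Python's standard library alone. This is a minimal illustration, not a full solution: the regex assumes Apache combined format, and the record fields (`ip`, `date`, `query`, `target`, `id`) mirror the posting's list rather than any confirmed schema.

```python
import gzip
import re
from urllib.parse import urlparse, parse_qs

# Apache combined log format; referer, user-agent and the trailing
# response-time field are ignored here.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*"'
)

def parse_line(line):
    """Extract ip, date, query path, and the target/id query parameters."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # skip malformed lines
    url = urlparse(m.group('path'))
    params = parse_qs(url.query)
    return {
        'ip': m.group('ip'),
        'date': m.group('date'),
        'query': url.path,
        'target': params.get('target', [''])[0],
        'id': params.get('id', [''])[0],
    }

def parse_archive(path):
    """Read a .gz log archive and yield one parsed record per line."""
    with gzip.open(path, 'rt', errors='replace') as f:
        for line in f:
            rec = parse_line(line)
            if rec is not None:
                yield rec
```

For the insert step, a parameterized query (e.g. `cursor.execute("insert into ... values (%s, %s, %s, %s, %s)", ...)`) would be safer than the string-formatted example above, which is vulnerable to quoting bugs and SQL injection.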
Thank you, looking forward to your reply.
Indeed, it's a big-data project. I will develop the script in Python, and it will run through cron.
Relevant Skills and Experience
Extensive experience with cloud platforms (Amazon AWS/Google Cloud)
AWS services including SNS, SES, Lambda, DynamoDB, Elastic Beanstalk, ELB
Deployments with LAMP/Ruby stacks. Puppet/Chef. Python.
Proposed Milestones
$199 USD - Default Milestone
Hello, I am a solutions architect and an expert in Python and AWS. I recently worked on a clickstream project. I like your project. Please contact me. Thanks.
Relevant Skills and Experience
Python, AWS, big data
Proposed Milestones
$270 USD - project fee
I'm assuming your web servers are on EC2 since you're using Redshift. This can be solved with a Python Lambda using boto3.
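That approach could be sketched as below, assuming the hourly .gz archives are uploaded to an S3 bucket whose put notifications trigger the Lambda. The bucket layout and the processing step are placeholders; only the S3 event shape and `get_object` call follow the standard AWS interfaces.

```python
import gzip
import io

def objects_from_event(event):
    """Pull (bucket, key) pairs out of an S3 put-notification event."""
    return [
        (r['s3']['bucket']['name'], r['s3']['object']['key'])
        for r in event.get('Records', [])
    ]

def handler(event, context):
    """Lambda entry point: fetch each new .gz archive and process its lines."""
    import boto3  # provided by the Lambda runtime
    s3 = boto3.client('s3')
    for bucket, key in objects_from_event(event):
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
        with gzip.open(io.BytesIO(body), 'rt') as f:
            for line in f:
                pass  # parse the line and stage/load it into Redshift
```

Lambda's 15-minute execution limit is worth checking against the size of an hourly archive; very large files may be better split or handed to a batch job.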
Relevant Skills and Experience
I have experience automating AWS management tasks and large tech infrastructure. I have Lambdas in AWS parsing hundreds of accounts simultaneously and performing maintenance tasks dynamically.
I am an AWS Redshift expert; I can do this job easily.
Relevant Skills and Experience
I have more than 8 years of experience in data warehousing, and for the past 2 years I have been working with the Redshift database.
Proposed Milestones
$100 USD - python script
$122 USD - Redshift tables and a COPY script to load the data
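The COPY script this milestone refers to is the key to the speed requirement: Redshift's bulk COPY from S3 is far faster than per-row INSERTs. A hypothetical helper that builds such a statement, with illustrative table and column names, might look like:

```python
def build_copy_statement(table, bucket, prefix, iam_role):
    """Build a Redshift COPY statement for gzipped CSV files staged in S3."""
    return (
        f'COPY {table} (ip, "date", query, target, id)\n'
        f"FROM 's3://{bucket}/{prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "CSV GZIP;"
    )
```

In that design, the parsing script writes its records to S3 as gzipped CSV and then issues a single COPY per batch, instead of 300 million individual INSERTs.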
What is your Redshift cluster size?