Serverless Data Analytics On AWS (CloudTrail + S3 Data Lake + Athena + QuickSight)
‘Cloud Computing’ is on the rise, and it is no longer the ‘new talk of the town’. Serverless architecture, on the other hand, is one big hype! It removes the hassle of managing and maintaining the underlying physical infrastructure while still providing secure, durable and reliable storage for your data.
AWS, GCP and Azure each have their own fan following. According to a 2022 report by Statista, AWS leads the cloud market with a 33% share, followed by Azure with 21% and GCP with 8%.
When it comes to serverless, AWS has a lot to offer from a data analytics perspective, from storage services to query services and visualisation options.
Analysing CloudTrail logs using Athena and QuickSight
In this post, I walk through a hands-on practical showing how CloudTrail logs stored in S3 can be queried with SQL directly in Athena, without having to copy or move the data from its location. Finally, we use Amazon QuickSight to create a really simple, introductory viz. of the API activity recorded from the console.
1. Creating CloudTrail
Using my Admin credentials for the AWS Console, I created a CloudTrail trail for data, management and Insights events. It stored its logs in a new S3 bucket that was created by default. A trail created in the console applies to all regions by default, and it records the API calls made in your account, including every action performed through the console.
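Do try exploring the CloudTrail S3 bucket that was created. You can see the logs organised into folders by log type (for example, CloudTrail for management and data events and CloudTrail-Insight for Insights events), and within each folder the logs are further divided by region and then by date. CloudTrail is a fully managed service for log-keeping, and it can also collect logs across the different accounts in an AWS organization.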
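To make that folder structure concrete, here is a minimal sketch that splits a CloudTrail log object key into its parts. It assumes the layout AWS documents for a standard (non-organization) trail, `[prefix/]AWSLogs/<account-id>/<log-folder>/<region>/<YYYY>/<MM>/<DD>/<file>.json.gz`. Please verify this against the keys in your own bucket. The function name `parse_cloudtrail_key` is hypothetical. Organization trails add an extra organization-ID folder, and this sketch does not handle them.

```python
import re
from datetime import date

# Assumed key layout for a standard trail (not an organization trail):
# [optional-prefix/]AWSLogs/<account-id>/<log-folder>/<region>/<YYYY>/<MM>/<DD>/<file>
_KEY_PATTERN = re.compile(
    r"(?:.+/)?AWSLogs/(?P<account>\d{12})/(?P<folder>[^/]+)/"
    r"(?P<region>[a-z0-9-]+)/(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/(?P<file>[^/]+)$"
)


def parse_cloudtrail_key(key: str) -> dict:
    """Split a CloudTrail log object key into account, folder, region, date and file name."""
    match = _KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a recognised CloudTrail log key: {key!r}")
    return {
        "account_id": match["account"],
        "log_folder": match["folder"],  # e.g. "CloudTrail" or "CloudTrail-Insight"
        "region": match["region"],
        "date": date(int(match["y"]), int(match["m"]), int(match["d"])),
        "file_name": match["file"],
    }
```

Running this over a listing of the bucket gives a quick view of which regions and days actually contain logs.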
2. Creating an IAM user
Next, create a secondary (IAM) user. The permissions you give this user are up to you: any access will do, since the point is to log the user's actions. Remember, in AWS every action, even every click in the console, is an API call. My user was named ‘raf’. I created it with a custom password and granted it full EC2 and S3 access.
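You might prefer a custom inline policy over attaching AWS managed policies. In that case, a simplified identity-based policy allowing all EC2 and S3 actions could look like the sketch below. This is an illustrative approximation only: the managed policies AmazonEC2FullAccess and AmazonS3FullAccess grant a broader set of actions. The function name and the `Sid` value are hypothetical.

```python
import json


def full_ec2_s3_policy() -> dict:
    """Return a simplified IAM policy document allowing every EC2 and S3 action.

    Approximation only: the AWS managed policies for "full access" include
    additional actions beyond ec2:* and s3:*.
    """
    return {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {
                "Sid": "FullEc2AndS3Access",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["ec2:*", "s3:*"],
                "Resource": "*",  # all resources; fine for a throwaway lab user
            }
        ],
    }


# Print the JSON you would paste into the IAM console's policy editor.
print(json.dumps(full_ec2_s3_policy(), indent=2))
```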
3. Putting ‘raf’ into action
Log in as your IAM user and perform some actions. Remember, you have FULL ACCESS TO EC2 AND S3, *smirks*!
To generate some good logs, I launched a temporary EC2 instance with user data that set up a static webpage. After some time, I terminated the instance and signed out of the user, to put my ‘Admin’ cap back on.
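Each CloudTrail log file is gzip-compressed JSON with a top-level `Records` list. To check which of your IAM user's actions were captured before moving to Athena, you could download one file and filter it locally. For the steps above you would expect events such as `RunInstances` and `TerminateInstances`. This is a minimal sketch: the helper names are hypothetical, and it assumes IAM-user events carry `userIdentity.userName`.

```python
import gzip
import json
from typing import Iterable


def load_records(raw: bytes) -> list[dict]:
    """Decompress one CloudTrail log file (.json.gz) and return its event records."""
    return json.loads(gzip.decompress(raw))["Records"]


def events_by_user(records: Iterable[dict], user_name: str) -> list[tuple[str, str]]:
    """Return (eventTime, eventName) pairs for events made by the given IAM user."""
    return [
        (record["eventTime"], record["eventName"])
        for record in records
        # Events from roles or the root user may have no userName, so default safely.
        if record.get("userIdentity", {}).get("userName") == user_name
    ]
```

For a file downloaded from the bucket, you could call `events_by_user(load_records(open(path, "rb").read()), "raf")`.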
4. Creating an Athena table
Open Event history in the CloudTrail console and you can see the logged events there. Next, create an Athena table that points to the same S3 bucket where your CloudTrail logs are stored. When using Athena for the first time, you also need to configure an S3 location where your query results will be stored. A default one is created, and you can select it.
A simple sample query:
SELECT *
FROM <athena table name>
WHERE awsregion = 'us-east-1'
  AND useragent LIKE '%console%';
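To see exactly which rows that query keeps, here is a local Python equivalent of its WHERE clause applied to parsed CloudTrail records. It rests on two assumptions. First, the Athena columns `awsregion` and `useragent` correspond to the JSON fields `awsRegion` and `userAgent`. Second, `LIKE '%console%'` is a case-sensitive substring match, and rows with a NULL user agent are excluded. The function names are hypothetical.

```python
from typing import Iterable


def matches_query(record: dict) -> bool:
    """Local equivalent of: awsregion = 'us-east-1' AND useragent LIKE '%console%'."""
    user_agent = record.get("userAgent") or ""  # NULL/missing never matches, as in SQL
    # LIKE '%console%' is a case-sensitive substring test.
    return record.get("awsRegion") == "us-east-1" and "console" in user_agent


def run_query(records: Iterable[dict]) -> list[dict]:
    """Keep only the records the Athena query would return."""
    return [record for record in records if matches_query(record)]
```

Because the match is case-sensitive, a user agent containing only "Console" would not be returned. Use `lower(useragent)` in the SQL if you want a case-insensitive filter.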
Note: In the S3 console, you can browse both the CloudTrail log files behind the Athena table and the Athena query results.
5. Let’s visualise using QuickSight
If you are a new user, you will have to set up a QuickSight subscription and authorise QuickSight to access Athena and your S3 buckets. Select your Athena table as the data source, and you can build your initial viz. on any of its fields, like ‘eventname’, ‘awsregion’ and many more.
You can also use a custom SQL query like the one above and build your viz. on its results. Several other visual types, such as pie charts and line charts, are worth exploring too.
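A bar chart of `eventname` in QuickSight is essentially a count per event name, the same result a custom query such as `SELECT eventname, COUNT(*) FROM <athena table name> GROUP BY eventname` would return. To sanity-check a visual against a handful of downloaded records, here is a minimal local sketch of that aggregation. The function name and the `<missing>` placeholder are hypothetical.

```python
from collections import Counter
from typing import Iterable


def event_counts(records: Iterable[dict], field: str = "eventName") -> list[tuple[str, int]]:
    """Count records per value of `field`, most frequent first.

    Roughly a GROUP BY <field> with COUNT(*). Records lacking the field are
    grouped under "<missing>", where SQL would group them under NULL.
    """
    counts = Counter(
        "<missing>" if record.get(field) is None else record[field]
        for record in records
    )
    return counts.most_common()
```

Passing `field="awsRegion"` gives the per-region breakdown instead.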
Conclusion
The above was a fairly basic, exploratory look at what AWS has to offer its users from a data analytics perspective. Also, stay alert while working with AWS services, as not all of them are included in the free tier.