Serverless Analytics with Amazon Kinesis and AWS Lambda

August 23rd, 2017 591 Words

AWS Lambda functions together with an Amazon Kinesis Stream offer a great way to process a continuous stream of data. I created an example project called Serverless Analytics to demonstrate this. You can use it as a starting point to create your very own Google Analytics clone and run it serverless and hopefully maintenance-free on AWS.

Architecture

Serverless Analytics uses Amazon Kinesis to stream events to an AWS Lambda function. The JavaScript function receives up to 100 events per batch and processes each event’s payload. Based on these events, it increments a simple request counter for your website’s URL in a DynamoDB table. To make it easy to put events onto the stream, an Amazon API Gateway proxies requests to Kinesis:

  • Amazon Kinesis to stream visitor events
  • Amazon API Gateway as HTTP proxy for Kinesis
  • Amazon DynamoDB for data storage
  • AWS Lambda to process visitor events
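To make the proxy step a bit more concrete: the Kinesis PutRecord API expects a stream name, a partition key, and the data as a base64-encoded blob. The following is only a sketch of that request body written in JavaScript, not the project's actual API Gateway mapping. The stream name `analytics-events` and the choice of the URL as partition key are assumptions made for illustration.

```javascript
// Sketch: build the JSON body that the Kinesis PutRecord API expects.
// STREAM_NAME is a hypothetical name, not taken from the project.
const STREAM_NAME = "analytics-events";

function buildPutRecordBody(event) {
  const json = JSON.stringify(event);
  return {
    StreamName: STREAM_NAME,
    // PutRecord requires the data blob to be base64-encoded.
    Data: Buffer.from(json, "utf8").toString("base64"),
    // Records with the same partition key end up on the same shard.
    // Using the URL here is an assumption for this example.
    PartitionKey: event.url,
  };
}

const body = buildPutRecordBody({ url: "https://example.com/", name: "Home" });
console.log(body.PartitionKey);
```

Decoding `Data` from base64 gives back the original JSON, which is what the Lambda function later sees for each record.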

For data access, a basic dashboard is included. The dashboard is hosted on Amazon S3 and uses an API Gateway to request the data from the DynamoDB table. The basic setup looks roughly like this:

Serverless Analytics infrastructure

Visitor tracking works the same way as in any other analytics service: you add a few lines of JavaScript to your HTML pages, and on every page load the browser sends a request with tracking data to a backend.

Deploying

Of course, the project relies on the Serverless framework for deployments. Just clone the repository, install all NPM/Yarn dependencies, and make sure you have valid AWS credentials configured in your environment. Once these requirements are met, you can run yarn deploy to get going.

After a successful deployment, the serverless-stack-output plugin writes a configuration file which is used to compile the static websites.

# Install dependencies
$ yarn install
# Deploy
$ yarn deploy

[]

Dashboard:  http://sls-analytics-dashboard.s3-website-us-east-1.amazonaws.com/
Website:    http://sls-analytics-website.s3-website-us-east-1.amazonaws.com/

Just visit the website’s URL, hit the refresh button in your web browser a few times, and have a look at the dashboard!

Serverless Analytics examples

Tracking

Normally, visitor tracking works by sending an HTTP request to a tracking service (e.g. Google Analytics). This can be a regular AJAX request or a non-JS fallback such as a fake image.

The Serverless Analytics project uses the same approach. As mentioned above, you copy a few lines of JavaScript into the footer of your website to enable tracking.

fetch("https://lqwyep8qee.execute-api.us-east-1.amazonaws.com/v1/track", {
  method: "POST",
  body: JSON.stringify({ url: location.href, name: document.title }),
  headers: new Headers({
    "Content-Type": "application/json",
  }),
});

On every page load, the JavaScript above sends a request to an Amazon API Gateway with the current URL and the title of the page.

Processing

All events about your website visitors end up in the Kinesis Stream and are processed by the AWS Lambda function. Based on the CloudFormation resource in the serverless.yml configuration, the AWS Lambda function receives up to 100 events per invocation.

The process.js file is the place where you can add more complex metrics. If your extended event processing takes too much time, you can always decrease the maximum number of events that this function receives per invocation.
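As a rough idea of what such processing can look like, here is a minimal sketch of a handler that decodes a batch of Kinesis records and counts page views per URL. It is not the project's actual process.js. Lambda delivers each record's payload base64-encoded in `record.kinesis.data`; the function names `decodeRecord` and `countByUrl` and the decision to silently skip malformed records are assumptions for this example.

```javascript
// Decode one Kinesis record. The payload arrives base64-encoded
// in record.kinesis.data and is expected to contain JSON.
function decodeRecord(record) {
  const json = Buffer.from(record.kinesis.data, "base64").toString("utf8");
  return JSON.parse(json);
}

// Count page views per URL for one batch of records.
// Records that cannot be parsed or have no url are skipped (an assumption).
function countByUrl(records) {
  const counts = new Map();
  for (const record of records) {
    let payload;
    try {
      payload = decodeRecord(record);
    } catch (err) {
      continue;
    }
    if (!payload || typeof payload.url !== "string") continue;
    counts.set(payload.url, (counts.get(payload.url) || 0) + 1);
  }
  return counts;
}

// Lambda entry point: aggregates the batch and returns the counts.
// Persisting the counts is left out of this sketch.
async function handler(event) {
  const counts = countByUrl(event.Records || []);
  return Object.fromEntries(counts);
}
```

Aggregating per batch first means a batch of 100 events for the same page results in one counter update instead of 100. In a real deployment the handler would additionally have to be exported from the module.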

Storage

All data is stored in a DynamoDB table. As soon as your metrics get more complex, it might be a smart move to rely on a different storage solution than DynamoDB, but for the current metrics it is a suitable choice.
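A request counter maps nicely to DynamoDB's atomic `ADD` update action, which creates the numeric attribute if it does not exist yet and increments it otherwise. The sketch below only builds the parameters for such an update. The table name `analytics-counters`, the key attribute `url`, and the counter attribute `hits` are hypothetical and not taken from the project.

```javascript
// Hypothetical table name, not taken from the project.
const TABLE_NAME = "analytics-counters";

// Build UpdateItem parameters (DocumentClient style) that atomically
// add `increment` to the `hits` counter of the item for `url`.
function buildCounterUpdate(url, increment) {
  return {
    TableName: TABLE_NAME,
    // Assumes the table uses `url` as its partition key.
    Key: { url },
    // ADD creates the attribute when missing and increments it otherwise.
    UpdateExpression: "ADD #hits :inc",
    ExpressionAttributeNames: { "#hits": "hits" },
    ExpressionAttributeValues: { ":inc": increment },
  };
}

const params = buildCounterUpdate("https://example.com/", 3);
console.log(params.UpdateExpression);
```

With the AWS SDK for JavaScript (v2), parameters like these could be passed to the `update` method of a `DynamoDB.DocumentClient` instance. Because the increment is applied by DynamoDB itself, concurrent Lambda invocations do not overwrite each other's counts.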

The serverless-dynamodb-autoscaling plugin takes care of configuring Amazon’s native DynamoDB Auto Scaling feature, so you should be covered for traffic peaks and lots of incoming events.

Feedback

Can you imagine running your own Google Analytics clone with serverless? You are welcome to send me some feedback on Twitter 👍


View on GitHub. The source code is published under the MIT License.