When it comes to sifting through customer feedback about your company and products, you can spend a lot of time and money without some type of automation in place. After all, manually analyzing feedback involves a lot of steps: you probably have to log in somewhere, navigate to the feedback, click into each feedback item, write down some notes, try to figure out a solution, and repeat. That's pretty much a full-time job if your company has more than five products and a lot of feedback.
Of course, when we think about automating feedback analysis, we might assume the process will take too long to develop or just won't be efficient in the first place. But automation not only saves money; it also frees you to focus on more valuable tasks.
In this post, I'll explain how you can spin up a simple system that offers insight into customer feedback through machine learning and natural language processing. We can use Amazon services to help automate and comprehend the feedback, and for less than $15 a month, we can have a system that gives us quick analysis of our customer feedback.
Automation Components
We will be using Amazon Comprehend to understand the language and context of the feedback, and its topic modeling to perform a multiclass classification that categorizes each product's feedback. We'll use serverless techniques: an AWS Lambda function reads the feedback from an S3 bucket, sends it to Amazon Comprehend for analysis, and then parses out the results so that we can predict the classification using a machine-learning concept known as topic modeling.
Natural language processing
Natural language processing uses machine learning to find relationships in text. Amazon Comprehend identifies the language of a piece of feedback, extracts key phrases, and detects sentiment, and it can then organize feedback into topics.
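To make that concrete, here's a minimal sketch of those two Comprehend calls on a single piece of feedback. The region and the example text are my own assumptions; swap in your own.

import boto3

# Assumed region; Comprehend must be available in the region you pick
comprehend = boto3.client('comprehend', region_name='us-west-2')

text = "The kettle arrived broken, and support never answered my warranty question."

# Detect the overall sentiment (POSITIVE, NEGATIVE, NEUTRAL, or MIXED)
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode='en')
print(sentiment['Sentiment'])

# Extract key phrases, each with a confidence score
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode='en')
print([p['Text'] for p in phrases['KeyPhrases']])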
Machine learning + topic modeling
Machine learning will process those identifiers and predict what the feedback topics might be. For example, if a user leaves us a negative review, we can identify the negativity through sentiment, then categorize topics like broken item, warranty, and so on. We can then count the occurrences of each topic per product and figure out what the root causes of product success or failure might be.
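As a hypothetical sketch of that last step, once each piece of feedback has been labeled with a sentiment and a topic, a simple tally points at root causes. The records here are invented for illustration:

from collections import Counter

# Invented example results: one record per analyzed review
results = [
    {"product": "kettle", "sentiment": "NEGATIVE", "topic": "broken item"},
    {"product": "kettle", "sentiment": "NEGATIVE", "topic": "warranty"},
    {"product": "kettle", "sentiment": "NEGATIVE", "topic": "broken item"},
    {"product": "toaster", "sentiment": "POSITIVE", "topic": "packaging"},
]

# Count negative mentions by (product, topic) to surface likely root causes
negatives = Counter(
    (r["product"], r["topic"]) for r in results if r["sentiment"] == "NEGATIVE"
)
for (product, topic), count in negatives.most_common():
    print(f"{product}: {count} negative mention(s) of '{topic}'")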
Building Our Automated System for Feedback Analysis
Okay, let's begin!
If you haven't already, this is a good time to head over to Amazon and sign up for AWS's free tier: https://aws.amazon.com/.
Set up an S3 bucket
Let’s start with setting up an S3 bucket. In the top menu, click Services and look for S3. We'll create a bucket named feedback-codeship.
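If you prefer to script this step instead of clicking through the console, here's a minimal sketch with boto3. Bucket names are globally unique, so replace feedback-codeship with your own:

import boto3

s3 = boto3.client('s3', region_name='us-west-2')

# Create the bucket in us-west-2, the region the rest of this post assumes
s3.create_bucket(
    Bucket='feedback-codeship',
    CreateBucketConfiguration={'LocationConstraint': 'us-west-2'},
)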
Creating a Lambda Package
We're going to create our Lambda package in this step. This will require some knowledge of Python and boto3. We'll be using Python 3.6 to build our Lambda function.
auto_feedback.py
from __future__ import print_function

import datetime
import json
import os
import sys
import time
import urllib.parse

import boto3
# bson's json_util serializes the datetimes in Comprehend's job listings;
# bundle pymongo in the Lambda package to get it
from bson import json_util

# Make sure Lambda can resolve the packages bundled alongside this file
root = os.environ["LAMBDA_TASK_ROOT"]
sys.path.insert(0, root)

# S3 client setup
s3client = boto3.resource('s3')

# Init Comprehend
comprehend = boto3.client(service_name='comprehend',
                          region_name='us-west-2', use_ssl=True)

# Comprehend topic modeling attributes: the role ARN is the IAM role
# Comprehend assumes to read and write S3 (fill in your own)
data_access_role_arn = "arn:aws:iam::008875219265:role/....."
input_doc_format = "ONE_DOC_PER_FILE"
number_of_topics = 3

# The input data config points at a folder with the date on it
# so we can track the S3 uploads
now = datetime.datetime.now()
input_data_config = {
    "S3Uri": "s3://feedback-codeship/" + now.strftime("%Y-%m-%d"),
    "InputFormat": input_doc_format,
}
output_data_config = {"S3Uri": "s3://feedback-codeship/outputs"}


# --------------- Main Lambda Handler ------------------

def handler(event, context):
    # Get the object that triggered the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(
        event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    file = open('/tmp/analysis-out.csv', 'w')
    try:
        # Download the review to a temp file
        obj = s3client.Object(bucket, key)
        obj.download_file('/tmp/analysis.txt')
        with open('/tmp/analysis.txt') as f:
            for line in f:
                text = line
                if not text.strip():
                    continue  # Comprehend rejects empty text
                ts = time.time()
                st = datetime.datetime.fromtimestamp(
                    ts).strftime('%Y-%m-%d %H:%M:%S')
                file.write(bucket + '/' + key + ',')
                file.write(st + ',')

                # Sentiment response (LanguageCode can be 'en' or 'es')
                sentiment_response = comprehend.detect_sentiment(
                    Text=text, LanguageCode='en')
                # Key phrases response
                phrase_response = comprehend.detect_key_phrases(
                    Text=text, LanguageCode='en')

                # Comprehend data written to a CSV row
                file.write("Sentiment" + ',')
                file.write(sentiment_response['Sentiment'] + ',')

                phrases = phrase_response["KeyPhrases"]
                threshold = 0.80
                file.write("KeyPhrases" + ',')
                for phrase in phrases:
                    if phrase['Score'] >= threshold:
                        file.write(phrase['Text'] + ',')

                # Kick off an async topic modeling job over today's uploads
                file.write("Topics" + ',')
                topic_response = comprehend.start_topics_detection_job(
                    NumberOfTopics=number_of_topics,
                    InputDataConfig=input_data_config,
                    OutputDataConfig=output_data_config,
                    DataAccessRoleArn=data_access_role_arn)
                job_id = topic_response["JobId"]
                describe_topics_detection_job_result = \
                    comprehend.describe_topics_detection_job(JobId=job_id)
                list_topics_detection_jobs_result = \
                    comprehend.list_topics_detection_jobs()
                file.write(json.dumps(list_topics_detection_jobs_result,
                                      default=json_util.default) + ',')
        file.close()
        s3client.meta.client.upload_file(
            '/tmp/analysis-out.csv', Bucket=bucket,
            Key='analysis/' + key + '.csv')
        return 'Analysis Successfully Uploaded'
    except Exception as e:
        print(e)
        file.write('Error processing {} from {}: '.format(key, bucket))
        file.write(str(e) + ',')
        raise e
We start by setting the imports that we'll need, then setting up the s3client and Comprehend initializers. Then we set the variables we need for the topic modeling attributes; fill in your own values in place of my examples.
The method name should be handler so that AWS Lambda can call auto_feedback.handler when an event is triggered in our S3 bucket. We then get the object from the event and try to produce our results based on it.
We have also set up our output format to be uploaded as a csv like this:
bucket_name, timestamp, Sentiment, sentiment_response, KeyPhrases, item1, item2, item3, Topics, topics(json)
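To show how a consumer might read one of these rows back, here's a minimal parsing sketch. It assumes the row layout above, with the labels written inline exactly as our Lambda function emits them:

import csv

# Read the first row of an analysis file written by our Lambda function
with open('analysis-out.csv') as f:
    row = next(csv.reader(f))

# The labels sit inline in the row, so locate each field by its label
sentiment = row[row.index('Sentiment') + 1]
key_phrases = row[row.index('KeyPhrases') + 1:row.index('Topics')]
print(sentiment, key_phrases)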
Then we'll take the Lambda package and zip it up:
zip -FSr auto_feedback.zip . -x "*.git*" "*bin/*" "*.zip"
We will then save the zip to our S3 bucket.
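A minimal sketch of that upload with boto3; the key here is my own choice, and whatever you pick becomes part of the CodeUri path in the next step:

import boto3

s3 = boto3.client('s3')

# Upload the Lambda package; this S3 path goes into CodeUri in setup.yml
s3.upload_file('auto_feedback.zip', 'feedback-codeship', 'auto_feedback.zip')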
Setting Up Permissions and Lambda Functions with CloudFormation
This will be a fun setup: our yml file does all of the heavy lifting and gets the Lambda linked with the correct permissions.
Using the Lambda package we zipped up and saved to our S3 bucket, we'll now use that path in our setup.yml. CodeUri is the place to put the S3 path of our package, so that the Lambda we're trying to create knows where its code lives.
setup.yml
AWSTemplateFormatVersion: '2010-09-09'
Description: Automated way of understanding feedback.
Transform: 'AWS::Serverless-2016-10-31'
Resources:
  S3:
    Type: AWS::S3::Bucket
  AutoFeedback:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: [S3 URL]
      Description: "Get Feedback Analysis"
      Handler: auto_feedback.handler
      MemorySize: 128
      Policies:
        Statement:
          - Sid: "comprehend"
            Effect: Allow
            Action:
              - comprehend:*
            Resource: "*"
          - Sid: "s3"
            Effect: Allow
            Action:
              - s3:*
            Resource: !Sub "arn:aws:s3:::${S3}/*"
      Environment:
        Variables:
          bucket: !Ref S3
      Runtime: python3.6
      Timeout: 20
Outputs:
  output:
    Value:
      Ref: "S3"
  Region:
    Value:
      Ref: AWS::Region
This file, which we upload into AWS CloudFormation, will set up our permissions and Lambda function. Here's a really good reference about the template anatomy of our setup.yml.
Then we'll need to check the acknowledgment options, including the transforms, and execute the CloudFormation template. That gets us ready to add our event to the S3 bucket!
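If you'd rather drive CloudFormation from code, here's a minimal sketch of the same step. The stack name is my own assumption; CAPABILITY_IAM acknowledges the IAM resources our policies create, and CAPABILITY_AUTO_EXPAND acknowledges the Serverless transform:

import boto3

cloudformation = boto3.client('cloudformation', region_name='us-west-2')

# Create the stack from setup.yml, acknowledging IAM changes and the transform
with open('setup.yml') as template:
    cloudformation.create_stack(
        StackName='auto-feedback',  # assumed name; pick your own
        TemplateBody=template.read(),
        Capabilities=['CAPABILITY_IAM', 'CAPABILITY_AUTO_EXPAND'],
    )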
Back to the bucket
Now we need to navigate back to our S3 bucket:
Click the Properties tab, then click Events, and finally add an event that fires whenever an object is created, pointing it at our AutoFeedback Lambda function.
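The same event configuration as a minimal boto3 sketch; the Lambda ARN is a placeholder, and note that the console also grants S3 permission to invoke the function for you (scripted, you'd add that yourself with Lambda's add_permission):

import boto3

s3 = boto3.client('s3')

# Trigger the AutoFeedback function whenever a .txt object lands in the bucket
s3.put_bucket_notification_configuration(
    Bucket='feedback-codeship',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            # Placeholder ARN; use your function's real ARN
            'LambdaFunctionArn': 'arn:aws:lambda:us-west-2:123456789012:function:AutoFeedback',
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'suffix', 'Value': '.txt'},
            ]}},
        }],
    },
)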
Time to Test
Upload your feedback as a .txt file into the directory of the S3 bucket, or feel free to use this sample feedback text:
November 18, 2017 | Color: Light Silver | Verified Purchase
Ultimately this will be a two-part review. The first part deals with appearance, quality, packaging, and shipping. The product is substantial, heavy, well boxed, and protected. Should there be a need, the packaging is suitable for return purposes. There was no need for that in this case, as everything arrived in pristine condition. Set up was simple and easy, directions/user manual do show evidence of translation issues and are a little rough but still effective.
The second part of this review will be posted after using. Will assess cleanup, function, ease of use, etc. More to follow.
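With the event in place, the upload itself is enough to kick off the analysis. A minimal sketch, assuming you've saved the sample locally as review.txt; the dated prefix matches the input path our Lambda function builds for topic modeling:

import boto3
import datetime

s3 = boto3.client('s3')

# Upload into today's folder so the topic modeling job will pick it up
key = datetime.datetime.now().strftime('%Y-%m-%d') + '/review.txt'
s3.upload_file('review.txt', 'feedback-codeship', key)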
Then we can see our output file, with some of the analysis we were looking for, in our created output folder:
auto_feedback, 2018-05-10T14:10:10+00:00, Sentiment, POSITIVE, KeyPhrases, appearance, quality, packaging, shipping, Topics, topics(json)
And there we are. A simple system that offers insight from customer feedback through machine learning and natural language processing, with AWS Lambda, Comprehend, and CloudFormation.