On Thursday, July 30th, Amazon Web Services announced a new developer community, the AWS Community Builders Program. For months, Chris Miller, founder and CEO of Cloud Brigade, has had to keep this exciting news a secret. Due to Chris’s public contributions around AWS services, he was invited to participate in the beta phase of this important program.

“I am honored and grateful for being invited to participate in this community,” said Chris. “As an AWS Community Builder I participate in an exclusive community of AWS enthusiasts, build cool things, and write about those experiences.”

Some of Chris and the Cloud Brigade team’s AWS contributions include advanced Machine Learning and Artificial Intelligence projects like the Poopinator, as well as a finalist finish in the 2019 AWS DeepRacer Challenge, a global autonomous racing league. Several additional cool projects are in the works, so stay informed on Cloud Brigade’s latest news.

This program’s purpose resonates with the Cloud Brigade team because it’s similar to what drives its own mission. Cloud Brigade is not only passionate about helping others build on AWS, but also passionate about helping companies make a positive market impact with meaningful outcomes. The Cloud Brigade team is all about building up communities and improving community knowledge sharing. Its 15-year presence and involvement within the Santa Cruz community is far and wide, including launching the careers of 40+ local college students. Now with Chris and Cloud Brigade’s extended AWS Community Builder resources, the team is looking forward to expanding its reach and advancing its innovation for any business or technical challenge out there.

If you’d like Cloud Brigade’s assistance with your project, please contact them here: https://www.cloudbrigade.com/contact/

If you’d like to apply to join the AWS Community Builders Program, it accepts a limited number of global applicants each year. Apply here: https://aws.amazon.com/developer/community/community-builders/

#AWScommunity

@AWScloud

@CloudBrigade

What do strange dogs pooping in your yard and the way some people are responding to Covid-19 have in common? Both are undesirable behaviors that can be hard to detect and correct – until now. Applying the phrase, “with conflict comes creativity,” we got creative, designing and deploying an efficient and powerful ML/AI solution for detecting and correcting problems like these.

One night I had walked down the hill in the dark to fetch our garbage cans. About 30 minutes later I began getting that familiar whiff. You’ve been there, right? I had stepped in dog poop. Coincidentally, we had recently acquired a pair of AWS DeepLens cameras, essentially a development kit to explore machine learning through computer vision. Now I had the conflict, and the resources to get creative.

Introduction

Inspired by my recent Faux-paw, I set about to solve this problem with technology. You can imagine the look on my engineer’s face when I told him he was going to spend the next day training a computer model based on images of pooping dogs. As it turns out, you can find just about anything in Google Image Search, and this was no exception.

As I began to talk openly about our project in the business community (and received a variety of questioning glances), my friend Doug Erickson became animated and proclaimed: “I totally need that!”, referring to a rivalry with one of his neighbors. With product market fit established, The Poopinator was born.

Detecting bad behavior would not be enough; we would need a correction mechanism. After pondering several solutions, including an air-powered truck horn, we decided to leverage a more humane technology that already existed in these problem locations – the sprinkler system.

SageMaker

For the uninitiated, SageMaker is a collection of tools AWS provides to enable geeks like us to do geek stuff like training computer models. Those tools include:

  • Labeling (Ground Truth)
  • Notebooks (Jupyter)
  • Training
  • Inference

While this article is meant to appeal to both technical and non-technical folks, you can learn more about AWS SageMaker here, and feel free to skip to the following sections.

Ground Truth

The first step in building a computer vision model is “labeling” the images using AWS Ground Truth. That is to say, you use sample images of objects to teach the model what you want it to detect. In the case of our canine friends, we had a constraint of only detecting a dog when it was in the act. So we set out to procure images of dogs who were pooping (and not pooping), then draw “bounding boxes” around the dogs.

Once you painstakingly label the images (it’s not really that hard), SageMaker produces a manifest file which you will feed into your “training” job. But in order to do this, you need to create a Jupyter Notebook, which is kinda like a page in a wiki, but it allows you to add and execute computer code such as Python.

For those of you who are new to Jupyter, there are plenty of example notebooks available in SageMaker that you can modify. If you want the sordid technical details, just click the “Deep Dive” buttons throughout the article.

Here we dive into the details of the SageMaker notebook for the Poopinator. In the interest of providing clean code, we start with a number of variable declarations. We also copy our source images from the S3 bucket used by our labeling job.
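For the curious, a stripped-down version of that setup cell might look like this – the bucket, prefix, and labeling job names are placeholders, not our production values:

```python
import os

# Placeholder names – substitute your own bucket and labeling job.
BUCKET = "my-poopinator-bucket"
PREFIX = "poopinator"
LABELING_JOB = "poopinator-labels"

def s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI for use in SageMaker job parameters."""
    return f"s3://{bucket}/{key}"

def copy_source_images(bucket: str, prefix: str, local_dir: str = "images") -> None:
    """Download the labeling job's source images to the notebook instance."""
    import boto3  # imported here so the rest of the sketch runs without AWS credentials
    s3 = boto3.client("s3")
    os.makedirs(local_dir, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=f"{prefix}/images/"):
        for obj in page.get("Contents", []):
            filename = os.path.basename(obj["Key"])
            if filename:  # skip the "directory" placeholder key
                s3.download_file(bucket, obj["Key"], os.path.join(local_dir, filename))
```

In the real notebook you would also grab the notebook’s IAM role (via `sagemaker.get_execution_role()`) for use in later training calls.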

After the labeling job is complete we can review the results to verify the images were labeled correctly. Let’s get the labeling job manifest and copy it to our notebook.
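The manifest is a JSON-lines file: one record per image, with bounding boxes stored under the labeling job’s attribute name. A small parser along these lines (the attribute name here is a placeholder) pulls out what we need:

```python
import json

def parse_manifest(lines, label_attribute="poopinator-labels"):
    """Turn Ground Truth output.manifest lines into (image_uri, boxes) pairs.

    Each box is a (left, top, width, height) tuple in pixels."""
    annotated = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        labels = record.get(label_attribute, {})
        boxes = [
            (a["left"], a["top"], a["width"], a["height"])
            for a in labels.get("annotations", [])
        ]
        annotated.append((record["source-ref"], boxes))
    return annotated
```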

Next we plot the annotated images. We display the images from the notebook and draw the bounding boxes from the labeling job on top of them.

Now let’s read the augmented manifest line by line and display the first 10 images with annotations and bounding boxes.
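A sketch of the plotting helper, assuming matplotlib and Pillow are available on the notebook instance (they are on SageMaker’s standard kernels):

```python
def plot_annotated(local_path, boxes, max_boxes=10):
    """Display an image with its labeled bounding boxes drawn on top.

    Each box is a (left, top, width, height) tuple in pixels."""
    # Imported here so the libraries are only a hard dependency when you call it.
    import matplotlib.pyplot as plt
    import matplotlib.patches as patches
    from PIL import Image

    fig, ax = plt.subplots()
    ax.imshow(Image.open(local_path))
    for left, top, width, height in boxes[:max_boxes]:
        ax.add_patch(patches.Rectangle((left, top), width, height,
                                       fill=False, edgecolor="red", linewidth=2))
    ax.axis("off")
    plt.show()
```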

Training The Model

With all of the busy work completed in our notebook, we then launched a training job. With 221 images in our dataset, it took SageMaker just under an hour to train our model using a ml.p2.xlarge instance. In our first attempt we had about 100 images in our dataset, and it just underperformed during our inference testing.

We won’t go into the weeds here on training our model, other than to say we are using a pre-trained MXNet model built on the ResNet-50 convolutional neural network. It’s a thing-a-ma-jigger that enables us to do object recognition with images.

Again if you want to see how we manage the training job in SageMaker, just click the Deep Dive button below.

We are now ready to use the labeled dataset in order to train a Machine Learning model using the SageMaker built-in Object Detection algorithm.

First we read in the list of annotated images, and randomly split them into a training set and a validation set.

Next, let’s upload the two manifest files to S3 in preparation for training.
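The split-and-upload step is only a few lines of Python; the 80/20 ratio, seed, and file names here are our own choices, not requirements:

```python
import json
import random

def split_manifest(records, train_fraction=0.8, seed=42):
    """Randomly split annotated records into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def write_jsonlines(records, path):
    """Write records back out as a JSON-lines manifest file."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def upload_manifest(path, bucket, key):
    """Push a manifest file to S3 for the training job (needs AWS credentials)."""
    import boto3  # imported here so the split itself runs anywhere
    boto3.client("s3").upload_file(path, bucket, key)
```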

We are ready to start the training job in the SageMaker console. You could do this manually, but why would you want to when you have a template? 🙂
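Our template is essentially a big request dictionary passed to `create_training_job`. A simplified version of how one might assemble it is below – the instance type and hyperparameters match what we described above, but the channel and attribute names are illustrative:

```python
def build_training_params(job_name, role_arn, image_uri, bucket, prefix,
                          num_classes=1, num_samples=177):
    """Assemble a create_training_job request for the SageMaker built-in
    Object Detection algorithm, reading augmented manifests in Pipe mode."""
    def channel(name, manifest_key):
        return {
            "ChannelName": name,
            "DataSource": {"S3DataSource": {
                "S3DataType": "AugmentedManifestFile",
                "S3Uri": f"s3://{bucket}/{manifest_key}",
                "S3DataDistributionType": "FullyReplicated",
                "AttributeNames": ["source-ref", "poopinator-labels"],
            }},
            "RecordWrapperType": "RecordIO",
            "InputMode": "Pipe",
        }
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {"TrainingImage": image_uri,
                                   "TrainingInputMode": "Pipe"},
        "ResourceConfig": {"InstanceType": "ml.p2.xlarge",
                           "InstanceCount": 1, "VolumeSizeInGB": 50},
        "InputDataConfig": [channel("train", f"{prefix}/train.manifest"),
                            channel("validation", f"{prefix}/validation.manifest")],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/{prefix}/output"},
        "HyperParameters": {"base_network": "resnet-50",
                            "use_pretrained_model": "1",
                            "num_classes": str(num_classes),
                            "num_training_samples": str(num_samples)},
        "StoppingCondition": {"MaxRuntimeInSeconds": 7200},
    }
```

You would then look up the built-in algorithm’s container with something like `sagemaker.image_uris.retrieve("object-detection", region)` and submit the job with `boto3.client("sagemaker").create_training_job(**params)`.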

Inferencing

Inferencing is the act of testing your model to see if it accurately “infers” the expected result – in this case, detecting a dog, but only when it is pooping. Put in different terms, an inference can be defined as the process of drawing a conclusion based on the available evidence plus previous knowledge and experience.

SageMaker provides an easy way to test your models using a variety of EC2 instances and GPUs. In the case of object detection with images, we simply feed SageMaker a bunch of test images, and it returns the images with bounding boxes drawn around the object it detected, as well as a label and a percentage of confidence.

As you can see the model did pretty darn well, but it did miss one very bad poodle. On the other hand, it did very well at not misinterpreting the sitting beagle. We won’t worry too much about this at the moment.

The details on inference in the notebook are in the Deep Dive.

Now that the training is done, let’s test the model.

Next we create an endpoint configuration.

The next cell creates an endpoint that can be validated and incorporated into production applications. This takes about 10 minutes to complete.
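In code, those two steps (plus waiting out the ~10 minutes) look roughly like this – the instance type is a typical CPU choice for light inference traffic, and the names are placeholders:

```python
def deploy_model(sm_client, model_name, config_name, endpoint_name):
    """Create an endpoint configuration, then an endpoint, for a trained model.

    Assumes the model was already registered with create_model(); pass in a
    boto3.client("sagemaker") instance."""
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m4.xlarge",
        }],
    )
    sm_client.create_endpoint(EndpointName=endpoint_name,
                              EndpointConfigName=config_name)
    # Block until the endpoint is in service – this is the ~10 minute wait.
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
```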

Once the endpoint is up, we can load our test images, define our bounding boxes, and send the images to the endpoint for inferencing.
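The endpoint returns JSON with a "prediction" list; each entry is [class_id, confidence, xmin, ymin, xmax, ymax] with coordinates normalized to 0–1. A helper like this (the threshold is our choice) converts that into pixel-space boxes:

```python
import json

def parse_detections(response_body, img_width, img_height, threshold=0.5):
    """Convert the Object Detection endpoint's JSON response into
    pixel-space boxes above a confidence threshold."""
    result = json.loads(response_body)
    boxes = []
    for class_id, confidence, xmin, ymin, xmax, ymax in result["prediction"]:
        if confidence >= threshold:
            boxes.append({
                "class_id": int(class_id),
                "confidence": confidence,
                # Scale normalized coordinates up to image pixels.
                "box": (xmin * img_width, ymin * img_height,
                        xmax * img_width, ymax * img_height),
            })
    return boxes
```

The raw bytes come from `boto3.client("sagemaker-runtime").invoke_endpoint(EndpointName=..., ContentType="image/jpeg", Body=image_bytes)`.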

Deploying to DeepLens

DeepLens is a camera, a computer, and an IoT device all wrapped in one. From the dog detection project example, we learned we could capture and stream video to Amazon Kinesis Video Streams, which allowed us to view and share video of Fido doing his dirty deed. In addition, we can use AWS IoT Core via MQTT (the protocol behind brokers like Mosquitto) to report events in real time, which we use to turn on the sprinklers.

In order to deploy our model to the DeepLens, we had to optimize and repackage it.

Because we trained a custom model in SageMaker, in order for it to be deployed “at the edge” to the DeepLens device, we needed to optimize the model. This is in part due to the fact that the DeepLens does not have the compute horsepower we are afforded in the cloud, as well as utilizing the device specific GPU for inferencing.

Optimization is a process of converting the model’s “artifacts” into a format the DeepLens can utilize. AWS DeepLens uses the Compute Library for Deep Neural Networks (clDNN) to leverage the GPU for performing inferences. As of this writing, the Apache MXNet project has removed the utility referenced in this article, so you must check out branch v1.7.x to access the deploy.py script. Additional references here.

GreenGrass (aka Lambda at the Edge)

DeepLens leverages a special Lambda function to inference images. This is a little counterintuitive because you can’t actually test your Lambda function in the cloud. Lambda is simply an IDE and deployment mechanism in this case. Of course you can go old school and edit your code with Vim on the device during development.

We used the object-detection example code as a starting point, and layered in the Kinesis code from the dog detector example, and finally the image optimization code from the native bird identifier.
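The final piece – reporting a detection so the sprinklers can fire – is a small publish through IoT Core from the DeepLens Lambda. A sketch of that idea (the topic name and payload fields are our own convention, not an AWS format):

```python
import json

def sprinkler_payload(confidence, zone="front-lawn", duration_s=10):
    """Build the JSON message our sprinkler controller subscribes to.
    The field names are our own convention, not an AWS format."""
    return json.dumps({"event": "dog_pooping", "confidence": confidence,
                       "zone": zone, "duration_s": duration_s})

def report_event(topic, confidence):
    """Publish a detection event from the DeepLens Lambda via IoT Core."""
    import greengrasssdk  # available on the DeepLens device, not locally
    client = greengrasssdk.client("iot-data")
    client.publish(topic=topic, payload=sprinkler_payload(confidence))
```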

IoT Sprinkler Controller

Wemos D1 Mini Pro

If the above wasn’t cool enough, this is perhaps one of my favorite parts of the project. Hearing that “crack” sound of an electrical load switch on from a JSON payload just makes my hair stand on end. Our tool of choice for simple workloads is an Arduino based controller, and the Wemos D1 Mini Pro is my favorite. This unit is extremely small, and cheap ($5) in comparison to an official Arduino, but be warned it has some nuances which mean it doesn’t “just work” out of the box.

Programming an Arduino ranges from as simple as downloading a program file from GitHub to pulling your hair out because you really need to know some C/C++ foo such as typecasting, pointers, managing heap size, etc. For building a proof of concept, it’s mostly just cut and paste from example code.

In our case we used the Arduino MQTT library (aka PubSubClient) from Nick O’Leary, and ArduinoJSON from Benoit Blanchon. One important note is that AWS IoT Core’s preferred authentication mechanism is certificates, and there are several ways you can load certificates in Arduino. We chose to use SPI Flash File System (SPIFFS) which requires the FS.h library, and a file uploader plugin for the Arduino IDE.

There is a lot to go into just on the Arduino controller, so we’re going to save that for another article. You can subscribe below to get the updates.

Get our newsletter

Project Box

Of course we needed to protect our DeepLens from the elements, and in Santa Cruz, California you just never know when the fog is going to roll in. Who knew how many days it would take to catch Fido in the act, and we also had to prevent water damage from the sprinklers. Fortunately, our friends at the Idea Fab Labs (down the hall from our office) hooked us up with laser cutting of a cedar and acrylic DeepLens enclosure we designed.

Deploying The Poopinator™

After a few collective weeks of engineering, we were ready to school Fido on neighborhood etiquette, AI style. In reality, we were ready to deploy our platform and start fixing bugs. As you probably know, nothing ever works in production the way it did in the test environment.

Aside from configuring the devices for Doug’s wifi network, all that was left to do was connect our Wemos IoT device to the sprinkler controller. We were able to connect our device inline so that sprinklers still ran at their regular timed interval, yet still have unfettered access to the sprinkler solenoid on the front lawn.

With the camera deployed, we started by testing the object-detection model provided by AWS, and ensuring video streams were being sent to Kinesis Video Streams. Doug was so accommodating, allowing us to use his front yard as the test bed. Even though we are friends, there’s just something creepy about controlling a camera at someone else’s house.

The Punishment Due

Of course you are waiting to see the video of Fido getting nailed by the sprinkler. Unfortunately this is a bit like taking your car to the mechanic, only to find out that squeak magically went away. We’re waiting for old Fido and we’ll post video here in the coming days.

A Practical Example

Now I’m sure you’re probably smiling ear to ear with this story, and if you are interested in owning your own Poopinator, or talking with us about ML/AI, reach out to us! If there’s enough interest in this kind of device, we might just produce a product. 

In our current environment however, there’s perhaps a more “plaguing” example. (See what I did there?). Many businesses are struggling with public health rules imposed by a county health department near you. We’re talking about wearing a mask when entering local businesses.

We took our example a step further and are training a model to detect people not wearing a mask, or wearing the mask incorrectly. While we briefly considered soaking them with a sprinkler, we felt a more appropriate solution was to sound an audible alarm and perhaps a visible sign.

What’s Next

If you like what you read here, the Cloud Brigade team offers expert Machine Learning as well as Big Data services to help your organization with its insights. We look forward to hearing from you.

Please reach out to us using our Contact Form with any questions.

If you would like to follow our work, please sign up for our newsletter.

It was November 2016, and I was listening to public radio on my way to work. Our president-elect was already known for his use of Twitter, and reporters were discussing their challenges, grappling with what they should or should not report on. At that time there was a growing feeling that those tweets often distracted from more significant news.

Fake News was discussed daily, and I had been thinking about this problem for a few months. On a whim I thought, what if we created a website that allowed the public to rate our president’s tweets as true or false? It would be simple to build and integrate with Twitter, and maybe it would provide a benefit to the public. Then I thought I’d probably receive a cease and desist in about 5 nanoseconds, so I tabled the idea.

The following Monday during my 40 minute commute, I had come up with a better idea. What if we provided a platform that used “crowdsourcing” to rate news articles? The existing systems, primarily websites like Snopes, simply could not scale in response to the amount of “news” coming from thousands of websites.

You Debunk It was born

Of course there are many considerations that would determine if this would actually work. First, would the general public be willing to invest their time into rating news articles? How would we keep this tool unbiased, non-partisan, and prevent it from being gamed by individuals or bots? Within an hour I had drafted a project specification and taken it to our software development team to discuss.

We looked at several existing “design patterns” which were used successfully on other websites with an engaged community. As we roughed out our user interaction model, we combined these patterns to provide a system of checks and balances that would ensure there was no single source of truth in the rating system, and to minimize bias as articles were rated.

Finding Inspiration

We believed that a portion of our public, say 5%, was invested enough in the pursuit of truth to spend time rating articles. This type of engagement was evident in communities such as Reddit, Quora, Stack Overflow, Slashdot, and others. These sites provided the ability for questions to be answered, discussions to ensue, up/down voting, karma scoring, and flagging for abuse or spam.

We set out and built our web-based application with:

  • News article submission system
  • Multi-factor signup and verification system
  • Searchable list of articles
  • Article ratings with threaded comment sections

Will They Use It?

With the proof of concept built and tested, we realized we faced a number of problems with adoption. Without enough articles, it would be difficult to ramp up adoption. The bigger problem was getting people to remember to use our website when reading news. We needed a way to remain relevant daily.

We considered that most news consumption was happening within social media sites such as Facebook and Twitter. If we wanted to keep the public engaged, we needed to be part of the social media sites. We assumed Facebook was not going to be open to this idea, so we went rogue and built a browser plugin.

Hacking Facebook

We were not the first with this idea; a plugin called the BSDetector had recently been released. It displayed a small overlay with a fake news category if the website was on the list of domain names curated by the author. This was later replaced with an independently maintained list from OpenSources.

The problems with this solution were that the list could be biased, and it only rated domain names. In reality, any website was capable of publishing inaccurate news, and we needed to be granular to the article level. A plugin that could hijack Facebook’s news feed and overlay content was the feature we were after.

So we set out to build our own browser plugin which looked at the type of content posted in the Facebook feed, overlaid our own toolbar with a category and truth score (if available), and added a button a user could click to rate the article. This would drive user engagement, just what we needed. If Facebook sent us a cease and desist, so be it. We provided an optional user-installed plugin, and I suspected all they could really do was constantly change their HTML markup to thwart our plugin.

Great Responsibility

In addition to providing news ratings from our database if the URL existed there, this also gave us unprecedented access to the composition of every user’s Facebook feed, allowing us to gain insights that only Facebook could see. Are you freaking out about privacy right now? We were too, and the Cambridge Analytica data scandal was still a year away from going public.

We had legitimate reasons for collecting this data in order to provide an effective tool which understood the news and social media landscape beyond what our users took the time to submit. It would also give us access to time and geographic data that would show how fake news spread across the internet.

Data is Scary

If you didn’t already know this, any browser plugin you install has access to your browser history in real time and can transmit that data to a remote server. While this is scary, the data can be used for a multitude of legitimate and beneficial outcomes. We now had a huge responsibility on our shoulders.

If users were going to adopt our application, they would first and foremost need to trust us. We couldn’t sugarcoat it; we had to proactively disclose what data we were collecting and how we would use it. We also had to secure and randomize this data, including the user’s Facebook ID, which we used to prevent duplicate submissions from a single person.

Of course there were a number of other technical challenges, such as keeping the platform fast, and managing the massive amount of data we would receive, not to mention very bursty traffic patterns. You can read about the architecture of the application here.

How It Works

When an article was submitted to the system, the poster was required to provide a rating consisting of:

  • A category (inspired by BSDetector)
  • A 5 point truth score (inspired by Politifact)
  • A short narrative supporting this rating with at least one valid source (inspired by Slashdot)

In order to avoid voting bias, the category and rating were hidden from the public until rating consensus was achieved by a quorum of users.

When additional posters submitted a rating, they were required to agree or disagree with the original poster’s narrative and source (inspired by Stack Overflow). Optionally they could engage in discussion, and those comments could be up-voted or down-voted (inspired by Slashdot, Reddit, and Quora).

Users would be assigned a hidden karma score based on their engagement on the site, and taking into account all of their interactions. This would serve to provide a weight in how their ratings would be interpreted and trusted. Karma was a way not only to combat trolls and bots, but also deal with reasonable people who might have been “triggered” when posting on the site.
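To make those two mechanics concrete, here is an illustrative version of the quorum and karma-weighting logic – a sketch of the idea, not our production algorithm, with the quorum and agreement thresholds chosen arbitrarily:

```python
def consensus_reached(ratings, quorum=5, agreement=0.7):
    """True once a quorum of raters exists and a sufficient share of them
    agree on the same truth score; until then the rating stays hidden.

    ratings is a list of (truth_score, karma) tuples."""
    if len(ratings) < quorum:
        return False
    scores = [score for score, _ in ratings]
    top = max(scores.count(s) for s in set(scores))
    return top / len(scores) >= agreement

def weighted_truth_score(ratings):
    """Combine 1-5 truth scores weighted by each user's hidden karma,
    so trusted raters count for more and trolls count for less."""
    total_weight = sum(karma for _, karma in ratings)
    if total_weight == 0:
        return None
    return sum(score * karma for score, karma in ratings) / total_weight
```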

Another design consideration was that we didn’t know what we didn’t know. Over the lifetime of the application, our algorithm would undoubtedly change. Because we needed our system to provide consistency across all article ratings, we would preserve and protect all original user submitted data, and retroactively recalculate ratings when we changed our algorithm.

What happened to You Debunk It?

We were inspired to create a tool that contributed to the common good. It was a tool we invested tens of thousands of dollars into – time that would have otherwise been spent on client projects. Because there was really no viable business model behind the application, it became a social experiment, and a mock software project for some of our junior employees.

We looked at the possibility of funding from the usual suspects who support sites like Snopes and Politifact, but this project was basically going to require a full-time team to build and maintain. We just didn’t see a funding model that would support this for the long term.

Why was our Fake News app a horrible idea? As we approach the 2020 election, we’ve learned a lot about our society in the last four years. There is an abundance of information available to support any type of bias, and this has resulted in endless arguments on social media, each party believing they are correct.

We built this tool so people had a chance to think before they shared fake news in a headline triggered world. What we’ve learned is that the people who need to fact check the news won’t.

The unfortunate reality is that we live in a society in which many people simply don’t care about the truth, they only care about supporting their own bias. Can we fix this with technology?

Chris Miller
Founder, Cloud Brigade

You Debunk It is Dedicated to Greg Bettinger

Greg was a kind, gentle, and brilliant soul whom we had the pleasure of working with for five years. Greg was an MIT graduate, critical thinker, and curious computer scientist. Solving the fake news problem was something Greg was passionate about, so much so that despite his diagnosis with stage four colon cancer at age 49, he insisted on his continued work on You Debunk It through his sickness. Greg passed away surrounded by family on March 30th, 2018.

A little known benefit of being a Santa Cruz Works Sponsor is that you get $1000 of AWS credits each year (along with other benefits here). You might be asking yourself, how can I actually use this great benefit?

To the individual “the cloud” is this mystical place where we store photos and check email. To businesses small or large, the cloud provides a lot of benefits that were previously too expensive or out of reach. Below are four examples of what you could do with your AWS credits. Hint: We saved the best for last.

  1. You can run your bookkeeper’s workstation in the cloud.

We’ve heard from a number of clients recently who are looking to centralize access to their PC based accounting software, such as Quickbooks or Sage. With “fractional” positions becoming popular, in which a person serves a work function as a part time contractor, these functions are increasingly done remotely.

Providing secure remote access to a computer in your office can be painful, especially if you are a smaller organization without IT staff. This is where AWS Workspaces comes in really handy. With Workspaces, you can “rent” a dedicated Windows (or Linux) workstation for a monthly flat fee that includes the Windows 10 license, and you can add Microsoft Office for a few dollars more.

The advantages are many, such as no requirement to own the hardware or software licenses, and you can upgrade the processors, memory, and storage at the flip of a switch. AWS Workspaces provides a secure desktop client which allows you to connect to your desktop securely with an encrypted connection, and without the need for additional VPN software. 

  2. You can back up your data to the cloud.

Making backups of our data is something we all know we should do; perhaps we make manual backups on occasion, but rarely do we have automated and dependable backups in place. Ask yourself – if my place of business was inaccessible due to a fire, flood, earthquake, or Zombie Apocalypse, is all my important data safe? The answer is probably not 100%.

Defining exactly what a backup is can be convoluted, with a number of correct answers. We’ll provide one example here. AWS Storage Gateway is a service that takes your important files, such as those contained on a central server in your office or a NAS device, and automatically backs them up to cost-effective storage in the cloud.

Those backups can consist of daily snapshots that automatically rotate, giving you a “retention time” of your choosing (30 days for example). You can also create archival backups that you would only use in the case of a catastrophe. Depending on how frequently (or infrequently) you may need to access this data, you can choose the best pricing for this storage. The less frequently you need access to the data, the cheaper it is.

  3. You can host your website in the cloud.

AWS provides cloud based hosting for any manner of website, from simple WordPress websites, to more complex and highly trafficked websites and web applications. Depending on your use case, you have a couple of options.

AWS Lightsail provides a simple to launch web hosting environment with support for a number of popular website platforms. It’s also great if you need to spin up staging environments, or run self-hosted business applications.

Another great benefit of Lightsail is that it scales with your needs, from fast SSD-based storage to load balancing to handle traffic spikes. Each plan includes a static IP address, DNS management, server monitoring, SSH terminal access, and secure key management. It’s geektastic, but also designed for the rest of us.

  4. You can keep tabs on your dog while you are at work.
Object Detection – No Dogs Were Harmed…

Hey wait a sec, I thought you were talking about the benefits to businesses? We are, indulge us for a moment. You’ve probably heard some blah blah blah about Machine Learning and Artificial Intelligence lately. More than just buzzwords, it’s a real thing with practical applications. One way this tech has become popular is with Computer Vision which you’ve already used on your smartphone, namely facial recognition.

Computer Vision can be used to detect all sorts of objects and take an action in response. In our example, what if you absolutely just need to know if Fido is on the couch while you are at work? You can use AWS for that. With AWS DeepLens you can “train” Machine Learning to detect certain objects as seen below in our hilarious test footage.

We know Fido is allowed on the couch, and this same technology can be used for a lot of interesting purposes such as intruder detection, traffic monitoring, license plate recognition, or telling you when the postman came by. How, you ask?

Once an object is detected, you can configure AWS Lambda to send an SMS message to your phone. You can also use AWS Kinesis Video Streams to record the video so you can watch it online from your AWS account. There are a lot of practical uses for object recognition. How might you use Computer Vision in your business?
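A minimal Lambda handler along these lines – the phone number, label, event shape, and confidence threshold are all placeholders – wires a detection event to an SMS via Amazon SNS:

```python
def notify_owner(phone_number, label, confidence):
    """Send an SMS alert via Amazon SNS (requires AWS credentials to send)."""
    import boto3  # imported here so the handler logic can be tested offline
    sns = boto3.client("sns")
    sns.publish(PhoneNumber=phone_number,
                Message=f"Alert: detected {label} ({confidence:.0%} confidence)")

def handler(event, context):
    """Minimal Lambda handler: text the owner about confident couch sightings.

    Expects an event like {"detections": [{"label": "dog", "confidence": 0.9}]}."""
    for det in event.get("detections", []):
        if det["label"] == "dog" and det["confidence"] > 0.6:
            notify_owner("+15555550123", det["label"], det["confidence"])
    return {"status": "ok"}
```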
