As the field of AI advances, so do the tools, platforms, and means we use to train such revolutionary (or sometimes silly) systems. In the beginning, like most things, we developed our first kiddie models locally, on a computer that was probably rocking a humble processor. This setup worked perfectly, at least for training small and shallow things such as random forests or regressions. Then, as things started to get complicated and to deepen (ha!), so did our requirements and the computing power needed to train our models.

As a consequence of these needs, many providers have taken on the task of creating services that supply such high computing power. Nanonets is one of them. It offers what is called Machine Learning as a Service, or a Machine Learning API: essentially an ML platform that facilitates the training of an ML model by doing most of the heavy lifting for the user.

Nanonets ready to use models

With Nanonets, you don’t even have to worry about image processing or deep learning models at all! Among the solutions the service offers is a collection of “ready to use models” for use cases like NSFW classification, animal detection, general object detection, OCR, and more. Using these models is pretty straightforward and doesn’t require much on our part. To get a prediction, you simply make an API call with the image you want to evaluate as the payload. The result is a friendly and tidy JSON with the predictions that the magic of artificial intelligence has brought upon us.

Building custom models with Nanonets

But as cool as this sounds, these are not the models we’ll be focusing on today. In this article and the tutorial I’m about to present, we will use the “create your own model” feature offered by Nanonets. This solution allows the user to quickly create a custom model by just uploading the training data (not always, as we’ll see) and labeling it. Through this tutorial, I’ll show you step by step how to build an image classification model using Nanonets’ graphical user interface (GUI) to detect cracks in mobile device screens (sorry if I’m bringing back some sad memories of that time you cracked yours). Then, to try the model and to give it a reason to be, I’ll build an Android app in which a user takes pictures of mobile phones, and the app talks to the model through the Nanonets API to tell whether the screen is cracked or not. Let’s begin!

You can visit this GitHub repository to try out the Nanonets Crack Detection demo.


We’ll start this tutorial by creating a free Nanonets account. To do so, go to https://app.nanonets.com/login#/, enter your email, and voilà (no credit card or long form is required). With a free account, you’ll be able to access most of the platform’s features and perform 1K monthly API calls, which is more than enough for what we need. Once the account is created, log in to it, and you’ll be greeted by the app’s control panel.

Train a new model

Add labels

In the GUI’s main screen, select “New Model” to create a model. This could be one of the ready-to-use ones previously discussed or a completely new and custom one built from scratch. In our case, we’re interested in a new “Image Classification” model, so click on that one to start building our model.

Click here to start the adventure.

The very first thing we have to do is specify the names of our categories or labels, and since we are dealing with broken screens and mobiles, I went with “broken_mobile_screen” and “mobile_screen.” After that, click on “CREATE MODEL” to proceed to the next part.

My labels

Upload images

In the next screen, you’ll be asked to upload the training set, which has to be made up of at least 25 images per label. However, you don’t necessarily have to upload anything yourself. Nanonets provides a “search the web for images” feature that fetches some photos related to the labels’ names, so we can start the training right away without having to spend the whole evening looking for phone images. Talk about convenience, right? (I want to make a small note here. You might have thought that my labels’ names are a bit unconventional; “broken_mobile_screen” and “mobile_screen” could simply be “broken” and “not_broken,” right? The reason why I didn’t use these is that Nanonets would have returned a bunch of images of random broken things, so I had to be a bit more specific.) To proceed, go ahead and click the “SEARCH THE WEB FOR IMAGES” button.

Training set

This is what some of the automatically retrieved images look like. On the broken side, you see what indeed are phones with cracked screens. On a more positive note, the pictures of “mobile_screen” feature a mix of phone screens that are on, off, or showing the app drawer, and even an artistic one with a neon green color. However, these aren’t the kind of images I had in mind.

For the “mobile_screen” category, I had thought of images similar to the broken phones, but with a working (preferably off) screen. With the images we got, we risk building a model that wrongly learns that an “on” phone is “mobile_screen,” while an “off” one, regardless of whether it is broken or not, is “broken_mobile_screen.” But this is fine, and I’m not going to worry so much about it. After all, we’re just testing and experimenting, so that’ll do for now. For the time being, let us proceed by clicking on “UPLOAD IMAGES”; we also have the option of improving the training by uploading our own images, but let’s ignore that. Lastly, once they have been uploaded, click on “TRAIN” to commence the magic.

This part is where Nanonets shines. With a single click, the training will start. Meanwhile, you, the user, didn’t have to code anything, tune the model, or spend hours tweaking layers, nodes, and all the other details involved in training a neural network. So, while the model is being fitted, take a small break, grab a coffee or a glass of water, and come back later to see the result. But don’t take too long! The training should take around 2-3 minutes since our dataset is pretty small. But in case your coffee date turns out to be longer than expected, do not despair, because Nanonets will send you an email telling you the training has finished.

Beep boop. Training in progress.

Model Metrics

After the training (and at any step during it) you’ll be able to measure the model’s performance through four different tabs or metrics: Progress, ROC, Confusion Matrix, and Correct Incorrect. For simplicity’s sake, I’ll skip the ROC one (sorry!). In the first tab, “Progress,” you’ll find a line chart that shows you the model’s accuracy at every step of the training. Furthermore, the graph displays the performance of each experiment, that is, the different trials the model went through. In my case (shown in the image below), you can see that my model was fitted eight different times, and in all of them, the results were pretty satisfactory, especially experiment number 8, which at some point reached an accuracy of 100%.

Then, there’s the confusion matrix of the results obtained during the model’s evaluation. To refresh the memory a bit, and for those who aren’t so clear about what a confusion matrix (also known as an error matrix) is, I’ll say that it is a table that describes the classifier’s performance. If we read from left to right, top to bottom, the first cell indicates the true positives, which is the number of broken phone images that were correctly classified as “broken_mobile_screen.” To its right are the false positives, or the non-broken phones misclassified as broken. Its value is 0 (hooray!). Then, we have the number of false negatives, or the broken phones labeled as “mobile_screen.” Again, the score is 0. Lastly, there is the true negative count, which is the non-broken phones correctly classified. So, how should we interpret this? Since both the false positives and false negatives (also known as Type I and Type II errors) are zero, we can agree that every test example was correctly classified, implying that our model is performing well (at least on our small data sample).
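To make the four cells a bit more concrete, here is a minimal Kotlin sketch that turns them into the usual summary metrics. The `ConfusionMatrix` class and the counts passed to it are hypothetical and only for illustration (the article doesn’t report the size of the test set); “broken_mobile_screen” is treated as the positive class.

```kotlin
// Hypothetical helper: the four cells of a binary confusion matrix,
// with "broken_mobile_screen" treated as the positive class.
data class ConfusionMatrix(
    val truePositives: Int,   // broken phones predicted as broken
    val falsePositives: Int,  // intact phones predicted as broken (Type I error)
    val falseNegatives: Int,  // broken phones predicted as intact (Type II error)
    val trueNegatives: Int    // intact phones predicted as intact
) {
    private val total = truePositives + falsePositives + falseNegatives + trueNegatives

    // Share of all test images that were classified correctly.
    val accuracy: Double
        get() = (truePositives + trueNegatives).toDouble() / total

    // Of the images predicted as broken, the share that really were broken.
    val precision: Double
        get() = truePositives.toDouble() / (truePositives + falsePositives)

    // Of the images that really were broken, the share the model caught.
    val recall: Double
        get() = truePositives.toDouble() / (truePositives + falseNegatives)
}

fun main() {
    // Illustrative counts only: zero false positives and zero false negatives.
    val matrix = ConfusionMatrix(
        truePositives = 6, falsePositives = 0, falseNegatives = 0, trueNegatives = 5
    )
    println("accuracy=${matrix.accuracy} precision=${matrix.precision} recall=${matrix.recall}")
}
```

With both error cells at zero, all three metrics come out as 1.0, which is the “every test example was correctly classified” situation described above.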

The last of the metrics tabs, “Correct Incorrect,” simply shows the testing set images and their labels. In our case, since all our predictions match the ground truth, it doesn’t present any incorrect classification. The next picture contains some of the outcomes.

Testing the model

Now that the training is done, and we concluded that the metrics are good (do you agree?), it’s time to test the model. Luckily, we don’t have to go far since we can check it right here in the web UI. If you take a look at the left part of the GUI, you’ll find in the last section, “Explore Model,” a “Test” tab. Go to it. Here you’ll notice a record of the images you’ve sent to the API and their responses. Sure, right now it’s empty because we haven’t done anything yet, but after we build the app, let’s come back to it to see how it has changed. But for now, what I want to show is the button at the end that says “UPLOAD IMAGES.” If you click on it, you’ll be prompted to upload an image that the model will evaluate. Let’s try it using the picture of the broken phone below, so save it locally and then upload it.

Certainly broken. Photo created by Freepik

Once the image is uploaded, the model will infer its class and present the result in the square on the right labeled “JSON RESPONSE.” In my case, the prediction confidence for “broken_mobile_screen” was 0.985, while the other was 0.147. Great!

So far, I’ve gone through the main steps required to build a custom, from-scratch model in Nanonets using their interface. With only a couple of clicks and a few things typed in, we defined our labels, created a training set, trained the model, evaluated it, and tested it. For the second part of the tutorial, I’ll take what we’ve done a step further and integrate the model into an Android app so we can use it and infer from it using an API call. However, while I’ll be showing the code as we proceed and explaining its main functionality, I won’t scrutinize every single line because otherwise, this article would be a thousand pages long.

The Android app

The app we’ll create is essentially an Android camera application that will send the pictures it takes to your model and display the output, which is the same kind of output we saw in the previous JSON response field. Being an Android app means that we need to install Android Studio to develop it, so please download the IDE (it also includes the SDK) if you wish to follow the tutorial hands-on. Also, I’ll be writing the app in Kotlin. Before we start, let me show you what the app looks like.

Don't know the final link yet.

This is the app’s main screen. Nothing wow. Just a title, a link to this article, another to the project’s repo, and a button that takes us to the camera screen.

Again, not impressive at all. Just a preview of what the camera sees, and its shutter button. After pressing said button, the image will be sent to the service, and while the model thinks and decides the fate of this poor phone, you’ll see a sort of “predicting...” popup.

Lastly, if everything worked as planned, the popup will display the prediction’s result.

Now, a GIF of me using the app, and another one from the app itself.

Cool, right? So, how do we build this? For starters, open Android Studio, click on “Create New Project,” select “Empty Activity,” click the “Next” button, and give the project a name (I’m calling mine BrokenScreen) and a save location. Make sure the language is Kotlin, and leave the rest as it is. Then, click “Finish.” After that, the project will be created, and you’ll see what’s called the “MainActivity,” or the app’s main screen, an activity automatically created by Android Studio. Besides this, if you go to the res/layout directory, you’ll see that a layout file named “activity_main.xml” has also been created; this layout file describes the structure or look of the activity.

Creating a new project in Android Studio

The changes we’ll make to this activity are pretty minimal. First of all, right at the start, we’ll check whether the camera and storage permissions the app needs have been granted. Then, we’ll add a link to this article, another to the project’s repository, and a button that will take us to the Camera Activity. The code below is the Kotlin part, and the subsequent one is the edited page’s layout (XML).

Also, before we proceed further, we need to import into the project an external library named OkHttp, which we’ll use to perform the REST calls to the API. To add the package, open the app’s Gradle file (Gradle is a build-automation system that also handles the project’s dependencies), and inside the “dependencies” block, add this line:

implementation 'com.squareup.okhttp3:okhttp:3.10.0'

If you’re actually following the tutorial and replicating it, you’ll realize that Android Studio is complaining because it can’t find the “CameraActivity” class. But don’t despair, because that’s what we’ll create now. So, go to the left part of the screen, to the directory where the MainActivity is (probably under the “BrokenScreen” directory), right-click on it, then select “New” -> “Activity” -> “Empty Activity,” name it “CameraActivity,” and click “Finish.” Again, this will create both a barebones activity and its layout.

As the name implies, the “CameraActivity” is the app’s main screen and where you’ll spend most of your time. At its initialization, in the function named onCreate(), we’ll create our camera instance, followed by setting some parameters such as FOCUS_MODE_CONTINUOUS_VIDEO to focus continuously, and changing the camera’s orientation to portrait; if you wish, you can play around with these settings to change the camera’s behavior. This is how the code should look so far:

package com.example.brokenscreen

import android.hardware.Camera
import android.support.v7.app.AppCompatActivity
import android.os.Bundle
import android.os.Environment
import android.provider.MediaStore.Files.FileColumns.MEDIA_TYPE_IMAGE
import android.support.v7.app.AlertDialog
import android.util.Log
import android.widget.Button
import android.widget.FrameLayout
import okhttp3.*
import java.io.File
import java.io.FileNotFoundException
import java.io.FileOutputStream
import java.io.IOException
import java.text.SimpleDateFormat
import java.util.*
import okhttp3.RequestBody
import okhttp3.MultipartBody
import okhttp3.OkHttpClient
import org.json.JSONObject
import java.lang.StringBuilder


class CameraActivity : AppCompatActivity() {
    private var mCamera: Camera? = null
    private var mPreview: CameraPreview? = null
    private val TAG = "CameraActivity"

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_camera)

        // Create an instance of Camera (may be null)
        mCamera = getCameraInstance()

        // Configure the camera only if we actually obtained one,
        // instead of forcing a non-null value with !!
        mCamera?.let { camera ->
            val params = camera.parameters
            // Focus continuously when the device supports it
            if (params.supportedFocusModes.contains(Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO)) {
                params.focusMode = Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO
            }
            camera.parameters = params

            // Rotate the preview to portrait
            camera.setDisplayOrientation(90)
        }

        ...

    }
}

This part here is only about the camera functionality itself, and it doesn’t cover anything related to what we, the end-users, actually see on the screen. So, to solve this, we need a sort of camera preview, which in the app is a “container” that “holds” what the camera sees and shows it on the screen. So, in the same directory we are currently in, right-click, select “New” -> “Kotlin File/Class,” and name it CameraPreview. This is what the class looks like (most of the code is taken from this Android official tutorial).

Now, with the preview created, we just need to attach it to our Camera activity, like this:

mPreview = mCamera?.let {
    // Create our Preview view
    CameraPreview(this, it)
}

Now to the good part. So far, we have built a pretty useless camera app that doesn’t even take pictures. To turn this horrible situation around, we need something that will be called after the shutter button is pressed (I’ll show the button soon). This something is a callback function, a function passed to another one as an argument. Our callback will be a parameter of the camera’s takePicture() function, which I’ll create after explaining the content of the callback.
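If callbacks are new to you, the pattern can be sketched in plain Kotlin without any Android code. `FakeCamera` below is a hypothetical stand-in invented for this illustration, and its single-argument `takePicture()` is deliberately simpler than the real camera’s function; the point is only that the function receives another function and calls it once the picture data is ready.

```kotlin
// Hypothetical stand-in for the real camera, used only to illustrate the callback pattern.
class FakeCamera {
    // "Captures" a picture and hands its bytes to whichever callback the caller supplied.
    fun takePicture(onPictureTaken: (ByteArray) -> Unit) {
        // The first three bytes of a JPEG file, standing in for real image data.
        val jpegBytes = byteArrayOf(0xFF.toByte(), 0xD8.toByte(), 0xFF.toByte())
        onPictureTaken(jpegBytes)
    }
}

fun main() {
    // The callback is an ordinary value, so it can be stored in a variable and passed around.
    val mPicture: (ByteArray) -> Unit = { data ->
        println("Received ${data.size} bytes")
    }

    // The camera decides when to invoke the callback; we only say what should happen.
    FakeCamera().takePicture(mPicture)
}
```

In the app, the body of the callback is where the saving, the API call, and the popup updates will live.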

The callback I’ll define is named mPicture, and it takes as a parameter the raw data of the image just taken. At the top of the function, I’ll create and display the “Predicting...” popup I previously showed in the screenshots, so the user is aware that there’s stuff going on. The first of this stuff involves saving the image to the app’s directory, which is located in the device’s default documents directory and named “BrokenScreen.” Once the image is saved, the next step is building the POST request we’ll send to the API; the components of this request are the image data, the model ID, and a basic access authentication header whose key can be found in the “API Keys” tab of the GUI.

API Keys tab
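As a rough idea of how those three components fit together, here is a minimal sketch of the request written against OkHttp 3.10.0, the version added to the Gradle file. Several details are assumptions on my part rather than something stated in this article: the endpoint URL, the form field names “modelId” and “file,” and the convention of sending the API key as the basic-auth username with an empty password. The function name `buildPredictionRequest` is also made up for this example, so check everything against the snippet in your own dashboard before relying on it.

```kotlin
import okhttp3.Credentials
import okhttp3.MediaType
import okhttp3.MultipartBody
import okhttp3.Request
import okhttp3.RequestBody
import java.io.File

// Assumed endpoint for image classification models; verify it in the Nanonets GUI.
const val NANONETS_URL = "https://app.nanonets.com/api/v2/ImageCategorization/LabelFile/"

// Builds (but does not send) the multipart POST request for one image.
fun buildPredictionRequest(image: File, modelId: String, apiKey: String): Request {
    // Multipart form body carrying the model ID and the image file (field names assumed).
    val body = MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addFormDataPart("modelId", modelId)
        .addFormDataPart(
            "file",
            image.name,
            RequestBody.create(MediaType.parse("image/jpeg"), image)
        )
        .build()

    return Request.Builder()
        .url(NANONETS_URL)
        // Basic access authentication: API key as the username, empty password (assumed).
        .header("Authorization", Credentials.basic(apiKey, ""))
        .post(body)
        .build()
}

fun main() {
    // Placeholder values; the file is only read when the request is actually executed.
    val request = buildPredictionRequest(File("phone.jpg"), "YOUR_MODEL_ID", "YOUR_API_KEY")
    println("${request.method()} ${request.url()}")
}
```

Sending it is then a matter of passing the request to an `OkHttpClient`, for example with `client.newCall(request).execute()` on a background thread.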

I’ll take a brief pause to tell you some news you’ll like. If you’re fortunate enough, you won’t have to build the request by yourself because Nanonets provides code samples with the required fields filled in. To access them, go to the “Integrate” tab, select the language of choice, and copy/paste the sample into your codebase. Why do I say “if you’re lucky”? That’s because Nanonets doesn’t provide a snippet for every language. For instance, Kotlin, the language we’re using, is not one of them. Java, however, is included, and if you paste Java code into a Kotlin project in Android Studio, the IDE will offer to convert it to Kotlin. Moreover, the provided code comes with the required API key, so we can also skip that part.

Let's integrate!

Following the creation of the request, in the next steps, we’ll perform the API call and read its response. Assuming everything worked as planned (no timeout, lack of internet, or mysterious acts of nature), the response will be the same JSON we previously saw. But we won’t just show an ugly JSON. Of course not! Instead, we’ll carefully parse it and iterate through each label to build a string we’ll show in the dialog. This whole process of calling the API is performed in a background thread to avoid blocking the app while we wait for the answer. That’s the end of the callback.
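The parsing step could look roughly like the sketch below, which uses the org.json classes that ship with Android (on a plain JVM they come from the org.json:json artifact). The response layout, a “result” array whose first element holds a “prediction” array of “label”/“probability” pairs, is my assumption about the JSON shown in the “JSON RESPONSE” box, and `formatPredictions` is a name invented for this example, so adjust the keys to whatever your model actually returns.

```kotlin
import org.json.JSONObject

// Turns the service's JSON response into a short, human-readable summary for the dialog.
// The keys "result", "prediction", "label", and "probability" are assumed, not confirmed.
fun formatPredictions(responseBody: String): String {
    val predictions = JSONObject(responseBody)
        .getJSONArray("result")
        .getJSONObject(0)
        .getJSONArray("prediction")

    // One line per label, e.g. "broken_mobile_screen: 0.985".
    val summary = StringBuilder()
    for (i in 0 until predictions.length()) {
        val prediction = predictions.getJSONObject(i)
        summary.append(prediction.getString("label"))
            .append(": ")
            .append(prediction.getDouble("probability"))
            .append('\n')
    }
    return summary.toString().trimEnd()
}

fun main() {
    // Sample payload built from the confidences reported earlier in the article.
    val sample = """
        {"message":"Success","result":[{"message":"Success","prediction":[
          {"label":"broken_mobile_screen","probability":0.985},
          {"label":"mobile_screen","probability":0.147}]}]}
    """.trimIndent()
    println(formatPredictions(sample))
}
```

Because `getJSONArray` and `getString` throw a `JSONException` when a key is missing, a real app would wrap the call in a try/catch and show an error message in the dialog instead.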

The only remaining detail we need to address is assigning to the shutter button a click listener that will trigger every time we press it. Inside the listener, we’ll call the aforementioned takePicture() function with the newly created mPicture callback as its third parameter (the first two are just null). The following code is the complete “CameraActivity,” and right after it is the page’s layout, which contains the preview’s container and the shutter button.

To wrap up the tutorial, we need to add some changes to the project’s Android Manifest, a file that describes several essential parts of the app, including the required permissions and the activities present. In this file, which you can find in app/src/main/AndroidManifest.xml, we’ll add the permissions we need – camera, internet, and storage – and the activities that comprise the app. The file below shows the final manifest.

The board is finally set, and we’re ready to launch the next billion-dollar app. So, if you’d like to be rich, please press the green “Play” button located on the top bar of Android Studio to start the app either on an emulator (I don’t recommend this) or on your phone. If it is the first time you run the app, it will request the storage and camera permissions right away (I hope you accept). Now, touch the “Go!” button and show your friends how cool you are now that you’re doing cloud and AI stuff straight from your phone. And remember, if you wish to see the predictions you’ve performed, go to the “Test” tab, and you’ll find them. Enjoy and happy predicting!

Recap and conclusion

In this article, we’ve learned how we can train a custom model using Nanonets and how to integrate it into an Android app. During the first part, I took you on a tour through the Nanonets GUI and showed step by step how you can easily create a model by clicking a few buttons and adding two labels. Then, to test the system, I integrated it into a small Android camera app that sends an image to the model and displays the predictions it returns. Nonetheless, there’s a serious flaw in the app: the user can take a picture of anything they want. For the purposes of this tutorial, that’s totally fine, and I’ll accept this oversight. However, if you wish to make this an actual billion-dollar app, you should check beforehand whether the image taken contains a phone. But that’s a story for another day.

The tutorial’s complete source code is available at: https://github.com/juandes/broken-screen

For any questions, doubts or existential issues, leave me a comment either here, or on my Twitter, at https://twitter.com/jdiossantos.

Thanks for reading :)