Train a custom image classifier¶

Use the EfficientNet algorithm to train custom image classifiers based on new images and new labels. This reuses the ImageNet weights of the model, making the new models faster to train and giving state-of-the-art performance.

1. Input data¶

To train a custom image classifier you need labeled images. These images must be organized in a directory structure where each image is stored in a directory named after its label.

As an example we will use the hymenoptera dataset, a dataset of ant and bee images. Download it from here. The structure of the directories in the zip file is:

train
    ants
        img1.jpg
        img2.jpg
        ...
    bees
        img1.jpg
        img2.jpg
        ...
val
    ants
        img1.jpg
        img2.jpg
        ...
    bees
        img1.jpg
        img2.jpg
        ...
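
Before uploading, it can help to confirm that a local copy of the dataset follows this layout. The following is a minimal sketch, not part of the algorithm itself: the function names, the list of image extensions, and the rule that train and val must contain the same non-empty label folders are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: illustrative extensions only; the supported formats are not listed here.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_label(dataset_root):
    """Return {split: {label: image_count}} for a train/val layout."""
    root = Path(dataset_root)
    counts = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        counts[split] = {
            label_dir.name: sum(
                1
                for f in label_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
            for label_dir in sorted(split_dir.iterdir())
            if label_dir.is_dir()
        }
    return counts


def check_layout(dataset_root):
    """Raise ValueError if the splits disagree on labels or a label has no images."""
    counts = count_images_per_label(dataset_root)
    if set(counts["train"]) != set(counts["val"]):
        raise ValueError("train and val contain different label directories")
    for split, labels in counts.items():
        empty = [label for label, n in labels.items() if n == 0]
        if empty:
            raise ValueError(f"{split} has label directories without images: {empty}")
    return counts
```

Calling `check_layout("hymenoptera_data")` on the unzipped dataset returns the per-label image counts, or raises if the structure does not match.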

Upload this directory structure to an S3 bucket, e.g. s3://my-bucket/hymenoptera_data/.
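
The upload can be done from the console or any S3 tool. As one possible approach, the sketch below maps each local file to an S3 key that preserves the directory structure and then uploads it with boto3. The function names are hypothetical, and `upload_dataset` assumes boto3 is installed and AWS credentials are configured.

```python
from pathlib import Path


def plan_uploads(local_root, s3_prefix):
    """Map each file under local_root to the S3 key it would be uploaded to."""
    root = Path(local_root)
    prefix = s3_prefix.strip("/")
    return [
        (path, f"{prefix}/{path.relative_to(root).as_posix()}")
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]


def upload_dataset(local_root, bucket, s3_prefix):
    """Upload every planned file; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so plan_uploads works without boto3

    s3 = boto3.client("s3")
    for path, key in plan_uploads(local_root, s3_prefix):
        s3.upload_file(str(path), bucket, key)
```

For the example bucket, `upload_dataset("hymenoptera_data", "my-bucket", "hymenoptera_data")` would place the images under s3://my-bucket/hymenoptera_data/train/... and s3://my-bucket/hymenoptera_data/val/....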

- Log in to AWS as a user with administrative privileges
- Navigate to the EfficientNet Algorithm listing on the AWS Marketplace
- Click Continue to Subscribe
- Click on Accept offer (it might take 1 or 2 minutes for AWS to accept the offer)
  - Note that there is no charge for subscribing to this offering; charges apply only when launching the Algorithm on SageMaker
- Once you are subscribed, click Continue to Configuration
- On the Configure and launch page
  1. Select the latest version (2.0.0) and the region where you want to launch the Algorithm
  2. Click on View in SageMaker

You will be sent to the Amazon SageMaker Create training job page.

3. Launch Training Job¶

On the Create training job page:

- Name your job, e.g. ants-and-bees
- Select an IAM role that has access to S3 (where the data is stored) and SageMaker
- Check that under Algorithm source the selected option is: An algorithm subscription from AWS Marketplace
- Check that under Choose an algorithm subscription the selected option is: EfficientNet - Train Image Classifier
- Under Resource configuration
  1. Select an instance type that has a GPU; the minimum recommended is ml.p3.2xlarge
  2. Under Additional storage volume per instance (GB) add enough space for the model and checkpoints to be stored, for example: 10 GB
- In the Hyperparameters section configure the hyperparameters for the model; see below for a table of available parameters
- In the Input data configuration section two channels are already created, train and val (both channels are required)
  1. Fill in the S3 location for each channel
     - For example: s3://my-bucket/hymenoptera_data/train and s3://my-bucket/hymenoptera_data/val
- In the Output data configuration section set the S3 location where the output model will be saved
  - For example: s3://my-bucket/ants-bees-output
- Click on Create Training Job

A new Training Job will be created. Depending on the data and hyperparameters, it might take a while for this job to finish.
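
The same job could in principle be created programmatically. The sketch below builds a request for the SageMaker `create_training_job` API that mirrors the console fields above. It is an illustration under assumptions, not the documented workflow: the function names are hypothetical, `algorithm_arn` stands for the ARN of your Marketplace subscription (not given here), and the `File` input mode, the one-day runtime limit, and network isolation are assumed defaults.

```python
def build_training_job_request(
    job_name,
    role_arn,
    algorithm_arn,
    data_uri,
    output_uri,
    hyperparameters=None,
    instance_type="ml.p3.2xlarge",
    volume_size_gb=10,
    max_runtime_seconds=86400,  # assumption: one-day limit
):
    """Build the keyword arguments for SageMaker's create_training_job call."""
    base = data_uri.rstrip("/")
    channels = [
        {
            "ChannelName": name,  # both train and val channels are required
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": f"{base}/{name}",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
        for name in ("train", "val")
    ]
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "AlgorithmName": algorithm_arn,
            "TrainingInputMode": "File",  # assumption
        },
        # The API expects every hyperparameter value as a string.
        "HyperParameters": {k: str(v) for k, v in (hyperparameters or {}).items()},
        "InputDataConfig": channels,
        "OutputDataConfig": {"S3OutputPath": output_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": volume_size_gb,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
        "EnableNetworkIsolation": True,  # assumption for Marketplace algorithms
    }


def submit(request):
    """Send the request; needs boto3 and valid AWS credentials."""
    import boto3

    return boto3.client("sagemaker").create_training_job(**request)
```

With the example values, `data_uri` would be s3://my-bucket/hymenoptera_data and `output_uri` would be s3://my-bucket/ants-bees-output; hyperparameter names should be taken from the parameter table rather than guessed.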

After the job is finished you can deploy a new endpoint for the custom trained model.

Output¶

When the Training Job succeeds, a new model is saved to the S3 location you specified. The saved file is a .tar.gz archive containing the best model alongside some metadata.

File	Description
`args.yml`	Arguments used to train the model
`labels.json`	JSON map of ID to label
`model_best.pth.tar`	Serialized version of the best model found
`summary.csv`	Metrics saved for each epoch
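
Once the archive has been downloaded, the metadata files can be inspected without unpacking everything. The sketch below reads `labels.json` and `summary.csv` directly from the .tar.gz. The function name is hypothetical, and the code makes no assumption about the column names in `summary.csv`; it simply returns each row as a dictionary keyed by whatever header the file contains.

```python
import csv
import io
import json
import tarfile
from pathlib import PurePosixPath


def read_training_artifacts(archive_path):
    """Return (labels, summary_rows) read straight from the output archive."""
    labels, summary_rows = None, []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            # Match on the base name so "./labels.json" and "labels.json" both work.
            name = PurePosixPath(member.name).name
            if name == "labels.json":
                labels = json.load(archive.extractfile(member))
            elif name == "summary.csv":
                text = archive.extractfile(member).read().decode("utf-8")
                summary_rows = list(csv.DictReader(io.StringIO(text)))
    return labels, summary_rows
```

`labels` is the ID-to-label map (JSON object keys are always strings), and `summary_rows` holds one dictionary per epoch.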

Metrics¶

The training job outputs 4 model metrics to track performance while the job is running and after it finishes.

Metric	Description
`train:loss`	Loss in the training dataset
`validation:loss`	Loss in the validation dataset
`validation:acc_1`	Accuracy on the top-1 label
`validation:acc_5`	Accuracy on the top-5 labels
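
Besides the console, the final metric values can be read from the SageMaker `describe_training_job` response, which reports them under `FinalMetricDataList`. The following is a small sketch with hypothetical function names; `fetch_final_metrics` assumes boto3 and AWS credentials are available.

```python
def final_metrics(describe_response):
    """Return {metric_name: value} from a DescribeTrainingJob response dict."""
    return {
        item["MetricName"]: item["Value"]
        for item in describe_response.get("FinalMetricDataList", [])
    }


def fetch_final_metrics(job_name):
    """Look up a job by name and return its final metrics; needs boto3."""
    import boto3

    response = boto3.client("sagemaker").describe_training_job(
        TrainingJobName=job_name
    )
    return final_metrics(response)
```

For example, `fetch_final_metrics("ants-and-bees").get("validation:acc_1")` would return the last reported top-1 accuracy, or `None` if the job has not emitted that metric yet.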

Common Errors¶

If the Training Job fails, check the job's logs on CloudWatch to see why it failed.

Some common errors are insufficient space on the device (increase the storage volume) or not enough resources on the GPU (try reducing the batch size).
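
The logs can also be pulled with a script. The sketch below collects a job's log messages through the CloudWatch Logs `filter_log_events` call, following pagination tokens. It assumes the standard SageMaker log group `/aws/sagemaker/TrainingJobs`, where stream names start with the job name; the function name is hypothetical, and the client is passed in so the function can be exercised without AWS access.

```python
LOG_GROUP = "/aws/sagemaker/TrainingJobs"  # assumption: default SageMaker log group


def fetch_log_messages(logs_client, job_name):
    """Return all log messages for a training job, following pagination."""
    messages = []
    kwargs = {
        "logGroupName": LOG_GROUP,
        "logStreamNamePrefix": f"{job_name}/",
    }
    while True:
        page = logs_client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return messages
        kwargs["nextToken"] = token
```

With boto3 installed, `fetch_log_messages(boto3.client("logs"), "ants-and-bees")` returns the messages as a list of strings that can be searched for out-of-space or out-of-memory errors.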

Also check that the S3 paths are correct. The job outputs some debug information to make this easy. The /opt/ml directory where these files are located should look like this: