Annotating data is hard. Manual annotation processes often require hours of work, juggling image sourcing, quality control and, of course, the annotation itself. While tasks such as object detection and classification can be annotated efficiently and at high quality, complex tasks such as image matting require fine-grained, precise annotations to ensure that every pixel is labeled with the appropriate alpha value. This process is time-consuming and prone to human bias, and as a result often produces low-quality labels.
Additionally, even sourcing the kind of images we want is often difficult. Perhaps your model fails on a particular subset of images where, say, the subject is wearing a certain type of glasses. Manually searching for images with that type of glasses is time-consuming and, of course, prone to generalization problems. Perhaps you scraped images from a particular product line or website, either of which may introduce hidden biases into the model. And all of this takes time.
This is where a fully synthetic data generation pipeline comes in handy. You can set up fully automated pipelines in your 3D renderer of choice and attempt to match the distributions on which your model is failing. These pipelines can be parametrized, allowing you to control the distributions from which generated images are sampled. And beyond controlling the kind of images you get, you also get highly precise, fast annotations for the generated images.
Now that we know why we’re interested in synthetic data, let's take some time to understand what it looks like in practice.
Case Study: Hanger Hooks as a Failure Case
While there are a number of relevant use cases, let us begin with a particularly simple one. As the example below shows, one of the glaring issues we noticed with our background removal model was its poor performance on the thin, metallic parts of e-commerce images featuring clothes on hangers. In particular, the model would often assign high alpha values to significant portions of the surrounding pixels, indicating confusion about the exact boundary between foreground and background.
One possible reason is that the boundary between the foreground (the metallic hanger hook) and the background is often difficult even for humans to distinguish. And upon reviewing the kind of data we had, we immediately noticed the problem: in a number of images, alpha values greater than 0 had been assigned to background pixels.
Furthermore, on consulting the annotation team, we learned that annotators were struggling to label these parts of the image. Coupled with the fact that the throughput for manual annotation was about 300-400 images/week, a new approach was clearly required.
Hence we needed a process that would allow us to create:
- Hanger images and extracted foregrounds with the true pixel colors at the blending boundaries.
- An automated process for generating hundreds of images quickly.
Synthetic Data Generation as a Solution
Setting up the pipeline
This is where synthetic data comes in handy. Using Blender, we created hundreds of hanger images closely following the target distribution that we were underperforming on. In general, we tried to set up a pipeline with the following three critical features.
- Parameterizable - We should be able to give input ranges for different object features and sample/generate images conforming to the specified distribution.
- Easy to animate - We needed a system in place with which we could create variance in lighting, position, number of shadows, etc.
- Automated - All of the above should be easy to script and easy to scale/repurpose for other data targets.
With these in mind, we began setting up a simple yet powerful synthetic data pipeline.
Our software of choice, for the time being at least, is Blender. Blender has the advantage of being open-source, ships with an easy-to-use Python scripting API, and can generate high-quality, photorealistic renders. For the purposes of this post, we have divided the process of using it into three major parts.
- Overview of the overall pipeline
- Sampling and parametrization process
- Animation and keyframing process
These three will be discussed in depth in the following sections.
Overview of the Pipeline
The pipeline itself is quite simple. While certain parts could still be improved, we have kept most of the pipeline automated.
- Asset Creation - First, we begin with a set of reference images. Our 3D designer uses these to create the kinds of textures, UVs and 3D models we expect to see.
- Rigging - These models are designed in a “riggable” fashion: essentially, they are divided into discrete parts, which are later modified on the fly by the script to generate variance in the dataset.
- Parameterization - Through scripting, we pick and apply textures, shapes and other features on the fly to generate the kinds of objects we expect to see in our target distribution.
- Rendering - Using the Cycles Rendering Engine native to Blender, we render both the 3D model as well as the foreground.
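The four stages above can be sketched as a simple chain of functions. Everything below is a hypothetical stand-in (the names and return values are illustrative only); in the real pipeline each stage drives Blender through its Python API rather than returning plain dictionaries.

```python
def load_assets(reference_textures):
    """Asset creation: select the pre-built model and textures for this sample."""
    return {"model": "hanger_v1", "textures": reference_textures}

def rig(assets):
    """Rigging: expose the discrete parts the script will later vary."""
    return {**assets, "parts": ["hook", "body", "support_bar"]}

def parameterize(scene, params):
    """Parameterization: apply sampled textures, shapes and other features."""
    return {**scene, **params}

def render(scene):
    """Rendering: in Blender this would be two Cycles renders
    (the full image plus the true-colour foreground)."""
    return {"image": scene, "foreground": scene["parts"]}

# One pass through the pipeline with a single sampled parameter:
sample = render(parameterize(rig(load_assets(["chrome"])), {"scale": 0.7}))
```

The point of structuring the pipeline this way is that each stage can be swapped or re-scripted independently when repurposing it for a new data target.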
With this overview in mind, in the upcoming sections, we will describe the details of the pipeline.
Sampling and Rigging
In this first key step, we spend time deciding on a few things.
- The number of discrete model assets that need to be created for an acceptable amount of variance.
- The textures, UVs, etc. that need to be designed for the target distribution we want to improve on.
To better understand this, let us look at the case of hangers. To generate a good amount of variance in the hanger images we would be rendering, we felt it was best to divide each hanger into three discrete parts: the hook, the body and the support bar.
With these three discrete model parts in place, we then decided which of their properties we would need to vary. Empirically, we noticed that the following parameters produced the best-looking images and, consequently, the best model performance.
- Scale - The percentage of image occupied by the subject.
- Image Resolution - Output resolution of the image.
- Hook Shape - The shape and size of the hook.
- Hook Color - The color and texture of the hook.
- Hanger Texture - The color and texture of hanger bodies.
- Background - The color, texture and alpha of the background.
- Lighting - Placement and intensity of the lighting.
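As an illustration, the sampling step for the parameters above can be sketched in plain Python. The ranges and choices below are hypothetical placeholders, not the values we used in production:

```python
import random

# Illustrative ranges/choices for the parameters listed above.
PARAM_RANGES = {
    "scale": (0.3, 0.9),             # fraction of the frame the subject occupies
    "light_intensity": (100, 1000),  # arbitrary units
}
PARAM_CHOICES = {
    "resolution": [(512, 512), (1024, 1024)],
    "hook_texture": ["chrome", "brass", "matte_black"],
    "hanger_texture": ["wood", "white_plastic", "velvet"],
    "background": ["white", "grey_gradient", "transparent"],
}

def sample_scene_params(rng):
    """Draw one full set of scene parameters from the specified distributions."""
    params = {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}
    params.update({k: rng.choice(v) for k, v in PARAM_CHOICES.items()})
    return params

rng = random.Random(42)  # seeded for reproducible datasets
params = sample_scene_params(rng)
```

Keeping the ranges in one place like this is what makes the pipeline parameterizable: matching a new target distribution mostly means editing this table.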
Animation and Keyframing
While there may be better approaches, our approach to quickly generating a good variety of colours, lighting conditions and subject placements was to animate and keyframe various parts of the synthetic hanger object. For the parameters we wanted to vary (mentioned in the previous section), we found it much easier to set values at certain keyframes and let Blender interpolate the in-between frames.
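To make the interpolation idea concrete, here is a minimal plain-Python sketch of the linear interpolation Blender performs between two keyframed values (Blender's real f-curves also support other easing modes):

```python
def interpolate(keyframes, frame):
    """Linearly interpolate a keyframed value at an arbitrary frame.

    keyframes: sorted list of (frame, value) pairs.
    """
    if frame <= keyframes[0][0]:
        return keyframes[0][1]
    if frame >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (f0, v0), (f1, v1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return v0 + t * (v1 - v0)

# e.g. a light intensity keyed at frames 1 and 25:
keys = [(1, 100.0), (25, 500.0)]
mid = interpolate(keys, 13)  # halfway between the keyframes, ≈ 300.0
```

Setting a handful of keyframes and rendering every in-between frame is what lets a single animation produce many distinct training images.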
For rendering, we needed the engine to produce two separate images.
- The actual image we’re rendering. This is the input to the model we’re training.
- The foreground object. These need to be rendered retaining their “true colour” so as to ensure that the alpha values we obtain correspond to the true foreground colours.
To generate the foreground of the image, the solution was surprisingly simple: we rendered one image with the background alpha set to 1 and another where it was set to 0. Furthermore, to minimize reflections from surrounding surfaces, those reflections were turned off for reflective object parts during the second render. In this way, we generated the hanger images. Here are a few examples below:
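A closely related, classic way to see why two renders suffice is difference matting over known white and black backgrounds. The per-pixel sketch below (grayscale, plain Python, no Blender required) is illustrative of the underlying math, not our exact implementation:

```python
# From the compositing equation I = a*F + (1 - a)*B, rendering the same
# scene over a white background (B = 1) and a black background (B = 0):
#   I_white = a*F + (1 - a)
#   I_black = a*F
# so  a = 1 - (I_white - I_black)  and  F = I_black / a  (for a > 0).

def recover_alpha(i_white, i_black):
    """Per-pixel alpha from renders over white (1.0) and black (0.0) backgrounds."""
    return 1.0 - (i_white - i_black)

def recover_foreground(i_black, alpha):
    """True foreground colour, defined wherever alpha > 0."""
    return i_black / alpha if alpha > 0 else 0.0

# A semi-transparent pixel with true colour F = 0.8 and alpha a = 0.5:
a = recover_alpha(i_white=0.9, i_black=0.4)  # ≈ 0.5
f = recover_foreground(0.4, a)               # ≈ 0.8
```

This is why the foreground must keep its true colours between the two renders: any view-dependent effect (such as reflections of the background) that changes between them would corrupt the recovered alpha, which is exactly why we disabled those reflections.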
Aside: A Quick Note on Limitations
As with everything, this approach is not without its limitations. With more control over the distribution of image properties and rendering, we run into a rather familiar problem: overfitting. One must be careful to ensure that the generation parameters are general enough that we are not “accidentally leaking” distributions from our test set into our training set. The parameters used to render each sample must therefore come from a more holistic “idea” of the distributions we expect in the real world, rather than being tuned only to the errors we observe on the test set. Otherwise, a model trained on such data may generalize poorly to other images.
Furthermore, care must be taken that the renders produced at least “loosely” resemble realistic images. The whole idea is that the model learns features from our synthetic images and can apply them to real-world images. The “gap” between the two distributions, whether caused by unrealistic images or improperly selected rendering parameters, is known as the “Real-Synthetic Gap”. Again, these issues tend to be problem-specific, and what worked for us in modelling the broader distribution we wanted to improve performance on may not generalize to all cases.
To summarize, in this blog post we briefly went over a unique problem that we encountered and our synthetic solution to it. Synthetic data generation is an efficient, reproducible and accurate approach to creating the kinds of distributions you want to see in your training data. We believe that progress in synthetic data generation opens the door to more practical use of deep learning models on real-world problems, helping everyone save time and produce better results.