
Hawaiify Me - Machine Learning

Academic paper: [PDF]

Hawaiify Me presents a proof-of-concept prototype for a tool that replaces shirts with Hawaiian shirts. A user submits an image of themselves wearing a regular shirt, and Hawaiify Me generates a new Hawaiian-style shirt to replace it. Hawaiify Me combines several existing tools and frameworks: pix2pix for image-to-image translation, VIA for image labeling, and Mask R-CNN for generating masks from labeled images. The purpose of this project is simply to be a fun experiment making use of the current tools available.

TASKS

Programming, Machine Learning

DATE

2019

hwaii.png

Hawaiify Me asks and answers the question "What if I were wearing a Hawaiian shirt in this picture?" by generating bespoke Hawaiian shirts on demand. Hawaiify Me differs from style transfer in that it entirely masks out the bogus old regular shirt and generates a totally tubular Hawaiian shirt in its place. This allows it to introduce shirt details such as collars in addition to patterns, as opposed to merely overlaying a pattern on a given image. Radical! *does a kickflip*

stock1.png
stock2.png
FEET.png

To create a Hawaiian shirt generating AI, I needed to train it on images of Hawaiian shirts. I used an automated scraper to download thousands of images from royalty-free stock image websites by searching the term "Hawaiian shirt". Since I needed many images of a person wearing a Hawaiian shirt to initiate training, many images had to be manually filtered out for various reasons: images that did not contain a Hawaiian-shirt-wearing person as the main focus, or at all, and images where the Hawaiian shirt was too small or not distinct. The remaining images were then resized to a uniform 256x256.
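The normalisation step above can be sketched in a few lines. This is a minimal pure-Python nearest-neighbour resize, just to show the idea; the actual pipeline would use an image library such as Pillow, and images here are simplified to lists of rows of pixel values.

```python
# Toy sketch of normalising every image to a uniform square size using
# nearest-neighbour sampling. Pure Python for illustration only; a real
# pipeline would use a library resize (e.g. Pillow's Image.resize).

def resize_nearest(image, size):
    """Resize a 2-D image (list of rows) to size x size via nearest neighbour."""
    src_h = len(image)
    src_w = len(image[0])
    out = []
    for y in range(size):
        src_y = min(int(y * src_h / size), src_h - 1)
        row = []
        for x in range(size):
            src_x = min(int(x * src_w / size), src_w - 1)
            row.append(image[src_y][src_x])
        out.append(row)
    return out

# Example: a 4x4 image squashed to 2x2 (in the project, size would be 256).
img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
small = resize_nearest(img, 2)  # → [[1, 2], [3, 4]]
```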

With my dataset of Hawaiian shirts I could train an AI. I used a generative adversarial network (GAN) based on the pix2pix framework (Isola et al., 2016). A GAN trains two neural networks against each other: the Generator, which blindly generates content, and the Discriminator, which holds a set of training data that the Generator tries to emulate. These neural nets effectively play many games of "hot & cold" between themselves, producing a new "model" for each iteration of the Generator that plays.
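The adversarial "hot & cold" loop can be illustrated with a deliberately tiny toy, not the project's actual pix2pix training: a 1-D GAN where the "real data" clusters around 5.0, the Generator is a line, and the Discriminator is a single logistic unit, with gradients written out by hand. All numbers and learning rates here are illustrative assumptions.

```python
import math
import random

# Toy 1-D GAN: the Generator g(z) = a*z + b tries to produce samples near the
# real data (values around 5.0); the Discriminator d(x) = sigmoid(w*x + c)
# tries to tell real from fake. Each side nudges its parameters against the
# other — the "hot & cold" game described above.

random.seed(0)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

a, b = 1.0, 0.0   # Generator parameters (starts generating near 0)
w, c = 0.1, 0.0   # Discriminator parameters
lr = 0.02

for step in range(2000):
    z = random.uniform(-1, 1)
    real = random.gauss(5.0, 0.5)
    fake = a * z + b

    # Discriminator step: push d(real) toward 1 and d(fake) toward 0.
    s_real = sigmoid(w * real + c)
    s_fake = sigmoid(w * fake + c)
    w += lr * ((1 - s_real) * real - s_fake * fake)
    c += lr * ((1 - s_real) - s_fake)

    # Generator step: push d(fake) toward 1 (non-saturating loss).
    s_fake = sigmoid(w * fake + c)
    grad_fake = (1 - s_fake) * w   # gradient of log d(fake) w.r.t. the sample
    a += lr * grad_fake * z
    b += lr * grad_fake

# b is g(0), the centre of the generated samples; it should drift toward 5.0.
```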


 

earlytest.png
movieman.png

For my initial experiments to create a Hawaiian shirt generating AI, I would personally blank out shirts in a flat colour. This left the shirt as a blank canvas (real_A) for the Generator to create an image on (fake_B), with the Discriminator giving feedback based on the original image (real_B).
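The manual blanking step amounts to a per-pixel operation: given a mask marking the shirt, paint those pixels a flat colour to form the blank canvas while keeping the untouched photo as the training target. The sketch below is an assumption of how such a pair could be built (pix2pix's A/B naming is real; the flat-grey fill value and the tiny images are illustrative).

```python
# Build a pix2pix-style training pair: real_A is the photo with the shirt
# blanked to a flat colour, real_B is the original photo the Generator's
# output (fake_B) will be judged against.

FLAT = 128  # flat grey standing in for the blanked-out shirt

def make_pair(image, mask, fill=FLAT):
    """Return (real_A, real_B): blanked input and untouched target."""
    real_A = [
        [fill if mask[y][x] else image[y][x] for x in range(len(image[0]))]
        for y in range(len(image))
    ]
    return real_A, image

photo = [[10, 20], [30, 40]]   # original photo
shirt = [[0, 1], [1, 1]]       # hand-drawn shirt mask
real_A, real_B = make_pair(photo, shirt)  # real_A → [[10, 128], [128, 128]]
```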

Wanting to automate the blanking of shirts, I swapped the blank and original images, hoping to create a model capable of blanking out shirts itself. However, it started interfering with the rest of the image, such as the man's tongue. Not wanting the AI to interfere with the rest of the image, I sought another option.

figure2.png
figure3.png

1. VIA outlining tool interface and outlined shirt. 2. Generated mask. 3. Original image beside mask for comparison.

I started using the VGG Image Annotator (VIA) online tool (Dutta & Zisserman, 2019) to outline the shirts. VIA allowed me to export these outlines as JSON files, which could in turn be read by Mask R-CNN (He et al., 2018), a computer vision and pattern recognition framework that generated masks from the JSON files.
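VIA exports each outlined region as a polygon with `all_points_x`/`all_points_y` coordinate lists. Turning such an outline into a binary mask is essentially polygon rasterisation, sketched below with a simple even-odd point-in-polygon test on a tiny grid. In the actual pipeline Mask R-CNN's data loader consumed the VIA JSON; this just shows the underlying operation.

```python
import json

# Rasterise a VIA polygon annotation into a binary mask. Each pixel centre is
# tested against the outline with the even-odd (ray-crossing) rule.

def point_in_polygon(px, py, xs, ys):
    """Even-odd rule: count polygon-edge crossings of a ray cast rightward."""
    inside = False
    n = len(xs)
    for i in range(n):
        j = (i - 1) % n
        if (ys[i] > py) != (ys[j] > py):
            x_cross = xs[i] + (py - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])
            if px < x_cross:
                inside = not inside
    return inside

def via_region_to_mask(region, width, height):
    shape = region["shape_attributes"]
    xs, ys = shape["all_points_x"], shape["all_points_y"]
    return [
        [1 if point_in_polygon(x + 0.5, y + 0.5, xs, ys) else 0
         for x in range(width)]
        for y in range(height)
    ]

# A square outline, shaped like VIA's polygon export (for a very small shirt):
region = json.loads(
    '{"shape_attributes": {"name": "polygon",'
    ' "all_points_x": [1, 4, 4, 1], "all_points_y": [1, 1, 4, 4]}}'
)
mask = via_region_to_mask(region, 6, 6)  # 3x3 block of 1s inside the square
```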

SenaiMask.png
Senai.png
SenaiShirt.png

1. Mask created by Mask R-CNN. 2. Original image. 3. Composite image isolating the Hawaiian shirt.

The masks would then be composited with the original image using pix2pix's composite function, leaving the isolated image of a Hawaiian shirt, ripe and ready for training. Most of these processes could be automated; the manual outlining of the shirts in VIA consumed the most time when preparing a dataset.
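The compositing step itself is simple: keep the pixels the mask marks as shirt and black out everything else, so training sees the shirt in isolation. This per-pixel sketch is an illustration of the operation, not the project's actual compositing code.

```python
# Composite a mask with the original image: mask==1 pixels survive,
# everything else becomes a uniform background, isolating the shirt.

def composite(image, mask, background=0):
    """Keep image pixels where mask==1; elsewhere use the background value."""
    return [
        [image[y][x] if mask[y][x] else background
         for x in range(len(image[0]))]
        for y in range(len(image))
    ]

photo = [[200, 90], [80, 150]]
mask  = [[1, 0], [0, 1]]
isolated = composite(photo, mask)  # → [[200, 0], [0, 150]]
```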
 

200epoch.png

The isolated shirt training method did not appear successful at first: even after 200 epochs of training there was still much indistinct visual noise.
 

450epoch.png

I increased the number of epochs to 450 and started seeing results. With nearly double the training, the AI became much better at generating recognizable Hawaiian designs. Happy with this model, I decided to test it on a fresh batch of images to see what it would create.
 

collar.png
wrinkles.png

The isolated training method proved effective not only in pattern generation but also in learning some of the structural details of a Hawaiian shirt, such as adding a collar to an uncollared shirt.

The model was also capable of adding shadows and wrinkles in appropriate locations, making a generated shirt appear more dynamic and realistic.

greyshirt.png

The inclusion of many "non-traditional" Hawaiian shirts featuring muted colours occasionally caused grey shirts to be generated.

greyboy.png

Even these grey shirts could still feature interesting Hawaiian patterns, such as this palm tree pattern.

baggyshirt.png
foldedarms.png

The final model struggled with images of baggy or wrinkled clothing. While it would successfully identify the baggy folds as part of the shirt to replace, it was not capable of retaining these details and would instead flatten the image. Images with long-sleeved crossed arms would also lose detail; however, I had recognized these as a risk and filtered them out of the training set for fear of contaminating the model.

image_011.png
lady.png

The model was at its best when it created red Hawaiian shirts, which were highly recognizable as Hawaiian shirts. This is likely because the red Hawaiian shirts in the training set shared similar features, such as large, distinct flower patterns in a bright colour like white or yellow. The colour of the input image showed no influence on the design or colours the model chose to generate.

What I learnt

My biggest takeaway was:

Machine learning was a remarkably easy concept to learn once I got started (though I did have a good teacher). The tools and frameworks are well documented by the machine learning community, making it a remarkably approachable subject. Working on this project gave me practical experience with the functions and limitations of machine learning, in particular how an AI's training dataset directly affects its outputs: a small or uniform dataset is susceptible to contamination or to developing a model with inherent bias, but an AI trained on a large dataset may lack clarity of purpose and produce a middling, indistinct result. I concluded that the best option is to diversify expertise and maintain multiple trained models for different purposes.

What I wish I could have done better:

I feel the final dataset was too small and would have benefited from more brightly coloured shirts. The inclusion of the modern, muted grey Hawaiian-style shirts contaminated the dataset and thus the training of the model, causing it to produce more grey shirts. While these shirts often did feature interesting patterns, they did not necessarily deliver on the core premise of a humorous tool and made for a less exciting outcome. However, this model was still preferable to an earlier one trained on a larger dataset, which tended to produce rather generic swirling colours that did not feel distinctly Hawaiian.

If I had more time to work on Hawaiify Me:

Further development of Hawaiify Me would look to automate many processes, including training a model to programmatically mask and isolate the shirt from any given input image. Using Hawaiify Me currently requires input images to be manually labeled through VIA, which creates an immediate bottleneck in content generation. Automating this process with another AI model would make it possible to compile Hawaiify Me into a single functioning application requiring only an input image to output a custom Hawaiian shirt. From there I would want to begin training specialist models for specific purposes, such as a model trained to compensate for a person's crossed arms, or models trained for specific clothing items such as a jacket, a hoodie, or even bare skin. This would in turn necessitate training another model with the express purpose of divining from the input image which specialist model, if any, would be required. Machine learning is a fascinating subject, but also a black hole of a time sink.
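The automated pipeline described above would have a shape something like the sketch below. Every function and name here is a hypothetical stub, not real project code: a segmentation model replaces the manual VIA step, a classifier dispatches to a specialist generator, and the result is composited back into the photo.

```python
# Hypothetical shape of a fully automated Hawaiify Me pipeline. All stubs;
# in a real version each function would be a trained model.

def auto_mask(image):
    """Stub for a trained segmentation model that finds the shirt."""
    return [[1 for _ in row] for row in image]

def choose_model(image, mask):
    """Stub for a classifier picking a specialist generator."""
    return "default"  # e.g. "crossed_arms", "hoodie", "bare_skin", ...

GENERATORS = {
    # Stub generator: paints a uniform "Hawaiian" value over the shirt region.
    "default": lambda image, mask: [[7 for _ in row] for row in image],
}

def hawaiify(image):
    mask = auto_mask(image)
    generate = GENERATORS[choose_model(image, mask)]
    shirt = generate(image, mask)
    # Paste the generated shirt back over the masked region only.
    return [
        [shirt[y][x] if mask[y][x] else image[y][x]
         for x in range(len(image[0]))]
        for y in range(len(image))
    ]

out = hawaiify([[1, 2], [3, 4]])  # → [[7, 7], [7, 7]] with these stubs
```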

References:

Abdulla, W. (2018) Splash of Color: Instance Segmentation with Mask R-CNN and TensorFlow. [Online Article] Matterport Engineering Techblog. Retrieved from: https://engineering.matterport.com/splash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46

Dutta, A., & Zisserman. A. (2019). The VIA Annotation Software for Images, Audio and Video. Proceedings of the 27th ACM International Conference on Multimedia (MM ’19), Nice, France. New York, New York: ACM. https://doi.org/10.1145/3343031.3350535

He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2018). Mask R-CNN. arXiv. Retrieved from: https://arxiv.org/abs/1703.06870

Isola, P., Zhu, J.-Y., Zhou, T., & Efros, A. A. (2016). Image-to-Image Translation with Conditional Adversarial Networks. arXiv. Retrieved from http://arxiv.org/abs/1611.07004

