The Future of Visual Recommender Systems: Four Practical State-Of-The-Art Techniques
The future of visual RecSys is an exciting one. Let us explore some of the most cutting-edge techniques and ideas that we could incorporate into our recommenders.
Style2Vec (2017) — Combining Multiple Convolutional Neural Networks
Lee, H., Seol, J., & Lee, S. (2017). Style2Vec: Representation Learning for Fashion Items from Style Sets. Open Access From: https://arxiv.org/abs/1708.04014
The authors “propose Style2Vec, a vector representation model for fashion items. Based on the intuition of distributional semantics used in word embeddings, Style2Vec learns the representation of a fashion item using other items in matching outfits as context. Two different convolutional neural networks are trained to maximize the probability of item co-occurrences” (Lee et al., 2017).
The paper is short, but the idea is fascinating. It uses two CNN models: one generates embeddings for the target image, while the other generates embeddings for the context items that appear in the same outfit. This is a novel application of Word2Vec-style techniques to images. In the image above, example (a) shows that pink skirt − black skirt + black jacket = pink jacket. Note that we are adding and subtracting traits that are not usually expressed as product attributes, such as the holes in example (c). Style2Vec thus lets us combine and select styles in ways that catalog attributes alone do not capture.
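The arithmetic in example (a) can be reproduced on toy vectors. The sketch below is only an illustration: the four-dimensional embeddings, the item names, and the `analogy` helper are all made up here, whereas a trained Style2Vec model would produce much higher-dimensional embeddings from the images themselves.

```python
import numpy as np

# Hypothetical hand-made embeddings; dimensions loosely mean
# (skirt-ness, jacket-ness, shoe-ness, pink-ness).
embeddings = {
    "pink_skirt":   np.array([1.0, 0.0, 0.0, 1.0]),
    "black_skirt":  np.array([1.0, 0.0, 0.0, 0.0]),
    "black_jacket": np.array([0.0, 1.0, 0.0, 0.0]),
    "pink_jacket":  np.array([0.0, 1.0, 0.0, 1.0]),
    "pink_shoes":   np.array([0.0, 0.0, 1.0, 1.0]),
}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(positive, negative, table):
    """Add the `positive` items, subtract the `negative` ones, and return
    the closest remaining item by cosine similarity."""
    query = sum(table[k] for k in positive) - sum(table[k] for k in negative)
    used = set(positive) | set(negative)
    candidates = [k for k in table if k not in used]
    return max(candidates, key=lambda k: cosine(query, table[k]))

# pink skirt - black skirt + black jacket
result = analogy(["pink_skirt", "black_jacket"], ["black_skirt"], embeddings)
print(result)
```

The items used to build the query are excluded from the candidates, which is the usual convention for Word2Vec-style analogies.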
Style2Vec implemented by David Nepožitek. Open source on GitHub.
David Nepožitek implemented the ideas proposed in the Style2Vec paper and open-sourced the code on GitHub. The results in his report are impressive: as the image above shows, the model groups items that are similar in color, pattern, and shape. We can leverage results like this to generate whole outfits and collections based on users’ preferences.
Generative Image Models (2017) with GAN
Kang, W.-C., Fang, C., Wang, Z., & McAuley, J. (2017). Visually-Aware Fashion Recommendation and Design with Generative Image Models. Open Access From https://arxiv.org/abs/1711.02231 | Code
The authors used a Generative Adversarial Network (GAN), “an unsupervised learning framework in which two components ‘compete’ to generate realistic looking outputs... One component (a generator) is trained to generate images, while another (a discriminator) is trained to distinguish real versus generated images. Thus the generated images are trained to look ‘realistic’ in the sense that they are indistinguishable from those in the dataset” (Kang et al., 2017).
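To see how the two components "compete", it helps to look at their losses side by side. The sketch below uses the standard binary cross-entropy GAN formulation, which is an assumption on our part and not necessarily the exact loss used by Kang et al. The function names and the toy discriminator outputs are made up for illustration, and no network is actually trained.

```python
import numpy as np

def bce(probs, labels, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 labels."""
    probs = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(labels * np.log(probs)
                          + (1 - labels) * np.log(1 - probs)))

def discriminator_loss(d_real, d_fake):
    """The discriminator wants real images scored 1 and generated images 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """The generator wants its images scored as real (non-saturating form)."""
    return bce(d_fake, np.ones_like(d_fake))

# Hypothetical discriminator outputs: probability that an image is real.
d_real = np.array([0.9, 0.8])          # scores on catalog images
d_fake_early = np.array([0.1, 0.2])    # generator is easy to spot
d_fake_late = np.array([0.6, 0.7])     # generator has improved

print(discriminator_loss(d_real, d_fake_early), generator_loss(d_fake_early))
print(discriminator_loss(d_real, d_fake_late), generator_loss(d_fake_late))
```

As the generated images get harder to tell apart, the generator loss falls and the discriminator loss rises, which is the tug-of-war the quote describes.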
GAN allows us to generate the ideal product for our users, even if it does not exist in the product catalog. The image above is the output. We have the real images on the left and generated images on the right, and we can see that the images on the right generally have a higher preference score. There are two use cases for generating the ideal product.
Firstly, GAN helps us build an awareness of the users’ preferences and represents that awareness concretely with an image. We can leverage this image with techniques like CNN + nearest neighbors to select items that are visually most similar to the ideal state.
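The "CNN + nearest neighbors" step can be sketched as a cosine-similarity lookup. Here we assume that both the generated ideal image and the catalog images have already been turned into embedding vectors by some CNN; the three-dimensional vectors and the `top_k_similar` helper are hypothetical stand-ins.

```python
import numpy as np

def top_k_similar(query, catalog, k=2):
    """Return indices of the k catalog rows most cosine-similar to `query`."""
    catalog_unit = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = catalog_unit @ query_unit      # one cosine score per catalog item
    return np.argsort(-scores)[:k].tolist()

# Hypothetical CNN embeddings of four catalog items (one row per item).
catalog = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
# Hypothetical embedding of the GAN-generated "ideal" product.
ideal = np.array([1.0, 0.0, 0.1])

closest = top_k_similar(ideal, catalog, k=2)
print(closest)
```

For a real catalog, an approximate nearest-neighbor index would usually replace the brute-force matrix product, but the idea is the same.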
Furthermore, GAN helps in product procurement and design. If we know that green shirts with purple stripes are a growing trend among the ideal products, we can source or create such products to meet the market need before customers ask for them. Even if green shirts with purple stripes are not exactly what the customer wants now, GAN allows us to be proactive instead of reactive to market demands and changes.
Explainable Outfit Recommendation with Comment Generation (2019)
Lin, Y., Ren, P., Chen, Z., Ren, Z., Ma, J., & de Rijke, M. (2019). Explainable Outfit Recommendation with Joint Outfit Matching and Comment Generation. Open Access From https://arxiv.org/abs/1806.08977v3 | Code
The authors “propose a novel neural network framework, neural outfit recommendation (NOR), that simultaneously provides outfit recommendations and generates abstractive comments. NOR consists of two parts: outfit matching and comment generation. For outfit matching, we propose a convolutional neural network with a mutual attention mechanism to extract visual features… For abstractive comment generation, we propose a gated recurrent neural network with a cross-modality attention mechanism to transform visual features into a concise sentence” (Lin et al., 2019).
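To make the attention idea concrete, here is a minimal sketch of plain dot-product attention over image region features. It is a generic illustration and not the mutual or cross-modality attention defined in the NOR paper, which has its own learned parameters; the function names and toy vectors are assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-d array."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / exps.sum()

def attend(query, region_features):
    """Score each image region against `query`, then return the attention
    weights and the weighted average of the region features."""
    scores = region_features @ query        # one score per region
    weights = softmax(scores)               # non-negative, sums to 1
    context = weights @ region_features     # weighted sum of regions
    return weights, context

# Hypothetical features for three regions of one garment image.
regions = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
])
# Hypothetical query, e.g. a summary of the other garment in the outfit
# or the current state of the comment generator.
query = np.array([2.0, 0.0])

weights, context = attend(query, regions)
print(weights, context)
```

The weights show which regions the model "looks at" for a given query, which is also what makes attention a natural hook for explanations.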
RecSys based on CNNs is powerful, but its output can be hard to interpret. There have been separate attempts to visualize CNN gradients (Utku Ozbulak) and to generate image comments (Donahue, J. et al.). Still, it is not easy to combine both techniques and apply them in the context of RecSys. The resulting image above is a first step toward understanding the recommendations: comments such as “great denim look” and “love the red and white” are good examples of explaining why the outfits are recommended. However, the negative cases in the last row show that comment generation is not perfect; sometimes it describes something not found in the image or is completely out of context. Note that the authors built the comment-generating model using data from Polyvore, a community-powered social commerce website.
Nonetheless, explainable AI (XAI) is a critical piece in understanding, evaluating, and deploying deep learning solutions in production. For more on XAI, Feifeife has gathered an impressive collection of XAI materials.
Private Personalized RecSys (2020)
Increasingly, RecSys are being deployed in privacy-sensitive domains like healthcare, education, and finance. We want the benefits of personalized healthcare, education, and financial plans, but at the same time, the fear of giving up our data and then losing it to a hack is real. It seems oxymoronic: how can we build a personalized RecSys while maintaining user privacy?
Back in 2009, McSherry & Mironov from Microsoft Research explored this issue in their paper Differentially Private Recommender Systems with a simple idea. In essence, we can add noise to the item ratings and the item-item covariance matrix in line with ε-Differential Privacy (the actual mathematics behind this is non-trivial). In other words:
- We mask the identifying traits of any particular user (user A buys pink shirts on the first Monday of every month).
- We can still obtain general trends (segment X of users likes to buy pink shirts).
- The privacy loss is provably bounded by the parameter ε.
- While differential privacy is a useful metric for measuring risk internally when designing a RecSys, it is not intuitive to explain to users, nor does it guarantee that the data is stored securely.
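The "add noise" idea can be illustrated with the basic Laplace mechanism on a single count. This is a deliberately simplified sketch: McSherry & Mironov apply noise to rating sums and the covariance matrix with additional care, while here we only privatize one count whose sensitivity is 1 (one user can change it by at most one). The `laplace_count` helper and the numbers are hypothetical.

```python
import numpy as np

def laplace_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon,
    the standard way to make a counting query epsilon-differentially private."""
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
pink_shirt_buyers = 1200   # hypothetical true count for segment X

# Smaller epsilon means stronger privacy and noisier answers.
for epsilon in (1.0, 0.1):
    noisy = laplace_count(pink_shirt_buyers, epsilon, rng)
    print(f"epsilon={epsilon}: noisy count = {noisy:.1f}")
```

With 1,200 buyers the segment-level trend survives the noise, while the presence or absence of any single user is hidden within it.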
A new paper by Ribero et al. (2020), Federating Recommendations Using Differentially Private Prototypes, extends the idea to federated learning. Federated learning is a modern approach to distributed machine learning:
- Instead of training massive models on centralized servers, we send out small (megabyte sized) models to users’ devices.
- The models are trained on the user’s device with their data during the device idle time.
- We only send training results back to a centralized server.
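These three steps can be sketched with federated averaging on a toy linear model. This is a generic illustration of the federated learning loop, not the method of Ribero et al.; the function names, the simulated clients, and the hyperparameters are assumptions chosen to keep the example small.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=10):
    """Train on one device: a few gradient steps on local data only."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_weights, client_data):
    """Send the model out, train locally, and average the returned weights,
    weighting each client by how many examples it holds."""
    updates = [local_update(global_weights, X, y) for X, y in client_data]
    sizes = np.array([len(y) for _, y in client_data], dtype=float)
    return np.average(updates, axis=0, weights=sizes)

# Simulate three devices whose raw data never leaves the device.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (20, 30, 50):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, clients)
print(w)
```

Only the weight vectors cross the network in this loop; the server never sees `X` or `y`.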
Google’s comic strip illustrates this process. By combining differential privacy and federated learning, Ribero et al. propose a novel approach to tackling the issue of private, personalized RecSys.
“Most federated learning methods require multiple rounds of communication between entities and a central server, which poses a problem for differential privacy requirements. Specifically, we can think of each round of communication from the entities to the server as a query sent to the individual entities, which has potential to leak information…(hence) we constrain the communication to only two rounds, back and forth” (Ribero et al., 2020).
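Why do extra rounds hurt? Under basic sequential composition, the privacy losses of successive queries add up, so a fixed total budget ε has to be split across rounds. The sketch below illustrates that general cost with the Laplace mechanism; it is not the privacy accounting used in the paper, and the helper name and numbers are hypothetical.

```python
def per_round_noise_scale(total_epsilon, rounds, sensitivity=1.0):
    """Split a total privacy budget evenly over `rounds` queries and return
    the Laplace noise scale each query then needs (sensitivity / epsilon)."""
    per_round_epsilon = total_epsilon / rounds
    return sensitivity / per_round_epsilon

# With the same total budget, more communication rounds mean more noise
# has to be added to every single round.
for rounds in (2, 10, 100):
    print(rounds, per_round_noise_scale(total_epsilon=1.0, rounds=rounds))
```

The noise per round grows linearly with the number of rounds, which is why keeping communication to a minimum matters so much in this setting.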
Of course, the problem is a challenging one. To cut the number of rounds down to only two, the team had to come up with a novel way to compress the data and store it in an accessible form. They call these data structures “prototypes”:
“These prototypes are designed to: a) contain similar information as X_h, thus allowing construction of an accurate item representation; b) be of low dimension relative to X_h, hence minimizing communication load; and c) maintain differential privacy with respect to the individual users” (Ribero et al., 2020).
The paper is a challenging read on a rapidly evolving and important topic. If you are interested in learning more about federated learning and differential privacy, lee-man is collecting a list of readings, tools, and code on GitHub.
Further Readings
Visual RecSys is an exciting field, and I hope you enjoyed the various techniques we discussed today. For more cutting-edge work on RecSys, you can explore:
- The collection on ACM RecSys and its YouTube channel. It is the biggest event in the world of RecSys.
- Spotify’s engineering blog is impressive. Check out their recent post on personalization.
- Netflix’s tech blog is, of course, another excellent read, covering many of the real-world deployment and scalability challenges.
- Zalando is an eCommerce firm with a solid research team. Their publications are worth a read.
- Recombee provides recommendation-as-a-service and has a pretty good Medium blog covering the technologies employed in their company.
Explore the rest of Modern Visual RecSys Series
- How does a Recommender Work? [Foundational]
- How to Design a Recommender? [Foundational]
- Intro to Visual RecSys [Core]
- Convolutional Neural Networks Recommender [Pro]
- COVID-19 Case Study with CNN [Pro]
- Building a Personalized Real-Time Fashion Collection Recommender [Pro]
- Temporal Modeling [Pro]
- The Future of Visual Recommender Systems: Four Practical State-Of-The-Art Techniques [Foundational][we are here]
Reader comment (posted 4 years ago): Here's one more interesting and related paper: https://arxiv.org/pdf/1908.08847v1.pdf. Generating the poses together with the items could automatically produce those "more details" photos for product items.