User pain points via App Store reviews
Wish has a large assortment of products; however, too many offerings and choices can exhaust customers, leading to frustration (choice overload). “Filters” and “categories” are the top two feature requests from users for improving the shopping experience.
Shipped the filters and categories experiment (V1)
Quick experiments were run on the production system to quantify their effectiveness.
Results: V1 failed
Gross merchandise volume dropped, prompting a deeper dive into the reasons and possible solutions.
Why did filters fail?
When users land on the homepage, they haven’t decided what they want to buy, so filters don’t work well here.
Why did categories fail?
Categories are not the most effective way to browse through millions of product offerings.
With the gathered data and insights, we decided to focus on the search function.
Search is a popular channel for finding products, but it needs improvement
- 46% of shoppers eventually perform a search.
- The add-to-cart click-through rate from search is higher than from browsing.
- 36% of users told us the displayed search results were not what they wanted.
Analyzing users’ search behavior when they conduct a query
To gain insight into search behavior, I analyzed each user’s raw queries: what they searched for, how they searched, and the correlations between queries. The study also produced a report on how products should be organized and grouped to maximize the effectiveness of product placement in our UI.
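As a rough sketch of that correlation analysis (the data shape, normalization, and function names here are illustrative assumptions, not Wish’s actual pipeline), the core idea is to count how often two queries show up in the same shopper’s history:

```python
from collections import Counter
from itertools import combinations

def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so near-duplicate queries count as one."""
    return " ".join(query.lower().split())

def query_cooccurrence(user_queries: dict[str, list[str]]) -> Counter:
    """Count how often two distinct queries appear in the same user's history.

    user_queries maps a user id to that user's raw queries (hypothetical schema).
    Query pairs with high counts likely belong to one shopping intent, which
    hints at how products could be grouped in the UI.
    """
    pair_counts: Counter = Counter()
    for queries in user_queries.values():
        unique = sorted({normalize(q) for q in queries})
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Toy data: "summer dress" and "sandals" co-occur for two shoppers,
# suggesting they belong to the same browsing intent rather than two categories.
history = {
    "u1": ["Summer dress", "sandals", "sun hat"],
    "u2": ["summer dress", "Sandals"],
    "u3": ["phone case"],
}
print(query_cooccurrence(history).most_common(3))
```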
Tree-structure vs. net-structure browsing
Shoppers’ search patterns resemble a net structure rather than a tree structure. This mismatch explains why gross merchandise value dropped when categories (a tree structure) were introduced.
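To make the distinction concrete, here is a toy illustration with invented product and tag names: in a category tree each product sits under exactly one path, while in a net the same product is reachable from several tags, matching the way related queries overlap.

```python
# Tree structure: every product has exactly one path from the root category.
category_tree = {
    "Accessories": {
        "Watches": ["rose-gold watch"],
        "Jewelry": ["charm bracelet"],
    },
}

# Net structure: the same product hangs off multiple tags, so shoppers can
# reach it from whichever intent their query expressed.
tag_graph = {
    "rose gold": ["rose-gold watch", "charm bracelet"],
    "watches for women": ["rose-gold watch"],
    "gift ideas": ["rose-gold watch", "charm bracelet"],
}
```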
A net-based browsing UI: Tags
How might we introduce a net-based browsing concept with a simple user interface? Fortunately, I did not need to search very long because tag-based browsing was gaining traction when my project started.
Partnered with machine learning engineers & served as a strong advocate for user behaviors and business priorities
The critical success factors were the relevance of these tags and of the corresponding products displayed for them; however, the patterns were too convoluted to hand-code in programming logic. I got buy-in from the machine learning team and initiated a joint project to tackle the problem.
I served as a strong advocate for user behaviors and business priorities within the machine learning team. For each model training iteration, I would identify the weaknesses of the trained model. Then, we would hold joint brainstorming sessions to refine the model training objectives and to add new candidate-selection and ranking rules. Here are some examples for illustration (a simplified ranking sketch follows the list):
- Applied computer vision to encourage visual diversity among the selected candidates.
- Used natural language processing to match products beyond keyword search.
- Prioritized regional trends and introduced product boosts for vendor preference, etc.
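As a simplified sketch of how rules like these might compose at ranking time (the weights, field names, and scoring logic below are assumptions for illustration, not the production model):

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    product_id: str
    relevance: float               # base relevance from the trained model
    region_trend: float            # 0..1 popularity in the shopper's region
    boosted_vendor: bool           # vendor-preference boost flag
    image_embedding: list[float]   # visual features from a computer-vision model

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def rank(candidates: list[Candidate], k: int = 10) -> list[Candidate]:
    """Greedily pick k products: model relevance plus regional and vendor boosts,
    minus a penalty for looking too similar to items already selected."""
    selected: list[Candidate] = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(c: Candidate) -> float:
            base = c.relevance + 0.2 * c.region_trend + (0.1 if c.boosted_vendor else 0.0)
            max_sim = max(
                (cosine(c.image_embedding, s.image_embedding) for s in selected),
                default=0.0,
            )
            return base - 0.3 * max_sim  # visual-diversity penalty
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected
```

The greedy loop is an MMR-style re-ranking: each pick trades off model relevance and business boosts against visual similarity to what has already been selected.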
Launched Search Tag (V2): the relevance of the displayed search tags and product results needed further improvement
Quantitative Metrics
- The impact on gross merchandise volume was slightly positive.
- Only 15% of first-time users noticed the search tag.
- This increased to 50% after users searched a few more times.
Qualitative Feedback through User Interviews
- Most users didn't see the search tags.
- Some search tags were not relevant: “I searched ‘Watches for women,’ but the search returned ‘Men watches’ and ‘Solar gadgets.’”
- Some displayed products were not relevant: “It showed many men’s watches when I searched for women’s watches.”
Tag UI explorations
I experimented with different UI elements and flows to increase the tags’ click-through rate.
Created a benchmark to measure tag quality by enumerating Google’s search tags in production
To gauge the quality of our search tags, I established a benchmark comparing them with Google’s image search tags. I conducted user research to collect opinions on which was better and why. I also created plans to collect analytical data by enumerating Google’s search tags in production. These experiments allowed me to set concrete goals for model improvement and to quantify its ROI. Another key finding was that we failed to consolidate statistics for similar queries; this hurt the ranking score and had a huge impact on which tags were selected for display. For V3, we prioritized improving the quality of the search tags.
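A minimal sketch of the missing consolidation step, assuming a naive token-level normalization (a production system would more likely use stemming or embedding-based clustering); the query strings and stopword list are invented for illustration:

```python
from collections import defaultdict

STOPWORDS = {"for", "the", "a"}

def canonical(query: str) -> str:
    """Collapse trivially similar queries (case, word order, stopwords, plurals)
    into one canonical key so their statistics aggregate together."""
    tokens = [t for t in query.lower().split() if t not in STOPWORDS]
    tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]  # naive singularization
    return " ".join(sorted(tokens))

def consolidate_counts(query_counts: dict[str, int]) -> dict[str, int]:
    """Merge raw per-query counts into per-canonical-query counts.

    Without this step, 'women watches' and 'Watches for women' compete as
    separate tags and both rank lower than the combined intent deserves."""
    merged: dict[str, int] = defaultdict(int)
    for query, count in query_counts.items():
        merged[canonical(query)] += count
    return dict(merged)

raw = {"women watches": 120, "Watches for women": 95, "watch for woman": 40}
print(consolidate_counts(raw))  # the first two queries collapse into one key
```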