The product will support customized content delivery both for browsing and for reading in-depth content. The general solution will be applicable in many focus areas such as Healthcare, Sports, Hobbies, and Entertainment.
The product is a content delivery service that learns from user behavior to provide a customized experience. Content is drawn from news articles, social media, long-form content, and reputable knowledge bases, so that users can comfortably read in-depth articles targeted to their interests.
The client’s data sets were largely drawn from public sources, augmented with feeds from domain-specific data providers, and the product was poorly differentiated from competing products.
There was limited in-house expertise in data pipeline design and AI modeling. Critical pieces had been built by external contractors without a cohesive data architecture. A deeper signal extraction exercise had not been done, and the potential for behavior-driven personalization had not been investigated.
The assessment results are summarized below.
Market research has demonstrated a need for a product that serves relevant, detailed, and trusted long-form content as a one-stop information portal for end users.
The use cases selected were as follows:
The user sees a set of articles that are of general interest and is guided to walk through a process of configuration.
On visiting their home page, the user sees relevant news items and social media posts.
As they browse through news items, users can star articles to emphasize their importance. The recommendation engine also learns from the topics of the browsed articles and the amount of time spent on each article. In-depth articles are more influential in guiding the recommendation engine than news articles.
All use cases above assume that users can select the following in their profile to help tune the recommendation engine. However, most users would be expected to have a satisfying experience just by engaging with the product without deeper tuning.
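The signals described in this use case (stars, topics, reading time, and the extra influence of in-depth articles) could be combined into a per-user topic profile along the following lines. This is a minimal sketch, not the engine that was built: the names `ReadEvent`, `event_weight`, and `topic_profile`, and all numeric weights, are assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from the project.
STAR_BONUS = 2.0           # extra weight for a starred article
IN_DEPTH_MULTIPLIER = 3.0  # in-depth articles count more than news items
MAX_READ_SECONDS = 600     # cap so one long session cannot dominate


@dataclass
class ReadEvent:
    """One article view by a user (hypothetical record shape)."""
    topics: list[str]
    seconds_spent: float
    starred: bool = False
    in_depth: bool = False


def event_weight(event: ReadEvent) -> float:
    """Score one reading event from dwell time, star, and article type."""
    weight = min(event.seconds_spent, MAX_READ_SECONDS) / 60.0  # minutes read
    if event.starred:
        weight += STAR_BONUS
    if event.in_depth:
        weight *= IN_DEPTH_MULTIPLIER
    return weight


def topic_profile(events: list[ReadEvent]) -> dict[str, float]:
    """Sum event weights per topic and normalize so the weights sum to 1."""
    totals: dict[str, float] = defaultdict(float)
    for event in events:
        for topic in event.topics:
            totals[topic] += event_weight(event)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {topic: value / grand_total for topic, value in totals.items()}
```

With these assumed weights, one minute spent on an in-depth health article counts three times as much as one minute on a sports news item, so the resulting profile leans toward health.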
The prediction variables were as follows:
Given a set of keywords, predict which articles most closely match them.
Given a set of articles, predict which other articles most closely match them.
Determine which keywords best represent the “topics” implicit in an article.
Given a keyword, predict which other keywords match it most closely.
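As a point of reference for the first three prediction tasks, a common baseline is TF-IDF vectors compared by cosine similarity. The sketch below is an assumption for illustration only; the source does not state which algorithms were selected, and the function names are hypothetical. It does not cover keyword-to-keyword matching and omits stop-word filtering.

```python
import math
import re
from collections import Counter

Vector = dict[str, float]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_tfidf(docs: list[str]) -> tuple[list[Vector], Vector]:
    """Return one TF-IDF vector per document plus the IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    # Smoothed IDF: terms present in every document keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: c * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_query(keywords: list[str], idf: Vector) -> Vector:
    """Place a keyword set in the same vector space as the articles."""
    lowered = (k.lower() for k in keywords)
    return {k: idf[k] for k in lowered if k in idf}


def rank_articles(query: Vector, vectors: list[Vector]) -> list[int]:
    """Article indices ordered from best to worst match for the query."""
    return sorted(range(len(vectors)),
                  key=lambda i: cosine(query, vectors[i]), reverse=True)


def top_keywords(vector: Vector, k: int = 3) -> list[str]:
    """The k highest-weighted terms of an article, ties broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Keyword-to-article matching uses `rank_articles(keyword_query(...), vectors)`, article-to-article matching passes an article's own vector as the query, and `top_keywords` gives a rough stand-in for topic extraction.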
After an extensive study of the literature, including the latest trends in natural language processing (NLP), we tested various NLP text-affinity approaches to build a custom recommendation engine.
We iterated on the available data sets to compare the available approaches and identify the algorithms that performed best and whose results tested well with targeted users. In the evaluation phase we also looked at the cloud APIs available from various vendors, and settled on building custom predictive models, which performed considerably better.
We also surveyed available third-party datasets that had the potential to increase the appeal of the product to the targeted users. The datasets under consideration included syndicated feeds from leading media providers, content scrapes from public-facing pages of representative news outlets, selected categories from Wikimedia, and knowledge bases such as WebMD.
We also examined the value of storing clickstream data from users in order to guide their experience, and recommended an architecture for the secure management of personal data, which would be saved in a separate datastore.
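One way to keep clickstream data apart from personal data is to key click events by a pseudonymous token rather than by the user's identity. The following is a minimal sketch under that assumption; it is not the recommended architecture itself, and `SplitStore`, `ClickEvent`, and `pseudonymize` are hypothetical names, with in-memory containers standing in for the two datastores.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickEvent:
    user_token: str       # pseudonym, never the raw user id
    article_id: str
    seconds_spent: float


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, user_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


class SplitStore:
    """In-memory stand-in for two separate datastores (an assumption)."""

    def __init__(self, secret_key: bytes) -> None:
        self._secret_key = secret_key
        self.personal: dict[str, dict[str, str]] = {}  # secured personal data
        self.clickstream: list[ClickEvent] = []        # behavioral data only

    def register(self, user_id: str, email: str) -> str:
        """Store personal details under the user's token and return the token."""
        token = pseudonymize(user_id, self._secret_key)
        self.personal[token] = {"user_id": user_id, "email": email}
        return token

    def record(self, user_id: str, article_id: str,
               seconds_spent: float) -> ClickEvent:
        """Append a click event that carries only the pseudonymous token."""
        token = pseudonymize(user_id, self._secret_key)
        event = ClickEvent(token, article_id, seconds_spent)
        self.clickstream.append(event)
        return event
```

In this layout the recommendation engine would read only `clickstream`, while linking a token back to a person requires access to the personal store and the secret key.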
Key results were:
We identified gaps in team expertise and presented a plan for internal training and external consultancy.
Signal data experimentation identified data intensity and value. A data architecture was prototyped and presented to the client.
We built an AI MVP to validate with beta customers and to estimate the cloud resources needed to deliver the desired experience.
We presented an in-depth survey of available approaches to the client. We then ran a number of signal extraction experiments on the most promising approaches.
While the value proposition was extremely strong, the current search market is well served. Further effort is required in order to define a new market category to ensure success.