Recommendation Systems

Recommendation systems have changed the way people shop online, discover books, movies and music, come across news articles that go viral, and find friends and colleagues on LinkedIn. They analyze browsing patterns on websites, ratings, the items most popular at a given time, or the products saved in one's virtual basket to recommend products. Similarly, common interests, work skills or shared geographical locations are used to predict people that you might want to connect with on social media sites.

Behind such personalized recommendation systems lie big data platforms, comprising software, hardware and algorithms, that analyze customer behavior and push recommended products in real time. These platforms handle the distribution and computation of both stored data and event data. Stored data can describe how the customer in question, or customers similar to them, have rated products in the past, while event data could come from tracking mouse clicks that trigger events, for example viewing a product. Sometimes both need to be combined to predict a customer’s choice. Hence, the recommendation system architecture caters to data storage for offline analysis, to low-latency computation, and to a combination of the two.

The data platform architecture needs to be robust enough to ingest continuous real-time data streams into scalable systems like Hadoop HBase or another big data storage infrastructure such as AWS Redshift. Apache Kafka is often used as the messaging system for the real-time data stream, in combination with Apache Storm for stream processing. Because of the high throughput, data redundancy needs to be taken care of in case of failures. If the real-time computation needs to take into account customer data such as purchase history, preferences, products already bought, segmentation based on socio-economic demographics, or data from ERP and CRM systems, there are two options: either all the source systems have to be available online so that the data can be blended in real time, or the customer detail data can be blended offline into a Single Customer View and queried in combination with the real-time event data.

The valuable assets of any organisation are customers, products and now, data. Machine learning algorithms combine the three to deliver business gains, and predictive analytics is essential for being proactive about customer needs. Some of the techniques used for recommendation engines are content-based filtering, collaborative filtering, dimensionality reduction, k-means clustering and matrix factorization. With highly scalable data storage platforms widely available, the challenge is not data storage but the speed with which the data needs to be analyzed. The best approach is usually to combine mostly precomputed data with fresh event data, using pre-trained models, to push personalised recommendations to the customer interface.
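
To make the idea of combining precomputed data with fresh event data more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production design: the ratings, the customer and product names, the function names and the weight given to a product view (view_weight) are all hypothetical. The offline step precomputes item-to-item cosine similarities from stored ratings (a simple form of collaborative filtering), and the online step blends those similarities with products the customer has just viewed.

```python
import math
from collections import defaultdict

# Hypothetical offline data: customer -> {product: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"book": 5, "lamp": 3, "mug": 4},
    "bob": {"book": 4, "mug": 5},
    "carol": {"lamp": 5, "desk": 4},
    "dave": {"book": 5, "mug": 4, "desk": 2},
}


def item_similarities(ratings):
    """Offline step: cosine similarity between item rating vectors.

    Missing ratings are treated as zero, so only customers who rated
    both items contribute to the dot product.
    """
    by_item = defaultdict(dict)  # product -> {customer: rating}
    for customer, items in ratings.items():
        for product, rating in items.items():
            by_item[product][customer] = rating

    norms = {
        product: math.sqrt(sum(r * r for r in raters.values()))
        for product, raters in by_item.items()
    }
    sims = defaultdict(dict)
    for a in by_item:
        for b in by_item:
            if a == b:
                continue
            shared = by_item[a].keys() & by_item[b].keys()
            if shared:
                dot = sum(by_item[a][c] * by_item[b][c] for c in shared)
                sims[a][b] = dot / (norms[a] * norms[b])
    return sims


def recommend(customer, ratings, sims, recent_views=(), view_weight=3.0, top_n=3):
    """Online step: blend stored ratings with fresh view events."""
    evidence = dict(ratings.get(customer, {}))
    for product in recent_views:
        # Assumption: a product view counts as an implicit rating of view_weight.
        evidence[product] = max(evidence.get(product, 0), view_weight)

    scores = defaultdict(float)
    for product, weight in evidence.items():
        for other, sim in sims.get(product, {}).items():
            if other not in evidence:  # skip items already rated or just viewed
                scores[other] += sim * weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


SIMS = item_similarities(RATINGS)  # would be precomputed offline
print(recommend("bob", RATINGS, SIMS))  # stored ratings only
print(recommend("bob", RATINGS, SIMS, recent_views=["lamp"]))  # plus a fresh event
```

The expensive part, the similarity table, is computed offline, so the online step only needs a few dictionary lookups per event. In a real system the table would typically be served from a low-latency store rather than held in memory.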


The data value chain

The Consumer Lifecycle

The terms “Data driven” and “Big Data” are the buzzwords of today. They are definitely hyped, but the implications and potential are real and huge! Tapping into the enormous amount of data available and associating data from multiple sources creates a data chain that proves valuable for any organisation. Creating a data value chain consists of four parts: collection, storage, analysis, and implementation. With data storage getting cheaper, the volume and variety of data available to be exploited is increasing rapidly. But storing the data does not help unless businesses ask the right questions, understand the value that the data brings, and are sufficiently informed to make the right decisions. For example, in marketing, organisations can gather data from multiple sources about customer acquisition, the customer’s purchasing behaviour, customer feedback on social media, the company’s inventory and the logistics of product delivery. Analyzing this stored data can lead to a substantial number of customers being retained.

A few of the actionable insights are as follows:
  • Improving SEO (search engine optimization), which increases the visibility of the product site and attracts more customers
  • CRO (conversion rate optimization), i.e. converting prospects into sales by analyzing the sales funnel. A typical sales funnel is: home page > search results page > product page > proposal generation and delivery > negotiation > checkout
  • Better inventory control systems, resulting in faster deliveries
  • Predicting products that a consumer might be interested in, from the vast inventory, by implementing good recommendation algorithms that scan through consumer behaviour and predict preferences
  • If some of the above points are taken care of, customer loyalty can increase manifold, based on the overall experience during the entire consumer lifecycle.
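
The funnel analysis mentioned under CRO can be sketched in a few lines of Python. The stage names follow the funnel listed above, but the visitor counts and the function names (funnel_report, biggest_drop) are hypothetical and only serve as an illustration. The sketch computes the conversion rate of each step and of the funnel overall, and points out the transition where the largest share of prospects is lost.

```python
# Hypothetical visitor counts per funnel stage, in funnel order.
FUNNEL = [
    ("home page", 10000),
    ("search results page", 6000),
    ("product page", 3000),
    ("proposal generation and delivery", 600),
    ("negotiation", 450),
    ("checkout", 300),
]


def funnel_report(stage_counts):
    """Return (stage, visitors, step_conversion, overall_conversion) rows."""
    rows = []
    first = stage_counts[0][1]
    previous = first
    for stage, count in stage_counts:
        step = count / previous if previous else 0.0
        overall = count / first if first else 0.0
        rows.append((stage, count, step, overall))
        previous = count
    return rows


def biggest_drop(stage_counts):
    """Return the stage with the lowest step conversion and its drop-off share."""
    report = funnel_report(stage_counts)
    worst = min(report[1:], key=lambda row: row[2])  # skip the entry stage
    return worst[0], 1 - worst[2]


for stage, count, step, overall in funnel_report(FUNNEL):
    print(f"{stage:35s} {count:6d}  step {step:6.1%}  overall {overall:6.1%}")
print("Largest drop-off before:", biggest_drop(FUNNEL))
```

The stage with the largest drop-off is usually the first candidate for optimization, since improvements there affect every later stage of the funnel.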

Data blending which leads to a Single Customer View and Actionable Insights

Often the focus lies on big data technology rather than on the business value of implementing big data projects. Data is revolutionising the way we do business, and organisations today are inundated with it. To make sense of the data and create a value chain, there has to be a starting point, and the customer is a good one. The customer’s lifecycle, with experiences at every touch point, defines business growth, innovation and product development. Big data implementations allow data from multiple sources to be blended into a holistic single view of the customer, which in turn gives rise to enlightening insights. The data pertaining to the customer from multiple sources, such as CRM, ERP, order management, logistics, social media, cookie trackers and click traffic, should be stored, blended and analysed to gain useful, actionable insights.
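
As a rough illustration of data blending, the following Python sketch merges three small extracts into one profile per customer. Everything in it is an assumption made for the example: the source names (CRM, ORDERS, CLICKS), the field names, the sample records and the function single_customer_view. A real implementation would also have to resolve customer identities across systems, which is taken for granted here through a shared customer_id.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems, all keyed by customer_id.
CRM = [
    {"customer_id": 1, "name": "Ada", "segment": "premium"},
    {"customer_id": 2, "name": "Linus", "segment": "standard"},
]
ORDERS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 35.5},
]
CLICKS = [
    {"customer_id": 1, "page": "product/42"},
    {"customer_id": 3, "page": "home"},
    {"customer_id": 1, "page": "checkout"},
]


def single_customer_view(crm, orders, clicks):
    """Blend the three sources into one profile per customer_id."""

    def empty_profile():
        return {
            "name": None,
            "segment": None,
            "order_count": 0,
            "total_spend": 0.0,
            "pages_viewed": [],
        }

    view = defaultdict(empty_profile)
    for row in crm:
        profile = view[row["customer_id"]]
        profile["name"] = row["name"]
        profile["segment"] = row["segment"]
    for row in orders:
        profile = view[row["customer_id"]]
        profile["order_count"] += 1
        profile["total_spend"] += row["amount"]
    for row in clicks:
        view[row["customer_id"]]["pages_viewed"].append(row["page"])
    return dict(view)


for customer_id, profile in sorted(single_customer_view(CRM, ORDERS, CLICKS).items()):
    print(customer_id, profile)
```

Customers that appear in only one source, such as a visitor known only from click traffic, still get a profile with the missing fields left empty. Such gaps are often insights in themselves, for example prospects who browse but have not yet been acquired.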

In order to store the gigantic amount of data, organisations have to invest in robust big data technologies. Earlier BI technologies do not support new forms of data such as unstructured data, nor the huge volume, variety and velocity of data. The big data architecture consists of the integration with the data sources, the data storage layer, and the data processing layer where data exploration can be performed, optionally topped with a data visualization layer. Structured data from relational databases can be ingested into the big data platform using Apache Sqoop, and unstructured or streaming data such as logs using Apache Flume. Interactive analyses can then be performed on massive data sets stored in HDFS or HBase using SQL with Impala or Hive, or using a statistical programming language such as R. There are very good visualization tools, such as Pentaho, Datameer and Jaspersoft, that can be integrated into the Hadoop ecosystem to get visual insights. Organisations can offload expensive data warehouses to low-cost, high-capacity enterprise big data technology.

Edited image from Hortonworks

Irrespective of the technical implementation, business metrics such as increasing revenue, reducing operational costs and improving customer experience should always be kept in mind. The manner in which the data is analyzed could create new business opportunities and transform businesses. Data is an asset, and investing in a value chain, from gathering data to analyzing it, implementing the findings, analyzing the results of those implementations and evolving continuously, can result in huge business gains.