I have not been regular with my personal blog because I have been blogging elsewhere.
Here are the links to my latest blog posts about why Big Data projects fail and how to attract more women into tech.
Having worked extensively in the Big Data and IoT space, I have watched projects fail over and over again, and the reasons for failure tend to repeat themselves.
Being a woman in tech, or a woman in data, I am often the only woman in meetings, trainings and discussions, which feels weird. With so few women in tech, it becomes easier to discriminate against the few that do exist. Incidents of mansplaining and gaslighting are rampant, and it’s the victim who gets labelled a drama queen while the abusers go scot-free. Organisations that are serious about increasing the number of women in tech need to address the glass ceiling, gender wage gaps and bro-culture, and cultivate an inclusive work atmosphere. Read my post on how to get more women into tech.
Anyone who works in the tech industry is aware of the rising demand for Analytics/ Machine Learning professionals. More and more organisations have been jumping on the data-driven decision making bandwagon, thereby accumulating loads of data pertaining to their business. In order to make sense of all the data gathered, organisations will require Big Data Analysts to decipher the data.
Data Analysts have traditionally performed their analysis on pre-formatted data served by the IT departments. But with the need for real time or near-real time Analytics to serve end customers better and faster, analysis needs to be performed faster, which makes the dependency on IT departments a bottleneck. To understand the influx of data, Analysts are required to understand data streams that ingest millions of records into databases or file systems, Lambda architecture and batch processing of data.
Also analysing larger amounts of data requires skills that range from understanding the business complexities, the market and the competitors to a wide range of technical skills in data extraction, data cleaning and transformation, data modelling and statistical methods.
Analytics, being a relatively new field, is struggling to meet the market demand for highly skilled Big Data Analysts. Being a Big Data Analyst requires a thorough understanding of data architecture and the data flow from source systems into the big data platform. One can always stick to a specific industry domain and specialize within that, for example Healthcare Analytics, Marketing Analytics, Financial Analytics, Operations Analytics, People Analytics, Gaming Analytics etc. But mastering the end-to-end data chain management can lead to plenty of opportunities, irrespective of industry domain.
The entire Data and Analytics suite includes the following gamut of stages:
Data integrations – connecting disparate data sources
Data security and governance – ensuring data integrity and access rights
Master data management – ensuring consistency and uniformity of data
Data Extraction, Transformation and Loading – making raw data business user friendly
Hadoop and HDFS – big data storage mechanisms
SQL/ Hive / Pig – data query languages
R/ Python – programming languages for data analysis and mining
Data science algorithms like Naive Bayes, K-means, AdaBoost etc. – machine learning algorithms for clustering and classification
Data Architecture – solutionizing all the above in an optimized way to deliver business insights
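To make the machine learning stage in the list above a little more concrete, here is a minimal sketch of K-means (Lloyd's algorithm) on one-dimensional data, in plain Python. The function name kmeans_1d, the spend figures and the starting centroids are all made up for illustration; a real project would more likely use a library implementation.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: assignments can no longer change.
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly spend per customer, with two rough starting guesses.
spend = [12, 15, 14, 90, 95, 88, 13, 92]
centroids, clusters = kmeans_1d(spend, centroids=[0, 100])
print(centroids, clusters)
```

The two resulting groups could be read as low and high spenders, which is the kind of segmentation a Big Data Analyst would then interpret in business terms.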
The new-age data analyst, or versatile Big Data Analyst, is one who understands the complexity of data integrations using APIs, connectors or ETL (Extraction, Transformation and Loading); designs data flows from disparate systems with data security and quality issues in mind; can code in SQL or Hive and in R or Python; is well acquainted with machine learning algorithms; and has a knack for understanding business complexities.
Since Big Data and Analytics is constantly evolving, it is imperative for anyone aiming at a career in the field to be well versed in the latest tech stack and architectural breakthroughs. Some ways of doing so:
Following knowledgeable industry leaders or big data thought leaders on Twitter
Joining Big Data related groups on LinkedIn
Following Big Data influencers on LinkedIn
Attending events, conferences and seminars on Big Data
Connecting with peers within the Big Data industry
Last but not least (and probably most important), enrolling in MOOCs (Massive Open Online Courses) and/ or reading Big Data books
Since Analytics is a vast field, encompassing several operations, one could choose to specialise in parts of the Analytics chain: data engineers specialise in highly scalable data management systems, data scientists in machine learning algorithms, and data architects in the overall data integrations, data flow and storage mechanisms. But in order to excel and future-proof a career in the world of Big Data, one needs to master more than one area. A data analyst who is acquainted with all the steps involved in data analysis, from data extraction to insights, is an asset to any organization and will be much sought after!
Not a day goes by when our LinkedIn news feed is not flooded with mentions of AI and Machine Learning benefitting and changing the ways of mankind like never before. This hype surrounding AI and Machine Learning has resulted in most organisations jumping on the bandwagon without proper evaluation. A couple of years ago, the term Big Data enjoyed a similar hyped status, but lately it has been losing its lustre to all the talk about AI and Machine Learning.
The truth, however, is that AI and Big Data need to coexist and converge. Merely collecting and storing data in huge amounts will prove futile unless AI and Analytics are used to generate meaningful insights that help businesses, enhance customer experience or increase revenue influx.
Making an organisation Data-Driven will take time and will happen in stages. While there are no sure shot ways to create a Data-Driven organisation, below are some ways that could lead to a change:
Strategy – It all starts with a clearly defined strategy in place, stating the Whys, Hows, Whos and Whens. A clear strategy helps in raising awareness across the organisation, about the topic in focus (data in this case) and creates a sense of urgency around the change process. It is imperative that the entire organisation understands the importance and implications of a data-driven organisation, thus encouraging people to update their skill sets and raise their level of data awareness. An all round data strategy should not only include the technology required for execution but the kind of competence and people skills and the sort of conducive atmosphere required for a data-driven organisation to thrive.
People – Just as there are different kinds of skills required within a Marketing or a Software organisation, there are different skill sets for the different job roles within a data organisation. But because of the hype surrounding Machine Learning and AI, and because companies lack practical data know-how, the tendency is to either hire the wrong people or assign the wrong tasks to the right people! Not everyone has to be a data scientist in the data organisation. People will be required to work on data architecture, data infrastructure, data engineering, data science and business analysis. These could very well be the same person, if the organisation is lucky enough. But it is unfair to hire a data engineer and assign him/her the task of building Predictive models, or to hire a data scientist and then tell him/her to develop BI reports. Strategists will have to spend the time required to understand the nuances of skills and expertise required in a data organisation, but it will be worth it to retain and grow the talent pool required for a Data-driven organisation.
Patience – Creating a Data-driven organisation will require ample amounts of patience and perseverance. If data has not been involved in the decision making process, earlier, then the data is most probably not in a state that can be used readily or maybe there is no or not enough data to begin with! In that case, it has to start with gathering the data required to achieve the business goals. Transaction systems have a very different database design than the data storage mechanisms used for Analytics purposes, which entails a design and architecting process before being able to analyse the data. Moreover, as Analysts dig into the transaction data, they surely will encounter non-existence of relevant data, data retrieval issues and unearth data quality issues and data integration problems due to the existence of data silos. In a data-driven organisation, all data sources are integrated to provide a single enterprise version of truth, irrespective of Customer data or Sales or Marketing data. A data platform, integrating all business data sources, ensuring quality and data integrity and security is a time-consuming process. Organisations will have to take this lead time into consideration when strategizing a Data-driven decision making approach.
Organisational Culture – The purpose of a Data-driven organisation is to empower employees by means of data and information sharing to enable the organisation to collectively achieve the business goals. This approach requires employees to be data aware and not use gut feelings to make decisions and this could be a whole new approach for many. This new way of working requires organisational change management, educating people to use facts and figures to arrive at conclusions and make decisions. If an organisation is fairly data aware, in the sense that metrics are used to measure certain processes, in order to turn Data-driven , the organisation has to take steps to use data proactively (read Predictive Analytics) and not just summarise events that happened. The CDOs/ CMOs need to drive data awareness by showcasing quick wins and success cases of Data-driven approaches, as a means to use data as the foundation in every decision making process.
Some organisations may take longer to implement a Data-driven culture than others, but there is no way an organisation can become Data-driven just like that, one fine day! If the CDOs can gauge that the organisation has a longer incubation period, then it is good to start with raising data awareness and introducing a BI/ Datawarehousing team. It is not recommended to leap directly on to AI and hire data scientists, only to leave them in the lurch because the organisation and the infrastructure are too rudimentary to make use of their expertise.
A Data-driven organisation culture starts with the right strategy in place, followed by the right people and technology, evaluating and optimising the entire process, intermittently.
Programmatic marketing involves data driven insights to convert prospects into customers. There is more than meets the eye in the case of conversion rate optimization. Some of the deciding factors for conversion are UX design, the landing page, the source of web traffic, content, competitive price of products, goodwill, social media marketing, effective campaigns and customer engagement. Programmatic marketing entails analysing data at every customer touch point and targeting the consumer with compelling, preferably personalised, offers. Conversion does not necessarily mean making a customer shell out money; it could also mean winning customer loyalty, shown by signing up for a newsletter, downloading whitepapers or trial versions of the product, or spending considerable time on the site. This loyalty, in the long run, could result in big wins through persuasion in the form of emails, SMSs, direct contact and targeted recommendations.
Channelizing data about prospects (online behaviour, previous shopping, socio-economic segmentation, online search, products saved in the online basket), in other words getting to know the customer better so as to suggest products that make a meaningful difference in people’s lives, results in higher conversion rates. It is here that digital convergence is of paramount importance. Digital convergence blends online and offline consumer tracking data over multiple channels to come up with targeted campaigns. Offline tracking through beacon technology is catching up. It is a win-win solution for both the retailer and the consumer, providing each with useful information: the consumer, with an enabled smartphone app within a certain distance from the beacon, receives useful and targeted information about products and campaigns, and the retailer gathers data about consumer shopping habits.
The online experience can be enhanced to reduce the bounce rate by incorporating some of the following design thoughts:
Associative content targeting: The web content is modified based on information gathered about the visitor’s search criteria, demographic information and source of traffic. The more you know about the prospect, the better you can target.
Predictive targeting: Using predictive analytics and machine learning, recommendations are pushed to consumers based on their previous purchase history, segment they belong to and search criteria.
Consumer directed targeting: The consumer is presented with sales, promotions, reviews and ratings prior to purchase.
Programmatic offers the ability to constantly compare and optimize ROI and profitability across multiple marketing channels. Data about consumer behaviour, both offline and online, cookie data and segmentation data are algorithmically analyzed to re-evaluate the impact of all media strategies on the performance of consumer segments. Analyzing consumer insights and testing in iterations, using A/B testing, contribute to a higher conversion rate. Using data driven methods to gain a higher conversion rate is programmatic conversion and it’s here to stay.
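As an illustration of the A/B testing mentioned above, here is a minimal sketch in plain Python of a two-proportion z-test, one common way (not the only one) to check whether a difference in conversion rate between two page variants is likely to be real. The function name and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: variant A converts 200 of 5000 visitors, variant B 260 of 5000.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the uplift is unlikely to be chance alone, but the sample size and significance threshold should be decided before the test starts, not after looking at the numbers.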
IoT, the Internet of Things, is the science of an interconnected everyday life through devices communicating over WiFi, cellular, ZigBee, Bluetooth and other wireless and wired protocols, RFID (radio frequency identification), sensors and smartphones. Data monetization has led to generating revenue by gathering and analyzing customer data, industrial data and web logs from traditional IT systems, online streams, mobile devices and sensors, and an interconnection of them all, in other words, IoT. IoT is hailed as the new way to transform the education sector, retail, customer care, logistics, supply chain and health care. IoT and data monetization have a domino effect on each other, which generates actionable insights for business metrics, transformation and further innovation.
Wearable devices are a great way to keep tabs on patient heart rates, step counts, and calories consumed and burnt. The data gathered from such devices are not only beneficial for checking vital signs but can also be used to scrutinize the effectiveness of drug trials and to analyze the causes behind the way the body reacts to different stimuli. IoT in logistics can help businesses build better processes by reading the bar codes at every touch point that tracks the delivery of products, comparing the estimated with the actual time of delivery and analyzing the reasons for the difference. In Smart buildings, HVAC (heating, ventilation, air conditioning), electric meter and security alarm data are integrated and analyzed to monitor building security, improve operational efficiencies, reduce energy consumption and improve occupant experiences.
IoT is expected to generate large amounts of data from varied sources with a high volume and very high-velocity, thereby increasing the need to better index, store and process such data. Earlier the data gathered from each of the sources was analyzed in a central hub and communicated to other devices, but the IoT brings a new dimension called the M2M (machine to machine) communication. The highlights of such M2M platforms are
Improved device connectivity
API, JSON, RDF/XML integration availability for data exchange
Flexible to be able to capture all formats of data
Data Scalability
Data security across multiple protocols
Real-time data management – On premise, cloud or hybrid platforms
Low TCO (total cost of ownership)
The data flow for an end-to-end IoT use case entails capturing sensor-based data, using SPARQL for RDF encoded data, from different devices and wearables into a common data platform to be standardised, processed, analyzed and communicated further as dashboards, as insights, as input to some other device or for continuous business growth and transformation. Splunk, Amazon and Axeda are some of the M2M platform vendors that provide end to end connectivity of multiple devices, data security and realtime data storage and mining advantages. Data security, including adherence to data retention policies, is another important aspect of IoT. As IoT evolves, so will the interconnectivity of machine-to-machine platforms. Exciting times ahead!
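The "standardised" step in the data flow above can be sketched as follows, assuming the devices send JSON payloads. The device names, field names and the FIELD_ALIASES mapping are invented for illustration and do not come from any particular M2M platform.

```python
import json

# Hypothetical payloads: each device reports heart rate under a different key.
raw_messages = [
    '{"device": "watch-1", "hr": 72, "ts": 1700000000}',
    '{"id": "band-7", "heart_rate_bpm": 88, "time": 1700000030}',
]

# Maps each device-specific field name to the platform's common field name.
FIELD_ALIASES = {
    "device": "device_id", "id": "device_id",
    "hr": "heart_rate", "heart_rate_bpm": "heart_rate",
    "ts": "timestamp", "time": "timestamp",
}


def standardise(message):
    """Parse one JSON payload and rename its fields to the common schema."""
    payload = json.loads(message)
    # Fields without a known alias are dropped rather than guessed at.
    return {FIELD_ALIASES[k]: v for k, v in payload.items() if k in FIELD_ALIASES}


records = [standardise(m) for m in raw_messages]
print(records)
```

Once every record shares the same field names, the downstream processing, dashboards and device-to-device messages can be written once instead of once per device type.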
Recommendation systems have changed the way people shop online, find books, movies or music, discover news articles that go viral, or find friends and workmates on LinkedIn. Recommendation systems analyze browsing patterns on websites, ratings, the most popular items at that point in time, or the products saved in one’s virtual basket to recommend products. Similarly, common interests, work skills or common geographical locations are used to predict people that you might want to connect with on social media sites.
Behind such personalized recommendation systems lie big data platforms including software, hardware and algorithms that analyze customer behavior and push recommended products, in real time. The big data platforms handle both data and event data distribution and computation. Data can pertain to how customers or customers similar to the one in question, have rated products in the past while event data could be tracking mouse clicks that trigger events for example viewing a product and sometimes both of the above need to be combined to be able to predict a customer’s choice. Hence, the recommendation system architecture caters to data storage for offline analysis as well as low latency computational needs and a combination of the two.
The data platform architecture needs to be robust enough to ingest continuous real time data streams into scalable systems like Hadoop HBase or any other big data storage infrastructure like AWS Redshift. Apache Kafka is usually used as the messaging system for the real time data stream, in combination with Apache Storm. Due to the high throughput, data redundancy needs to be taken care of in case of failures. If the real time computation needs to take into account customer data like previous purchase history, preferences, products already bought, segmentation based on socio-economic demographics or data from ERP and CRM, then either all the systems have to be available online to blend the data in real time, or the customer detail data could be mashed up offline to create a Single Customer View and queried in combination with the real time event data.
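The second option above, a Single Customer View prepared offline and queried together with real time events, can be sketched in a few lines of Python. An in-memory dictionary stands in for the real store here, and the customer ids, field names and enrich_event function are hypothetical.

```python
# Hypothetical Single Customer View, built offline from CRM/ERP data.
single_customer_view = {
    "c-101": {"segment": "student", "purchased": {"laptop"}},
    "c-102": {"segment": "family", "purchased": {"stroller", "car-seat"}},
}


def enrich_event(event, customer_view):
    """Attach precomputed customer details to one real-time event."""
    profile = customer_view.get(event["customer_id"], {})
    return {
        **event,
        "segment": profile.get("segment", "unknown"),
        # Flag products the customer already owns so they are not pushed again.
        "already_bought": event["product"] in profile.get("purchased", set()),
    }


event = {"customer_id": "c-101", "action": "view", "product": "laptop"}
enriched = enrich_event(event, single_customer_view)
print(enriched)
```

In a streaming setup the same lookup would run inside the stream processor against a low latency store, so that the source systems do not have to be online for every event.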
The valuable assets of any organisation are customers, products and now, data. Machine learning algorithms combine the three assets to leverage business gains, and predictive analytics is imperative in being proactive to customer needs. Some of the algorithms used for recommendation engines are content-based filtering, collaborative filtering, dimensionality reduction, K-means and matrix factorization techniques. With the wide availability of highly scalable data storage platforms, the challenge is not the data storage but the speed with which the data needs to be analyzed in the case of recommendation systems. The best approach is to combine mostly precomputed data with fresh event data, using pre-modelled algorithms to push personalised recommendations to the customer interface.
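Of the algorithms listed above, collaborative filtering is perhaps the easiest to show in a few lines. The sketch below is a user-based variant in plain Python: it scores products a customer has not rated by the ratings of other customers, weighted by cosine similarity. The ratings data and function names are made up, and production systems would precompute the similarities offline rather than on every request.

```python
import math

# Hypothetical ratings: customer -> {product: rating on a 1-5 scale}.
ratings = {
    "anna": {"book": 5, "lamp": 3, "mug": 4},
    "ben": {"book": 4, "lamp": 1, "desk": 5},
    "cara": {"lamp": 4, "mug": 5, "desk": 2},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(customer, ratings, top_n=2):
    """Rank unrated products by similarity-weighted ratings of other customers."""
    own = ratings[customer]
    scores, weights = {}, {}
    for other, theirs in ratings.items():
        if other == customer:
            continue
        sim = cosine(own, theirs)
        if sim <= 0:
            continue
        for product, rating in theirs.items():
            if product in own:
                continue  # Only recommend products the customer has not rated.
            scores[product] = scores.get(product, 0.0) + sim * rating
            weights[product] = weights.get(product, 0.0) + sim
    ranked = sorted(((scores[p] / weights[p], p) for p in scores), reverse=True)
    return [p for _, p in ranked[:top_n]]


print(recommend("anna", ratings))
```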
The terms “Data driven” and “Big Data” are the buzz words of today, definitely hyped, but the implications and potential are real and huge! Tapping into the enormous amount of data and associating this data from multiple sources creates a data chain, which proves valuable for any organisation. Creating a data value chain consists of four parts: collection, storage, analysis, and implementation. With data storage getting cheaper, the volume and variety of data available to be exploited is increasing exponentially. But storing the data does not help unless businesses ask the right questions, better understand the value that the data brings in and are sufficiently informed to make the right decisions. For example, in marketing, organisations can gather data from multiple sources about acquiring a customer, about the customer’s purchasing behaviour, customer feedback on different social media, and about the company’s inventory and logistics of product delivery. Analyzing this stored data can lead to a substantial number of customers being retained.
A few of the actionable insights can be as follows:
Improving SEO (search engine optimization), increasing the visibility of the product site and attracting more customers
CRO (Conversion rate optimization), i.e. converting prospects into sales, by analyzing the sales funnel. A typical sales funnel is Home page > search results page > product page > proposal generation and delivery > negotiation > checkout
Better inventory control systems, resulting in faster deliveries
Predicting products that a consumer might be interested in, from the vast inventory, by implementing good recommendation algorithms that scan through the consumer behaviour and can predict their preferences
If some of the above points are taken care of, customer loyalty can increase manifold, based on the overall experience during the entire consumer lifecycle.
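The sales funnel analysis mentioned under CRO can be sketched as a step-by-step conversion calculation. The funnel below is shortened to four steps and the visitor counts are hypothetical, purely to show where the biggest drop-off would be looked for.

```python
# Hypothetical visitor counts per funnel step, in funnel order.
funnel = [
    ("home page", 10000),
    ("search results page", 6000),
    ("product page", 3000),
    ("checkout", 600),
]


def step_conversion(funnel):
    """Return (from_step, to_step, rate) for each consecutive pair of steps."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates


for prev_name, name, rate in step_conversion(funnel):
    print(f"{prev_name} -> {name}: {rate:.1%}")

# Share of home page visitors who reach the final step.
overall = funnel[-1][1] / funnel[0][1]
print(f"overall conversion: {overall:.1%}")
```

The step with the lowest rate is usually the first candidate for optimization, since improving it lifts every step that follows.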
Data blending which leads to a Single Customer View and Actionable Insights
Often the focus lies on the Big Data technology rather than the business value of implementing big data projects. Data is revolutionising the way we do business. Organisations, today, are inundated with data. To be able to make sense of the data and create a value chain, there has to be a starting point, and the customer is a good starting point. The customer’s lifecycle, with experiences at every touch point, defines business growth, innovation and product development. Big data implementations allow blending data from multiple sources, leading to a holistic single view of the customer, which in turn gives rise to enlightening insights. The data pertaining to the customer, from multiple sources like CRM/ERP/Order Management/Logistics/Social/cookie trackers/Click traffic etc., should be stored, blended and analysed to gain useful actionable insights.
In order to be able to store the gigantic amount of data, organisations have to invest in robust big data technologies. The earlier BI technologies do not support the new forms of data sources, such as unstructured data, or the huge volumes, variety and velocity of data. The big data architecture consists of the integration from the data sources, the data storage layer and the data processing layer where data exploration can be performed, optionally topped with a data visualization layer. Both structured and unstructured data from various sources can be ingested into the big data platform using Apache Sqoop or Apache Flume. Real-time interactive analyses can be performed on massive data sets stored in HDFS or HBase using SQL with Impala or Hive, or using a statistical programming language such as R. There are very good visualization tools, such as Pentaho, Datameer and Jaspersoft, that can be integrated into the Hadoop ecosystem to get visual insights. Organisations can offload expensive datawarehouses to low cost and high storage enterprise big data technology.
Edited image from Hortonworks
Irrespective of the technical implementation, business metrics such as increasing revenue, reducing operational costs and improving customer experience should always be kept in mind. The manner in which the data is analyzed could create new business opportunities and transform businesses. Data is an asset, and investing in a value chain, from gathering to analyzing, implementing, analyzing the implementations and evolving continuously, will result in huge business gains.
Customer expectations are very different now. Decisions need to be taken in real time to convert a prospective customer into a committed one. In an age where customers seek instant gratification, organisations that have a longer time-to-market due to cumbersome internal processes find customer loyalty hard to win. For example, if a customer visits your physical store and you offer a discount at the very first visit, the chances that the customer will revisit your store are high. On the other hand, merely noting customer behaviour, which then has to pass through unwieldy processes to mete out a discount coupon the second time the customer visits your store (if at all), is a thing of the past. Advanced analytics systems are now able to handle data influx from multiple disparate systems, cleanse it and house it in DMPs (data management platforms), ready to be queried in real time to deliver predictive and actionable insights on the fly.
However, if the business methodologies used do not complement this speed of data processing, the business will still suffer. The widely used Lean methodology preaches creating more value for customers with fewer resources. Anything that does not yield value should be eliminated. But organisations need to adopt only the best of the best practices. Following methodologies by the book, on the contrary, causes bottlenecks. To be able to leverage more out of the Business Analytics systems and solutions, both the processes and the tools need to be streamlined to create customer satisfaction. A lot of business intelligence projects take too long to deliver and are inflexible, resulting in the functional business teams procuring BI tools which promise quick wins. The problem with such data discovery tools, apart from creating data silos, is that they lack data governance, hinder data sharing at an enterprise level and increase licensing costs.
Having no business process at all is not a solution. There needs to be accountability, and that comes from business processes. Finding the right balance between processes and the speed of delivering value, to keep costs low and increase the profitability of any business, is a continuous iterative process. One size does not fit all, and that applies to organisations as well. Methodologies/processes need to be tweaked, tuned and tailor-made for each company. Organisations that try to implement Lean/Agile/Scrum but fail do so because they lose the customer focus. Some companies do not have a clear strategy in place, employees are assigned foggy responsibilities and communication is lacking, and this in turn results in the focus shifting from the task at hand to the nitty-gritty of such project management methods.
To avoid pitfalls, a clear business strategy needs to be defined, specifying business goals in order to maximise gains. The next step is to streamline all the processes that lead to this gain.
The terms business analysis and data analysis have traditionally seemed different. With the increasing amount of data available and stored, and the need to analyse that data and gain business insights out of it, a new role, the Business Data Analyst, is critical. Companies lacking the business data analysis talent pool have a lower ROI and will lose out to companies hiring analytics talent.
Most companies, even today, have the two competencies separate. Business analysts analyze functional requirements and help translate the same to technical specifications while data analysts are more technical, gathering, cleansing and analyzing data. To increase the analytic throughput of a company it is vital to combine the business and analytic competencies to be able to analyze the data from a business aspect, being able to draw conclusions about consumer behaviour, find trends and accordingly make business decisions with targeted marketing campaigns.
As this is an emerging field, it can be challenging to find the right people with both the business acumen and the analytics skillset. There can be myriad ways to bridge this gap. One strategy can be to create teams of people with direct marketing roles along with data analysts and data scientists, to utilise the combined specialised competencies. Another strategy can be to train the management team in analytical skills or to beef up the business knowledge of data analysts.
No matter which strategies are adopted, the new role of Business Data Analyst is paramount for enabling a company to make the right investments at the right time to yield an ROI. Building a data driven company is more than identifying the right BI tools; it’s about driving business through customer behaviour feedback by analyzing data.