Content Discovery: Solving The Problem Of Information Overload

By Dmitriy Solodkiy, data scientist and blockchain evangelist at NVB

“Cram them full of non-combustible data,

chock them so damned full of ‘facts’ they feel stuffed,

but absolutely ‘brilliant’ with information.

Then they’ll feel they’re thinking,

they’ll get a sense of motion without moving.

And they’ll be happy,

because facts of that sort don’t change.”

Ray Bradbury, Fahrenheit 451

In the modern world, information is growing faster than people’s ability to consume, rate and sort it. At the same time, barriers to entering the media market keep falling: anybody can start a blog or vlog and share opinions, views and feelings with other people. As a result, the media, news and broader information market reacts quickly to events and sustains authors with widely differing positions, but it is less and less able to pre-filter and rank authors and their materials, and delegates this task to end users more and more often. End users, in turn, simply cannot cope with it, or, steered by a certain group of influential media, realize with disappointment that, although they seemed to be picking the best pieces out of the overall information stew, they have made many poor choices. Not only do people struggle with making more and more choices in today’s volatile information environment, they also face two problems that influence both the way they choose and the variety of choices they have:

  • There is a growing “censorship” caused by the naturally limited attention people can pay while looking for information. With popular topics it often happens that, whenever you search for media related to them, you stumble upon the same positions and facts, blended in a variety of ways but essentially the same. This is not surprising: search engine rankings, however elaborate, are not perfect and, most importantly, rankings alone are simply not enough anymore
  • While media pieces that are popular by any sensible metric may look like a good choice, they are often a good choice only for the majority. Or for the biggest group. Or, and this is what makes the situation really sad, for nobody. As many thought experiments in game theory have demonstrated, a “plurality ranking engine” may not be a good choice for every group
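The second point can be made concrete with a toy example. The sketch below is only an illustration under invented assumptions: the three audience groups, their sizes and their preference orders are hypothetical, and the function names are not taken from any real ranking engine. It shows a piece of content that wins a plurality ranking while most of the audience ranks it last.

```python
from collections import Counter

# Hypothetical audience: (group size, ranking from most to least preferred).
audience = [
    (40, ["A", "B", "C"]),
    (35, ["B", "C", "A"]),
    (25, ["C", "B", "A"]),
]

def plurality_winner(groups):
    """Item with the most first-place votes (what a 'most popular' list shows)."""
    firsts = Counter()
    for size, ranking in groups:
        firsts[ranking[0]] += size
    return firsts.most_common(1)[0][0]

def share_ranking_last(groups, item):
    """Fraction of the audience that puts `item` at the bottom of its ranking."""
    total = sum(size for size, _ in groups)
    last = sum(size for size, ranking in groups if ranking[-1] == item)
    return last / total

def prefers(groups, a, b):
    """True if a strict majority of the audience ranks `a` above `b`."""
    total = sum(size for size, _ in groups)
    votes = sum(size for size, r in groups if r.index(a) < r.index(b))
    return votes * 2 > total

def pairwise_winner(groups):
    """Item that beats every other item head-to-head, or None if there is none."""
    items = groups[0][1]
    for cand in items:
        if all(prefers(groups, cand, other) for other in items if other != cand):
            return cand
    return None

top = plurality_winner(audience)
print(top, share_ranking_last(audience, top), pairwise_winner(audience))
```

With these made-up numbers, the plurality ranking puts "A" on top even though 60% of the audience likes it least, while "B" would win every head-to-head comparison.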

In addition, the overall tendency to streamline content delivery, lower entry barriers and the widespread “one-person shop” practice lead to another peculiar development: fewer people want to be “editors” every day and more want to be journalists, so editorial work is neglected in favor of a simple “numbers define everything” rule. This only adds fuel to the fire.

Michael Bhaskar, a consultant for Google-owned DeepMind and a successful author and publisher, argues that “In a context of an abundance of information, curation becomes even more important. Publishers must develop their curatorial paradigm, their unique DNA signature of how they make their choices, how they curate their books. In this turbulent world, expert, knowledgeable curation is our best store of value.” Creating some content is easy. Creating content of quality, distilling many pieces of content (too pragmatic a word for the sensational news, exciting books and brilliant scientific papers we would like to have in our possession), curating it and finding the right links to make a story of it, not to narrow the information space but to widen and enrich it: this is the art of editorial work. And it is a must for a content delivery system too.

Google and its ecosystem (and other Google-like actors) do a great job of filtering and ranking, but incentivizing the creation of content and initiating content discovery is simply not their main goal: as low-end, high-throughput Internet ecosystem providers, they pay more attention to the other services and responsibilities they have. As of now, not many players are seriously engaged in searching for a way to address the problems mentioned above, and even fewer have achieved any success. Finally, the editorial job (not only pruning and filtering, but also working with content creators to polish their material and find the most appropriate niches) is not done by those companies. It is handed over to the media, which delegate it further, until a raw and undigested information flow reaches the end user and overwhelms them.

In my opinion, however, there is a way to cope with this situation or, at least, to alleviate the harsh consequences it would have for the quality of the content, whether educational or entertaining, that we consume on a daily basis. Instead of resorting to the purely “manual” approach characteristic of the few professional news media we still have (and the even fewer outlets of other types), or pursuing big numbers regardless of relevance and quality, as is common for big mass-market media platforms, we could try a dual approach: pick, say, bloggers and vloggers on an individual basis, while retaining a predominantly automated approach to placement, frequency of repetition and targeting of the created material. In addition, content creators would get a dashboard to see and manage their audience and to gain insights into who reads or watches them, who prefers them the most, and what else these people prefer or would like to see. This way we would use the best of both worlds to fight media surplus and disorientation.

Another issue of today’s information environment is the chaotic, ragged and incomprehensibly complicated course that a user’s search usually takes. We call it the “user journey”, although to me it looks more like Brownian motion or the pattern of after-lunch hiccups. Let’s face it: even among those who have reached this point of a disturbingly lengthy article, it would be hard to find one in ten who is a proficient user of search engines or catalogues and meta-platforms of any sort, able to quickly surf through all this information inflow. This is not surprising. The ability is not that common and can be trained to a significant degree only if you interact with large amounts of raw information on a day-to-day basis, and only if you are forced to work closely with it, regularly making compilations, digests, comparisons and reviews. Besides the ability to analyze, the most important ability for today’s analysts, one whose importance is often overlooked by people outside the industry, is the ability to search for information and to synthesize and compile it. Indeed, many professionals, services and products sell themselves, in a nutshell, as digests (or digest-makers): those who are able to transform a hiccup-like experience into a story, a journey, a picture. There are ways, however, to enrich the content space for the end user by illuminating a problem from different angles and making the user’s experience more comprehensive and diverse, ways that web content discovery platforms of any sort often neglect. These are:

  • Cross-channel communication, incentivizing cross-channel content discovery
  • Smart use of tagging and auto-sampling of the news agenda based on users’ backgrounds (with built-in out-of-field discovery mechanisms, designed to bring new and unexpected scenarios into users’ established routines)
  • Look-alike/score-alike targeting: audience differentiation with no discrimination
  • Bottom line-based metrics instead of intermediary metrics like views/clicks/time spent
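As a rough illustration of the second item in this list, the sketch below builds a small feed from tag overlap with a user’s background and reserves one slot for “out-of-field” material. Everything in it is an assumption made for the example: the catalog, the tags, the `build_feed` function and its parameters are hypothetical and do not describe any particular platform.

```python
def build_feed(user_tags, catalog, size=5, out_of_field_slots=1):
    """Pick `size` titles: mostly tag-matched items plus a few unrelated ones.

    `catalog` is a list of (title, set_of_tags) pairs, assumed to be
    pre-sorted by editorial quality so that ties keep the editors' order.
    """
    def overlap(item):
        return len(user_tags & item[1])

    # Items sharing at least one tag with the user, best match first
    # (sorted() is stable, so equal overlaps keep the catalog order).
    in_field = sorted((it for it in catalog if overlap(it) > 0),
                      key=overlap, reverse=True)
    # Items with no tag in common: candidates for unexpected discovery.
    out_of_field = [it for it in catalog if overlap(it) == 0]

    feed = in_field[: size - out_of_field_slots] + out_of_field[:out_of_field_slots]
    return [title for title, _ in feed]

# Hypothetical catalog and user background.
catalog = [
    ("Central bank rate decision explained", {"finance", "macro"}),
    ("Bond market weekly digest", {"finance", "bonds"}),
    ("How container ports schedule ships", {"logistics"}),
    ("Equity valuation basics", {"finance", "equities", "macro"}),
    ("Street photography primer", {"photography"}),
]
feed = build_feed({"finance", "macro"}, catalog, size=3, out_of_field_slots=1)
print(feed)
```

The fixed out-of-field slot is the simplest possible discovery mechanism. A real system would more likely choose that slot from editorial picks or from what look-alike audiences read, rather than from catalog order.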

While the advertising industry has made considerable progress in using all of the above-mentioned tools in recent years, content creators, guides and other players in media delivery are not as agile in adopting new tools, in part, perhaps, because the web-based content market is more fragmented and less well capitalized than the web advertising market. It is also less technologically advanced, partly due to a lack of resources and partly because of a widespread misconception: since it is not that easy to automate the delivery and targeting of this type of content, the industry’s margin from adopting these new techniques would also be low and could not cover the costs incurred. The content market is also less integrated, with many companies solving the same problems themselves instead of relying on a multitude of third-party platforms, each focused on one problem or operation. One can clearly see that ML (machine learning) techniques are over- and underestimated in the web content industry at the same time. There are areas where applying even the simplest heuristics could considerably increase the efficiency and throughput of the overall platform, while in other areas failed or under-performing attempts to introduce ML automation lead to criticism of the ML approach and damage the public image of the whole ML field, which may come to be seen as “unfit” for a particular industry or area of application. As Annette Markham, a communication researcher, argues in her paper Algorithmic Self, “…We should view algorithms as “actors” within our social worlds. In fact, they’re more important than any single person you friend on Facebook, because they dictate how and when you see every piece of content on the site”. Still, to escape this dictatorship and to keep algorithms from leading you into a sort of informational “local optimum”, you need the help of editorial boards, the help of experts.

With an ample amount of information available on the Web, and with tools for its retrieval, classification, auto-tagging and cross-linking not developing as rapidly, we face a data flood that we need to manage. Our journey has become more cross-media too, with people looking for videos, texts, images and audio podcasts about every interesting phenomenon they come across or deliberately search for. However, as noted above, people are just not good at dealing with the overwhelming complexity of the approaching information world. Neither are publishers and advertisers. More people prefer video to text (30% vs 54% for business executives according to Forbes research; in another survey, 59% of business executives answered that they prefer video materials to text). More people develop selective blindness (so that 60% of people under 40 do not mind watching in-stream video advertising before and after a video, and close to 80% do not mind watching it in the middle of one). More people rely on video when searching for contractors, equipment and methods (51% of executives under 40 make business-related purchases, compared with 26% of those older than 50). More and more professionals share educational, news and work-related videos with colleagues and upload them (around 30% of all executives and professionals do it daily, and about two thirds are regularly involved). And yet video ads are almost as unrelated to the accompanying content and the viewer’s background as they were several years ago. If you look at the most popular channels and the lists of videos you are supposed to like after reading a WSJ article on recent developments in financial markets, you are going to be surprised.

There is a lot of content, but not enough quality and variety. Videos go viral, but they are often garbage, and one can rarely find good video material on topics unrelated to extreme sports, entertainment, hobbies and Discovery-like materials.

According to Socialblade’s categorical distribution of the 100 most popular YouTube channels, videos are still mostly for entertainment: on YouTube they are reviews, clips, game replays, comedies and, to a lesser extent, Discovery-like commercials. That is fine, but many categories are not represented in the current internet video space at all, while others are either underrepresented or cannot be found or sold at all.

With all this in mind, I applaud the attempts of several industry teams I am familiar with to take the information discovery routine to the next stage.

Content Discovery Systems – Advertising

Taboola and Outbrain, Revcontent and Adblade, Zemanta (acquired by Outbrain) and Nativo: these platforms were built with only good intentions in mind, but if you do not already know what sort of content they usually resort to, just google them or go to their Wikipedia pages. Associated with clickbait, fake ads and unrelated advertising materials, these companies fell victim to the very technology and effectiveness that were intended to revolutionize the industry and make them disruptive.

What problems do these services have in catering to the needs of advertisers and web users?

  • Advertisers first, publishers and users second. This approach, while natural and quick to pay off, brings some problems: you do not care much about the user journey and do not know (or do not choose, or choose not to interfere with) what type of content you are promoting. In the long run such a model is unsustainable
  • Again, if you do not vet content and do not invest in editorial work and automation, preferring a clickbait approach to value-added advertising, you start to face unwanted problems such as selective blindness and increasing use of ad blockers. These, in turn, lead to a deterioration in the quality of advertising and accompanying materials, as fewer and fewer advertisers rely on your company to advertise their products and services. After some time, these players probably had no choice
  • If you work on a per-click or per-view basis instead of a per-lead or per-sale basis, or rely on a revenue share model, you are forced to advertise via content that is viral but hardly capable of creating or increasing brand loyalty.
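The difference between intermediary and bottom-line metrics in the last point can be shown with a small sketch. The items and their numbers below are invented purely for illustration, and `Item`, `click_through_rate` and `conversion_rate` are hypothetical names, not part of any platform’s API. The point is only that ranking by clicks and ranking by leads or sales can produce opposite orders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    title: str
    impressions: int
    clicks: int
    conversions: int  # leads or sales attributed to the item (assumed known)

def click_through_rate(item):
    """Intermediary metric: clicks per impression."""
    return item.clicks / item.impressions

def conversion_rate(item):
    """Bottom-line metric: leads or sales per impression."""
    return item.conversions / item.impressions

# Made-up numbers for three pieces of promoted content.
items = [
    Item("clickbait-headline", 10_000, 900, 3),
    Item("in-depth-review", 10_000, 300, 45),
    Item("how-to-guide", 10_000, 450, 30),
]

by_clicks = sorted(items, key=click_through_rate, reverse=True)
by_outcome = sorted(items, key=conversion_rate, reverse=True)

print([i.title for i in by_clicks])
print([i.title for i in by_outcome])
```

In this toy data the clickbait item tops the click ranking and comes last once leads or sales are counted, which is the incentive problem a per-click model creates.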

Clients of these companies also have few targeting choices: as the platforms do not tag or assess the traffic coming into the system from numerous sites and do not pick relevant content for the advertising materials they have (aside from the content they are already embedded in), it is harder for them to gather the proper audience for your advertising materials.



Content Discovery System – From Articles to Vlogs

The NVB project (disclaimer: I am affiliated with this project) follows a different approach, creating a platform for native video advertising distribution, together with an ecosystem of content discovery and incentives for quality content creation aligned with it. Advertising materials used to contain relevant and useful information (at least to a greater extent than now), and that is the state the NVB team wants to bring online video advertising back to. NVB’s mission is to provide content creators with proper and effective advertising that boosts their revenue; to provide users with content of higher quality and greater variety (in terms of both the media used to deliver it and the points of view it reflects); and to give smaller players an opportunity to focus on content creation by leveraging modern content distribution and targeting technology, born in AdTech and fed with information on audience segments, users’ associations and the news and topics the audience seeks most. This is how the team has decided to stand out from the competition, grow its company and capitalize on its existing technology expertise.

There is another attractive side to the NVB approach. As the company is going to work closely with vloggers, it has decided to build a CRM for video bloggers that not only shows them intermediate statistics, but also gives them advice based on users’ feedback, the market situation and currently trending issues. In the ever-changing world of video blogging this feature would be useful for professionals and amateurs alike.

As Michael Bhaskar, whom I have already cited in this article, states: “We need the big data approach, but we also need the personal, human and real approach, and that’s part of the dynamic that publishing is now caught up in.”

Content Discovery Systems – Social

While many social networks and media portals have built-in mechanisms for content discovery, in some cases content discovery plays the central role and serves as the focal point around which the whole system is built. Examples are Pinterest and Quora, and also the less famous Flipboard, Digg and Nuzzel, each in its own way. (Twitter, Facebook and others do a lot to develop content discovery, but their main focus is on something else.)

Pinterest is for design, fashion and hobbies, Quora is for those who want to get advice and read opinions, Digg and Flipboard are editorial products that pick the most interesting things from all around the Internet for you (with some curation and a customized news feed inside), and Nuzzel forms a newsfeed for practitioners from many industries. Still, while Pinterest (like Facebook and Twitter) was able to monetize its popularity without damaging the platform’s intrinsic value, there is a long way to go before Quora and the other companies mentioned are able to become profitable. Combining popularity and monetization is still not as easy as it seems.

These sites all allow users to interact socially and use social networking information and/or interest lists to curate a list of the best stories for each user, but they fall into two categories: platforms that focus on a particular type of content, content delivery and/or discovery, for which growing the audience is the only way to eventually justify the funds poured into them; and news aggregators that create curated news and entertainment feeds for their users.


For many years content delivery and discovery on the Internet was dominated (at least in developed countries) by several players, the Facebook and Google empires being the largest among them, with other aspiring players either joining their teams or dying and selling off their technology. Still, with many newsfeeds out there, more than one way to bundle information up, and many problems of information discovery, distribution and creative facilitation still unsolved, this is not so much a market of fierce competition among players as a market awaiting innovators, with much capitalization potential for new entrants able to solve its urgent problems. Elsevier does it, clinicians and medical researchers do it, Facebook and Twitter do it a lot. There is a way, I believe, to do it more effectively in some areas, and to earn from it.