How Google Eliminated the Excess in the Index
Google’s New Life as a Minimalist: Part Two
In Part One of this series, Cindy Krum explored the core principles of the modern Minimalist movement and applied those concepts to Google. Specifically, she examined why and how Google began work on Mobile-First Indexing, a new way to organize and rank indexed content.
In Part Two, she further explores just how Google cleaned up their index.
Removing the Clutter
As noted in Part One, a primary goal of Mobile-First Indexing was to make the overall search engine more efficient. To do that, Google had to break free of its hoarding mentality (where it kept every bit of data it could on older websites) and figure out a way to streamline both indexing and the delivery of relevant search results.
Step one in simplifying their system was deciding what to cut.
When an aspiring Minimalist can’t figure out what to get rid of, experts suggest putting the item in ‘purgatory.’ When something is not readily available but can still be accessed if necessary, you find out whether or not you really need it. For example, when a Minimalist cleans out a closet, they might move everything to another room, adding things back into the closet as they wear them. This helps them determine if an item is worth keeping.
In Google’s world, this concept of purgatory was historically called ‘The Sandbox’. As Google works on the Mobile-First Index, everything in its current index sits in the Sandbox, waiting to be recrawled and validated by the desktop crawler. Only then can it be added to the Mobile-First Index.
Google is adding websites to the Mobile-First Index one at a time. From what we can tell, Google is still crawling most sites with the desktop crawler, but it may now have an added validation function that lets it recommend entire websites for addition to the Mobile-First Index. Once your site is in the Mobile-First Index, it will be crawled primarily by the Smartphone Crawler, with only about 20% of the crawling done by the desktop crawler. (If you’re curious about how your content is being crawled, there are tools that can help you find out.)
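One rough way to see which crawler is visiting your site is to look at the user-agent strings in your server logs. The sketch below is only an illustration: it assumes you have already extracted the user-agent field from each log entry, and it assumes that a Googlebot hit containing the word "Mobile" comes from the Smartphone Crawler. The function names are hypothetical, and because user agents can be spoofed, treat the numbers as an estimate rather than proof.

```python
import re
from collections import Counter

# Example user-agent strings in the style Google has published for its crawlers.
# The Chrome version is a placeholder; real log entries will vary.
SMARTPHONE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
    "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)
DESKTOP_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Matches the main Googlebot token, not variants such as "Googlebot-Image/1.0".
GOOGLEBOT_TOKEN = re.compile(r"Googlebot/\d")


def classify_user_agent(user_agent: str) -> str:
    """Label a user agent as 'smartphone', 'desktop', or 'other' (hypothetical helper)."""
    if not GOOGLEBOT_TOKEN.search(user_agent):
        return "other"
    # Assumption: the smartphone crawler's user agent includes "Mobile".
    return "smartphone" if "Mobile" in user_agent else "desktop"


def crawl_share(user_agents):
    """Return the fraction of Googlebot hits made by each crawler type."""
    counts = Counter(classify_user_agent(ua) for ua in user_agents)
    total = counts["smartphone"] + counts["desktop"]
    if total == 0:
        return {"smartphone": 0.0, "desktop": 0.0}
    return {kind: counts[kind] / total for kind in ("smartphone", "desktop")}
```

For example, a log with four hits from `SMARTPHONE_UA` and one from `DESKTOP_UA` gives a smartphone share of 0.8 and a desktop share of 0.2, which is the kind of split described above for a site that has moved to the Mobile-First Index.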
The Problem with New Content
This system works great for older content, which may or may not be accessed on a regular basis. But there’s a problem. What happens with new content?
For someone practicing Minimalism, the goal is to try to reduce consumption and buy fewer things. Each purchase or other acquisition should be seriously contemplated for the value it brings to their life.
It’s much the same with Google as they go about evaluating newly added content on websites.
Not all new content has value, and the desktop crawler can’t evaluate all of it fast enough, so Google no longer wanted to add new content to the Sandbox automatically. Instead, Google needed a faster way to evaluate new, time-sensitive content like news, events, and products before it was permanently added to the new, more selective Mobile-First Index.
They began testing Accelerated Mobile Pages (AMP). The HTML for AMP was written to be fast for users. More importantly for indexing, it was designed to be easy for crawlers, because only the most important information is permitted in the AMP HTML content.
To make the level of engagement even easier to evaluate, Google also cached and hosted the entire AMP page, which helped it eliminate low-quality content from the rankings. AMP content that was linked to regular HTML content was added to a temporary index with a temporary, Google-generated URL. But the original version of the content still needed to be validated by the desktop crawler before it could be added to the Mobile-First Index.
Organizing What Remains
Once Google cleared out unnecessary content from the index, they needed to redirect their focus to what remained: the essential components of the search engine itself.
To accomplish this, Google started focusing more on taking content in through feeds and APIs. This way, they could evaluate content from apps, websites, and other sources without relying exclusively on crawling and URLs. By changing their focus, Google could aggregate the most important information and provide a cross-device presentation layer. Information from many sources started being dynamically curated into a single source of organized information that users could interact with directly in the search result.
Google then used clicks and engagement with the Search Engine Results Pages (SERPs) as a proxy to assist with their curation process. SERPs increasingly feature rich results that include blurbs, ratings, and other information that helps a user find the most relevant results. Within the SERPs, Google started expanding the footprint of information in the search result, formatting richer results as ‘cards’ and carousels, so that they could test which results most appealed to users. Carousels broke up users’ habits of always clicking on the top result, and expanded the testing so that more results could be quickly evaluated, clicked on, or passed over by users. This has become so important that Google just consolidated ‘Rich Snippets,’ ‘Rich Cards,’ and ‘Enriched Results’ (and presumably ‘Featured Results,’ ‘Featured Rich Snippets,’ and other variations that were not specifically listed) to simply be called ‘Rich Results’ and launched a ‘Rich Results Testing Tool.’ Hooray!
But what about all the information and content that doesn’t get curated and shown at the top of a search result page?
Minimalists like to use a ‘Museum Model’ to think about curation and organization. In an art museum, the best pieces are hung on the wall, and the other related pieces (the ones that are interesting but perhaps not as widely appreciated) are stored neatly in the basement archive, flat-packed and numbered, so that they don’t take up too much room but are still easy to find.
Google seems to be transitioning to the Museum Model more and more in its results. The curated, featured information is at the top, and the supporting work, details, and less critical information are still available if you need them, but hidden, so that the real focus is on the high-quality results.
This model gives us a great way to look at the new results that are becoming so common. Once the best possible search results are curated, and potentially improved with an ‘Answer Box’ or Knowledge Graph result, other related items are tucked under that result. The most important and relevant related content is put into interactive tabs in the Google-curated result. Less important information is filed underneath in the web links. As long as the web links are not duplicated, have minimal storage costs, are not hindering the main result, and are well organized, they may be fine to keep. (Does one woman need 25 pairs of black leggings? No. Nor does Google need 25 copies of the same article from 25 different news outlets.)
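To make the "25 copies of the same article" idea concrete, here is a minimal sketch of how near-duplicate articles could be collapsed so that only one copy is kept. This is not a description of how Google does it; it is a generic illustration using word shingles and Jaccard similarity, and the function names, the shingle size of 3, and the 0.8 threshold are all assumptions chosen for the example.

```python
def shingles(text, size=3):
    """Break text into overlapping word sequences ('shingles') of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def collapse_duplicates(articles, threshold=0.8):
    """Keep the first copy of each article and drop later near-duplicates.

    An article is dropped when its similarity to any already-kept article
    reaches the threshold (an arbitrary value chosen for this sketch).
    """
    kept = []  # list of (original text, shingle set) pairs
    for text in articles:
        candidate = shingles(text)
        if all(jaccard(candidate, seen) < threshold for _, seen in kept):
            kept.append((text, candidate))
    return [text for text, _ in kept]
```

Passing in the same story from several outlets along with one unrelated article returns just two items: the first copy of the story and the unrelated article. Comparing every article against every kept article is fine for a handful of pages, but it would not scale to a web-sized index.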
Maintaining the Minimalist Discipline
As time goes on, it’s easy to get lazy with your routine, and this is no less true with the rigors of Minimalism. You must constantly pay attention to your consumption patterns in order to avoid over-accumulation in the long run.
It may be easier for Google to maintain some discipline with the new index, as long as rules for content evaluation are constantly updated to meet the changing demands that will become more common with Mobile-First Indexing (i.e., different devices and query types).
The desktop crawler will probably help by crawling the Mobile-First Index to weed out content that no longer meets the minimum quality guidelines. Google will work harder to preserve the sanctity of the new Minimalist Mobile-First Index by raising the bar for new content to get into the index, and simply not indexing items that are suspected to be spam or duplicate content.
As voice search and the number of connected devices continue to grow, so will the demands on Google to efficiently and effectively index and surface the best content for searchers. People who have worked with the web for a long time may struggle with the curation and the lack of ‘just-in-case’ indexing, but the reality is that an exhaustive index is not what people really want.
Google will continue to get better at teasing out the essence and intent of a query based on context and machine learning. In the end, focusing on the best answer to a question is more useful than focusing on all the potential answers to a question, and this is how Google is adapting in Mobile-First Indexing. The change may be jarring and encounter resistance, but it is the simplest, most scalable, and most useful solution to the growing glut of ‘stuff’ available to index.
Cindy Krum is the founder of MobileMoxie, one of the leading mobile marketing companies. The company’s clients include MTV, Party Gaming and a number of Fortune 50 companies. Cindy is also a regular speaker at major conferences around the world.