31.05.2024 SEO 9 min read

Google search algorithm leak – SEO secrets revealed

The internet is abuzz with news about a major data leak at Google, shedding light on how their ranking algorithms work. Many myths and facts about SEO can now be verified by looking directly at the leaked information.

Google data leak – what happened?

Google bots accidentally published internal reference documentation for Google Search’s Content Warehouse API on GitHub. This has provided more clarity on the features that may determine page visibility in search results.

Although over 14,000 attributes (potential ranking factors) were revealed (!), the documentation contains no weights, so we don’t know how strongly each of them influences rankings.

It’s worth noting that the API documentation was leaked, not source code, so we cannot tell which of these attributes are still in use, and some might be outdated. The data was published on March 27, 2024, and removed only on May 7, 2024.

We don’t know how the individual modules work or how they relate to one another. However, we do have a list of names for ranking factors along with brief descriptions, such as:

An example of the data leaked from Google

Key insights from the Google Data Leak

Below are the most important SEO lessons from the Google data leak.

Domain authority matters

In the “Compressed Quality Signals” section, there’s an attribute called “siteAuthority”, which appears to favor trusted sites in rankings.

siteAuthority is one of the SEO parameters.

Again, we don’t know how this value is calculated or how much it weighs in the overall ranking, but we do know that such a parameter exists and is documented among the quality signals. Similar reservations apply to the following parameters.

Clicks influence positions

We know that Google uses machine learning algorithms. It has long been suspected that clicks are an important attribute for automated ranking – after all, engaging content is visited more often and thus should rank higher. Indeed, Google uses clicks as a strong signal in its ranking systems.

The module below lists various click signals that influence SEO:

Click signals are another ranking factor.

While the initial signals are understandable, it’s worth explaining the rest. According to Google’s definition, squashing is a function that prevents one large signal from dominating others. In other words, the algorithms normalize click data to prevent rank manipulation through artificial, excessive clicks.

We can thus assume that the system is reasonably balanced and reflects natural interest in content by users, and extra clicking on one’s own content won’t help achieve higher positions.
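To illustrate the idea of squashing, here is a minimal sketch assuming a simple saturating curve. The leaked documentation does not give the actual formula, so the function name `squash` and the `scale` parameter are hypothetical and chosen only for illustration.

```python
def squash(raw_clicks: float, scale: float = 100.0) -> float:
    """Compress a raw click count into the range [0, 1).

    Hypothetical damping curve (not Google's formula): it grows quickly
    for small counts and flattens for large ones, so a single huge
    signal cannot dominate the others.
    """
    if raw_clicks < 0:
        raise ValueError("click count cannot be negative")
    return raw_clicks / (raw_clicks + scale)


# Each row has many times more clicks than the previous one,
# yet the squashed value rises more and more slowly.
for clicks in (10, 100, 1_000, 100_000):
    print(clicks, round(squash(clicks), 3))
```

With a curve like this, inflating clicks from 1,000 to 100,000 moves the squashed value far less than going from 10 to 100, which is the kind of behavior that would blunt artificial clicking.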

Articles from certain authors rank better

This is a significant surprise for the SEO industry. Much has been said about factors like E-A-T or topical authority, which relate to the site’s reputation, affecting its expertise and trust. Now we know that the same applies to individual authors!

Author evaluation appears to be comprehensive – for example, one function checks if the author of the page is also the entity representing the site.

The location of the link affects its strength

The “sourceType” attribute shows a relationship between the value of a page and where it is located. The higher the level of the page and the fresher the subpage, the stronger the links from it. The most valuable links come from the homepage, especially if it is regularly updated.

Homepage PageRank affects subpages

It appears that each new subpage borrows the homepage’s PageRank until it earns a score of its own.

Homepage PageRank affects the SEO of subpages

This means that new subpages of a strong site will rank higher initially because they are inherently more trustworthy. Only over time, based on user behavior, does Google determine the actual usefulness of the content. This way, a subpage of a weaker site can be promoted if its content is truly more valuable.
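The borrowing mechanism described above can be pictured as a weighted blend that shifts from the homepage’s score to the page’s own score as evidence accumulates. This is only a sketch under that assumption: the function `blended_reputation`, the `observations` count and the `prior_strength` constant are hypothetical and do not appear in the leak.

```python
def blended_reputation(homepage_score: float, own_score: float,
                       observations: int, prior_strength: int = 50) -> float:
    """Blend a homepage score with a page's own score.

    Hypothetical model: with no observations the page fully inherits
    the homepage score; as observations grow, its own score takes over.
    """
    if observations < 0:
        raise ValueError("observations cannot be negative")
    weight = observations / (observations + prior_strength)
    return (1 - weight) * homepage_score + weight * own_score


# A brand-new page on a strong site starts at the homepage score...
print(blended_reputation(0.8, 0.2, observations=0))
# ...and drifts toward its own (here weaker) score over time.
print(round(blended_reputation(0.8, 0.2, observations=10_000), 3))
```

The same blend works in the other direction: a page with a strong score of its own on a weak site would climb above the homepage value once enough evidence is collected.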

Google considers font size

Surprisingly, Google assesses every detail, including the average font size, as a ranking factor. Presumably, both extremes (too small or too large fonts) hinder readability, resulting in a lower rank.

Google evaluates font size

Interestingly, the font size for anchor text is also evaluated.

Font size for anchors also affects rankings.

Text originality matters even for short content

Long articles are not the only ones with SEO potential. Through the “OriginalContentScore” attribute, Google evaluates concise content for originality, most likely so that short but useful pages, such as online dictionary entries, can rank.

Google evaluates the originality of short content.

Page title is a key parameter

The leaked documentation indicates that matching the title to the user’s query increases the page’s visibility. This is handled by the “titlematchScore” function.
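As a rough illustration of matching a title to a query, the sketch below measures what share of the query terms appear in the title. The leak does not describe how “titlematchScore” is computed, so the function `title_match_score` and its token-overlap logic are assumptions made for this example.

```python
import re


def title_match_score(title: str, query: str) -> float:
    """Return the fraction of query terms that occur in the title.

    Hypothetical scoring (not Google's): lowercase both strings,
    split them into word tokens and compare the resulting sets.
    """
    def tokenize(text: str) -> set:
        return set(re.findall(r"\w+", text.lower()))

    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(title)) / len(query_terms)


print(title_match_score("Google search algorithm leak", "google leak"))
print(title_match_score("Google update", "google leak"))
```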

Freshness matters

It has long been known that Google promotes fresh information. Now we have confirmation, expressed in several functions:

bylineDate – the visible date on the page.

syntacticDate – the date visible in the URL or title.

semanticDate – the date included in the page content.

Thus, Google assesses not only the publication date but also the dates used in the content. It’s beneficial to cite current reports and studies to improve SEO results.
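The difference between these date types can be sketched with two small extractors: one pulls a date out of the URL, the other collects years mentioned in the body text. How Google actually derives these values is not documented in the leak, so the helper names `syntactic_date` and `semantic_years`, the URL pattern and the year range are all assumptions.

```python
import re
from datetime import date
from typing import Optional

# Assumed URL layout: .../YYYY/MM/DD/... (many blogs use it, not all).
URL_DATE = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?:/|$)")


def syntactic_date(url: str) -> Optional[date]:
    """Extract a date embedded in the URL path, if there is a valid one."""
    match = URL_DATE.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. month 13 or day 45
        return None


def semantic_years(text: str) -> list:
    """Collect distinct years (1900-2099) mentioned in the content."""
    return sorted({int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", text)})


print(syntactic_date("https://example.com/blog/2024/05/31/google-leak"))
print(semantic_years("A 2023 study and a 2024 report confirm this."))
```

A page whose URL says 2024 but whose content only cites sources from many years earlier would show a gap between the two values, which is one reason to keep the cited reports current.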

Sites with videos are treated differently

The “isVideoFocusedSite” function indicates that if more than half of the subpages contain videos, Google intervenes in the site’s rankings; however, we don’t know how.

If a site contains videos, it is treated differently.
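The “more than half” rule is easy to express in code. The sketch below is an assumption about how such a flag could be derived from page counts; the function `is_video_focused` is hypothetical, and only the 50% threshold comes from the description above.

```python
def is_video_focused(pages_with_video: int, total_pages: int) -> bool:
    """Return True when more than half of a site's pages contain video.

    Hypothetical helper: the leak names the flag but not how the
    underlying counts are collected.
    """
    if total_pages <= 0:
        return False
    return pages_with_video / total_pages > 0.5


print(is_video_focused(6, 10))  # 60% of pages contain video
print(is_video_focused(5, 10))  # exactly half is not "more than half"
```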

Health-related sites indexed differently

There is a parameter named “ymylNewsScore” that relates to sites that could impact the health or life of their readers (YMYL – Your Money or Your Life). It seems that YMYL sites are ranked by different rules.

Health-related sites are indexed differently

Google assigns the YMYL attribute not only to websites but even to specific queries.

Google assesses whether a given query belongs to the YMYL category.

Small personal sites get special treatment

The Google data leak revealed a mysterious ranking factor called “smallPersonalSite.” We don’t know how “small, personal sites” are defined or what the flag is used for.

We can guess that small businesses receive assistance in the initial phase of building visibility to quickly generate first traffic to the site. Perhaps this function is a reaction to the Helpful Content Update, which caused a significant drop in visibility for smaller sites in favor of better-optimized ones.

Small personal sites are treated differently by Google.

Google evaluates if you stay on topic

Algorithms value content related to the site’s theme more highly. The “QualityAuthorityTopicEmbeddingsVersionedItem” function checks if you publish content from your field or if you deviate too much from the topic.

Content that matches the site’s topic is rated higher.
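Staying on topic is commonly measured by comparing embeddings, that is, numeric vectors that represent the subject of a text. The sketch below uses cosine similarity for that comparison. The three-dimensional vectors are made up for readability, and nothing here is taken from the leak beyond the idea that topic embeddings exist.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up embeddings: the site's overall topic and two of its pages.
site_topic = [0.9, 0.1, 0.0]
on_topic_page = [0.8, 0.2, 0.1]
off_topic_page = [0.0, 0.1, 0.9]

print(round(cosine_similarity(site_topic, on_topic_page), 3))
print(round(cosine_similarity(site_topic, off_topic_page), 3))
```

The page that is close to the site’s theme scores near 1, while the off-topic one scores near 0. A system built on this idea could use that gap to tell content from your field apart from content that drifts away from it.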

This article summarizes the key points of the iPullRank publication, which analyzed the leaked documentation directly. If you want more details and further conclusions, we refer you to the source.

Summary

We gained many insights into how Google’s ranking algorithms work. Still, it is hard to shake the feeling that the leak contains no great sensation.

For many years, Google has clearly indicated what it pays attention to when evaluating a site. It turns out that the leaked information is merely a technical confirmation of its guidelines. The data leak only reinforced the view that:

Creating valuable content and maintaining high site quality is crucial.
There are no shortcuts—regular work, content development, and continuous site optimization are necessary.
For your site to rank well, it genuinely has to deserve it.

Analyzing ranking factors is a bit like analyzing a poem. You can examine its structure line by line, study stylistic devices… but that won’t make you write brilliant poetry immediately.

Similarly, with SEO – you can delve into theory and read about each function individually, but only action and creating valuable content will allow you to achieve a high position. Contrary to appearances, it’s more art than technicalities.

Bartosz Ciesielski

Experienced copywriter who explores the secrets of marketing and positioning. In his articles, he refers to the contemporary achievements of psychology and the philosophy of language, creating cross-sectional content. As a member of the Content Writer team, he takes care of technological and organizational solutions that increase the quality of work. Privately, he reads a lot of books, learns new languages and composes piano pieces.