Google Author Understanding and Knowledge Domain Expert Perception

Google has its own Knowledge Graph and Knowledge Base, while Microsoft Bing has its own Satori, which means "understanding" in Japanese. Richard Qian chose the word "Satori" for the Microsoft Bing knowledge base because semantics is about "understanding meaning" and the connections between things.

But entities, attributes, values and predicates, nouns and verbs, query templates, and word sequences are not just about Information Retrieval and Extraction. They are also about becoming an authority on a topic. And who is the absolute authority in a world where anyone can write anything, while Artificial Intelligence can generate one million words a day?
About Author
Koray is the founder and owner of Holistic SEO & Digital. He shares his knowledge through articles, webinars, conferences, and SEO-related events.

Koray also regularly writes articles for the Serpstat blog. He focuses on Data Science, Machine Learning, Web Development, Entity-based Contextual Search, Semantic Content Network Creation, Marketing, Branding, and Reputation Management.
How can a search engine understand who the real author of an article is? How can a knowledge domain expert prove that they are better than others? And what does "indexing the physical world" mean for a search engine?

This article explains Google's perception of authors and author entities, their contribution to Google's concept of expertise, and Google's previous inventions, designs, and research, along with its experiments in integrating authorship into author ranking.

What are Google's Patent Inventions for Authorship?

Google has created many continuous and connected authorship and author ranking designs and inventions. Some of them are below.
Google Answers
Google Talk
Google Plus
Google Knol
Google Sidewiki
Google Groups
Most Google designs for authorship-related search engine functions focus on recognizing the author for a specific topic and ranking the author for their expertise. Some Google designs rank authors according to user feedback, while others focus on the accuracy or diversity of the N-grams and topical entries.

For example, Google Plus is connected to the deprecated Author Tag that Google had used to recognize authors and incorporate them into the direct ranking algorithms.

Google Knol focused on creating authorities for specific topics with user feedback and monetization incentives.
In some designs, Google mentions the AgentRank concept, similar to PageRank, to explain whose opinions and references carry more weight among authors.

PageRank comes from the early days of the search engine, and it is highly relevant to authorship: the first links on the open web were citations between academic research papers, which showed which researchers and academics had greater relative prominence for certain query groups and phrase lists.
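The link between citations and author prominence can be illustrated with a small sketch. This is a generic PageRank-style power iteration applied to a hypothetical author citation graph. It is not Google's AgentRank implementation, and the author names, damping factor, and iteration count are all assumptions made for illustration.

```python
def author_rank(citations, damping=0.85, iterations=50):
    """PageRank-style scores for a {citing_author: [cited_authors]} graph."""
    authors = set(citations)
    for cited in citations.values():
        authors.update(cited)
    n = len(authors)
    rank = {author: 1.0 / n for author in authors}

    for _ in range(iterations):
        new_rank = {author: (1.0 - damping) / n for author in authors}
        for author in authors:
            cited = citations.get(author, [])
            if cited:
                # Split this author's score among the authors they cite.
                share = damping * rank[author] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # An author who cites nobody spreads their score evenly.
                share = damping * rank[author] / n
                for target in authors:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: each key cites the authors in its list.
citations = {
    "author_a": ["author_c"],
    "author_b": ["author_c"],
    "author_c": ["author_a"],
}
scores = author_rank(citations)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

In this toy graph, the author cited by both of the others ends up with the highest score, and the author nobody cites ends up with the lowest.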

What is a Knowledge Domain Expert for Google?

For Google, a knowledge domain expert must actually exist. As in Google's 2009 Vince Update, the physical world provides accountability for the search engine's perception. An author who does not exist in the physical world can't be an authority on a topic and creates a risk for web search engine users.

Thus, the Knowledge Bases of search engines, or common databases such as datacommons.org, are important for centralizing author entities, because search engines have different online and digital indexes, but in reality there is only one physical world.

A centralized and standardized knowledge base helps search engines share the same principles for understanding the physical world.

Satori is smaller than Google's Knowledge Graph, while both search engines have acknowledged that they use and integrate Data Commons into their internal search systems.

To be a knowledge domain expert, consistent third-party references matter, and the author's reputation both affects and is affected by the publisher's platform.

Below are some excerpts from Google's projects for reputable authors and publishers.
A Knowledge Domain Expert can be one of several entity types: an author, an organization, or a group of people and organizations acting together.
A Knowledge Domain Expert must have an extensive amount of information with citations for a specific topic. Diversified quotations from different institutions and corporations are necessary.
Consistency in publishing on a singular topic is important. Publishing new information regularly for the different sub-topics is important to show the expert's activity for the topic.
To have unique N-grams and phrase variations for the topical expressions. Topical entries determine the categorical quality of the specific document, and using distinct N-grams shows the originality of the information, along with a unique expression identity.
To have a memorable face, voice, and appearance alongside other entities from the singular topic. If an author's name and face appear for 44 different topics, it is hard to believe that the same author has a physical existence and reality.
To be searched for the specific knowledge domain terms. The author should appear in the queries and have search demand for the particular topic.
The presence of distinctive sentences for the specific topic. Unique sentences and word sequences show the author's originality for the given topic.
To be repeated by other sources. If other sources follow the author with the same sentence structures and declarations, it gives the knowledge domain expert a leading position.
To have inventions for the specific topic from patent offices. The knowledge domain expert should have outstanding ideas and concepts for diagnosing specific situations or dynamic relations, and should use new concepts for a topic before they start to appear within the queries.
Availability of a book with positive-sentiment reviews for the singular topic. If the specific entity has a book for the topic with a positive signal, it is easier for a search engine to define the person as an expert.
To be searched for the first time for specific topical entries. If the particular entity appears with specific categorical terms for the first time, it shows the reference value and need of the expert.
The presence of a solid groundedness for the topical declarations with a high level of accuracy for the topic.
To be mentioned by the high-level authoritative publications and other knowledge domain experts for the singular topic.
Different content formats for the same topic include audio, video, image, infographics, social media posts, and blog posts.
Clear face and voice recognition across the different video and image platforms.
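The "unique N-grams" item in the list above can be made concrete with a minimal sketch. It measures what share of a document's word trigrams appear in none of the comparison documents. The function names, the tokenization rule, and the sample sentences are assumptions for illustration, not a description of how Google scores originality.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def unique_ngram_ratio(document, other_documents, n=3):
    """Share of the document's n-grams found in none of the other documents."""
    own = word_ngrams(document, n)
    if not own:
        return 0.0
    seen_elsewhere = set()
    for other in other_documents:
        seen_elsewhere |= word_ngrams(other, n)
    return len(own - seen_elsewhere) / len(own)


# Hypothetical sample sentences.
document = "semantic search connects entities with attributes"
others = ["semantic search connects queries with documents"]
print(unique_ngram_ratio(document, others))
```

A ratio close to 1 means most word sequences in the document are not found in the comparison set, while a ratio close to 0 means the document mostly repeats existing phrasing.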

What is the Main Content Creator for Google?

The main content creator is the creator of the primary content within a web page. The concept involves the identity of the author of the specific content item. For example, I am the main content creator here.

Google states the identity of the main content creator within the Money or Your Life context as below.
A website's reputation is based on the experience of real users, as well as the opinion of people who are experts in the topic of the website. Keep in mind that websites often represent real companies, organizations, and other entities. Therefore, reputation research applies to both the website and the company, organization, or entity the website represents.
The website's reputation and the brand it represents are connected. A brand's reputation is affected by third-party sources while reflecting its overall E-A-T.
Knowing more about the reputation of a website and content creator can also help you understand what a website is best known for and, as a result, how well it accomplishes its purpose. For example, newspapers may be known for high quality, independent investigative reporting while satire websites may be known for their humor.
Certain types of sources (websites) are connected to certain types of topics and content formats.
Many websites are eager to tell users how great they are. Some webmasters have read these rating guidelines and write "reviews" on various websites. You should aim to find independent sources of reputation information about the website and creator of the Main Content ....
Google is already aware of fake reviews on the internet. In fact, Google already ranks sources and experts for specific topics by filtering out bad, low-quality, and inauthentic reviews.
source rating
For example, according to Google's algorithmic assessment, searchenginejournal.com doesn't have a proper third-party reference for defining itself. Usually, Google picks Wikipedia, or Wikipedia-like sources with extensive information, for definitions. Google collects third-party sources to provide reviews and opinions. It also clearly cares about the activity timeline of the source, as it says, "Indexed ten years ago the first time."

“For YMYL informational topics, the reputation of a website or content creator should be judged by what experts in the field have to say. Recommendations from expert sources, such as professional societies, are strong evidence of a very positive reputation.”

Google states that the reputation of a website or the content creator for the informational topics should come from the same level of experts. As shown below, Google Scholar is a good source for understanding certain experts.

For example, below, you can find Trystan Upstill of Google, whom I admire, to understand his overall expertise on a particular topic. His verification, research, coauthors, and their organizations, together with the citation count per study and the years of the research, are clear signals of Trystan's expertise.
Trystan Upstill authors dashboard
The information about Trystan Upstill is validated by other third-party sources such as researchgate.net. Trystan Upstill has many years of experience and expertise in certain subjects, and we can track this entity over the years as below.
He is verified on some other resources with similar coauthors and the same topics, along with identical research papers.
Trystan Upstill in google search
In the Google Images vertical search, his face is clear, and he is associated with Android, which is Google's internet and mobile phone technology, and with Google IO. We can see the name Android within the query refinements too.
Trystan Upstill the washington post
The Washington Post posts his image, mentioning the Google IO, and Google News to explain his perspective.

Since Google can read images with context, for example, “Imagen,” it is clear that Google sees the context of pictures, and here, Trystan, as an authority, presents his opinions to the audience of Google IO.
Trystan Upstill articles
These are the results from the Google News vertical search. And the name of the author entity, web technologies, Android, and Tablet are directly related to each other.

But not just the fresh news. Let’s check the entity’s lifecycle on the SERP.

Google search results from 2007 mention him as a Google engineer, and the topic is still related to Information Retrieval and Web technologies.
Trystan Upstill author
Seeing seobythesea.com, owned and created by the Charles Darwin of SEO, Bill Slawski, examining Trystan Upstill years ago is not a surprise. And Google can see Bill Slawski as a “third-party expert” who references Trystan Upstill.
Trystan Upstill author
Another Information Retrieval research paper, from 2006, that mentions and quotes Trystan Upstill.
Trystan Upstill author
Another one is from Utah University. The research paper is new, but most of the other resources and experts are old in the industry, like Trystan.
Trystan Upstill articles
One more mentions “Advanced web technology” from 2004, which, seen from 2004, foreshadows Android.
Trystan Upstill author
These samples show how old an authoring entity is in a particular field and how many times they are mentioned for specific topics and contexts with positive sentiments, research, and more. Thus, recognizing the entity and seeing their reputation for the main content is more manageable.
Leo Tolstoy books
Books of Authors are presented by Google to provide relevance to possible entity-seeking queries.

Can an Author Increase the Reputation of a Website?

The author (Main Content Creator) and the Publisher (website) affect each other’s reputation. And Google can understand Author Vectors, Website Representation Vectors, and stylometry.

In other words, to increase your authority or reputation, you can’t put the name of Albert Einstein into the author section. And, of course, this sample is even beyond stylometry.

Let me explain what stylometry is.

In this article, I used the word “thus” 3 times.
Used the word “also” 2 times.
The word “and” 17 times.
And some other authors use stop words much more than I do. Some others use present tense sentences more. Some use “can” instead of “be able to,” while others use numeric values, more, or certain types of word sequences. If there are enough samples, the author of the content can be understood correctly.
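The counting described above can be sketched in a few lines. This is a minimal example, assuming a small hand-picked list of function words and a made-up sample sentence. Real stylometric profiles use far larger vocabularies and many more text samples.

```python
import re
from collections import Counter

# Hypothetical vocabulary: a handful of function words to profile.
FUNCTION_WORDS = ("thus", "also", "and", "can")


def function_word_profile(text, vocabulary=FUNCTION_WORDS):
    """Return {word: (count, relative_frequency)} for the chosen words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {
        word: (counts[word], counts[word] / total if total else 0.0)
        for word in vocabulary
    }


# Made-up sample text, used only to show the output shape.
sample = (
    "Thus, the author can write. And the editor can also edit, "
    "and thus the text improves."
)
profile = function_word_profile(sample)
for word, (count, share) in profile.items():
    print(f"{word}: {count} times ({share:.1%} of all words)")
```

Relative frequencies matter more than raw counts, because they let texts of different lengths be compared with each other.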

Do you want an example?

As a history reader, I am a big fan of Alexander Hamilton, since he was one of the real founders of the basics of the American national economy, just as Alfred Thayer Mahan created the basics of the navy. And 12 of the Federalist Papers, written in support of the American Constitution, became the topic of a debate about who the real author was. James Madison claimed that he was the real author of these 12 papers. But Alexander Hamilton gave his private notes to James Madison before dying in a duel.

Thus, Alexander Hamilton wasn’t there to say, “I was the real author of these 12 stolen papers.” Luckily, 150 years later, we had stylometry with Python.
stylometry in search engines
Above, you can see the tone, style, and word distribution of the “disputed papers” whose authorship and ownership were contested. You can also see that Hamilton’s word distribution curve is highly similar to the disputed papers, while Madison’s is slightly different but still the second most similar.
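A comparison of word distributions like this one can be sketched as follows. The example uses cosine similarity over function-word frequencies, which is a simple stand-in and not necessarily the method behind the chart. The vocabulary and the two toy "author" samples are invented for illustration and say nothing about the real disputed papers.

```python
import math
import re
from collections import Counter

# Hypothetical function-word vocabulary for the comparison.
FUNCTION_WORDS = ("the", "of", "and", "to", "upon", "while", "whilst", "by")


def frequency_vector(text, vocabulary=FUNCTION_WORDS):
    """Relative frequency of each vocabulary word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[word] / total for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_author(disputed_text, known_samples):
    """Return the best-matching author and all similarity scores."""
    target = frequency_vector(disputed_text)
    scores = {
        author: cosine_similarity(target, frequency_vector(sample))
        for author, sample in known_samples.items()
    }
    return max(scores, key=scores.get), scores


# Toy samples: author_a favors "upon" and "while", author_b "whilst" and "by".
known_samples = {
    "author_a": "upon the whole the matter rests upon the people while the states wait",
    "author_b": "whilst the states decide by vote the people are bound by the law whilst they wait",
}
disputed = "the people rely upon the states while the law rests upon the vote"

best, scores = closest_author(disputed, known_samples)
print(best, {author: round(score, 3) for author, score in scores.items()})
```

With enough genuine text per author, the same idea extends to hundreds of function words and to measures designed for authorship attribution, such as Burrows' Delta.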
articles in google search
Brodie Clark’s Article Carousel that shows his Search Engine Journal articles.

I also want to draw your attention to the fact that I have used the word “also” three more times until now. Since AI generates content with millions of words or sentences, a search engine can easily recognize which websites use which language models, and whether they are the same or not.

What is Expression Identity for Authors?

Expression identity is the expression style of a specific author. Author identity and expression identity must match each other to make a search system understand that the author is authentic and human.

If you see that Jean-Christophe Grange starts to write as in a teenage magazine, you wouldn’t believe that the lines of text come from him. Imagine that someone imitates another person’s voice while speaking to you. You would feel it, right? The same happens in the stylometry and expression identity of the author entity.

An author’s expression identity represents the author’s talking, writing, and communication style. Thus, it is easy to understand who uses a “ghostwriter.” Even if PageRank and high Information Retrieval scores protect many sources simultaneously, Semantics and E-A-T are beyond these two fundamental search systems. Because PageRank can be balanced with Topical Authority, E-A-T is the strongest, heaviest, and most multi-layered indirect ranking signal. Thus, “just a blog” or “non-real author” are harmful qualifications for websites.

How Does Google Use Authors in SERP?

Google defines an author as the owner of the content. In this case, a single comment belongs to the commenter, while the article belongs to its writer. And the content's ownership is shared between the publisher and the author, as happens in this article between me and the Serpstat blog.

Google encourages searchers to seek out more expert authors in their field. In this video, Johannes Cronje explains in more detail how to find such experts through Google Scholar.

Google also supports authors in becoming a part of Google Books.

Over the years, this encouragement of authors has had some counterparts on the SERP. Authorship started its journey on Google with the Google Author Tag as a direct signal for authorship understanding.

Due to heavy spamming, the Google Author Tag was deprecated, but Google’s consistency in its patent filings didn’t end. And Google found other ways to propagate authorship for rankings as an authority.

The first Google design that talks about Author Authority focuses on social networks, explaining that a user's activity on a social media platform represents that user’s expertise. Othar Hansson, one of the creators of Google Plus and the Authorship idea, is also an inventor of this design.
authorship in serp
A section from the design is below.

  • Query (Q) data
  • Authoritative user (AU) data and
  • Score (S) data, determined from one or more social networking services. In some examples, a content creator is authoritative for one or more queries, with the score providing a relative measure of the authoritativeness of the user for a particular search.
“Showing prominent users for information retrieval requests” is important because it shows that search engines file patents for finding the authoritative experts for topics via social media content, feedback, or different types of scores.
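The three data types in that excerpt (Q, AU, and S) can be pictured as simple records. The sketch below is only an illustration of the idea: the class name, field names, sample users, and score values are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityRecord:
    query: str    # Query (Q) data
    user: str     # Authoritative user (AU) data
    score: float  # Score (S) data, relative authoritativeness for the query


def authoritative_users(records, query, limit=3):
    """Return the highest-scoring users recorded for a query."""
    normalized = query.strip().lower()
    matches = [record for record in records if record.query == normalized]
    matches.sort(key=lambda record: record.score, reverse=True)
    return [(record.user, record.score) for record in matches[:limit]]


# Hypothetical records; queries are stored in lowercase.
records = [
    AuthorityRecord("semantic seo", "user_a", 0.91),
    AuthorityRecord("semantic seo", "user_b", 0.64),
    AuthorityRecord("link building", "user_c", 0.88),
]
print(authoritative_users(records, " Semantic SEO "))
```

The point of the structure is that a score is attached to a user and a query together, so the same user can be authoritative for one query and absent for another.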

Most importantly, this is a continuation patent. The first version is from 2014 and mentions the author tag. The second version is from 2015 and notes the “authoritative rank” rather than “author tag related authorship.”
To strengthen these declarations, I will put some explanations from Google.
high quality site
The questions above are from 2011. They come from Amit Singhal. You might remember this “Is this article written by an expert or enthusiast” question, because it was passed from Panda-related algorithms to Broad Core Algorithm Updates.

The same question appears 8 years later, from Danny Sullivan, as below.
Expertise questions
Regarding recognizing authority and expertise, Matt Cutts gave an evergreen signal in 2011, quoted below: links should be reciprocal between publisher and author.

"You don't want to have fake Barack Obama. And so one of the things they're doing is it requires, down the road, if you allow across different domains - you can link to whitehouse.gov, but unless whitehouse.gov links back to you, then it hasn't been authenticated."

And, this is what John Mueller said 10 years later:
...our systems try to recognize who that is, what that entity is. And we do that based on a number of different factors. And that includes links to profile pages, for example, or visible information that we can find on these pages themselves.

So my recommendation here would be to at least link to a common or central place where you say everything comes together for this author, which could be something like a social-network profile page, for example.
And use that across the different Author pages you have when you're writing so that when our systems look at an article and see an Author page associated with that, they can recognize that this is the same author as the person who wrote something else. And we can kind of group this by entity.
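Both ideas quoted above, the reciprocal link from Matt Cutts and the central profile page from John Mueller, can be sketched with plain data structures. This is an illustration only: the function names, the link map, and the example.com and example.org URLs are hypothetical, and Google's actual reconciliation process is not public.

```python
from collections import defaultdict


def is_reciprocal(outbound_links, page_url, profile_url):
    """True only if the page links to the profile and the profile links back."""
    page_links = outbound_links.get(page_url, set())
    profile_links = outbound_links.get(profile_url, set())
    return profile_url in page_links and page_url in profile_links


def group_by_author_profile(article_profiles):
    """Group article URLs under the central profile URL each one points to."""
    groups = defaultdict(list)
    for article_url, profile_url in article_profiles:
        groups[profile_url].append(article_url)
    return dict(groups)


# Hypothetical link map: page URL -> set of URLs it links to.
outbound_links = {
    "https://example.com/article-1": {"https://example.org/profile/jane"},
    "https://example.com/article-2": {"https://example.org/profile/jane"},
    "https://example.org/profile/jane": {"https://example.com/article-1"},
}
article_profiles = [
    ("https://example.com/article-1", "https://example.org/profile/jane"),
    ("https://example.com/article-2", "https://example.org/profile/jane"),
]

groups = group_by_author_profile(article_profiles)
for profile_url, articles in groups.items():
    for article_url in articles:
        verified = is_reciprocal(outbound_links, article_url, profile_url)
        print(article_url, "->", profile_url, "reciprocal:", verified)
```

Here both articles are grouped under the same profile, but only the first one is linked back from the profile, so only that one passes the reciprocal check.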
For a more in-depth understanding, I suggest you check out the technical side of the authorship explanation from Google.
From Matt Cutts to John Mueller, the consistency continues. Matt Cutts said that "in-depth articles are used for understanding the authority of the author."
Also, John Mueller stated that “author names should appear in news articles to provide trust.”
In addition, after the removal of the “rel=publisher” and “rel=author” values, the “author” search operator was removed from Google News, and “Author Stats” were removed from Webmaster Tools (the old version of GSC).

The same kind of change happened with the removal of the PageRank toolbar and the numeric PageRank value from the SERP. Instead of giving numeric stats with different dimensions, Google started to use “closed” internal concepts and processes.

Gary Illyes stated that they would continue to use “rel=author” if more people would use it.
And as always, the Black Hat SEOs were chasing authorship manipulation.
Black Hat SEOs authorship manipulation
These manipulation attempts destroyed two Google products, Google Knol and Google Answers. The Google Authorship concept continued because, without understanding domain experts, ranking sources is risky and prevents balancing PageRank with informativeness.

And these discussions of authorship didn’t end. Even Google had to defend its own “articles” against publishers. How? Let’s check.
Gianluca Fiorelli states that Google steals content from websites.

Google states that publishers steal content from them.
I have taken an example from Gianluca Fiorelli, which contains the phrase “Valle d'Aosta,” and restricted the results to dates between 2010 and 2017.
Valle d'Aosta
Google states that in 2018, they created these sentences.
Valle d'Aosta
Despite this, they appear on Google earlier than 2018 as they are. There is a minor possibility that Google systems used existing documents to generate content, and that these lines overlap.

In addition, authorship is not easy to resolve and prove. Gaining trust and expertise on a topic doesn't happen in one day, and even the search engine must prove that its content belongs to itself.

And, sometimes epic reactions, as below, happen.
Matt Cutts twitter
With the improvement of AI and the return of Content Farms, we will see more debates about Authorship, Author Expertise, and Knowledge Domain Experts. We will see “expert-level” content for “treatment of cancer” while AI is behind it, but no expert. And, since AI can “repeat” existing information while not being able to see or use the new information, Google algorithms might find and focus on different subtle signals for real-world expertise understanding.


Although the attribute has been deprecated, the author's authority is still vital. And Google, SEO, and authorship are closely related.

A further indication that Google didn't forget about its original goals is the most recent Google Knowledge Panel Designs for the Authors. Google's Authorship Reconciliation and Presenting are relevant to the Brand SERP and Knowledge Panel Optimization. Google always gives the most popular and prominent queries priority. Google prioritizes musicians, authors, playwrights, and actors since they are pertinent to many queries. This involves Query-Entity-Phrase associations, "defining the queries," or extrapolating entity attributes from the questions, because "author entities" are relevant to the particular query. And new "context vectors" with "author identity" can be created based on an author's related query count and the success of the articles the author has written for those queries.

