SEO, March 16, 2021

How To Retrieve Google Search Results Into A DataFrame For Analysing With Data Science Skills Using Python

This article will show how to record and analyze an example of SERP with Python to understand the search engine's perspective and character better. You will also see Topical Search Engine (Semantic Search Engine) features and other possible ways to understand different SERP groups for similar entity types and categorize the documents on the SERP according to various data points.

It is important to save Google search results and compare them across different dates, geographies, languages, response times, schema markups, content-related details, or SEO-related tags to understand the search engine's algorithmic requirements and what needs to be done in an SEO project.

Note: I started writing this article before the December Core Algorithm Update, so you will also see how the results changed after the update and how you might record these changes for research via Python. Lastly, many thanks to Elias Dabbas for fixing my errors and helping me write this article.

Contents

1. Introduction
2. What are custom search engines of Google?
2.1 How to create a custom search engine?
3. How to create a Google developer console project?
4. How to retrieve Google search results with every detail into a DataFrame?
5. How to search for multiple queries for multiple locations at the same time for retrieving results into a Data Frame
6. How to refine search queries and use search parameters with Advertools' "serp_goog()" function
6.1 How to use search operators for retrieving SERPs in bulk with Advertools
7. Last thoughts on Holistic SEO and Data Science

Introduction

I used Custom Search Engines to examine the SERP during one of my previous SEO case studies, to understand Google's perception of finance-related queries. I published that case study with OnCrawl; it focuses on an SEO strategy built around Core Algorithm Updates and reports a 150% traffic increase within 5 months, with wins in two core algorithm updates. It might help you during the December Core Algorithm Update.

I have learned that Google introduced "Knowledge Graph Entities" to restrict the search results according to different entities and the relevance of documents and sources to these entities.

In different search engine verticals, you can use many tools, especially Serpstat, to see for which types of queries certain domains get better results. However, there is an advantage in analyzing the SERP with Python.

Python is a flexible programming language whose limits are set mostly by your creativity. In other words, you can set up your own "Keyword Rank Tracker" system with the information you will learn in this article, or you can download search results for different languages and countries at the same time.

To understand the Topical Search Engine and entities' effect on the SERP, Natural Language API of Google and content categorization along with named entity recognition might help.

We will use the Advertools Python Package developed by Elias Dabbas to retrieve Google SERPs with Python. At the same time, while taking advantage of Google's custom search engine, we will be using the custom search API via the Google Developer Console.

First of all, let's start with the custom search engine because you can get a lot of new information in this area.

What are custom search engines of Google?

A Custom Search Engine (also called a Programmable Search Engine) is a search engine that works with Google's algorithms and is customized according to schema type, entity type, region, language, safe search, websites, where the keywords are passed, and more.

Like Topical (Entity) based categories, we also have query categories within PyTrends that you can use. You might try to see "Taxonomy and Ontology of Search" with query and topic classification and matching for a better SERP analysis. You will hear the term Semantic SEO more in 2021.

Custom Search Engine has been specially designed by Google so that many publishers can use it within their internal search system. The primary purpose here is to increase the use of Custom Search Engine and increase both publishers' and Google's earnings with ads in CSE.

How to create a custom search engine?

To use Advertools' "serp_goog()" function and Google's custom search API, we need to create a custom search engine. Creating a CSE via Google is quite easy. I suggest you also use the CSE directly to understand the Google Algorithm. It is quite instructive to see which query yields results with which entity type, structured data type, region, country, safe search, or different properties and compare the results.

However, the topic of this article is about how you can do this by utilizing data science skills and techniques. I will be using Python as the programming language.

Therefore, you must follow the steps below.

Open the https://cse.google.com/cse/all address.
Click "Add" to create a new CSE.

Enter a CSE name and a CSE site to search, and change the language from English (or another language) to "All languages".

Activate the global search as below.

Remove the CSE site so that we can take results from all of Google.

After deleting the CSE site, record the custom search engine ID.

Thus, we have created a custom search engine that covers every topic and entity type, in all languages, on all websites and in all search verticals. Now we need to create a Custom Search API key in the Google Developer Console so that we can use the corresponding CSE in Python.

How to create a Google developer console project?

To use Custom Search API, you need to create a new project in the Google Developer Console and enable the related API within the project, and then you need to create an API key. Below, you will see a step-by-step guide for creating a project and API key within the Google Developer Console with images.

On the Google Developer Console homepage, you will see the "create new project" option. I suggest you give a name that complies with your purpose. In this case, I have chosen the "CSE Advertools" as the name for my project.

In the second step, you need to search for "Custom Search API" in the search bar.

Then, you should enable the API so that you can use it.

Then, we need to create our credentials so that we can use it from outside of the Google Developer Console.

Then, click "Create credentials" and choose the API key option.

And now, we have our API Key.

Now, we can focus on our Python Script. Before continuing, I have two notes for you.

You shouldn't share your API key with anyone else; that's why I will destroy this key at the end of this guideline.

You should know that the Custom Search API is not completely free. You get 100 free queries per day; after that, you need to pay 5 U.S. dollars per 1,000 queries.
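To make the quota arithmetic concrete, here is a minimal sketch that estimates the daily cost of a batch of requests. It assumes the figures quoted above (100 free queries per day, then 5 U.S. dollars per 1,000 queries) and pro-rata billing; the function name `estimate_daily_cost` is hypothetical, and actual billing rules may differ.

```python
def estimate_daily_cost(requests_per_day, free_quota=100, usd_per_1000=5.0):
    """Rough daily cost in USD, assuming pro-rata billing beyond the free quota."""
    billable = max(0, requests_per_day - free_quota)
    return billable * usd_per_1000 / 1000


# 4 queries x 10 result pages = 40 requests: still inside the free quota
print(estimate_daily_cost(40))
# 1,100 requests: 1,000 of them are billable
print(estimate_daily_cost(1100))
```

Note that every page of ten results counts as one request, so retrieving the first 100 results for a single query already uses ten of the free daily requests.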

Now, let's copy our API key, and move on.
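One common way to keep the key out of shared scripts and notebooks is to read it from an environment variable. This is a minimal sketch; the variable names `CSE_ID` and `CSE_API_KEY` and the helper `load_credentials` are arbitrary choices for illustration, not something Google or Advertools requires.

```python
import os


def load_credentials(env=os.environ):
    """Return (cs_id, cs_key) from the environment, failing early if one is missing."""
    try:
        return env["CSE_ID"], env["CSE_API_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"Set the {missing} environment variable first") from None


# Demonstration with a plain dict standing in for the real environment
cs_id, cs_key = load_credentials({"CSE_ID": "your-cse-id", "CSE_API_KEY": "your-api-key"})
```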

How to retrieve Google search results with every detail into a DataFrame?

Under normal conditions, to scrape the Google search results, we would have to create our own custom scraper with Scrapy or use Selenium to scrape the first 100 results for any query. Also, Advertools has a crawl function that supports custom extraction. It can be used to scrape additional special elements from the SERPs, which are not included in the official API results, for example, the "people also ask" section.

Note: There is a subtle difference between Scraping and Retrieving. Scraping happens on the web page, extracting data via CSS or XPath selectors. Retrieving SERP happens via the official API; that's why it is faster, and also, it doesn't create a burden for the API owner, in this case, Google.
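For context, the official API mentioned in the note is the Custom Search JSON API, which is queried with a plain HTTPS GET request. The sketch below only builds such a request URL without sending it. The helper name `build_cse_url` and the placeholder credentials are hypothetical, and this illustrates the shape of a request rather than how Advertools works internally.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(query, cx, key, **params):
    """Build (but do not send) a Custom Search JSON API request URL."""
    all_params = {"q": query, "cx": cx, "key": key, **params}
    return f"{CSE_ENDPOINT}?{urlencode(all_params)}"


url = build_cse_url("Holistic SEO", cx="YOUR_CSE_ID", key="YOUR_API_KEY",
                    gl="tr", cr="countryTR", start=1, num=10)
print(url)
```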

But, in Advertools, most of the things are just a "single line of code."

import advertools as adv
cs_id = "a5f90a3b5a88e0d0d"
cs_key = "AIzaSyCQhrSpIr8LFRPL6LhFfL9K59Gqr0dhK5c"
adv.serp_goog(q='Holistic SEO', gl=['tr', 'fr'],cr=["countryTR"], cx=cs_id, key=cs_key)

At the first line, we have imported the advertools.

At the second line, we have created a variable that stores our custom search engine ID.

At the third line, we have created a variable that stores our custom search API key.

In the fourth line, we have used the "serp_goog()" function with "gl", "cx", "key", "cr" parameters.

Below, you will see our output.

We can assign the function's output to a variable so that we can check some of its features.

In our first example, we have used the "cr" parameter with only the "countryTR" value. It means that we have tried to take only the content that is hosted in Turkey, or sources with a Turkish top-level domain. Below, you will see how the results differ when we remove the "cr" parameter.

Now, we have more relevant results from every website that has been hosted on our planet. While reading the Retrieving Google Search Results Pages with Python Guideline, you should remember this parameter and its meaning. Below, we will check our "serp_goog()" function's output's shape.

df  = adv.serp_goog(q='Holistic SEO', gl=['tr', 'fr'],cr=["countryTR"], cx=cs_id, key=cs_key)
df.shape

We have assigned the function's output to the "df" variable.

We have used the "shape" attribute on it.

This means that it has 20 rows and 94 columns. We can check the column names to understand what the data frame contains.

from termcolor import colored
# Print the column names in four colored, aligned groups; zip() stops at the shortest slice (20 names)
for column20, column40, column60, column90 in zip(df.columns[:20], df.columns[20:40], df.columns[40:60], df.columns[60:90]):
    print(colored(f'{column20:<12}', "green"), colored(f'{column40:<15}', "yellow"), colored(f'{column60:<32}', "white"), colored(f'{column90:<12}', "red"))

In the first line, I have imported "termcolor" and its function "colored" so that we can color our columns.

I have used a loop with the "zip()" function so that I can go through several column segments in the same loop.

The "for" loop unpacks one column name from each segment into its own variable. In the f-strings, the "<" or ">" sign followed by a number sets the alignment and the width of each printed column.

I have used different colors with the color function.
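The alignment part of those f-strings can be tried on its own, without colors. A minimal sketch with a few column names from this article:

```python
columns = ["gl", "searchTerms", "rank", "title"]

for name in columns:
    # "<15" pads on the right (left-aligned); ">15" pads on the left (right-aligned)
    print(f"[{name:<15}] [{name:>15}]")

left = f"{'rank':<8}"    # 'rank' followed by four spaces
right = f"{'rank':>8}"   # four spaces followed by 'rank'
```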

We have columns for Open Graph properties, Twitter card properties, query time, cacheID, title of the snippet, ranking, search terms, and more.

In our first try, we have used two countries within our "gl" parameter. "gl" means "geo-location". We have used the "fr" and "tr" values for this parameter: the first one is for France, while the latter is for Turkey. And, that's why we have twenty rows.

The first 10 of them are for "tr" and the last 10 are for "fr".

import pandas as pd
pd.set_option("display.max_colwidth",None)
df[['gl', 'rank', 'title', 'link']]

In the first line, I have imported pandas which we will use for manipulating and reshaping our data into whatever form we want.

In the second line, I have changed the "Pandas Data Frame" options so that I can see all of the columns with the full length.

I have filtered the specific columns that I want to see.

You may see the two different search results for the two different countries in the same data frame. If you want you can separate them easily.

df_tr = df[df['gl'] == "tr"]
df_fr = df[df['gl'] == 'fr']

I have created two different data frames by filtering on the "gl" column. "df_tr" contains only the results that are relevant to "Turkey", while "df_fr" includes the results for "France". Below, you will see a view of the "df_tr" data frame with some selected columns.
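When more than two countries are involved, writing one filter per country gets repetitive. Here is a minimal sketch of the same split with "groupby", using a small made-up data frame in place of the real "serp_goog()" output (the rows and the example.com links are invented for illustration):

```python
import pandas as pd

# Toy stand-in for the serp_goog() output; the real frame has many more columns
df = pd.DataFrame({
    "gl": ["tr", "tr", "fr", "fr"],
    "rank": [1, 2, 1, 2],
    "link": ["https://example.com/a", "https://example.com/b",
             "https://example.com/c", "https://example.com/d"],
})

# One data frame per "gl" value, keyed by country code
frames_by_country = {gl: group for gl, group in df.groupby("gl")}

print(frames_by_country["tr"][["rank", "link"]])
```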

Now, before continuing further, let's create another example and this time, let's record the first 100 results for a new query.

df = adv.serp_goog(q='Who is Abraham Lincoln?', gl=['us'], cr=["countryUS"], cx=cs_id, key=cs_key, start=[1,11,21,31,41,51,61,71,81,91])

In this example, I have searched for "Who is Abraham Lincoln?" in the "United States" while restricting the results to the United States. I also used the "start" parameter, which is an important one. Advertools goes through all the values in the "start" parameter with a "for loop". "1" means results 1 to num, and "11" means results 11 to 10 + num (with the default num of 10, that is 1 to 10 and 11 to 20).

Elias Dabbas has provided a more precise explanation of the "start" parameter and its values:

By default, the CSE API returns ten values per request, and this can be modified by using the "num" parameter, which can take values in the range [1, 10]. When you set one or more values for "start", you get results starting from each value, up to "num" results per value.

Each result is then combined into a single data frame.
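The relationship between "start" and "num" can be sketched in a few lines. The helper `start_values` below is hypothetical (it is not part of Advertools); it generates the list passed to "start" above and shows which result positions each request covers.

```python
def start_values(total_results, num=10):
    """Start positions needed to page through `total_results` results."""
    return list(range(1, total_results + 1, num))


starts = start_values(100)
print(starts)  # [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]

# Each request covers positions start .. start + num - 1
for start in starts[:3]:
    print(f"start={start} covers results {start}-{start + 10 - 1}")
```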

Below you can see the result of the corresponding function call.

And, let me show you the last 5 results from our search activity for the query of "Who is Abraham Lincoln?"

I didn't even need to write these search result's descriptions or URLs. As you may notice, they are not relevant answers for our query. We have asked "Who is Abraham Lincoln?" not "(what, where or how) is Abraham Lincoln {foundation, park, hospital, tourism agency}"

But, still, the search engine tries to cover different "possibilities" and "probabilities".

With Advertools' "serp_goog()" function, you can try to see where the dominant search intent shifts and where Google starts to think that you might search for the other kinds of relevant entities or possible search intents.

pd.set_option("display.max_rows",None)
df.head(100)

First, we have changed the Pandas' options for the "maximum rows" to be displayed. Then, we have called our first 100 results for the "Who is Abraham Lincoln?" query. Now, you will start to see that at the end of the second page and at the start of the third page, there is a search intent shift for the SERP characteristics.

I hope you can see the image and the details within it. You will see that there is a clear shift after the 17th result. After a point, we see "Abraham Lincoln Elementary Schools, Presidential Libraries or Portrait Museums…". We also see some web pages that appear to respond to the main search intent, such as "Abraham Lincoln facts". But that title doesn't reflect the web page's content. In reality, the web page is about "10 Odd Facts about Abraham Lincoln's Assassination".

So, only its first paragraph is related to our question, and the content focuses on only a single micro-topic about Abraham Lincoln. You may check the URL below.

https://constitutioncenter.org/blog/10-odd-facts-a...

Or, you can try to search for "Who is Abraham Lincoln" to see what I mean. When we move to the results between 28-38, we see that none of the content is actually relevant to our question.

Even if we find a web page that responds to the main search intent with high concentration, detail and strong focus, we can see why it might be outranked by less relevant pages because of other factors such as historical data, Source Rank (a term from Google Patents for the trust and expertise assigned to a website for a topic), references (links, mentions, social engagement), crawlability and more.

So, with Advertools' "serp_goog()" function, it is easy to retrieve these intent shifts and their shifting points, and to see all the web pages' angles, their differences from each other, side topics or relevant questions. The best part is that you can do this for multiple queries, combinations of queries and different search parameters, all in one go.

How to search for multiple queries for multiple locations at the same time for retrieving results into a Data Frame

We can use Advertools to search for multiple queries from different locations and devices, across different search verticals, with search operators, or to filter the results so that they include only Twitter Cards or "news" sources. Examining such a useful function in detail would probably take more than 15,000 words.

I am not joking about the 15,000 words: I have previously written a detailed and consolidated article about the Google Knowledge Graph API and its usage via Advertools.

In this article, by respecting Serpstat's editorial guidelines, I will try to cover the most important points in the form of usage style with some important SEO insights. To perform a multi-query search from multi-locations at the same time, you can use the code example below.

df = adv.serp_goog(q=["Who is Abraham Lincoln?", "Who are the rivals of Abraham Lincoln?", "What did Abraham Lincoln do?", "Was Abraham Lincoln a Freemason"], gl=['us'], cr=["countryUS"], cx=cs_id, key=cs_key, start=[1,11,21,31,41,51,61,71,81,91])

With the "q" parameter, we can use a list of queries. In this example, I have used more related questions for Abraham Lincoln and asked some specific questions about his "entity" on the web.

Since we have asked 4 questions to Google, and gave 10 start values, 1 "cr", and 1 "gl", we have "4x10x1x1 queries x 10 results" which equals 400. The result pages' features have determined the column count.
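The request count can be checked by enumerating the parameter combinations directly. This is only a sketch of the arithmetic; it assumes one request per combination of parameter values and 10 results per request, as described above.

```python
from itertools import product

queries = ["Who is Abraham Lincoln?", "Who are the rivals of Abraham Lincoln?",
           "What did Abraham Lincoln do?", "Was Abraham Lincoln a Freemason"]
starts = [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]
gl, cr = ["us"], ["countryUS"]

# One request per combination of parameter values
combinations = list(product(queries, starts, gl, cr))
print(len(combinations))        # 40 requests
print(len(combinations) * 10)   # 400 rows at 10 results per request
```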

You may see the first 100 columns of our results. When we have a unique SERP snippet from a different type, we also have its attributes in the columns such as "postaladdress", "thumbnail", "theme-color", "twitter cards", "application-name" and more.

And let's check the last 100 columns.

We have "twitter:username", "allow-comment", "article:subsection", "docauthor" and more. Let's see where we have a "docauthor" name.

df[df.docauthor.notna() == True][["docauthor", "htmlTitle", "snippet"]]

I have filtered the "docauthor" column and tried to find a row that is not "NaN". I have found it and then I have filtered the "htmlTitle", "snippet" and the "docauthor". You can see the output below.

We have Chris Togneri as an author for this topic. We also have another column named "books:author". Let's use the same methodology to find some book authors.

df[df['books:author'].isna() == False][['books:author', "htmlTitle", "snippet"]]

We have two authors from "Goodreads": one of them is Doris Kearns Goodwin, the other one is Louis Dale Carman.

One of those books and its author rank for the query of "Abraham Lincoln's Rivals", possibly because her book's name is "Team of Rivals: The Political Genius of Abraham Lincoln".

Let's look at them with Advertools.

key = "AIzaSyAPQD4WDYAIkRlPYAdFml3jtUaICW6P9ZE"
adv.knowledge_graph(key=key, query=['Doris Kearns Goodwin', "Louis Dale Carman"])[['result.name', 'result.description', 'resultScore']].style.background_gradient(cmap='Blues')

I have covered Advertools' "knowledge_graph()" function in more than 15,000 words elsewhere, so I won't go into the details here. Below, you will see the output.

And, we have an "American Biographer" with the 1999.135864 result score. Why is this important? In my SEO content creation process, I always use Semantic SEO principles and authoritative sources. If you want to have higher ranks on a topic for a given context, you need reliable sources. You may use Doris Kearns Goodwin and her possibly great book for your content marketing as a source and witness for your expertise.

In short:

Search for a query.

Record all the SERPs with all the possible search engine data columns and ranking signals.

Extract all the entities from these SERP snippets.

Watch the search intent variety and shifts.

Create your own topical graph and topical coverage.

Use the related entities to increase topical authority and expertise on a subject.

As you may know, I always read Bill Slawski and Mordy Oberstein; that's why I also tend to give value to "SEO Theories, Terminology" and their concrete effect on the SERP and SEO. But I will continue with our last task: making a comparison of multiple queries and their results.

First, let's learn which domains are top-performing and most-occurred in the Google search results pages for the given four questions.

(df.pivot_table('rank', 'displayLink', aggfunc=['count', 'mean'])
   .sort_values([('count', 'rank'), ('mean', 'rank')], ascending=[False, True])
   .assign(coverage=lambda df: df[('count', 'rank')] / len(df)*10)
   .head(10)
   .style.format({("coverage", ''): "{:.1%}", ('mean', 'rank'): '{:.2f}'})
).background_gradient(cmap="viridis")

The code above might look a little complicated, but it is actually like a sentence in machine language. I have learned this and more Data Science skills from Elias Dabbas' Kaggle profile, which is why I also recommend that you check it.

Let's translate the machine language to the human language.

"df.pivot_table('rank', 'displayLink')" means that the data is aggregated using these two columns: the "rank" values are summarized for each "displayLink".
"aggfunc=['count', 'mean']" means that you get the "count" and "mean" values for these base columns.
"sort_values([('count', 'rank'), ('mean', 'rank')], ascending=[False, True])" sorts by these two multi-named columns. "False" applies to the first column (descending) and "True" to the second one (ascending). Pandas sorts the entire data frame by the "count, rank" column first, since it comes first in the list, and uses "mean, rank" only to break ties. "ascending" specifies the sorting direction.
"assign" is for creating new columns with custom calculations.
"lambda" is an anonymous function. We specify the column name as a tuple because, in the result of "pivot_table", the columns form a multi-index, so each column is addressed by its multi-index values.
"style.format" changes how the column's values are displayed. We are specifying that the column with the name of "coverage, (empty)" should be written with the % sign and one decimal place, while the column with the name of "mean, rank" should be written with two decimal places and without a % sign.
"background_gradient" creates a kind of heatmap for our data frame.

So, in short, machines like to talk less with a systematic syntax than humans.
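To see what each step of the chain produces, the aggregation and sorting part can be run on a tiny made-up data frame (the domains and ranks below are invented for illustration, and the styling calls are left out so that the sketch prints plain text):

```python
import pandas as pd

# Invented sample: three domains across a handful of results
toy = pd.DataFrame({
    "displayLink": ["a.com", "b.com", "a.com", "c.com", "b.com", "a.com"],
    "rank": [1, 2, 3, 4, 5, 6],
})

summary = (
    toy.pivot_table(values="rank", index="displayLink", aggfunc=["count", "mean"])
    # most frequent domains first; ties broken by the better (lower) mean rank
    .sort_values([("count", "rank"), ("mean", "rank")], ascending=[False, True])
)

print(summary.columns.tolist())  # the column labels are tuples
print(summary)
```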

You may see the result below:

Search Results for 29th of November for the given queries.

Below, you will see how the search results change in 49 days and how new domains have appeared for these queries after the December Google Broad Core Algorithm Update.

We see that "nps.gov" has appeared 7 times for these queries and its average rank is 24.57.

Something wrong in the numbers, top-ranked domains and rank > 10?

On the other hand, we can see that "wikipedia" has the best balance: it has fewer results with a better average ranking.

To see Wikipedia's content URLs, you can use the code below.

df[df['displayLink'] == "en.wikipedia.org"]

We see that Wikipedia covers all of the entities, as it always does: books, events, places and persons. If we want, we can visualize the sites with the best average rankings along with their "ranking occurrence amount".

# Strip only a leading "www." (anchored pattern with an escaped dot)
df['displayLink'] = df['displayLink'].str.replace(r"^www\.", "", regex=True)
df_average = df.pivot_table("rank", "displayLink", aggfunc=["count", "mean"]).sort_values([('count', 'rank'), ('mean', 'rank')], 
ascending=[False, True])
df_average.sort_values(("count", "rank"), ascending=False)[:10].plot(kind="bar", rot=0, figsize=(25,10), fontsize=10, grid=True, xlabel="Domain Names", title="Average Rank and Rank Amount", position=0.5)

You may see the translation of this machine command to the human language below.

We have replaced all the "www." values with "" to create a cleaner view for our visualization example.
We have created a "df_average" variable that holds a pivot table based on the "rank" values and the "displayLink" data.
We are using the "aggfunc" parameter so that we can aggregate the relevant data as "count" and "mean" values.
With "sort_values", we are sorting the pivot table according to the "count, rank" column with "ascending=False".
"[:10]" means that it takes only the first ten results.
"plot" is the Pandas method that uses "matplotlib" for drawing.
I have used the kind="bar" parameter for creating a bar plot.
I have used the "figsize" parameter for determining the figure size of the plot.
I have used the "rot" parameter so that the x-axis ticks are written horizontally.
I have used the "fontsize" parameter to set the size of the letters.
I have used "grid=True" for using a background with grids.
I have used "xlabel" for determining the X-axis title.
And I have used the "position" parameter to lay out the bars more evenly.

You may see the result below.

You can't plot averages and counts on the same axis.

With Plotly, you also can create interactive plots to examine them more. Also, this plot has no "multi Y" axes, despite it showing two different values on the same Y-Axis. To fix these two data visualization problems, we can create an interactive and multi Y Axes plot. Below, you will see an interactive and "multi-Y axes" "bar and scatter" plot code block.

The explanation of the code block here is below.

We have imported, "plotly.graph_objects".
We have created a figure with a bar and also a scatter plot with the "go.Figure(data)".
We have used a common X Axis Value which is our "df_average" dataframe's indexes which is equal to the most successful domains for our example queries.
We have arranged the plot's "y axis" and "x axis" names, values, fonts, sizes with the "layout" parameter.
We have adjusted the width and height values with "update_layout" for our plot and called it with "fig.show()".
If you want to write it into an HTML file, you can also use the "fig.write_html" function that I have commented out.

Below, you will see the output.

This data is coming from the 13th December, 2020

We have two values for the "Y" axis, like in Google Search Console's performance report. On the left side, we have "Ranked Query Count," and we have "Average Ranking" on the right side.

We see that Wikipedia, Britannica, Millercenter, and History.com have better average rankings, while "Nps.gov" and "Loc.gov" have more results for these queries, and their average ranking is close to Britannica's.

Perform the same process for multiple queries, and you will have your own Ranking Tracker. You can schedule a function or code block to run repeatedly at specific intervals with Python's Schedule library, but this is a topic for another article.

Before moving further, let me show you the same SERP Record after Google's Broad Core Algorithm Update on 3rd December of 2020.

You may compare the three different charts that show the SERP for the same queries with different data.

As you can see, everything has changed. The number of ranking domains has increased, and government sites are prominent for these queries while the encyclopedias are losing rankings. We also see new domains in the first 5 results. Isn't it interesting to see?

Let's perform another visualization and see which domain was at which rank before the Core Algorithm Update and how the rankings were affected.

You may see our machine command below.

import matplotlib.pyplot as plt 
from matplotlib.cm import tab10
from matplotlib.ticker import EngFormatter
queries = ["Who is Abraham Lincoln?", "Who are the rivals of Abraham Lincoln?", "What did Abraham Lincoln do?", "Was Abraham Lincoln a Freemason"]
fig, ax = plt.subplots(4,1, facecolor='#eeeeee')
fig.set_size_inches(10, 10)
for i in range(4):
    ax[i].set_frame_on(False)
    ax[i].barh((df[df['searchTerms']==queries[i]])['displayLink'][:5], df[df['searchTerms']== queries[i]]['rank'][:5],color=tab10.colors[i+5])
    ax[i].set_title('Ranking Results for Top 5 Domain for ' + "'" + queries[i] + "'", fontsize=15)
    ax[i].tick_params(labelsize=12)
    ax[i].xaxis.set_major_formatter(EngFormatter())
 
plt.tight_layout()
fig.savefig(fname="serp.png", dpi=300)  # "quality" only affected JPEG output and is rejected by recent Matplotlib
plt.show()

And, you can see the translation of the machine command to the human language below.

We have imported Matplotlib Pyplot, "tab10" and "EngFormatter". The first one is for creating the plot, the second one is for changing the colors of the graphics and the last one is for formatting the numbers as "5k, 6k" instead of 5000, 6000 in the graphs.
We have created a list from our "queries" with the name of "queries".
We have created two different variables which are fig, ax and we have assigned "plt.subplots" function's result to them.
We have created a subplot with four rows and changed the "facecolor" of it.
We have changed the figure size with the "set_size_inches".
And, for every query we have chosen, we have started a for loop to put them into our figure with multiple plots.
We have used "barh" method to create horizontal bars.
We have used Pandas Methods to match the "query" and "search term" for filtering our related rows.
We have changed every plot's color with the "tab10.colors" within the "color" parameter.
We have changed the title of every plot with the name of the matching query.
We have changed font sizes with the parameter "fontsize".
We have used "labelsize" with the "tick_params" method to change the size of the tick parameters.
We formatted the xticks so that the relevant data can be displayed more clearly. "EngFormatter ()", which is used to display xticks in different "units of measure" on different subjects, may not work properly at this point as we use very few "queries".
We have used a tight layout with the "tight_layout()".
We have saved our figure as a PNG image at 300 DPI. (The "quality" argument only ever affected JPEG output and is not accepted by recent Matplotlib versions.)
We have called our figure.

And, you may see the result below. This is the result after the Core Algorithm Update.

As a bar and scatter chart, the situation in the SERP can be seen.

The graph here shows which domain is ranked for which query with a scatter and bar plot. But, since every domain doesn't appear for every query, the order of domains is different. Using multi-plots for different domains makes it harder to follow how a domain switches its position on the SERP according to other queries.

Thus, using "plotly.graph_objects" with a for loop and line chart can help fix this issue better.

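import plotly.graph_objects as go

fig = go.Figure()
fig.data = []  # clear any existing traces
# one trace per unique domain, so each domain appears only once in the legend
for i, domain in enumerate(df_graph_first_10["Domain"].unique()):
    filtered_graph = df_graph_first_10[df_graph_first_10["Domain"]==domain]
    fig.add_scatter(x=filtered_graph["Query"], y=filtered_graph["Rank"], name=domain, text=filtered_graph["Domain"])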
 
fig.update_layout(width=1400, height=800, legend=dict(
    yanchor="top",
    y=1.5,
    xanchor="right",
    x=1.3),
    hovermode="x unified", title="SERP Changes According to Queries per Domain", title_font_family="Open Sans", title_font_size=25)
fig.update_xaxes(title="Queries", title_font={"size":20, "family":"Open Sans"})
fig.update_yaxes(title="Rankings", title_font={"size":20, "family":"Open Sans"})
fig.write_html("serp-check.html")
fig.show()

We have used the "enumerate" function to combine order numbers and domains with each other within a loop. We have filtered the data frame for every domain one by one and then we have added them into a plot as traces at the same time. Some explanations for the code lines above are below.

fig.data=[] is for clearing the figure.

go.Figure() is for creating the figure.

"name" and "text" parameters are for determining the name of the traces and text for the hover.

In the "fig.update_layout()", we are determining the title, title font, legend, and legend position.

We are using "fig.update_xaxes" and "fig.update_yaxes" for determining the axes titles and font styles.

We also used "write_html" for saving the plot into an HTML file.

We also used "x unified" as the hover mode.
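The "df_graph_first_10" data frame is not defined in the snippets above. One plausible way to build it, assuming it holds the top 10 results per query with the columns renamed to "Domain", "Query" and "Rank", is sketched below on invented data:

```python
import pandas as pd

# Invented stand-in for the serp_goog() output
df = pd.DataFrame({
    "searchTerms": ["q1"] * 12 + ["q2"] * 12,
    "displayLink": [f"site{i}.com" for i in range(12)] * 2,
    "rank": list(range(1, 13)) * 2,
})

df_graph_first_10 = (
    df[df["rank"] <= 10]                       # keep the first page of each query
    .rename(columns={"displayLink": "Domain",
                     "searchTerms": "Query",
                     "rank": "Rank"})
    [["Domain", "Query", "Rank"]]
    .reset_index(drop=True)
)

print(df_graph_first_10.shape)
```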

As a line chart, you can see interactively how a domain's ranking changes from query to query, as below.

And, these were the results before the Core Algorithm Update. As a note and a lesson, I have lost the data frame from the older SERP record, and in the first plotting, I also didn't use "Plot Titles", but the order of the queries is the same. So you can still compare the ranking results. And, always remember to record the data.

(df.to_csv("df_after_core_algorithm_update.csv")

After this fair warning and reminder, here are the older results.

You may see that we have new domains in the first 5 results. Does Google give more weight to the smaller domains or domains with smaller content hubs and coverage? Or are new sites less vulnerable to historical data and trust? Or is this just for informative content? Or is it a general deboost for just Wikipedia-like sites? Or is this just a temporary situation during the Core Algorithm update? We will see; four queries can't answer all of these questions. That's why we can use "query diversification" to cover more semantic queries and retrieve their SERP data.
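If both SERP snapshots are saved, the newcomers in the first five results can be listed directly instead of being read off two charts. The sketch below is only an illustration: it assumes the "Query", "Domain" and "Rank" column names used above, and the two small data frames are made-up stand-ins for files that would normally be loaded with pd.read_csv().

```python
import pandas as pd

# Hypothetical snapshots; in practice these would come from pd.read_csv()
# on files saved before and after the update.
before = pd.DataFrame({
    "Query": ["query a", "query a", "query b"],
    "Domain": ["example.com", "example.org", "example.com"],
    "Rank": [1, 2, 4],
})
after = pd.DataFrame({
    "Query": ["query a", "query a", "query b"],
    "Domain": ["example.org", "example.net", "example.com"],
    "Rank": [1, 3, 2],
})

# An outer merge keeps rows that exist in only one snapshot;
# indicator=True adds a "_merge" column saying where each row came from.
changes = before.merge(after, on=["Query", "Domain"], how="outer",
                       suffixes=("_before", "_after"), indicator=True)

# Positive values mean the domain moved up (a smaller rank number is better).
changes["Rank_change"] = changes["Rank_before"] - changes["Rank_after"]

# Domains that are in the top 5 after the update but were absent before.
newcomers = changes[(changes["_merge"] == "right_only") & (changes["Rank_after"] <= 5)]

print(changes)
print(newcomers[["Query", "Domain", "Rank_after"]])
```

Rows marked "left_only" are the domains that dropped out of the recorded results, so the same table answers both questions.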

How to refine search queries and use search parameters with Advertools' "serp_goog()" function

Note that the parameters in this section are Custom Search API parameters, not search operators.

Before finishing this small guide, I also want to show how to use search parameters with the "serp_goog()" function of Advertools. With Advertools, we can pass Custom Search API parameters to refine our queries and explore Google's algorithms and the nature of the SERP. Thanks to search parameters, you can try to understand Google's algorithm in more detail for different types of queries and topics. Below, you will find an example.

df = adv.serp_goog(q=["Who is Abraham Lincoln?", "Who are the rivals of Abraham Lincoln?", "What did Abraham Lincoln do?", "Was Abraham Lincoln a Freemason"], gl=['us'], cr=["countryUS"], cx=cs_id, key=cs_key, start=[1,11,21,31,41,51,61,71,81,91], dateRestrict='w5', exactTerms="Mary Todd Lincoln", excludeTerms="kill", hq=["the president","childhood"])

Here, we have used the "dateRestrict", "exactTerms", "excludeTerms" and "hq" parameters.

"dateRestrict" limits the results to documents from a given period; "w5" means the last five weeks.
"exactTerms" keeps only the documents that include the phrase we specified.
"excludeTerms" removes the documents that include the term we specified.
"hq" appends the given terms to each query, as if they were combined with a logical AND.

In this example, I have chosen to extract only the documents that have been produced in the last 5 weeks and that include "Mary Todd Lincoln", the wife of Abraham Lincoln, while excluding the term "kill" and appending the terms "the president" and "childhood" to my query group.
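It is worth counting the requests before running a call like this. The sketch below assumes, as the Advertools documentation describes, that "serp_goog()" sends one request for every combination of the values passed as lists; parameters passed as single values ("dateRestrict", "exactTerms", "excludeTerms") do not multiply the count. It uses only the standard library and does not call the API.

```python
from itertools import product

# The list-valued parameters from the example call above.
params = {
    "q": [
        "Who is Abraham Lincoln?",
        "Who are the rivals of Abraham Lincoln?",
        "What did Abraham Lincoln do?",
        "Was Abraham Lincoln a Freemason",
    ],
    "gl": ["us"],
    "cr": ["countryUS"],
    "start": list(range(1, 92, 10)),  # 1, 11, ..., 91
    "hq": ["the president", "childhood"],
}

# One dictionary per combination, i.e. per request under the stated assumption.
combinations = [dict(zip(params, values)) for values in product(*params.values())]

print(len(combinations))  # 4 queries x 10 start values x 2 hq terms = 80
print(combinations[0])
```

Adding a second country to "gl" would double this number, so the count is a quick check against the daily quota of the API key.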

What was my angle here? I have tried to extract the latest and updated documents that focus more on Abraham Lincoln's true personality and his family, including his "administration" and also "family life" without the term "kill."

The lesson here is that you can specify different types of query patterns and entity attributes and entity-seeking queries to see which types of content rank higher for which types of entities and their related queries.

For instance, Google might choose to rank higher documents that include Abraham Lincoln's wife's name and her entity attributes along with his administration instead of web pages that solely focus on the assassination. And, now you might see how the results change as below.

Below, you will see the "unfiltered" version of this graph.

To create a more filtered and clean data visualization based on SERP, you can use data filtering with Pandas as below.

df_graph_10 = df_graph[df_graph["Rank"]<=10]

We see that Britannica's articles and some newspapers along with some social media posts and news articles from the last five weeks appear in the search. You can improve your own search and ranking pattern for different topics to see how Google values different pages for different content types and search intent groups. You may see the last example below.

When we choose "the president" for "hq", the results shift toward a more official language, and we see results that show Abraham Lincoln's personality and life, with letters from his wife and details of his personal life. When we choose "childhood" for "hq", the results shift toward his childhood and his efforts for children.
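To compare the two "hq" variants side by side, one option is to collect the domains returned for each appended term and look at the differences. This is a sketch on made-up rows; in particular, the "hq" column name is an assumption about how the appended term is stored in the returned data frame, so check the actual column names first.

```python
import pandas as pd

# Hypothetical results; the "hq" column name is an assumption.
results = pd.DataFrame({
    "hq": ["the president", "the president", "childhood", "childhood"],
    "rank": [1, 2, 1, 2],
    "displayLink": ["example.com", "example.org", "example.net", "example.com"],
})

# Collect the set of domains returned for each appended term.
domains_by_hq = results.groupby("hq")["displayLink"].apply(set)

president = domains_by_hq["the president"]
childhood = domains_by_hq["childhood"]

print("Shared domains:", sorted(president & childhood))
print("Only for 'the president':", sorted(president - childhood))
print("Only for 'childhood':", sorted(childhood - president))
```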

Until now, we have seen the search and query refinement with Python for retrieving the SERP. Lastly, we will cover the major search operators.

How to use search operators for retrieving SERPs in bulk with Advertools

To use search operators with Advertools' "serp_goog()", we just need to change the "query" directly.

queries = ["Who is Abraham Lincoln? site:wikipedia.org", "Who are the rivals of Abraham Lincoln? -wikipedia.org", "What did Abraham Lincoln do? intitle:President", "Was Abraham Lincoln a Freemason inurl:kill"]
# Pass the whole list so that every query gets its own rows in the result.
df = adv.serp_goog(q=queries, cx=cs_id, key=cs_key)

By changing the queries with the "site", "intitle" and "inurl" search operators, we have refined our results further. Below, you will see the result of each search operator and its explanation, with some extra parameters.

In the first query, we have requested only the results from "wikipedia.org".

df[df["searchTerms"]==queries[0]]["displayLink"].str.contains("wikipedia")

You will see the "boolean" check's output below.

df[df["searchTerms"]==queries[0]][["displayLink","searchTerms", "title", "link"]]

In the second query, we have requested results from any site except the "wikipedia.org".

df[df["searchTerms"]==queries[1]]["displayLink"].str.contains("wikipedia")

df[df["searchTerms"]==queries[1]][["displayLink","searchTerms", "title", "link"]]
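Instead of reading a long column of True and False values, the boolean check can be summarised as one number per query. The sketch below runs on made-up rows and assumes only the "searchTerms" and "displayLink" columns used above; the shortened query strings are placeholders.

```python
import pandas as pd

# Hypothetical rows shaped like the serp_goog() output used above.
sample = pd.DataFrame({
    "searchTerms": ["q1 site:wikipedia.org"] * 2 + ["q2 -wikipedia.org"] * 2,
    "displayLink": ["en.wikipedia.org", "simple.wikipedia.org",
                    "www.example.com", "www.example.org"],
})

# True where the domain contains "wikipedia" (plain substring match, no regex).
sample["is_wikipedia"] = sample["displayLink"].str.contains("wikipedia", regex=False)

# The mean of a boolean column is the share of True values for each query.
share = sample.groupby("searchTerms")["is_wikipedia"].mean()
print(share)
```

A share of 1.0 for the first query and 0.0 for the second would confirm that both operators behaved as intended.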

In the third query, we have requested every result that has the "President" phrase in their titles.

df[df["searchTerms"]==queries[2]]["title"].str.contains("President", case=False)

In this example, we still have a "False" value. This can happen because Google changes the visible title on the SERP. But, if we check the actual title of the web page, we will see that it still has the phrase "president" within it. You may see it below.

You may see the result below. All the titles have "president" within them.
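Checking the actual title means reading the "title" element of the page itself rather than the title shown on the SERP. The sketch below uses only the standard library and a made-up HTML string; downloading the page (for example with urllib.request) is left out, and the "extract_title" helper is a hypothetical name, not part of Advertools.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Hypothetical helper: return the page title without surrounding spaces."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Made-up HTML standing in for a downloaded page.
page = "<html><head><title> Abraham Lincoln, 16th President </title></head><body></body></html>"
actual_title = extract_title(page)

print(actual_title)
print("president" in actual_title.lower())  # case-insensitive check, like case=False
```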

In the fourth query, we have requested results that have "kill" in their URLs.

df[df["searchTerms"]==queries[3]]["link"].str.contains("kill", case=False)

Again, we use "case=False" here because some websites still have upper case letters within their URLs.

And, now we have the word "kill" within the URLs. And since the "query" is so specific, we have some irrelevant results such as "How to kill Bees" or results focused on the "cinema" and "movie" industries.

Another important note here is that even the slightest differences make a huge difference between versions of SERPs when you use one of these major search operators. For instance, if you try to search the same queries just without the "?" mark, you may understand what I mean here.
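One way to put a number on "a huge difference" is to measure how many result URLs two versions of a SERP share. The sketch below computes the Jaccard similarity of two URL lists. The function names and URLs are made up for illustration; in practice the lists would come from the "link" column for each query variant.

```python
def normalise(url):
    """Drop surrounding spaces and a trailing slash so trivial variants match."""
    return url.strip().rstrip("/")


def serp_overlap(links_a, links_b):
    """Jaccard similarity of two URL lists: 0 means disjoint, 1 means identical sets."""
    a = {normalise(u) for u in links_a}
    b = {normalise(u) for u in links_b}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up URLs standing in for the results with and without the "?" mark.
with_mark = ["https://example.com/a", "https://example.org/b", "https://example.net/c"]
without_mark = ["https://example.com/a/", "https://example.org/d", "https://example.net/c"]

print(serp_overlap(with_mark, without_mark))
```

This measure ignores the order of the results, so it only shows which documents changed, not how their ranks moved.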

So, there are many different result characters for the same query and search intent. To satisfy the search engine, SEOs can examine all these results for the same entity groups with the same query types, by excluding and including terms or by using "links," "n-gram analysis," "structured data" and more. With the help of Data Science, this shows them how to reach the first rank.

Last thoughts on holistic SEO and data science

About the Holistic SEO: it is in my job description and my company's name along with my official site's name. I believe that coding and marketing are brothers and sisters now. We are at the dawn of a new age that will be shaped by Data and Artificial or Organic Intelligence. Search Engine Optimization has two fundamental columns, which are "satisfying the Search Engine" and "Satisfying the User." Since the "Search Engine" also tries to satisfy the user, you can unite these columns in the shape of helping users in the best possible way while being Search Engine Friendly.

And, Data with Intelligence is your best ally for this purpose. You can understand the algorithms of search engines with data and your intelligence. And, in my vision, Holistic SEO is just another name for the intersection of coding and marketing.
