|SEO||– 11 min read –||September 6, 2018|
Using Scripts To Scrape SERPs For Different Results
You're in the middle of a project for yourself or a client. To successfully hit your deadline and meet your targets, you need access to detailed information that is housed deep in the underbelly of the search engine results page. But there's a problem…
For whatever reason (likely a draconian line of code in Google's latest update) you are unable to access or export the information you need.
If you're a seasoned SEO or webmaster, then I can all but guarantee that the above scenario has probably happened to you at least once.
And today, I'm going to show you how you can use Python to scrape the SERPs (or your website) for different results so that you can get the data you need… In 15 minutes or less.
Sound like a plan? Then let's get to it.
Why Scrape Using Python?
With the proliferation of black hat SEOs attempting to game the algorithm and pull one over on the system, big search engines like Google don't really have any other choice. But that doesn't make the state of search engines metrics in 2018 any easier to live with.
If you want to uncover key metrics behind your website's SEO performance (that aren't available through analytics), scan for security weaknesses, or gain competitive intelligence... the data is there for the taking. But you have to know how to get it.
Sure, you could drop $297/month on an expensive piece of software (that will become entirely obsolete within the next 12 months) or you could just use the strategy I'm about to share with you to find all of the data you need in a matter of minutes.
To do this, we'll be using a coding language called Python to scrape Google's SERPs so that you can quickly and easily gain access to the information you need without the hassle.
Let me show you how.
Specifically, you'll need:
- Google Chrome (duh!)
For those of you who are lazy, er… prefer to work smart, I recommend that you simply uninstall your existing Python distribution and download this one from Anaconda. It comes prepackaged with the Panda library (among many others) making it an incredibly versatile and robust distribution package.
If you're happy with your existing python distribution, then simply plug the following lines of code into your terminal.
To Get Panda/ To get Splinter/ To get Splinter with Anaconda:
Step #1: Setup Your Libraries and Browser-ups in Links
Step #2: Exploring the Website
Luckily, this process is pretty straightforward.
All you're going to do is:
Next, you're going to right click on the HTML element and select "Copy" > "Copy XPath".
And boom! You're ready to go.
Step #3 Control the Website
Next, we'll want to gain navigation of the individual HTML element by using the following line of code:
Step #4: Sit Back and Watch the Magic Happen (Scrape Time!)
Notice how each search result is contained within an h3 title tag with a class "r". It's also important to remember that the title and the link we want are both stored within an a-tag.
Ok, so the XPath of the highlighted tag is //*[@id="rso"]/div/div/div/div/div/h3/a.
But it's only the first link. And much like the iconic rock band Queen, we want it all.
To get all of the links from the SERPs, we're going to shift things around a bit to ensure that our find_by_xpath code returns all of the results from the page.
Now, it's time to sit back, let the magic happen.
To extract the title and link for each search result, all you need to do is insert the following line of code into your Python terminal.
To export all of the data to csv you can simply use panda's dataframe using the following 2 lines of code:
Pretty cool, huh?
And that ladies and gentlemen, is how you scrape the Google SERPs using Python.
By following the simple framework I outlined today, you can quickly and easily gain access to just about any information you need with a few lines of code and the click of a button (or mouse as it were).
So go out and scrape away!
Did you find this article helpful? Do you have any tips or tricks for improving your website scraping abilities? Comments, questions, concerns? Feel free to drop us a line below and let us know!