
An Introduction To Python & Machine Learning For Technical SEO

Python is used by some of the world’s biggest organizations to power platforms, perform data analysis, and run machine learning models. Get started with Python for technical SEO.

Since I first started talking about how Python is being used in the SEO space two years ago, it has gained even more popularity, and many people have started using it and seeing the benefits in their day-to-day roles.

It’s really exciting to see so many SEOs share their experiences, the cool scripts they have written, and the impact it has had on their jobs.

It wouldn’t be right for me to publish this without mentioning the impact that Hamlet Batista had on me and so many other people. He loved seeing people learn and use Python.

I know he would be so proud to see so many people sharing their journey of learning Python, and all of the amazing scripts that people have written.

What Is Python?

In short, Python is an open-source, object-oriented, interpreted programming language that you can also use interactively, running code one line at a time.

With simple, easy-to-learn syntax, strong readability, and support for a wide range of modules and libraries, Python is well-loved for the productivity it provides.

As a testament to this, Python is used by some of the biggest organizations in the world to power their platforms, perform data analysis, and run their machine learning models.

Companies including Google, YouTube, Netflix, NASA, Spotify, and IBM have publicly stated Python has been an important part of their growth, due to its simplicity, speed, and scalability.

In fact, Google’s first web crawler was written in Python, and Python remains one of Google’s official server-side languages.

How To Run Python

You can run Python scripts in several ways, depending on what works best for you.

Many systems come with Python already installed, most likely Python 3, and you can find out which version you have by typing python --version (or python3 --version on some systems) in your terminal.

If you have Python 2 installed, download Python 3 from the official Python website. Python 2 officially reached its end of life in 2020, and there are syntax differences between the two versions, so it is best to make sure you use Python 3.
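If you want a script to confirm for itself which version it is running under, a minimal check with the standard library’s sys module could look like this. The (3, 6) minimum is an arbitrary assumption for illustration, and the code deliberately avoids f-strings so that Python 2 can still parse it and report the problem instead of failing with a syntax error.

```python
import sys

# Hypothetical minimum version; adjust it to whatever your scripts need.
MIN_VERSION = (3, 6)


def check_python_version(min_version=MIN_VERSION):
    """Return the running (major, minor) version, or raise if it is too old."""
    current = tuple(sys.version_info[:2])
    if current < min_version:
        raise RuntimeError(
            "Python %d.%d or newer is required, but this is Python %d.%d"
            % (min_version[0], min_version[1], current[0], current[1])
        )
    return current


if __name__ == "__main__":
    major, minor = check_python_version()
    print("Running Python %d.%d" % (major, minor))
```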

You can run Python from your terminal or command line, or from a desktop IDE (Integrated Development Environment) such as PyCharm or VS Code. Alternatively, you can use browser-based notebook environments, including:

  • Google Colab
  • Jupyter Notebooks

These make it easier for beginners to learn and test code step by step, as well as to share and collaborate with your team.

How To Learn Python

There are several online tools available for learning Python, and the best method depends on your own learning style. For example, if you are a visual learner and enjoy following along with coding videos, then freeCodeCamp is a great place to start.

If you work better with a more project-structured learning style then Codecademy and Sololearn are great places to try out. These websites also provide a way to track your learning and start a project portfolio.

Some sites, such as CodeCombat and CheckiO, gamify the learning journey. These provide a fun way to build a habit of coding every day.

If you prefer to code along with an instructor in real time and identify as a woman or non-binary, then you can also sign up for a free 8-week course with Code First Girls (disclaimer: I work for Code First Girls).

Once you feel comfortable with the fundamentals of Python, the best thing to do is start working on projects, either creating your own, or building upon one of the many scripts that have been shared in the Python community.

These projects don’t necessarily need to be related to SEO, but it can be useful to work on practical examples that you can apply to your own work.

If you’re interested in the data analysis side of Python, then it’s definitely worth checking out and using the free datasets available on Kaggle.

Python Libraries

Much of Python’s power lies in its libraries, which add functionality for tasks including:

  • Data extraction.
  • Analysis and preparation.
  • Scientific computing.
  • Natural language processing.
  • Machine learning.

Some useful libraries for tasks involving data analysis and automation in SEO include:

  • Pandas: Used for data manipulation and analysis.
  • NumPy: Useful for scientific computing.
  • SciPy: Used for scientific and technical computing.
  • scikit-learn: Machine learning for data mining and analysis.
  • spaCy: A great natural language processing library.
  • Requests: A library for making HTTP requests.
  • Beautiful Soup: Used to extract data from HTML and XML files.
  • Matplotlib: For creating visualizations from data.

Why Python Is Popular With SEOs

While understanding the languages that power the websites we work on (such as HTML, CSS, and JavaScript) is important, Python provides many opportunities to automate low-level tasks that would otherwise take us several hours.

Python empowers SEO professionals in several ways: it enables us not only to automate repetitive tasks but also to extract and analyze large datasets.

The amount of data marketers work with is only increasing, so being able to analyze it efficiently will help us solve many complex problems in less time.

This in turn saves valuable time and allows us to be more efficient in undertaking other important SEO tasks. These factors combined have led to a growth in the popularity of Python amongst SEO professionals.

The ability to better understand data will not only help us do our jobs better but will also allow us to make data-driven decisions.

These decisions will then enable us to provide concrete insights for our clients and stakeholders and have more confidence in the recommendations we implement.

The Benefits Of Automating With Python

While Python will not be able to imitate human, emotion-led strategies, Python scripts can be used to automate a large number of time-consuming tasks.

The list of tasks you can automate with Python is growing continuously, but it includes:

  • Identifying user intent.
  • Mapping URLs ahead of a migration.
  • Internal link analysis.
  • Performing keyword research.
  • Optimizing images.
  • Scraping websites.
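As an illustration of the scraping task, here is a minimal sketch that extracts the title, meta description, and H1 headings from an HTML string. It uses the standard library’s html.parser so it runs without installing anything; Beautiful Soup offers a more convenient API for the same job. Fetching the page (for example with Requests) is left out, and the PageInfoParser class and extract_page_info function are hypothetical names.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collects the <title>, meta description and <h1> text from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.h1s = []
        self._current = None  # the tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag in ("title", "h1"):
            self._current = tag
            if tag == "h1":
                self.h1s.append("")  # start a new heading

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <span> within <h1>) is still collected.
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1s[-1] += data


def extract_page_info(html):
    """Return the title, meta description and H1s found in an HTML string."""
    parser = PageInfoParser()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "meta_description": parser.meta_description,
        "h1": [h.strip() for h in parser.h1s],
    }
```

Running extract_page_info over the HTML of every crawled URL gives you a quick on-page audit table.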

How To Add Python To Your SEO Workflow

The best way to add Python into your workflow is to start thinking about what can be automated, particularly tedious, time-consuming tasks.

Alternatively, think of ways you can more efficiently handle and draw conclusions from the data available to you.

A great way to get started is to play around with the data from your website that you already have access to, for example from a site crawl or your analytics tool.

Don’t be afraid to take inspiration from other people’s scripts, play around and even break something when learning, as this is often the best way to learn.

Finding the cause of an issue and ways to fix it is a big part of what we do as SEOs, and it’s really the same when learning and using Python.

There are also so many useful articles from other SEOs who have shared practical examples of how they are using Python for SEO-related tasks. I would recommend checking out SEO Pythonistas to explore some of these.

Example Ways To Use Python

Ready to get started with Python?

Here are a few scripts I have found useful for numerous tasks, along with a brief description of how each one works and the challenges it solves.

Redirect Relevancy

The first practical way you can use Python is to identify if the redirect mapping that has been implemented for a migration is accurate, by creating a redirect relevancy script.

This involves crawling your site pre- and post-migration and segmenting the different categories based on their URL structure.

You can then use some of Python’s built-in comparison operators to determine if the folder and depth of each page have stayed the same or changed following the migration.

The script takes each of your URLs and compares them pre- and post-migration to identify whether they are the same. The results are output to a new table that states True if they are the same, or False if they have changed.
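A minimal sketch of that comparison using only the standard library, assuming you already have (old URL, new URL) pairs from your redirect map or post-migration crawl. Treating the first path folder as the category and the number of path segments as the depth are assumptions about how your site is segmented, and the example URLs are made up.

```python
from urllib.parse import urlparse


def segment(url):
    """Return (top-level folder, depth) for a URL's path."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    folder = parts[0] if parts else "/"
    return folder, len(parts)


def compare_redirects(redirect_pairs):
    """Flag whether the folder and depth stayed the same for each redirect."""
    results = []
    for old_url, new_url in redirect_pairs:
        old_folder, old_depth = segment(old_url)
        new_folder, new_depth = segment(new_url)
        results.append({
            "old_url": old_url,
            "new_url": new_url,
            "category": old_folder,
            "folder_match": old_folder == new_folder,  # True if unchanged
            "depth_match": old_depth == new_depth,
        })
    return results


# Hypothetical redirect pairs for illustration.
for row in compare_redirects([
    ("https://example.com/shoes/red-trainers/", "https://example.com/shoes/trainers/red/"),
    ("https://example.com/blog/python-seo", "https://example.com/blog/python-for-seo"),
]):
    print(row["category"], row["folder_match"], row["depth_match"])
```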

You can also use the Python library Pandas to create a pivot table that can display a count of how many URLs for each category match and how many have changed.

This will enable you to investigate any categories or URLs which don’t match and review the redirect rules that have been set up.
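Once each URL has a True/False result, a pandas pivot table can summarize the matches per category. A minimal sketch, assuming a DataFrame with hypothetical url, category, and match columns like the comparison output described above:

```python
import pandas as pd

# Hypothetical output of the comparison step: one row per redirected URL.
results = pd.DataFrame({
    "url": [
        "https://example.com/shoes/red-trainers/",
        "https://example.com/shoes/blue-boots/",
        "https://example.com/blog/python-seo/",
        "https://example.com/blog/core-web-vitals/",
        "https://example.com/blog/hreflang/",
    ],
    "category": ["shoes", "shoes", "blog", "blog", "blog"],
    "match": [True, False, True, True, False],
})

# Count URLs per category that kept (True) or changed (False) their structure.
summary = results.pivot_table(
    index="category", columns="match", values="url", aggfunc="count", fill_value=0
).rename(columns={True: "matched", False: "changed"})

# Share of URLs in each category whose structure was preserved.
summary["match_rate"] = summary["matched"] / (summary["matched"] + summary["changed"])
print(summary)
```

Categories with a low match_rate are the ones whose redirect rules are worth reviewing first.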

Redirect Relevancy. Screenshot from Python Library Pandas, December 2021

Internal Link Analysis

Another practical use of crawl data is performing internal link analysis with Python.

This will allow you to identify the sections of your site that have the most internal links, as well as discover opportunities to improve internal linking for different sections.

This will again use segmentation to determine the different categories of the URLs and pivot tables to export a count of the number of internal links to each category on the site.
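A minimal sketch of this, assuming an inlinks export from your crawler with hypothetical source and destination columns (most crawlers offer a similar export, but the column names vary). The category is again taken from the first folder in the destination URL’s path.

```python
from urllib.parse import urlparse

import pandas as pd


def url_category(url):
    """Use the first path segment as the category, or 'home' for the root."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return parts[0] if parts else "home"


def internal_links_by_category(links):
    """Count internal links pointing to each category of the site."""
    links = links.copy()
    links["category"] = links["destination"].map(url_category)
    return (
        links.pivot_table(index="category", values="source", aggfunc="count")
        .rename(columns={"source": "internal_links"})
        .sort_values("internal_links", ascending=False)
    )


# Hypothetical inlinks export: one row per internal link found in the crawl.
links = pd.DataFrame({
    "source": ["https://example.com/", "https://example.com/blog/a",
               "https://example.com/", "https://example.com/shoes/x"],
    "destination": ["https://example.com/blog/a", "https://example.com/shoes/x",
                    "https://example.com/shoes/y", "https://example.com/"],
})
print(internal_links_by_category(links))
```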

Internal Link Analysis. Screenshot from Python Library Pandas, December 2021

Image Captioning With Pythia

This is the first script that introduced me to the language and the one that kick-started my desire to learn.

Using Pythia, which is a modular deep learning framework created by Facebook, this script generates a caption for an image URL.

This caption can then be used as alt text for images that are currently missing it, which is important for accessibility and image search.

The script is based on the bottom-up and top-down attention mechanism, which generates results by focusing attention on different elements within an image.

Image Captioning. Screenshot from Pythia, December 2021

For each word generated, attention is weighted to individual pixels within the image, outlining the region with the maximum attention.
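To make the weighting idea more concrete, here is a toy sketch of soft attention in plain Python. Each image region has a feature vector, the caption decoder’s current state acts as a query, and a softmax turns similarity scores into weights that sum to 1; the region with the largest weight is where the model is "looking" for the next word. The vectors are made up, and real captioning models use learned, high-dimensional features (in the bottom-up approach, usually detected object regions rather than single pixels) and a learned scoring function instead of the plain dot product used here.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, region_features):
    """Dot-product attention: weight each region by its similarity to the query."""
    scores = [sum(q * f for q, f in zip(query, feats)) for feats in region_features]
    weights = softmax(scores)
    # Context vector = attention-weighted average of the region features.
    context = [
        sum(w * feats[i] for w, feats in zip(weights, region_features))
        for i in range(len(query))
    ]
    return weights, context


# Hypothetical 3-dimensional features for three image regions.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
decoder_state = [0.1, 2.0, 0.1]  # hypothetical query while generating the next word

weights, context = attend(decoder_state, regions)
most_attended = max(range(len(weights)), key=weights.__getitem__)
print(weights, most_attended)
```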

This script is easy to use because it runs straight from Google Colab and requires no coding.

Once a copy of the notebook is saved to your Google Drive, all of the cells can be run, performing each step for you.

This will download the data sources needed to run the process, as well as automatically complete all of the steps that would typically need to be undertaken manually.

For example, all libraries are installed, classes are created, and functions are defined.

Pythia Captioning. Screenshot from Google Colab notebook, December 2021

This generates a field to enter your image URL and a button to caption the image.

Generating a caption. Screenshot from Google Colab notebook, December 2021

A caption will then be provided for each image, which can be used directly as alt text or as inspiration for writing your own.

Google Colab notebook. Screenshot from Google Colab notebook, December 2021

Hamlet has written a comprehensive guide to generate text from images with Python which shows this script in action.

APIs

Python is also great to use with APIs, for example, Google’s PageSpeed Insights API. This allows you to measure key performance metrics at scale, saving you from having to test each URL manually.

Using a CSV file with all of the URLs you want to test, you can run each through the API and create a response object to hold all of the metrics for each URL.

You can then extract the specific metrics, for example, LCP, CLS, and FID, and generate a table displaying these metrics for each URL.
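A minimal standard-library sketch of that workflow. It assumes a CSV file with a url column (a hypothetical layout) and reads field data from the loadingExperience.metrics object of the PageSpeed Insights v5 response. The metric key names and the percentile field are assumptions based on that response format, so confirm them against the API documentation before relying on them, and pass an API key if you have one.

```python
import csv
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

# Assumed field-data keys in the v5 response's loadingExperience.metrics object.
# CLS is assumed to be reported multiplied by 100; check the docs for units.
METRICS = {
    "LCP (ms)": "LARGEST_CONTENTFUL_PAINT_MS",
    "CLS (x100)": "CUMULATIVE_LAYOUT_SHIFT_SCORE",
    "FID (ms)": "FIRST_INPUT_DELAY_MS",
}


def build_request_url(page_url, strategy="mobile", api_key=None):
    """Build the API request URL for one page."""
    params = {"url": page_url, "strategy": strategy}
    if api_key:
        params["key"] = api_key
    return PSI_ENDPOINT + "?" + urlencode(params)


def fetch_report(page_url, api_key=None):
    """Call the API and return the decoded JSON response."""
    with urlopen(build_request_url(page_url, api_key=api_key), timeout=60) as resp:
        return json.load(resp)


def extract_metrics(page_url, report):
    """Pull the field-data percentile values for each metric out of one response."""
    field = report.get("loadingExperience", {}).get("metrics", {})
    row = {"url": page_url}
    for label, key in METRICS.items():
        row[label] = field.get(key, {}).get("percentile")  # None if no field data
    return row


def run_from_csv(csv_path, api_key=None):
    """Test every URL in a CSV file with a 'url' column."""
    with open(csv_path, newline="") as f:
        urls = [row["url"] for row in csv.DictReader(f)]
    return [extract_metrics(u, fetch_report(u, api_key)) for u in urls]
```

For example, run_from_csv("urls.csv") (a hypothetical file name) would return one row of metrics per URL, ready to write out with csv.DictWriter or load into pandas. Lab metrics from lighthouseResult could be pulled out in the same way.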

You can also extract other useful information from the API, including the elements causing layout shifts on each page, the Largest Contentful Paint element, and lists of third-party blocking tags and unused CSS and JS files on each page.

PageSpeed API. Screenshot from Google’s PageSpeed Insights API, December 2021

Other Possibilities

These examples only scratch the surface; there are many more automation and optimization possibilities using Python scripts, including:

  • Optimizing images.
  • Merging datasets to form even stronger conclusions.
  • Hreflang validation.
  • Keyword growth calculation.
  • Collecting GSC data.
  • Performing competitor analysis.

Powering Machine Learning

Python is also a popular language used to power machine learning applications due to its simple, intuitive, and accessible syntax.

In addition, there is a large number of libraries that help when working with and training machine learning models.

What Is Machine Learning?

Machine learning is essentially “an application of artificial intelligence that provides systems with the ability to automatically learn and improve from experience, without the need to be explicitly programmed” (a full definition can be found here).

Machine learning is often used to identify patterns in data, upon which predictions can then be made.

There are two main types of machine learning. The first is supervised learning, which is trained on labeled data, where each input in the training set is paired with the desired output.

The learning algorithm is therefore already given the answer when reading the data. The correct outcome for each data point is explicitly labeled when training the model.

The second is unsupervised learning, which is trained on information that is not labeled, allowing the algorithm to act on the information without guidance. This is often used to test the capabilities of the system or when you do not have pre-labeled data.
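To illustrate the difference, here is a toy sketch in plain Python. The supervised part learns from pages that are already labeled (a nearest-centroid classifier on word count), while the unsupervised part groups unlabeled values with a simple one-dimensional k-means. The data, labels, and the choice of word count as the only feature are hypothetical; in practice you would typically use a library such as scikit-learn.

```python
from statistics import mean

# --- Supervised: every training example comes with its label ---
labeled = [(150, "thin"), (220, "thin"), (300, "thin"),
           (1200, "in-depth"), (1800, "in-depth"), (2500, "in-depth")]


def train_nearest_centroid(examples):
    """Learn the average word count (centroid) for each label."""
    labels = {label for _, label in examples}
    return {label: mean(x for x, lab in examples if lab == label) for label in labels}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


# --- Unsupervised: the same kind of values, but no labels at all ---
unlabeled = [140, 260, 310, 1500, 1900, 2400]


def kmeans_two_groups(values, iterations=10):
    """1-D k-means with k=2: split values into two groups without labels."""
    centers = [min(values), max(values)]  # simple starting points
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups


centroids = train_nearest_centroid(labeled)
print(predict(centroids, 400))       # a label learned from the examples
print(kmeans_two_groups(unlabeled))  # unnamed groups found by the algorithm
```

The k-means groups come back unnamed: deciding that the first group represents thin content is left to you, which is exactly what separates it from the supervised case.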

Python & Machine Learning

Combined with machine learning, Python can be used to power scripts that load and prepare a dataset, then summarize and visualize the data.

From there, different algorithms can be trained and evaluated, so that the best-performing model can be used to make predictions.
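A minimal plain-Python sketch of the train-then-evaluate step, assuming made-up (word count, label) data: two candidate models are trained on one part of the data and compared on a held-out part they have not seen. Both models are deliberately simple stand-ins for real algorithms.

```python
from collections import Counter
from statistics import mean

# Hypothetical labeled data: (word count, label) for ten pages.
data = [
    (120, "thin"), (180, "thin"), (250, "thin"), (300, "thin"), (350, "thin"),
    (900, "in-depth"), (1300, "in-depth"), (1600, "in-depth"),
    (2000, "in-depth"), (2600, "in-depth"),
]

# Hold out every fourth row for testing so models are judged on unseen data.
train_rows = [row for i, row in enumerate(data) if i % 4 != 0]
test_rows = [row for i, row in enumerate(data) if i % 4 == 0]


def train_majority(rows):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda x: label


def train_threshold(rows):
    """Predict 'in-depth' above the midpoint between the two class averages."""
    thin = mean(x for x, y in rows if y == "thin")
    deep = mean(x for x, y in rows if y == "in-depth")
    cut_off = (thin + deep) / 2
    return lambda x: "in-depth" if x > cut_off else "thin"


def accuracy(model, rows):
    """Share of rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


# Train each candidate on the training rows, then compare them on the test rows.
candidates = {"majority baseline": train_majority, "threshold model": train_threshold}
scores = {name: accuracy(train(train_rows), test_rows) for name, train in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Comparing against a trivial baseline is a cheap way to check that a model has actually learned something before trusting its predictions.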

Real-World Machine Learning Examples

The use of machine learning on the web is increasing all the time, with new models being created and training data becoming more accessible daily. In some cases, we are also being used to help train them.

Some real-world machine learning examples include:

  • Google’s RankBrain algorithm.
  • Baidu’s Deep Voice program.
  • Twitter’s curated timelines.
  • Netflix and Spotify recommendations.
  • Salesforce’s Einstein feature.

SEO Possibilities With Machine Learning

Because machine learning models can solve complex problems, it is no surprise that they are being used to help make marketers’ lives easier.

As Britney Muller says:

“Machine Learning is becoming more accessible and will free us up to work on higher-level strategy.”

This will enable you to spend more time finding solutions, rather than just identifying problems.

Some examples of machine learning models used in SEO include:

  • Content quality evaluation.
  • Identifying keyword gaps and opportunities.
  • Gaining insights into user engagement.
  • Optimizing title tags.
  • Automating meta description creation.
  • Transcribing audio.

Here are some examples of machine learning being used for SEO tasks, which you may even have come across.

Predictive Prefetching

Based on user navigation patterns from website analytics, tools such as Guess.js build machine learning models that predict which pages users are most likely to visit next and prefetch the resources those pages will need.

Other examples of this in practice include predicting the next piece of content a user is likely to want to view and adjusting the user experience to account for this, as well as predicting which widgets a user is likely to interact with and tailoring a more custom experience with this in mind.
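The core idea behind predictive prefetching can be sketched with a simple first-order Markov model: count how often users move from one page to another in past sessions, then predict the most frequent next page. This is a simplified illustration with made-up session paths, not Guess.js’s actual implementation; in a real setup the sequences would come from your analytics data.

```python
from collections import Counter, defaultdict


def build_transition_model(sessions):
    """Count page-to-page transitions across user sessions."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current_page, next_page in zip(pages, pages[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next(transitions, page):
    """Return (most likely next page, probability), or (None, 0.0) if unseen."""
    counts = transitions.get(page)
    if not counts:
        return None, 0.0
    next_page, count = counts.most_common(1)[0]
    return next_page, count / sum(counts.values())


# Hypothetical navigation paths from three sessions.
sessions = [
    ["/", "/shoes/", "/shoes/trainers/"],
    ["/", "/shoes/", "/basket/"],
    ["/", "/blog/"],
]
model = build_transition_model(sessions)
print(predict_next(model, "/"))
```

A page could then add a prefetch hint such as <link rel="prefetch" href="/shoes/"> for the predicted page when the probability is high enough.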

Internal Linking

There are two different ways machine learning can help with internal linking.

The first is updating broken links. This can be done by crawling the site to identify broken internal links, then using an algorithm to suggest the most accurate replacement page for each one.
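A minimal sketch of the suggestion step, assuming you already have the list of broken internal URLs and the list of live URLs from a crawl. It swaps in simple string similarity from the standard library’s difflib rather than a trained model, which is often a reasonable first pass; the URLs and the 0.6 cut-off are illustrative.

```python
from difflib import get_close_matches


def suggest_replacements(broken_urls, live_urls, cutoff=0.6):
    """Map each broken URL to the most similar live URL, or None if nothing is close."""
    suggestions = {}
    for url in broken_urls:
        matches = get_close_matches(url, live_urls, n=1, cutoff=cutoff)
        suggestions[url] = matches[0] if matches else None
    return suggestions


# Hypothetical crawl results.
live = [
    "https://example.com/shoes/red-trainers/",
    "https://example.com/blog/python-for-seo/",
    "https://example.com/contact/",
]
broken = ["https://example.com/shoes/red-trainer/"]
print(suggest_replacements(broken, live))
```

Suggestions below the cut-off come back as None, so those links can be flagged for manual review rather than replaced automatically.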

The other is suggesting relevant internal links based on big data. These tools use algorithms that are fine-tuned to constantly acquire new information, so they can suggest more internal links over time.

They can also suggest relevant internal links while an article is still being written.
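Commercial tools use their own models, but the basic idea of content-based link suggestions can be sketched with word-count vectors and cosine similarity: the existing pages whose text overlaps most with a draft are suggested as link targets. The page texts and function names here are hypothetical, and a production approach would likely remove stop words and use TF-IDF or embeddings instead of raw counts.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Lowercase word counts for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0 = no shared words)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def suggest_links(draft_text, pages, top_n=2):
    """Rank existing pages by how similar their text is to a draft article."""
    draft = vectorize(draft_text)
    scored = [(url, cosine(draft, vectorize(text))) for url, text in pages.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [url for url, score in scored[:top_n] if score > 0]


# Hypothetical page texts keyed by URL.
pages = {
    "/blog/python-for-seo/": "Learn how Python helps automate SEO tasks and analyze crawl data",
    "/blog/internal-linking/": "Internal linking tips to help search engines crawl your site",
    "/shoes/red-trainers/": "Red trainers with cushioned soles for running",
}
print(suggest_links("Using Python to analyze crawl data for SEO automation", pages))
```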

Content Quality

The next example is improving content quality by predicting what users and search engines would prefer. You can do this by building a model that generates insights on the factors that are most important.

These factors can include things such as search volume and traffic, conversion rate, internal links, bounce rate, time on page, and word count.

You will then use those important factors to train a machine learning model, which generates a content quality score for each page.
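A heavily simplified sketch of this idea in plain Python: first measure how strongly each factor correlates with an outcome you care about (here, hypothetical conversions), then use those correlations as weights on min-max-normalized factor values to produce a 0 to 100 score per page. The page data, factor names, and the choice of correlation as the weight are assumptions for illustration; a real model would be trained and validated far more carefully.

```python
import math

# Hypothetical page metrics; "conversions" is the outcome we want to explain.
pages = {
    "/guide/": {"word_count": 2400, "internal_links": 30, "bounce_rate": 0.35, "conversions": 40},
    "/faq/": {"word_count": 900, "internal_links": 12, "bounce_rate": 0.55, "conversions": 15},
    "/news/": {"word_count": 400, "internal_links": 4, "bounce_rate": 0.80, "conversions": 3},
    "/tips/": {"word_count": 1500, "internal_links": 20, "bounce_rate": 0.45, "conversions": 25},
}
FACTORS = ["word_count", "internal_links", "bounce_rate"]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def factor_weights(pages, factors, outcome="conversions"):
    """Weight each factor by its correlation with the outcome (sign included)."""
    outcomes = [p[outcome] for p in pages.values()]
    return {f: pearson([p[f] for p in pages.values()], outcomes) for f in factors}


def quality_scores(pages, weights):
    """Score pages from 0 to 100 using normalized, correlation-weighted factors."""
    ranges = {f: (min(p[f] for p in pages.values()), max(p[f] for p in pages.values()))
              for f in weights}
    raw = {}
    for url, p in pages.items():
        total = 0.0
        for f, w in weights.items():
            lo, hi = ranges[f]
            norm = (p[f] - lo) / (hi - lo) if hi > lo else 0.0
            # Negative weights (e.g. bounce rate) reward lower values.
            total += w * norm if w >= 0 else -w * (1 - norm)
        raw[url] = total
    best = sum(abs(w) for w in weights.values())
    return {url: round(100 * s / best, 1) if best else 0.0 for url, s in raw.items()}


weights = factor_weights(pages, FACTORS)
print(weights)
print(quality_scores(pages, weights))
```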

User Experience

Machine learning is also being used to improve user experience in many ways. For example, Instagram uses sentiment analysis to identify and address bullying language.

Twitter also uses it for image cropping, so that previews display the most important part of an image, such as the text.

Twitter Image Cropping. Screenshot from Twitter, December 2021

The text is in a different place in each of these images, but Twitter crops them so that the preview displays the text. This machine learning model was trained on thousands of images and started out like this before it was able to identify the most important part of an image.

Twitter Image Cropping. Screenshot from Twitter, December 2021

Computer vision is also being used to improve user experience by automatically identifying what is in an image, making images accessible by describing them to users.

Conclusion

I hope this has inspired you to start learning Python and explore how it can help you with automating tasks and analyzing complex data to increase your efficiency.

As a final note, please remember that you don’t need to learn Python to be a good SEO, but if you’re intrigued or interested, I hope you have fun learning it and putting some Python scripts into practice in your workflow.

Python Contributions From The SEO Industry

To continue to honor Hamlet’s passion for encouraging and celebrating others, I wanted to share some of the amazing things shared by the SEO community this year.

Moshe Ma-yafit wrote a cool script to detect competitors’ price changes with Python and send email alerts. You can find an article explaining this, together with a GitHub repository.

Lazarina Stoy has a script for generating meta descriptions as well as a guide to using Pytrends with Python.

Francis Angelo Reyes has written a script for a simple redirect mapping tool in Python. It goes through each URL and finds its match. The app is also in the article so you can try it there!

Yaniss Illoul has worked on a Broken Links Finder in Python, as well as a tool to capture keyword rankings across multiple domains.

Danielle Rohe shared a script to download all sitemaps within a sitemap index as well as loop through each and extract all URLs into a CSV file.

Muhammad Hammad has built a really cool script for NLP and content analysis of SERPs.

Charly Wargnier has also shared some awesome scripts this year, including one to automatically generate FAQs for your pages, the BERT Keyword Extractor, and a Keyword Clustering app.

More resources:

  • How to Use Python to Analyze SEO Data: A Reference Guide
  • Python SEO Script: Top Keyword Opportunities Within Striking Distance
  • Advanced Technical SEO: A Complete Guide

Featured Image: fatmawati achmad zaenuri/Shutterstock

Ruth Everett
VIP CONTRIBUTOR Ruth Everett SEO Testing Consultant at SearchPilot

Ruth is a SEO Testing Consultant at SearchPilot, an SEO A/B testing platform and meta-CMS enabling rapid SEO changes for ...
