How to use IMPORTXML in Google Sheets

If you are someone who has a job related to data scraping from the web and often you have to copy and paste data from different web sources to Google Sheets and currently if you are doing it manually then your life is going to be so much easier after reading this article.

Importing web data in Google Sheets has never been so easy. Hats off to Google Sheets  IMPORTXML function which is probably one of the most useful for pulling data from websites. In this article, we will learn how to get XML data in Google Sheets using the IMPORTXML function.

Let’s first learn the syntax of this very useful function;

=IMPORTXML(“url”, “xpath_query”)

To use this function properly we need to understand the parameters of the functions. It requires only two parameters and the details of the parameters as follows.

url: 

This is the string that specifies the website’s URL from which you wish to import the data. 

xpath_query: 

The second parameter is also a string that specifies the nature of data that we wish to import. XPath query works with the structured data. To write XPath queries properly you should know the basic information about XPath and which you can learn from XPath tutorial by W3Schools web page.

Common XPath queries include:

//h2 – Scrapes all H2 headings

//title – Scrapes all titles

//@href – Scrapes all links

//tr – Scrapes all tables

Let’s see how we can scrape data from a Wikipedia page i.e., https://en.wikipedia.org/wiki/Fish_farming

We’ll first use common queries and scrape basic data and then we’ll go for a specific table in the page.

Step 1 – Scrape all H2 Headings

=IMPORTXML(“https://en.wikipedia.org/wiki/Fish_farming”, “//h2”)

  • This will import all H2 headings from the webpage as shown above.

Step 2 – Scrape all Links on the webpage

=IMPORTXML(“https://en.wikipedia.org/wiki/Fish_farming”, “//@ahref”)

  • This will import all links from the web page in Google Sheets as shown above.

Import a specific table from webpage

To import a specific table from the webpage we need to follow different procedure as we have to know the full XPath of that element that we wish to extract from the webpage. We’ll learn these steps in the forthcoming sections.

Step 1 – Find the XPath of the table element

  • On the desired webpage, right click on the table element that you wish to import and click on the “Inspect” option if you are using Google Chrome or Microsoft Edge and “Inspect element or CTRL+SHIFT+C if you are using Opera.

  • A new window will open on the right side, inside the same page. XML text related to the table will be automatically highlighted in that window. 

  • If you move the mouse over the code in the right-side window, you will see that respective elements from the webpage also get highlighted. So, move the mouse until you find the text <table class = and the whole table gets highlighted in the right-side window.

  • Now right click on that text and inside Copy options select Copy full XPath.

Step 2 – Use XPath in IMPORTXML to import table data

  • After copying the XPath from the web page will now use it inside IMPORTXML formula as follows;

=IMPORTXML(“https://en.wikipedia.org/wiki/Fish_farming”,“/html/body/div[3]/div[3]/div[5]/div[1]/table[1]//tr”)

  • The XPath copied from the webpage’s XML code was up to /table[1] only. However, to get the complete contents of the table we added //tr as well to get all the rows of this specific table as shown above.

So this is how we can use the IMPORTXML function in Google Sheets to import XML data from various webpages.