Data scraping is a phrase that still leaves a bad taste in some people’s mouths. But the reality is this: search engines, eCommerce companies, government agencies, and the Fortune 500 are all scraping data all the time. If you’re not doing the same, you’re simply not competing. A good scraping tool is becoming table stakes for conducting business online.
This is especially true in the eCommerce industry, which relies on high-quality price scrapers to conduct competitive analysis. In this article, we will explain how to use a cloud-based scraper to save scraped data to a JSON file, and how to then import that JSON file into SQL Server.
Using A Scraper
Once you’ve found a good price scraper that’s not too expensive - we recommend Scraping Robot for its easy-to-use interface and because you get 5000 free scrapes each month - it should give you an option to download price data in the popular JSON format.
Here’s an example of a price scraping JSON result (generated by using Scraping Robot to collect data from Amazon). The JSON format is portable, parseable, and simple - not a bad way to share simple data. But as we all know, data really belongs in a database, where we can use the powerful SQL language to manipulate and analyze it.
Step 1: Using OPENROWSET
The most common method of importing JSON to SQL Server is to use the table-valued function OPENROWSET. OPENROWSET can be used to bulk import any kind of text file - JSON format or otherwise.
You’ll need to give OPENROWSET two arguments. The first is the actual location of your file and the second is the type of data you’re importing. Since your JSON file is being treated as one large text string for the purposes of this method, use SINGLE_CLOB for the second argument. CLOB stands for Character Large Object and means that OPENROWSET will read the file as a VARCHAR(MAX) value.
OPENROWSET returns a single string field with BulkColumn as its column name. Let’s see this in action:
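-- Read the whole file into a single VARCHAR(MAX) variable
DECLARE @JSON VARCHAR(MAX);
SELECT @JSON = BulkColumn
-- A bulk rowset in the FROM clause must be given an alias (here: ImportedFile)
FROM OPENROWSET (BULK 'D:/location-of/your-price-scraping-data.json', SINGLE_CLOB) AS ImportedFile;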
Note that we also used the "BULK" option with OPENROWSET. This is necessary to enable data from a file to be read and returned as a rowset.
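Before parsing, it can be worth confirming that the imported text really is JSON. The following is a minimal sketch, assuming SQL Server 2016 or later (where ISJSON is available). The inline sample string is hypothetical and only stands in for the value loaded from your file; in your real query you would drop the DECLARE and reuse the @JSON variable from above.

```sql
-- Hypothetical stand-in for the string loaded by OPENROWSET.
DECLARE @JSON VARCHAR(MAX) = '{"Price": [{"id": 1}]}';

-- ISJSON returns 1 for valid JSON, 0 for invalid text, and NULL for NULL input,
-- so ISNULL treats a NULL (nothing loaded) the same as invalid text.
IF ISNULL(ISJSON(@JSON), 0) = 0
BEGIN
    RAISERROR('The imported text is not valid JSON.', 16, 1);
END
ELSE
BEGIN
    PRINT 'The imported text is valid JSON.';
END
```

Failing early here tends to produce a clearer message than the error you would otherwise get from the parsing step.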
Step 2: Parsing The Text
So now you’ve got your essential price data imported as a string. But you don’t want a string. You want actual data objects that you can manipulate in SQL.
Typically the best way to parse your string is with the OPENJSON function. OPENJSON pairs naturally with OPENROWSET - it accepts a string of JSON as an argument and returns a rowset in one of two forms:
- The default schema: one row per first-level element, with key, value, and type columns (for an array, the key is the element’s index).
- An explicit schema: one row per element, with the columns that you define yourself.
If you’re analyzing price scraping data, you’ll want the latter. This requires the WITH clause. Add this next section to your query from earlier.
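To make the first form concrete, here is a minimal sketch of what OPENJSON returns when no column definitions are supplied. The sample object and its values are hypothetical, not taken from a real scrape, and the sketch assumes a database at compatibility level 130 or higher, which OPENJSON requires.

```sql
-- Hypothetical one-product sample; real scraper output will look different.
DECLARE @Sample VARCHAR(MAX) = '{"ASIN": "B000000001", "Price": "19.99", "InStock": true}';

-- With no column definitions, OPENJSON returns three columns:
--   [key]   the property name (or the index, when the input is an array)
--   [value] the property value as text
--   [type]  a code for the JSON type: 0 null, 1 string, 2 number,
--           3 boolean, 4 array, 5 object
SELECT [key], [value], [type]
FROM OPENJSON(@Sample);
```

This form is handy for exploring an unfamiliar file, since it shows which keys exist at a given level before you commit to a column list.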
-- Parse the JSON found at $.Price into rows; adjust the paths to match your own file
SELECT *
FROM OPENJSON(@JSON, '$.Price')
WITH (
    type varchar(50),
    id int,
    DateScraped varchar(4000) '$.DateScraped.self',
    ASIN varchar(4000) '$.ASIN.self',
    Price varchar(50) '$.Price.self'
) AS Dataset;
And voila! Assuming your data looks like ours, you now have an SQL dataset with the prices of several Amazon products and their ASINs. Naturally, you'll need to adjust the column names, data types, and JSON paths depending on what your exact output looks like. But as you can see, price scrapers make it easy to get the data you need and feed it right into SQL - and when you're using a scraper that has a built-in API, it becomes even easier.
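As an illustration of that kind of adjustment, here is a minimal sketch against a made-up JSON shape. The "Products" array, the key names, and the values are all assumptions for the example, not the actual Scraping Robot output. It shows two common tweaks: quoting a key that contains a space inside the JSON path, and converting the price text to a number.

```sql
-- Hypothetical sample; the structure of a real export may differ.
DECLARE @Sample VARCHAR(MAX) = '{
  "Products": [
    {"ASIN": "B000000001", "Price": "19.99", "Date Scraped": "Tue Feb 09 2021"},
    {"ASIN": "B000000002", "Price": "5.49",  "Date Scraped": "Tue Feb 09 2021"}
  ]
}';

SELECT ASIN,
       -- TRY_CAST returns NULL instead of failing when a price is not numeric
       TRY_CAST(Price AS decimal(10, 2)) AS Price,
       DateScraped
FROM OPENJSON(@Sample, '$.Products')
WITH (
    ASIN        varchar(20) '$.ASIN',
    Price       varchar(50) '$.Price',
    -- Keys containing spaces must be wrapped in double quotes in the path
    DateScraped varchar(50) '$."Date Scraped"'
) AS Dataset;
```

Reading the price as text first and converting it afterwards is a deliberate choice here: scraped prices often carry stray characters, and TRY_CAST lets those rows come through as NULL rather than aborting the whole query.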
Thank you for reading. Please keep visiting this blog and share this post with your network. I would also love to hear your opinions in the comments.