Converting XML from API calls

Uli Bethke Flexter, XML

Querying the Amazon Product Advertising API with Flexter

Working with APIs that return XML can be a painful process. First, you need to work through the documentation to figure out how the API works. Then you need to write the code that queries the API and returns the XML. Typically, you also want to transform the XML into a format that can be analysed with SQL, so the next step is to convert the data to a relational format in a database. This last step can be fully automated with Flexter, our ETL tool for XML and JSON. Let’s look at an example to see how the whole process works, using the popular Amazon Product Advertising API. Flexter works with any API that returns XML or JSON output, however complex.

An introduction to the Amazon Product Advertising API

When working with the API, the first thing you need to know is which Amazon website you wish to target. Not all Amazon websites sell the same kinds of products. If you’re a seller, you’ll need to target your own locale rather than just the US website (amazon.com). You can find the available locales on this page: AWS Locales. Once on that page, look for Product Advertising API Endpoints to see a list of locales and their corresponding endpoints. Note that you can only make requests to the endpoints of the locales where you registered. For example, if you registered as an affiliate on the UK website (amazon.co.uk), you’ll only be able to make requests to the https://webservices.amazon.co.uk/onca/xml endpoint.
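To make the locale rule concrete, here is a small lookup from a locale code to its REST endpoint. It is only a sketch: the function name endpoint_url and the short locale codes are our own choices, and the table covers just a handful of marketplaces, so check the AWS Locales page for the authoritative list.

```python
# Hypothetical table of legacy PA-API REST hosts for a few locales
# (not exhaustive; see the AWS Locales page for the full list).
ENDPOINTS = {
    "us": "webservices.amazon.com",
    "ca": "webservices.amazon.ca",
    "uk": "webservices.amazon.co.uk",
    "de": "webservices.amazon.de",
    "fr": "webservices.amazon.fr",
    "jp": "webservices.amazon.co.jp",
}


def endpoint_url(locale: str) -> str:
    """Return the REST endpoint for a locale code such as 'us' or 'uk'."""
    try:
        host = ENDPOINTS[locale.lower()]
    except KeyError:
        raise ValueError(f"unknown or unsupported locale: {locale!r}") from None
    return f"https://{host}/onca/xml"


print(endpoint_url("uk"))  # https://webservices.amazon.co.uk/onca/xml
```

Failing loudly on an unknown locale is deliberate, because a request sent to the wrong marketplace is rejected anyway.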

A few basic terms are listed below:

Operations

Operations are the things you can do with the API. Here are a few examples:

ItemSearch – search for items based on specific parameters such as the category, title, manufacturer, and minimum price. This returns one or more items.

ItemLookup – search for an item based on an identifier such as ASIN or ISBN (for books). This returns only one item.

SimilarityLookup – search for items that are similar to the item that you specified in your request.

Response Groups

Response groups allow you to specify which information about the products you want to include in the response. Note that response groups depend on the operation that you’re using. This means that not all response groups are available to all operations. Here are a few examples of response groups that you can use:

Small – returns basic information about the item, such as the ASIN, Title, and ProductGroup.

OfferSummary – returns the lowest price for each condition type (new item, used item, collectible item, refurbished item).

ItemAttributes – returns all the attributes that an item has. Item attributes depend on the type of the item. For example, a book would have a different set of attributes than computer hardware. However, attributes such as the title or the list price are common to all products.
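In a request, operations and response groups end up as plain query parameters. The sketch below assembles them. build_params and its required-parameter table are hypothetical simplifications (for instance, ItemSearch really needs a search index plus at least one search parameter). We also assume that several response groups are sent as one comma-separated ResponseGroup value.

```python
# Simplified, assumed table of the parameters each operation cannot do without.
REQUIRED = {
    "ItemLookup": {"ItemId"},
    "SimilarityLookup": {"ItemId"},
    "ItemSearch": {"SearchIndex"},
}


def build_params(operation, response_groups=("Small",), **extra):
    """Return the query parameters for one operation, checking required ones."""
    if operation not in REQUIRED:
        raise ValueError(f"unsupported operation: {operation}")
    missing = REQUIRED[operation] - set(extra)
    if missing:
        raise ValueError(f"{operation} is missing {sorted(missing)}")
    params = {
        "Service": "AWSECommerceService",
        "Operation": operation,
        # Several response groups travel as a single comma-separated value.
        "ResponseGroup": ",".join(response_groups),
    }
    params.update(extra)
    return params


print(build_params("ItemLookup", ("Small", "OfferSummary"), ItemId="B00008OE6I"))
```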

Locales

As previously discussed, there are several Amazon locales, or marketplaces, worldwide, and each API request you make needs to target a specific locale. Each locale, however, has different valid values for search indices, browse node IDs, sort values, and ItemSearch parameters. For example, the Brazil marketplace has only a few search indices available, whereas the search indices of the Canada marketplace closely match those of the US marketplace. You can check specific locale information on the locale reference page.

Getting An Access Key

In order to make requests to the Product Advertising API, you need to have an access key, secret key, and an affiliate ID. Here are the steps to get you started:

First, you’ll need API access:

Register for Amazon Associates (Amazon’s affiliate program).

Register for the Amazon Product Advertising API (PAPI), using your existing AWS account if you have one; otherwise an AWS account will be created.

The detailed process is as follows:
Start by signing up for an Amazon account.

Sign up as a Product Advertising API developer based on your locale. Go to this page and scroll down to the bottom to find the URL for your locale. This requires you to enter the following information:
[Screenshot: Product Advertising API developer sign-up form]
Once you’re done, it should show the following screen:

[Screenshot: confirmation screen after signing up]

Click on the link for managing your account and it should return the following page:
[Screenshot: account management page]

Click the link that says AWS Security Credentials Console and you’ll be redirected to the Amazon Web Services console. If this is your first time accessing it, a modal box asks you to continue to security credentials.

Click the Access Keys tab to reveal the list of access keys you already have. By default, one key is listed, but you won’t be able to use it because you can’t see its corresponding secret key. If this is a new account, you can go ahead and delete it.

[Screenshot: Access Keys tab in the AWS security credentials console]
Click the Create New Access Key button to generate a new access key and secret key pair. Once it is generated, copy the key pair or download the key file, because you won’t be able to see the secret key on the website again.

[Screenshot: newly created access key and secret key]
Sign up for an Amazon Associates account based on your locale. This is where you’ll be asked for the details of your website and the products that you’re selling, and to verify your phone number. Once you’re done with all that, you’ll be given a unique associate ID.

Now we will use Python to generate requests to the Amazon Product Advertising API. The API provides programmatic access to Amazon’s product selection and discovery functionality: it has search and lookup capabilities and provides information on products and other features such as Reviews, Similar Products, and New and Used listings. The python-amazon-product-api package offers lightweight access to the API. Let’s set up the Python environment by installing it, for example with pip install python-amazon-product-api.

Next, we create a ~/.amazon-product-api file (C:\Users\You\.amazon-product-api on Windows) and add the access key, secret key, and associate ID that we received while registering.
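Here is a minimal sketch for creating that file with Python’s configparser. It assumes the layout we understand the python-amazon-product-api documentation to describe: a [Credentials] section with access_key, secret_key, and associate_tag keys. The helper write_credentials is hypothetical, and the key values shown are placeholders.

```python
import configparser
from pathlib import Path


def write_credentials(path, access_key, secret_key, associate_tag):
    """Write a config file with a [Credentials] section (assumed layout)."""
    # interpolation=None so any '%' characters in values are written verbatim
    config = configparser.ConfigParser(interpolation=None)
    config["Credentials"] = {
        "access_key": access_key,
        "secret_key": secret_key,
        "associate_tag": associate_tag,
    }
    target = Path(path).expanduser()
    with target.open("w") as fh:
        config.write(fh)
    target.chmod(0o600)  # the file holds a secret, so restrict it to the owner (POSIX)
    return target


# Placeholder values; substitute your own keys and associate ID:
# write_credentials("~/.amazon-product-api", "YOUR_ACCESS_KEY", "YOUR_SECRET_KEY", "yourtag-20")
```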

We can then use this config file by passing its path to the API. If no path is specified, the API looks for configuration files in the following locations, in this order:

  • /etc/amazon-product-api.cfg for site-wide settings that all users on this machine will use
  • ~/.amazon-product-api for user-specific settings
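configparser can mimic this lookup order, because it skips missing files and lets later files override earlier ones. The load_config helper below is for illustration only. Whether the library merges both files or stops at the first one it finds is an assumption you should verify.

```python
import configparser
from pathlib import Path

# Lookup order from the list above: site-wide file first, then the per-user file.
DEFAULT_LOCATIONS = ("/etc/amazon-product-api.cfg", "~/.amazon-product-api")


def load_config(paths=DEFAULT_LOCATIONS):
    """Read whichever config files exist, in order; later files override earlier ones."""
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files and returns the list of files it parsed
    found = parser.read([str(Path(p).expanduser()) for p in paths])
    return parser, found


parser, found = load_config()
print("config files read:", found)
```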

We can also configure the API at runtime by passing the config values as a dict.

We can also declare the following environment variables to store the config values:

AWS_ACCESS_KEY – Your AWS access key

AWS_SECRET_ACCESS_KEY – Your AWS secret access key

AWS_ASSOCIATE_TAG – Your AWS associate ID

AWS_LOCALE – Your API locale
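These variables map naturally onto a runtime config dict. The sketch below collects whichever of them are set. config_from_env is a hypothetical helper, and the dict keys (access_key, secret_key, associate_tag, locale) are assumed to be the ones the library accepts for runtime configuration.

```python
import os

# Environment variable -> config key (key names assumed to match the library's).
ENV_TO_KEY = {
    "AWS_ACCESS_KEY": "access_key",
    "AWS_SECRET_ACCESS_KEY": "secret_key",
    "AWS_ASSOCIATE_TAG": "associate_tag",
    "AWS_LOCALE": "locale",
}


def config_from_env(environ=os.environ):
    """Build a config dict from whichever of the variables are set."""
    return {key: environ[var] for var, key in ENV_TO_KEY.items() if var in environ}


# With the library installed, such a dict could be passed at runtime,
# e.g. API(cfg=config_from_env()); check the library docs for the exact argument.
print(sorted(config_from_env()))
```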

Next, let’s look at some operations to generate requests and gather responses using Amazon’s API. All functionality of the Amazon Product Advertising API is provided by operations, each of which accepts a number of required and optional parameters. To retrieve the result of an operation as an XML document, a special signed URL has to be constructed. Building these URLs by hand is cumbersome when done repeatedly, so the library lets you call any operation with call(). As an example, let’s start with the ItemLookup operation.
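python-amazon-product-api builds and signs these URLs for you. The sketch below shows roughly what happens under the hood, assuming the HMAC-SHA256 signing scheme (signature version 2) that Amazon documented for the legacy /onca/xml REST endpoint. It sorts the parameters, percent-encodes them, signs a GET string, and appends the signature. The function sign_request is our own and is not part of the library.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote


def _encode(value) -> str:
    """RFC 3986 percent-encoding: only A-Z, a-z, 0-9, '-', '_', '.', '~' stay as-is."""
    return quote(str(value), safe="-_.~")


def sign_request(params, access_key, secret_key,
                 host="webservices.amazon.com", path="/onca/xml", timestamp=None):
    """Return a signed GET URL for the legacy PA-API REST interface (sketch)."""
    params = dict(params, AWSAccessKeyId=access_key)
    params["Timestamp"] = timestamp or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Canonical query string: parameters sorted by name, names and values encoded.
    canonical = "&".join(f"{_encode(k)}={_encode(v)}" for k, v in sorted(params.items()))
    string_to_sign = f"GET\n{host}\n{path}\n{canonical}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha256).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"https://{host}{path}?{canonical}&Signature={signature}"


url = sign_request(
    {"Service": "AWSECommerceService", "Operation": "ItemLookup",
     "ItemId": "B00008OE6I", "AssociateTag": "yourtag-20"},  # placeholder tag
    access_key="YOUR_ACCESS_KEY", secret_key="YOUR_SECRET_KEY",
)
print(url)
```

The timestamp is part of the signed string, which is why a signed URL cannot simply be reused later.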

Given an item identifier, the ItemLookup operation returns some or all of the item attributes, depending on the response group specified in the request. By default, ItemLookup returns an item’s ASIN, Manufacturer, ProductGroup, and Title. ItemLookup supports many response groups. Response groups return product information, called item attributes, which include product reviews, variations, similar products, pricing, availability, product images, accessories, and other information.

To look up information on an article, one could, for instance, call ItemLookup with the ItemId B00008OE6I through the library’s item_lookup() method.

The library turns this call into a signed ItemLookup request URL for the endpoint of the configured locale.

The response to this request returns the information associated with ItemId B00008OE6I in XML format.
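Before you hand the response to Flexter, it can help to check the default fields with the standard library. The sketch below extracts ASIN, Manufacturer, ProductGroup, and Title from every Item. default_attributes is a hypothetical helper, and the namespace URI assumes API version 2013-08-01.

```python
import xml.etree.ElementTree as ET

# Namespace of API version 2013-08-01 (assumed; it follows the Version parameter).
NS = {"a": "http://webservices.amazon.com/AWSECommerceService/2013-08-01"}


def default_attributes(xml_text):
    """Return one dict per Item with the default ItemLookup fields."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iterfind(".//a:Item", NS):
        record = {"ASIN": item.findtext("a:ASIN", default="", namespaces=NS)}
        for field in ("Manufacturer", "ProductGroup", "Title"):
            # Path relative to Item; missing elements come back as ""
            record[field] = item.findtext(f"a:ItemAttributes/a:{field}", default="", namespaces=NS)
        records.append(record)
    return records
```

When you call it on the text of a saved response, it returns a list with one dict per item.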

We can easily parse the generated XML with Flexter and use it further. Next, let’s call the ItemSearch operation and search by keyword. Here, we use the API.item_search() method to perform the operation. The ItemSearch operation searches for items on Amazon, and the Product Advertising API returns up to ten items per search results page.

An ItemSearch request requires a search index and a value for at least one parameter. For example, you might use the BrowseNode parameter for Harry Potter books and specify the Books search index. The ItemSearch operation accepts various parameters, including the following:

Availability - Specifies that the item must be available for purchase. The only valid value for the parameter is "Available".

BrowseNode - Enables you to search a specified browse node for associated items

Condition - Enables you to specify the condition of an item. Valid values are "All", "New", "Used", "Collectible", and "Refurbished". The default is "New". Condition does not restrict the total number of items returned. It does, however, restrict the offers returned to those items that are in the specified condition.

Keywords - A word or phrase (words separated by percent-encoded spaces, %20) used as search criteria. The titles and descriptions of items are searched for the keywords.

MaximumPrice - The maximum price that an item can cost.

MinimumPrice - The minimum price that an item can cost.

Title - Title associated with the item. You can enter all or part of the title. Title searches are a subset of Keyword searches. Use a Keywords search if a Title search does not return the items you want.
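You can check the parameter rules above on the client before sending a request. item_search_params is a hypothetical validator that encodes only the rules listed here (for example, it does not know which search indices a locale supports). The last line shows how a keyword phrase is percent-encoded with %20.

```python
from urllib.parse import quote

VALID_CONDITIONS = {"All", "New", "Used", "Collectible", "Refurbished"}


def item_search_params(search_index, **criteria):
    """Validate ItemSearch criteria against the rules listed above (sketch)."""
    if not criteria:
        raise ValueError("ItemSearch needs at least one parameter besides SearchIndex")
    if "Availability" in criteria and criteria["Availability"] != "Available":
        raise ValueError('Availability only accepts "Available"')
    if criteria.get("Condition", "New") not in VALID_CONDITIONS:
        raise ValueError(f"invalid Condition: {criteria['Condition']}")
    return {"Operation": "ItemSearch", "SearchIndex": search_index, **criteria}


params = item_search_params("Books", Keywords="XML Books", Condition="New")
# Keywords are sent percent-encoded, with spaces as %20:
print(quote(params["Keywords"]))  # XML%20Books
```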

We will search for the keywords “XML Books” by performing the ItemSearch operation through the API.item_search() method, save the response as Response.xml, then parse it with Flexter and store the output in a database.

Running this code sends a signed ItemSearch request URL to the API endpoint.

Next, we use Flexter to parse the output and store it as tables in a database. The Product Advertising API provides XML schemas for validating the XML in SOAP requests and for specifying item attribute types in responses. We download the latest version of this schema, AmazonSchema.xsd, and run Flexter’s xsd2er on it to analyse the schema and write the metadata.

At the end of a successful XSD analysis, Flexter prints out the ID of the logical schema. Note this ID, because it is used in all subsequent data extraction runs. Next, we use the generated ID to parse and extract the entire content of Response.xml (the response saved by the Python query) and store the data in a database.

Flexter’s output shows it writing the tables to the target database. The parsed XML content is stored as tables at the given JDBC path, and we can then query these tables with SQL to check the attributes.

The ItemSearch response contains many fields, such as the number of item results and pages, the ASINs, URLs, and item attributes. Next, let’s take a more detailed look at the ISBNs we have parsed and retrieve the individual details for each one using the ItemLookup operation. For books, the ASIN is typically the ISBN-10, so we iterate through each ASIN in Python, generate an individual response for it, save it as an XML file, and parse the files with Flexter.
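The ASINs to iterate over can be read directly from the saved ItemSearch response. asins_from_search is a hypothetical helper that uses the {*} namespace wildcard from ElementTree (Python 3.8+), so it does not depend on the exact API-version namespace.

```python
import xml.etree.ElementTree as ET


def asins_from_search(xml_text):
    """Return the ASIN of every Item in an ItemSearch response, in document order."""
    root = ET.fromstring(xml_text)
    # {*} matches the tag in any namespace (or none), so the API version doesn't matter
    return [el.text.strip() for el in root.iterfind(".//{*}Item/{*}ASIN") if el.text]
```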

In this way, we generate a separate ItemLookup request URL, and with it an XML response, for each ISBN from our first step.

Once all the XML responses have been generated, we store them in a single folder for batch data extraction. Let’s say our XML files reside in the /home/XMLresponses/ folder.
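The sketch below writes one file per ASIN into that folder. save_responses is hypothetical. Its fetch argument stands in for whatever performs the signed ItemLookup request (for example, a wrapper around the library) and returns raw XML bytes. The one-second pause between requests is an assumption based on the per-second request throttling of the API.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List


def save_responses(asins: Iterable[str], fetch: Callable[[str], bytes],
                   out_dir: str = "/home/XMLresponses", delay: float = 1.0) -> List[Path]:
    """Fetch one response per ASIN and write it to <out_dir>/<ASIN>.xml."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for i, asin in enumerate(dict.fromkeys(asins)):  # de-duplicate, keep first-seen order
        if i and delay:
            time.sleep(delay)  # pause between requests to stay under the rate limit
        target = folder / f"{asin}.xml"
        target.write_bytes(fetch(asin))
        written.append(target)
    return written
```

Because fetch is passed in, you can test the loop offline with a stub before pointing it at the live API.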

We parse all the XML files with Flexter, just as we did above for the ItemSearch response.

Once parsed, the response XML contains the following elements:

ASIN - Amazon Standard Identification Number, which is an alphanumeric token assigned by Amazon to an item that uniquely identifies it
Item - Container for information about the item, including ASIN, Title, ProductGroup, and Manufacturer.
ItemAttributes - Container for information about an item, including Title, ProductGroup, and Manufacturer.
Items - Container for one or more Item(s).
Manufacturer - Name of the company that manufactured the item.
ProductGroup - Category of the item, for example, "Book" and "DVD".
Title - Title of the item

About the author

Uli Bethke

Uli has 18 years’ hands-on experience as a consultant, architect, and manager in the data industry. He frequently speaks at conferences. Uli has architected and delivered data warehouses in Europe, North America, and South East Asia. He is a traveler between the worlds of traditional data warehousing and big data technologies.

Uli is a regular contributor to blogs and books, holds an Oracle ACE award, and chairs the Hadoop User Group Ireland. He is also a co-founder and VP of the Irish chapter of DAMA, a not-for-profit global data management organization. He has co-founded the Irish Oracle Big Data User Group.