Querying the Amazon Product Advertising API with Flexter
Working with APIs that return XML can be a painful process. First you need to go through the documentation to figure out how the API works. Then you need to write the code that queries the API and returns the XML. Typically you also want to transform the XML into a format that can be analysed with SQL, so the next step is to convert the data to a relational format in a database. This last step can be fully automated with Flexter, our ETL tool for XML and JSON. Let’s look at an example to see how the whole process works. We will use the popular Amazon Product Advertising API, but Flexter works with any API that returns arbitrarily complex XML or JSON as output.
An introduction to the Amazon Product Advertising API
When working with the API, the first thing that you need to know is which Amazon website you wish to target. Not all Amazon websites sell the same kinds of products. If you’re a seller, you’ll have to target your own locale rather than just the US website (amazon.com). You can find information on what locales are available on this page: AWS Locales. Once on that page, look for Product Advertising API Endpoints and you’ll see a list of locales and their corresponding endpoints. Note that you can only make requests to the endpoints of the locales where you registered. For example, if you registered as an affiliate on the US website, you’ll only be able to make a request to the https://webservices.amazon.com/onca/xml endpoint.
A few basic terms are explained below:
Operations
Operations are the things you can do with the API. Here are a few examples:
ItemSearch – search for items based on specific parameters such as the category, title, manufacturer, and minimum price. This returns one or more items.
ItemLookup – search for an item based on an identifier such as ASIN or ISBN (for books). This returns only one item.
SimilarityLookup – search for items that are similar to the item that you specified in your request.
Response Groups
Response groups allow you to specify which information about the products you want to include in the response. Note that response groups depend on the operation that you’re using. This means that not all response groups are available to all operations. Here are a few examples of response groups that you can use:
Small – returns basic information about the item. Examples of the data returned include the ASIN, Title, and ProductGroup.
OfferSummary – returns the lowest price for each condition type (new item, used item, collectible item, refurbished item).
ItemAttributes – returns all the attributes that an item has. Item attributes depend on the type of the item. For example, a book would have a different set of attributes than computer hardware. However, attributes such as the title or the list price are common to all products.
Locales
As previously discussed, Amazon has several locales or marketplaces worldwide, and each API request that you make needs to target a specific locale. Each locale has different valid values for search indices, browse node IDs, sort values and ItemSearch parameters. For example, the Brazil marketplace only has a few search indices available, whereas the search indices available in the Canada marketplace closely match those of the US marketplace. You can check out specific locale information on the locale reference page.
Getting An Access Key
In order to make requests to the Product Advertising API, you need to have an access key, secret key, and an affiliate ID. Here are the steps to get you started:
First you’ll need API access:
Register for Amazon Associates (Amazon’s affiliate program)
Register for Amazon API / PAPI (use your existing AWS account if you have one; otherwise an AWS account will be created)
The detailed process is given below:
Start with signing up for an Amazon Account
Sign up as a Product Advertising API developer based on your locale. Go to this page and scroll down to the bottom to find the URL for your locale. This requires you to enter the following information:
Once you’re done, it should show the following screen:
Click on the link for managing your account and it should return the following page:
Click the link that says AWS Security Credentials Console and you’ll be redirected to the Amazon Web Service console. If this is your first time accessing it, it should show a modal box asking you to continue to security credentials.
Click the Access Keys tab to reveal a list of access keys that you already have. By default, there should be one listed but you won’t really be able to use that since you can’t see the corresponding secret key. If this is a new account you can go ahead and delete that.
Click the Create New Access key button to generate a new access key and secret key combination. Once generated, copy the key combination or download the key file, because you won’t be able to see the secret key again on the website.
Sign up for an Amazon Associates Account based on your locale. This is where they’ll ask for the details of your website and the products that you’re selling. It will also ask you to verify your phone number. Once you’re done with all that, you’ll be given a unique associate ID.
Now we will use Python to generate requests against the Amazon Product Advertising API. The Amazon Product Advertising API provides programmatic access to Amazon’s product selection and discovery functionality. It has search and lookup capabilities, and provides information on products and other features such as Reviews, Similar Products, and New and Used listings. python-amazon-product-api offers lightweight access to the latest version of the API. Let’s set up the Python environment as suggested below:
# Get the Python Amazon API bindings using pip
pip install python-amazon-product-api
Next, we create an ~/.amazon-product-api (C:\Users\You\.amazon-product-api on Windows) file and add the keys which we received while registering for the AWS account as below
[Credentials]
access_key = <AWS access key>
secret_key = <AWS secret access key>
associate_tag = <AWS associate user id>

# Next, we can pass the path to the API
import amazonproduct
api = amazonproduct.API(cfg='~/.amazon-product-api')
Next, we can use the config file created above by passing its path to the API. If no path is specified, the API looks for configuration files in the following locations and in the following order:
- /etc/amazon-product-api.cfg for site-wide settings that all users on this machine will use
- ~/.amazon-product-api for user-specific settings
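The lookup order above can be illustrated with the standard library alone. This is only a sketch of the idea, not the library's actual implementation: the helper name `load_credentials` is hypothetical, and it assumes that the files use the `[Credentials]` section shown earlier and that user-specific values override site-wide ones.

```python
import configparser
import os

# Search order taken from the text above: the site-wide file first,
# then the per-user file.
DEFAULT_PATHS = [
    "/etc/amazon-product-api.cfg",
    os.path.expanduser("~/.amazon-product-api"),
]


def load_credentials(paths=None):
    """Return the merged [Credentials] section as a plain dict.

    Hypothetical helper for illustration only.
    """
    # Disable interpolation so that a '%' inside a secret key is kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    # read() skips files that do not exist; values from files read later
    # override values from files read earlier (assumed precedence).
    parser.read(DEFAULT_PATHS if paths is None else paths)
    if not parser.has_section("Credentials"):
        return {}
    return dict(parser.items("Credentials"))
```

Calling `load_credentials()` without arguments checks the two default locations; passing a list of paths makes the function easy to try out against temporary files.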
We can also configure the API at runtime and pass the config values as a dict:
# Configuration values as a Python dictionary
import amazonproduct
config = {
    'access_key': 'ABCDXXXX4X',
    'secret_key': 'Ydjkei78XXXXXieAHDJWE3134',
    'associate_tag': 'flexter-01',
    'locale': 'uk'
}
api = amazonproduct.API(cfg=config)
We can also declare the below environment variables to permanently store the config values:
AWS_ACCESS_KEY Your AWS access key
AWS_SECRET_ACCESS_KEY Your AWS secret access key
AWS_ASSOCIATE_TAG Your AWS associate ID
AWS_LOCALE Your API locale
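As a rough illustration of how these variables relate to the dictionary form of the configuration, the sketch below collects whichever of them are set into a config dict. The helper name `config_from_env` and the mapping to the dict keys are assumptions made for this example; the library reads the variables itself, so this is not needed in practice.

```python
import os

# Mapping from the environment variable names listed above to the keys
# used in the dictionary-style configuration (assumed correspondence).
ENV_TO_CONFIG = {
    "AWS_ACCESS_KEY": "access_key",
    "AWS_SECRET_ACCESS_KEY": "secret_key",
    "AWS_ASSOCIATE_TAG": "associate_tag",
    "AWS_LOCALE": "locale",
}


def config_from_env(environ=None):
    """Build a config dict from the variables that are set and non-empty."""
    environ = os.environ if environ is None else environ
    return {
        key: environ[name]
        for name, key in ENV_TO_CONFIG.items()
        if environ.get(name)
    }
```

A dict produced this way has the same shape as the one passed to `amazonproduct.API(cfg=config)` above.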
Next, let’s take a look at some operations to generate requests and gather responses using Amazon’s API. All functionality of the Amazon Product Advertising API is provided by operations, each of which accepts a number of required and optional parameters. For each operation, a special signed URL has to be constructed, from which the result can be retrieved as an XML document. Building these URLs by hand quickly becomes cumbersome, so the library lets you call any operation with call(). As an example, let’s start with the ItemLookup operation of the API.
Given an item identifier, the ItemLookup operation returns some or all of the item attributes, depending on the response group specified in the request. By default, ItemLookup returns an item’s ASIN, Manufacturer, ProductGroup, and Title. ItemLookup supports many response groups. Response groups return product information, called item attributes, which include product reviews, variations, similar products, pricing, availability, images of products, accessories, and other information.
To look up information on an article, one could for instance call ItemLookup in the following way:
# Calling the ItemLookup operation of the API
from amazonproduct.api import API

# Credentials are read from the config file created above
api = API(locale='uk')
api.call(Operation='ItemLookup', ItemId='B00008OE6I')
The request generated for the above call is as follows:
http://webservices.amazon.co.uk/onca/xml?
Service=AWSECommerceService
&AWSAccessKeyId=[AWS Access Key ID]
&AssociateTag=[Associate ID]
&Operation=ItemLookup
&ItemId=B00008OE6I
&Timestamp=[YYYY-MM-DDThh:mm:ssZ]
&Signature=[Request Signature]
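To make the Timestamp and Signature parameters less mysterious, here is a minimal sketch of how a signed request URL of this shape could be assembled by hand. It assumes the HMAC-SHA256 query signing scheme (sorted, percent-encoded parameters signed together with the verb, host and path); the function name `sign_request`, the default host and the placeholder keys are illustrative only. python-amazon-product-api does this for you, so the sketch is for understanding rather than for production use.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_request(params, secret_key, host="webservices.amazon.co.uk",
                 path="/onca/xml"):
    """Return a signed GET URL for the given query parameters.

    `params` must already contain Service, AWSAccessKeyId, AssociateTag,
    Operation and Timestamp. Illustrative sketch, not the library's code.
    """
    # 1. Sort the parameters by name and percent-encode names and values;
    #    only unreserved characters (RFC 3986) stay unescaped.
    canonical_query = "&".join(
        "{}={}".format(quote(str(k), safe="-_.~"), quote(str(v), safe="-_.~"))
        for k, v in sorted(params.items())
    )
    # 2. The string to sign: HTTP verb, host, path, canonical query string.
    string_to_sign = "\n".join(["GET", host, path, canonical_query])
    # 3. HMAC-SHA256 with the secret key, then Base64-encode the digest.
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # 4. Append the percent-encoded signature as the final parameter.
    return "https://{}{}?{}&Signature={}".format(
        host, path, canonical_query, quote(signature, safe="-_.~"))
```

The same parameters and secret key always produce the same URL, which is why the timestamp has to be part of the signed parameters: it is what makes each request unique.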
The response to this request returns the information associated with ItemId B00008OE6I in an XML format as below
<Items>
  <Request>
    <IsValid>True</IsValid>
    <ItemLookupRequest>
      <ItemId>B00008OE6I</ItemId>
    </ItemLookupRequest>
  </Request>
  <Item>
    <ASIN>B00008OE6I</ASIN>
    <ItemAttributes>
      <Manufacturer>Canon</Manufacturer>
      <ProductGroup>Photography</ProductGroup>
      <Title>Canon PowerShot S400 4MP Digital Camera w/ 3x Optical Zoom</Title>
    </ItemAttributes>
  </Item>
</Items>
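Before handing responses to an ETL tool, it can be useful to sanity-check a single response by hand. The sketch below uses Python's built-in ElementTree to pull the ASIN and item attributes out of the sample response above. The function names are hypothetical, and the namespace stripping is there because real responses may carry an XML namespace that this trimmed sample lacks.

```python
import xml.etree.ElementTree as ET

# The ItemLookup sample response shown above.
SAMPLE_RESPONSE = """\
<Items>
  <Request>
    <IsValid>True</IsValid>
    <ItemLookupRequest><ItemId>B00008OE6I</ItemId></ItemLookupRequest>
  </Request>
  <Item>
    <ASIN>B00008OE6I</ASIN>
    <ItemAttributes>
      <Manufacturer>Canon</Manufacturer>
      <ProductGroup>Photography</ProductGroup>
      <Title>Canon PowerShot S400 4MP Digital Camera w/ 3x Optical Zoom</Title>
    </ItemAttributes>
  </Item>
</Items>
"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarise_items(xml_text):
    """Return one dict per <Item>, keyed by the names of its leaf elements."""
    root = ET.fromstring(xml_text)
    items = []
    for element in root.iter():
        if local_name(element.tag) != "Item":
            continue
        record = {}
        for child in element.iter():
            # Keep only leaf elements that carry text, e.g. ASIN or Title.
            if len(child) == 0 and child.text and child.text.strip():
                record[local_name(child.tag)] = child.text.strip()
        items.append(record)
    return items
```

This flattening works for a quick look at one file, but it discards the nesting and repeats, which is exactly the part that becomes tedious to maintain by hand for larger responses.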
We can easily parse the generated XML using Flexter and use it further. Next, let’s call the ItemSearch operation and search using a keyword. Here, we will use the API.item_search method to perform the operation. The ItemSearch operation searches for items on Amazon. The Product Advertising API returns up to ten items per search results page.
An ItemSearch request requires a search index and the value for at least one parameter. For example, you might use the BrowseNode parameter for Harry Potter books and specify the Books search index. There are various parameters associated with the ItemSearch operation as listed below:
Availability – Specifies that the item must be available for purchase. The only valid value for the parameter is “Available”.
BrowseNode – Enables you to search a specified browse node for associated items
Condition – Enables you to specify the condition of an item. Valid values are “All”, “New”, “Used”, “Collectible”, and “Refurbished”. The default is “New”. Condition does not restrict the total number of items returned. It does, however, restrict the offers returned to those items that are in the specified condition.
Keywords – A word or phrase (words separated by percent-encoded spaces, %20) used as a search criterion. The titles and descriptions of items are searched for the keywords.
MaximumPrice – The maximum price that an item can cost.
MinimumPrice – The minimum price that an item can cost.
Title – Title associated with the item. You can enter all or part of the title. Title searches are a subset of Keyword searches. Use a Keywords search if a Title search does not return the items you want.
# Calling the ItemSearch operation of the API
from amazonproduct.api import API

# Setting up locale
api = API(locale='uk')

# Get all books published by "Galileo Press".
api.item_search('Books', Publisher='Galileo Press', ResponseGroup='Large')

# Use the search index Toys and the parameter Keywords to return
# information about all toy rockets sold by Amazon
api.item_search('Toys', Keywords='Rocket')

# Use the Availability parameter to only return shirts that are available
api.item_search('Apparel', Condition='All', Availability='Available', Keywords='Shirt')
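The constraints in the parameter list above can be checked locally before a request is sent. The following sketch encodes only the rules stated in this article (the allowed values for Condition and Availability, and the need for at least one parameter besides the search index); `build_item_search_params` is a hypothetical helper, not part of python-amazon-product-api, and the price comparison is an extra assumption.

```python
# Allowed values copied from the parameter descriptions above.
VALID_CONDITIONS = {"All", "New", "Used", "Collectible", "Refurbished"}
VALID_AVAILABILITY = {"Available"}


def build_item_search_params(search_index, **params):
    """Validate ItemSearch parameters and return them as one dict."""
    if not params:
        raise ValueError(
            "ItemSearch needs a search index and at least one parameter")
    # "New" is the default Condition according to the list above.
    condition = params.get("Condition", "New")
    if condition not in VALID_CONDITIONS:
        raise ValueError("invalid Condition: %r" % condition)
    availability = params.get("Availability")
    if availability is not None and availability not in VALID_AVAILABILITY:
        raise ValueError("invalid Availability: %r" % availability)
    # Assumption: a minimum price above the maximum price is a mistake.
    low, high = params.get("MinimumPrice"), params.get("MaximumPrice")
    if low is not None and high is not None and low > high:
        raise ValueError("MinimumPrice must not exceed MaximumPrice")
    return dict(params, Operation="ItemSearch", SearchIndex=search_index)
```

The returned dict has the keyword names used by `api.call(...)`, so it could be unpacked into such a call with `**`.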
We will search for the keyword “XMLBooks” with the ItemSearch operation, this time through api.call as given below, then parse the response using Flexter and store the output in a database.
# Getting books using the keyword "XMLBooks"
from amazonproduct.api import API

# Better to create an ~/.amazon-product-api file with AWS config details
# Setting up the locale to fetch the response
api = API(locale='uk')

# Calling the operation with the desired keyword
# SearchIndex can take values such as Apparel, Beauty, Blended, Books, and so on.
api.call(Operation='ItemSearch', Keywords='XMLBooks', SearchIndex='Books')
Once we run the above code, we send a request as per the below URL.
http://webservices.amazon.co.uk/onca/xml?
Service=AWSECommerceService
&AWSAccessKeyId=[AWS Access Key ID]
&AssociateTag=[Associate ID]
&Operation=ItemSearch
&Keywords=XMLBooks
&SearchIndex=Books
&Timestamp=[YYYY-MM-DDThh:mm:ssZ]
&Signature=[Request Signature]
Next, we will use Flexter to parse the output and store it as tables in a database. The Product Advertising API provides schemas for validating the XML in SOAP requests and for specifying item attribute types in responses, and Amazon publishes the latest Product Advertising API XML schema as an XSD file. We prepare the schema by running xsd2er on the downloaded XSD file (named AWSECommerceService.xsd in the command below), which writes the metadata:
$ xsd2er -R -g3 AWSECommerceService.xsd
…
» SCHEMA {NAMESPACE}
· ELEMENT
------------------------------
{http://webservices.amazon.com/AWSECommerceService/2013-08-01}
· BrowseNodeLookup
· BrowseNodeLookupResponse
· CartAdd
· CartAddResponse
· CartClear
· CartClearResponse
· CartCreate
· CartCreateResponse
· CartGet
· CartGetResponse
· CartModify
· CartModifyResponse
· ItemLookup
· ItemLookupResponse
· ItemSearch
· ItemSearchResponse
· SimilarityLookup
· SimilarityLookupResponse
15:48:12.055 INFO Building metadata
15:48:12.419 INFO Writing metadata
15:48:12.929 INFO updating schema origin 39
15:48:12.931 INFO Generating the mapping: elevate,reuse
15:48:14.349 INFO updating logical schema 16
15:48:14.349 INFO Registering success of job 99
15:48:14.371 INFO Finished successfully in 3381 milliseconds
# schema
origin: 39
logical: 16
job: 99
# statistics
startup: 391 ms
parse: 668 ms
build: 370 ms
write: 512 ms
map: 1420 ms
xpaths: 3374
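The summary near the end of this output carries the logical schema ID (16 here) that the later extraction step needs. If you script the two steps, a small parser such as the sketch below could pick the IDs out of captured output. The regular expression is written against the sample output in this article only; the exact layout may differ between Flexter versions, and `parse_schema_ids` is a hypothetical helper.

```python
import re

# Matches the "# schema" summary from the sample output above, whether the
# values are printed on one line or on consecutive lines.
SUMMARY_PATTERN = re.compile(
    r"origin:\s*(\d+)\s+logical:\s*(\d+)(?:\s+job:\s*(\d+))?")


def parse_schema_ids(output):
    """Extract the origin, logical and (optional) job IDs as integers."""
    match = SUMMARY_PATTERN.search(output)
    if match is None:
        raise ValueError("no schema summary found in output")
    origin, logical, job = match.groups()
    return {
        "origin": int(origin),
        "logical": int(logical),
        "job": int(job) if job is not None else None,
    }
```

For the output above this yields the logical ID 16, which is the value passed to xml2er with the -l switch.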
At the end of a successful XSD analysis, Flexter prints the ID of the logical schema (16 in the output above). Note this ID, because all subsequent data extraction runs use it. Next, we pass the generated ID to xml2er and store the data (Response.xml from the Python query) in a database. Let’s parse and extract the entire XML content as below:
$ xml2er -l16 \
Response.xml -o jdbc:oracle:thin:@//d.org.io:<port-no> \
-u <target username> \
-p <target password>
# output
path: jdbc:oracle:thin:@//d.org.io:<port-no>
user: <target username>
password: ***
format: jdbc
# schema
origin: 38
logical: 15
# action
skip: true
15:58:04.609 INFO Initialized in 3082 milliseconds
15:58:04.619 INFO Loading metadata
15:58:12.731 INFO Parsing data
15:58:13.051 INFO calculating data
15:58:14.335 INFO skipping tables
15:58:14.341 INFO skipping table string
15:58:14.401 INFO skipping table Header
15:58:14.420 INFO skipping table Argument
15:58:14.440 INFO skipping table Items
15:58:14.490 INFO skipping table Item4
15:58:14.597 INFO skipping table TotalPages
15:58:14.606 INFO skipping table TotalResults
15:58:14.618 INFO skipping table ItemSearchResponse
15:58:14.627 INFO calculating statistics
15:58:15.483 INFO skipping statistics writing
15:58:15.486 INFO Skipping the mapping
15:58:15.487 INFO Skipping job's success registering
15:58:15.493 INFO Finished successfully in 13962 milliseconds
# schema
origin: 38
logical: 15
# statistics
startup: 3082 ms
load: 8122 ms
parse: 320 ms
write: 1576 ms
stats: 856 ms
map: 3 ms
unique xpaths: 36
In the output above we can see the tables that Flexter writes to the target database. The parsed XML content is stored as tables at the given JDBC path, and we can query them to check individual attributes:
sql> select * from TotalResults;
TotalResults
1965
...
sql> select * from TotalPages;
TotalPages
161
...
-- We have a table Item which contains a lot of Item attributes
sql> describe Item;
PK_Item: decimal(38,0) (nullable = true)
FK_Item4: decimal(38,0) (nullable = true)
FK_Items: decimal(38,0) (nullable = true)
ASIN: string (nullable = true)
…
ItemAttributes_ProductGroup: string (nullable = true)
ItemAttributes_ProductTypeName: string (nullable = true)
ItemAttributes_Title: string (nullable = true)
ItemAttributes_Author: string (nullable = true)
ItemAttributes_Publication: string (nullable = true)
…
-- Querying the unique ASIN number from the Items table
sql> select distinct ASIN from Item limit 5;
1544801637
1563390591
1633598431
1520452881
1766906334
...
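To experiment with queries of this kind without access to the target database, a throwaway table with a few of the same column names can be built in memory. The sketch below uses SQLite as a stand-in for the Oracle target; the two ASINs are taken from the output above, while the titles and authors are made-up placeholders, and the table is a heavily simplified version of what Flexter generates.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle target used above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Item (
           PK_Item INTEGER PRIMARY KEY,
           ASIN TEXT,
           ItemAttributes_Title TEXT,
           ItemAttributes_Author TEXT
       )"""
)
# Placeholder rows: real ASINs from the query output, invented attributes.
rows = [
    (1, "1544801637", "Placeholder title A", "Placeholder author A"),
    (2, "1563390591", "Placeholder title B", "Placeholder author B"),
    (3, "1563390591", "Placeholder title B", "Placeholder author B"),
]
conn.executemany("INSERT INTO Item VALUES (?, ?, ?, ?)", rows)


def titles_by_asin(connection):
    """Return each distinct ASIN with its title, ordered by ASIN."""
    cursor = connection.execute(
        "SELECT DISTINCT ASIN, ItemAttributes_Title FROM Item ORDER BY ASIN")
    return cursor.fetchall()
```

Because the attribute columns sit next to the ASIN in the same table, simple lookups like this need no joins; the FK_ columns only come into play when the parent tables are needed.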
The above response returns many values, such as the number of results, the number of pages, ASINs, URLs, and item attributes. Next, let’s take a more detailed look at the books we have found and get the individual information for each of them using the ItemLookup operation. For most printed books the ASIN is the same as the ISBN, so we iterate over each ASIN, generate an individual XML response for it, and parse these responses using Flexter. Let’s iterate through each ASIN using Python as given below:
# Iterating over all the ISBNs returned in the step above
from amazonproduct.api import API

# Better to create an ~/.amazon-product-api file with AWS config details
# Setting up the locale to fetch the response
api = API(locale='uk')

# Calling the ItemSearch operation as above with the desired keyword
result = api.item_search('Books', Keywords='XMLBooks')
for book in result:
    # book.ASIN is the ASIN (the ISBN, for books) of the current item
    print('Generating XML for the ISBN number %s' % book.ASIN)
    # Call the ItemLookup operation to generate an XML response,
    # with IdType set to ISBN and ItemId set to the current book.ASIN.
    # SearchIndex is included because lookups by ISBN require it.
    api.call(Operation='ItemLookup', IdType='ISBN', SearchIndex='All',
             ItemId=book.ASIN)
We have now generated an XML response for each ISBN from our first step. The request URL generated for each ISBN looks as follows:
http://webservices.amazon.co.uk/onca/xml?
Service=AWSECommerceService
&Operation=ItemLookup
&ResponseGroup=Small
&SearchIndex=All
&IdType=ISBN
&ItemId=1544801637
&AWSAccessKeyId=[Your_AWSAccessKeyID]
&AssociateTag=[Your_AssociateTag]
&Timestamp=[YYYY-MM-DDThh:mm:ssZ]
&Signature=[Request_Signature]
Once all the XML responses generated above are stored inside one folder, we can use that folder for batch data extraction. Let’s say our XML files reside in the /home/XMLresponses/ folder.
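One simple way to get the responses into such a folder is to write each one to its own file, named after the ASIN. The helper below is a sketch under that assumption; `save_response` is a hypothetical name, and it expects the response as a string that has already been serialised.

```python
from pathlib import Path


def save_response(xml_text, asin, out_dir="/home/XMLresponses"):
    """Write one ItemLookup response to <out_dir>/<asin>.xml.

    Returns the path of the file that was written.
    """
    folder = Path(out_dir)
    # Create the folder on first use; do nothing if it already exists.
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "{}.xml".format(asin)
    target.write_text(xml_text, encoding="utf-8")
    return target
```

With python-amazon-product-api the parsed response object would first have to be turned back into text, for example with `lxml.etree.tostring`; whether that applies depends on the response processor in use, so treat it as an assumption to verify.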
We will parse all the XML files using Flexter, just as we did above for the ItemSearch response.
$ xml2er -l16 \
/home/XMLresponses/ -o jdbc:oracle:thin:@//d.org.io:<port-no> \
-u <target username> \
-p <target password>
# output
path: jdbc:oracle:thin:@//d.org.io:<port-no>
user: <target username>
password: ***
format: jdbc
# schema
origin: 23
logical: 1
job: 36
...
16:37:10.426 INFO table EMD_Info: creating
16:37:11.337 INFO table EMD_Info: writing
16:37:14.849 INFO table Ticketing: creating
16:37:15.144 INFO table Ticketing: writing
...
# statistics
startup: 3199 ms
load: 5511 ms
parse: 1453 ms
write: 64330 ms
stats: 1545 ms
map: 45 ms
unique xpaths: 313
Once parsed, the response XML contains the following elements:
ASIN – Amazon Standard Identification Number, which is an alphanumeric token assigned by Amazon to an item that uniquely identifies it
Item – Container for information about the item, including ASIN, Title, ProductGroup, and Manufacturer.
ItemAttributes – Container for information about an item, including Title, ProductGroup, and Manufacturer.
Items – Container for one or more Item(s).
Manufacturer – Name of the company that manufactured the item.
ProductGroup – Category of the item, for example, “Book” and “DVD”.
Title – Title of the item