Scrape US Patents Zip Files

Posted By admin

Project Title: Scrape US Patents Zip Files

Project Description:
I need someone who can scrape and download many gigabytes of information about US patent applications.

It is a pretty straightforward job as the data are extremely well structured. The main challenge will probably be that the resulting amount of data is quite large and might be hard to transmit.

Here are the steps that need to be completed:

1. Navigate to this website: http://patents.reedtech.com/Public-PAIR.php
2. Download the file entitled “PAIRIndex.zip”, which is linked in the middle of the webpage.
3. Open PAIRIndex.zip and observe that the file has three columns: Application Serial Number (APP_NUM), Date Scraped from Public PAIR, and File size.
4. Write a scraper loop that takes each APP_NUM value from the PAIRIndex.zip file and substitutes it for APP_NUM in the following URL: https://patents.reedtech.com/downloads/pairdownload/APP_NUM.zip
5. Download and save all of the resulting zip files that come from the scraper loop. This will be many gigabytes.
6. Transmit those saved zip files to me. FTP might be the easiest transfer mode, though Dropbox or others work equally well for me.
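
Steps 3 to 5 could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not a tested solution for the real site: it assumes PAIRIndex.zip contains a single comma-delimited text file whose first column is APP_NUM, and that application numbers are purely numeric. The function names read_app_nums, fetch and download_all are made up for this example.

```python
import csv
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen

# URL pattern taken from step 4; {app_num} replaces the APP_NUM placeholder.
URL_TEMPLATE = "https://patents.reedtech.com/downloads/pairdownload/{app_num}.zip"


def read_app_nums(index_zip_path):
    """Yield APP_NUM values from the first column of the index archive."""
    with zipfile.ZipFile(index_zip_path) as archive:
        # Assumption: the archive holds one comma-delimited text file.
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                # Assumption: application numbers are all digits, so blank
                # lines and any header row are skipped by this check.
                if row and row[0].strip().isdigit():
                    yield row[0].strip()


def fetch(url):
    """Return the body of one URL as bytes (whole file held in memory)."""
    with urlopen(url, timeout=60) as response:
        return response.read()


def download_all(app_nums, out_dir, fetch=fetch):
    """Save APP_NUM.zip for every application number; return the new paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for app_num in app_nums:
        target = out_dir / f"{app_num}.zip"
        if target.exists():
            continue  # already downloaded, so an interrupted run can resume
        try:
            data = fetch(URL_TEMPLATE.format(app_num=app_num))
        except OSError as exc:  # urllib's URLError and HTTPError derive from OSError
            print(f"skipped {app_num}: {exc}")
            continue
        target.write_bytes(data)
        saved.append(target)
    return saved
```

With these definitions, calling download_all(read_app_nums("PAIRIndex.zip"), "pair_zips") would run the loop and write the files into a folder named pair_zips (a hypothetical name). Skipping files that already exist is a deliberate choice here: with many gigabytes to fetch, the job will likely be interrupted at some point and should be able to pick up where it stopped.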

I would like to have this job completed in about two weeks, if possible. Thank you.

For similar work requirements, feel free to email us at info@webscrapingexpert.com.

