To gain insight into technological trends, I started investigating patents. It seemed like "smart" people understood the trends in patents being filed, but I couldn't figure out how they knew what they knew. To find out, I built a scraper that pulls data from the USPTO and wrote several notebooks analyzing different aspects of the patents.
The initial search page can be found here. The program queries the patent search page and reads the number of patents returned for the search query passed in through the function's parameters.
The URL of the page returned from a search will look like this:
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=0&f=S&l=50&TERM1=probiotic&FIELD1=&co1=AND&TERM2=&FIELD2=&d=PTXT
Using that result count, the program loops through the links to individual patents. One of those links looks like this:
http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=probiotic&OS=probiotic&RS=probiotic
The URL parameter &r=1 is modified to return different patents for the search query in the &s1=probiotic parameter.
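The URL pattern above can be sketched as follows. This is a minimal illustration, not the scraper's actual code: the function names `patent_url` and `patent_urls` are hypothetical, and the parameter values are copied from the example link. It only builds URLs and makes no network request.

```python
from urllib.parse import urlencode

# Endpoint taken from the example links above.
BASE_URL = "http://patft.uspto.gov/netacgi/nph-Parser"


def patent_url(term, result_index):
    """Build the link to one patent in the result list (hypothetical helper)."""
    params = {
        "Sect1": "PTO2",
        "Sect2": "HITOFF",
        "p": 1,
        "u": "/netahtml/PTO/search-bool.html",  # urlencode escapes "/" as %2F
        "r": result_index,  # position of the patent within the results
        "f": "G",
        "l": 50,
        "co1": "AND",
        "d": "PTXT",
        "s1": term,  # the search query
        "OS": term,
        "RS": term,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def patent_urls(term, total):
    """Return one link per result, with r running from 1 to total."""
    return [patent_url(term, r) for r in range(1, total + 1)]
```

Calling `patent_url("probiotic", 1)` reproduces the example link shown above, and `patent_urls("probiotic", total)` gives the list of links to loop over once the result count is known.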
In this notebook, the pipeline analyzes features such as inventors, publication dates, and primary examiners.
In this notebook, I created a pipeline to analyze the content of the abstracts of the patents in the input data.
I specifically dove into uBiome because I knew a bit about their business from before they filed for bankruptcy and was interested to see what their patents looked like.
- Standardize plots
  - Everything from high-level style choices (grids, tick marks, font sizes) to axis labels, titles, and legends.
- Do more in-depth text analysis
  - Got into claims and was able to apply N-Gram analysis and LDA Topic Modeling to both claims and abstracts.
  - Next step is to find a more in-depth method of analyzing text. LDA seems sufficient for unsupervised clustering, but there may be more complex methods out there.
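
For reference, the N-Gram step mentioned above can be sketched with the standard library alone. This is a simplified illustration rather than the notebook's code: `ngram_counts` is a hypothetical helper, the tokenizer is a bare regex with no stop-word removal, and the two sample abstracts are made up.

```python
import re
from collections import Counter


def ngram_counts(texts, n=2):
    """Count word n-grams across a collection of texts (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphanumeric runs as tokens.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        # Zip n shifted copies of the token list to form the n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Made-up abstracts, for illustration only.
sample_abstracts = [
    "A probiotic composition comprising a bacterial strain.",
    "The probiotic composition is administered orally.",
]

bigrams = ngram_counts(sample_abstracts, n=2)
print(bigrams.most_common(3))
```

The same function works for claims text, and changing `n` gives trigrams or longer phrases.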
https://drive.google.com/file/d/1FtqAcsA-xKhNqVqFMK0rzjQTmsxWaIz3/view?usp=sharing