Patent Data #9
For the record, from https://www.dol.gov/general/aboutdol/copyright
Would this be relevant in this context? https://www.uspto.gov/learning-and-resources/bulk-data-products
https://bulkdata.uspto.gov/
If this line of inquiry is fruitful, the following might be useful as well. It ostensibly combines datasets from multiple countries along with several other patent datasets.
@sunnydigital @chris-ha458 Any updates on this?
Hi Stella, I'm no longer working on this project. Let me unassign myself.
I had a look, and they only have full text for US publications. From what I could tell, other countries just have (very short) abstracts. I can look at the datasets available from the USPTO if no one else is working on this.
@baberabb, can you share how you accessed it? @StellaAthena I do think this is a plausible pathway, but I can't spearhead it at the moment. I will try to assist any efforts, though.
It's available through BigQuery, Google's SQL-based data warehouse. And yes, it charged me $20 for just a few queries! If you still have free GCP credits, you can use those.
OK, I got trial access and did some more experimenting, and IMO we can just use the Google dataset. It provides full text for all US patent publications (not applications) and titles/abstracts for everything else, all in plain text, so it will be easy to format. It has 150M rows in total and seems to cover the full US record through Oct 27, 2023. Sample extract here.
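For anyone reproducing the BigQuery route, here is a minimal Python sketch using the `google-cloud-bigquery` client. The table name `patents-public-data.patents.publications` and the `*_localized` column layout (arrays of `text`/`language` records) are assumptions about Google's public patents dataset, not details confirmed in this thread; `build_query`, `estimate_and_run`, and the 50 GB billing cap are hypothetical. Since BigQuery bills by bytes scanned, the sketch dry-runs the query first and sets `maximum_bytes_billed`, which may help avoid surprise charges.

```python
"""Sketch: pull English title/abstract/description text for one country's
patent publications from Google's public BigQuery dataset.

Assumptions (not from the thread): table name, column names and the billing
cap are illustrative; check them against the dataset schema before use.
"""

# Hypothetical defaults; adjust to your project and budget.
TABLE = "patents-public-data.patents.publications"
MAX_BYTES_BILLED = 50 * 10**9  # refuse queries that would bill more than ~50 GB


def build_query(country: str = "US", limit: int = 1000) -> str:
    """Return SQL selecting English text fields for one country's publications."""
    # Validate inputs so nothing unexpected is interpolated into the SQL.
    if len(country) != 2 or not country.isalpha():
        raise ValueError("country must be a two-letter code")
    if limit <= 0:
        raise ValueError("limit must be positive")
    # *_localized columns are assumed to be ARRAY<STRUCT<text, language, ...>>;
    # each scalar subquery keeps only the English entry.
    return f"""
SELECT
  publication_number,
  publication_date,
  (SELECT t.text FROM UNNEST(title_localized) AS t WHERE t.language = 'en' LIMIT 1) AS title,
  (SELECT a.text FROM UNNEST(abstract_localized) AS a WHERE a.language = 'en' LIMIT 1) AS abstract,
  (SELECT d.text FROM UNNEST(description_localized) AS d WHERE d.language = 'en' LIMIT 1) AS description
FROM `{TABLE}`
WHERE country_code = '{country.upper()}'
LIMIT {int(limit)}
"""


def estimate_and_run(sql: str) -> list[dict]:
    """Dry-run to report bytes scanned, then run with a billing cap."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # uses Application Default Credentials
    dry = client.query(
        sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    )
    print(f"Query would scan {dry.total_bytes_processed / 1e9:.1f} GB")
    job = client.query(
        sql, job_config=bigquery.QueryJobConfig(maximum_bytes_billed=MAX_BYTES_BILLED)
    )
    return [dict(row.items()) for row in job.result()]
```

Note that `LIMIT` does not reduce the bytes BigQuery scans; selecting fewer columns (for example, dropping `description_localized` while exploring) is what lowers the cost.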
Amazing!
Domain: Patents
Can we use the Google Patents data for this?
It might be possible to use C4/Common Crawl data for this, as patents.google.com is one of the most represented domains in C4.
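To follow up on the C4 idea, a minimal Python sketch that keeps only documents whose source URL is on patents.google.com. It assumes C4 shards are JSON Lines files (optionally gzipped) whose records carry `url` and `text` fields, as in the `allenai/c4` release; the function names are hypothetical. Matching on the parsed hostname rather than a substring avoids false hits on URLs that merely mention the domain in a path or query string.

```python
"""Sketch: keep only C4 documents crawled from patents.google.com.

Assumes JSON Lines shards (optionally gzipped) with "url" and "text" fields.
"""
import gzip
import json
from typing import Iterable, Iterator
from urllib.parse import urlparse

PATENT_HOST = "patents.google.com"


def is_google_patents(url: str) -> bool:
    """True if the URL's host is exactly patents.google.com."""
    # urlparse(...).hostname is already lowercased and excludes any port.
    return (urlparse(url).hostname or "") == PATENT_HOST


def filter_patent_docs(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed records whose "url" field points at Google Patents."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if is_google_patents(record.get("url", "")):
            yield record


def filter_shard(path: str) -> list[dict]:
    """Filter one local shard file (path is whatever you downloaded)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return list(filter_patent_docs(f))
```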