*********
ChangeLog
*********
0.13.11 (2016-05-27)
====================
- Add harvesters:
  - mla: MLA Commons
- Update eLife Harvester
0.13.10 (2016-05-18)
====================
- Update longname for Boise State
0.13.09 (2016-05-18)
====================
- Add harvesters:
  - colostate
  - boise_state
  - nku
  - triceratops
- Update umontreal
- Add favicon for neurovault
0.13.08 (2016-04-21)
====================
- Add harvesters:
  - udc: University of Minnesota, Digital Conservancy
  - uhawaii: ScholarSpace at University of Hawaii at Manoa
  - csir: CSIR Researchspace
  - iu: Indiana University Libraries' IUScholarWorks
  - uwo: Western University
  - nsfawards: NSF Awards
- Fix django settings module import
0.13.07 (2016-03-18)
====================
- Remove document ordering in postgres API
0.13.06 (2016-03-17)
====================
- Add harvesters:
  - mizzou: MOspace: University of Missouri
  - nature: Nature
  - umassmed: eScholarship@UMMS
  - umontreal: PAPYRUS - Dépôt institutionnel de l'Université de Montréal
  - uncg: UNC-Greensboro
- README and installation updates
- Postgres API updates:
  - sort docs by providerUpdatedDateTime
  - exclude non-normalized documents
- Fix crossref, add subjects to harvested schema
0.13.05 (2016-02-25)
====================
- Add harvesters:
  - usgs: United States Geological Survey
- Fix harvester biomedcentral to pull from new API
0.13.04 (2016-02-24)
====================
- Update script to remove items from elasticsearch
- Fix elasticsearch key typo
0.13.03 (2016-02-23)
====================
- Add harvesters:
  - fit: Florida Institute of Technology
0.13.02 (2016-02-23)
====================
- Support deleting documents from the Elasticsearch index via the push API (documents with status ``deleted``)
- Add script for deleting a few documents in production
0.13.01 (2016-02-18)
====================
- Bump django and djangorestframework versions in requirements
0.13.0 (2016-02-18)
====================
- Add harvesters:
  - cuny: City University of New York
  - ukansas: KU ScholarWorks
- Update harvesters:
  - springer to not normalize records published by biomedcentral
  - dataone to only normalize metadata records
- Postgres web API serializer updates to json
0.12.2 (2016-02-04)
====================
- Push API harvester updates to work with unvalidated email addresses
0.12.1 (2016-02-04)
====================
- Push API updates to work with new format
- Pin version of invoke for travis
- Change local dist to use postgres over cassandra
0.12.0 (2016-01-26)
====================
- Adding push API harvester
- Add institution support
- VIVO settings in prep for VIVO harvester
- Various updates and bugfixes @fabianvf :)
0.11.16 (2016-01-26)
====================
- Change default encoding recovery to look for None
0.11.15 (2016-01-26)
====================
- Add harvesters:
  - eLife Sciences
  - Digital Commons @ Illinois Wesleyan University
  - K-State Research Exchange
  - New Prairie Press at Kansas State University
  - University of South Florida - Scholar Commons
  - University of Tennessee at Chattanooga
0.11.14 (2016-01-26)
====================
- add error recovery to normalize XML parsing
- add error recovery to helper function XML parsing
0.11.13 (2016-01-26)
====================
- fix favicon for trinity
0.11.12 (2016-01-26)
====================
- moving tags into main schema for crossref
0.11.11 (2016-01-26)
====================
- fixing typos
0.11.10 (2016-01-08)
====================
- Fix favicons for:
  - Addis Ababa
  - Ghent
  - NIST
  - Texas State
0.11.9 (2015-12-29)
===================
- Fix Addis Ababa URL gathering (strings were not being mutated)
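The "strings were not being mutated" note refers to a common Python pitfall: strings are immutable, so methods such as ``str.replace`` return a new string instead of changing the original. The sketch below illustrates that general pattern only. The function names and the specific replacement are hypothetical and are not taken from the Addis Ababa harvester.

```python
def clean_urls_buggy(urls):
    """Hypothetical buggy version: the result of replace() is discarded."""
    for url in urls:
        url.replace(" ", "%20")  # returns a new string; `url` itself is unchanged
    return urls


def clean_urls_fixed(urls):
    """Hypothetical fixed version: keep the strings that replace() returns."""
    return [url.replace(" ", "%20") for url in urls]


if __name__ == "__main__":
    sample = ["http://example.org/a b"]
    print(clean_urls_buggy(sample))
    print(clean_urls_fixed(sample))
```

The fixed version builds a new list from the returned values, which is the usual way to apply a string transformation to every item.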
0.11.8 (2015-12-29)
===================
- Add harvesters:
  - Pontifical Catholic University of Rio de Janeiro
  - PURR - Purdue University Research Repository
  - RCAAP - Repositório Científico de Acesso Aberto de Portugal
0.11.7 (2015-12-18)
===================
- Fix Addis Ababa URL gathering
0.11.6 (2015-12-17)
===================
- Add harvesters:
  - Addis Ababa University Institutional Repository
  - NIST MaterialsData
  - Speech and Language Data Repository (SLDR/ORTOLANG)
  - DigitalCommons@University of Nebraska - Lincoln
  - Washington State University Research Exchange
0.11.5 (2015-12-14)
===================
- Add Iowa State harvester
- Add VCU harvester
0.11.4 (2015-12-07)
===================
- Add EngagedScholarship@CSU to harvesters
0.11.3 (2015-11-23)
===================
- Parse emails, ORCIDs and affiliations from DOE related services
0.11.2 (2015-11-20)
===================
- Fix description parsing for lwbin
0.11.1 (2015-11-20)
===================
- Change some harvester long names
0.11.0 (2015-11-19)
===================
- Improved documentation
- Add versioning to Postgres backend
- Add migrations for versions
- Fix some bugs with migrations from Postgres to other backends
- Add harvesters:
  - Duke University Libraries
  - Erudit
  - Ghent University
  - London School of Hygiene and Tropical Medicine Research Online
  - Mason Archival Repository Service
  - Deep Blue University of Michigan
  - Earth System Grid at NCAR
  - National Oceanographic Data Center
0.10.7 (2015-11-16)
===================
- Update Cambridge long name
- Update UTAustin url and approved sets
0.10.6 (2015-11-10)
===================
- Fix springer API key name
0.10.5 (2015-11-09)
===================
- Make single document query much faster for apiserver
- Add status route, so the status can be checked without being super slow
0.10.4 (2015-11-06)
===================
- Prevent Django from executing a count query for pagination
0.10.3 (2015-11-05)
===================
- Prevent unicode decode errors when stripping bytes out of json blobs
0.10.2 (2015-11-05)
===================
- Log single failed documents on the cross_db migration, but don't halt it
0.10.1 (2015-11-05)
===================
- Fix invalid database serialization declaration
0.10.0 (2015-11-04)
===================
- Add harvesters:
  - Portland State University
  - Philadelphia College of Osteopathic
  - University of Tennessee Knoxville
  - Springer Link
  - Kent State
  - Lake Winnipeg Basin Information Network Data Hub
  - VIVO
  - Research Online @ University of Wollongong
  - University of Cambridge
  - CiteSeerX
  - UKY
  - University of Illinois
  - Chapman
  - Neurovault
  - NIH
- Add Django REST API for raw and normalized documents, as well as
  a postgres backend for scrAPI
- Prevent mapper parsing exceptions in elasticsearch from preventing a
  document from being indexed when it is a date field that breaks
- Make pubmedcentral API endpoint point to the correct place
- Add support for running doctests
- Various other testing and test coverage improvements
- FRONTEND_KEYS is no longer a required setting, and will default to allowing
  all fields
- Various documentation fixes
0.9.10 (2015-10-26)
===================
- Fix incorrect map serialization in OAI harvesters
0.9.9 (2015-10-26)
==================
- Scitech documents will now process again
0.9.8 (2015-10-20)
==================
- Fix url-gathering for pubmedcentral (after they had a schema change)
0.9.7 (2015-09-18)
==================
- Bugfixes for OAI url gathering
0.9.6 (2015-09-17)
==================
- Elasticsearch now retries requests after connection errors
- Calhoun harvester now ignores that the SSL cert is invalid
- OAI url parser now terminates the regex capture after finding an invalid DOI
  character
- harvester invoke task now sets the default start date to settings.DAYS_BACK
  days before the end date
- scrapi.requests now exposes the requests.exceptions module
- Update README.md with updated date information
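The ``settings.DAYS_BACK`` default described above can be sketched as follows. This is a minimal illustration and not scrapi's actual task code: the function name ``default_date_range``, the ``DAYS_BACK`` value, and the use of today as the fallback end date are all assumptions.

```python
from datetime import date, timedelta

DAYS_BACK = 1  # hypothetical stand-in for settings.DAYS_BACK


def default_date_range(start=None, end=None, days_back=DAYS_BACK):
    """Return (start, end); a missing start defaults to days_back days before end."""
    # Assumption: when no end date is given, harvest up to today.
    end = end or date.today()
    # The start date is derived from the end date, not from today.
    start = start or end - timedelta(days=days_back)
    return start, end


if __name__ == "__main__":
    print(default_date_range(end=date(2015, 9, 17), days_back=2))
```

Deriving the start date from the end date keeps the harvested window the same length when only an end date is supplied.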
0.9.5 (2015-09-14)
==================
- Clinical Trials harvester now dumps lxml elements to dictionaries in
  the otherProperties field
0.9.4 (2015-09-10)
==================
- Biomedcentral harvester now filters out results from the future
0.9.3 (2015-09-01)
==================
- Capture more uris from pubmedcentral harvester
- Update favicons so that all favicons are .icos (fixes IE display bug)
- Fix longname for Portland State University harvester
0.9.2 (2015-09-01)
==================
- fix specification of canonicalUri requirements in schema
- update harvesters to reflect change in specification
0.9.1 (2015-09-01)
==================
- Document __repr__ no longer throws exceptions (allowing errors to be reported)
0.9.0 (2015-08-27)
==================
- Update setup documentation
- add harvesters:
  - WHOAS at MBLWHOI Library
  - The OAKTrust Digital Repository at Texas A&M
  - DigitalCommons@PCOM
  - PDXScholar
  - ScholarsArchive@OSU
- stricter date/time, email, uri validation
- automated data cleaning (strip out optional values with no semantic information)
- extraction of PDF links for OAI harvesters
- more consistent date formatting
- more consistent DOI extraction
- fix URLs for auto generated OAI harvesters
- OSF harvester now sorts by correct date
0.8.4 (2015-08-21)
==================
- Fix url gathering for datacite harvester
0.8.3 (2015-08-19)
==================
- Add funding information to the crossref harvester
0.8.2 (2015-08-11)
==================
- Add harvester for Washington University Open Scholarship
0.8.1 (2015-08-06)
==================
- Scitech harvester now uses the correct start and end dates
0.8.0 (2015-07-28)
==================
- Add harvesters for Smithsonian Digital Repository, Hacettepe,
  Harvard Dataverse, Cyberleninka, Howard University, Scholarworks Umass,
  Inter-University Consortium for Political and Social Research
- Python 3 support
- Fix DOI harvesting for OAI harvesters
- Fix OAI harvesters having their otherProperties overwritten when they
  defined a new schema.
- Fix resumption tokens in OAI harvesters
- Fix date parsing for DOE schema harvesters
- Stop JSON processor from swallowing exceptions
- Update harvesters to make their schemas more closely match the spec
0.7.6 (2015-07-10)
==================
- Fix language harvesting for DOE and OAI harvesters
0.7.5 (2015-07-10)
==================
- Fix shareok harvester (SSL verification failures ignored)
0.7.4 (2015-07-08)
==================
- Fix probabilistic test failures
0.7.3 (2015-07-07)
==================
- Add Daily SSRN harvester
0.7.2 (2015-06-30)
==================
- Make harvesters run monday-sunday by default
0.7.1 (2015-06-15)
==================
- Base OAI schema now includes DOIs as object URIs
- If a migration begins to fail due to cassandra connection errors, we now
  attempt to re-establish the connection
0.7.0 (2015-06-12)
==================
- Add University of Delaware, Harvard Dash,
  Data Dryad, and Iowa Research harvesters
- Update skip logic for shareok
- Rewrote cassandra models to partition data to make migrations more efficient
- Added migration script for new models
- Rewrote migrations to take advantage of celery
- Added automatic malformed XML recovery
0.6.6 (2015-06-08)
==================
- Fixed small bug in dryad where documents without URIs were created
0.6.5 (2015-06-08)
==================
- Add harvard-dash, iowa research, and data dryad harvesters
- Make migrations a little more resilient (with autoretries)
- Fix a bug with introspection into function arguments for logging
0.6.0 (2015-05-04)
==================
- Better logging
- Add tests for harvesters
- Add the rename migration script
- Add the delete migration script
- Add the Zenodo, Scholarsbank, SHARE OK, CU Scholar, Calhoun, Caltech
  Authors, BHL, and CogPrints harvesters
0.5.0 (2015-04-13)
==================
- Add the OSF harvester
0.4.0 (2015-04-10)
==================
- Data One now uses XMLHarvester
- PLoS now uses XMLHarvester
- Crossref is no longer limited to collect 1000 documents
- Add the BioMed harvester
- Requests no longer crashes when recording is turned off
- Cassandra now only stores new versions of documents, no more duplicate
  versions
- Use the jsonschema library for JSON transformer
- Implement the new schema
0.2.0 (2015-03-16)
==================
- Requests made with scrapi.requests are now recorded and replayed via
  cassandra
- Improved test coverage
- Removed website, see erinspace/shareregistration or osf.io/share/ for its
  replacement
- Manifest system for harvesters removed and replaced with metaclassing
- Added an img/ folder that stores the favicons of providers
- Implemented the transformer system which refactors how normalize is defined
  for xml based harvesters
- Removed the storage module
0.1.0 (2015-03-09)
==================
Initial release