Finding Hyperlinks within Collection on Pages with Certain Keyword #377
Codecov Report
@@ Coverage Diff @@
## master #377 +/- ##
==========================================
+ Coverage 76.36% 76.37% +0.01%
==========================================
Files 40 40
Lines 1413 1414 +1
Branches 268 268
==========================================
+ Hits 1079 1080 +1
Misses 217 217
Partials 117 117
Looks good!
^^^ @ianmilligan1 that what you're looking for? If so, I'll squash and merge, and add this to the cookbook section if that makes sense.
hey @ruebot is this something that should be encoded in a test case while we're at it?
@lintool we already have a test case for
@ruebot add a separate test case that explicitly includes filtering? Maybe not. I dunno.
Yeah, we could add a new test that add ...and if that's the case, @SinghGursimran, want to update the PR with an updated test?
@ruebot Looks perfect to me - and I love the sample results with "keystone" leading to David Suzuki. Thanks @SinghGursimran, great work.
Should I add a separate test for ExtractLink udf with a filter? |
@SinghGursimran yeah, why not. Let's go with that. |
* Add example for archivesunleashed/aut#377 / archivesunleashed/aut#238
* review
Extracts hyperlinks within a collection, restricted to pages that contain a particular keyword (case-insensitive match), using DataFrames (df).
#238
Returns a CSV file with the columns URL, Domain, crawl_date, and destination_page.
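For readers who want to see the shape of the filtering logic outside Spark, here is a minimal standalone sketch in plain Python. It is not the toolkit's API and not the code in this PR: the function names (`extract_links`, `links_on_pages_with_keyword`, `to_csv`), the regex-based link extraction, and the sample pages are all assumptions made for illustration. It only mirrors the described behaviour: keep pages whose content contains the keyword (case-insensitive), emit one row per outgoing link, and write the four columns listed above.

```python
import csv
import io
import re
from urllib.parse import urljoin, urlparse

# Hypothetical, simplified link matcher; a real HTML parser would be more robust.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(base_url, html):
    """Return absolute destination URLs for every <a href=...> in html."""
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


def links_on_pages_with_keyword(pages, keyword):
    """Keep pages whose content contains keyword (case-insensitive) and
    return one (url, domain, crawl_date, destination_page) row per link."""
    needle = keyword.lower()
    rows = []
    for page in pages:
        if needle not in page["content"].lower():
            continue  # page does not mention the keyword, skip it
        domain = urlparse(page["url"]).netloc
        for dest in extract_links(page["url"], page["content"]):
            rows.append((page["url"], domain, page["crawl_date"], dest))
    return rows


def to_csv(rows):
    """Serialize rows to CSV text with the column names from the PR description."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["URL", "Domain", "crawl_date", "destination_page"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample records standing in for pages from a web archive.
pages = [
    {"url": "http://example.org/a", "crawl_date": "20091218",
     "content": '<p>The Keystone pipeline</p>'
                '<a href="http://example.com/x">x</a><a href="/b">b</a>'},
    {"url": "http://example.org/c", "crawl_date": "20091218",
     "content": '<a href="http://example.net/">no match here</a>'},
]

rows = links_on_pages_with_keyword(pages, "keystone")
print(to_csv(rows))
```

Lower-casing both the keyword and the page content is one simple way to get a case-insensitive match; in the DataFrame version the equivalent filter would be applied to the content column before the links are exploded into rows.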