Commit

Find a robust method to get articles paragraphs #1
fmikaelian committed Feb 19, 2019
1 parent ab2e007 commit 4b5ba92
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions cdqa/utils/converter.py
@@ -34,3 +34,8 @@ def df2squad(df, version='v2.0', output_dir=None):
json.dump(json_data, outfile)

return json_data

def filter_paragraphs(paragraphs):
    # Keep paragraphs of 10 to 250 words (inclusive); drop any that are shorter or longer.
    paragraphs_filtered = [paragraph for paragraph in paragraphs if 10 <= len(paragraph.split()) <= 250]
    return paragraphs_filtered
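
A minimal usage sketch of the length filter added in this commit, to show how the inclusive bounds behave at the edges. The `min_words` and `max_words` parameters and the sample strings are assumptions for illustration only; the committed function hard-codes 10 and 250 and takes a single argument.

```python
def filter_paragraphs(paragraphs, min_words=10, max_words=250):
    """Keep paragraphs whose whitespace-delimited word count is within
    [min_words, max_words]. The keyword parameters are hypothetical; the
    commit itself hard-codes the bounds 10 and 250."""
    return [p for p in paragraphs if min_words <= len(p.split()) <= max_words]


# Hypothetical sample paragraphs covering each side of the bounds.
too_short = "Only five words are here."      # 5 words: dropped
lower_edge = " ".join(["word"] * 10)         # exactly 10 words: kept
upper_edge = " ".join(["word"] * 250)        # exactly 250 words: kept
too_long = " ".join(["word"] * 251)          # 251 words: dropped

kept = filter_paragraphs([too_short, lower_edge, upper_edge, too_long])
print(len(kept))
```

Word counts here come from `str.split()` with no arguments, which splits on any run of whitespace, so punctuation attached to a word does not add to the count.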
