The overall flow of our model.
git clone https://github.com/microsoft/MIMICS
mkdir data
cd data
wget http://ciir.cs.umass.edu/downloads/mimics-serp/MIMICS-BingAPI-results.zip --no-check-certificate
Split MIMICS-BingAPI-results into smaller files
cd data
unzip MIMICS-BingAPI-results.zip
cat MIMICS-BingAPI.result | wc -l # 479807
split -l 48000 MIMICS-BingAPI.result mimics_
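As a quick sanity check on the split step above, the following sketch computes how many chunk files `split -l 48000` should produce for 479807 lines and what they are named. It relies only on the default two-letter suffixes of `split`; the helper names `chunk_sizes` and `chunk_names` are illustrative and not part of this repository.

```python
import string


def chunk_sizes(total_lines: int, lines_per_chunk: int) -> list:
    """Line count of each chunk, mirroring what `split -l` produces."""
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    full, rest = divmod(total_lines, lines_per_chunk)
    return [lines_per_chunk] * full + ([rest] if rest else [])


def chunk_names(count: int, prefix: str = "mimics_") -> list:
    """Default two-letter suffixes used by `split`: aa, ab, ..., zz."""
    letters = string.ascii_lowercase
    if count > len(letters) ** 2:
        raise ValueError("two-letter suffixes only cover 676 files")
    return [prefix + letters[i // 26] + letters[i % 26] for i in range(count)]


sizes = chunk_sizes(479807, 48000)
names = chunk_names(len(sizes))
print(len(sizes), sizes[-1])   # 10 47807
print(names[0], names[-1])     # mimics_aa mimics_aj
```

If `ls mimics_* | wc -l` reports a different number, the download or unzip step was probably incomplete.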
Extract information from mimics_* and create MIMICS-BingAPI.jsonl
cd data
python3 SERP_filter.py
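The general shape of this step (read every mimics_* chunk, keep the needed fields, write one JSON object per line) can be sketched as follows. This is not the code of SERP_filter.py. In particular, `parse_line` assumes each input line is a query, a tab, and a JSON payload, which is a guess about the file layout, and the output keys `query` and `serp` are hypothetical.

```python
import glob
import json
from typing import Iterable, Iterator


def iter_chunk_lines(pattern: str) -> Iterator[str]:
    """Yield non-empty lines from every file matching `pattern`, in name order."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if line:
                    yield line


def parse_line(line: str) -> dict:
    """ASSUMPTION: a line is 'query<TAB>JSON payload'; adapt to the real layout."""
    query, _, payload = line.partition("\t")
    return {"query": query, "serp": json.loads(payload)}


def write_jsonl(records: Iterable[dict], out_path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Under these assumptions, `write_jsonl(map(parse_line, iter_chunk_lines("mimics_*")), "MIMICS-BingAPI.jsonl")` would produce the merged file while holding only one line in memory at a time.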
Create train.json and test.json in data folder
cd data
python3 data_preprocess.py
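How data_preprocess.py divides the data is not described here. For readers who want to reproduce a comparable train/test pair on their own data, a minimal sketch of a seeded random split is shown below; the 10% test ratio, the seed, and the function names are assumptions, not values taken from this repository.

```python
import json
import random


def split_records(records, test_ratio=0.1, seed=42):
    """Shuffle deterministically, then cut off the first `test_ratio` share as test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def save_json(path, records):
    """Store a list of records as a single JSON array."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)


train, test = split_records(range(100))
print(len(train), len(test))  # 90 10
```

Using a private `random.Random(seed)` instance keeps the split reproducible without touching the global random state.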
- input: query
- output: facet
cd model/query
python3 facet_generation_train.py --batch 4 --epoch 10
- input: query+document
- output: facet
cd model/query_document
python3 facet_generation_train.py
- input: query+related
- output: facet
cd model/query_related
python3 facet_generation_train.py
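The three variants above differ only in what is appended to the query on the input side. A minimal sketch of how such inputs and targets could be assembled is given below. The separator string, the limit of five extra items, and the comma-joined target format are assumptions for illustration; the training scripts may format their data differently.

```python
from typing import Sequence

SEP = " [SEP] "  # hypothetical separator; the real scripts may use another token


def build_input(query: str, extra: Sequence[str] = (), max_items: int = 5) -> str:
    """Join the query with up to `max_items` snippets or related queries."""
    parts = [query.strip()]
    parts += [text.strip() for text in extra[:max_items] if text.strip()]
    return SEP.join(parts)


def build_target(facets: Sequence[str]) -> str:
    """Serialize the reference facets as one comma-separated string."""
    return ", ".join(f.strip() for f in facets if f.strip())


print(build_input("jaguar"))
print(build_input("jaguar", ["car maker", "big cat"]))
print(build_target(["car", "animal"]))
```

Calling `build_input` with no extras corresponds to the query-only model, while passing document snippets or related queries corresponds to the query+document and query+related models.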
- input: query
- output: facet / document / related
cd model/multi_task
python3 facet_generation_train.py --args
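In the multi-task setting one query yields several training examples, one per output type. The sketch below shows one common way to express that with a task prefix on the input. The prefix names, the dictionary keys, and the " | " joiner are hypothetical and are not taken from facet_generation_train.py.

```python
def multi_task_examples(query, facets=(), documents=(), related=()):
    """Build one (source, target) pair per task that has a non-empty target."""
    targets = {
        "facet": ", ".join(facets),
        "document": " | ".join(documents),
        "related": ", ".join(related),
    }
    return [
        {"task": task, "source": f"{task}: {query}", "target": target}
        for task, target in targets.items()
        if target  # skip tasks for which this query has no supervision
    ]


examples = multi_task_examples(
    "jaguar",
    facets=["car", "animal"],
    related=["jaguar price", "jaguar habitat"],
)
for example in examples:
    print(example)
```

Skipping empty targets means queries without document or related-query data still contribute to the facet task.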
- input: generated facets
- output: re-generated facets
cd model/LLM
python3 facet_generation_test.py --args
- (FG, SL, EFC) Revisiting Open Domain Query Facet Extraction and Generation
cd model/other_models
python3 test_compare_model.py
- (SR) Improving search clarification with structured information extracted from search results
cd model/other_models/SR_result
result: All results for the original test set.
result_filter: Results for the same test set to compare with other models.
For auto evaluation
python3 evaluation.py --model_type {type}
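The metrics computed by evaluation.py are not listed here. As an illustration of what an automatic facet metric can look like, the sketch below computes a set-based term-overlap F1 between generated and reference facets. The whitespace tokenization and the function names are assumptions, and the script itself may compute different or additional scores.

```python
def facet_terms(facets):
    """Lower-cased set of whitespace-separated terms across all facets."""
    return {term for facet in facets for term in facet.lower().split()}


def term_overlap_f1(generated, reference):
    """F1 over the sets of terms in the generated and reference facets."""
    gen, ref = facet_terms(generated), facet_terms(reference)
    overlap = len(gen & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


score = term_overlap_f1(["Windows 10", "mac"], ["windows", "linux"])
print(round(score, 2))  # 0.4
```

In this example one of three generated terms matches (precision 1/3) and one of two reference terms is covered (recall 1/2), which gives an F1 of 0.4.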
For LLM evaluation
python3 evaluation_LLM.py --model_type {type}
Generate rationale from query and facet
cd information/LLM
python3 generate_information7B.py
python3 construct_train_dataset.py
Use only the useful information from the SERPs for training,
i.e., the information in the documents that relates to the queries and facets.
cd information/pick_information
python3 pick_document.py