post-ae update

- added user study related materials
- added colab versions of the tool
- update readme
chyanju · Apr 29, 2022 · 7602f41
1 parent 7767bb4
commit 7602f41

Showing 3 changed files with 73 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -3,6 +3,10 @@
 <div align="left">
   This project is powered by <img src="./resources/trinity_edge_matrix.png" width=30> Trinity-Edge (<a href="https://github.com/chyanju/Trinity-Edge">https://github.com/chyanju/Trinity-Edge</a>).
 </div>
+## [Artifact Evaluation] Poe in Colab
+
+1. To set up the machine learning model used in the artifact, we provide a Colab notebook that covers and reproduces every step of the procedure. Please check it out here: [link](./public_PLDI22AE_Poe_TaPas_on_VisQA.ipynb).
+2. To reuse the tool, modify existing benchmarks, or even create your own benchmarks, we provide a Colab notebook that makes this easy. Please check it out here: [link](public_PLDI22AE_Poe_make_benchmark.ipynb).
 
 ## [Artifact Evaluation] Getting Started Guide
 
@@ -73,6 +77,8 @@ A Linux (or Unix-like) machine is recommended since this is where the tool is te
 
 ## Dependencies
 
+Please refer to [this](public_PLDI22AE_Poe_make_benchmark.ipynb) Colab notebook for step-by-step environment configuration. Alternatively, you can use the Colab version of the tool directly.
+
 The major functionalities of the tool depend on the following packages:
 
 - Trinity-Edge ([https://github.com/chyanju/Trinity-Edge](https://github.com/chyanju/Trinity-Edge))
@@ -86,8 +92,6 @@ The major functionalities of the tool depend on the following packages:
 - simplejson ([https://github.com/simplejson/simplejson](https://github.com/simplejson/simplejson))
 - python-tabulate ([https://github.com/astanin/python-tabulate](https://github.com/astanin/python-tabulate))
 
-The following packages are needed ***only*** when you want to redo the dataset pre-processing from scratch. The processed version of dataset is already included in this repo.
-
 - Node.js ([https://nodejs.org/](https://nodejs.org/))
 - Vega-Lite >= 5.1.1 ([https://vega.github.io/vega-lite/](https://vega.github.io/vega-lite/))
   - This should be installed via `npm`, run this in the repo root: `npm install vega-lite`. Note that this will install vega as a non-global library in the repo root.
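The README states only the Vega-Lite >= 5.1.1 requirement and the local `npm install vega-lite` step. As a convenience, here is a minimal Python sketch (not part of Poe) that checks whether a non-global install in the repo root meets that minimum. It assumes npm's standard layout, with the package manifest at `node_modules/vega-lite/package.json` and its `version` field; the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional, Tuple

# Minimum version taken from the README's dependency list.
MIN_VEGA_LITE = (5, 1, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "5.2.0" (or "5.2.0-rc.1+build") into (5, 2, 0)."""
    core = text.split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split(".")[:3])


def local_vega_lite_version(repo_root: str = ".") -> Optional[Tuple[int, ...]]:
    """Read the version of a repo-local vega-lite install, or None if absent.

    Assumption: npm placed the package under <repo_root>/node_modules/vega-lite.
    """
    manifest = Path(repo_root) / "node_modules" / "vega-lite" / "package.json"
    if not manifest.is_file():
        return None
    with manifest.open(encoding="utf-8") as f:
        return parse_version(json.load(f)["version"])


def vega_lite_ok(repo_root: str = ".") -> bool:
    """True if a local vega-lite install exists and is >= 5.1.1."""
    version = local_vega_lite_version(repo_root)
    # Tuple comparison orders versions component by component.
    return version is not None and version >= MIN_VEGA_LITE


if __name__ == "__main__":
    found = local_vega_lite_version(".")
    if found is None:
        print("vega-lite not found under ./node_modules; run `npm install vega-lite` in the repo root")
    elif found >= MIN_VEGA_LITE:
        print("vega-lite", ".".join(map(str, found)), "OK")
    else:
        print("vega-lite", ".".join(map(str, found)), "is older than 5.1.1")
```

Run it from the repo root after `npm install vega-lite`. It only inspects files on disk and never invokes npm itself.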
@@ -265,3 +269,7 @@ We manually categorize every benchmark with different question types. There's no
 ### Reusability
 
 Poe provides an easy-to-use entrance. The internal framework structure follows that of Trinity ([https://github.com/fredfeng/Trinity](https://github.com/fredfeng/Trinity)) and Trinity-Edge ([https://github.com/chyanju/Trinity-Edge](https://github.com/chyanju/Trinity-Edge)), which are designed for better extensibility and well documented. One can easily build upon Poe following similar procedure of Trinity and Trinity-Edge, and will also find it easy to change evaluation settings according to the "Usage" section.
+
+## Citation
+
+Please check back later for the BibTeX entry.
diff --git a/logs/user-study/protocols.md b/logs/user-study/protocols.md
@@ -0,0 +1,48 @@
+# User Study Protocols
+
+This page describes the user study protocols and related details.
+
+## Purpose
+
+This project studies the performance of a research tool from an end-user’s perspective. The tool synthesizes programs that generate answers to natural-language questions about scientific facts presented in data visualizations.
+
+Example data visualization forms include bar charts, line charts, and histograms. An example question is *“Which country’s economy will get worst over the next 12 months according to the bar chart?”*
+
+Existing automatic visualization question answering tools provide only a final answer to a question, and end-users without domain expertise may find it hard to understand how that answer was generated. In addition to generating an answer, this tool synthesizes a computer program that can be executed to output the answer, which can help users better understand the reasoning process behind it. Such a program can be thought of as an explanation of the answer.
+
+In particular, the objectives of this project are two-fold, i.e., we study two main research questions: 1) (Usability) Can the tool generate the desired answer to a question raised by a participant? 2) (Explainability) Does the accompanying program synthesized by the tool explain the participant’s desired answer well?
+
+The study design follows directly from these two research questions, usability and explainability. Since the tool is designed to help users understand the reasoning process of a computer algorithm, a survey involving human subjects is the most natural fit for evaluating these two metrics.
+
+Our hypothesis is that providing explanations in addition to the answers to a visualization question helps end-users better understand the underlying algorithm used to find those answers, compared to providing the answers alone.
+
+## Procedures
+
+The study procedures include three sessions: the preparation session, the demonstration session and the survey session.
+
+The study starts with the preparation session, which is expected to take less than 5 minutes and gets the subject ready for the study. It includes the following steps:
+
+1. Both online and onsite versions of the study are offered, depending on the subject’s choice. For an online study, we provide access to a web interface of the study tool; for an onsite study, the subject accesses the study tool in an open lab space.
+2. The study proceeds with the prerequisite steps required by the guidelines for human subject research, e.g., the consent process.
+3. The subject is then informed of the description, goals, and instructions of the study, and can ask logistical questions during this introduction. Then the preparation session ends and the study moves to the demonstration session.
+
+The demonstration session, which is expected to take less than 10 minutes, familiarizes the subject with the goal and usage of the study tool. It includes the following steps:
+
+1. The study tool is first presented to the subject together with a walk-through demonstration of its expected usage, as well as a brief introduction to the programming language(s) in which the explanation programs are written. The subject can also ask technical questions during the demonstration in this step.
+2. The subject is then asked to practice using the study tool freely. When the subject signals readiness (e.g., by raising a hand or clicking a “ready” button), the demonstration session ends and the study moves to the survey session.
+
+The survey session, which is expected to take less than 25 minutes, is the core of the study: it is the main session in which we collect the subjects’ opinions. Its concrete steps are as follows:
+
+1. The subject first picks a visualization from a pre-defined set of publicly available visualizations, based on the subject’s own preference. There is no restriction on which one to pick.
+2. The subject then observes the visualization and enters into the tool a natural-language question about scientific facts in the selected visualization.
+3. The subject clicks the “run” button (or another button with the same functionality) to run the tool and get two versions of answers. The first version (denoted “Version A”) includes only a single answer to the question raised by the subject, and the second version (denoted “Version B”) includes both a single answer to the question and a computer program that explains the answer. Note that the answers from the two versions could be different.
+4. The subject is then asked to respond “Yes” or “No” to the following two questions based on the quality of the outputs generated by the tool: 1) Does Version A return the desired answer? What about Version B? 2) Compared to Version A, is the answer generated by Version B well explained by the additionally synthesized computer program?
+5. The subject then walks through steps 1 to 4 again, repeating them 3 to 5 times. Then the survey session ends.
+
+During the procedures, we collect only the subjects’ responses to the two questions asked in step 4 of the survey session. The collected data are completely anonymous. Subjects can opt out and quit the study at any step of the procedures.
+
+Interactions happen only during the preparation and demonstration sessions, where we answer logistical questions and clarify any questions raised by the subjects. During the survey session, there is no interaction or intervention.
+
+## Data Used
+
+The visualizations are from the VisQA dataset, which is publicly available online: [https://github.com/dhkim16/VisQA-release](https://github.com/dhkim16/VisQA-release). The dataset comes from a published academic paper “Answering Questions about Charts and Generating Visual Explanations”, whose details can be found on the official ACM Digital Library: [https://dl.acm.org/doi/abs/10.1145/3313831.3376467](https://dl.acm.org/doi/abs/10.1145/3313831.3376467).
diff --git a/logs/user-study/records.csv b/logs/user-study/records.csv
@@ -0,0 +1,15 @@
+participant,round,Q1-Usability,Q2-Explainability
+1,1,1,1
+1,2,1,1
+1,3,0,0
+1,4,1,1
+1,5,1,0
+2,1,1,1
+2,2,1,1
+2,3,1,1
+2,4,0,0
+3,1,1,1
+3,2,0,0
+3,3,0,0
+3,4,1,1
+3,5,1,1
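The records store one row per participant and survey round. Below is a minimal Python sketch for summarizing them per question. It assumes `1` encodes a “Yes” response and `0` a “No” (the protocol does not state the encoding), and the function names are hypothetical. The CSV text is embedded so the sketch runs standalone.

```python
import csv
import io
from collections import defaultdict

# Contents of logs/user-study/records.csv as committed, embedded for a standalone run.
RECORDS_CSV = """participant,round,Q1-Usability,Q2-Explainability
1,1,1,1
1,2,1,1
1,3,0,0
1,4,1,1
1,5,1,0
2,1,1,1
2,2,1,1
2,3,1,1
2,4,0,0
3,1,1,1
3,2,0,0
3,3,0,0
3,4,1,1
3,5,1,1
"""

QUESTIONS = ("Q1-Usability", "Q2-Explainability")


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def yes_rate(rows, question):
    """Fraction of all rounds whose value is 1 (assumed to mean "Yes")."""
    values = [int(row[question]) for row in rows]
    return sum(values) / len(values)


def per_participant(rows, question):
    """Map participant id -> (yes count, number of rounds) for one question."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        tally = counts[row["participant"]]
        tally[0] += int(row[question])
        tally[1] += 1
    return {pid: (yes, total) for pid, (yes, total) in counts.items()}


if __name__ == "__main__":
    rows = load_rows(RECORDS_CSV)
    for q in QUESTIONS:
        print(f"{q}: {yes_rate(rows, q):.2f} overall; per participant {per_participant(rows, q)}")
```

To analyze the committed file directly, open `logs/user-study/records.csv` with `newline=""` and pass the file object to `csv.DictReader` instead of using the embedded string.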