GitHub - changsu/cs221_Project: This repo is used for source control of final project of CS 221

CS221 Final Project Documentation

Report

Please find our final report [here] (http://www.stanford.edu/~changsu/cs221/final_report.pdf)

Author:

Qingyi Meng (qingyim@stanford.edu)
Chang Su (changsu@stanford.edu)

Acknowledgements

The project would not have been possible without the guidance and support of Professor Dave Barker-Plummer and the [Openproof Group] (http://ggweb.stanford.edu/openproof/people) at [Center for the Study of Language and Information Stanford University] (http://www-csli.stanford.edu/)

Methods

1. Linear Regression

Start the program

run gradient_lr

Result logging

MSE_Train =
MSE on training data using RL: 0.48753

MSE_Test =
MSE on testing data using RL: 0.81715

querySentencesPool = 
    '9.17.2'     [ 326]
    '7.11.4'     [ 628]
    '1.9.4'      [ 687]
    ...
    ...

We output mean squared error (MSE) of linear regression on training data and testing data seperately and also output pool of sentences that can be used for query. These sentences are the testing sentences that the model has never "seen" before In addition, we also draw the learning curve of traning iterations vs mse similar to figure learning_curve_lr.fig

Query

After running the algorithm, we have generate a linear model ready for prediction. We can choose any sentence number in querySentencePool and issue the query using the cmd

query('9.17.2')

Then you will see result similar as below

realValue =

    0.6829
    0.5371
    0.2839
    0.3478
    0.3376
    0.3529
    0.2506

estimatedValue =

    0.4005
    0.3774
    0.4736
    0.4021
    0.1091
    0.2080
    0.0759

result = 
    'norm2 distance: '    '0.5'    'Confidence: '    [391]

Here we show ground truth value and predicted value and also their norm2 distance and our confidence in the prediction based on # of scripts generated from the sentence. In addition, we also visualize the result using bar graph.

2. KL Optimization Method

Star the program

run gradient_kl

Result Logging

Loss_KL =
Loss Function on testingn data using KL method: 4.4625

Loss_rand =
Loss Function on testingn data using random guess: 20.3272

querySentencesPool = 
    '1.4.5'      [  93]
    '9.18.4'     [ 636]
    '7.11.3'     [ 286]
    ...
    ...

Here, we output Loss function on testing data using KL method and random guess respectively and also pool of sentences used for query In addition, we also generate learning curves of the KL method iterations vs loss function in figure file learning_curve_kl.fig

Query

Similarly to the linear regression method, after running KL-method, we can also issue queries for prediction. Then we will generate result similar to this:

realValue =

    0.2101
    0.2502
    0.1366
    0.0721
    0.1709
    0.0866
    0.0735

estimatedValue =

    0.1572
    0.2248
    0.1084
    0.1059
    0.2012
    0.1187
    0.0837


result = 
    'KL distance: '    '0.027233'    'Confidence: '    [636]

File Structures

gradient_lr.m - code that implements the linear regression method

gradient_kl.m - code that implements the KL optimization method

query.m - code that supports query based on sentence number

feature_matrix.txt - 268 * 30 matrix with row as sentence column as features

feature_name.txt - 30 dimension vector that store name of each features

label_dependent.txt - 162 * 7 matrix with row as each sentence column as error type probabilities. For each row, all columns reflects the distribution of error types, thus summing up to 1

label_independent.txt - 162 * 7 matrix with row as each sentence, column as error type probabilities. However, it's differenct from label_dependent.txt file in a sense that there is no constraint among columns for a particular row and they are independent with each other.

sentence_map.txt - store map between sentence number and total # of scripts genearted from that sentence

sample_kl.fig - query result figure of KL method

sample_lr.fig - query result figure of LR method

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Report

Author:

Acknowledgements

Methods

1. Linear Regression

2. KL Optimization Method

File Structures

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
README.md		README.md
data for mat.xlsx		data for mat.xlsx
feature_matrix.txt		feature_matrix.txt
feature_name.txt		feature_name.txt
gradient_kl.m		gradient_kl.m
gradient_lr.m		gradient_lr.m
label_dependent.txt		label_dependent.txt
label_independent.txt		label_independent.txt
label_independent_old.txt		label_independent_old.txt
learning_curve_kl.fig		learning_curve_kl.fig
learning_curve_lr.fig		learning_curve_lr.fig
query.m		query.m
sample_kl.fig		sample_kl.fig
sample_lr.fig		sample_lr.fig
sentence_map.txt		sentence_map.txt
sentence_map_old.txt		sentence_map_old.txt

changsu/cs221_Project

Folders and files

Latest commit

History

Repository files navigation

Report

Author:

Acknowledgements

Methods

1. Linear Regression

2. KL Optimization Method

File Structures

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages