Placeholder pull request for project-wide code review #30

Open: wants to merge 74 commits into base branch initial.

Commits (74):
2e4c603
Updating comments and README.md
gmerritt Sep 19, 2023
c24815b
Adding Gregquestion as code to mark questions
gmerritt Sep 19, 2023
27a8962
Ummm...GitHub is slow to index code for search?
gmerritt Sep 19, 2023
5241435
Trivial edit b/c GitHub search one version behind?
gmerritt Sep 19, 2023
48b8eed
Added additional Gregquestion tag
gmerritt Sep 19, 2023
d3eefab
Base working version, incl. new Gregquestion ?'s
gmerritt Oct 3, 2023
31e87e4
Functional design diagram added to readme
gmerritt Oct 3, 2023
7a4f929
VUE_... to VITE_... convention update + comments
gmerritt Oct 17, 2023
844f849
Moved app-specific fnctn to /src/api/fetch-url.ts
gmerritt Oct 17, 2023
2a9fc3a
Reworked internal api call from GET to POST
gmerritt Oct 18, 2023
490144d
Page formatting, copy URL button, SI AA accessible
gmerritt Oct 22, 2023
6d24c65
Made dl URL display text field uneditable
gmerritt Oct 24, 2023
f6fa332
Broken version that includes Greg CAS attempts
gmerritt Nov 8, 2023
9252092
Fixes to CAS integration, session management
pauline2k Nov 9, 2023
6639d04
Two tiny clean-ups before changes needing help
gmerritt Nov 14, 2023
86289a5
'Gregquestion' comments re: currentUser snafu
gmerritt Nov 14, 2023
7158bf0
user.py fix: from flask import current_app as app
gmerritt Nov 18, 2023
a91f0e0
Fetchurl.vue hip to change in post (response.data)
gmerritt Nov 19, 2023
c85d44b
Integrated to single .vue w/ tool+authorized_user
gmerritt Nov 20, 2023
3d7864b
Port 5000 api now requires authorized user
gmerritt Nov 20, 2023
3c09e44
Make gs:// url form field required
gmerritt Nov 29, 2023
b60a656
Cleaned all 'tox -e lint-py' warnings
gmerritt Dec 8, 2023
906e073
Cleaned all 'tox -e lint-vue' warnings
gmerritt Dec 8, 2023
6e4b776
Added a better input string regex format check
gmerritt Dec 8, 2023
4851cc6
Trivial README change to get GH to prompt for PR
gmerritt Dec 19, 2023
48f3a55
Getting synced with my latest; will use ets authoritative going forward
gmerritt Jan 31, 2024
18ebc9a
Merge pull request #11 from gmerritt/main
johncrossman Feb 1, 2024
2047dab
Preparing for CodeBuild for dev deployment attempts
gmerritt Feb 5, 2024
031e9ab
Merge pull request #1 from ets-berkeley-edu/main
gmerritt Feb 5, 2024
4ab60f2
A quick doodle that proposes to use AWS secrets rather than S3
gmerritt Feb 6, 2024
d435184
A quick doodle that proposes to use AWS secrets rather than S3 (clean…
gmerritt Feb 6, 2024
e1a7cbe
Quickie downgrade of node from 21 to 20 for Code Buil compatibility
gmerritt Feb 7, 2024
7662aac
undoing fake secrets try (from s3)
gmerritt Feb 7, 2024
3fd75fa
Merge pull request #12 from gmerritt/main
gmerritt Feb 7, 2024
b1877e9
Trying to fix fetchurl case
gmerritt Feb 7, 2024
32a335d
Merge pull request #13 from gmerritt/main
gmerritt Feb 7, 2024
cf9494a
fixing build script path
gmerritt Feb 7, 2024
ddc8c9f
Merge pull request #14 from gmerritt/main
gmerritt Feb 7, 2024
4b76aaa
ami config changes
gmerritt Feb 8, 2024
ea5a8f2
ami config changes
gmerritt Feb 8, 2024
74ed106
cloudwatch agent change
gmerritt Feb 8, 2024
4c5814f
Merge pull request #15 from gmerritt/main
gmerritt Feb 8, 2024
c8a6579
Merge pull request #2 from ets-berkeley-edu/main
gmerritt Feb 13, 2024
1012008
Merge pull request #16 from gmerritt/main
gmerritt Feb 16, 2024
6b9b2b7
Merge pull request #3 from ets-berkeley-edu/main
gmerritt Feb 16, 2024
6ec3e60
Use AWS Secrets for bot local & deployed; no S3 secrets!
gmerritt Feb 16, 2024
eb46fd5
Merge pull request #17 from gmerritt/main
gmerritt Feb 16, 2024
9554b2e
Removing S3-specific config handling
gmerritt Feb 16, 2024
fe75465
Merge pull request #4 from ets-berkeley-edu/main
gmerritt Feb 16, 2024
84b77cd
Merge pull request #18 from gmerritt/main
gmerritt Feb 16, 2024
b2bf0ba
Merge pull request #5 from ets-berkeley-edu/main
gmerritt Feb 16, 2024
ab2c28b
Merge pull request #19 from gmerritt/main
gmerritt Feb 16, 2024
d6be48b
Removed reference to deleted ./scripts/*.sh
gmerritt Feb 16, 2024
8a83e3d
Merge pull request #20 from gmerritt/main
gmerritt Feb 16, 2024
1f7d7ed
Paths not jiving in deployment context; "fixing"
gmerritt Feb 20, 2024
1e3090a
Merge pull request #21 from gmerritt/main
gmerritt Feb 20, 2024
50a4585
Getting (too) explicit w/ index.html path to debug deployment
gmerritt Feb 20, 2024
d700854
Merge pull request #22 from gmerritt/main
gmerritt Feb 20, 2024
b84b149
Trying directory fix at build-vue level
gmerritt Feb 21, 2024
e9d0b18
Merge pull request #23 from gmerritt/main
gmerritt Feb 21, 2024
7e50874
temporary code to debug the elastic beanstalk run context
gmerritt Feb 26, 2024
02da69a
Merge pull request #24 from gmerritt/main
gmerritt Feb 26, 2024
5e92102
Helping Elastic Beanstalk get the paths right to find js & css files
gmerritt Feb 26, 2024
6f4a796
Merge pull request #25 from gmerritt/main
gmerritt Feb 26, 2024
0d8401d
Removing "temporary code to debug the elastic beanstalk run context"
gmerritt Mar 7, 2024
f43bb48
Merge pull request #26 from gmerritt/main
gmerritt Mar 7, 2024
886aff0
Leading and trailing blank space are now removed from user gs:// url …
gmerritt Mar 7, 2024
627a291
Merge pull request #27 from gmerritt/main
gmerritt Mar 7, 2024
a6bb274
Clear contents of input and output fields when clicking back into inp…
gmerritt Mar 7, 2024
7a31a8d
Merge pull request #28 from gmerritt/main
gmerritt Mar 7, 2024
6dd4565
Formatting clean-ups from John C.'s feedback
gmerritt Apr 9, 2024
f743050
Merge pull request #31 from gmerritt/main
johncrossman Apr 9, 2024
4f80277
Additional clean-ups from John C.'s feedback
gmerritt Apr 16, 2024
3fe59ed
Merge pull request #32 from gmerritt/main
johncrossman Apr 16, 2024
46 changes: 20 additions & 26 deletions .ebextensions/00_ami.config
@@ -1,20 +1,11 @@
#
# AWS configuration for Hartsfield
# AWS configuration for Squiggy
#

packages:
yum:
amazon-linux-extras: []
awslogs: []
gcc-c++: []
git: []
mod_ssl: []

commands:
01_postgres_activate:
command: sudo amazon-linux-extras enable postgresql14
02_postgres_install:
command: sudo yum install -y postgresql

option_settings:
aws:elasticbeanstalk:cloudwatch:logs:
@@ -24,25 +15,28 @@ option_settings:
aws:elasticbeanstalk:environment:proxy:
ProxyServer: apache
aws:elasticbeanstalk:environment:proxy:staticfiles:
/static: dist/static
/assets: dist/static/assets
/favicon.ico: dist/static/favicon.ico

files:
/etc/awslogs/awscli.conf:
mode: '000600'
owner: root
group: root
content: |
[plugins]
cwlogs = cwlogs
[default]
region = `{"Ref":"AWS::Region"}`

/etc/awslogs/config/logs.conf:
/opt/aws/amazon-cloudwatch-agent/bin/config.json:
mode: '000644'
owner: root
group: root
content: |
[/var/app/current/hartsfield.log]
log_group_name=`{"Fn::Join":["/", ["/aws/elasticbeanstalk", { "Ref":"AWSEBEnvironmentName" }, "var/app/current/damien.log"]]}`
log_stream_name={instance_id}
file=/var/app/current/hartsfield.log*
{
"logs": {
"logs_collected": {
"files": {
"collect_list": [
{
"file_path": "/var/app/current/heartsfield.log*",
"log_group_name": "`{"Fn::Join":["/", ["/aws/elasticbeanstalk", "var/app/current/heartsfield.log"]]}`",
"log_stream_name": "{instance_id}"
}
]
}
}
}
}
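Because the CloudWatch agent reads this file as JSON, one way to keep it well formed and define the log path in one place is to generate it. The sketch below builds the same logs.logs_collected.files.collect_list structure with json.dumps; {instance_id} is a placeholder that the CloudWatch agent itself expands in log_stream_name. The function name is hypothetical, and the sketch does not reproduce the Elastic Beanstalk Fn::Join substitution used for the log group name. Separately, the new file_path and log_group_name spell the log file heartsfield.log, while the awslogs-style content it appears to replace watched hartsfield.log; it may be worth confirming which name the application actually writes.

```python
import json


def cloudwatch_agent_logs_config(file_path, log_group_name, log_stream_name='{instance_id}'):
    """Return CloudWatch agent config JSON that ships a single log file (sketch)."""
    config = {
        'logs': {
            'logs_collected': {
                'files': {
                    'collect_list': [
                        {
                            'file_path': file_path,
                            'log_group_name': log_group_name,
                            # '{instance_id}' is expanded by the agent, not by Python.
                            'log_stream_name': log_stream_name,
                        },
                    ],
                },
            },
        },
    }
    # json.dumps guarantees the quoting and bracket balance are correct.
    return json.dumps(config, indent=2)
```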

6 changes: 3 additions & 3 deletions .env
@@ -1,7 +1,7 @@
####
# Vue.js environment variables and modes: https://cli.vuejs.org/guide/mode-and-env.html
# Only variables that start with 'VUE_APP_' will be statically embedded into the client bundle.
# Only variables that start with 'VITE_APP_' will be statically embedded into the client bundle.
####

VUE_APP_API_BASE_URL=''
VUE_APP_DEBUG=false
VITE_APP_API_BASE_URL=''
VITE_APP_DEBUG=false
6 changes: 3 additions & 3 deletions .env.development
@@ -1,7 +1,7 @@
####
# Vue.js environment variables and modes: https://cli.vuejs.org/guide/mode-and-env.html
# Only variables that start with 'VUE_APP_' will be statically embedded into the client bundle.
# Only variables that start with 'VITE_APP_' will be statically embedded into the client bundle.
####

VUE_APP_API_BASE_URL='http://localhost:5000'
VUE_APP_DEBUG=true
VITE_APP_API_BASE_URL='http://localhost:5000'
VITE_APP_DEBUG=true
2 changes: 1 addition & 1 deletion .nvmrc
@@ -1 +1 @@
16.14.0
20
4 changes: 4 additions & 0 deletions .platform/hooks/postdeploy/00_update_apache_config.sh
@@ -0,0 +1,4 @@
#!/bin/bash
sudo mv /tmp/hartsfield.conf /etc/httpd/conf.d/hartsfield.conf
sudo mv /tmp/ssl.conf /etc/httpd/conf.d/ssl.conf
sudo /bin/systemctl restart httpd.service
3 changes: 3 additions & 0 deletions .platform/hooks/postdeploy/01_start_awslogsd.sh
@@ -0,0 +1,3 @@
#!/bin/bash

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
45 changes: 32 additions & 13 deletions README.md
@@ -4,39 +4,58 @@ Hartsfield humbly supports UC Berkeley's DataHub.

![Hartsfield, re-imagined as a field of hearts.](src/assets/hEartsfield.png)

## Installation

* Install Python 3.8
* Create your virtual environment (venv)
* Install dependencies


## To run this locally:

### Install a python 3.11 venv in the project directory, activate it, and install requirements:

```
pip3 install -r requirements.txt [--upgrade]
python3.11 -m venv ./venv
source ./venv/bin/activate
pip install -r requirements.txt
```

### Install npm (the Node.js package manager), adjust the version, install the project dependencies, and do the "audit fix" if it complains that you should do so:

https://docs.npmjs.com/downloading-and-installing-node-js-and-npm

### Create local configurations
```
npm install -g npm@9.8.1
npm install
npm audit fix
```

### Securely install local configurations with secrets

If you plan to use any resources outside localhost, put your configurations in a separately encrypted area:
Put your configurations in a separately encrypted area outside of the project folder, which you will later export to environment variables. Ensure that your uid is in the AUTHORIZED_USERS list within that file.

```
mkdir /Volumes/XYZ/hartsfield_config
export HARTSFIELD_LOCAL_CONFIGS=/Volumes/XYZ/hartsfield_config
```
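This diff does not show how the back end consumes HARTSFIELD_LOCAL_CONFIGS (the Python code references a load_configs helper in hartsfield.configs that is not included here). As a minimal sketch of one common pattern, assume the directory holds a per-environment Python file named <env>-local.py: execute it and overlay only its UPPERCASE names onto the defaults, the same convention Flask's Config.from_pyfile follows. The file-naming scheme and the load_local_configs function are assumptions for illustration, not Hartsfield's actual loader.

```python
import os
import runpy


def load_local_configs(config, env=None):
    """Overlay UPPERCASE settings from an external, per-environment Python file.

    Hypothetical layout: $HARTSFIELD_LOCAL_CONFIGS/<env>-local.py, where <env>
    defaults to $HARTSFIELD_ENV (or 'development'). Returns the updated dict.
    """
    config_dir = os.environ.get('HARTSFIELD_LOCAL_CONFIGS')
    if not config_dir:
        return config  # Nothing to overlay; keep the defaults.
    env = env or os.environ.get('HARTSFIELD_ENV', 'development')
    path = os.path.join(config_dir, f'{env}-local.py')
    if not os.path.isfile(path):
        return config
    # Execute the file in a fresh namespace and copy only UPPERCASE names,
    # so helper variables and imports in the file are ignored.
    for key, value in runpy.run_path(path).items():
        if key.isupper():
            config[key] = value
    return config
```

Keeping secrets in a directory outside the project, as the README suggests, means they never sit in the working tree that git or the build artifacts see.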

## Greg does a jam like this from a pair of terminals in VSCode to run this locally:
### Run one terminal session for the python back end...

```
source venv/bin/activate
export HARTSFIELD_LOCAL_CONFIGS=/Users/gregm/rip_hartsfield/hartsfield_config
source .env.development
export HARTSFIELD_LOCAL_CONFIGS=/Volumes/XYZ/hartsfield_config
export HARTSFIELD_ENV=development
venv/bin/python application.py
```
and
### ...and another terminal session for the Node.js front end:
```
source venv/bin/activate
export HARTSFIELD_LOCAL_CONFIGS=/Users/gregm/rip_hartsfield/hartsfield_config
source .env.development
export HARTSFIELD_LOCAL_CONFIGS=/Volumes/XYZ/hartsfield_config
export HARTSFIELD_ENV=development
npm run serve-vue
```
Review comment (Contributor):

The source .env.development line seems unnecessary in both instances above; Vite should pick the file up automatically when running npm run serve-vue.
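The reviewer's point matches Vite's documented env handling: Vite itself loads .env, .env.local, .env.[mode] and .env.[mode].local (later files win), and by default exposes to client code only variables prefixed VITE_, which the VITE_APP_ names satisfy. Assuming serve-vue starts the Vite dev server, whose default mode is development, .env.development is loaded without any shell step. Also, because the assignments in that file are not exported, sourcing it in bash would not pass them to the child npm process anyway. The Python sketch below only emulates the merge-and-filter behavior to make it concrete; it is not Vite's code, it ignores variables already set in the process environment (which Vite gives precedence), and parse_dotenv and resolve_client_env are hypothetical names.

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments (no multiline support)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        value = value.strip()
        # Strip one pair of matching surrounding quotes, as dotenv parsers do.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
            value = value[1:-1]
        values[key.strip()] = value
    return values


def resolve_client_env(files, mode, prefix='VITE_'):
    """Merge env files in Vite's documented priority order and keep only prefixed keys.

    `files` maps a file name (e.g. '.env.development') to its text; absent files
    are skipped. Later files in load_order override earlier ones.
    """
    load_order = ['.env', '.env.local', f'.env.{mode}', f'.env.{mode}.local']
    merged = {}
    for name in load_order:
        if name in files:
            merged.update(parse_dotenv(files[name]))
    return {key: value for key, value in merged.items() if key.startswith(prefix)}
```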


## Using the application

Browse to http://localhost:8080/ -- but note that the first access will take up to several minutes as all of the Node.js stuff does its thing! Subsequent accesses are fast.

## A diagram of the intended function of the application:

![Diagram of Hartsfield front end, back end, GCP components, and their relationships.](src/assets/2023-10-03_Hartsfield_diagram.png)
32 changes: 32 additions & 0 deletions buildspec.yml
@@ -0,0 +1,32 @@
version: 0.2

phases:
install:
runtime-versions:
nodejs: 20
(gmerritt marked the review conversation on this line as resolved.)
python: 3.11
commands:
- node -v
- npm install
pre_build:
commands:
- echo "pre_build phase"
build:
commands:
- npm run build-vue
post_build:
commands:
- ./scripts/codebuild/create-build-summary.sh
artifacts:
files:
- '.ebextensions/**/*'
- 'dist/**/*'
- 'requirements.txt'
- 'hartsfield/**/*'
- 'scripts/**/*'
- 'application.py'
- 'consoler.py'
- 'config/**/*'
- '.platform/**/*'
- 'Procfile'

7 changes: 7 additions & 0 deletions config/default.py
@@ -25,6 +25,13 @@

import logging

AWS_SECRETS_REGION = "us-west-2"
AWS_SECRETS_NAME_AUTHORIZED_USERS = "AUTHORIZED_USERS"
AWS_SECRETS_NAME_GCP_JSON_CREDENTIALS = "GCP_JSON_CREDENTIALS"

CAS_SERVER = 'https://auth-test.berkeley.edu/cas/'
CAS_LOGOUT_URL = 'https://auth-test.berkeley.edu/cas/logout'

DEV_AUTH_ENABLED = False
DEV_AUTH_PASSWORD = 'another secret'
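The read_aws_secret module that datahub_archive_url_fetch.py imports is not part of this diff. A minimal sketch of what such a helper could look like with boto3's Secrets Manager client (boto3.client('secretsmanager', region_name=...) and get_secret_value(SecretId=...) are real boto3 calls), assuming the AWS_SECRETS_NAME_* values above are used directly as SecretId values and the region comes from AWS_SECRETS_REGION. The optional client parameter and read_json_secret are hypothetical additions that let the logic be exercised without AWS access.

```python
import json


def read_aws_secret(secret_name, region='us-west-2', client=None):
    """Return the string value of an AWS Secrets Manager secret.

    `client` may be any object with a boto3-style get_secret_value(SecretId=...)
    method; when omitted, a real boto3 Secrets Manager client is created.
    """
    if client is None:
        import boto3  # Deferred so the function can be exercised with a stub client.
        client = boto3.client('secretsmanager', region_name=region)
    response = client.get_secret_value(SecretId=secret_name)
    if 'SecretString' in response:
        return response['SecretString']
    # Binary secrets come back from boto3 as bytes.
    return response['SecretBinary'].decode('utf-8')


def read_json_secret(secret_name, **kwargs):
    """Parse a secret whose value is a JSON document, e.g. GCP service-account credentials."""
    return json.loads(read_aws_secret(secret_name, **kwargs))
```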

59 changes: 59 additions & 0 deletions hartsfield/api/auth_helper.py
@@ -0,0 +1,59 @@
"""
Copyright ©2022. The Regents of the University of California (Regents). All Rights Reserved.

Permission to use, copy, modify, and distribute this software and its documentation
for educational, research, and not-for-profit purposes, without fee and without a
signed licensing agreement, is hereby granted, provided that the above copyright
notice, this paragraph and the following two paragraphs appear in all copies,
modifications, and distributions.

Contact The Office of Technology Licensing, UC Berkeley, 2150 Shattuck Avenue,
Suite 510, Berkeley, CA 94720-1620, (510) 643-7201, otl@berkeley.edu,
http://ipira.berkeley.edu/industry-info for commercial licensing opportunities.

IN NO EVENT SHALL REGENTS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL,
INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF
THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF REGENTS HAS BEEN ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.

REGENTS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE
SOFTWARE AND ACCOMPANYING DOCUMENTATION, IF ANY, PROVIDED HEREUNDER IS PROVIDED
"AS IS". REGENTS HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES,
ENHANCEMENTS, OR MODIFICATIONS.
"""

from functools import wraps

from flask import current_app as app, request
from flask_login import current_user
from hartsfield.api.errors import UnauthorizedRequestError
from hartsfield.models.user import find_by_uid


def auth_required(f):
@wraps(f)
def decorated(*args, **kwargs):
if not current_user.is_authenticated:
auth = request.authorization
if not auth or not valid_worker_credentials(auth.username, auth.password):
raise UnauthorizedRequestError('Invalid credentials.')
return f(*args, **kwargs)
return decorated


def authorzied_user_required(f):
@wraps(f)
def decorated(*args, **kwargs):
uid = current_user.uid
user = find_by_uid(uid)
if user is None:
auth = request.authorization
if not auth:
raise UnauthorizedRequestError('Invalid credentials.')
return f(*args, **kwargs)
return decorated


def valid_worker_credentials(username, password):
return username == app.config['API_USERNAME'] and password == app.config['API_PASSWORD']
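As written, authorzied_user_required appears to reject a request only when find_by_uid returns None and there is no Authorization header; if a header is present, the request proceeds without those credentials being checked. It also reads current_user.uid before checking is_authenticated, which raises AttributeError if current_user is flask_login's default anonymous user (an app-specific anonymous user class could avoid this). Below is a sketch of a stricter, fail-closed version, with current_user and find_by_uid injected as callables so it runs without Flask; UnauthorizedRequestError is a stand-in for the project's class and make_authorized_user_required is a hypothetical name. It also compares worker credentials with hmac.compare_digest, the standard library's constant-time comparison. The sketch spells the decorator "authorized"; renaming the real one would also mean updating its import in datahub_archive_url_fetch.py.

```python
import hmac
from functools import wraps


class UnauthorizedRequestError(Exception):
    """Stand-in for hartsfield.api.errors.UnauthorizedRequestError."""


def make_authorized_user_required(get_current_user, find_by_uid):
    """Build a decorator that admits only authenticated users found by find_by_uid.

    In the app, get_current_user would return flask_login's current_user and
    find_by_uid would be hartsfield.models.user.find_by_uid.
    """
    def authorized_user_required(f):
        @wraps(f)
        def decorated(*args, **kwargs):
            user = get_current_user()
            # Check authentication first: an anonymous user may have no uid attribute.
            if not getattr(user, 'is_authenticated', False):
                raise UnauthorizedRequestError('Authentication required.')
            uid = getattr(user, 'uid', None)
            # Fail closed: an unknown uid is rejected regardless of request headers.
            if uid is None or find_by_uid(uid) is None:
                raise UnauthorizedRequestError('User is not authorized.')
            return f(*args, **kwargs)
        return decorated
    return authorized_user_required


def valid_worker_credentials(username, password, expected_username, expected_password):
    """Check API credentials using constant-time comparisons."""
    if not all(isinstance(v, str) for v in (username, password, expected_username, expected_password)):
        return False
    # Evaluate both comparisons so timing does not reveal which one failed.
    username_ok = hmac.compare_digest(username.encode(), expected_username.encode())
    password_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return username_ok and password_ok
```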
70 changes: 32 additions & 38 deletions hartsfield/api/datahub_archive_url_fetch.py
@@ -22,79 +22,73 @@
"AS IS". REGENTS HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES,
ENHANCEMENTS, OR MODIFICATIONS.
"""
from collections import OrderedDict
import json

from flask import current_app as app
from hartsfield import __version__ as version
from hartsfield.configs import load_configs
from hartsfield.api.config_controller import load_json
from hartsfield.lib.http import tolerant_jsonify
from hartsfield.lib.util import get_eb_environment

import requests
import datetime
import json
import re

from google.oauth2 import service_account
from flask import current_app as app, request
from google.cloud import storage
from google.oauth2 import service_account
from hartsfield.api.auth_helper import authorzied_user_required
from hartsfield.lib.http import tolerant_jsonify
import hartsfield.api.read_aws_secret

PUBLIC_CONFIGS = [
'DEV_AUTH_ENABLED',
'HARTSFIELD_ENV',
'TIMEZONE',
]

gcp_json_credentials = app.config['GCP_JSON_CREDENTIALS']
gcp_json_credentials_dict = json.loads(gcp_json_credentials)
AWS_SECRETS_NAME_GCP_JSON_CREDENTIALS = app.config['AWS_SECRETS_NAME_GCP_JSON_CREDENTIALS']


gcp_json_credentials_from_aws = hartsfield.api.read_aws_secret.read_aws_secret(AWS_SECRETS_NAME_GCP_JSON_CREDENTIALS)
gcp_json_credentials_dict = json.loads(gcp_json_credentials_from_aws)

# TODO: pass in gs url as input value to @app.route('/api/fetch_url_direct') from form user front-end form submission
gs_source_url="gs://ucb-datahub-archived-homedirs/spring-2021/datahub.berkeley.edu/peterphu-2edo.tar.gz"
# This will probably be request.args['gs_source_url'] in the def block...but that whole "request" business needs to be brought in etc.

@app.route('/api/fetch_url_direct')
@app.route('/api/fetch_url_direct', methods=['POST'])
@authorzied_user_required
def fetch_url_direct():

# parse the input gs url to get bucket and blob names
bucket_and_blob_string = gs_source_url.replace("gs://", "")
bucket_and_blob_list = bucket_and_blob_string.split("/")
params = request.get_json()
gs_source_url = params.get('gsSourceUrl')
gs_source_url = gs_source_url.strip()
if not re.match(r'gs://.{3,}/.+', gs_source_url):
error_message = 'The submitted data \"' + gs_source_url + '\" is not a valid gsSourceUrl.'
v = {'response': error_message, 'status': 'error'}
return tolerant_jsonify(v)
Review comment (Contributor):

The above looks good. That said, I suggest the following for line 58:

return tolerant_jsonify(
    {'message': error_message},
    500,
)

This will make for more standard front-end code (e.g., checking the HTTP status code).
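One way to act on this suggestion, sketched without Flask: parse and validate the submitted URL in one place and hand tolerant_jsonify a (body, status) pair. This assumes, as the reviewer's snippet implies, that tolerant_jsonify accepts a status code as its second argument. The sketch returns 400 rather than 500, since a malformed URL is a client input error rather than a server fault. It also guards against a missing gsSourceUrl (the current code would call .strip() on None) and anchors the whole string with re.fullmatch, whereas re.match anchors only at the start and the current pattern's .{3,} does not constrain the bucket part. The bucket-name pattern is a simplified reading of Google Cloud Storage naming rules, and parse_gs_url and validation_response are hypothetical names.

```python
import re

# Simplified GCS bucket rule: 3-63 chars of lowercase letters, digits, '.', '_', '-',
# starting and ending with a letter or digit. The full rules have more cases.
GS_URL_PATTERN = re.compile(r'gs://([a-z0-9][a-z0-9._-]{1,61}[a-z0-9])/(.+)')


def parse_gs_url(raw):
    """Return (bucket_name, blob_name) or raise ValueError for unusable input."""
    if not isinstance(raw, str):
        raise ValueError('gsSourceUrl is missing.')
    match = GS_URL_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f'The submitted data "{raw}" is not a valid gsSourceUrl.')
    return match.group(1), match.group(2)


def validation_response(params):
    """Return (body, HTTP status) in the shape tolerant_jsonify(body, status) would receive."""
    try:
        bucket_name, blob_name = parse_gs_url((params or {}).get('gsSourceUrl'))
    except ValueError as e:
        # 400 signals a problem with the client's input.
        return {'message': str(e)}, 400
    return {'bucket': bucket_name, 'blob': blob_name}, 200
```

With the status carried in the HTTP response, the front end can branch on the response status instead of inspecting a 'status' field in the JSON body.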


bucket_and_blob_string = gs_source_url.replace('gs://', '')
bucket_and_blob_list = bucket_and_blob_string.split('/')
bucket_name = bucket_and_blob_list.pop(0)
blob_name = "/".join(bucket_and_blob_list)
blob_name = '/'.join(bucket_and_blob_list)

# instantiate gcp storage client plus with bucket and blob objects
credentials = service_account.Credentials.from_service_account_info(gcp_json_credentials_dict)
storage_client = storage.Client(project=gcp_json_credentials_dict['project_id'], credentials=credentials)
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(blob_name)

# do some checks to confirm that the bucket and blob exist
try:
stats = storage.Blob(bucket=bucket, name=blob_name).exists(storage_client)
except Exception as e:
error_message = "There was a problem trying to get stats on the requested blob \"" + blob_name + "\" in the requested bucket \"" + bucket_name +"\":\n\n " + str(e)
error_message = f"""There was an exception trying to do the GCP storage operation
with the submitted data "{gs_source_url}".
When GCP tried, it told us: "{str(e)}"
"""
v = {'response': error_message, 'status': 'error'}
return tolerant_jsonify(v)
if stats:
# if the bucket and blob exist, generate a signed url for the blob...
# ...and package it as a Hartsfield back-end internal response
gcp_response = blob.generate_signed_url(
version="v4",
version='v4',
expiration=datetime.timedelta(days=7),
method="GET",
method='GET',
)
v = {'response': gcp_response, 'status': 'success'}
else:
gcp_response = "File \"" + blob_name + "\"does not exist in bucket \"" + bucket_name + "\""
gcp_response = f'GCP tried, but could not locate a file "{blob_name}" in a bucket called "{bucket_name}".'
v = {'response': gcp_response, 'status': 'error'}

return tolerant_jsonify(v)

"""
To make/fix/clean:

- Make the Web request form / wire up web front end portion of app

- CalNet auth in front of web app

- All of the other ignorant/non-ideal coding practices I've done...!
"""
