Updated pdf version specs #400

Closed
wants to merge 49 commits into from
49 commits
b6d9875
changes to generate pdf-version of the specs
Arshitha Nov 22, 2019
30e3aab
included name to contributors file
Arshitha Nov 22, 2019
b3ddd2e
Fixed bugs as suggested on the pdf-version issue
Arshitha Jan 16, 2020
877a303
Merge pull request #1 from bids-standard/master
Arshitha Jan 17, 2020
5a5dc3f
Merge branch 'pdf-version' into master
Arshitha Jan 20, 2020
4fefe5b
updated branch with most recent commits from base repo
Arshitha Jan 22, 2020
4a2601c
deleted all files
Arshitha Jan 22, 2020
8aa83e6
added most recent version of files
Arshitha Jan 22, 2020
4794112
new branch with more readable commit history
Arshitha Jan 25, 2020
a5667bf
removed remove_cross_internal_links function
Arshitha Jan 25, 2020
c00d029
added comments and cleaned up code
Arshitha Jan 25, 2020
a3efecf
added readme and cleaned up code
Arshitha Jan 30, 2020
e9538f4
reverted modifications to CHANGES.md and fixed file naming to bids-sp…
Arshitha Feb 6, 2020
626f1a4
adding title and version details to the cover page
Arshitha Feb 18, 2020
0ec12cb
Merge remote-tracking branch 'upstream/master'
Arshitha Feb 18, 2020
3476e74
Merge branch 'master' into updated-pdf-version-specs
Arshitha Feb 18, 2020
1574bcb
fixed markdown file flagged by Travis CI build
Arshitha Feb 18, 2020
1ea328e
column width adjustment and typo fix
Arshitha Feb 18, 2020
62328e7
Changes based on code review
Arshitha Feb 20, 2020
ff41bde
modified circleci config
Arshitha Feb 20, 2020
206255d
formatting changes to circleci config
Arshitha Feb 20, 2020
fce1b0f
added store_artifacts command to config
Arshitha Feb 20, 2020
354ca4e
formatting changes to circleci config
Arshitha Feb 20, 2020
20f9388
recreated config to fix indentation errors
Arshitha Feb 20, 2020
37e9e87
added build_docs_pdf job as part of the workflow
Arshitha Feb 20, 2020
24e34a1
changed path in config
Arshitha Feb 20, 2020
4f34472
circleci config working dir changes
Arshitha Feb 20, 2020
caa442b
circleci config changes
Arshitha Feb 20, 2020
621f329
docker version change
Arshitha Feb 20, 2020
6232eb5
changing relative paths for circleci build
Arshitha Feb 20, 2020
1ddeadf
path change to match config
Arshitha Feb 20, 2020
39aa2e5
relative path fix
Arshitha Feb 20, 2020
6dbb810
circleci build debug
Arshitha Feb 20, 2020
725a87c
circleci build debug
Arshitha Feb 20, 2020
68db019
circleci build debug
Arshitha Feb 20, 2020
d1ecfd9
circleci build debug
Arshitha Feb 21, 2020
8de0662
circleci build debug
Arshitha Feb 21, 2020
f524d78
few more changes to relative paths
Arshitha Feb 21, 2020
e6e912a
fixing cp flag
Arshitha Feb 21, 2020
98658b9
testing cp behaviour within docker
Arshitha Feb 21, 2020
6475ad3
circleci build debug
Arshitha Feb 21, 2020
21b5b20
changing config
Arshitha Feb 21, 2020
3a353cf
pulling header string from mkdocs.yml
Arshitha Feb 21, 2020
d1a2d3c
remove generated pdf from repo
Arshitha Feb 21, 2020
29ca8ef
cleaned up code and final checks with circleci
Arshitha Feb 21, 2020
400712e
fixing travisCI build error
Arshitha Feb 21, 2020
44be12f
reverting back changes 01-magnetic-resonance-imaging-data.md
Arshitha Feb 21, 2020
898f7cd
STY: Fixed Markdown style
Arshitha Feb 22, 2020
1b11312
column width adjustment
Arshitha Feb 24, 2020
Binary file added bids-specs.pdf
Binary file not shown.
41 changes: 41 additions & 0 deletions pdf_build_src/README.md
@@ -0,0 +1,41 @@
# pdf version of the BIDS specification

The `pdf_build_src` directory contains the scripts and tex files required to build a pdf document of the BIDS specification from multiple markdown files using pandoc.

Pandoc is a command-line tool, and also a Haskell library, that converts files from one markup format to another. More information: https://pandoc.org/index.html

## Requirements

For the pdf build to be successful, the following need to be installed:

- Python 3.x
- pandoc
- A recent LaTeX distribution that includes XeLaTeX: pandoc creates PDFs using LaTeX, and `pandoc_script.py` selects the XeLaTeX engine with `--pdf-engine=xelatex`. Because a full MacTeX installation uses four gigabytes of disk space, pandoc recommends BasicTeX or TinyTeX, using the `tlmgr` tool to install additional packages as needed.

Installation instructions for both pandoc and LaTeX: https://pandoc.org/installing.html
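
Before building, a quick check that the required commands are available can save a failed run. A minimal sketch, assuming the executables are called `python3`, `pandoc` and `xelatex` on your system; `missing_tools` is a hypothetical helper and not part of this PR:

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on the PATH."""
    # shutil.which returns None when no matching executable is found
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["python3", "pandoc", "xelatex"])
    if missing:
        print("Missing requirements: " + ", ".join(missing))
    else:
        print("All requirements found")
```

LaTeX packages installed through `tlmgr` are not covered by this check; a missing package only shows up when XeLaTeX runs.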

## Building the pdf document

From the `pdf_build_src` directory, run `sh build_pdf.sh` in a terminal.

The build prints warnings for characters, such as emojis, that are missing from the font used for the pdf. Apart from those characters being dropped from the final document, the formatting and contents are unaffected, so these warnings can be ignored.
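
To find such characters before building, one rough heuristic is to list the characters above U+FFFF, the range where most emojis live. This is an approximation that does not consult the actual font, and `find_astral_chars` and `scan_markdown` are hypothetical helpers:

```python
from pathlib import Path


def find_astral_chars(text):
    """Return the distinct characters above U+FFFF in `text`, in order of first appearance."""
    found = []
    for char in text:
        if ord(char) > 0xFFFF and char not in found:
            found.append(char)
    return found


def scan_markdown(src_dir="../src"):
    """Map each markdown file under `src_dir` to the characters above U+FFFF it contains."""
    report = {}
    for path in sorted(Path(src_dir).rglob("*.md")):
        chars = find_astral_chars(path.read_text(encoding="utf-8"))
        if chars:
            report[str(path)] = chars
    return report
```

For example, `print(scan_markdown())` run from `pdf_build_src` lists the affected files.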

## Technical Overview

Pandoc offers many options for formatting the resulting document. To build a pdf from multiple markdown files, pandoc first combines them into a single intermediate tex file, which is then compiled into the pdf. Additional tex files, passed in through pandoc's options, give the final pdf its desired formatting.
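
When debugging the formatting, it can help to make the intermediate step explicit: ask pandoc for a standalone `.tex` file (`-s`), inspect it, then compile it with XeLaTeX yourself. A sketch under those assumptions; the flags mirror those in `pandoc_script.py`, while `two_stage_commands` and the `bids-specs.tex` file name are hypothetical:

```python
def two_stage_commands(markdown_files, tex_file="bids-specs.tex"):
    """Return argv lists: pandoc writes a standalone .tex file, then XeLaTeX compiles it."""
    pandoc_cmd = [
        "pandoc", *markdown_files,
        "-f", "markdown_github",
        "--include-before-body", "cover.tex",
        "--toc",
        "-V", "documentclass=report",
        "--listings",
        "-H", "listings_setup.tex",
        "-H", "header.tex",
        "-V", "linkcolor:blue",
        "-V", "geometry:a4paper",
        "-V", "geometry:margin=2cm",
        # -s produces a complete LaTeX document instead of a fragment
        "-s", "-o", tex_file,
    ]
    # two XeLaTeX passes so the table of contents requested by --toc is filled in
    xelatex_cmd = ["xelatex", "-interaction=nonstopmode", tex_file]
    return [pandoc_cmd, xelatex_cmd, list(xelatex_cmd)]
```

Each list can be passed to `subprocess.run(cmd, check=True)` from inside `src_copy`, where `build_pdf.sh` places `cover.tex` and the other tex files.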

### Formatting files

`listings_setup.tex` - Listings is a LaTeX package for typesetting programming code. This file configures the listings package for our needs; it is included with the `-H` option, while `--listings` tells pandoc to use the package for code blocks.

`cover.tex` - Places the BIDS logo on the cover page of the document. Used with the `--include-before-body` option.

`header.tex` - Sets the page header, which is updated with the latest version number and release date when `build_pdf.sh` is run. Used with the `-H` option.
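
The header update could also be done without relying on `\chead` being the second-to-last line of `header.tex`. A sketch, assuming `CHANGES.md` release headings of the form `## [v1.2.1](...) (2019-08-14)`, which is the format `process_markdowns.py` parses; `latest_release` and `update_chead` are hypothetical helpers:

```python
import re

# Matches a release heading such as "## [v1.2.1](https://...) (2019-08-14)"
RELEASE_HEADING = re.compile(r"^##\s\[(v[^\]]+)\]\([^)]*\)\s\(([^)]+)\)")


def latest_release(changes_text):
    """Return (version, date) from the first release heading, or None if there is none."""
    for line in changes_text.splitlines():
        match = RELEASE_HEADING.match(line)
        if match:
            return match.group(1), match.group(2)
    return None


def update_chead(header_tex, version, date):
    """Replace the \\chead{...} line of header.tex, wherever it appears in the file."""
    new_line = r"\chead{Brain Imaging Data Structure " + version + " " + date + "}"
    # a function replacement stops re from interpreting the backslash in new_line
    return re.sub(r"^\\chead\{.*\}$", lambda _match: new_line, header_tex, flags=re.MULTILINE)
```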

### Scripts

`process_markdowns.py` - Copies the `src` directory to a temporary `src_copy` directory and modifies the copied markdown files for the needs of the pdf (collecting images, updating `header.tex`, removing internal links).

`pandoc_script.py` - Assembles and runs the final pandoc command; it is invoked by `build_pdf.sh`.

`build_pdf.sh` - Shell script that organizes the directory structure, runs the two Python scripts above, and deletes the temporary directory afterwards.
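
The three scripts follow a copy, process, build pattern on a throwaway copy of `src`. A minimal Python sketch of that pattern, assuming each step is a function that takes the working directory; `run_in_scratch_copy` is hypothetical, and unlike `build_pdf.sh` it places the copy in a temporary directory that is removed even when a step fails:

```python
import shutil
import tempfile
from pathlib import Path


def run_in_scratch_copy(src_dir, steps):
    """Copy `src_dir` to a temporary directory, run each step on the copy, then delete it.

    Each step is a callable taking the working directory (a Path); the list of
    their return values is returned. The original `src_dir` is never modified.
    """
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp) / "src_copy"
        shutil.copytree(src_dir, work_dir)
        results = [step(work_dir) for step in steps]
    # leaving the with-block deletes tmp, including work_dir, even if a step raised
    return results
```

The final step would need to copy the generated pdf out of the working directory, since everything inside it is deleted afterwards.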
Binary file added pdf_build_src/bids-specs.pdf
Binary file not shown.
19 changes: 19 additions & 0 deletions pdf_build_src/build_pdf.sh
@@ -0,0 +1,19 @@
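#!/bin/sh
# Shell script that runs process_markdowns.py and pandoc_script.py in sequence to build the pdf document

# stop at the first command that fails
set -e

# run from the pdf_build_src directory, wherever the script is called from
cd "$(dirname "$0")"

# prepare the copied src directory
python process_markdowns.py

# copy pandoc_script.py and the tex formatting files into the temporary src_copy directory
cp pandoc_script.py header.tex cover.tex listings_setup.tex src_copy

# run pandoc_script from src_copy directory
cd src_copy
python pandoc_script.py
mv bids-specs.pdf ..
cd ..

# delete the duplicated src directory
rm -rf src_copy

# open bids-specs.pdf: `open` on macOS, `xdg-open` on most Linux desktops
case "$(uname)" in
    Darwin) open bids-specs.pdf ;;
    *) if command -v xdg-open >/dev/null 2>&1; then xdg-open bids-specs.pdf || true; fi ;;
esac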
3 changes: 3 additions & 0 deletions pdf_build_src/cover.tex
@@ -0,0 +1,3 @@
% adds the bids logo as the cover page of the pdf
\includegraphics{images/BIDS_logo.jpg}
\thispagestyle{empty}
6 changes: 6 additions & 0 deletions pdf_build_src/header.tex
@@ -0,0 +1,6 @@
% header file
\usepackage{fancyhdr}
\pagestyle{fancy}
\fancyhf{}
\chead{Brain Imaging Data Structure v1.2.1 2019-08-14}
\fancyfoot[LE,RO]{\thepage}
25 changes: 25 additions & 0 deletions pdf_build_src/listings_setup.tex
@@ -0,0 +1,25 @@
% Contents of listings_setup.tex
\usepackage{xcolor}
% graphicx provides \includegraphics, used in cover.tex
\usepackage{graphicx}

\lstset{
  basicstyle=\ttfamily,
  numbers=left,
  keywordstyle=\color[rgb]{0.13,0.29,0.53}\bfseries,
  stringstyle=\color[rgb]{0.31,0.60,0.02},
  commentstyle=\color[rgb]{0.56,0.35,0.01}\itshape,
  numberstyle=\footnotesize,
  stepnumber=1,
  numbersep=5pt,
  backgroundcolor=\color[RGB]{248,248,248},
  showspaces=false,
  showstringspaces=false,
  showtabs=false,
  tabsize=2,
  captionpos=b,
  breaklines=true,
  breakautoindent=true,
  escapeinside={\%*}{*)},
  linewidth=\textwidth,
  basewidth=0.5em
}
37 changes: 37 additions & 0 deletions pdf_build_src/pandoc_script.py
@@ -0,0 +1,37 @@
"""
Once the duplicate src directory is processed, the pandoc library is used as a final
step to build the pdf.
"""
import os, sys
import argparse
import subprocess

def build_pdf(filename):
"""
constructs the command with the required pandoc flags and runs it using subprocess module
"""

markdown_list=[]
for root, dirs, files in os.walk('.'):
for file in files:
if file.endswith(".md") and file != 'index.md':
markdown_list.append(os.path.join(root, file))
elif file == 'index.md':
index_page = os.path.join(root, file)

default_pandoc_cmd ="pandoc "

# creates string of file paths in the order we'd like them to be appear
# ordering is taken care of by the inherent file naming
files_string = index_page + " " +" ".join(sorted(markdown_list))

flags = " -f markdown_github --include-before-body cover.tex --toc -V documentclass=report --listings -H \
listings_setup.tex -H header.tex -V linkcolor:blue -V geometry:a4paper -V geometry:margin=2cm --pdf-engine=xelatex -o "
output_filename = filename

cmd = default_pandoc_cmd + files_string + flags + output_filename
subprocess.run(cmd.split())

if __name__ =="__main__":

build_pdf('bids-specs.pdf')
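
The ordering rule in `build_pdf` (index.md first, everything else sorted by path) can be checked on its own. A sketch, with `pdf_file_order` as a hypothetical helper that applies a similar rule to a plain list of paths:

```python
import os


def pdf_file_order(paths):
    """Put index.md first, then the remaining .md files sorted by full path.

    Numeric prefixes such as 01-introduction.md make this lexicographic
    sort follow the order of the specification. Non-markdown paths are dropped.
    """
    index_pages = [p for p in paths if os.path.basename(p) == "index.md"]
    others = sorted(
        p for p in paths
        if p.endswith(".md") and os.path.basename(p) != "index.md"
    )
    return index_pages[:1] + others


if __name__ == "__main__":
    print(pdf_file_order(["./02-common-principles.md", "./index.md", "./01-introduction.md"]))
    # prints ['./index.md', './01-introduction.md', './02-common-principles.md']
```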
162 changes: 162 additions & 0 deletions pdf_build_src/process_markdowns.py
@@ -0,0 +1,162 @@
"""
The purpose of the script is to create a duplicate src directory within which all of
the markdown files are processed to match the specifications of building a pdf from multiple
markdown files using the pandoc library (***add link to pandoc library documentation***) with
pdf specific text rendering in mind as well.

"""

import os, sys
import argparse
import subprocess
import re
import fileinput
import io


def run_shell_cmd(command):
"""
runs shell/bash commands passed as a string using the subprocess module
"""
process = subprocess.Popen(command.split(), stdout=subprocess.PIPE,stderr=subprocess.PIPE)
output = process.stdout.read()

return output.decode('utf-8')


def copy_src():
"""
duplicating src directory by copying contents of src to a new but temporary directory named 'src_copy'
"""

# source and target directories
src_path = "../src/"
target_path = "src_copy"

# make new directory
mkdir_cmd = "mkdir "+target_path
run_shell_cmd(mkdir_cmd)

# copy contents of src directory
copy_cmd = "cp -a "+src_path+" "+target_path
run_shell_cmd(copy_cmd)


def copy_bids_logo():
"""
copies BIDS_logo.jpg from the BIDS_logo directory in the root of the repo
"""
run_shell_cmd("cp ../BIDS_logo/BIDS_logo.jpg src_copy/images/")


def copy_images(root_path):
"""
copies images from images directory of subdirectories to images directory
in the src directory
"""
subdir_list = []

# walk through the src directory to find subdirectories named 'images'
# and copy contents to the 'images' directory in the duplicate src directory
for root, dirs, files in os.walk(root_path):
if 'images' in dirs:
subdir_list.append(root)

for each in subdir_list:
if each != root_path:
run_shell_cmd("cp -a "+each+"/images/"+" "+root_path+"/images/")


def extract_header_string():
    '''
    Extracts the latest release's version number and date from the changelog (CHANGES.md).
    '''
    released_versions = []

    with open('./src_copy/CHANGES.md', encoding='utf-8') as changelog:
        for line in changelog:
            # release headings look like "## [v1.2.1](...) (2019-08-14)"
            match_list = re.findall(r'^##\s\[v.+\]', line)

            if len(match_list) > 0:
                wordlist = line.split()
                released_versions.append([match_list[0].split()[1], wordlist[2]])

    # the changelog lists the newest release first
    version_number = released_versions[0][0].strip('[]')
    version_date = released_versions[0][1].strip('()')

    return version_number, version_date


def add_header():
    '''
    Adds the header string extracted from the changelog to the header.tex file.
    '''
    version_number, version_date = extract_header_string()

    # header string with the latest version number and date
    # (a raw string, so that "\c" is not read as an escape sequence)
    header_string = r"\chead{Brain Imaging Data Structure " + version_number + " " + version_date + "}"

    with open('header.tex', 'r') as file:
        data = file.readlines()

    # replace the second-to-last line (the \chead line); the newline has to be added back
    data[-2] = header_string + '\n'

    # re-write header.tex file with new header string
    with open('header.tex', 'w') as file:
        file.writelines(data)


def remove_internal_links(root_path, link_type):
    """
    Finds all cross-file and same-file markdown internal links
    and replaces each of them with the plain text of the link.
    """

    if link_type == 'cross':
        # matches links to other files, e.g. [text](02-common-principles.md#anchor):
        #   group 1: the link text, which must not start with "http"
        #   group 2: a non-http target ending in .md, .yml or .md#anchor
        # [^)]+ stops the target at the first ")", so two links on one line are matched separately
        primary_pattern = re.compile(r'\[((?!http).[\w\s.\(\)`*/–]+)\]\(((?!http)[^)]+(\.md|\.yml|\.md#[\w\-]+))\)')
    elif link_type == 'same':
        # matches links to sections within the same markdown file, e.g. [text](#section-name)
        primary_pattern = re.compile(r'\[([\w\s.\(\)`*/–]+)\]\(([#\w\-._]+)\)')
    else:
        raise ValueError("link_type must be 'cross' or 'same'")

    for root, dirs, files in os.walk(root_path):
        for file in files:
            if file.endswith(".md"):
                with open(os.path.join(root, file), 'r', encoding='utf-8') as markdown:
                    data = markdown.readlines()

                for ind, line in enumerate(data):
                    # replace every link on the line with its own link text (group 1)
                    data[ind] = primary_pattern.sub(r'\1', line)

                with open(os.path.join(root, file), 'w', encoding='utf-8') as markdown:
                    markdown.writelines(data)


if __name__ == '__main__':

    duplicated_src_dir_path = './src_copy'

    # Step 1: make a copy of the src directory in the current directory
    copy_src()

    # Step 2: copy BIDS_logo to images directory of the src_copy directory
    copy_bids_logo()

    # Step 3: copy images from subdirectories of src_copy directory
    copy_images(duplicated_src_dir_path)

    # Step 4: extract the latest version number and date and write them into header.tex
    add_header()

    # Step 5: remove all internal links
    remove_internal_links(duplicated_src_dir_path, 'cross')
    remove_internal_links(duplicated_src_dir_path, 'same')

2 changes: 1 addition & 1 deletion src/03-modality-agnostic-files.md
@@ -10,7 +10,7 @@ The file dataset_description.json is a JSON file describing the dataset. Every
dataset MUST include this file with the following fields:

| Field name | Definition |
| :----------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ------------------------------------------------------------------------------| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Name | REQUIRED. Name of the dataset. |
| BIDSVersion | REQUIRED. The version of the BIDS standard that was used. |
| License | RECOMMENDED. What license is this dataset distributed under? The use of license name abbreviations is suggested for specifying a license. A list of common licenses with suggested abbreviations can be found in Appendix II. |