bids-standard · Arshitha · Nov 22, 2019 · Nov 22, 2019 · Jan 16, 2020 · Jan 17, 2020
diff --git a/bids-specs.pdf b/bids-specs.pdf
diff --git a/pdf_build_src/README.md b/pdf_build_src/README.md
@@ -0,0 +1,41 @@
+# PDF version of the BIDS specification
+
+The `pdf_build_src` directory contains the scripts and tex files required to build a pdf document of the BIDS specification from multiple markdown files using the pandoc library. 
+
+Pandoc is a command-line tool, and also a Haskell library, that converts files from one markup format to another. More here: https://pandoc.org/index.html
+
+## Requirements
+
+For the pdf build to be successful, the following need to be installed: 
+
+- Python 3.x 
+- pandoc 
+- Latest version of LaTeX: By default, Pandoc creates PDFs using LaTeX. Because a full MacTeX installation uses four gigabytes of disk space, pandoc recommends BasicTeX or TinyTeX and using the tlmgr tool to install additional packages as needed. 
+
+Installation instructions for both pandoc and LaTeX: https://pandoc.org/installing.html
+
+## Building pdf document
+
+Run `build_pdf.sh` from within the `pdf_build_src` directory with the command `sh build_pdf.sh`.
+
+The warnings printed during the build are for characters, such as emojis, that go missing while converting from markdown to pdf. Apart from those characters being lost from the final document, the warnings do not affect its formatting or contents and can therefore be ignored.
+
+## Technical Overview
+
+Pandoc comes with many options to format the resulting document. To build a pdf from multiple markdown files, a consolidated intermediate tex file is first built, which is then converted to a pdf document. To achieve the desired formatting in the final pdf, additional tex files are passed to pandoc through the options listed below.
+
+### Formatting files
+
+`listings_setup.tex` - Listings is a LaTeX package used for typesetting programming code in TeX. This file sets up the listings package to suit our needs and is passed with the `-H` option alongside `--listings`.
+
+`cover.tex` - Adds the BIDS logo as the cover page of the document. Used with the `--include-before-body` option.
+
+`header.tex` - Header tex file that's updated with the latest version number and date when `build_pdf.sh` is run. Used with the `-H` header option. 
+
+### Scripts
+
+`process_markdowns.py` - Script that duplicates the `src` directory and modifies the copied markdown files for the needs of the pdf.
+
+`pandoc_script.py` - Prepares and runs the final pandoc command; it is called by the `build_pdf.sh` script.
+
+`build_pdf.sh` - Shell script that organizes the directory structure and runs the above two python scripts
diff --git a/pdf_build_src/bids-specs.pdf b/pdf_build_src/bids-specs.pdf
diff --git a/pdf_build_src/build_pdf.sh b/pdf_build_src/build_pdf.sh
@@ -0,0 +1,19 @@
+# Shell script that runs process_markdowns.py and pandoc_script.py in sequence to build the pdf document
+
+# prepare the copied src directory 
+python process_markdowns.py
+
+# copy pandoc_script.py and the tex files into the temp src_copy directory
+cp pandoc_script.py header.tex cover.tex listings_setup.tex src_copy
+
+# run pandoc_script from src_copy directory 
+cd src_copy
+python pandoc_script.py
+mv bids-specs.pdf ..
+cd ..
+
+# delete the duplicated src directory
+rm -rf src_copy
+
+# open bids-specs.pdf
+open bids-specs.pdf
diff --git a/pdf_build_src/cover.tex b/pdf_build_src/cover.tex
@@ -0,0 +1,3 @@
+% adds the bids logo as the cover page of the pdf
+\includegraphics{images/BIDS_logo.jpg}
+\thispagestyle{empty}
diff --git a/pdf_build_src/header.tex b/pdf_build_src/header.tex
@@ -0,0 +1,6 @@
+% header file
+\usepackage{fancyhdr}
+\pagestyle{fancy}
+\fancyhf{}
+\chead{Brain Imaging Data Structure v1.2.1 2019-08-14}
+\fancyfoot[LE,RO]{\thepage}
diff --git a/pdf_build_src/listings_setup.tex b/pdf_build_src/listings_setup.tex
@@ -0,0 +1,25 @@
+% Contents of listings_setup.tex
+\usepackage{xcolor}
+\usepackage{graphicx}
+
+\lstset{
+    basicstyle=\ttfamily,
+    numbers=left,
+    keywordstyle=\color[rgb]{0.13,0.29,0.53}\bfseries,
+    stringstyle=\color[rgb]{0.31,0.60,0.02},
+    commentstyle=\color[rgb]{0.56,0.35,0.01}\itshape,
+    numberstyle=\footnotesize,
+    stepnumber=1,
+    numbersep=5pt,
+    backgroundcolor=\color[RGB]{248,248,248},
+    showspaces=false,
+    showstringspaces=false,
+    showtabs=false,
+    tabsize=2,
+    captionpos=b,
+    breaklines=true,
+    breakautoindent=true,
+    escapeinside={\%*}{*)},
+    linewidth=\textwidth,
+    basewidth=0.5em
+}
diff --git a/pdf_build_src/pandoc_script.py b/pdf_build_src/pandoc_script.py
@@ -0,0 +1,37 @@
+"""
+Once the duplicate src directory is processed, the pandoc library is used as a final 
+step to build the pdf.
+"""
+import os, sys
+import argparse
+import subprocess
+
+def build_pdf(filename):
+	"""
+	constructs the command with the required pandoc flags and runs it using subprocess module
+	"""
+
+	markdown_list=[]
+	for root, dirs, files in os.walk('.'):
+		for file in files:
+			if file.endswith(".md") and file != 'index.md':
+				markdown_list.append(os.path.join(root, file))
+			elif file == 'index.md': 
+				index_page = os.path.join(root, file)
+
+	default_pandoc_cmd ="pandoc "
+
+	# creates a string of file paths in the order we'd like them to appear;
+	# ordering is taken care of by the inherent file naming
+	files_string = index_page + " " +" ".join(sorted(markdown_list)) 
+
+	flags = " -f markdown_github --include-before-body cover.tex --toc -V documentclass=report --listings -H \
+			listings_setup.tex -H header.tex -V linkcolor:blue -V geometry:a4paper -V geometry:margin=2cm --pdf-engine=xelatex -o " 
+	output_filename = filename
+
+	cmd = default_pandoc_cmd + files_string + flags + output_filename
+	subprocess.run(cmd.split())
+
+if __name__ =="__main__":
+
+	build_pdf('bids-specs.pdf')
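Review note: `build_pdf` assembles the command as one string and then calls `cmd.split()`, which would split any markdown path that contains a space into separate arguments. One possible alternative is to build the argument list directly. The sketch below is only an illustration: `build_pandoc_args` is a hypothetical helper name, not part of this PR, and the flags are copied unchanged from the string in `pandoc_script.py`.

```python
def build_pandoc_args(index_page, markdown_files, output_filename):
    """Return the pandoc command as a list of arguments (hypothetical helper).

    index_page comes first, followed by the remaining markdown files in
    sorted order, mirroring the ordering used in build_pdf().
    """
    args = ["pandoc", index_page, *sorted(markdown_files)]
    # Same flags as in pandoc_script.py, one list element per argument,
    # so no whitespace splitting is needed.
    args += [
        "-f", "markdown_github",
        "--include-before-body", "cover.tex",
        "--toc",
        "-V", "documentclass=report",
        "--listings",
        "-H", "listings_setup.tex",
        "-H", "header.tex",
        "-V", "linkcolor:blue",
        "-V", "geometry:a4paper",
        "-V", "geometry:margin=2cm",
        "--pdf-engine=xelatex",
        "-o", output_filename,
    ]
    return args
```

The resulting list could then be passed straight to `subprocess.run(args, check=True)`, where `check=True` would also make a failed pandoc run raise an error instead of passing silently.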
diff --git a/pdf_build_src/process_markdowns.py b/pdf_build_src/process_markdowns.py
@@ -0,0 +1,162 @@
+"""
+This script creates a duplicate of the src directory and processes all of the
+markdown files in it so that they are suitable for building a pdf from multiple
+markdown files using the pandoc library (https://pandoc.org/index.html), with
+pdf-specific text rendering in mind as well.
+
+"""
+
+import os, sys
+import argparse
+import subprocess
+import re
+import fileinput
+import io
+
+
+def run_shell_cmd(command): 
+	"""
+	runs shell/bash commands passed as a string using the subprocess module 
+	"""
+	process = subprocess.Popen(command.split(), stdout=subprocess.PIPE,stderr=subprocess.PIPE)
+	output, _ = process.communicate()
+
+	return output.decode('utf-8')
+
+
+def copy_src():
+	"""
+	duplicating src directory by copying contents of src to a new but temporary directory named 'src_copy'
+	"""
+
+	# source and target directories; the trailing "/." makes cp copy the contents of src
+	src_path = "../src/."
+	target_path = "src_copy"
+
+	# make new directory 
+	mkdir_cmd = "mkdir "+target_path
+	run_shell_cmd(mkdir_cmd)
+
+	# copy contents of src directory 
+	copy_cmd = "cp -a "+src_path+" "+target_path
+	run_shell_cmd(copy_cmd)
+
+
+def copy_bids_logo():
+	"""
+	copies BIDS_logo.jpg from the BIDS_logo directory in the root of the repo
+	"""
+	run_shell_cmd("cp ../BIDS_logo/BIDS_logo.jpg src_copy/images/") 
+
+
+def copy_images(root_path): 
+	"""
+	copies images from images directory of subdirectories to images directory 
+	in the src directory
+	"""
+	subdir_list = []
+
+	# walk through the src directory to find subdirectories named 'images'
+	# and copy contents to the 'images' directory in the duplicate src directory 
+	for root, dirs, files in os.walk(root_path):
+		if 'images' in dirs: 
+			subdir_list.append(root)
+
+	for each in subdir_list:
+		if each != root_path: 
+			run_shell_cmd("cp -a "+each+"/images/."+" "+root_path+"/images/")
+
+
+def extract_header_string():
+	'''
+	extracts the latest release's version number and date from changelog or CHANGES.md file
+	'''
+	released_versions = []
+
+	with open('./src_copy/CHANGES.md') as changelog:
+		for line in changelog:
+			match_list = re.findall(r'^##\s\[v.+\]',line)
+
+			if len(match_list) > 0: 
+				wordlist = line.split()
+				released_versions.append([match_list[0].split()[1], wordlist[2] ])
+
+	version_number = released_versions[0][0].strip('[]')
+	version_date = released_versions[0][1].strip('()')
+
+	return version_number, version_date
+
+
+def add_header(): 
+	'''
+	adds the header string extracted from changelog to header.tex file 
+	'''
+
+	version_number, version_date = extract_header_string() 
+
+	# creating a header string with latest version number and date
+	header_string = "\\chead{Brain Imaging Data Structure "+ version_number +" "+ version_date+"}"
+
+	with open('header.tex', 'r') as file:
+		data = file.readlines()
+
+	# now change the second-to-last line; note that you have to add a newline
+	data[-2] = header_string+'\n'
+
+	# re-write header.tex file with new header string
+	with open('header.tex', 'w') as file:
+		file.writelines( data )
+
+
+def remove_internal_links(root_path, link_type):
+	"""
+	finds all internal links, either across markdown files ('cross') or within
+	the same markdown file ('same'), and replaces them with the plain text of the link
+	"""
+
+	if link_type == 'cross':
+		# regex that matches cross markdown links within a file
+		primary_pattern = re.compile(r'\[((?!http).[\w\s.\(\)`*/–]+)\]\(((?!http).+(\.md|\.yml|\.md#[\w\-\w]+))\)') # TODO: add more documentation explaining regex 
+	elif link_type == 'same':
+		# regex that matches references sections within the same markdown
+		primary_pattern = re.compile(r'\[([\w\s.\(\)`*/–]+)\]\(([#\w\-._\w]+)\)') 
+
+	for root, dirs, files in os.walk(root_path):
+		for file in files:
+			if file.endswith(".md"):
+				with open(os.path.join(root,file),'r') as markdown: 
+					data = markdown.readlines()
+
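+				for ind, line in enumerate(data):
+					# replace every matched link on the line with its own link text
+					# (group 1); a callable replacement avoids reusing the first match's
+					# text for later links and leaves backslashes in the text untouched
+					line = primary_pattern.sub(lambda m: m.group(1), line)
+
+					data[ind] = line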
+
+				with open(os.path.join(root,file), 'w') as markdown: 
+					markdown.writelines(data)
+
+
+if __name__ == '__main__':
+
+	duplicated_src_dir_path = './src_copy'
+
+	# Step 1: make a copy of the src directory in the current directory 
+	copy_src()
+
+	# Step 2: copy BIDS_logo to images directory of the src_copy directory
+	copy_bids_logo()
+
+	# Step 3: copy images from subdirectories of src_copy directory 
+	copy_images(duplicated_src_dir_path)
+
+	# Step 4: extract the latest version number and date (done inside add_header)
+	# and write them into header.tex
+	add_header()
+
+	# Step 5: remove all internal links 
+	remove_internal_links(duplicated_src_dir_path, 'cross')
+	remove_internal_links(duplicated_src_dir_path, 'same')
+
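Review note: the link removal in `remove_internal_links` can be checked in isolation on a single string. The sketch below reuses the `'same'` pattern from `process_markdowns.py` verbatim. `strip_same_file_links` is a hypothetical helper name used only for illustration, and the sample sentences in the usage note are made up rather than taken from the specification.

```python
import re

# Pattern copied from the 'same' branch of remove_internal_links():
# group 1 is the link text, group 2 the in-page target such as "#units".
SAME_FILE_LINK = re.compile(r'\[([\w\s.\(\)`*/–]+)\]\(([#\w\-._\w]+)\)')


def strip_same_file_links(line):
    """Replace each same-file markdown link in `line` with its link text.

    Hypothetical helper: a callable replacement is used so that every
    match is replaced by its own text, and so that any backslash in the
    link text is not interpreted as a regex escape.
    """
    return SAME_FILE_LINK.sub(lambda match: match.group(1), line)
```

With these assumptions, `strip_same_file_links("See [Units](#units) and [Dates](#dates).")` gives `"See Units and Dates."`, while an external link such as `[pandoc](https://pandoc.org)` is left unchanged because `:` and `/` are not in the target character class.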
diff --git a/src/03-modality-agnostic-files.md b/src/03-modality-agnostic-files.md
@@ -10,7 +10,7 @@ The file dataset_description.json is a JSON file describing the dataset. Every
 dataset MUST include this file with the following fields:
 
 | Field name         | Definition                                                                                                                                                                                                                           |
-| :----------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| ------------------------------------------------------------------------------| ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | Name               | REQUIRED. Name of the dataset.                                                                                                                                                                                                       |
 | BIDSVersion        | REQUIRED. The version of the BIDS standard that was used.                                                                                                                                                                            |
 | License            | RECOMMENDED. What license is this dataset distributed under? The use of license name abbreviations is suggested for specifying a license. A list of common licenses with suggested abbreviations can be found in Appendix II.        |