Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated pdf version specs #400

Closed
wants to merge 49 commits into from
Closed

Updated pdf version specs #400

wants to merge 49 commits into from

Conversation

Arshitha
Copy link
Contributor

Bugs Fixed:

  • Stable Table of Contents page
  • Cover page
  • Non-tabular text overflow
  • Overlapping columns of tables (owing to small widths of the pipe tables in the markdown files)
    • Markdown files were edited (one-time edit) to increase the column widths of pipe tables
    • Not sure how this would affect the read the docs version of specs
  • Automating header to include the latest release version number and date

Issues that need to be fixed:

  • Internal links referencing either the same markdown file or other markdown files break during conversion from markdown to pdf. Currently, all of the internal links are removed and replaced by the accompanying text.
  • Superscripts are lost in conversion as well
  • Tried a couple different family of fonts to fix the missing character warnings for emojis used in the markdowns with no success. It would be ideal to fix this to match the Read The Docs version more closely
  • Entity table in Appendix IV has overlapping unreadable columns since it's markdown equivalent has more than 10 columns. Refer to issue for more discussion

updated pandon-script.sh adds a

-  a cover page. cover.tex file uses the bids logo

- a header with BIDS version number, however, this isn't automated. The header.tex file needs to be updated
for new version release of BIDS specs.

- font change is possible. However, the default LaTex font seemed better to me but can be changed with -V mainfont="DejaVu Serif" or any other
font.

- spacing after 'Quantitative T1rho brain imaging' and before URL on page 21 was modified in the corresponding .md file in the src directory.
merging changes from base repo
- automated header extraction from changelog
- removing all internal links from pdf and replacing them with associated plain text
- modular script but needs to be cleaned up
Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR!

Some comments:

  • there seem to be two bids-specs.pdf files:
    • ./bids-specs.pdf
    • ./pdf_build_src/bids-specs.pdf
  • we usually talk about "the specification", so I would prefer bids-spec.pdf
  • I think the rendered PDF of the spec should NOT be hosted on the GitHub repo ... rather: the pdf should be built concurrently with the READTHEDOCS build of the website, and then also hosted as a build artifact on readthedocs.

src/CHANGES.md Outdated
@@ -1,4 +1,4 @@
Changelog
# Changelog
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interestingly, it seems like the rendered changelog page gets a "changelog" heading inserted automatically --> https://bids-specification.readthedocs.io/en/stable/CHANGES.html

previously we then had something like


Changelog

Changelog


is it possible that this change will turn this into


Changelog

Changelog


?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sappelhoff Thanks for the feedback!

  • I've reverted the change I'd made. So I believe that should have fixed it. I needed "Changelog" to be Heading 1 so that pandoc recognizes it to be a new section and doesn't concatenate with the previous section. However, I found a hack around it that didn't require the 'src/CHANGES.md' to be modified.

  • Most recent commit takes care of naming change and doesn't duplicate the pdf. I'm looking into how to make it part of the build and render it as part of the RTD

Copy link
Member

@sappelhoff sappelhoff Feb 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted the change I'd made. So I believe that should have fixed it. I needed "Changelog" to be Heading 1 so that pandoc recognizes it to be a new section and doesn't concatenate with the previous section. However, I found a hack around it that didn't require the 'src/CHANGES.md' to be modified.

actually ... perhaps it was good that you turned that "changelog" into a heading. Perhaps that would make it findable on our website: https://bids-specification.readthedocs.io/en/stable/CHANGES.html

when typing "changelog" into the search function, I don't get this site ... although the search function usually works quite well. 🤔

@franklin-feingold would it be a problem with our auto-changelog generator to change the first line in https://github.com/bids-standard/bids-specification/blob/master/src/CHANGES.md to be a markdown heading?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we change it the changelog generator regenerates (and commits) for each merged PR so that won't solve it - there may be something within the changelog generator that can be changed but nothing apparent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do I leave it as it is right now? or make the "changelog" a heading?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll see if #417 does the job.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good!

Copy link
Contributor Author

@Arshitha Arshitha Feb 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sappelhoff I spent some time looking up how this can be done

the pdf should be built concurrently with the READTHEDOCS build of the website, and then also hosted as a build artifact on readthedocs.

According to RTD's build process using mkdocs, simultaneous pdf generation isn't possible. https://docs.readthedocs.io/en/stable/builds.html
Perhaps, pdf can be created and then hosted on the built RTD version? However, I'm unsure how to fix this. Would appreciate any ideas/suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed with @franklin-feingold and @rwblair yesterday and we agreed that the most pragmatic way to go forward would be to split our PDF-keeping in two parts:

  1. on each merged pull request to bids-specification, circle CI should build a pdf ... including the current version and date ... this pdf is then hosted in the "artifacts" tab of the circleCI build. --> we can generate a link that will always resolve to the most recently build pdf artifact on circleci. See this link as an example from the mne-bids repo: https://circleci.com/api/v1.1/project/github/mne-tools/mne-bids/latest/artifacts/0/html/index.html?branch=master ... it will always resolve to the latest index.html file built by circleci

  2. whenever we do a Release of the bids specification, a maintainer builds the PDF and goes through the steps to host it permanently somewhere ... as currently discussed in [DO NOT MERGE] Track adding of old PDF versions to Zenodo #407

so for now, we simply need to figure out how to build the pdf using your script ... in circleci. Which is mainly a question of installing all needed packages there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may come in very handy for getting latex to run on circleci: https://github.com/dante-ev/docker-texlive

@nicholst
Copy link
Collaborator

Lovely!

The PR comment suggests that no internal links work... that's not strictly true: The Table of Contents links works, which is great.

Line wrapping of code is working great now.

Very long words remain a challenge, e.g. on pg 39 see DigitizedHeadPointsCoordinateSystemDescription and the period that falls on the next line, but, really, if that's the worst of it I'm very happy.

Footnotes render particularly oddly... I must admit I didn't realise we allow footnotes... in a HTML-centric view of the world, footnotes are a problem, and parenthesis would seem to be preferred, but at least here in a paginated PDF-world you'd hope they would appear as footnotes. (I guess I should start a new PR suggesting we ban footnotes, but for now...) ANYWAY, ideally footnotes would appear as footers, but it's not really a big deal... but they should render like they do in HTML -- i.e. as superscripted.

@nicholst
Copy link
Collaborator

nicholst commented Feb 6, 2020

Sorry, I forgot to add one more comment/enhancement: The title page should have more than the BIDS logo... it should have the title and the version.

As inspiration, consider this IEEE Standard... it has the name of the document and the authors and the version/revision number.

@Arshitha
Copy link
Contributor Author

Arshitha commented Feb 7, 2020

@nicholst Thanks for both the comments. I'll update the PR soon.

@Arshitha
Copy link
Contributor Author

Arshitha commented Feb 7, 2020

sorry, I'm traveling internationally over the last couple of weeks with spotty internet, so couldn't respond to the comments earlier.

@franklin-feingold
Copy link
Collaborator

Thank you @Arshitha for your time and effort putting this together - looks great!!

+100 to both @sappelhoff and @nicholst comments for reaching the viable release, perhaps iterating after these comments are addressed? then can start a new issue to track further enhancements to the pdf rendering

@Arshitha
Copy link
Contributor Author

@nicholst I've added title and version details (pulling from changelog and so, automated) to the cover page. However, it's not as aesthetically pleasing as the IEEE Standard you'd referenced. Perhaps, this could be something that can be improved as part of a later issue?

steps:
- checkout:
path: ~/bids-specification
- run:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you may be creating yourself some problems with the paths you are specifying in working_directory and checkout.

I would leave the working_directory line alone (i.e., remove line 33 here) ... then we will have the default working directory of ~/project (see https://discuss.circleci.com/t/working-directory-is-no-longer-required-defaults-to-project/14363/2)

Then I would also remove line 38, corresponding to the path parameter of the checkout setting ... that will mean that the bids-specification will be cloned (checked out) to the default directory, which is the working directory, which will be ~/project

Then you should take a look at your python scripts and how they handle paths. All the while assuming that you are working from a directory ~/project/

@sappelhoff
Copy link
Member

Cool, you got CircleCI to work :-) now there seems to be a path issue that prevents the build from actually producing the PDF:

README.md      build_pdf.sh  header.tex		 pandoc_script.py      src_copy
bids-spec.pdf  cover.tex     listings_setup.tex  process_markdowns.py
.  ..  images  src
Traceback (most recent call last):
  File "process_markdowns.py", line 185, in <module>
    extract_header_string()
  File "process_markdowns.py", line 67, in extract_header_string
    for i, line in enumerate(open(os.path.join(os.path.dirname(__file__), 'src_copy/CHANGES.md'))):
FileNotFoundError: [Errno 2] No such file or directory: 'src_copy/CHANGES.md'
[WARNING] Deprecated: markdown_github. Use gfm instead.
[WARNING] Could not fetch resource 'images/sub-0001_ses-001_acq-NAS_photo.jpg': replacing image with description
[WARNING] Could not fetch resource 'images/sub-0001_ses-001_acq-NAS_photo.jpg': replacing image with description
[WARNING] Could not fetch resource 'images/ieeg_electrodes1.png': replacing image with description
[WARNING] Could not fetch resource 'images/ieeg_electrodes2.png': replacing image with description
Error producing PDF.
! Unable to load picture or PDF file 'images/BIDS_logo.jpg'.
<to be read again> 
                   }
l.114 ...idth=0.6\textwidth]{images/BIDS_logo.jpg}

mv: cannot stat 'bids-spec.pdf': No such file or directory

I tried to help here: https://github.com/bids-standard/bids-specification/pull/400/files#r382491858

Let us know if you cannot solve this issue, so someone else can try to take over.

@Arshitha
Copy link
Contributor Author

@sappelhoff

I've addressed

the pdf_build_src/bids-spec.pdf PDF file is still part of this PR --> the file has to be removed before we can merge, because we do not want to git track each brand new PDF. Instead, we want CircleCI to build and host these PDFs, see: #400 (comment)

The circleci build was successful and the generated pdf is hosted under the artifacts tab. Also, made sure that the pdf version is removed from the repo post build. NOTE: as discussed here: #400 (comment) the date in the header string is basically the build date.

I took the liberty to reformat your pdf_build_src/pandoc_script.py according to pep8 guidelines. As you can see e.g. with this online-pep8-checker, there were several formatting issues. Please add and commit this reformatted version (and also test it ... but it should work like before, I did not change its functionality, only the formatting)

After making final changes to the code, I ran it through the online checker. There are a few trailing spaces warnings but that's about it.

However, while trying to get circleci to work, I essentially committed each time I wanted to debug. That has certainly made the commit history messy. Would it be better to create a new PR from a new branch with a cleaner commit history? and referencing this PR for the discussion value?

path: bids-spec.pdf
- run:
name: remove pdf version from repo
command: rm bids-spec.pdf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the CircleCI build (e.g., https://circleci.com/gh/bids-standard/bids-specification/1880?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link) I see that the pdf gets built and the artifact is uploaded ...

However, I do not see the "artifact tab" and thus cannot inspect the PDF.

I wonder whether that's because of this last part:

 - run:
          name: remove pdf version from repo
          command: rm bids-spec.pdf

Could you simply remove that? The pdf should not be committed to the repository anyhow, so the removal step should be unnecessary (and might be detrimental with regards to the artifact)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: I just had to re-log into the CircleCI interface ... I do see the artifact now!

https://1880-150465237-gh.circle-artifacts.com/0/bids-spec.pdf

awesome job! :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Do you still think it's necessary to remove the quoted lines of code in the config file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it doesn't matter. But since everything is working now, let's just leave it as is for now.

@sappelhoff
Copy link
Member

That has certainly made the commit history messy. Would it be better to create a new PR from a new branch with a cleaner commit history? and referencing this PR for the discussion value?

Unless you know how to rebase and squash using git, it would indeed be best if you could make a new PR branched off the most recent master and by cherry picking the "good" commits of this PR.

of course, IF you know how to rebase this PR and squash the circle-ci-iteration-commits into a single commit. then you can just do it with this PR and we'll merge it.

@Arshitha
Copy link
Contributor Author

Sounds good! Will create a new PR with cleaner commit history (or try rebasing as well)

@effigies
Copy link
Collaborator

Hi just finally getting around to having a look at this, and it looks great!

Some notes:

  1. Page 9 table. Is it possible to force the heading row to be a single line, even if that forces the table to be taller and descriptions overflow their box?
  2. Page 18 table. Can we add a little bit of spacing between table rows? Or light (e.g., #cccccc) borders? The descriptions slam together in a way that isn't as readable as I could wish. This is true of most tables, but this is where it started to hit home for me.
  3. Page 20 footnotes are not super-scripted or properly placed at the footer of the page.
  4. Page 22 pre-formatted text ("Template"). Is it possible to adjust the rules for line breaks to prefer to go before/after underscores, after close square brackets or before open square brackets? That would be more natural.
  5. Page 23 table. The "OPTIONAL" highlights some weirdness to the font. Particularly, T is out of alignment with the other caps. Any chance of changing fonts?
  6. Page 26 table. This just looks bad. We should probably fix it in the main source, but just want to note how ugly it is in PDF.
  7. Page 75. Hoo boy.
  8. pp 76-77. Exponents are not super-scripted.
  9. Page 82 table. The last entry is hard to read, due to the line break making YYYY/MM appear to be a new entry.

4 is likely a tar-pit, so it should not in any way hold things up. 7 you mention up top, so I will defer to whatever existing consensus there is.

@Arshitha
Copy link
Contributor Author

@sappelhoff As discussed, here is a new PR with cleaner commit history: #427

@Arshitha
Copy link
Contributor Author

Arshitha commented Feb 25, 2020

@effigies Thanks for the feedback!

Page 18 table. Can we add a little bit of spacing between table rows? Or light (e.g., #cccccc) borders? The descriptions slam together in a way that isn't as readable as I could wish. This is true of most tables, but this is where it started to hit home for me.

Page 26 table. This just looks bad. We should probably fix it in the main source, but just want to note how ugly it is in PDF.

The issues with tables (2 & 6) were caused by running remark on one of the markdown files flagged by Travis CI but I've fixed them and the more recent pdf can found here: https://1930-150465237-gh.circle-artifacts.com/0/bids-spec.pdf as part of build from the newly (but almost identical) created PR here: #427 If I've missed something, I can work on it.

Except for 1, 4 and 9, I tried a few things to fix issues with superscripts and fonts but without much success. The consensus was to tackle them as part of a new enhancement issue in the future and I believe it'd be more efficient to deal with the others similarly as well.

@sappelhoff
Copy link
Member

I agree with @Arshitha to tackle further issues in single iterative PRs and try to get this big one in first.

please continue discussion in #427

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants