DC Session 4 Python

Matteo Romanello edited this page Feb 11, 2020 · 24 revisions

Sunoikisis Digital Classics, Spring 2020

Session 4. Introduction to programming with Python

Thursday Feb 6, 16:00 UK = 17:00 CET

Convenors: Paula Granados García (Open University & The Watercolour World), Matteo Romanello (École polytechnique fédérale de Lausanne)

YouTube link: https://youtu.be/JDxRd-RYkXA

Presentation: run the Jupyter notebooks on Binder.

NB: if you want to edit the notebooks on Binder (a cloud-based platform), for example to do the exercise, without losing your edits when the session expires (usually after 15 minutes of inactivity), make sure to read this tutorial.

Session outline

This session will begin with a general discussion of programming for the humanities, with a specific focus on how programming languages can be useful to humanists, followed by a general introduction to the Python programming language. We will then look at two key Python libraries (collections of code that enhance Python's functionality for specific purposes): Pandas (for structuring and analysing data) and Beautiful Soup (for parsing HTML and XML). These skills will then be illustrated with specific examples and exercises, all of which will be available for your use and adaptation in the Jupyter notebook linked from this session page.
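As a quick taste of how the two libraries work together, here is a minimal, self-contained sketch; the XML fragment and column names are invented for illustration, and it assumes `beautifulsoup4`, `lxml`, and `pandas` are installed:

```python
from bs4 import BeautifulSoup
import pandas as pd

# A tiny invented XML fragment, just to show the parsing pattern
xml = """
<list>
  <name type="person">Cicero</name>
  <name type="place">Roma</name>
  <name type="person">Caesar</name>
</list>
"""

# Beautiful Soup turns the raw string into a navigable tree
soup = BeautifulSoup(xml, "xml")

# Collect the text content and an attribute of every <name> element
records = [{"text": n.get_text(), "type": n.get("type")}
           for n in soup.find_all("name")]

# Pandas structures the records as a table for analysis
df = pd.DataFrame(records)
print(df["type"].value_counts())
```

Beautiful Soup handles the tree-walking, while Pandas takes over once the data is tabular, which is the division of labour used throughout the session notebook.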

In preparation for this session, please install or activate a version of Jupyter Notebooks on your computer or in the cloud (see below under "Exercise" for links).

Seminar readings

Further reading

Python Resources

Exercise

Exercise description

  • You are asked to write a simple Python program by modifying the code we provided in the notebook Pandas_BeautifulSoup.ipynb, section "XML data → DataFrame"; the current code looks for <name> elements and creates a DataFrame from them. For the exercise you are asked to do something similar, but for a different set of TEI/EpiDoc elements of your choice.
  • These are the steps to follow:
    1. identify one or more TEI elements of interest (these can be lemmata, variants, bibliographic elements, metadata, etc.);
    2. specify what information you want to retain from them, and extract it from the XML (via Beautiful Soup) by modifying the code provided;
    3. convert it to a pandas.DataFrame and explore some statistics (for example by using value_counts()).
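The steps above can be sketched as follows. The choice of <placeName> and its ref attribute is purely hypothetical, as is the inline EpiDoc-like fragment; substitute whichever elements and attributes interest you, and read the XML from your own file instead:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Invented EpiDoc-like fragment standing in for a real document
xml = """
<div type="edition">
  <p>... <placeName ref="#athens">Athens</placeName> ...</p>
  <p>... <placeName ref="#delphi">Delphi</placeName> ...</p>
  <p>... <placeName ref="#athens">Athenis</placeName> ...</p>
</div>
"""

soup = BeautifulSoup(xml, "xml")

# Steps 1-2: pick the elements and decide what information to retain
rows = [
    {"surface_form": el.get_text(), "ref": el.get("ref")}
    for el in soup.find_all("placeName")
]

# Step 3: build a DataFrame and explore simple statistics
df = pd.DataFrame(rows)
print(df["ref"].value_counts())
```

Note that the "xml" parser (backed by lxml) is used rather than "html.parser", since the HTML parser lowercases tag names and would miss mixed-case TEI elements such as placeName.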