Software and Jupyter notebooks for the industry production analysis

TL;DR

I list the software versions that I used to analyze the EU industry production dataset and link to the GitHub repository that holds the Jupyter notebooks.

Project Background

For the analysis of the EU industry production dataset that is summarized here, I used several software packages: Anaconda with Python and Jupyter Notebook, as well as PostgreSQL.

In the following, I briefly describe the software and specify which versions I used.

Python setup

Anaconda

Anaconda is a cross-platform Python distribution that ships with the conda package manager. I used Anaconda 4.4.0, 5.0.0 and 5.0.1 on a Windows 10 machine (Professional, Version 1703, 64-bit). Because I updated Anaconda continuously, see the individual project for the version it used.

Python version

I used Python 3, more specifically versions 3.6.1 through 3.6.3.

Custom Python packages

The Python package versions are usually those bundled with the respective Anaconda version.

There is one exception: because of a bug in pandas 0.20.3 that affects bar charts, I manually updated to pandas 0.21.0 for the project "Spotting trends in the manufacturing growth dynamics: Which region grew the fastest?"

That project as well as the ones that came later use pandas 0.21.0.
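Since the package versions differ between projects, it can help to print them at the top of each notebook. The following is a minimal sketch of how that could be done, not code taken from the notebooks. The helper name `collect_versions` is hypothetical, and the sketch assumes that each package exposes a `__version__` attribute, which pandas and notebook do.

```python
import importlib
import platform


def collect_versions(package_names):
    """Map 'python' and each package name to its version string, or None."""
    versions = {"python": platform.python_version()}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is not installed in this environment.
            versions[name] = None
            continue
        # Assumption: the package exposes __version__; otherwise record None.
        versions[name] = getattr(module, "__version__", None)
    return versions


if __name__ == "__main__":
    # The package list is illustrative; extend it as needed.
    for name, version in collect_versions(["pandas", "notebook"]).items():
        print(f"{name}: {version or 'not available'}")
```

Running this in the first cell of a notebook documents the environment together with the results.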

Jupyter Notebook

I coded the projects with Jupyter Notebook, a web application that combines source code, documentation text and figure output in a single document. I used version 5.0.0.

PostgreSQL

For the local storage of the EU industry production dataset, I used PostgreSQL, a relational database management system that is queried with the Structured Query Language (SQL). I used version 9.6.5.

Code on GitHub

The Jupyter notebooks that contain the project code are available on GitHub.

Conclusion

I used Anaconda with Python 3 and Jupyter Notebook as well as PostgreSQL for the EU industry production analysis.

The software versions vary across the projects. Since this complicates replication, I should try to stick to one particular version of each software package within an end-to-end project. However, occasional updates of specific packages can be beneficial, as the pandas bar chart bug demonstrated.
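One way to stay on fixed versions is to compare the installed versions against a pinned list before running a project. The following is a minimal sketch of that idea. The function name `find_mismatches` and both dictionaries are hypothetical, with the version numbers taken from this post.

```python
def find_mismatches(pinned, installed):
    """Return {package: (pinned_version, installed_version)} for every deviation."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = installed.get(name)  # None if the package is missing
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Illustrative values: the pinned versions are the ones named in this post,
# the installed versions stand in for an environment that was not updated.
pinned = {"pandas": "0.21.0", "notebook": "5.0.0"}
installed = {"pandas": "0.20.3", "notebook": "5.0.0"}

for name, (wanted, found) in find_mismatches(pinned, installed).items():
    print(f"{name}: expected {wanted}, found {found}")
```

An empty result means the environment matches the pinned versions.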

Bio

I am a data scientist with a background in solar physics and long experience in turning complex data into valuable insights. Originally coming from MATLAB, I now use the Python stack to solve problems.

Contact details

Jan Langfellner
contact@jan-langfellner.de
linkedin.com/in/jan-langfellner/

PhD thesis
