Reducing complexity – from a time series to a single number: coding

TL;DR Using the linear model from Python’s scikit-learn package, I obtain the slopes in the EU industry production time series for each country. Long Description I prepare the normalized EU industry production index dataset for the fit routine of the scikit-learn linear model by forcing the time stamps into a 2D numpy array and the…
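A minimal sketch of such a fit, under stated assumptions: the country codes `AA` and `BB`, the index values, and the use of a plain month counter as the time axis are all hypothetical stand-ins, not the data or encoding of the original post. It shows why the time axis is reshaped: scikit-learn's `fit` expects a 2D array of shape `(n_samples, n_features)`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical normalized index values: one column per country, one row per month.
dates = pd.date_range("2015-01-01", periods=6, freq="MS")
normalized = pd.DataFrame(
    {
        "AA": [1.00, 1.01, 1.02, 1.03, 1.04, 1.05],
        "BB": [1.00, 0.98, 0.96, 0.94, 0.92, 0.90],
    },
    index=dates,
)

# fit() expects X with shape (n_samples, n_features), so the time axis becomes
# a single-column 2D array. Assumption: months elapsed since the first row.
X = np.arange(len(normalized), dtype=float).reshape(-1, 1)

# One independent linear fit per country; keep only the slope.
slopes = {}
for country in normalized.columns:
    model = LinearRegression().fit(X, normalized[country].to_numpy())
    slopes[country] = float(model.coef_[0])

print(slopes)
```

Each slope is then a single number per country, in units of normalized index change per month for this toy time axis.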


Reducing complexity – from a time series to a single number: modeling

TL;DR I select a linear model with slope and intercept parameters to describe the growth dynamics of the EU industry production index of each country. Long Description Inspired by line plots of the EU industry production index time series that were previously normalized by the EU average time series, I choose to model the individual…


Removing common trends from a set of time series to highlight their differences

TL;DR I divide the EU industry production index time series for each country by the smoothed EU average time series to bring out the countries’ individual development for further modeling. Long Description Using a chain of pandas methods to obtain a rolling-mean average, I smooth the EU average time series of the industry production index….
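The idea can be sketched as follows. This is an illustration only: the window size of 3, the centered window, and all values and country codes are assumptions chosen for the example, not the settings used in the post.

```python
import pandas as pd

# Hypothetical monthly index values for two countries and the EU average.
dates = pd.date_range("2015-01-01", periods=6, freq="MS")
countries = pd.DataFrame(
    {
        "AA": [100.0, 103.0, 103.0, 106.0, 108.0, 111.0],
        "BB": [100.0, 101.0, 99.0, 100.0, 100.0, 101.0],
    },
    index=dates,
)
eu_average = pd.Series([100.0, 102.0, 101.0, 103.0, 104.0, 106.0], index=dates)

# Smooth the EU average with a centered rolling mean.
# min_periods=1 keeps the edge values instead of producing NaN there.
smoothed = eu_average.rolling(window=3, center=True, min_periods=1).mean()

# Divide every country column by the smoothed average, row by row.
normalized = countries.div(smoothed, axis=0)

print(normalized.round(3))
```

Values above 1 then mark months in which a country sits above the smoothed EU trend, and values below 1 mark months below it.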


Exploring the industry production history with EDA

TL;DR I use statistical and graphical tools to perform exploratory data analysis (EDA) on the EU industry production dataset as a starting point for modeling the time series. Long Description With the help of the pandas describe method and the matplotlib package I explore the statistics of the EU industry production dataset, more precisely the…


Making the numbers shine: Cleaning EU industry production index values

TL;DR I make EU industry production index values, which I previously put in a tidy form, ready for analysis by splitting numbers and flag values with pandas methods. Long Description Now that the EU industry production dataset has a tidy dataframe structure, I clean up the production index values. The values are stored as strings…
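As a rough illustration of splitting numbers from flags, the sketch below assumes a hypothetical string format in which a number may be followed by a lowercase letter flag and a colon marks a missing value. The actual format in the dataset may differ.

```python
import pandas as pd

# Hypothetical raw strings: plain number, number plus flag, missing, number plus flag.
raw = pd.Series(["101.5", "99.8 p", ": ", "103.2 e"], name="raw_value")

# Pull out the numeric part; rows without a number become NaN.
value = pd.to_numeric(
    raw.str.extract(r"(-?\d+(?:\.\d+)?)", expand=False),
    errors="coerce",
)

# Pull out the letter flag; rows without a flag get an empty string.
flag = raw.str.extract(r"([a-z]+)", expand=False).fillna("")

cleaned = pd.DataFrame({"value": value, "flag": flag})
print(cleaned)
```

Keeping the flag in its own column leaves the numeric column free for arithmetic while preserving the annotation for later filtering.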


Bringing an EU industry production dataframe into good shape

TL;DR I use pandas dataframe methods to bring EU industry production data into a tidy format to facilitate further analysis. Long Description Here I use the Python packages SQLAlchemy and pandas to read in a subset of the EU industry production dataset from a local PostgreSQL database into a dataframe. I apply pandas methods like…


Using SQL queries to extract data from a PostgreSQL database

TL;DR Making use of SQLAlchemy and SQL queries, I extract EU industry production data for further analysis from the PostgreSQL database where I previously stored it. Long Description Building on the previous projects, I use SQLAlchemy to connect to a local PostgreSQL database that contains as a table an EU industry production dataset from the…


Storing a pandas dataframe in a PostgreSQL database

TL;DR I store EU industry production data in a PostgreSQL database using the SQLAlchemy package. Long Description Building on the previous project, I download an EU industry production dataset from the EU Open Data Portal, put it in a pandas dataframe, and store it in a PostgreSQL database. Using such a data store can…
