IPE 12-20 Internship or Master Thesis: Performance Evaluation of Distributed Time Series Databases for Industrial Control Systems
Karlsruhe Institute of Technology (KIT) – The Research University in the Helmholtz Association creates and imparts knowledge for society and the environment. It is our goal to make significant contributions to mastering the global challenges of mankind in the fields of energy, mobility, and information. For this, about 9,300 employees of KIT cooperate in a broad range of disciplines in research, academic education, and innovation.
Institute for Data Processing and Electronics (IPE)
Slow control systems of large scientific experiments include many thousands of sensors that monitor the operation of the instrumentation and the properties of the ongoing experiment. This information is crucial for understanding the measured data, and it should be preserved long after active experimentation has finished. Recording and organizing slow control data present several challenges due to increasing sampling rates and the sheer volume of data accumulated over the experiment's lifetime. A robust distributed time-series database is required to ensure uninterrupted recording of data and to provide a fast interface to the stored historical data.
We aim to evaluate and compare several time-series databases as candidates for archiving the data produced by the control system of the KATRIN (KArlsruhe TRItium Neutrino) experiment. The student is expected to review time-series databases (e.g., InfluxDB, VictoriaMetrics, and Timescale) and then provide a detailed evaluation report on several candidate solutions suited to storing high volumes of time-series data. The selected engine is to be integrated with the existing data exploration and archival systems operating at IPE.
The ideal database will:
- Reliably store the high-bandwidth streams of the data;
- Scale well in the cluster environment;
- Include intelligent caching mechanisms to speed up queries;
- Extract standard statistical information and provide a programming interface to compute custom properties;
- Support geo-distributed operation modes;
- Integrate with data analysis tools like Apache Spark, etc.
Start date: as soon as possible
Qualifications: Very good understanding of relational and NoSQL database technologies. Good programming skills, preferably with prior experience in Java and Python. Familiarity with statistical methods and environments such as Jupyter Notebooks, MATLAB, the R language, etc.
Duration: limited, according to the study regulations
Contact person in line-management
Suren Chilingaryan firstname.lastname@example.org IPE
Phone: +49 721 / 608 26579
Andreas Kopmann email@example.com IPE
Phone: +49 721 / 608 24910
If qualified, severely disabled persons will be preferred.
Please apply online, quoting vacancy number IPE 12-20.
Personnel Support is provided by
Phone: +49 721 608-25184
Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany