IPE 06-20 Internship or Master Thesis: Novel database technologies for large archives of time series data
Karlsruhe Institute of Technology (KIT) – The Research University in the Helmholtz Association creates and imparts knowledge for society and the environment. It is our goal to make significant contributions to mastering the global challenges of mankind in the fields of energy, mobility, and information. To this end, about 9,300 KIT employees cooperate in a broad range of disciplines in research, academic education, and innovation.
Institute for Data Processing and Electronics (IPE)
Complex and distributed detector and control systems are required for modern scientific experiments. The instrumentation integrates custom and commercial components from various sources and generates ever-increasing amounts of data. A variety of data formats, underlying storage engines, and data workflows are in use. Proper manual data interpretation and quality assurance are often difficult or even impossible because of the tremendous increase in both the number and the size of datasets. This raises the need for novel automatic or semi-automatic data analysis methods and tools. Information on the operation of the experiment and on the scientific meaning of the data must be extracted from the data stream and presented to users in a visual, easy-to-interpret form.
The work is embedded in a project that aims to develop a novel platform for handling the data management tasks of mid-range scientific experiments. We plan to build tools that integrate the data recorded by different subsystems and make it available to users in a uniform, comprehensible, and easy-to-use fashion. The thesis focuses on the data storage subsystem. The student is expected to review novel database technologies and provide a detailed evaluation of several possible solutions optimized for storing high volumes of time series data. The selected engine should be integrated with the existing data management system operating at the Aragats Space Environmental Center.
The ideal database will:
- Reliably store high-bandwidth data streams;
- Scale well in the cluster environment;
- Include intelligent caching mechanisms to speed up queries;
- Extract standard statistical information and provide a programming interface to compute custom statistics;
- Support geo-distributed operation;
- Integrate with data analysis tools such as Apache Spark.
Good background in systems engineering and cloud technologies. Good programming skills, preferably with prior experience in Python. Very good understanding of relational and NoSQL database technologies.
The duration is limited according to the study regulations.
Contact persons in line management
Suren Chilingaryan firstname.lastname@example.org IPE
Phone: +49 721 / 608 26579
Andreas Kopmann email@example.com IPE
Phone: +49 721 / 608 24910
Severely disabled persons will be given preference if suitably qualified.
Please apply online for vacancy number IPE 06-20.
Personnel Support is provided by
Phone: +49 721 608-25184
Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany