Software Stack

Hadoop

The Hadoop framework forms the basis of the system. Components such as distributed databases and file systems, redundant data storage, and parallel processing mechanisms for computationally intensive workloads provide a solid foundation for scalable machine learning projects. The framework makes optimal use of the available hardware to process large amounts of data efficiently.
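As a rough illustration of how data on the distributed file system might be accessed from Python, the sketch below reads a Parquet dataset from HDFS with pyarrow. Note that pyarrow is an assumption here rather than part of the stack described below, and the namenode address, port and path are invented placeholders.

    # Minimal sketch: reading a Parquet dataset from HDFS with pyarrow.
    # Host, port and path are placeholders, not actual cluster addresses.
    import pyarrow.dataset as ds
    from pyarrow import fs

    hdfs = fs.HadoopFileSystem(host="namenode.example", port=8020)  # hypothetical namenode
    dataset = ds.dataset("/data/projects/sensor_logs", format="parquet", filesystem=hdfs)
    table = dataset.to_table()   # collect the distributed files into an Arrow table
    df = table.to_pandas()       # hand over to pandas for local analysis
    print(df.head())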


AI Platform / Tool Stack

Besides different database concepts such as Hadoop HBase (NoSQL) and Hadoop Hive (SQL), and the distributed file system HDFS, the software stack comprises further tools suited for data science and machine learning. Since the software can be tailored to the requirements of each project, a wide range of machine learning approaches can be applied to different problems. The following list gives a brief overview of tools commonly used in our projects (a short workflow sketch follows the list):

  • Programming: Python 3
  • Virtualization / Environments: Docker, Anaconda, virtualenv
  • Notebook Platform: JupyterHub
  • Machine Learning: TensorFlow, Keras, scikit-learn
  • Data Analysis: pandas, Matplotlib, RAPIDS
  • Smart Service Interface: Django, Flask, Node.js
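As a rough illustration of how these tools typically interact, the following sketch trains a simple model with pandas and scikit-learn, as it might be run in a Jupyter notebook on the platform. The file name, column names and model choice are placeholders, not taken from an actual project.

    # Minimal sketch of a pandas + scikit-learn workflow in a notebook.
    # File and column names are hypothetical placeholders.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    df = pd.read_csv("measurements.csv")            # hypothetical input data
    X, y = df.drop(columns=["label"]), df["label"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=200))
    model.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

A scikit-learn pipeline is used here only to keep the example short; Keras or TensorFlow models plug into the same pandas-based data preparation step.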


GPU Nodes for Deep Learning

Eight CUDA-enabled GPUs (NVIDIA Tesla P100, 12 GB each) provide the performance required for compute-intensive tasks such as training deep neural networks. The cluster can efficiently process complex machine learning pipelines involving huge numbers of parameters (a short sanity-check sketch follows the API list below).

Available APIs:

  • NVIDIA CUDA
  • OpenCL
  • OpenACC
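As a quick way to verify that a job actually sees the GPUs, a TensorFlow script such as the following minimal sketch can list the visible devices and place a computation on one of them. This is generic example code, not cluster-specific configuration.

    # Minimal sketch: check that TensorFlow sees the CUDA GPUs and run a
    # small matrix multiplication on the first one.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    print(f"{len(gpus)} GPU(s) visible:", gpus)

    with tf.device("/GPU:0"):            # pin the computation to the first GPU
        a = tf.random.normal((4096, 4096))
        b = tf.random.normal((4096, 4096))
        c = tf.matmul(a, b)
    print("result shape:", c.shape)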