Hadoop cluster for the comprehensive management of massive scientific data (Clúster Hadoop para la gestión integral de datos científicos masivos)

The objective of this action is to deploy a Hadoop cluster at the Port d'Informació Científica (PIC) with which researchers can generate, store and analyze large datasets, and share and distribute them with the entire scientific community.

The cluster, which will be part of the PIC's common Big Data service, will increase the capacity of the existing platform and make it possible to manage an entire data workflow within the same service. Built on the Hadoop Distributed File System (HDFS) and its surrounding technological ecosystem, it will facilitate interaction with, and efficient handling of, large volumes of data.

The success of any scientific project is measured in large part by the impact of its results on the scientific community.

To this end, the cluster will be connected at 200 Gbps to external data networks and will provide tools for creating, analyzing, exploring, visualizing and distributing data, promoting its use in line with the principles of open science.

Meeting this objective requires deploying equipment with the following specifications: a cluster with at least 1000 cores and 10-40 GiB of RAM per core, at least 2 PB of net storage capacity, each node connected at 10-25 Gbps, and 4 management nodes in a high-availability configuration, plus several network switches and the corresponding cabling.
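
To give a sense of scale, the figures above can be turned into aggregate numbers with some simple arithmetic. The sketch below is illustrative only and rests on assumptions that are not part of the project description: a petabyte is taken as 10^15 bytes, the raw-capacity estimate assumes HDFS's default replication factor of 3 (the actual deployment may use a different factor or erasure coding), and the transfer time assumes the full 200 Gbps external link is available with no protocol overhead. All function names are hypothetical.

```python
# Back-of-the-envelope sizing from the stated minimum specifications.
# Assumptions (not from the project description): decimal petabytes,
# 3x HDFS replication, and a fully available, overhead-free link.

PB_BYTES = 10**15          # assumed: 1 PB = 10^15 bytes
GIB_PER_TIB = 1024

MIN_CORES = 1000           # from the specification
RAM_PER_CORE_GIB = (10, 40)  # from the specification (low, high)
NET_STORAGE_PB = 2         # from the specification
EXTERNAL_LINK_GBPS = 200   # from the specification


def total_ram_tib(cores: int, gib_per_core: float) -> float:
    """Aggregate RAM in TiB for a given core count and RAM per core."""
    return cores * gib_per_core / GIB_PER_TIB


def raw_storage_pb(net_pb: float, replication: int = 3) -> float:
    """Raw disk capacity needed if every block is stored `replication` times."""
    return net_pb * replication


def transfer_hours(size_pb: float, link_gbps: float) -> float:
    """Ideal time in hours to move `size_pb` petabytes over a `link_gbps` link."""
    bits = size_pb * PB_BYTES * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / 3600


ram_low = total_ram_tib(MIN_CORES, RAM_PER_CORE_GIB[0])
ram_high = total_ram_tib(MIN_CORES, RAM_PER_CORE_GIB[1])
raw_pb = raw_storage_pb(NET_STORAGE_PB)
hours = transfer_hours(NET_STORAGE_PB, EXTERNAL_LINK_GBPS)

print(f"Aggregate RAM: {ram_low:.1f} to {ram_high:.1f} TiB")
print(f"Raw storage at 3x replication: {raw_pb} PB")
print(f"Ideal time to move {NET_STORAGE_PB} PB at {EXTERNAL_LINK_GBPS} Gbps: {hours:.1f} h")
```

Under these assumptions, the minimum configuration corresponds to roughly 10 to 39 TiB of aggregate RAM, and exporting the full 2 PB over the external link would take a little under a day in the ideal case.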

This project is funded by the Agencia Estatal de Investigación of the Ministerio de Ciencia e Innovación (EQC2021-007479-P / AEI / 10.13039/501100011033), NextGenerationEU, and the Plan de Recuperación, Transformación y Resiliencia.