Lemonade (Live Exploration and Mining Of Non-trivial Amount of Data from Everywhere) is a visual platform for distributed computing, designed to support the implementation, experimentation, testing and deployment of data processing and machine learning applications. It requires an underlying infrastructure based on a Mesos cluster combined with Spark to perform data analytics. Configuring this environment is not a trivial task and requires low-level knowledge of the underlying technologies involved (such as the federated cloud middleware). For this reason, we use the Elastic Cloud Computing Cluster (EC3) tool to easily deploy, configure and manage the resources required for a fully operational cluster with Lemonade deployed on top of it. EC3 offers a user-friendly interface specifically designed for the ATMOSPHERE project (http://servproject.i3m.upv.es/ec3-atmosphere/) that guides the user through the deployment of the infrastructure. Finally, the demo shows how to use the Lemonade visual interface by creating a simple workflow and executing it in the cluster. All these experiments are performed on top of a federated infrastructure managed by Fogbow.
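As a rough illustration of what "a simple workflow" means here, the sketch below models a workflow as a small graph of operations and runs each operation once its inputs are available. It is a plain-Python approximation under stated assumptions: the operation names (`read_data`, `filter_rows`, `count_rows`), the sample data and the `run_workflow` helper are hypothetical, and this is not Lemonade's internal representation or its Spark execution path.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical three-step workflow: each operation maps to the set of
# operations it depends on. The names are illustrative, not Lemonade's.
workflow = {
    "read_data": set(),
    "filter_rows": {"read_data"},
    "count_rows": {"filter_rows"},
}


def read_data(_inputs):
    # Stand-in for loading a dataset; returns a small in-memory sample.
    return [3, 8, 15, 4, 23]


def filter_rows(inputs):
    # Keep only the values greater than 5.
    return [x for x in inputs["read_data"] if x > 5]


def count_rows(inputs):
    # Count the rows that survived the filter.
    return len(inputs["filter_rows"])


operations = {
    "read_data": read_data,
    "filter_rows": filter_rows,
    "count_rows": count_rows,
}


def run_workflow(graph, ops):
    """Run each operation after all of its dependencies have produced a result."""
    results = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = ops[name](inputs)
    return results


results = run_workflow(workflow, operations)
print(results["count_rows"])
```

In the real platform the operations are assembled visually and executed on the cluster; the sketch only shows the dependency-ordered execution idea.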
The objective of the demo was to show how easily a self-managed elastic Mesos cluster with a visual platform for data analytics (Lemonade) can be deployed with the ATMOSPHERE EC3 dashboard on top of a federated cloud. Any user with valid infrastructure credentials will be able to reproduce and tune the experiment.
The challenge is to provide a fully operational infrastructure, configured automatically according to the user's requirements, with elastic capabilities on top of federated resources distributed across the world.
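To make "elastic capabilities" concrete, here is a minimal sketch of the kind of decision an elastic cluster has to take: how many worker nodes to keep running for the current load. The rule, the function name `desired_workers` and all parameter values are assumptions chosen for illustration; this is not the policy that EC3 actually applies.

```python
import math


def desired_workers(pending_tasks, tasks_per_worker, min_workers, max_workers):
    """Return how many worker nodes a toy elasticity rule would keep running.

    Hypothetical rule: one worker per `tasks_per_worker` pending tasks,
    clamped to the range [min_workers, max_workers].
    """
    if tasks_per_worker <= 0:
        raise ValueError("tasks_per_worker must be positive")
    needed = math.ceil(pending_tasks / tasks_per_worker)
    return max(min_workers, min(max_workers, needed))


# Illustrative loads: an idle cluster, a light load and a burst.
for pending in (0, 7, 40):
    workers = desired_workers(pending, tasks_per_worker=4, min_workers=0, max_workers=5)
    print(pending, workers)
```

With these assumed values, an idle cluster scales down to zero workers and a burst is capped at the configured maximum, which is the behaviour a self-managed cluster needs in order to avoid both idle resources and unbounded growth.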
Data scientists can easily access Lemonade and use distributed resources to execute their workflows without worrying about the infrastructure level.
Previously, data scientists had to deploy the resources manually and configure them to work as a Mesos cluster, along with other dependencies such as Docker and Spark. They also had to deploy Lemonade manually and deal separately with processing the data and managing the resources (elasticity was not provided).
Data scientists access Lemonade directly, without worrying about the management and configuration of the underlying infrastructure. Application managers can easily deploy the infrastructure that data scientists require and, as the clusters are self-managed, they do not need to provide any further support.
Application developer
Easy environment for distributed computing and accessing resources across sites.
Data scientist
Process large amounts of data, hiding all backend complexity from the users and allowing them to focus mainly on building the solution.
Application manager
Platform agnosticism of the solution and self-management of the resources.
System administrator
No additional burden
Data owner
No additional burden
More info soon