Indiana University - Research Experience for Undergraduates - May 30 - July 25, 2014
Tatyana Matthews
SO - CS - ECSU
[email protected]
Title: Apache Big Data Stack
Mentor: Scott McCaulay
Abstract:
The Apache Big Data Stack represents a broad spectrum of open-source software programs provided through Apache projects. The term Apache refers to the Apache Software Foundation (ASF), which supports open-source software projects and attracts a large community of users. This large user community ultimately generates enormous amounts of data, known as Big Data. Such data volumes cannot be captured and organized with traditional tools, which presents an obstacle. Organizing such data requires substantial computing power and storage; however, this raises economic concerns because of the costs involved.
The Apache Big Data Stack and the configuration management tool Chef will be investigated and applied to address this issue. The research will involve installing and testing as many open-source software packages as possible on FutureGrid machines and then making them accessible using Chef. To accomplish this, software packages from the Apache Big Data Stack will be installed onto a virtual machine to create application packages. These packages will be built into projects, and Chef will be used to turn each project's infrastructure code into a working deployment that is accessible through a network of servers. Finally, the research will demonstrate how the Apache Big Data Stack and Chef can contribute to innovation in the Big Data field.
Keywords: Apache Big Data Stack, Chef, FutureGrid, Big Data
Nigel Pugh
SO - CS - ECSU
[email protected]
Tori Wilbon
SO - CS - ECSU
[email protected]
Title: Evaluating the Performance of MPI Java in FutureGrid
Mentor: Saliya Ekanayake
Abstract:
Message Passing Interface (MPI) has been the common choice for developing tightly coupled parallel High Performance Computing (HPC) applications, and the majority of such applications are written in C, C++, or Fortran. Recent advances in big data processing, however, have drawn attention to Java. Effort has also gone into supporting HPC in Java through MPI implementations such as OpenMPI Java and FastMPJ. We evaluate these against native C-based MPI using a set of micro-benchmarks from the standard OSU Micro-Benchmarks suite developed at Ohio State University. The results show a promising future for Java and MPI in HPC applications.
Keywords: MPI, HPC, OpenMPI, FastMPJ, benchmark
Kaliq Satchell
SO - CS - ECSU
[email protected]
Title: PlotViz: The next visualization tool in bioinformatics
Mentors: Geoffrey Fox, Yang Ruan, Saliya Ekanayake
Abstract:
The purpose of this project is to add multithreaded parallelization support to the PlotViz3 code. The software is written in C++, which I will use to make the improvements. Adding this support should speed up the visualization process in the software and reduce the time needed to obtain results. Bioinformatics is the field of biology that develops methods and software tools for organizing and analyzing biological data. It combines computer science with other disciplines to study biological data and processes, which in turn can yield meaningful information about genomic sequences. PlotViz, a 3D data point browser, can help scientists in this field. It lets users interactively and efficiently discover intrinsic structures in data sets that are high-dimensional and large in volume. As a result, scientists can find correlations among the organisms they have data on more effectively than with previous methods such as phylogenetic trees. This software should be accessible to every scientist working in bioinformatics, but it has not yet been made widely available because the process of running it is not yet simple. Once it is simple enough to run, scientists will have a new and more efficient tool for analyzing organisms' genomic sequences.
Keywords: Bioinformatics, Genomic Sequences, PlotViz, Phylogenetic Trees
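The abstract does not describe how PlotViz3 will be parallelized. As a hedged illustration only, the sketch below shows one common C++ multithreading pattern: a per-point loop is split into contiguous chunks, and each chunk runs on its own `std::thread`. It assumes a hypothetical `Point3D` struct and a hypothetical `scale_points_parallel` function. Neither is taken from the PlotViz3 code base.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical 3D data point; not PlotViz3's actual type.
struct Point3D {
    double x, y, z;
};

// Hypothetical example: scale every point by `factor`, splitting the work
// into contiguous index ranges and processing each range on its own thread.
void scale_points_parallel(std::vector<Point3D>& points, double factor,
                           unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;  // guard against a zero count
    const std::size_t n = points.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // no work left for further threads
        // Each thread writes only its own [begin, end) range, so no lock is needed.
        workers.emplace_back([&points, factor, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                points[i].x *= factor;
                points[i].y *= factor;
                points[i].z *= factor;
            }
        });
    }
    // Wait for all threads before returning, since they reference `points`.
    for (auto& w : workers) w.join();
}
```

A caller might pass `std::thread::hardware_concurrency()` as the thread count. That call may return 0, which the function treats as one thread. Chunking by contiguous ranges keeps each thread's memory accesses local. Real rendering code would likely need more care around shared state than this example does.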
Jefferson Ridgeway IV
SO - CS - ECSU
[email protected]
Title: Django For Cloudmesh
Mentor: Gregor von Laszewski
Abstract:
The cloud computing system Cloudmesh currently uses the Flask web framework; however, Flask sees relatively low usage. Django, a similar framework, has proven to be more productive, efficient, and easier to use and apply than Flask. The purpose of this project is to develop a prototype Django server with a set of functions that make Cloudmesh easier for users to interact with. This includes applying a Bootstrap theme to the Django server and providing a list of the virtual machines on various clouds that currently use Cloudmesh databases.
Keywords: Cloudmesh, Django, Bootstrap theme, Flask
Derek Morris
JR - CS - ECSU
[email protected]