DNA Barcodes analysis
This project was developed during the university course Big Data. Its purpose is to analyze samples of DNA subsequences, called barcodes, in order to classify them and predict the species to which a given genomic sequence belongs. In particular, we based the analysis on a multi-locus approach and compared the results with earlier mono-locus research.
This work is based on data stored in the BOLD database. Specifically, we considered species belonging to plant and fungal families.
The tool combines several components, written in Bash and Python, with a map-reduce step. To use it, put the data in a folder named “data” and run “fastaSplitter.sh” from a Bash terminal.
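As a rough illustration of how a map-reduce step can support a multi-locus analysis, the following Python sketch groups barcode records by species and collects the loci (markers) available for each one. It is not the project's actual code. The header layout `id|species|marker` and the function names `parse_fasta`, `map_header`, `reduce_by_species`, and `loci_per_species` are assumptions made for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def map_header(header: str) -> Optional[Tuple[str, str]]:
    """Map step: emit a (species, marker) pair for one record.

    Assumption: headers look like 'id|species|marker'. Records that do
    not match this hypothetical layout are skipped.
    """
    fields = header.split("|")
    if len(fields) < 3:
        return None
    return fields[1], fields[2]


def reduce_by_species(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Reduce step: collect the set of markers seen for each species."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for species, marker in pairs:
        grouped[species].add(marker)
    return dict(grouped)


def loci_per_species(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Run the map and reduce steps over FASTA-formatted lines."""
    mapped = (map_header(header) for header, _ in parse_fasta(lines))
    return reduce_by_species(pair for pair in mapped if pair is not None)
```

Under these assumptions, species whose marker set contains more than one entry would be the candidates for a multi-locus comparison, while the others could only be analyzed with a single locus.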
More detailed information about the project can be found in this paper (written in Italian).
Date
July 2014
See on GitHub
https://github.com/Pausa90/BOLDProject
Authors and Contributors
This project was developed by @V1LL0 and @Pausa90.