DNA Barcode Analysis

This project was developed during the university course Big Data. Its purpose is to analyze samples of DNA subsequences, called barcodes, in order to classify and predict the species to which a specific genomic sequence belongs. In particular, we based the analysis on the multi-locus concept, that is, using several barcode regions together, and compared the results with earlier mono-locus research.

This work is based on data stored in the BOLD (Barcode of Life Data Systems) database. Specifically, we considered species belonging to plant and fungal families.
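
To make the multi-locus idea more concrete, here is a minimal Python sketch that groups barcode records by species and marker and then lists the species covered by every requested marker. It assumes FASTA headers of the form `>processid|species|marker|accession`, which is our understanding of BOLD FASTA exports; the project's own parsing may differ. The function names are hypothetical and not taken from the repository, and the example records (IDs, sequences, accessions) are invented for illustration.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_species_and_marker(text):
    """Map species -> marker -> list of sequences.

    Assumption: headers look like 'processid|species|marker|accession'.
    Records with fewer than three fields are skipped.
    """
    table = defaultdict(lambda: defaultdict(list))
    for header, seq in parse_fasta(text):
        fields = header.split("|")
        if len(fields) < 3:
            continue
        species, marker = fields[1].strip(), fields[2].strip()
        table[species][marker].append(seq)
    return table


def multi_locus_species(table, markers):
    """Return the species that have at least one sequence for every marker."""
    return sorted(
        species
        for species, loci in table.items()
        if all(marker in loci for marker in markers)
    )


# Invented example records, not real BOLD data.
EXAMPLE = """\
>ID001|Quercus robur|rbcL|AB000001
ATGTCACCACAAACAGAG
ACTAAAGC
>ID002|Quercus robur|matK|AB000002
ATGGAGGAATTC
>ID003|Pinus pinea|rbcL|AB000003
ATGTCACCACAAACGGAG
"""

table = group_by_species_and_marker(EXAMPLE)
print(multi_locus_species(table, ["rbcL", "matK"]))
```

In a mono-locus setting only one marker would be requested, so every species with a sequence for that marker qualifies; requiring several markers at once narrows the set to species usable for multi-locus classification.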

The tool combines several components, written in Bash and Python, along with a MapReduce step. To use it, put the data in a folder named “data” and run “fastaSplitter.sh” from a Bash terminal.

More detailed information about the project can be found in this paper (written in Italian).

Date

July 2014

See on GitHub

https://github.com/Pausa90/BOLDProject

Authors and Contributors

This project was developed by @V1LL0 and @Pausa90.

Contacts

v.cestarelli@gmail.com