dc.contributor.advisor: Anshus, Otto J.
dc.contributor.author: Murphy, Michael J.
dc.date.accessioned: 2017-06-30T10:44:14Z
dc.date.available: 2017-06-30T10:44:14Z
dc.date.issued: 2017-05-15
dc.description.abstract: It is still strangely difficult to back up and synchronize data. Cloud computing solves the problem by centralizing everything and letting someone else handle the backups. But what about situations with low connectivity or sensitive data? For this, software developers have an interesting distributed, decentralized, and partition-tolerant data storage system right at their fingertips: distributed version control. Inspired by distributed version control, we have researched and developed a prototype for a scalable high-availability system called Distributed Media Versioning (DMV). DMV expands Git's data model to allow files to be broken into more digestible chunks via a rolling hash algorithm. DMV will also allow data to be sharded according to data locality needs, slicing the data set in space (subset of data with full history), time (subset of history for full data set), or both. DMV repositories will be able to read and to update any subset of the data that they have locally, and then synchronize with other repositories in an ad-hoc network. We have performed experiments to probe the scalability limits of existing version control systems, specifically what happens as file sizes grow ever larger or as the number of files grows. We found that processing files whole limits maximum file size to what can fit in RAM, and that storing millions of objects loose as files with hash-based names incurs disk space overhead and write speed penalties. We have observed a system needing 24 seconds to store a 6.8 KiB file. We conclude that the key to storing large files is to break them into many small chunks, and that the key to storing many chunks is to aggregate them into pack files. And though the current DMV prototype does only the former, we have a clear path forward as we continue our work. (en_US)
dc.identifier.uri: https://hdl.handle.net/10037/11213
dc.language.iso: eng (en_US)
dc.publisher: UiT Norges arktiske universitet (en_US)
dc.publisher: UiT The Arctic University of Norway (en_US)
dc.rights.accessRights: openAccess (en_US)
dc.rights.holder: Copyright 2017 The Author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-sa/3.0 (en_US)
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) (en_US)
dc.subject.courseID: INF-3990
dc.subject: VDP::Mathematics and natural science: 400::Information and communication science: 420::Communication and distributed systems: 423 (en_US)
dc.subject: VDP::Matematikk og Naturvitenskap: 400::Informasjons- og kommunikasjonsvitenskap: 420::Kommunikasjon og distribuerte systemer: 423 (en_US)
dc.title: Distributed media versioning (en_US)
dc.type: Master thesis (en_US)
dc.type: Mastergradsoppgave (en_US)



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)