Big Data, Mere Mortals

Jeff Looman

With the advent of smartphones and other portable devices, people today are generating data at an alarming rate. A recent Domo study estimated that 2.5 quintillion bytes of data are created each day. That figure is consistent with a 2017 Forbes article, “How Much Data Do We Create Every Day?,” which estimated that people would take 1.2 trillion photos by the end of 2017 (with 4.7 trillion already stored at that time). Numbers like these epitomize big data, and we have generated far more since then. A person with a smartphone who takes an average of 3 photos per day will take about 1,000 photos per year (3 × 365 = 1,095). All of those photos have to be stored somewhere, and for those who don’t name their files or keep an efficient cataloging system, a given photo may be very difficult to find, even after only a few months.

The question, then, is this: how does a mere mortal deal with big data? Corporations have been dealing with big data for some time and have IT staff specifically trained to manage the problem. Most mere mortals, however, are not trained in IT and do not want to become “data librarians”; they just want to quickly find what they are looking for using some combination of what they remember about a photo: its subject, its location, and/or the time it was taken. Modern operating systems provide a way to search for files, but the search is limited to folder and file names. This can be a real drag for a user who is not adept at naming files. And though these systems have recently been augmented with text-search capabilities, those capabilities are limited to specific document types within the local system.

To add to the problem, physical devices have become shorter-term commodities, with users trading in their devices more frequently for the “latest and greatest” technologies. This has led to increased use of external devices and/or cloud storage accounts, not only for aggregating files into a single library, but also for longevity (because, let’s face it, data is persistent; devices are not). Over time, the accumulated files become their own big data problem, and most users do not have the tools to deal with it. They don’t have the time to sift through everything to get organized (my closet is bad enough), and they don’t dare get rid of the data because they don’t remember whether it is valuable.

What is needed to solve the big data problem is a next-generation file storage system in which mere mortals do not have to be concerned with content classification systems, storage device reliability, global accessibility, aggregation of files from varied sources, extensive data security, or computing platform neutrality. FileShadow is that system. Advances in machine learning help with the classification problem, while FileShadow’s cloud vault federates users’ data into a secure, globally accessible place, with sophisticated search engines that retrieve documents based on natural language queries.