Big data are data sets that have grown so large that ordinary databases can no longer store or process them. Data sets are loosely grouped into three sizes: small, medium and big. None of these categories is strictly defined; each depends on ease of use and on what type of machine can handle the information. Big data require special systems that are much larger and more complex than those used for ordinary databases. Data sets of this size are typically found in government and scientific agencies, though some very large websites also hold this much information.
Data come in three standard, but not strict, sizes. Small data fit on a single computer, such as a laptop. Medium data fit on a disk array and are best managed by a database. Conventional databases, no matter how large, cannot work with big data, so special systems must be used instead. While there is no strict threshold for big data, they typically start around the terabyte (TB) level and go up to the petabyte (PB) level.
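As a rough illustration of these three tiers, the following Python sketch classifies a data set by comparing its size with the capacity of the hardware on hand. The function name and the example capacities (a 500 GB laptop and a 10 TB disk array) are assumptions chosen for illustration, not fixed definitions.

```python
# Decimal storage units (1 TB = 10**12 bytes).
GB = 10**9
TB = 10**12
PB = 10**15


def classify_dataset(size_bytes, machine_capacity, array_capacity):
    """Return 'small', 'medium' or 'big' for a data set.

    The tiers are relative to the available hardware, not absolute sizes.
    """
    if size_bytes <= machine_capacity:
        return "small"   # fits on a single computer
    if size_bytes <= array_capacity:
        return "medium"  # fits on a disk array
    return "big"         # needs a specialized system


# Hypothetical capacities, assumed for this example only.
LAPTOP = 500 * GB
DISK_ARRAY = 10 * TB

for size in (200 * GB, 4 * TB, 2 * PB):
    print(size, classify_dataset(size, LAPTOP, DISK_ARRAY))
```

Because the boundaries are passed in as parameters, the same data set can fall into a different tier when the available machines change, which matches the idea that the sizes are not strict.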
Attempting to work with big data on a database that is not designed for that volume causes several substantial problems. The database cannot hold all of the information, so some data must be erased. This is like trying to fit 100 gigabytes (GB) onto a computer with only 50 GB of hard drive space; it cannot be done. The data that remain are unwieldy to control and manage, because any operation takes a long time to complete and the database must be closed off to new submissions.
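The arithmetic behind the hard drive comparison can be written out as a short Python sketch. The helper name overflow_bytes is hypothetical, and the figures are the ones used in the comparison above.

```python
GB = 10**9  # decimal gigabyte


def overflow_bytes(data_bytes, capacity_bytes):
    """Return how many bytes do not fit; 0 if everything fits."""
    return max(0, data_bytes - capacity_bytes)


data = 100 * GB      # amount of data to store
capacity = 50 * GB   # available hard drive space

excess = overflow_bytes(data, capacity)
print(f"{excess / GB:.0f} GB will not fit")  # prints "50 GB will not fit"
```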
While it is possible to keep purchasing machines and adding new data to the databases, doing so only makes the data more unwieldy. This is because database software is made to work with medium data. Larger data sets lead to errors and administrative problems, because the software cannot move or process that much data reliably.
Most organizations and websites never encounter big data. Defense and military agencies use this amount of information to create models and store test results, and many large scientific agencies need these specialized machines for similar reasons. Some very large websites also need big data machines, but websites are less common than agencies in this market. These organizations need to keep all of their data, because older data help them analyze future data and make predictions.