In the era of big data, it is important to understand how to work with large data sets. Working with them can be complex, but the process stays manageable if you focus on a few clear goals and choose the most appropriate data management tool for each goal.

The term “big data” refers to data sets that are too large for traditional data management applications to handle. Size alone, however, does not make a data set interesting to scientists; in fact, some of the most interesting scientific information comes from small data sets. For example, an analysis of Facebook interactions is a big data project, but the sheer volume of data does not by itself guarantee scientific insight. Similarly, an SGP analysis may involve large amounts of assessment data, yet that information is still a very small part of human knowledge.

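There are two common formats for longitudinal (time-dependent) student assessment data: WIDE and LONG. In WIDE format, each row holds one student, with a separate column for each year's result. In LONG format, each row holds one student's result for a single year (and, where relevant, a single content area). The SGPdata package, installed along with the SGP software, includes exemplar WIDE and LONG data sets (sgpData_WIDE and sgpData_LONG) to help you set up your own longitudinal analysis systems. Each format has advantages and disadvantages, depending on the kind of analysis you want to run.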
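To make the difference between the two layouts concrete, here is a minimal sketch that reshapes WIDE records into LONG records. It is written in plain Python rather than with the SGP tools, and the column names (ID, SS_2021, SS_2022, YEAR, SCALE_SCORE) are illustrative assumptions, not the exact schema of sgpData_WIDE or sgpData_LONG.

```python
from typing import Dict, List

# Hypothetical WIDE layout: one row per student, one score column per year.
# Column names are illustrative assumptions, not the real sgpData_WIDE schema.
Record = Dict[str, object]


def wide_to_long(wide_rows: List[Record], years: List[int]) -> List[Record]:
    """Reshape WIDE records into LONG records (one row per student per year)."""
    long_rows: List[Record] = []
    for row in wide_rows:
        for year in years:
            score = row.get(f"SS_{year}")
            if score is None:  # skip years with no recorded score
                continue
            long_rows.append({"ID": row["ID"], "YEAR": year, "SCALE_SCORE": score})
    return long_rows


wide = [
    {"ID": "A", "SS_2021": 410, "SS_2022": 455},
    {"ID": "B", "SS_2021": 390, "SS_2022": None},
]
long_data = wide_to_long(wide, [2021, 2022])
for r in long_data:
    print(r)
```

Student A contributes two LONG rows and student B only one, because B has no 2022 score. In WIDE format, that missing value would instead sit as an empty cell in B's row.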

A student growth percentile (SGP) describes a student’s progress over time by comparing his or her current score with the scores of students who had similar prior test scores (their academic peers). Although the SGP calculations are complex, the results are reported as percentiles, a scale familiar to most teachers and parents. For this reason, SGPs are often useful for communicating student learning to families and the public.
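The idea of comparing a student with academic peers can be illustrated with a deliberately simplified sketch. Here "academic peers" means students with exactly the same prior score, and the result is the percentage of those peers who scored lower this year, clamped to a 1 to 99 range. This is an assumption made for illustration only: the actual SGP methodology estimates conditional percentiles with quantile regression on prior scores rather than exact matching, and the function name below is hypothetical.

```python
from typing import List, Tuple


def simple_growth_percentile(records: List[Tuple[str, int, int]], student_id: str) -> int:
    """Hypothetical, simplified growth percentile.

    records: (student_id, prior_score, current_score) tuples.
    Peers are students with an identical prior score (a simplification;
    the real SGP method uses quantile regression on prior scores).
    """
    lookup = {sid: (prior, cur) for sid, prior, cur in records}
    prior, current = lookup[student_id]
    # Current scores of all academic peers (including the student).
    peers = [cur for _, p, cur in records if p == prior]
    below = sum(1 for c in peers if c < current)
    # Percentage of peers scoring below this student, clamped to 1..99.
    pct = round(100 * below / len(peers))
    return min(99, max(1, pct))


records = [
    ("A", 400, 450),
    ("B", 400, 430),
    ("C", 400, 470),
    ("D", 400, 440),
    ("E", 380, 500),
]
print(simple_growth_percentile(records, "A"))
```

Student A outscored two of the four students who started at 400, so this sketch places A at the 50th percentile of growth, even though A's raw current score is lower than student E's.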

We recommend the LONG data format for most operational data analysis. Because each row represents one student in one year, you can update an analysis by appending a new year of records instead of adding new columns. The LONG format also gives a consistent structure for storing and managing data, and it simplifies access to the higher-level functions that are designed for long data sets.
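A minimal sketch of why updates are simple in LONG format: adding a year is just appending rows, with a check that no (ID, YEAR) record is duplicated. The key fields and the add_year helper are hypothetical; real LONG data such as sgpData_LONG may need additional key fields (for example, content area).

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]


def add_year(long_data: List[Record], new_rows: List[Record]) -> List[Record]:
    """Hypothetical helper: append a new year's LONG records.

    Assumes (ID, YEAR) identifies at most one row; real data may also
    key on content area or other fields.
    """
    existing: Set[Tuple[object, object]] = {(r["ID"], r["YEAR"]) for r in long_data}
    for r in new_rows:
        key = (r["ID"], r["YEAR"])
        if key in existing:
            raise ValueError(f"duplicate record for {key}")
        existing.add(key)
    # No schema change is needed: the new year is just more rows.
    return long_data + new_rows


long_data = [
    {"ID": "A", "YEAR": 2021, "SCALE_SCORE": 410},
    {"ID": "A", "YEAR": 2022, "SCALE_SCORE": 455},
]
updated = add_year(long_data, [{"ID": "A", "YEAR": 2023, "SCALE_SCORE": 480}])
print(len(updated))
```

In WIDE format, the same update would require adding a new column (for example, SS_2023) to every student's row, which changes the table's structure each year.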

