With thousands of fans, college basketball games can be almost deafening. Some arenas have decibel meters, which can provide some indication of the noise generated. Researchers wanted to see whether machine learning algorithms could pick out patterns within the raw acoustical data that indicated the crowd's mood, thereby providing clues as to what was happening in the game itself.

With thousands of fans clapping, chanting, shouting and jeering, college basketball games can be almost deafeningly loud. Some arenas have decibel meters, which, accurately or not, provide some indication of the noise volume generated by the spectators and the sound systems. However, crowd noise is rarely the focus of scientific inquiry.

Machine learning algorithms

"Crowd noise is typically treated as background interference — something to screen out." But the BYU researchers felt that crowd noise was worthy of its own investigation. In particular, they wanted to see whether machine learning algorithms could pick out patterns within the raw acoustical data that indicated what the crowd was doing at a given time, thereby providing clues as to what was happening in the game itself.

The BYU team made high-fidelity acoustic measurements during men's and women's basketball games at the university, later doing the same for football and volleyball games.

They broke the games into half-second intervals, measuring the frequency content, the sound levels, the ratio of maximum to minimum sound level within a set time block, and other variables. Then they applied signal processing tools that identified 512 distinct acoustical features composed of different frequency bands, amplitudes and so forth.
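As a rough illustration of the half-second framing described above, the following Python sketch splits a signal into half-second frames and computes two of the quantities mentioned: a sound level in decibels, and the spread between the loudest and quietest moments inside each frame. The function names, the 50-millisecond sub-block length and the synthetic test signal are assumptions made for illustration. The article does not describe the researchers' actual processing chain, and frequency-band features are left out here.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_db(samples, floor=1e-12):
    """Level in dB relative to an amplitude of 1.0; the floor avoids log(0)."""
    return 20.0 * math.log10(max(rms(samples), floor))

def frame_features(signal, sample_rate, frame_s=0.5, sub_s=0.05):
    """Return one (level_db, spread_db) pair per half-second frame.

    spread_db is the max-to-min level ratio inside the frame, expressed as
    the dB difference between its loudest and quietest sub-blocks.
    """
    frame_len = int(round(frame_s * sample_rate))
    sub_len = int(round(sub_s * sample_rate))
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        sub_levels = [
            level_db(frame[i:i + sub_len])
            for i in range(0, frame_len - sub_len + 1, sub_len)
        ]
        features.append((level_db(frame), max(sub_levels) - min(sub_levels)))
    return features

# Synthetic stand-in for a recording: 1 s of a quiet tone, then 1 s of a loud tone.
sample_rate = 8000
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
loud = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
features = frame_features(quiet + loud, sample_rate)

for level, spread in features:
    print(f"level = {level:6.1f} dB, max-to-min spread = {spread:4.1f} dB")
```

Each frame becomes one short list of numbers. A real feature set like the 512 features described in the article would simply make that list much longer.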

The group used these variables to construct a 512-dimensional space, then used machine learning techniques to run a computerized clustering analysis of this complicated, multidimensional realm.

Gee explained the process with a simple analogy. "Suppose you have a plot of points on a two-dimensional, x-y graph and measure the distance between those points," he said. "You might see that the points are bunched together in three clumps or clusters. We did something similar with our 512-dimensional space, though you obviously need a computer to keep track of all that."
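Gee's analogy works because the distance between two points is computed the same way in any number of dimensions. The short Python sketch below assumes ordinary straight-line (Euclidean) distance, which the article does not confirm as the team's choice, and uses made-up points.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two points on an ordinary x-y graph.
d2 = euclidean_distance((0.0, 0.0), (3.0, 4.0))

# The same function applied to two hypothetical 512-dimensional feature vectors.
u = [0.0] * 512
v = [1.0] * 512
d512 = euclidean_distance(u, v)

print(d2)    # 5.0
print(d512)  # the square root of 512, roughly 22.63
```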

"K-means clustering" analysis

The so-called "K-means clustering" analysis they ran revealed six separate clusters that corresponded to what was happening in the arena, depending on whether people were cheering, singing, booing, being quiet, or letting the loudspeakers dominate the soundscape.
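For readers unfamiliar with the technique, K-means alternates between two steps: assigning each point to its nearest cluster center, and moving each center to the average of the points assigned to it. The sketch below is a minimal pure-Python version run on three made-up clumps of two-dimensional points, echoing Gee's analogy. The data, the choice of three clusters, the farthest-point starting rule and all function names are illustrative assumptions, not the BYU team's code or settings.

```python
def squared_distance(p, q):
    """Squared straight-line distance; enough for comparing which center is nearest."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def farthest_point_init(points, k):
    """Deterministic start: take the first point, then repeatedly add the
    point that lies farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers

def k_means(points, k, max_iterations=100):
    """Return (centers, labels) after alternating assignment and update steps."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: label each point with the index of its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # No point changed cluster, so the result is stable.
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean_point(members)
    return centers, labels

# Three made-up clumps of points on an x-y graph.
points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),          # clump near the origin
    (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0),  # clump at upper right
    (0.0, 10.0), (1.0, 10.0), (0.0, 11.0), (1.0, 11.0),      # clump at upper left
]
centers, labels = k_means(points, k=3)

print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(centers)  # [(0.5, 0.5), (10.5, 10.5), (0.5, 10.5)]
```

The number of clusters has to be supplied in advance, which is why the made-up example asks for three. The same functions would run unchanged on 512-dimensional points, since nothing in them depends on the number of coordinates.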

In this way, Gee and his colleagues were able to gauge the emotional state of the audience, simply from a machine-run analysis of the sound data. "One important eventual application of our research," he said, "may be the early detection of unruly or violent crowd behavior."