Connecting the Dots: Discovering the “Shape” of Data

Scientists use a mathematical subject called topology to study the shapes of objects. An important part of topology is counting the number of pieces and the number of holes in an object, and researchers use this information to group objects into different types. For example, a doughnut has the same number of holes and the same number of pieces as a teacup with one handle, but it is different from a ball. In studies that resemble activities like “connect-the-dots,” scientists use ideas from topology to study the “shape” of data. Ideas and methods from topology have been used to study the branching structures of veins in leaves, voting in elections, flight patterns in models of bird flocking, and more.

What exactly do we mean by "shape"? We are used to describing common shapes like lines, circles, and cubes, but what about more complicated objects, like a dragon or a Pokémon or a human being?
Topology is a branch of mathematics that concerns the shapes of things [ , ]. To help us understand topology, let's pretend that we have a circular rubber band. We want to describe the properties of an object that stay the same if we stretch it, shrink it, or bend it, without gluing things together or breaking the object (or creating any sharp points). From a topological viewpoint, because we can stretch the rubber band into an oval, we say that the circle and the oval are topologically equivalent. However, the rubber band is not topologically equivalent to a piece of string, because the rubber band has a hole in the middle but the string does not. Remember that we are not allowed to glue the ends of the string together, and we are also not allowed to cut the rubber band.

TOPOLOGY
A branch of mathematics that people use to study the shapes of objects.

TOPOLOGICALLY EQUIVALENT
A term that describes two objects that can be turned into each other by stretching, shrinking, bending, or warping them (but not gluing or tearing them).
By figuring out which shapes are equivalent to each other in this special way, we can separate shapes into different groups. As an example, let's assign the letters in the word "Pokémon" to groups of topologically equivalent objects. See the short animation in Video . The letters "P" and "o" belong to the same group, because we can compress the bottom part of the "P" upwards and then stretch the hole into the shape of the letter "o." Consequently, "P" and the two instances of the letter "o" make up one group of topologically equivalent letters. The "k," "m," and "n" form a different group, because we can turn each of them into a dot by squeezing and bending them. The remaining letter, "é," is an interesting one. Without its accent, we would be able to shrink the round tail of the "e" into the left side of the semicircle at the top of the letter. We could then stretch that semicircle into the shape of the letter "o," which places it into the same group as "P" and "o." However, with the accent, "é" has two separate pieces that we are not allowed to glue together, so it belongs to its own group.

VIDEO
Transformation of the letters in the word "Pokémon" to assign them to groups of similar letters. Each of the groups consists of topologically equivalent letters.
Shapes that are in the same group have important features in common. Although the details of the shapes "P" and "o" differ, they both have one hole that we cannot remove. By contrast, the letters "k," "m," and "n" do not have any holes. If we look at the uppercase letter "B," we see that it does not belong in either of these groups. It is, however, topologically equivalent to the number "8," because both the "B" and the "8" have two holes. The number of pieces in an object is also important, so the "é" (with one hole and two pieces) belongs in a different group from all of the other letters that we have discussed. Try separating the letters in your name into groups of topologically equivalent letters. Now let's make things even more interesting by looking at some Pokémon themselves. For each Pokémon in Figure ,

It can be challenging to study the topology of solid objects like those that we have been discussing so far, but now think about drawing pictures in activities like connect-the-dots. We have a bunch of dots, and we often see enough of them that we have a good idea of what shape we will get when we connect them (see Figure ). People are good at determining shapes from just these dots, but is there a way to do this automatically? This type of activity is typically harder for a computer than for a human. Even so, mathematicians and other scientists seek good ways to do it automatically, because we want to look at many different collections of dots.
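For readers who like to experiment on a computer, here is a minimal Python sketch that counts pieces and holes for some of the letters discussed above. It is our own illustration, not a method from this article. Each letter is a hypothetical small pixel drawing, with "#" for ink and "." for paper, and the names `pieces_and_holes` and `LETTERS` are made up for this example. Pieces are groups of ink cells that touch, sideways or diagonally. Holes are pockets of paper that cannot reach the outside edge without crossing ink.

```python
from collections import deque

# Ink may touch diagonally (8 neighbors), but paper only connects through
# shared sides (4 neighbors), so a diagonal wall of ink still closes a hole.
FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_groups(cells, steps):
    """Count connected groups in a set of (row, col) cells."""
    seen, groups = set(), 0
    for start in cells:
        if start in seen:
            continue
        groups += 1  # found a cell from a group we have not visited yet
        seen.add(start)
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in steps:
                nxt = (r + dr, c + dc)
                if nxt in cells and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return groups


def pieces_and_holes(picture):
    """Return (pieces, holes) for a picture drawn with '#' (ink) and '.' (paper)."""
    rows = picture.split()  # rows are separated by spaces
    height, width = len(rows) + 2, max(len(row) for row in rows) + 2
    # Shift by one cell so the picture sits inside a border of paper.
    ink = {(r + 1, c + 1) for r, row in enumerate(rows)
           for c, ch in enumerate(row) if ch == "#"}
    paper = {(r, c) for r in range(height) for c in range(width)} - ink
    # All paper outside the letter forms one group; every other group is a hole.
    return count_groups(ink, EIGHT), count_groups(paper, FOUR) - 1


# Hypothetical pixel drawings of some letters from the text.
LETTERS = {
    "P": "###. #..# ###. #... #...",
    "o": ".###. #...# #...# .###.",
    "k": "#..# #.#. ##.. #.#. #..#",
    "é": "..#.. ..... .###. #...# ##### #.... .####",
    "B": "###. #..# ###. #..# ###.",
    "8": ".##. #..# .##. #..# .##.",
}

for letter, picture in LETTERS.items():
    print(letter, pieces_and_holes(picture))
```

Running it puts "P" and "o" together (one piece, one hole) and gives "k" no holes. "B" and "8" share a group with one piece and two holes, and "é" stands alone with two pieces and one hole. Letting ink touch diagonally while paper connects only sideways is a common convention for pixel pictures, because it keeps paper from leaking through a diagonal wall of ink.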
Topology can help us make sense of large amounts of data, and we can think of exploring the topology of a collection of data (called a data set) as a giant game of connect-the-dots. In real life, there are many different kinds of data, and the data may not come in the form of dots on a page. However, we will focus on data that have associated numbers, such as the populations and other features of regions on a map, the heights of children in a school, or the number of words in each paragraph of this article. We can analyze data of this type in a way that is similar to how we thought about dots on a page.

DATA
Characteristics and information, typically in the form of quantitative facts and other quantitative features, that are collected through observations or in some other way.

DATA SET
A collection of data. A data set is often in a form that one can study using a computer.

DISCOVERING THE SHAPE OF DATA
People think about topology and data together in an area of study called topological data analysis (TDA) [ -]. In TDA, we try to describe the shape of a data set by first building a series of pictures. By connecting the "dots" in a data set in various ways, we can study the structure of the data. Instead of connecting the dots by drawing lines from one dot to another as we usually do, we connect the dots by increasing their size. As we make the dots larger, the gaps between the dots become smaller, and eventually the dots overlap (see Figure ).

TOPOLOGICAL DATA ANALYSIS
A family of techniques for studying the "shape" of data by using topology.

FIGURE
In (A-G), we draw Jigglypuff using increasingly large dots. When the dots are small, they do not touch each other, so there are many pieces and no holes. As the dots become larger, some of them touch each other, so the number of pieces decreases and some holes develop. At first, Jigglypuff becomes easier to see as the dots get larger, but then Jigglypuff becomes harder to see. In Table , we indicate the number of pieces and number of holes in each picture in this figure.
It is important to figure out how large to make the dots. What if we make the dots really huge, as in Figure G? We then have one very large object with no holes. In this example, Jigglypuff becomes very hard to discern when the dots are very large. Importantly, we may notice different interesting features at different dot sizes. By using mathematics and computation, we can consider many different sizes of dots, and we obtain an object for each one. Each of the seven versions of Jigglypuff in Figure has a different number of pieces and number of holes, and we can count them (see Table ).
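We can also sketch in Python the idea of growing the dots and counting pieces and holes at each size. This is a rough illustration built on our own assumptions, and it is not how the authors made the Jigglypuff pictures. Instead of the Jigglypuff dots, we use 16 hypothetical dots spaced evenly around the outline of a square. The sketch draws every dot as a filled disc of a chosen radius on a fine grid of pixels, then counts pieces and holes in the resulting pixel picture. The names `pieces_and_holes` and `SQUARE`, the grid size, and the pixel approach are all hypothetical choices. TDA software usually builds combinatorial objects called simplicial complexes instead of pixel pictures, so treat this as a toy.

```python
from collections import deque

# Ink may touch diagonally (8 neighbors); paper only through sides (4 neighbors).
FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_groups(cells, steps):
    """Count connected groups in a set of (i, j) pixel positions."""
    seen, groups = set(), 0
    for start in cells:
        if start in seen:
            continue
        groups += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di, dj in steps:
                nxt = (i + di, j + dj)
                if nxt in cells and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return groups


def pieces_and_holes(dots, radius, half_width=20.0, spacing=0.5):
    """Draw each dot as a disc of the given radius on a pixel grid, then count."""
    n = int(round(2 * half_width / spacing)) + 1  # pixels per side
    coords = [-half_width + k * spacing for k in range(n)]
    # A pixel is ink if its center lies inside at least one disc.
    ink = {(i, j) for i, x in enumerate(coords) for j, y in enumerate(coords)
           if any((x - a) ** 2 + (y - b) ** 2 <= radius ** 2 for a, b in dots)}
    # One extra ring of paper around the grid keeps all outside paper together.
    paper = {(i, j) for i in range(-1, n + 1) for j in range(-1, n + 1)} - ink
    return count_groups(ink, EIGHT), count_groups(paper, FOUR) - 1


# Hypothetical example: 16 dots spaced 4 units apart around a square outline.
SQUARE = [(x, y) for x in range(-8, 9, 4) for y in range(-8, 9, 4)
          if abs(x) == 8 or abs(y) == 8]

for radius in (1, 3, 9):
    print(radius, pieces_and_holes(SQUARE, radius))
# prints 1 (16, 0), then 3 (1, 1), then 9 (1, 0)
```

With small dots (radius 1), the 16 dots stay apart as 16 pieces with no holes. At radius 3, neighboring dots overlap and form one ring with one hole. At radius 9, the dots are so large that the hole fills in, much like the huge dots that make Jigglypuff hard to see.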
The information in Table is one way of describing and summarizing what we observe from studying this range of dot sizes. That is, we are studying the structure of Jigglypuff across many sizes (i.e., scales).

SCALE
A characteristic size of an object, such as the radius of a disc or the length of a side of a square.

Each picture in Figure is at one scale. By counting the number of pieces and number of holes at each scale, we can explore the range of dot sizes over which Jigglypuff's features persist. This is a common approach in TDA: we look at the dot sizes over which different features persist in the data that we want to study.
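One way to measure how long features persist is to record the dot size at which each piece stops being a separate piece. The Python sketch below does this for pieces only. It uses a standard technique called 0-dimensional persistent homology, which is closely related to single-linkage clustering, and implements it with a union-find (disjoint-set) structure. The function name `piece_lifetimes` and the two-cluster example dots are hypothetical. Measuring how long holes persist needs more machinery and is not shown here. Every piece appears at radius 0, and two discs of radius r touch once r reaches half the distance between their centers.

```python
import math
from itertools import combinations


def piece_lifetimes(dots):
    """Return the radii at which pieces disappear (merge) as every dot grows."""
    parent = list(range(len(dots)))  # each dot starts as its own piece

    def find(i):
        # Follow parent links to the representative of i's piece.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps chains short
            i = parent[i]
        return i

    # Consider pairs of dots from closest to farthest.
    pairs = sorted(combinations(range(len(dots)), 2),
                   key=lambda p: math.dist(dots[p[0]], dots[p[1]]))
    deaths = []
    for i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two pieces join, so one piece disappears
            deaths.append(math.dist(dots[i], dots[j]) / 2)
    deaths.append(math.inf)  # the last remaining piece never disappears
    return deaths


# Hypothetical data: two small groups of dots far apart from each other.
TWO_CLUSTERS = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
print(piece_lifetimes(TWO_CLUSTERS))  # [0.5, 0.5, 0.5, 0.5, 4.5, inf]
```

Four pieces disappear almost immediately, at radius 0.5. One piece survives until the radius reaches 4.5. That long lifetime suggests that these dots really form two separate groups.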

WHAT CAN WE LEARN FROM TOPOLOGICAL DATA ANALYSIS?
TDA can tell us a lot about many things in the world. It allows us to explore complex data in a huge variety of topics in social science, biology, astronomy, and more [ ].
We can use TDA to help us understand the universe. Planets like Earth are part of solar systems, which in turn are part of galaxies, which occur in clusters. If we look into a telescope and zoom in on a solar system, the planets seem to be very far apart. But if we zoom out to look at an entire galaxy, each solar system may appear as just a dot.

Moving back down to earth, scientists have used TDA to examine the patterns of veins in leaves [ ]. They studied the structure of more than leaves and found different patterns in the leaves, kind of like human fingerprints. These fingerprints can help scientists identify leaves from small leaf fragments, and they may also improve our understanding of how leaves grow. TDA is also useful for studying the structure of fungi, blood vessels, and other things with branches and loops.
People also use TDA to describe activity patterns of people and animals. For example, two of us recently studied geographic voting patterns in different areas of California [ ]. We used TDA to detect areas of the state where people voted differently from those in neighboring areas in the presidential election. Animals other than people also produce interesting patterns. Schools of fish and flocks of birds include many individuals and can form beautiful structures. TDA can help scientists explore and understand these complex patterns [ ].
kids.frontiersin.org March | Volume | Article |

In summary, TDA is an increasingly popular approach for studying many problems, which range from connecting the dots in pictures of Pokémon to the structure of the universe [ ], patterns in nature [ ], geographic voting patterns in elections [ ], and much more. TDA is a fascinating and important area of mathematics that helps people make sense of complex data [ -].

ACKNOWLEDGMENTS
We are grateful to our young readers (Charlotte Amann-Sulzmann, Simon Cafiero, Addison Cart, Nia Chiou, Valerie K. Eng, Linnea Keiser-Clark, Coralea Lash-St. John, Adele Low, Maple Leung, Nora Stricker, Kate Van Hooser, and one anonymous person) for their many helpful comments. We also thank their parents, teachers, and friends (Clayton Cafiero, Lyndie Chiou, Puck Rombach, and Steve Van Hooser) for putting us in touch with them and soliciting their feedback. We also thank Norman Redington, our editors, and our reviewers for helpful comments. MAP, MF, and YHK acknowledge support from the National Science Foundation (grant number ) through the Algorithms for Threat Detection (ATD) program. CMT acknowledges support from the National Science Foundation (grant number ) through the Division of Mathematical Sciences.

CONFLICT OF INTEREST:
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
COPYRIGHT © Feng, Hickok, Kureh, Porter and Topaz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

ETHAN, AGE:
In my spare time, I love to compose music and play the piano in a garage band. I train in several sports, including swimming, basketball, and track and field. I am passionate about mathematics, science, and solving puzzles.

IAN, AGE:
I am a high school freshman in Chicago, and my interests include competitive math, chess, and lacrosse, as well as reading. My favorite subject in school is math, more specifically theoretical math and set theory. I am an only child, and I have a dog named Rosie. My recent favorite books include Anathem by Neal Stephenson and The Hidden Reality by Brian Greene. I also enjoy learning about history and the events that occur in it. Thank you and goodbye!

JONATHAN, AGE:
I live in a small town and am very interested in science and mathematics. In the past, I invested much time in sports and even earned a blue belt in Taekwondo, but now I put more focus on science, computers, and arts. I learned to play the guitar and to code, and I am learning English as my second language.