By JEFFREY R. YOUNG
Philadelphia
If the Internet is a new kind of social space, what does it look like?
That's a question of particular interest to social scientists eager to see what cyberspace might reveal about the nature of human behavior.
Researchers, after all, have long sought to map social groupings and interactions in the physical world. Now, with so much activity on computer networks, scientists can collect vast amounts of hard data on human behavior. Each blog points to other blogs in ways that reveal patterns of influence. Online chats can be tallied and parsed. Even the act of clicking on links can leave trails of activity like footprints in the sand.
"We're entering the golden age of social science," says Lee Rainie, director of the Pew Internet & American Life Project. "We know more than we ever did about what's on people's minds."
While all that data could seem overwhelming, researchers are refining ways to visualize Internet activity. If a picture is worth 1,000 words, a visualization may well be worth 10,000 data points.
At a conference this month at the University of Pennsylvania called "The Hyperlinked Society," a panel of academic and industry experts showed off their Internet maps and talked about the challenges of painting meaningful pictures of cyberspace.
Politics and Links
Political bloggers are colorful, opinionated online voices. And many people have noted that liberal blogs, not surprisingly, tend to point readers to other liberal viewpoints, while conservative bloggers keep to themselves as well.
Lada Adamic, an assistant professor in the school of information at the University of Michigan at Ann Arbor, helped create a map that shows exactly what those connections looked like just before the 2004 U.S. presidential election.
By sampling more than 1,000 political blogs, she and other researchers developed a map of the ties among them, noting, in particular, when conservatives or liberals reached across the aisle to point to a blogger from an opposing viewpoint. Naturally, the left-leaning blogs are shown as blue dots, and the right-leaning blogs are colored red. Orange lines between blogs indicate links from liberal to conservative blogs, and the purple lines are from conservative to liberal. When two liberal blogs link to each other, the line is shown in blue, just as mutually connected conservative blogs are connected with red lines.
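The color scheme amounts to a simple rule applied to each link. The Python sketch below illustrates that rule and is not the researchers' code. It assumes every blog has already been labeled liberal or conservative, and the blog names and links are made up.

```python
from collections import Counter

# Hypothetical blog names and leanings, for illustration only.
LEANING = {
    "left_one": "liberal",
    "left_two": "liberal",
    "right_one": "conservative",
    "right_two": "conservative",
}


def edge_color(source, target, leaning=LEANING):
    """Color for a link from `source` to `target`, following the map's scheme."""
    src, dst = leaning[source], leaning[target]
    if src == dst:
        # Links within one camp take that camp's color.
        return "blue" if src == "liberal" else "red"
    # Cross-aisle links: orange for liberal -> conservative,
    # purple for conservative -> liberal.
    return "orange" if src == "liberal" else "purple"


# Hypothetical directed links: (linking blog, linked-to blog).
links = [
    ("left_one", "left_two"),
    ("left_one", "right_one"),
    ("right_one", "right_two"),
    ("right_two", "left_two"),
]
colors = Counter(edge_color(s, t) for s, t in links)
print(colors)
```

Tallying the colors this way also gives a quick count of how often each side reaches across the aisle.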
Even the placement of each dot is meaningful, representing degrees of interlinking. "Imagine that all the blogs that link to one another have springs, so they want to be close together," says Ms. Adamic. Working the same way, her mapping algorithm "will bring the ones that are linking to each other close" and leave others farther out from the center.
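The "springs" Ms. Adamic describes are the idea behind force-directed graph layouts. As a rough sketch, here is a minimal layout in the style of the Fruchterman-Reingold algorithm, in which every pair of blogs pushes apart while linked blogs pull together. It is not necessarily the method her software used, and the function name, circular starting positions, and cooling schedule are assumptions chosen for illustration.

```python
import math


def spring_layout(nodes, edges, iterations=200):
    """Minimal Fruchterman-Reingold-style layout (illustrative sketch).

    `nodes` is a list of names; `edges` is a list of (u, v) links.
    All pairs repel; linked pairs attract like springs. Returns {node: (x, y)}.
    """
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on a unit circle.
    pos = {
        node: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, node in enumerate(nodes)
    }
    k = math.sqrt(4.0 / n)  # "ideal" spacing for a roughly 2 x 2 area
    temp = 0.1  # maximum step per iteration, cooled over time
    for _ in range(iterations):
        disp = {node: [0.0, 0.0] for node in nodes}
        # Repulsion between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                fx, fy = dx / dist * force, dy / dist * force
                disp[u][0] += fx
                disp[u][1] += fy
                disp[v][0] -= fx
                disp[v][1] -= fy
        # Spring attraction along each link.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            fx, fy = dx / dist * force, dy / dist * force
            disp[u][0] -= fx
            disp[u][1] -= fy
            disp[v][0] += fx
            disp[v][1] += fy
        # Move each node along its net force, capped by the temperature.
        for node in nodes:
            dx, dy = disp[node]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temp)
            pos[node][0] += dx / length * step
            pos[node][1] += dy / length * step
        temp *= 0.98
    return {node: tuple(p) for node, p in pos.items()}


layout = spring_layout(["a", "b", "c"], [("a", "b")])
print(layout)
```

With one link between "a" and "b" and nothing attached to "c", the linked pair settles close together while the unlinked blog drifts toward the edge, which is the pattern the political-blog map shows at a much larger scale.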
The researchers were curious to see if blogging activity confirmed preconceived notions of how liberals and conservatives behave offline.
"We wanted to see whether conservatives were more interlinked than liberals," she says. "That was true to some extent, but it wasn't like, wow, they're so much more interlinked. We found things to be surprisingly balanced."
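One simple way to put a number on "more interlinked" is link density: the share of all possible links within a group that actually exist. The sketch below assumes that measure, which may differ from what the researchers computed, and uses invented blogs and links.

```python
# Hypothetical toy data; not from the study.
LEANING = {"l1": "liberal", "l2": "liberal", "l3": "liberal",
           "c1": "conservative", "c2": "conservative"}
LINKS = [("l1", "l2"), ("l2", "l1"), ("l3", "l1"),
         ("c1", "c2"), ("c2", "c1"), ("l1", "c1")]


def within_group_density(links, leaning, group):
    """Share of possible directed links among `group` members that exist."""
    members = [blog for blog, side in leaning.items() if side == group]
    possible = len(members) * (len(members) - 1)  # ordered pairs, no self-links
    actual = {(s, t) for s, t in links
              if s != t and leaning.get(s) == group and leaning.get(t) == group}
    return len(actual) / possible if possible else 0.0


for side in ("liberal", "conservative"):
    print(side, within_group_density(LINKS, LEANING, side))
```

In the toy data the two-blog conservative group scores 1.0 and the three-blog liberal group 0.5. That also shows a caveat: a small group can reach high density with only a few links, so comparing groups of very different sizes calls for care.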
As for which side blogged more, she found things were about as close as the presidential-election results: "It was like dead even."
Ms. Adamic speculates that the large amount of interlinking and activity by conservative bloggers at the moment her data were gathered might have been caused by the excitement over a CBS News report on George W. Bush's military service that turned out to have been based on forged documents. It was a blogger who uncovered the forgeries.
The colorful map was created using free software called Guess, originally developed by researchers from HP Labs. Other tools for data mapping are readily available as well, she says, adding that attempts to create maps of Internet activity are "definitely growing."
'Core of the Blogosphere'
Matthew Hurst, director of science and innovation for Nielsen BuzzMetrics, a company that analyzes Internet trends for businesses, has created a map of more than a thousand of the most popular blogs, essentially showing what he calls "the core of the blogosphere."
Like Ms. Adamic's map of political blogs, Mr. Hurst's distributes blogs in visual space based on how much they link to each other. "If things are very close to each other, it means they talk to each other a lot," he says. "When you do this analysis, you inevitably end up with a large percentage of blogs that are just floating around by themselves because they don't have a lot of in or out links."
The size of each circle on Mr. Hurst's map indicates the number of links pointing to that blog. The colors show which blogging software each site uses or what kind of server hosts it, telling technology-oriented researchers which servers and software are most popular.
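Sizing circles by incoming links, and spotting the blogs that float alone, comes down to counting. This sketch assumes a list of directed links between made-up blogs. The square-root sizing rule and its `base` and `scale` parameters are hypothetical visualization choices, not necessarily Mr. Hurst's.

```python
import math
from collections import Counter

# Hypothetical directed links: (linking blog, linked-to blog).
links = [
    ("alpha", "gamma"), ("beta", "gamma"), ("delta", "gamma"),
    ("gamma", "alpha"), ("beta", "alpha"),
]
blogs = ["alpha", "beta", "gamma", "delta", "epsilon"]

in_links = Counter(target for _, target in links)
out_links = Counter(source for source, _ in links)


def circle_radius(blog, base=2.0, scale=3.0):
    """Hypothetical sizing rule: radius grows with the square root of the
    in-link count, so heavily linked blogs stand out without dwarfing the
    rest; `base` keeps blogs with no in-links visible."""
    return base + scale * math.sqrt(in_links[blog])


# Blogs with no in- or out-links "float around by themselves".
isolated = [b for b in blogs if in_links[b] == 0 and out_links[b] == 0]

for blog in blogs:
    print(f"{blog}: {in_links[blog]} in-links, radius {circle_radius(blog):.1f}")
print("isolated:", isolated)
```

Because `Counter` returns 0 for missing keys, blogs that nobody links to need no special handling.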
The map indicates that the most linked-to blogs focus on technology and social-political commentary.
Mr. Hurst says that Internet maps help the company make recommendations about how to harness the Internet. "If we understand how influence works in the blogosphere," he says, "it allows us to make more qualified assertions of what's going on to our customers."
Following the Patterns
Likewise, Microsoft is using data maps to better understand the dynamics of online communities.
"The future of computing is social computing," said Marc A. Smith, leader of the Community Technologies group at Microsoft Research, at the conference here. "The question is how do you harness the swarm."
He has been working on a project called Netscan, which analyzes behavior on Usenet, a text-only online discussion forum that has been around since before the invention of the World Wide Web. Though Usenet is far less popular than blogging these days, Mr. Smith says the forum can yield important lessons that can be transferred to newer online discussion spaces that use so-called "threaded messages," where conversations are organized by topic.
"What we're trying to do is show patterns of contribution to threaded conversation communities," Mr. Smith says.
The project generates plenty of raw data tables ripe for analysis. But Mr. Smith says the visual representations of data are far more powerful for spotting patterns. "Like a lot of people, I'm not that numerate, and the pictures speak to me in a way that tables of numbers do not," he says.
In fact, Mr. Smith and his colleagues have developed a way to determine what kind of user a person is by looking at data maps of their posting behavior rather than examining the content of their messages. Among the types of users: the "answer person," who is quick to provide advice to strangers; the "flame warrior," who enters discussions hoping to win arguments by trashing other participants; the "discussion person," who is willing to talk on just about any topic; and "the questioner," who seeks advice but is not a frequent participant and is not looking for conversation.
Several weeks' worth of activity by a single user can be shown on one visualization, where each new discussion thread the user participated in is shown as a bubble, and the size of the bubble represents the number of times the user posted on that topic.
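The per-user view is also a matter of tallying. Assuming each post is recorded as a thread identifier plus the day it was made (a hypothetical format, not Netscan's), one bubble per thread can be sized by the user's post count and ordered by when the user first joined that thread. The `bubbles` function and the sample posts are illustrative only.

```python
from collections import Counter

# Hypothetical posts by one user: (thread_id, day the post was made).
POSTS = [
    ("t1", 1), ("t1", 1), ("t2", 3),
    ("t1", 8), ("t3", 10), ("t2", 17),
]


def bubbles(posts):
    """Return (thread, first_day, size) per thread, ordered by first_day.

    One bubble per thread the user joined; size = the user's post count there.
    """
    sizes = Counter(thread for thread, _ in posts)
    first_day = {}
    for thread, day in posts:
        first_day[thread] = min(day, first_day.get(thread, day))
    return sorted(
        ((t, first_day[t], sizes[t]) for t in sizes),
        key=lambda item: item[1],
    )


for thread, day, size in bubbles(POSTS):
    print(f"thread {thread}: joined on day {day}, bubble size {size}")
```

A user who posts once in many threads yields many small bubbles, while one who argues at length in a few threads yields a few large ones; telling those shapes apart needs no look at the message text.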
"Our intent has been to make visible the latent but invisible patterns in conversational data sets," wrote Mr. Smith and several colleagues in a report on the research.
Mapping the Future
Academics at the conference here argued that the "killer map," or most powerful way of representing online interactions, has yet to emerge.
"We're still waiting for the Mercator map to emerge" for cyberspace, said Martin Dodge, a lecturer in human geography at the University of Manchester, in England. The Mercator projection method for mapping Earth, developed in the 1560s, was a breakthrough in translating the round Earth onto a flat map.
Of course, measuring cyberspace is far more complex than mapping physical space.
Some researchers at the conference pointed out that analyzing the number of links to Web sites does not give any sense of whether the linker was positively recommending a site or pointing in disgust.
Mr. Hurst says that, unlike in the physical world, the Internet has no objective space to measure, so any map will inevitably be more subjective, highlighting certain traits and excluding others.
"In the physical world, it's a matter of taking the three-dimensional world and making it two-dimensional," said Mr. Dodge. But on the Internet, he said, there are more dimensions to consider.