Why Do We Need Web Science Research?


December 2009
Sangki Han, Ph.D.
Professor / GSCT
KAIST
Wednesday, December 2, 2009

Web Science by Tim Berners-Lee in 2006

COMPUTER SCIENCE
Creating a Science of the Web
Tim Berners-Lee, Wendy Hall, James Hendler, Nigel Shadbolt, Daniel J. Weitzner

Understanding and fostering the growth of the World Wide Web, both in engineering and societal terms, will require the development of a new interdisciplinary field.

Since its inception, the World Wide Web has changed the ways scientists communicate, collaborate, and educate. There is, however, a growing realization among many researchers that a clear research agenda aimed at understanding the current, evolving, and potential Web is needed. If we want to model the Web; if we want to understand the architectural principles that have provided for its growth; and if we want to be sure that it supports the basic social values of trustworthiness, privacy, and respect for social boundaries, then we must chart out a research agenda that targets the Web as a primary focus of attention.

When we discuss an agenda for a science of the Web, we use the term "science" in two ways. Physical and biological science analyzes the natural world, and tries to find microscopic laws that, extrapolated to the macroscopic realm, would generate the behavior observed. Computer science, by contrast, though partly analytic, is principally synthetic: It is concerned with the construction of new languages and algorithms in order to produce novel desired computer behaviors. Web science is a combination of these two features. The Web is an engineered space created through formally specified languages and protocols. However, because humans are the creators of Web pages and links between them, their interactions form emergent patterns in the Web at a macroscopic scale. These human interactions are, in turn, governed by social conventions and laws. Web science, therefore, must be inherently interdisciplinary; its goal is to both understand the growth of the Web and to create approaches that allow new powerful and more beneficial patterns to occur.

Unfortunately, such a research area does not yet exist in a coherent form. Within computer science, Web-related research has largely focused on information-retrieval algorithms and on algorithms for the routing of information through the underlying Internet. Outside of computing, researchers grow

T. Berners-Lee and D. J. Weitzner are at the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. W. Hall and N. Shadbolt are in the School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK. J. Hendler is in the Computer Science Department, University of Maryland, College Park, MD 20742, USA. E-mail: [email protected]

Enhanced online at www.sciencemag.org/cgi/content/full/313/5788/769
Science, Vol. 313, 11 August 2006, p. 769. www.sciencemag.org. Published by AAAS.

A New Discipline

Model the Web's structure
Articulate the architectural principles that have fueled its phenomenal growth
Discover how online human interactions are driven by and can change social conventions

Interdisciplinary Approach

WSRI & Web Science Trust

The Web Science Research Initiative (WSRI) is a joint endeavour between the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT and the School of Electronics and Computer Science (ECS) at the University of Southampton.
The goal of WSRI is to facilitate and produce the fundamental scientific advances necessary to inform the future design and use of the World Wide Web.
Publication: Foundations and Trends in Web Science
Events
  – Web Science Summer Graduate School
  – WebSci09 - Society On-Line
Directors of WSRI are establishing a charitable body - the Web Science Trust (WST)
  – Working with WWW Foundation

WebSci'09: Society On-Line

Understanding of both human behavior and technological design
  – How do people and organisations behave on-line – what motivates them to shop, date, make friends, learn, participate in political life or manage their health or tax on-line?
  – Which Web-based designs will they trust? To which on-line agents will they delegate?
  – How can the dark side of the Web – such as cybercrime, pornography and terrorist networks – be both understood and held in check without compromising the experience of others?
  – What are the effects of varying characteristics of Web-based technologies – such as security, privacy, network structure, the linking of data – on on-line behaviour, both criminal and non-criminal?
  – And how can the design of the Web of the future ensure that a system on which – as Tim Berners-Lee put it – democracy and commerce depend remains 'stable and pro-human'?
Identified the following areas of on-line society and Web development for particular attention:
  – E-commerce
  – Government and Political Life
  – Social Relationships
  – Cybercrime and/or the Prevention Thereof
  – Health
  – Culture On-Line
  – E-Learning
The cross-cutting infrastructure issues on which these areas depend including, but not limited to:
  – Linked Data and the Semantic Web
  – Trust and Reputation
  – Security and Privacy
  – Networking (Social and Technical)

INFORMATION TECHNOLOGY
Web Science Emerges
Studying the Web will reveal better ways to exploit information, prevent identity theft, revolutionize industry and manage our ever growing online lives
By Nigel Shadbolt and Tim Berners-Lee

KEY CONCEPTS
The relentless rise in Web pages and links is creating emergent properties, from social networking to virtual identity theft, that are transforming society.
A new discipline, Web science, aims to discover how Web traits arise and how they can be harnessed or held in check to benefit society.
Important advances are beginning to be made; more work can solve major issues such as securing privacy and conveying trust.
(The Editors)

Since the World Wide Web blossomed in the mid-1990s, it has exploded to more than 15 billion pages that touch almost all aspects of modern life. Today more and more people's jobs depend on the Web. Media, banking and health care are being revolutionized by it. And governments are even considering how to run their countries with it. Little appreciated, however, is the fact that the Web is more than the sum of its pages. Vast emergent properties have arisen that are transforming society. E-mail led to instant messaging, which has led to social networks such as Facebook. The transfer of documents led to file-sharing sites such as Napster, which have led to user-generated portals such as YouTube. And tagging content with labels is creating online communities that share everything from concert news to parenting tips.

But few investigators are studying how such emergent properties have actually blossomed, how we might harness them, what new phenomena may be coming or what any of this might mean for humankind. A new branch of science, Web science, aims to address such issues. The timing fits history: computers were built first, and computer science followed, which subsequently improved computing significantly. Web science was launched as a formal discipline in November 2006, when the two of us and our colleagues at the Massachusetts Institute of Technology and the University of Southampton in England announced the beginning of a Web Science Research Initiative. Leading researchers from 16 of the world's top universities have since expanded on that effort.

This new discipline will model the Web's structure, articulate the architectural principles that have fueled its phenomenal growth, and discover how online human interactions are driven by and can change social conventions. It will elucidate the principles that can ensure that the network continues to grow productively and settle complex issues such as privacy protection and intellectual-property rights. To achieve these ends, Web science will draw on mathematics, physics, computer science, psychology, ecology, sociology, law, political science, economics, and more.

Of course, we cannot predict what this nascent endeavor might reveal. Yet Web science has already generated crucial insights, some presented here. Ultimately, the pursuit aims to answer fundamental questions: What evolutionary patterns have driven the Web's growth? Could they burn out? How do tipping points arise, and can that be altered?

Insights Already
Although Web science as a discipline is new, earlier research has revealed the potential value of such work. As the 1990s progressed, searching for information by looking for key words among the mounting number of pages was returning more and more irrelevant content. The founders of Google, Larry Page and Sergey Brin, realized they needed to prioritize the results.

Their big insight was that the importance of a page (how relevant it is) was best understood in terms of the number and importance of the pages linking to it. The difficulty was that part of this definition is recursive: the importance of a page is determined by the importance of the

Scientific American, October 2008, p. 32.

Model the Web's Structure

PageRank by Page and Brin
Web is a scale-free network - Northeastern University's Albert-László Barabási
Web as having short paths and small worlds
  – While at Cornell University in the 1990s, Duncan J. Watts and Steven H. Strogatz
  – Even though the Web was huge, a user could get from one page to any other page in at most 14 clicks

TECHNICAL COMMENT
Power-Law Distribution of the World Wide Web

Barabási and Albert (1) propose an improved version of the Erdős-Rényi (ER) theory of random networks to account for the scaling properties of a number of systems, including the link structure of the World Wide Web (WWW). The theory they present, however, is inconsistent with empirically observed properties of the Web link structure.

Barabási and Albert write that because "of the preferential attachment, a vertex that acquires more connections than another one will increase its connectivity at a higher rate; thus, an initial difference in the connectivity between two vertices will increase further as the network grows. . . . Thus older . . . vertices increase their connectivity at the expense of the younger . . . ones, leading over time to some vertices that are highly connected, a 'rich-get-richer' phenomenon" [figure 2C of (1)]. It is this prediction of the Barabási-Albert (BA) model, however, that renders it unable to account for the power-law distribution of links in the WWW [figure 1B of (1)].

We studied a crawl of 260,000 sites, each one representing a separate domain name. We counted how many links the sites received from other sites, and found that the distribution of links followed a power law (Fig. 1A). Next, we queried the InterNIC database (using the WHOIS search tool at www.networksolutions.com) for the date on which the site was originally registered. Whereas the BA model predicts that older sites have more time to acquire links and gather links at a faster rate than newer sites, the results of our search (Fig. 1B) suggest no correlation between the age of a site and its number of links.

The absence of correlation between age and the number of links is hardly surprising; all sites are not created equal. An exciting site that appears in 1999 will soon have more links than a bland site created in 1993. The rate of acquisition of new links is probably proportional to the number of links the site already has, because the more links a site has, the more visible it becomes and the more new links it will get. (There should, however, be an additional proportionality factor, or growth rate, that varies from site to site.)

Our recently proposed theory (2), which accounts for the power-law distribution in the number of pages per site, can also be applied to the number of links a site receives. In this model, the number of new links a site receives at each time step is a random fraction of the number of links the site already has. New sites, each with a different growth rate, appear at an exponential rate. This model yields scatter plots similar to Fig. 1B, and can produce any power-law exponent $\gamma > 1$.

Lada A. Adamic
Bernardo A. Huberman
Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA
E-mail: [email protected]

References
1. A.-L. Barabási and R. Albert, Science 286, 509 (1999).
2. B. A. Huberman and L. A. Adamic, Nature 401, 131 (1999).

10 November 1999; accepted 4 February 2000

Fig. 1. (A) The distribution function for the number of links, k, to Web sites (from crawl in spring 1997). The dashed line has slope 1.94. (B) Scatter plot of the number of links, k, versus age for 120,000 sites. The correlation coefficient is 0.03.

Response: Adamic and Huberman offer additional support for the evolutionary network model that we offered (1). The apparent mess in their fig. 1B is rooted in their choice not to average their data. We believe that taking the average over all points of the same age, and extracting the trends within those averages, would have unveiled the increasing tendency predicted by our model.

Although we do not have access to their data, we can illustrate the same procedure for the network of movie actors that we discussed (1). When the connectivity of the individual actors is plotted as a function of the release year of their first movie (Fig. 1A), the results are very similar to those shown in fig. 1B of Adamic and Huberman's comment. The only difference is that the movie industry had its boom not 4 years ago, as did the WWW, but rather at the beginning of the century; thus, the apparently structureless regime persists much longer. When the connectivity of the actors that debuted in the same year is averaged, however, the average connectivity in the last 60 years increases with the actor's age, in line with the predictions of our theory, and the curve follows a power law for almost a hundred years (Fig. 1B). We expect that a similar increasing tendency would appear for the WWW data after averaging, but the length of the scaling interval would be limited by the Web's comparatively brief history.

The fluctuations that lead to the apparent randomness of Fig. 1A are due to the individual differences in the rate at which nodes increase their connectivity. It is easy to include such differences in the model and continuum theory proposed by

Fig. 1. (A) Scatter plot of movie actor connectivity, k (the number of other actors with which he or she performed during his or her career), versus the year of debut. All actors from the Internet Movie Database were included; n = 392,340. (B) Average movie actor connectivity, $\langle k \rangle$, versus year of debut. To determine $\langle k \rangle$, k is averaged over all actors that debuted in the same year. The curve shows a systematic increase in the average connectivity with the actor's professional lifetime, t (2000 minus year of debut). The dotted line follows $\langle k(t) \rangle \sim t^{\beta}$, with $\beta = 0.49$, very close to the prediction 0.5 of (1). Inset shows a log-log plot of $\langle k \rangle$ as a function of t, which illustrates the presence of scaling in the last century. The dotted line has slope 0.5.

Science, Vol. 287, 24 March 2000, p. 2115a. www.sciencemag.org

Analysis on CyWorld

Analysis of Topological Characteristics of Huge Online Social Networking Services

Yong-Yeol Ahn, Department of Physics, KAIST, Daejeon, Korea
Seungyeop Han*, NHN Corp., Korea
Haewoon Kwak, Division of Computer Science, KAIST, Daejeon, Korea
Sue Moon, Division of Computer Science, KAIST, Daejeon, Korea
Hawoong Jeong, Department of Physics, KAIST, Daejeon, Korea
* This work was conducted while Han was at KAIST.

ABSTRACT
Social networking services are a fast-growing business in the Internet. However, it is unknown if online relationships and their growth patterns are the same as in real-life social networks. In this paper, we compare the structures of three online social networking services: Cyworld, MySpace, and orkut, each with more than 10 million users, respectively. We have access to complete data of Cyworld's ilchon (friend) relationships and analyze its degree distribution, clustering property, degree correlation, and evolution over time. We also use Cyworld data to evaluate the validity of snowball sampling method, which we use to crawl and obtain partial network topologies of MySpace and orkut. Cyworld, the oldest of the three, demonstrates a changing scaling behavior over time in degree distribution. The latest Cyworld data's degree distribution exhibits a multi-scaling behavior, while those of MySpace and orkut have simple scaling behaviors with different exponents. Very interestingly, each of the two exponents corresponds to the different segments in Cyworld's degree distribution. Certain online social networking services encourage online activities that cannot be easily copied in real life; we show that they deviate from close-knit online social networks which show a similar degree correlation pattern to real-life social networks.

Categories and Subject Descriptors: J.4 [Computer Applications]: Social and behavioral sciences
General Terms: Human factors, Measurement
Keywords: Sampling, Social network

1. INTRODUCTION
The Internet has been a vessel to expand our social networks in many ways. Social networking services (SNSs) are one successful example of such a role. SNSs provide an online private space for individuals and tools for interacting with other people in the Internet. SNSs help people find others of a common interest, establish a forum for discussion, exchange photos and personal news, and many more.

Cyworld, the largest SNS in South Korea, had already 10 million users 2 years ago, one fourth of the entire population of South Korea. MySpace and orkut, similar social networking services, have also more than 10 million users each. Recently, the number of MySpace users exceeded 130 million with a growing rate of over a hundred thousand people per day. It is reported that these SNSs "attract nearly half of all web users" [1]. The goal of these services is to help people establish an online presence and build social networks; and to eventually exploit the user base for commercial purposes. Thus the statistics and dynamics of these online social networks are of tremendous importance to social networking service providers and those interested in online commerce.

The notion of a network structure in social relations dates back about half a century. Yet, the focus of most sociological studies has been interactions in small groups, not structures of large and extensive networks. Difficulty in obtaining large data sets was one reason behind the lack of structural study. However, as reported in [2] recently, missing data may distort the statistics severely and it is imperative to use large data sets in network structure analysis.

It is only very recently that we have seen research results from large networks. Novel network structures from human societies and communication systems have been unveiled; just to name a few are the Internet and WWW [3] and the patents, Autonomous Systems (AS), and affiliation networks [4]. Even in the short history of the Internet, SNSs are a fairly new phenomenon and their network structures are not yet studied carefully. The social networks of SNSs are believed to reflect the real-life social relationships of people more accurately than any other online networks. Moreover, because of their size, they offer an unprecedented opportunity to study human social networks.

In this paper, we pose and answer the following questions:

What are the main characteristics of online social networks? Ever since the scale-free nature of the World-Wide Web network has been revealed, a large number of networks have been analyzed and found to have power-law scaling in degree distribution, large clustering coefficients, and small mean degrees of separation (so called the small-world phenomenon). The networks we are interested in this work are huge and those of this magnitude have not yet been analyzed.

How representative is a sample network? In most networks,

Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2007, May 8–12, 2007, Banff, Alberta, Canada. ACM 978-1-59593-654-7/07/0005.