Visualizing Data
Ben Fry
Beijing, Cambridge, Farnham, Köln, Paris, Sebastopol, Taipei, Tokyo

Visualizing Data
by Ben Fry

Copyright 2008 Ben Fry. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (). For more information, contact our corporate/institutional sales department: (800) 998-9938 or .

Editor: Andy Oram
Production Editor: Loranah Dimant
Copyeditor: Genevieve d'Entremont
Proofreader: Loranah Dimant
Indexer: Ellen Troutman Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Jessamyn Read

Printing History:
December 2007: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Visualizing Data, the image of an owl, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This book uses RepKover, a durable and flexible lay-flat binding.

ISBN-10: 0-596-51455-7
ISBN-13: 978-0-596-51455-6

Table of Contents

Preface
1. The Seven Stages of Visualizing Data
   Why Data Display Requires Planning
   An Example
   Iteration and Combination
   Principles
   Onward
2. Getting Started with Processing
   Sketching with Processing
   Exporting and Distributing Your Work
   Examples and Reference
   Functions
   Sketching and Scripting
   Ready?
3. Mapping
   Drawing a Map
   Locations on a Map
   Data on a Map
   Using Your Own Data
   Next Steps
4. Time Series
   Milk, Tea, and Coffee (Acquire and Parse)
   Cleaning the Table (Filter and Mine)
   A Simple Plot (Represent and Refine)
   Labeling the Current Data Set (Refine and Interact)
   Drawing Axis Labels (Refine)
   Choosing a Proper Representation (Represent and Refine)
   Using Rollovers to Highlight Points (Interact)
   Ways to Connect Points (Refine)
   Text Labels As Tabbed Panes (Interact)
   Interpolation Between Data Sets (Interact)
   End of the Series
5. Connections and Correlations
   Changing Data Sources
   Problem Statement
   Preprocessing
   Using the Preprocessed Data (Acquire, Parse, Filter, Mine)
   Displaying the Results (Represent)
   Returning to the Question (Refine)
   Sophisticated Sorting: Using Salary As a Tiebreaker (Mine)
   Moving to Multiple Days (Interact)
   Smoothing Out the Interaction (Refine)
   Deployment Considerations (Acquire, Parse, Filter)
6. Scatterplot Maps
   Preprocessing
   Loading the Data (Acquire and Parse)
   Drawing a Scatterplot of Zip Codes (Mine and Represent)
   Highlighting Points While Typing (Refine and Interact)
   Show the Currently Selected Point (Refine)
   Progressively Dimming and Brightening Points (Refine)
   Zooming In (Interact)
   Changing How Points Are Drawn When Zooming (Refine)
   Deployment Issues (Acquire and Refine)
   Next Steps
7. Trees, Hierarchies, and Recursion
   Using Recursion to Build a Directory Tree
   Using a Queue to Load Asynchronously (Interact)
   An Introduction to Treemaps
   Which Files Are Using the Most Space?
   Viewing Folder Contents (Interact)
   Improving the Treemap Display (Refine)
   Flying Through Files (Interact)
   Next Steps
8. Networks and Graphs
   Simple Graph Demo
   A More Complicated Graph
   Approaching Network Problems
   Advanced Graph Example
   Mining Additional Information
9. Acquiring Data
   Where to Find Data
   Tools for Acquiring Data from the Internet
   Locating Files for Use with Processing
   Loading Text Data
   Dealing with Files and Folders
   Listing Files in a Folder
   Asynchronous Image Downloads
   Using openStream() As a Bridge to Java
   Dealing with Byte Arrays
   Advanced Web Techniques
   Using a Database
   Dealing with a Large Number of Files
10. Parsing Data
   Levels of Effort
   Tools for Gathering Clues
   Text Is Best
   Text Markup Languages
   Regular Expressions (regexps)
   Grammars and BNF Notation
   Compressed Data
   Vectors and Geometry
   Binary Data Formats
   Advanced Detective Work
11. Integrating Processing with Java
   Programming Modes
   Additional Source Files (Tabs)
   The Preprocessor
   API Structure
   Embedding PApplet into Java Applications
   Using Java Code in a Processing Sketch
   Using Libraries
   Building with the Source for processing.core
Bibliography
Index

Preface

When I show visualization projects to an audience, one of the most common questions is, "How do you do this?" Other books about data visualization do exist, but the most prominent ones are often collections of academic papers; in any case, few explain how to actually build representations. Books from the field of design that offer advice for creating visualizations see the field only in terms of static displays, ignoring the possibility of dynamic, software-based visualizations. A number spend most of their time dissecting what's wrong with given representations, sometimes providing solutions, but more often not.

In this book, I wanted to offer something for people who want to get started building their own visualizations, something to use as a jumping-off point for more complicated work. I don't cover everything, but I've tried to provide enough background so that you'll know where to go next.

I wrote this book because I wanted to have a way to make the ideas from Computational Information Design, my Ph.D. dissertation, more accessible to a wider audience. More specifically, I wanted to see these ideas actually applied, rather than limited to an academic document on a shelf. My dissertation covered the process of getting from data to understanding; in other words, from considering a pile of information to presenting it usefully, in a way that can be easily understood and interacted with. This process is covered in Chapter 1, and used throughout the book as a framework for working through visualizations.

Most of the examples in this book are written from scratch. Rather than relying on toolkits or libraries that produce charts or graphs, you learn how to create them using a little math, some lines and rectangles, and bits of text. Many readers may have tried some toolkits and found them lacking, particularly because they want to customize the display of their information. A tool that has generic uses will produce only generic displays, which can be disappointing if the displays do not suit your data set. Data can take many interesting forms that require unique types of display and interaction; this book aims to open up your imagination in ways that collections of bar and pie charts cannot.

This book uses Processing (http://processing.org), a simple programming environment and API that I co-developed with Casey Reas of UCLA. Processing's programming environment makes it easy to sit down and "sketch" code to produce visual images quickly. Once you outgrow the environment, it's possible to use a regular Java IDE to write Processing code because the API is based on Java. Processing is free to download and open source. It has been in development since 2001, and we've had about 100,000 people try it out in the last 12 months. Today Processing is used by tens of thousands of people for all manner of work. When I began writing this book, I debated which language and API to use. It could have been based on Java, but I realized I would have found myself re-implementing the Processing API to make things simple. It could have been based on Actionscript and Flash, but Flash is expensive to buy and tends to break down when dealing with larger data sets. Other scripting languages such as Python and Ruby are useful, but their execution speeds don't keep up with Java. In the end, Processing was the right combination of cost, ease of use, and execution speed.

The Audience for This Book

In the spring of 2007, I co-taught an Information Visualization course at Carnegie Mellon. Our 30 students ranged from a freshman in the art school to a Ph.D. candidate in computer science. In between were graduate students from the School of Design and various other undergrads. Their skill levels were enormously varied, but that was less important than their level of curiosity, and students who were curious and willing to put in some work managed to overcome the technical difficulties (for the art and design students) or the visual demands (for those with an engineering background).

This book is targeted at a similar range of backgrounds, if less academic. I'm trying to address people who want to ask questions, play with data, and gain an understanding of how to communicate information to others. For instance, the book is for web designers who want to build more complex visualizations than their tools will allow. It's also for software engineers who want to become adept at writing software that represents data, work that calls on them to try out new skills even if they have some background in building UIs. None of this is rocket science, but it isn't always obvious how to get started.

Fundamentally, this book is for people who have a data set, a curiosity to explore it, and an idea of what they want to communicate about it. The set of people who visualize data is growing extremely quickly as we deal with more and more information. Even more important, the audience has moved far beyond those who are experts in visualization. By making these ideas accessible to a wide range of people, we should see some truly amazing things in the next decade.

Background Information

Because the audience for this book includes both programmers and non-programmers, the material varies in complexity. Beginners should be able to pick it up and get through the first few chapters, but they may find themselves lost as we get into more complicated programming topics. If you're looking for a gentler introduction to programming with Processing, other books are available (including one written by Casey Reas and me) that are more suited to learning the concepts from scratch, though they don't cover the specifics of visualizing data. Chapters 1 through 4 can be understood by someone without any programming background, but the later chapters quickly become more difficult.

You'll be most successful with this book if you have some familiarity with writing code, whether it's Java, C++, or Actionscript. This is not an advanced text by any means, but a little background in writing code will go a long way toward understanding the concepts.

Overview of the Book

Chapter 1, The Seven Stages of Visualizing Data, covers the process for developing a useful visualization, from acquiring data to interacting with it. This is the framework we'll use as we attack problems in later chapters.

Chapter 2, Getting Started with Processing, is a basic introduction to the Processing environment and syntax. It provides a bit of background on the structure of the API and the philosophy behind the project's development.

Chapters 3 through 8 cover example projects that get progressively more complicated.

Chapter 3, Mapping, plots data points on a map, our first introduction to reading data from the disk and representing it on the screen.

Chapter 4, Time Series, covers several methods of plotting charts that represent how data changes over time.

Chapter 5, Connections and Correlations, is the first chapter that really delves into how we acquire and parse a data set. The example in this chapter reads data from the MLB.com web site and produces an image correlating player salaries and team performance over the course of a baseball season. It's an in-depth example illustrating how to scrape data from a web site that lacks an official API. These techniques can be applied to many other projects, even if you're not interested in baseball.

Chapter 6, Scatterplot Maps, answers the question, "How do zip codes relate to geography?" by developing a project that allows users to progressively refine a U.S. map as they type a zip code.

Chapter 7, Trees, Hierarchies, and Recursion, discusses trees and hierarchies. It covers recursion, an important topic when dealing with tree structures, and treemaps, a useful representation for certain kinds of tree data.

Chapter 8, Networks and Graphs, is about networks of information, also called graphs. The first half discusses ways to produce a representation of connections between many nodes in a network, and the second half shows an example of doing the same with web site traffic data to see how a site is used over time. The latter project also covers how to integrate Processing with Eclipse, a Java IDE.

The last three chapters contain reference material, including more background and techniques for acquiring and parsing data.

Chapter 9, Acquiring Data, is a kind of cookbook that covers all sorts of practical techniques, from reading data from files, to spoofing a web browser, to storing data in databases.

Chapter 10, Parsing Data, is also written in cookbook style, with examples that illustrate the detective work involved in parsing data. Examples include parsing HTML tables, XML, compressed data, and SVG shapes. It even includes a basic example of watching a network connection to understand how an undocumented data protocol works.

Chapter 11, Integrating Processing with Java, covers the specifics of how the Processing API integrates with Java. It's more of an appendix aimed at advanced Java programmers who want to use the API with their own projects.

Safari Books Online

When you see a Safari Books Online icon on the cover of your favorite technology book, that means the book is available online through the O'Reilly Network Safari Bookshelf.

Safari offers a solution that's better than e-books. It's a virtual library that lets you easily search thousands of top tech books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at http:/.

Acknowledgments

I'd first like to thank O'Reilly Media for taking on this book. I was initially put in touch with Steve Weiss, who met with me to discuss the book in the spring of 2006. Steve later put me in touch with the Cambridge office, where Mike Hendrickson became a champion for the book and worked to make sure that the contract happened. Tim O'Reilly's enthusiasm along the way helped seal it.

I owe a great deal to my editor, Andy Oram, and assistant editor, Isabel Kunkle. Without Andy's hard work and helpful suggestions, or Isabel's focus on our schedule, I might still be working on the outline for Chapter 4. Thanks also to those who reviewed the draft manuscript: Brian DeLacey, Aidan Delaney, and Harry Hochheiser.

This book is based on ideas first developed as part of my doctoral work at the MIT Media Laboratory. For that I owe my advisor of six years, John Maeda, and my committee members, David Altshuler and Chris Pullman. Chris al