Sebastopol, CA--Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? Toby Segaran's new book, Programming Collective Intelligence, shows you how. In it, the San Francisco-based developer demonstrates how you can build Web 2.0 applications to analyze social interaction across the Web today.
Segaran says, "I've worked in science software for a while, where data-mining and machine learning algorithms are quite important. I found it was difficult to find people with any exposure to these techniques and I thought it might be helpful to write a more accessible introduction with a fun angle. The timing was perfect to create a book about using these methods to help people build more intelligent web applications." With the sophisticated yet easy-to-understand and practical algorithms in this book, developers can start writing smart programs to access interesting datasets from other web sites, collect data from users of their own applications, and analyze and understand the data once they've found it.
As Tim O'Reilly, founder and CEO of O'Reilly Media, Inc., wrote in his recent blog post, Toby's new book "teaches algorithms and techniques for extracting meaning from data, including user data. This is the programmer's toolbox for Web 2.0. It's no longer enough to know how to build a database-backed web site. If you want to succeed, you need to know how to mine the data that users are adding, both explicitly and as a side-effect of their activity on your site."
Added O'Reilly, "There's been a lot written about Web 2.0 since we first coined the term in 2004, but in many ways, Toby's book is the first practical guide to programming Web 2.0 applications. (We won't tell you how to be the next Google, but we'll teach the basic techniques that are part of the price of entry. Better or more specialized algorithms are going to be the heart of each Web 2.0 company's secret sauce.)"
Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:
- Collaborative filtering techniques that enable online retailers to recommend products or media
- Methods of clustering to detect groups of similar items in a large dataset
- Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
- Optimization algorithms that search millions of possible solutions to a problem and choose the best one
- Bayesian filtering, used in spam filters for classifying documents based on word types and other features
- Using decision trees not only to make predictions, but to model the way decisions are made
- Predicting numerical values rather than classifications to build price models
- Support vector machines to match people on online dating sites
- Non-negative matrix factorization to find the independent features in a dataset
- Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
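To give a flavor of the first technique on that list, here is a minimal sketch of user-based collaborative filtering in Python. It is not code from the book. The ratings data, the function names `similarity` and `recommend`, and the choice of a Euclidean-distance similarity score are all illustrative assumptions. The sketch scores items a person hasn't rated yet by averaging other people's ratings, weighted by how closely their tastes match that person's.

```python
from math import sqrt

# Hypothetical sample data: person -> {item: rating on a 1-5 scale}
ratings = {
    "Ann": {"Dune": 5.0, "Heat": 3.0, "Up": 4.0},
    "Bob": {"Dune": 4.0, "Heat": 3.0, "Up": 5.0, "Jaws": 4.0},
    "Cat": {"Dune": 1.0, "Heat": 5.0, "Jaws": 2.0},
}


def similarity(prefs, a, b):
    """Return a score in (0, 1] based on Euclidean distance over shared items.

    Identical ratings give 1.0; people with no items in common get 0.0.
    """
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    dist_sq = sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared)
    return 1.0 / (1.0 + sqrt(dist_sq))


def recommend(prefs, person):
    """Rank items `person` hasn't rated by a similarity-weighted average."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue  # ignore people with nothing in common
        for item, score in prefs[other].items():
            if item in prefs[person]:
                continue  # only suggest items the person hasn't rated
            totals[item] = totals.get(item, 0.0) + score * sim
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    # Normalize by total similarity so popular items aren't unfairly favored
    ranked = [(totals[i] / sim_sums[i], i) for i in totals]
    return sorted(ranked, reverse=True)
```

For the sample data, `recommend(ratings, "Ann")` returns a single suggestion, "Jaws", with a predicted score of about 3.39. The score sits closer to Bob's rating of 4 than to Cat's rating of 2 because Bob's ratings are more similar to Ann's.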
Each chapter includes exercises for extending the algorithms to make them more powerful. The code examples in this book are written in Python, and familiarity with Python programming will help, but Segaran provides explanations of all the algorithms so that programmers who use other languages can follow them easily. With this book, developers can go beyond simple database-backed applications and put the wealth of Internet data to work for them. As Segaran says, "The book just scratches the surface of what's possible. Its purpose is to both educate and inspire people to learn more."
Toby Segaran is a software developer and manager at Genstruct, a computational systems biology company. He has written free web applications for his own use and put them online for others to try, including: tasktoy, a task management system; Lazybase, an online application that lets users design, create and share databases of anything they like; and Rosetta Blog, an online tool for practicing Spanish and French by reading blogs along with their translations and lists of common words. Each of these has several hundred regular users. His blog is located at kiwitobes.com.
Programming Collective Intelligence
Toby Segaran
ISBN: 0596529325, $39.99 US
order@oreilly.com
1-800-998-9938; 1-707-827-7000
1005 Gravenstein Highway North
Sebastopol, CA 95472
About O’Reilly
O’Reilly Media spreads the knowledge of innovators through its books, online services, magazines, and conferences. Since 1978, O’Reilly Media has been a chronicler and catalyst of cutting-edge development, homing in on the technology trends that really matter and spurring their adoption by amplifying “faint signals” from the alpha geeks who are creating the future. An active participant in the technology community, the company has a long history of advocacy, meme-making, and evangelism.