The WAVE Report
The WAVE Report archive is available on http://www.wave-report.com
0717 Web Intelligence 2007
0717.1 The Role of Web Service Architectures
0717.2 Invited Talks
0717.2.1 IBM Research
0717 Web Intelligence 2007
By John Latta
November 2-4, 2007
Web Intelligence was a small academic conference, but its location was a draw which brought in leaders in the web, semantic technologies and molecular biology.
Certainly one of the most interesting was the presentation by Richard Karp:
Computer Science as a Lens on the Sciences: The Example of Computational Molecular Biology, Richard M. Karp, University of California at Berkeley.
There is a growing influence of fundamental ideas from computer science on the nature of research in a number of scientific fields. This includes an awareness that information processing lies at the heart of the processes studied in fields as diverse as quantum mechanics, statistical physics, nanotechnology, neuroscience, linguistics, economics and sociology. Increasingly, mathematical models in these fields are expressed in algorithmic languages and describe algorithmic processes. The presentation focused on computational molecular biology, where the view of living cells as complex information processing systems has become the dominant paradigm. These pose specific algorithmic problems arising in the sequencing of genomes, the comparative analysis of the resulting genomic sequences, the modeling of networks of interacting proteins, and the associations between genetic variation and disease.
The theme, as outlined above, is that probabilistic methods, computer science and web technology are being used to solve key problems in DNA and protein research. This is certainly an area where the scope of the problems demands non-deterministic methods which scale.
One of the more interesting analogies was with quantum mechanics. Computing is facing the same kind of transition that physics went through in moving from classical deterministic physics to relativity theory and quantum mechanics. This transition is especially visible in the web.
0717.1 The Role of Web Service Architectures
Dieter Fensel, University of Innsbruck/Digital Enterprise Research Institute, Austria, gave a presentation on Service Web 3.0. This was challenging not only from a web perspective but also in the broader context of computer science. The summary of the talk frames the issue.
Computer science is entering a new generation. The previous generation was based on abstracting from hardware. The emerging generation comes from abstracting from software and sees all resources as services in a service-oriented architecture (SOA). In a world of services, it is the service that counts for a customer and not the software or hardware components that implement the service. Service-oriented architectures are rapidly becoming the dominant computing paradigm. However, current SOA solutions are still restricted in their application context to in-house solutions of companies. A service web will have billions of services. While service orientation is widely acknowledged for its potential to revolutionize the world of computing by abstracting from underlying hardware and software layers, that success depends on resolving fundamental challenges that SOA does not address currently. The mission of Service Web 3.0 is to provide solutions to integration and search that will enable the Service Oriented Architecture (SOA) revolution on a worldwide scale. Hereby we must focus on three major areas where we need to extend current approaches towards service orientation:
- Web technology as an underlying infrastructure for the integration of services at a worldwide scale;
- Semantic Web technology as a means to abstract from syntax to semantics; and
- Web 2.0 as a means to structure human-machine cooperation in an efficient and cost-effective manner.
Service Web 3.0 will place computing and programming at the services layer providing the real goal of computing: problem solving in the hands of end users through a properly balanced cooperation approach.
The concept of Service Ware creates a new environment:
Programs are services
Devices are services
Different types of media
Environments are dynamic and open
Mobility and Ubiquity
The scale is to billions of services.
But the challenges of creating a SOA are reflected in the fact that there are only about 12,000 Web services.
The areas which must be addressed to permit scaling include:
Everyone can act as a provider or consumer of services.
Services are created in isolation from one another, thus interoperability remains an issue.
There is no central control of services. Services can appear, change or disappear at any time in an uncontrolled fashion.
It is likely that human individuals will pose the fundamental limitation on how well the service web will scale.
It was represented that scalability is enabled by semantics, which provide descriptions of data and processes. This would allow previously human-intensive tasks to be accomplished quickly and efficiently at run time. That is, information can be understood and processed with semantics.
In this Web 3.0 environment the following web service tasks are to be automated:
Publishing of web services
Discovery of web services
Composition of web services – the combination of services
Selection of the best web service among many which meet a goal
Mediation of inconsistencies of data, protocol or process
Execution of web services which follow programmatic conventions.
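As a rough illustration of several of the tasks listed above (publishing, discovery, selection, composition and execution), the following Python sketch keeps services in an in-memory registry. It is a toy under stated assumptions: the `Service` and `Registry` classes, the `capability` and `quality` fields and the three example services are hypothetical names invented for this sketch, not part of any framework named in the talk. Mediation is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Service:
    name: str
    capability: str            # what the service offers (hypothetical label)
    quality: float             # hypothetical score used only for selection
    run: Callable[[str], str]  # the executable part of the service


class Registry:
    """Toy in-memory registry; a real service web would have no central one."""

    def __init__(self) -> None:
        self._by_capability: Dict[str, List[Service]] = {}

    def publish(self, service: Service) -> None:
        self._by_capability.setdefault(service.capability, []).append(service)

    def discover(self, capability: str) -> List[Service]:
        return list(self._by_capability.get(capability, []))

    def select(self, capability: str) -> Service:
        candidates = self.discover(capability)
        if not candidates:
            raise LookupError(f"no service offers {capability!r}")
        return max(candidates, key=lambda s: s.quality)

    def compose(self, capabilities: List[str]) -> Callable[[str], str]:
        # Pick one service per capability, then chain them in order.
        chain = [self.select(c) for c in capabilities]

        def pipeline(value: str) -> str:
            for service in chain:
                value = service.run(value)
            return value

        return pipeline


registry = Registry()
registry.publish(Service("upper-a", "normalize", 0.6, str.upper))
registry.publish(Service("strip-b", "normalize", 0.9, str.strip))
registry.publish(Service("exclaim", "decorate", 0.5, lambda s: s + "!"))

workflow = registry.compose(["normalize", "decorate"])
result = workflow("  hello ")  # execution of the composed services
```

Here selection simply takes the highest `quality` score; the point made in the talk is that at the scale of billions of services each of these steps would have to be driven by semantic descriptions rather than by hand.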
There is an interesting comparison between the Web today and the Semantic Web. While the Web is about global networking with URL, HTML and HTTP, the Semantic Web is about global networking using URI, RDF and SPARQL.
Uniform Resource Identifier (URI)
Resource Description Framework (RDF)
SPARQL, the query language for RDF, whose name is a recursive acronym for SPARQL Protocol and RDF Query Language.
The end result is that many more tasks can be done automatically using web services built on these components.
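To make the three components concrete, here is a minimal Python sketch of RDF-style data: facts are stored as (subject, predicate, object) triples whose subjects and predicates are URIs, and a query is a triple pattern with wildcards, loosely in the spirit of a SPARQL triple pattern. It uses no RDF library; the `example.org` URIs, the predicate names and the `match` helper are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Placeholder URIs under example.org; the facts themselves come from this report.
TRIPLES: List[Triple] = [
    ("http://example.org/karp", "http://example.org/affiliation", "UC Berkeley"),
    ("http://example.org/fensel", "http://example.org/affiliation", "University of Innsbruck"),
    ("http://example.org/fensel", "http://example.org/spokeAbout", "Service Web 3.0"),
]


def match(pattern: Pattern) -> List[Triple]:
    """Return every triple that fits the pattern; None is a wildcard."""
    return [
        triple
        for triple in TRIPLES
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# "Who is affiliated with what?" and "What do we know about Fensel?"
affiliations = match((None, "http://example.org/affiliation", None))
fensel_facts = match(("http://example.org/fensel", None, None))
```

A real deployment would use an RDF store and actual SPARQL queries; the sketch only shows why uniform identifiers plus a uniform data model make such queries mechanical.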
Examples cited of the Semantic Web include the following.
KIM Browser plug-in.
This is a plug-in for Internet Explorer, performing semantic annotation of named entities (NE, such as persons, organizations, locations, money, etc.) over unstructured or semi-structured web content online. The annotations refer to an upper-level ontology and a knowledge base, containing instances of real-world entities of general importance. This plug-in is based on the KIM Platform.
Disco Hyperdata Browser
The Disco – Hyperdata Browser is a browser for navigating the Semantic Web as an unbound set of data sources. The browser renders all information, that it can find on the Semantic Web about a specific resource, as an HTML page. This resource description contains hyperlinks that allow you to navigate between resources. While you move from resource to resource, the browser dynamically retrieves information by dereferencing HTTP URIs and by following rdfs:
FacetedDBLP
The FacetedDBLP search interface allows for the search of computer science publications in the DBLP collection starting from some keyword and shows the result set along with a set of facets, e.g., distinguishing publication years, authors, or conferences. It is the first large scale application that uses GrowBag graphs to create a computer science specific topic facet, with which a user can characterize the result set in terms of main research topics and filter it according to certain subtopics.
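The idea of a faceted result set can be sketched in a few lines of Python: search by keyword, count the values of a facet over the hits, then narrow the hits by one facet value. The records below are made-up placeholders rather than actual DBLP entries, the function names are invented here, and the GrowBag topic facet is not reproduced.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, object]

# Made-up placeholder records; not actual DBLP entries.
PUBLICATIONS: List[Record] = [
    {"title": "Semantic web services", "year": 2006, "venue": "Conf A"},
    {"title": "Web service discovery", "year": 2007, "venue": "Conf A"},
    {"title": "Semantic annotation of text", "year": 2007, "venue": "Conf B"},
]


def search(keyword: str) -> List[Record]:
    """Keyword search over titles, case-insensitive."""
    return [p for p in PUBLICATIONS if keyword.lower() in str(p["title"]).lower()]


def facet(results: List[Record], field: str) -> Counter:
    """Count how many hits fall under each value of one facet."""
    return Counter(p[field] for p in results)


def refine(results: List[Record], field: str, value: object) -> List[Record]:
    """Keep only the hits with the chosen facet value."""
    return [p for p in results if p[field] == value]


hits = search("semantic")
by_year = facet(hits, "year")
only_2007 = refine(hits, "year", 2007)
```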
Processes in the Semantic Web are enabled by frameworks. These enable the automation of tasks that are human-intensive in a Web 2.0 environment. The following were cited as examples of frameworks:
SAWSDL (WSDL-S): Semantic annotation of WSDL descriptions
WSMO: Ontologies, goals, web services and mediators
OWL-S: WS description Ontology (Profile, Service Model and Grounding)
SWSF: Process-based Description Model and Language for WS.
Dieter Fensel concluded that:
The notion of 100% completeness and correctness in logic-based reasoning does not make sense anymore since the underlying fact base is changing faster than any reasoning process can process it.
In a world of billions of services it may cost too much to find the “optimal” service in relation to the gain of having actually found the optimal solution.
Pragmatic approaches in service discovery will focus on utility, that is, stop the search process when a service is found that is “good” enough to fulfill a request.
The times of 100% complete and correct solutions are gone.
Computer science in the 20th century was about perfect solutions in closed domains and applications.
Computer science in the 21st century will be about approximate solutions and frameworks that capture the relationships of partial solutions and requirements in terms of computational costs, i.e., the proper balance of their ratio.
The shift to the 21st century computing paradigm is comparable to the transition in physics from classical physics to relativity theory and quantum mechanics, in which the notion of absolute space and time was replaced by relativistic notions and inherent limits of precision.
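The "good enough" discovery strategy in these conclusions can be illustrated with a short Python sketch: candidates are evaluated one at a time, and the search stops either when a candidate reaches a utility threshold or when an evaluation budget is used up. The function name, the threshold, the budget and the utility scores are all hypothetical values chosen for the example.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_good_enough(
    candidates: Iterable[T],
    utility: Callable[[T], float],
    threshold: float,
    budget: int,
) -> Tuple[Optional[T], int]:
    """Return (choice, evaluations).

    The choice is the first candidate whose utility reaches the threshold,
    or the best one seen if the evaluation budget runs out first.
    """
    best: Optional[T] = None
    best_score = float("-inf")
    evaluated = 0
    for candidate in candidates:
        if evaluated >= budget:
            break  # searching further would cost more than we allow
        score = utility(candidate)
        evaluated += 1
        if score > best_score:
            best, best_score = candidate, score
        if score >= threshold:
            break  # good enough: stop without looking for the optimum
    return best, evaluated


# Hypothetical utilities for five candidate services, in discovery order.
scores = {"s1": 0.40, "s2": 0.85, "s3": 0.99, "s4": 0.10, "s5": 0.70}
chosen, cost = first_good_enough(
    scores, lambda name: scores[name], threshold=0.8, budget=4
)
```

With these numbers the search settles on `s2` after two evaluations, even though `s3` would score higher; that trade of optimality for cost is the point of the argument.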
0717.2 Invited Talks
Dieter Fensel actually framed a number of the invited talks during the rest of the conference. We will discuss the following presentations:
Mashups for the Enterprise – IBM Research
Enabling Next Generation Data Management Applications – Google
Google Mashup Editor – Google
Pipes – Yahoo
Each represents a different approach to using the information present on the Web beyond simple web browsing. In a small way these are creating incremental solutions to the semantic web concepts.
Central to many of these implementations is the mashup, whose objective is to make the creation of a web application as easy as possible, even for non-programmers.
0717.2.1 IBM Research
Anant Jhingran, VP and CTO, IBM Silicon Valley Laboratory, gave the presentation Mashups for the Enterprise. The summary is:
There is a fundamental transformation that is taking place on the web around information composition through mashups. We assert that this will also affect enterprise architectures. Currently the state-of-the-art in enterprises around information composition is federation and other integration technologies. These scale well, and are well worth the upfront investment for enterprise class, long-lived applications. However, there are many information composition tasks that are not currently well served by these architectures. The needs of Situational Applications (i.e. applications that come together for solving some immediate business problems) are one such set of tasks. Augmenting structured data with unstructured information is another such task. Our hypothesis is that a new class of integration technologies will emerge to serve these tasks, and we call it an enterprise information mashup fabric. In the talk, we discuss the information management primitives that are needed for this fabric, the various options that exist for implementation, and pose several, currently unanswered, research questions.
It is claimed that mashup usage is growing rapidly and that there are over 2,500 mashups. Web 2.0 makes possible information resources for business that can be simple, mixable (more than one source of data), collaborative and can even leverage user-generated content (UGC).
IBM Research is developing Info 2.0 which makes information consumable as a service. This creates the enterprise mashup. It is based on QEDWiki and the Mashup Hub.
QEDWiki is a browser-based assembly canvas used to create simple mashups. A mashup maker is an assembly environment in which the creator of a mashup uses software components (or services) made available by content providers. QEDWiki is a unique Wiki framework in that it provides both Web users and developers with a single Web application framework for hosting and developing a broad range of Web 2.0 applications. QEDWiki can be used for a wide variety of Web applications, including, but not limited to, the following:
* Web content management for a typical collection of Wiki pages
* traditional form processing for database-oriented CRUD (Create/Read/Update/Delete) applications
* document-based collaboration
* rich interactive applications that bind together disparate services
* situational applications (or mashups).
QEDWiki also provides Web application developers with a flexible and extensible framework to enable do-it-yourself (DIY) rapid prototyping. Business users can quickly prototype and build ad hoc applications without depending on software engineers. QEDWiki provides mashup enablers (programmers) with a framework for building reusable, tag-based commands. These commands (or widgets) can then be used by business users who wish to create their own Web applications.
QEDWiki attempts to make use of the social and collaborative aspects of Web 2.0 by enabling the following basic actions:
* Assembly: Subject matter experts who may not be programmers can create Web applications to address just-in-time ad hoc situational needs; they can also integrate data and mark-up using widgets to create new utilities.
* Wiring: Users can bind rich content from disparate sources to create new ways to view information; they can also add behavior and relationships to disparate widgets to create a rich interactive application experience.
* Sharing: QEDWiki can be used to quickly promote a mashup for use by others and to enable multi-user collaboration on the development of a mashup.
IBM represented that, using Info 2.0, only a few steps are required to create an enterprise mashup.
Start the Design with the Data – Create the feed
Transform and Remix feeds – Use the Mashup Hub Flow Editor and Engine for transforming and remixing feeds
Assemble the Visual Elements – Using drag and drop assembly and palette of widgets and feeds
A critical component is the Flow Engine which manages and sets up the various feeds in the mashup.
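The three steps above can be mimicked in plain Python to show the shape of the flow: expose data as feeds, remix two feeds by joining them on a shared key, and bind the result to a simple text "widget". This is only a sketch; the function names, the sample accounts and headlines, and the template are invented for illustration and do not reflect the actual Mashup Hub or QEDWiki interfaces.

```python
from typing import Dict, Iterable, List

Feed = List[Dict[str, str]]


def create_feed(rows: Iterable[Dict[str, str]]) -> Feed:
    """Step 1: expose a data source as a feed (here, a list of entries)."""
    return [dict(row) for row in rows]


def remix(left: Feed, right: Feed, key: str) -> Feed:
    """Step 2: join two feeds on a shared key, merging matching entries."""
    index = {entry[key]: entry for entry in right}
    return [{**entry, **index[entry[key]]} for entry in left if entry[key] in index]


def assemble(feed: Feed, template: str) -> List[str]:
    """Step 3: bind each entry to a very simple text 'widget'."""
    return [template.format(**entry) for entry in feed]


# Hypothetical structured data (accounts) and unstructured data (headlines).
accounts = create_feed([
    {"company": "Acme", "owner": "Lee"},
    {"company": "Globex", "owner": "Kim"},
])
headlines = create_feed([
    {"company": "Acme", "headline": "Acme opens new plant"},
])

view = assemble(remix(accounts, headlines, "company"), "{owner}: {headline}")
```

The join in `remix` is the kind of situational, throwaway integration the talk contrasts with long-lived federation projects.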
IBM has also developed some programming interfaces.
This is an ongoing research project at IBM, with a number of activities under way to improve Info 2.0.
Alon Halevy, Research Scientist, Google, gave the talk: Dataspaces -- enabling the next generation data management applications. A summary of the presentation includes:
Data integration is a pervasive challenge faced in applications that need to query across multiple autonomous and heterogeneous data sources. Data integration is crucial in large enterprises, large-scale scientific projects, and government agencies. Data integration also holds the promise of fueling the next revolution of data content on the Web. The talk reviewed the progress made on data integration to date but argued that, despite this progress, data integration is either still too hard for most users or does not address the real needs of applications. A new abstraction, dataspaces, was presented to address these two challenges.
Web 2.0 creates a number of challenges in terms of databases. These include:
An order of magnitude increase in the number of users
Users are related to each other via social networks
Data and schemas are diverse and of varying quality and
The collection and analysis of data is decentralized.
This begs for new data management tools, including ways to use the data which already exists. At Google these are dataspaces. Areas of use which were illustrated included:
Data integration from multiple sources; however, this remains like traditional data management.
Enterprise databases which include CRM, ERP and many other data sources.
Science, especially in the biomedical area.
Deep Web, the data hidden behind walls on the Web.
One effort to create dataspaces is Google Base. As per Google:
Google Base is a place where you can easily submit all types of online and offline content, which we'll make searchable on Google (if your content isn't online yet, we'll put it there). You can describe any item you post with attributes, which will help people find it when they do related searches. In fact, based on your items' relevance, users may find them in their results for searches on Google Product Search and even our main Google web search.
Paul McDonald, Google, described the Mashup Editor.
With Google, a mashup can be created with a few lines of code and then published.
Google services mashed up
Take some AJAX UI components, data from your users and Google services like Google Base and Google Maps or external feeds and mash them all together using our simple framework. We make it easy with the Google Mashup Editor.
Common web technologies doing uncommon things
Simple tools for sophisticated apps
Using the Google Mashup Editor you can create, debug and deploy your application in one interface.
The editor includes the following.
Source code repository
Framework for handling feeds
Feed based datastore
UI modules and controls which bind to elements in a feed
The skills and capabilities required to create a mashup include:
Fetching, parsing and manipulating feeds
Creating and syndicating new feeds
A UI which supports multiple browsers
Infrastructure which supports a scalable server and hosting
A database for creating and storing data.
The programming model supports XML Tags, HTML, CSS and JS.
One of the major drawbacks of the Google editor is that it requires programming.
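To give a sense of the kind of programming a feed-based mashup involves, the sketch below parses a small RSS document, sorts the items and binds them to an HTML list. It is written in Python with the standard library rather than in the Mashup Editor's own tag language, which this report does not show; the inline feed, the `example.org` links and the function names are placeholders.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Dict, List

# A tiny inline RSS 2.0 document standing in for a fetched feed.
RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Second post</title><link>http://example.org/2</link></item>
    <item><title>First post</title><link>http://example.org/1</link></item>
  </channel>
</rss>"""


def parse_items(rss_text: str) -> List[Dict[str, str]]:
    """Parse the feed and pull out the title and link of every item."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def to_html_list(items: List[Dict[str, str]]) -> str:
    """Bind feed items to a simple UI element: an HTML list of links."""
    rows = "".join(
        f'<li><a href="{escape(i["link"])}">{escape(i["title"])}</a></li>'
        for i in items
    )
    return f"<ul>{rows}</ul>"


items = sorted(parse_items(RSS), key=lambda i: i["title"])
page = to_html_list(items)
```

Fetching the feed over HTTP, hosting the result and supporting multiple browsers are the remaining items on the list above, and they are what a hosted editor is meant to take off the developer's hands.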
Jonathan Trevor, Yahoo, presented an overview of Pipes, which has the following attributes.
Pipes is a composition tool to aggregate, manipulate, and mashup content from around the web. Like Unix pipes, simple commands can be combined together to create output that meets your needs:
Create your ultimate custom feed by combining many feeds into one, then sorting, filtering and translating them.
Geocode your favorite feeds and browse the items on an interactive map.
Remix your favorite data sources and use the Pipe to power a new application.
Build custom vertical search pages that are impossible with ordinary search engines.
Power widgets/badges on your web site.
Consume the output of any Pipe in RSS, JSON, KML, and other formats.
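The Unix-pipe analogy can be sketched in Python by treating each operation as a small stage that takes a stream of items and returns a new one, then chaining the stages. The stage names (`union`, `filter_by`, `sort_by`, `pipe`) and the sample items are invented for this sketch and are not the names of actual Pipes modules.

```python
from functools import reduce
from itertools import chain
from typing import Callable, Dict, Iterable, List

Item = Dict[str, str]
Stage = Callable[[Iterable[Item]], Iterable[Item]]


def union(*feeds: Iterable[Item]) -> Iterable[Item]:
    """Combine many feeds into one."""
    return chain(*feeds)


def filter_by(word: str) -> Stage:
    """Keep items whose title contains the word, ignoring case."""
    return lambda items: (i for i in items if word.lower() in i["title"].lower())


def sort_by(field: str) -> Stage:
    """Sort items by one field."""
    return lambda items: sorted(items, key=lambda i: i[field])


def pipe(source: Iterable[Item], *stages: Stage) -> List[Item]:
    """Feed the source through each stage in turn, like cmd1 | cmd2 | cmd3."""
    return list(reduce(lambda items, stage: stage(items), stages, source))


# Hypothetical feeds standing in for two RSS sources.
news = [{"title": "Web services grow"}, {"title": "Sports roundup"}]
blog = [{"title": "A web mashup primer"}]

combined = pipe(union(news, blog), filter_by("web"), sort_by("title"))
```

Because every stage has the same shape, stages can be rearranged or reused freely, which is what the visual editor exposes as wiring modules together.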
To illustrate the power of Pipes the presentation began by asking the question:
How do you find an apartment near a park?
The solution is easy in concept but tedious to implement.
Use Craigslist to get a listing of apartments
Click on a map link
Determine the distance to a park on the map
The tools are available.
Craigslist apartment listings have an RSS feed
Yahoo! Local API can find parks
This could be implemented in 50 lines of Perl code.
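The core of that hand-written version is a distance check between each apartment and each park. The sketch below shows only that step, in Python rather than Perl, and skips the network calls to the Craigslist feed and the Yahoo! Local API; the listings, the park and all coordinates are made up for illustration, and the 1 km radius is an arbitrary choice.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near_a_park(apartments: List[Dict], parks: List[Dict], radius_km: float) -> List[Dict]:
    """Keep apartments that lie within radius_km of at least one park."""
    return [
        apt
        for apt in apartments
        if any(
            haversine_km(apt["lat"], apt["lon"], park["lat"], park["lon"]) <= radius_km
            for park in parks
        )
    ]


# Made-up data standing in for the apartment feed and the park lookup.
apartments = [
    {"title": "Sunny 1BR", "lat": 37.7700, "lon": -122.4800},
    {"title": "Downtown loft", "lat": 37.8044, "lon": -122.2712},
]
parks = [{"name": "Example Park", "lat": 37.7694, "lon": -122.4862}]

matches = near_a_park(apartments, parks, radius_km=1.0)
```

Only the first listing falls within the radius. In Pipes the same logic would be assembled from existing modules instead of being written by hand.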
Pipes allows for any input to be used to create any output. The inputs are web data sources such as RSS, JSON, XML, RDF, ICAL and CSV.
The output can include RSS, JSON, KML and other formats.
A number of examples were given.
Hot Deals Search
Photos near Napa Wineries
GeoAnnotated Reuters News
The strengths of Pipes are:
Pipes is middleware without having one’s own web server
Supports Rapid Prototyping
Faster than many APIs
Key to Pipes is Tweak and the Pipes Editor.
The Library pane on the left hand side lists available modules and saved Pipes
The Canvas pane in the center is the main work area for assembling Pipes
The Debugger is a resizable pane at the bottom which lets you inspect Pipe output at various stages in your Pipe
You build and edit Pipes by moving modules onto the Canvas from the Library pane and wiring them together with your mouse.
Central to the editor is the Canvas, where the work is done.
The canvas is the main work area for assembling and testing your Pipe. You can drag modules around and arrange them in whatever way looks good to you, or ask the editor to auto-arrange everything by clicking the Layout button.
Double-clicking the title bar of any module will collapse the module by hiding all its parameters, while double-clicking again (or clicking the maximize box in the corner) will restore the module to its full size. This can be useful when working with Pipes that have many components.
To make your Pipe work, you'll need to wire modules together. You can do this by clicking the output terminal of any module, then clicking on the input terminal of the module you want to feed that data to. The editor will flash compatible terminals in orange to indicate which connections are permitted. You can mouse over the terminals of any module to see what kind of data that terminal expects to emit or receive.
To sever a connection between modules, click on either of the terminals at the end of a wire (a small scissors icon will appear).
Many modules have configurable parameters and input fields. You can fill these in like regular form fields, or supply them with appropriately typed input wired in from another module. Use the User Input modules to let users supply their own input to the Pipe at runtime.
In summary Pipes has the following attributes.
Grab web data sources which include: RSS, JSON, XML, RDF, ICAL and CSV.
Manipulate the data including from multiple sources
View the results
Use the Pipe whenever needed
It was interesting that even Google cited the strength of Pipes in their presentation.
In Web 2.0 we see a dramatic change in the web where users are involved: Web 2.0 is about its users. Social networks and video are just two elements. But here at Web Intelligence the pieces that make up the web today, including RSS feeds and HTML pages, were treated as building blocks. These are components in a larger computational and data engine. Mashups are but the first step toward another form of user involvement, and they represent an important trend in creating a personal SOA. This is all a part of making the web more personal and directly controlled by individuals for their own needs.
Copyright 2010 The WAVE Report
The WAVE Report may be redistributed in full for individual
readership and posted to newsgroups, Web, and FTP sites. This
publication may not be reprinted or redistributed for profit.
Short quotes are permitted but must be attributed to the WAVE