“Predicting the future is a hazardous business.” So cautions Richard Susskind in his recent exercise in legal futurology, The End of Lawyers? Rethinking the Nature of Legal Services, citing a number of amusingly inaccurate predictions made over the years about the future of IT. In a series of posts, I venture into that hazardous business by taking a look at the Semantic Web, an exciting current development in IT, and considering how it might impact the law and lawyers. The Semantic Web is an emerging technology which promises to vastly increase the ability of computers to analyze information, resulting in smarter applications, more efficient search engines, and many more improvements to our current ability to retrieve and process data. Applied to the law, the Semantic Web may have a transformative effect on the way lawyers carry out their business. In this post, I explain why.
The problem: too much data
There are currently over 25 billion web pages on the World Wide Web. In fact, that figure covers only the indexable web, so those 25 billion pages may be only the tip of the iceberg (see this paper on the “deep web”). Looking beyond the web to total production of information, a study by International Data Corp carried out in 2008 predicts that 1,200 exabytes of data will be generated in 2010 (cited by The Economist here). To put this in perspective, note that one byte of information is a sequence of eight bits – that is, eight digits, each of which can be either one or zero. One exabyte is 1,000,000,000,000,000,000 bytes (10^18), or one billion gigabytes. The text of this blog post, in plain text format, takes up about 13,000 bytes. The challenge of identifying and retrieving relevant data in this ever-expanding universe of information is growing in step with the volume of the information itself. Achieving what Richard Susskind calls “information satisfaction” – getting the information you want, and only the information you want – in the face of this exponential expansion is an increasingly daunting task. This is even more true of the challenge of achieving “optimum retrieval” – for a given query, being confident that the single best document has been returned. Google’s “I’m feeling lucky” option may sometimes be surprisingly accurate, but not with any reliable degree of certainty.
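For readers who want to check the scale of those figures, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (one gigabyte taken as 10^9 bytes, one exabyte as 10^18 bytes) and simply reuses the two numbers quoted above: the 1,200 exabyte forecast and the approximate 13,000 byte size of this post. The variable names are mine and purely illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the text.
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_EXABYTE = 10**18

PREDICTED_EXABYTES_2010 = 1_200  # forecast cited in the text
POST_SIZE_BYTES = 13_000         # approximate plain-text size of this post

# How many gigabytes make up one exabyte?
gigabytes_per_exabyte = BYTES_PER_EXABYTE // BYTES_PER_GIGABYTE

# Total predicted data volume, in bytes.
total_bytes = PREDICTED_EXABYTES_2010 * BYTES_PER_EXABYTE

# How many posts of this size would add up to that volume?
equivalent_posts = total_bytes // POST_SIZE_BYTES

print(f"{gigabytes_per_exabyte:,} gigabytes per exabyte")
print(f"{total_bytes:.1e} bytes predicted for 2010")
print(f"roughly {equivalent_posts:.1e} posts of this size")
```

On these assumptions, the forecast volume works out to somewhere between 10^16 and 10^17 posts the size of this one, which gives a sense of why finding the single most relevant document is so hard.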
Too much legal data
The problem of too much data will be familiar to law students, associates, and anyone else who has carried out legal research. The volume of legislation, case law, commentary on the law, and the like is no exception to the current phenomenon of information expansion. “Googling it” can provide a good first stab at some legal problems, but no lawyer who fears malpractice suits would rely exclusively on results from a general search engine. Commercial legal databases provide more structured and authoritative databanks of legal information, but they are expensive and difficult for the untrained to use, and searches are still conducted mostly by means of citations and keywords. Whether legal sources are identified by a search engine or using a commercial database, the actual task of analyzing and interpreting the texts is conducted by the lawyer – not the machine.
If I want to ascertain, say, what information I must provide in the certificate of incorporation of a Delaware Corporation, I can search “Delaware corporation law,” click through the link that looks most relevant, scan the text (perhaps with the help of the “find” function), identify the relevant section, and read through it to draw up a list of the requirements. If I am especially diligent, I might also check case law in a commercial database to see if judicial decisions have added to or qualified these requirements. Now imagine that, instead of proceeding by keyword searches and “manual” analysis, I could simply enter the query “What information must be provided in the certificate of incorporation of a Delaware Corporation?” and the search engine returned a complete, authoritative list of all of the requirements, along with any qualifications or additions made by the case law. That, in a nutshell, is the promise of the Semantic Web.
(Next up: Part 2 – What is the Semantic Web?)