The following guest post is from Sushant Sinha - a student at the University of Michigan and the force behind the searchable legal databases website Indian Kanoon. Our blog has previously debated issues of free digital accessibility of legal information (here, here, here and here). Indian Kanoon is an important step in that direction. (We realise that some of the technical details in the post may be unfamiliar to our readers, but the broad themes have been discussed regularly on our blog.)
----------------------------------------------------------------------------
I was quite pleased to find legal information publicly available on judis and indiacode. However, it was too difficult to look for anything on these websites, so I started building tool sets to play with law data. At a certain point I felt that integrating these small pieces of software would be very interesting. I was still skeptical about whether search on law documents meant anything to common people who do not know legal jargon. In any case, I integrated the tool sets into a search engine and was pleasantly surprised when many of my common queries were answered well. So I deployed it as a publicly available service and called it Indian Kanoon, and fortunately many people have found it useful over time. When actual people start using a service (whether free or fee-based), the demand for correctness and usability increases significantly. The need to understand the problems, think about the issues and fix them has kept me in a tight grip. Indian Kanoon was announced last January in a very crude form, and a number of changes have gone in over the past year. So this post mostly highlights the work that has gone into Indian Kanoon in the last year, what the challenges were and what features are planned for the future.
Integrating more legal documents
Indian Kanoon started with only Supreme Court judgments and central laws. Clearly this was not sufficient for the many people who wanted to search high court judgments, law commission reports and law journals. Over the last year, a number of other legal documents have been added. First, the law commission reports and a law journal were added. The law journal "Central India Law Quarterly" was digitized and put up on the Internet by Devaranjan. The only problem in integrating them was that many of these documents were images scanned from books. So I used tesseract, a free OCR software supported by Google, to extract text from these images. However, the text extraction quality was just 90%, and I am skeptical that Google uses tesseract for its own Google Books project.
Tarunabh pointed out that the constituent assembly debates were available and could be integrated. He also pointed out two main problems in integrating them. First, the article numbers in the debates differ from those in the Constitution. Second, court judgments cite the debates using page numbers in the official books. But neither of these numbers was available in the digital copy provided by the government, so the only way out was to go back to the actual books. We did not want to give up on the digital route yet, so we went to books.google.com, which had a scanned copy of the debates. Tarunabh emailed Google asking them to release those books into the public domain, as the copyright on them had expired the previous year. Google replied that they were not sure about the copyright expiration and would be conservative in making books publicly available. Finally, I borrowed the books from a library, manually copied the page numbers and the association list between the article numbers in the debates and the article numbers in the Constitution, and integrated the constituent assembly debates.
Indian Kanoon was highly deficient in high court judgments, and even in Supreme Court judgments, as Dilip earlier pointed out on my blog. So I integrated the high court judgments and made Indian Kanoon more comprehensive.
Features
Besides making Indian Kanoon more comprehensive in terms of legal documents, a number of features have been added to make searching easier. The most common problem was the misspelling of Indian names, so I first added the most critical feature: spelling suggestions. The ability to search and order documents by date was added next. The search pages and forums were redesigned to look more appealing. To provide notifications of new judgments, an RSS feed for court judgments was recently added. Finally, people may like to monitor documents related to certain words or phrases, so on Tarunabh's suggestion I added an RSS feed for any arbitrary query.
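The spelling suggestions mentioned above could be approximated along the following lines. This is only a minimal sketch and not the code Indian Kanoon actually runs: the `KNOWN_TERMS` vocabulary, the `suggest_spelling` function and the similarity cutoff are all assumptions made for illustration, and a real engine would more plausibly draw its vocabulary from the indexed documents.

```python
import difflib

# Hypothetical vocabulary of names; a real system would build this
# from the words that actually occur in the indexed documents.
KNOWN_TERMS = ["kesavananda", "bharati", "maneka", "gandhi", "golaknath", "minerva"]


def suggest_spelling(query, vocabulary=KNOWN_TERMS, cutoff=0.75):
    """Return a corrected query string, or None if nothing needs correcting."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Pick the single closest known term above the similarity cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            # Unknown word with no close match: leave it untouched.
            corrected.append(word)
    return " ".join(corrected) if changed else None


print(suggest_spelling("kesavanand bharti"))
```

A query that is already spelled as in the vocabulary returns `None`, so the search page would show a "did you mean" line only when a correction exists.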
Contributing code back
Developing the Indian Kanoon software has been possible because of the availability of a large amount of free software, which I was able to modify and customize for law search. Indian Kanoon uses a feature-rich open source database, PostgreSQL, as the backend. When users submit a query, matching documents are found and ordered, and the top few are shown. For each document, the search engine also displays a small text excerpt where the query terms appear. The excerpt allows people to quickly evaluate whether the document is relevant to the query. The headline function developed for Indian Kanoon was contributed back to PostgreSQL and has been added to the PostgreSQL CVS head. Besides that, a bug in PostgreSQL was fixed as well. I also sent the phrase search function to the PostgreSQL mailing list. However, Teodor Sigaev, who merged OpenFTS into PostgreSQL, wants a generic operator that can check for an arbitrary distance between lexemes. I have not yet had time to work on this operator.
Besides development on the database, the Indian Kanoon forum software has been released as djangobb, a Django bulletin board that uses the Django web application framework.
The judis recently moved to a really obfuscated website where judgments did not have stable URLs. Prashant Iyengar pointed out that we were not getting the live feed from the judis. So I reverse engineered the website and released the judis reverse engineering code.
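To make the idea of a text excerpt concrete, here is a small sketch of one way to pick the excerpt in which the query terms appear. It is not the headline function contributed to PostgreSQL (there, full text search exposes this as `ts_headline`). The function name `best_excerpt`, the fixed word window and the `<b>` highlighting are all assumptions for illustration.

```python
import re


def best_excerpt(text, query, window=12):
    """Return the run of `window` words that contains the most query terms."""
    terms = {t.lower() for t in query.split()}
    words = text.split()
    # Mark each word as a hit if, stripped of punctuation, it is a query term.
    hits = [re.sub(r"\W+", "", w).lower() in terms for w in words]

    best_start, best_score = 0, -1
    # Slide a fixed-size window over the text and keep the first best one.
    for start in range(max(1, len(words) - window + 1)):
        score = sum(hits[start:start + window])
        if score > best_score:
            best_start, best_score = start, score

    end = best_start + window
    # Highlight the matching words so a reader can spot them quickly.
    marked = [f"<b>{w}</b>" if hit else w
              for w, hit in zip(words[best_start:end], hits[best_start:end])]
    return " ".join(marked)


sample = ("The court held that the right to privacy is part of the right "
          "to life under the Constitution and cannot be curtailed lightly")
print(best_excerpt(sample, "privacy life", window=8))
```

The sketch compares exact words only. A database-backed implementation would normally match on stemmed lexemes, so that "judgment" and "judgments" count as the same term.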
Future work
Even after so much work, a number of things need to be improved on Indian Kanoon. Here is a list of changes that I think are required to make Indian Kanoon more comprehensive, richer and better at search. Please feel free to suggest more.
1. Reverse engineering different court and tribunal websites so that Indian Kanoon can provide a live feed of all Indian court and tribunal judgments.
2. Currently Indian Kanoon cannot answer questions like "list of judgments in which a particular law section was held" and "search only in family law judgments". The problem is that we do not have enough semantic information about judgments. So I want to enable common users to start tagging documents. There will be two kinds of tagging: first, categorizing court judgments and laws into broad categories like family law, constitutional law, right to equality, etc.; and second, tagging whether a judgment explains, bolsters, or overturns a given law or judgment. The tags generated by users will be available to everyone under the Creative Commons Attribution-Share Alike 3.0 license.
3. A number of people type natural language into the search box. For example, someone will type "recent judgments from delhi high court". Even though we can answer such questions, at present we simply match the query words against the documents. The above query, for example, could have been reduced to "doctypes: delhi sortby: mostrecent". So what we need is a small natural language processor that can automatically convert such natural language queries into a more precise query that the engine can evaluate.
4. I only support searching for a set of words in the documents. Roy wanted a more sophisticated query language that supports boolean queries. This will enable people to issue more complicated queries like (freedom OR speech) AND (NOT expression).
5. With the addition of more data over time, Indian Kanoon takes more than a second to evaluate some queries. A number of software changes (or possibly a hardware upgrade) are required to bring the evaluation time back to sub-second.
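The natural language conversion described in item 3 could start out as a handful of rewrite rules. The sketch below is an assumption-laden illustration, not a planned design: only the target form "doctypes: delhi sortby: mostrecent" comes from the post, while the phrase table, the `supremecourt` value, the word lists and the `rewrite_query` function are hypothetical.

```python
# Hypothetical phrase-to-doctype table; only the "delhi" value appears in the post.
COURT_PHRASES = {
    "delhi high court": "delhi",
    "supreme court": "supremecourt",  # assumed value, for illustration only
}
RECENCY_WORDS = {"recent", "latest", "newest"}
FILLER_WORDS = {"judgments", "judgment", "from", "of", "the", "in", "by"}


def rewrite_query(natural_query):
    """Turn a loosely phrased query into a more precise engine query."""
    text = natural_query.lower().strip()
    directives = []

    # Replace the first recognised court phrase with a doctype restriction.
    for phrase, doctype in COURT_PHRASES.items():
        if phrase in text:
            directives.append(f"doctypes: {doctype}")
            text = text.replace(phrase, " ")
            break

    words = text.split()
    sort_recent = any(w in RECENCY_WORDS for w in words)

    # If nothing was recognised, hand the query to the engine unchanged.
    if not directives and not sort_recent:
        return natural_query.strip()

    # Drop words that only served to express the directives.
    remaining = [w for w in words
                 if w not in RECENCY_WORDS and w not in FILLER_WORDS]
    if sort_recent:
        directives.append("sortby: mostrecent")
    return " ".join(remaining + directives)


print(rewrite_query("recent judgments from delhi high court"))
```

Leaving unrecognised queries untouched is a deliberate choice in this sketch, so that an ordinary keyword search such as "freedom of speech" is never altered by the rules.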
Friday, January 16, 2009
Wednesday, January 30, 2008
Digitising Legal Scholarship - II
A couple of weeks ago, I wrote a post on the sorry state of digital archives of legal scholarship in India. Since then, I have had many responses on archives and searches that do exist at the moment, and several initiatives being taken in this direction. This post is to summarize these responses and acknowledge these initiatives.
Shamnaad has already introduced Sushant in a previous post. Sushant has made Supreme Court cases searchable in a user-friendly fashion that should put some of the subscription sites to shame. Indian Kanoon, his search engine, promises to include High Court decisions, Constituent Assembly Debates, Law Commission Reports and journal articles in its database very soon.
The other person I want to introduce is Devranjan, a third-year student at National Law School, Bangalore. Along with some other students, he has founded the 'Open Book Society'. Their purpose is to digitize important Indian journals and make their archives searchable. They have already managed to do this for the Central India Law Quarterly and the National Law School of India Review. They need prior permission from journals to digitize them. As I understand it, they put in all the effort themselves - the journals just have to agree. This is a fantastic initiative and deserves all praise and help. This is the message he asked me to pass on:
'Would u be able to help the society in any way for instance raising money, getting journals, or just giving us better visibility?'
If anyone wants to get in touch with Devranjan, please let me know and I will put you in touch with him.
Here is the list of freely available articles, indices and search options that I found in the last two weeks. Only some of this is of really good quality, but hopefully the other established journals like the Journal of Indian Law Institute, Indian Journal of International Law, the journal section of Supreme Court Cases, Cochin University Law Review, Indian Bar Review and other journals published by various law schools will learn from the Central India Law Quarterly and let Devranjan's team digitize their archives.
Freely accessible online articles:
Central India Law Quarterly
National Law School of India Review
Indian Journal of International Law (only table of contents is archived)
Scholasticus - Journal of National Law University (only table of contents is archived)
National Law Institute University WebJournal
The Practical Lawyer
Lawyers' Collective Magazine (only current issue is online - I could not locate the archives)
Combat Law
Manupatra Articles
IndLaw Articles
Legal Services India Articles
Free search engines:
Indian Kanoon
NLSIU Journal Index
Please let me know if I have missed out anything and I will update these lists.
Monday, January 21, 2008
Launch of Indian Kanoon-Online Resource for Indian Court Decisions
A vibrant computer science student at the University of Michigan, Sushant, recently launched "Indian Kanoon", a fabulous online resource for Indian judgments. This valuable research tool will go a long way towards ensuring better access to the courts' judgments by the general public and more robust public participation.
Indian Kanoon breaks law documents into the smallest possible clauses and integrates laws/statutes with court judgments. A tight integration of court judgments with laws and with themselves allows automatic determination of the most relevant clauses and court judgments.
Indian Kanoon sources data from indiacode.nic.in and all supreme court judgments from judis.nic.in, and crawls these sites for updates. I reproduce an extract describing the service from its "about us" page.
"India prides herself as the largest democracy in the world. There are three broad pillars of Indian democracy: the legislatures who make laws, the executives who enforce laws and the judiciary that interprets laws. The laws regulate a number of activities like criminal offense, civil cases, taxation, trade, social welfare, education and labor rights.
Even when laws empower citizens in a large number of ways, a significant fraction of the population is completely ignorant of their rights and privileges. As a result, common people are afraid of going to police and rarely go to court to seek justice. People continue to live under fear of unknown laws and a corrupt police.
A number of attempts have been made to bring the knowledge of law to the common people. The Government of India took active efforts to present all laws along with their amendments at indiacode.nic.in and all court judgments at judis.nic.in. Similar efforts have been taken up by other privately owned websites like vakilno1.com and laws4india.com
While it is commendable to make law documents available to common people, it is still quite difficult for common people to easily find the required information. The first problem is that acts are very large and in most scenarios just a few section of laws are applicable. Finding most applicable sections from hundreds of pages of law documents is too daunting for common people. Secondly, laws are often vague and one needs to see how they have been interpreted by the judicial courts. Currently, the laws and judgments are separately maintained and to find judgments that interpret certain law clauses is difficult.
In order to remove the above two structural problems, Indian Kanoon is started. It achieves them by breaking law documents into smallest possible clause and by integrating law/statutes with court judgments. A tight integration of court judgments with laws and with themselves allows automatic determination of the most relevant clauses and court judgments. Hope Indian Kanoon helps you in your search for Indian laws and their interpretations."
Well done Sushant!! We need more people like you.