Q1 Who should apply keyword searches, lawyers or litigation support staff?
Law librarians are knowledgeable about research sources, noting-up processes and many other aspects of legal research, but they are not trained to do substantive research. That’s the lawyer’s job. Librarians can help you navigate the complex research process, but you have to know what you are looking for and whether or not you have found it.
The same applies to keyword searching. Litigation support managers can help apply search terms properly, set up keyword highlighting and show you the many powerful search features available in your review software, but it is not their job to know what is relevant and what is not, nor is it their job to help you fashion effective search strategies. That is legal work, and it must be done competently.
Consult with an expert before agreeing to a list of search terms with opposing counsel, or if you are having trouble with your searches.
The best practice, then, is to work together with an expert lawyer to make sure your list of keywords is effective and that you are making the best use of available search features.
Q2 I need to search a database but I’m at a loss for keywords – how do I develop a good list?
There is an art and a science to creating good keyword lists. Develop the list methodically, not “out of thin air” or “off the top of my head,” although brainstorming is one way to get started. The following methods can be considered:
Q3 I need to search a database but I’m at a loss for keywords – are there other options?
Yes. Although keyword searching can be very effective, it is not always possible to develop a good list up front. Options include analytics and predictive coding. Consult your e-discovery expert about the proper configuration and use of these tools.
Q4 After applying my search terms I am retrieving [tens of] thousands of irrelevant records – what did I do wrong?
First ask yourself: based on the scope of data in the database, how many relevant documents would I expect? The answer varies widely from case to case. For example, a large construction project ends in a dispute about delays that occurred over a period of years. There will likely be thousands, if not hundreds of thousands, of relevant documents, because we are tracking the relationships among dozens of project managers over several years. In other cases there may be only a handful of relevant documents in a collection of millions of emails. This ratio of relevant documents to the whole collection is called prevalence. Understanding it is important because it tells you, at a high level, whether your searches are on target: a search that returns far more hits than the number of relevant documents you expect is probably too broad.
There are several reasons why keyword searching fails, and it is critical that you address them. For example:
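One rough way to put a number on prevalence, assuming a reviewer has coded a simple random sample drawn from the whole collection, is to divide the relevant documents in the sample by the sample size and scale up to the collection. The function name and the figures below are hypothetical, for illustration only; a point estimate from a small sample carries sampling error, so treat it as a sanity check rather than a measurement.

```python
def estimate_prevalence(sample_size: int, relevant_in_sample: int,
                        collection_size: int) -> tuple[float, int]:
    """Hypothetical helper: estimate prevalence from a reviewed random sample.

    Returns (prevalence, expected number of relevant documents in the collection).
    Assumes the sample was drawn at random from the whole collection.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= relevant_in_sample <= sample_size:
        raise ValueError("relevant_in_sample must be between 0 and sample_size")
    prevalence = relevant_in_sample / sample_size
    expected_relevant = round(prevalence * collection_size)
    return prevalence, expected_relevant


# Invented figures: 15 relevant documents in a random sample of 1,500,
# drawn from a collection of 2,000,000 documents.
prevalence, expected = estimate_prevalence(1_500, 15, 2_000_000)
print(f"prevalence ~ {prevalence:.2%}, expected relevant ~ {expected:,}")
```

With these invented numbers the estimate is about 1% prevalence, or roughly 20,000 relevant documents. If your search terms return several hundred thousand hits against that figure, the search is almost certainly too broad.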
Q5 OK, you’ve convinced me to combine search terms. How do I do that?
The most effective method is to create a grid with no more than three concepts, one per column. (If your case is complex and involves more than three concepts, create a separate grid for each group of two or three.) In a simplified example, we are searching for documents about car accidents caused by brake disk failure. Our initial search terms (with hits) are:
Total documents with hits: 32,755 (Some documents contain more than one search term.)
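The total is smaller than the sum of the individual term counts because a document containing several terms is counted only once. A hit report showing both numbers can be sketched as follows. The documents are a toy in-memory stand-in, and the matching rule (case-insensitive whole words, trailing `*` as truncation) is an assumption for illustration, not how any particular review platform indexes text.

```python
import re


def term_to_regex(term: str) -> "re.Pattern[str]":
    """Turn a search term into a case-insensitive whole-word regex.
    A trailing * is treated as truncation (brake* matches brakes, braking...)."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*\b"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.compile(pattern, re.IGNORECASE)


def hit_report(docs: dict[str, str], terms: list[str]) -> tuple[dict[str, int], int]:
    """Return per-term document hit counts and the de-duplicated total."""
    per_term: dict[str, int] = {}
    all_hits: set[str] = set()
    for term in terms:
        rx = term_to_regex(term)
        hits = {doc_id for doc_id, text in docs.items() if rx.search(text)}
        per_term[term] = len(hits)
        all_hits |= hits  # a document hit by several terms is counted once
    return per_term, len(all_hits)


docs = {
    "D1": "Brake disk failure reported after the accident.",
    "D2": "Quarterly budget review; disk space is running low.",
    "D3": "Minutes of the safety meeting.",
}
per_term, total = hit_report(docs, ["brake*", "disk", "accident*", "fail*"])
print(per_term)
print("Total documents with hits:", total)
```

Here the four terms produce five term hits but only two documents, and D2 shows how a broad term such as disk drags in irrelevant records.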
At a glance we can see that there are far too many hits. Using a two-column grid helps:
Using this method we derive a search string as follows:
(brake* OR disk OR …) AND (accident* OR fail* OR …)
This search string may be too narrow, but it will eliminate many irrelevant documents, making it easier to find key documents from which we can then expand our efforts.
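The grid logic, OR within a column and AND across columns, can be sketched in a few lines. The helpers below are hypothetical and only illustrate the logic: they build the search string from the grid and apply the same rule to toy documents. Only the terms shown above are used (the elided terms are left out), and real review platforms have their own syntax and operator rules.

```python
import re


def term_matches(term: str, text: str) -> bool:
    """Case-insensitive whole-word match; a trailing * means truncation."""
    stem = term[:-1] if term.endswith("*") else term
    suffix = r"\w*" if term.endswith("*") else ""
    pattern = r"\b" + re.escape(stem) + suffix + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def grid_to_query(grid: list[list[str]]) -> str:
    """OR the terms within each column, then AND the columns together."""
    return " AND ".join("(" + " OR ".join(column) + ")" for column in grid)


def grid_matches(grid: list[list[str]], text: str) -> bool:
    """A document hits only if every column has at least one matching term."""
    return all(any(term_matches(t, text) for t in column) for column in grid)


# Column 1: the component; column 2: what happened to it.
grid = [["brake*", "disk"], ["accident*", "fail*"]]
print(grid_to_query(grid))

docs = {
    "D1": "Brake disk failure reported after the accident.",
    "D2": "Quarterly budget review; disk space is running low.",
    "D3": "The accident report is attached.",
}
print([doc_id for doc_id, text in docs.items() if grid_matches(grid, text)])
```

D2 mentions a disk and D3 mentions an accident, but only D1 has a term from each column, which is exactly the narrowing the grid is meant to achieve.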
Q6 Opposing counsel has provided us with a list of search terms they would like us to apply to our client’s data. What is the appropriate response?
First, search term lists should be negotiated. Second, they should not be written in stone. Third, documents retrieved by any agreed list should not be presumed relevant.
Q7 How do I know when my search is finished and what do I do next?
Probably the most important aspect of full-text search is iteration: the incremental searching, reviewing and modification of search strategies to improve results. Keep iterating until your results are consistently good and key documents appear at the top of your ranked list. Then use the relevancy ranking feature in your software to pick a cut-off for review.
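Review platforms use their own, usually proprietary, relevancy ranking. As a rough stand-in only, the sketch below scores each document by how many times the search terms occur, sorts the list from highest to lowest score, and keeps documents at or above a chosen cut-off. The scoring rule, function names and cut-off value are all assumptions for illustration.

```python
import re


def score(text: str, terms: list[str]) -> int:
    """Hypothetical relevance score: total occurrences of the search terms.
    A trailing * on a term means truncation (e.g. fail* matches failure)."""
    total = 0
    for term in terms:
        stem = term[:-1] if term.endswith("*") else term
        suffix = r"\w*" if term.endswith("*") else ""
        pattern = r"\b" + re.escape(stem) + suffix + r"\b"
        total += len(re.findall(pattern, text, re.IGNORECASE))
    return total


def rank_with_cutoff(docs: dict[str, str], terms: list[str],
                     cutoff: int) -> list[tuple[str, int]]:
    """Rank documents by score (highest first) and keep those at or above the cut-off."""
    ranked = sorted(((doc_id, score(text, terms)) for doc_id, text in docs.items()),
                    key=lambda pair: pair[1], reverse=True)
    return [(doc_id, s) for doc_id, s in ranked if s >= cutoff]


docs = {
    "D1": "Brake failure. The brake disk failed before the accident.",
    "D2": "Disk space is running low.",
    "D3": "Lunch menu for Friday.",
}
terms = ["brake*", "disk", "accident*", "fail*"]
print(rank_with_cutoff(docs, terms, cutoff=2))
```

Lowering the cut-off to 1 would add D2 to the review set; recording which cut-off you chose, and why, is part of documenting the search.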
When software tools are being used to locate relevant records for collection through the use of search terms or other parameters, counsel or the client should review the results of the searches to determine whether the searches are effective in identifying relevant records. If necessary, modifications to the search terms should be made to ensure that relevant records are located and collected and, to the extent reasonably possible, irrelevant records are not captured (bearing in mind that further filtering can be conducted by counsel during the review stage).
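One simple way to check whether a search is effective, and to track progress across iterations, is to review a sample of each version’s hits and record the proportion judged relevant (precision). The sketch below is a hypothetical search log; the search strings, hit counts and sample results are invented for illustration. Note that precision alone does not show what a search missed.

```python
from dataclasses import dataclass


@dataclass
class SearchIteration:
    """One version of a search, with a reviewed sample of its hits (hypothetical record)."""
    version: int
    query: str
    hits: int
    sample_reviewed: int
    sample_relevant: int

    @property
    def precision(self) -> float:
        """Share of sampled hits judged relevant."""
        if self.sample_reviewed == 0:
            return 0.0
        return self.sample_relevant / self.sample_reviewed


# Invented figures for two rounds of searching.
log = [
    SearchIteration(1, "brake* OR disk OR accident* OR fail*", 30_000, 200, 12),
    SearchIteration(2, "(brake* OR disk) AND (accident* OR fail*)", 4_000, 200, 86),
]

for it in log:
    # Scale sample precision up to the full hit set for a rough count of relevant hits.
    est_relevant = round(it.precision * it.hits)
    print(f"v{it.version}: {it.hits:,} hits, precision {it.precision:.0%}, "
          f"~{est_relevant:,} relevant among hits | {it.query}")
```

In this invented log, version 2 is far more precise (43% against 6%), but the estimated number of relevant hits falls slightly (about 1,720 against 1,800), which is a prompt to check what the narrower search dropped before settling on it.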
For document productions, investigations and trial preparation, it is important to confirm that effective searches were performed, especially when nothing relevant is found or produced. “Finding” relevant documents after they are due, especially if they are harmful to your client, is professionally embarrassing at best and, at worst, a breach of the ethical duty of competence.
Therefore, when searching data collections, it is best to use a methodology appropriate to the purpose, the collection and the available search tools. It is also important to know how to use the search tools at your disposal, either alone or together with other tools such as analytics.
From a software perspective, most review programs have powerful search functions built in. Lawyers should understand how they work, how they differ, and which are most appropriate for the task and the circumstances of the search. Questions to consider include:
- Is a keyword search appropriate?
- Do we have a good sense of prevalence in the collection?
- Have we selected the right search tool (e.g. keyword full-text search vs. field filtering)?
- Who is best placed to perform the searches? Do we need a client subject matter expert?
- How will searches and results be documented?
- How will saved searches be named, organized and filed?
- How will search results be recorded and analyzed?
- Has the keyword list been developed properly (e.g. using client experts, org charts, dictionaries, analytics-based term expansion, key document review and the pleadings)?
- If numbers are meaningful search terms, was the database indexed to include them?
- Are we searching for any default stop words? Do we need to modify the stop word list and re-index?
- Are we clear about the relevant date range, domains and custodians?
- Are searches being performed in the appropriate folders? Should we limit searches to certain folders or search globally?
- Should date range be applied up front or after keyword searching?
- Are we searching extracted text, metadata or both?
- Are we searching attachments?
- Have we arranged search concepts in a grid?
- Have we confirmed that the software applies operators in the order we intend (operator precedence)?
- Have we applied all the available full-text search tools, such as concept search, thesaurus, wildcard, grammar, truncation, etc.?
- Have we applied Boolean logic correctly?
- Are we using search syntax properly?
- Have we tested proximity connectors?
- Have we applied relevance ranking to check results quickly and prioritize review?
- Are we adjusting search terms after each round of results and tracking progress?
- Have we used fuzzy search? At the right level?
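Two items on the checklist, proximity connectors and fuzzy search, are easier to test once you see what they do. The sketch below is a simplified model, not any review platform’s implementation: a proximity check that asks whether two words occur within N word positions of each other, and a fuzzy match based on Levenshtein edit distance, where the allowed distance plays the role of the fuzziness “level.” Function names, tokenization and thresholds are assumptions for illustration.

```python
import re


def tokens(text: str) -> list[str]:
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return re.findall(r"\w+", text.lower())


def within(text: str, a: str, b: str, n: int) -> bool:
    """True if words a and b occur at most n word positions apart (either order)."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum single-character insertions, deletions or substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]


def fuzzy_hits(text: str, term: str, max_distance: int) -> list[str]:
    """Words in text within max_distance edits of term."""
    return [w for w in tokens(text) if edit_distance(w, term.lower()) <= max_distance]


text1 = "The brake was replaced in March, weeks before the accident on the highway."
print(within(text1, "brake", "accident", 5))
print(within(text1, "brake", "accident", 10))

text2 = "The brake disc cracked; spare disks were stored by the desk."
print(fuzzy_hits(text2, "disk", 1))
```

The proximity test fails at 5 words and passes at 10, so the connector distance matters. At an edit distance of 1, fuzzy matching picks up the spelling variant disc and the plural disks, but also the unrelated desk, which is why the fuzziness level should be tested rather than set once.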
See also Keyword search FAQ