Keyword search FAQ

Q1 Who should apply keyword searches, lawyers or litigation support staff?

Law librarians are knowledgeable about research sources, noting-up processes and many other aspects of legal research, but they are not trained to do substantive research. That’s the lawyer’s job. Librarians can help you navigate the complex research process, but you have to know what you are looking for and whether or not you have found it.

The same applies to keyword searching. Litigation support managers can help apply search terms properly, set up keyword highlighting, and show you the many powerful search features available in your review software. But it is not their job to know what is relevant and what is not, nor is it their job to help you fashion effective search strategies. That is legal work and it must be done competently.

Consult with an expert before agreeing to a list of search terms with opposing counsel, or if you are having trouble with your searches.

The best practice, then, is to work together with an expert lawyer to make sure your list of keywords is effective and that you are making the best use of available search features.

Q2 I need to search a database but I’m at a loss for keywords – how do I develop a good list?

There is an art and a science to creating good keyword lists. Developing the list should be done methodically and not “out of thin air” or “off the top of my head,” although brainstorming is one way of getting started. The following methods can be considered:

  • Brainstorming. Get into the heads of custodians – what would they be saying about the relevant facts, events and people? They are not lawyers (usually) so they will not be using legal jargon.

  • Talk to colleagues. Have they worked on similar litigation for the same client or another client? Were there any keywords that others found useful?

  • Talk to the client. Many industries use jargon, terms of art, abbreviations and code words. Make sure you ask client representatives about this – these types of words are often very effective as search terms.

  • Extract key terms from documents known to be relevant. This is the best starting point – make a list of all the substantive terms used in initial client documents.

  • Browse the dictionary. Some software can display a list of all the words in a database, in alphabetical order. Browsing that list can pay off when unusual or obviously relevant terms appear that would not otherwise have been considered.

  • Browse the database. Look at sample records from key custodians – sometimes these can generate good ideas for search terms.

  • Use the search term expansion feature of your software, if available. Starting from your initial set of search terms, this feature suggests related terms.

  • Use a thesaurus (this feature is also built into some software) to find words that would be used in addition to or in place of the terms you have already developed.
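
The "extract key terms" and "browse the dictionary" methods above can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the three sample sentences, the stop-word list and the function name candidate_terms are all made up for illustration, and your review software will have its own indexing and stop-word rules.

```python
import re
from collections import Counter

# Hypothetical stop-word list; review platforms ship their own.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
              "is", "was", "we", "it", "that", "this", "with", "at", "by"}

def candidate_terms(documents, min_length=3):
    """Count how many documents each substantive word appears in."""
    counts = Counter()
    for text in documents:
        # A set, so a word is counted once per document.
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(w for w in words
                      if len(w) >= min_length and w not in STOP_WORDS)
    return counts

# Made-up stand-ins for documents already known to be relevant.
known_relevant = [
    "The brake disk failed during the road test.",
    "Customer reports brake failure after the accident.",
    "Rotor warping noted on the brake disk.",
]

terms = candidate_terms(known_relevant)
# Most widely used words first, then alphabetical.
for word, doc_count in sorted(terms.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{word:10} {doc_count}")
```

Words that appear in most of the known relevant documents are candidates for the keyword list. Words that appear only once (such as "rotor" here) are worth a second look too, since they may be the jargon a custodian actually uses.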

Q3 I need to search a database but I’m at a loss for keywords – are there other options?

Yes. Although keyword searching can be very effective, it’s not always possible to develop a good list up front. Other options include analytics and predictive coding. Consult your e-discovery expert about the proper configuration and use of these tools.

Q4 After applying my search terms I am retrieving [tens of] thousands of irrelevant records – what did I do wrong?

First ask yourself this question – based on the scope of data in the database, how many relevant documents would I expect? The answer varies widely from case to case. For example, a large construction project ends with a dispute about delays occurring over a period of years. There will likely be thousands, if not hundreds of thousands, of relevant documents because we are tracking the relationships among dozens of project managers over the course of years. In other cases there may be only a handful of relevant documents even in a collection of millions of emails. Your understanding of this ratio of relevant documents to the whole collection – called prevalence – is important because it helps you understand at a high level whether your searches are on target or not.
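
One common way to put a number on prevalence is to review a random sample of the collection and extrapolate. The text above does not prescribe this method, so treat the following as an illustrative sketch: the sample size, the count of relevant documents, the collection size and the function name estimate_prevalence are all hypothetical, and the margin of error uses the simple normal approximation, which is rough when prevalence is very low.

```python
import math

def estimate_prevalence(sample_size, relevant_in_sample, z=1.96):
    """Return (point estimate, margin of error) for prevalence.

    Normal approximation to a binomial proportion; z=1.96 corresponds
    to roughly 95% confidence.
    """
    p = relevant_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

# Hypothetical numbers: 400 randomly sampled documents, 12 coded relevant.
p, margin = estimate_prevalence(400, 12)
collection_size = 1_000_000  # hypothetical collection size

print(f"Estimated prevalence: {p:.1%} (+/- {margin:.1%})")
print(f"Roughly {p * collection_size:,.0f} relevant documents expected")
```

If a keyword search returns far more or far fewer documents than the estimate suggests, that is a signal to revisit the search terms before starting review.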

There are several reasons why keyword searching fails, and it is critical that you address them. For example:

  • False hits from overly broad or ambiguous search terms. Eliminate such terms entirely, or combine them with other terms using connectors.

  • Using too many search terms. Eliminate or combine.

  • Inappropriate truncation. Make sure the wildcard character is in the right place: regulat* not regula* (which would also retrieve “regular” and “regularly”).

  • Failure to use Boolean logic to connect terms.

  • Error in search syntax or order of connectors: [brown AND cat OR dog] vs. [cat OR dog AND brown]

  • Failure to limit searches by date range or custodian.

  • Failure to use proximity connectors between terms. In long documents, proximity of keywords makes them more likely to be relevant.

  • Failure to limit searches to structured fields when appropriate. When searching for “Brown” as an email recipient try searching that field rather than all text.
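
Two of the failure modes above, truncation and connector order, are easy to see in a small Python sketch. The word list is made up, and Python's own rule (AND evaluated before OR) stands in for a search engine here. That is an assumption: your review software may order connectors differently, so check its documentation and use parentheses.

```python
from fnmatch import fnmatch

# Made-up sample of indexed words.
words = ["regulate", "regulation", "regulatory", "regular", "regularly"]

# Truncation: "regula*" also pulls in "regular" and "regularly".
too_broad = [w for w in words if fnmatch(w, "regula*")]
on_target = [w for w in words if fnmatch(w, "regulat*")]
print(too_broad)
print(on_target)

def evaluate(brown, cat, dog):
    """Each argument says whether the document contains that word."""
    first = brown and cat or dog       # read as (brown AND cat) OR dog
    second = cat or dog and brown      # read as cat OR (dog AND brown)
    intended = brown and (cat or dog)  # explicit grouping
    return first, second, intended

# A document mentioning only "dog" is a hit for the first string alone.
print(evaluate(brown=False, cat=False, dog=True))
# A document mentioning only "cat" is a hit for the second string alone.
print(evaluate(brown=False, cat=True, dog=False))
```

Neither of the ungrouped strings matches the explicitly grouped version on these two documents, which is why grouping with parentheses is safer than relying on the default order of connectors.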

Q5 OK you’ve convinced me to combine search terms. How do I do that?

The most effective method is to create a grid with no more than three concepts in columns. (If your case is complex and involves more than three, create a new grid for each group of two or three concepts). In a simplified example, we are searching for documents about car accidents due to brake disk failure. Our initial search terms (with hits) are:

Brake* 14,567

Accident* 334

Fail* 87

Disk* 20,778

etc.

Total documents with hits: 32,755 (Some documents contain more than one search term.)

At a glance we can see that there are far too many hits. Using a two-column grid helps:

CONCEPT 1         | CONCEPT 2
brake*            | accident*
disk*             | fail*
search term 3 etc | search term 3 etc

Using this method we derive a search string as follows:

(brake* OR disk* OR …) AND (accident* OR fail* OR …)

It is possible that this search string is too narrow, but it will eliminate many irrelevant documents, making it easier to find key documents from which we can then expand our efforts.
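
The step from grid to search string is mechanical: terms within a concept column are joined with OR, and the columns are joined with AND. A minimal Python sketch follows, assuming the generic Boolean syntax used in this FAQ (your software's connectors may be spelled differently). The function name build_search_string is hypothetical.

```python
def build_search_string(grid):
    """Join terms within a concept with OR, and concepts with AND."""
    groups = []
    for terms in grid.values():
        # Parentheses keep each concept together regardless of how
        # the search engine orders AND and OR.
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)

grid = {
    "CONCEPT 1": ["brake*", "disk*"],
    "CONCEPT 2": ["accident*", "fail*"],
}

print(build_search_string(grid))
# prints: (brake* OR disk*) AND (accident* OR fail*)
```

Adding a term to a column broadens the search, while adding a third column narrows it, which makes the grid a convenient place to record each adjustment.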

Q6 Opposing counsel has provided us with a list of search terms they would like us to apply to our client’s data. What is the appropriate response?

There are three points to keep in mind. First, search term lists should be negotiated. Second, they should not be written in stone. Third, the search results of any agreed list should not be presumed relevant.

Q7 How do I know when my search is finished and what do I do next?

Probably the most important aspect of full text search is iteration – that is, the incremental searching, reviewing and modification of search strategies to improve results. You need to do this until your results are consistently good, and key documents are retrieved at the top of your ranked list. Use the relevancy ranking feature in your software and pick a cut-off for review.

Keyword searching

When software tools are being used to locate relevant records for collection through the use of search terms or other parameters, counsel or the client should review the results of the searches to determine whether the searches are effective in identifying relevant records. If necessary, modifications to the search terms should be made to ensure that relevant records are located and collected and, to the extent reasonably possible, irrelevant records are not captured (bearing in mind that further filtering can be conducted by counsel during the review stage).

For document productions, investigations and trial preparation it is important to confirm that effective searches were performed – this is especially true when nothing relevant is found or produced. “Finding” relevant documents after they are due – especially if they are harmful to your client – is professionally embarrassing if not ethically incompetent.

Therefore when performing searches in data collections it is best to use a methodology appropriate to the purpose, the collection and the available search tools. It is important to know how to use the search tools at your disposal, either alone or in conjunction with other tools such as analytics.

From a software perspective, most review programs have powerful search functions built in. Lawyers should understand how they work, how they differ, and which one or ones are most appropriate given the task and the circumstances of the search. The following questions can serve as a checklist:

  • Is a keyword search appropriate?
  • Do we have a good sense of prevalence in the collection?
  • Have we selected the right full text search tool? (e.g. keyword full-text search vs field filtering)
  • Who is best placed to perform the searches? Do we need a client subject matter expert?
  • How will searches and results be documented?
  • How will saved searches be named, organized and filed?
  • How will search results be recorded and analyzed?
  • Has a list of keywords been developed properly, drawing on client experts, org charts, the dictionary, analytics (expansion), key document review and the pleadings?
  • Was the database indexed to include numbers if numbers are meaningful enough to be searched?
  • Are we searching for any default stop words? Do we need to modify the stop word list and re-index?
  • Are we clear about the relevant date range, domains and relevant custodians?
  • Are searches being performed in the appropriate folders? Should we limit searches to certain folders or search globally?
  • Should date range be applied up front or after keyword searching?
  • Are we searching extracted text, metadata, both?
  • Are we searching attachments?
  • Have we arranged search concepts in a grid?
  • Have we confirmed that correct logic is being applied in terms of order of operators?
  • Have we applied all the available full text search tools such as concept search, thesaurus, wildcard, grammar, truncation, etc.?
  • Have we applied Boolean logic correctly?
  • Are we using search syntax properly?
  • Have we tested proximity connectors?
  • Have we applied relevance ranking to check results quickly and prioritize review?
  • Are we iterating/adjusting search terms with every result and tracking progress?
  • Have we used fuzzy search? At the right level?
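
Several of the questions above (how searches are documented, how saved searches are named, how results are recorded, whether progress is tracked) can be answered with a simple search log. The Python sketch below shows one possible layout. The field names, saved-search names, the second hit count and the spot-check figures are all hypothetical; only the 14,567 hits for brake* comes from the FAQ example.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

@dataclass
class SearchRun:
    name: str      # saved-search name
    query: str     # exact search string as run
    scope: str     # folders, custodians, date range
    hits: int      # documents returned
    sampled: int   # documents spot-checked by a reviewer
    relevant: int  # of those sampled, how many were relevant

    @property
    def precision(self):
        """Share of spot-checked documents that were relevant."""
        return self.relevant / self.sampled if self.sampled else None

# Hypothetical log entries for two iterations of the same search.
runs = [
    SearchRun("01_brake_only", "brake*", "all custodians", 14567, 50, 4),
    SearchRun("02_brake_grid",
              "(brake* OR disk*) AND (accident* OR fail*)",
              "all custodians", 212, 50, 31),
]

# Write the log as CSV so it can be filed with the matter.
buffer = io.StringIO()
columns = [f.name for f in fields(SearchRun)] + ["precision"]
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
for run in runs:
    row = asdict(run)
    row["precision"] = "" if run.precision is None else f"{run.precision:.0%}"
    writer.writerow(row)

print(buffer.getvalue())
```

Numbering the saved searches keeps the iterations in order, and recording a spot-check result next to each hit count shows whether a change to the terms actually improved the results.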

See also Keyword search FAQ