Andrew Stephenson and Dado Hrustanpasic, Corrs Chambers Westgarth
Complex engineering and construction projects, particularly relating to energy, resources and public infrastructure, are often valued in multiple billions of US dollars. In Australia, in excess of US$200 billion has been invested since 2011 in the oil and gas sector alone. With the ever-increasing size and complexity of projects has come a massive increase in the volume of documents involved therein, facilitated by technology making it ever easier to create, copy and store documents. Large projects can easily generate tens of millions of documents.
The costs of dealing with the enormous number of documents created in such projects have plagued the legal process, particularly in dispute resolution. The common law system traditionally required that relevant documents be identified and disclosed, usually at great expense. Most common law jurisdictions have now recognised that the old rules of discovery need to be modified to reduce the cost of this process. Those reforms have concentrated on reducing the volume of documentation required to be disclosed, by narrowing the criteria by which relevance is assessed and introducing concepts of proportionality. However, these changes have had limited effect in actually reducing cost. Narrowing the relevance criteria has indeed limited the number of documents disclosed. However, at least at the first step, this has often increased costs. The documents still need to be reviewed to determine whether they fall within the relevance criteria, but now more judgment is required, which takes more time and more experienced practitioners. Changes to discovery rules have done little to overcome the real cost: the sheer number of documents involved in the process.
Even setting aside the problem of discovery, all large construction disputes require an analysis of facts. Facts are most reliably found in documents. Effective advocacy and efficient case management require identifying relevant documents early and disregarding the irrelevant. Accordingly, the management of a large volume of documentation is necessary and costly even in those jurisdictions that do not require discovery.
Technologies have emerged that promise to significantly reduce the cost of dealing with large volumes of documents, by requiring only a fraction of the project documents to be reviewed. This article touches on one that is currently being adopted in many jurisdictions. However, the quality of the human input into the technology, and an understanding of what the technology has done, remain critical.
Numerous software providers have developed algorithms that are capable of being trained for the purposes of identifying documents relating to a particular subject matter. The application of this technology to the task of identifying material relevant to a dispute from a wider dataset has generically been called ‘technology assisted review’ (TAR).
There are a number of methods by which TAR can be deployed, depending on the size of the document set, the nature of the documents and what issues are relevant. For very large document sets, TAR can be used to extrapolate which documents may be relevant or irrelevant without the need to review a large proportion of the documents.
This method requires lawyers familiar with the dispute to review relatively small samples of randomly chosen documents (often 1,000 documents) for relevance. These document samples are usually called ‘training sets’. Based on the text features of the documents coded as relevant or irrelevant (such as combinations and frequency of words), the algorithm ranks every other document by its likelihood of being relevant. Further training sets are then selected and the process repeated a number of times until, above a certain ‘cut off’ in the ranking, the TAR system is considered to have identified enough of the relevant documents, and excluded enough of the irrelevant ones, for the outcome to be satisfactory. The documents above the ‘cut off’ are those used for a particular legal purpose, such as disclosure or further consideration.
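For illustration only, the ranking step can be sketched as a simple word-weighting scheme: the labelled training set yields a weight for each word, and every unreviewed document is scored and ranked by its words. This is a toy sketch, not any vendor’s actual algorithm, and the example documents are invented:

```python
from collections import Counter
import math

def train_scores(training_set):
    """Estimate per-word weights from a labelled training set.

    training_set: list of (text, is_relevant) pairs, as coded by reviewers.
    Returns a dict of word -> log-odds weight (add-one smoothed), positive
    weights indicating words more common in relevant documents.
    """
    rel, irr = Counter(), Counter()
    n_rel = n_irr = 0
    for text, is_relevant in training_set:
        words = set(text.lower().split())
        if is_relevant:
            rel.update(words)
            n_rel += 1
        else:
            irr.update(words)
            n_irr += 1
    vocab = set(rel) | set(irr)
    return {w: math.log((rel[w] + 1) / (n_rel + 2))
             - math.log((irr[w] + 1) / (n_irr + 2))
            for w in vocab}

def rank_documents(documents, weights):
    """Rank every document by its summed word weights, most relevant first."""
    def score(text):
        return sum(weights.get(w, 0.0) for w in set(text.lower().split()))
    return sorted(documents, key=score, reverse=True)

# A small training set coded by reviewers (invented examples):
training = [
    ("delay claim variation extension of time", True),
    ("delay liquidated damages claim notice", True),
    ("canteen menu friday lunch", False),
    ("car park allocation notice", False),
]
weights = train_scores(training)
ranked = rank_documents([
    "notice of delay claim under the contract",
    "friday canteen lunch menu update",
], weights)
# The delay-claim document ranks above the canteen notice.
```

In a real TAR system the scoring model is far more sophisticated and the weights are re-estimated after each round of training-set review, but the shape of the process — code a sample, fit a model, rank the remainder — is the same.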
Generally, the measures of satisfaction are ‘recall’ and ‘precision’. ‘Recall’ is the estimated percentage of all relevant documents that are captured in the bundle of documents used for legal purposes. ‘Precision’ is the estimated percentage of documents in that bundle that are actually relevant.
The satisfactory level of recall and precision depends on the particular circumstances of each case, such as the nature of the issues that are relevant, how apparent and prevalent those issues are in the wider document set, and the particular legal purpose that is sought. Recall and precision are often trade-offs so that, for example, a high recall (finding a high proportion of relevant documents) is often at the expense of precision (not being able to confidently exclude documents as irrelevant).
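Expressed in code, the two measures reduce to simple ratios over three counts. The figures below are illustrative only:

```python
def recall_precision(relevant_retrieved, retrieved_total, relevant_total):
    # Recall: what share of all relevant documents ended up in the bundle.
    recall = relevant_retrieved / relevant_total
    # Precision: what share of the bundle is actually relevant.
    precision = relevant_retrieved / retrieved_total
    return recall, precision

# Illustrative counts: 250,000 relevant documents captured in a
# 1,000,000-document bundle, out of 312,500 relevant documents overall.
r, p = recall_precision(250_000, 1_000_000, 312_500)
# r = 0.8 (80 per cent recall), p = 0.25 (25 per cent precision)
```

The trade-off is visible in the denominators: lowering the ‘cut off’ to grow the bundle can only increase the recall numerator, but it also grows the precision denominator.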
To illustrate by way of example, a particular project may have 10 million documents, for which a TAR system is used to identify the documents relevant to a particular dispute, and exclude irrelevant documents. After the algorithm has been ‘trained’, recall of 80 per cent may be achieved in the first 1 million documents ranked by the TAR system from the most relevant to the least relevant, at a precision of 25 per cent. As 80 per cent recall is the target set, then the first 1 million documents becomes the document set used for further preparation of the case. The 25 per cent precision statistic predicts that of the 1 million documents so chosen, some 25 per cent of them will actually be relevant. This means, despite the use of the technology, 750,000 documents of the million selected for detailed review will be irrelevant. Further, the 250,000 relevant documents identified are likely to be only 80 per cent of the relevant documents that exist. One may be able to select a ‘cut off’ with higher recall, but it is likely to introduce many more irrelevant documents.
Accordingly, in this example the technology has significantly reduced the documents that need to be reviewed – by 9 million documents. However, not all relevant documents will have been identified, and there will still be a significant number of irrelevant documents that will need to be manually reviewed and discarded. So, while well short of perfection, the TAR system has significantly reduced the manual work required and therefore the cost.
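The arithmetic of this example can be checked directly; the only inputs are the figures given above (a 10-million-document set, a 1-million-document ‘cut off’, 25 per cent precision and 80 per cent recall):

```python
total_documents = 10_000_000
cutoff = 1_000_000    # documents above the 'cut off', kept for review
precision = 0.25      # estimated share of the bundle that is relevant
recall = 0.80         # estimated share of all relevant documents captured

relevant_found = int(cutoff * precision)            # 250,000 relevant in bundle
irrelevant_kept = cutoff - relevant_found           # 750,000 to discard manually
total_relevant = int(relevant_found / recall)       # 312,500 implied in full set
relevant_missed = total_relevant - relevant_found   # 62,500 never surface
spared_review = total_documents - cutoff            # 9,000,000 avoided entirely
```

On these figures, some 62,500 relevant documents are predicted never to surface — the concrete cost of accepting 80 per cent recall.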
If TAR is used for discovery in common law countries, an obvious objection is that 20 per cent of the relevant documents will not be discovered. Parties may also object that they are not confident that the system (either the technology itself or the quality of training by humans) has worked as promised.
This is predicated on the proposition that manual review of documentation by traditional means will produce a better outcome than using TAR. This proposition has been considered and dismissed in a number of cases around the world. In the now well-known Irish case, Irish Bank Resolution Corporation Limited v Quinn [2015] IEHC 175, in considering this point Fullam J concluded as follows:
The evidence establishes, that in discovery of large data sets, technology assisted review using predictive coding is at least as accurate as, and, probably more accurate than, the manual or linear method in identifying relevant documents. Furthermore, the Plaintiff’s expert, Mr Crowley exhibits a number of studies which have examined the effectiveness of purely manual review of documents compared to using TAR and predictive coding. One such study, by Grossman and Cormack, highlighted that manual review results in less relevant documents being identified. The level of recall in this study was found to range between 20% and 83%. A further study, as part of the 2009 Text REtrieval Conference found the average recall and precision to be 59.3% and 31.7% respectively using manual review, compared to 76.7% and 84.7% when using TAR. What is clear, and accepted by Mr Crowley, is that no method of identification is guaranteed to return all relevant documents.
The 84.7 per cent precision referred to in the 2009 Text REtrieval Conference study was achieved by manually reviewing the documents predicted by the technology in order to remove false positives. Even so, the TAR review significantly improved efficiency over exhaustive manual review, as on average only about 2 per cent of the total documents required review.
The observations about the quality of manual review in this case are consistent with the writers’ experience. Detailed manual review (particularly where multiple reviewers are used) produces inconsistent and unreliable outcomes. It is also the writers’ experience that there is a trade-off between recall and precision in manual review. Particularly where inexperienced lawyers are involved, with little knowledge of the subject matter in issue, there is a risk of defaulting to a simplistic view of relevance. The capacity for simple human error should also not be underestimated.
Arguments about documents not being discovered can best be avoided by pre-agreeing the use of TAR and the level of recall to be achieved. A suggested acceptable level of recall is 80 per cent ± 5 per cent. This of course depends on the matter at hand, but in most cases will provide a high level of disclosure, while being proportionate to the effort required to achieve high recall.
Quality assurance for TAR systems is largely by way of sampling. Usually a reference set of documents, which have been coded manually, is established early in the process. By comparing the TAR system’s categorisation of the documents in the reference sample with the categorisation made by the manual reviewers, the level of both recall and precision can be established. Even where there is no discovery, a party seeking to rely on TAR for its own purposes should confirm it has achieved satisfactory recall and precision so that subsequent preparation work can proceed on the basis that a high proportion of relevant documents have been properly identified and that the number of irrelevant documents in the dataset is as small as possible.
Concerns about the reliability of the algorithm and the quality of the classification decisions (whether for discovery or simply to establish the performance of the TAR system) can also be addressed by the reference sample, or by drawing other samples to test whether the TAR system’s predictions align with human review (within the statistical parameters).
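A minimal sketch of that sampling check, assuming the reference set is held as pairs of (manual code, TAR prediction) for each sampled document:

```python
def estimate_recall_precision(reference):
    """Estimate recall and precision from a manually coded reference sample.

    reference: iterable of (manually_relevant, predicted_relevant) pairs,
    one per sampled document, treating the manual coding as ground truth.
    """
    tp = sum(1 for manual, pred in reference if manual and pred)       # agreed relevant
    fp = sum(1 for manual, pred in reference if not manual and pred)   # false positives
    fn = sum(1 for manual, pred in reference if manual and not pred)   # missed relevant
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Toy sample: 8 relevant documents correctly predicted, 2 missed,
# 24 irrelevant documents wrongly predicted relevant.
sample = ([(True, True)] * 8 + [(True, False)] * 2 + [(False, True)] * 24)
r, p = estimate_recall_precision(sample)
# r = 0.8 (80 per cent recall), p = 0.25 (25 per cent precision)
```

Because these are estimates from a sample, a real exercise would also report confidence intervals — the ‘statistical parameters’ referred to above — which depend on the size of the reference sample.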
Of course, this means that the coding of the reference sample and the training sets needs to be of high quality. This is particularly so when it is remembered that for a data set of 10 million documents, typically as few as 10,000 documents will be included in the training sets used by the algorithm to establish relevance. Accordingly, any error in the concept of relevance or inconsistency in the training set documents will be extrapolated over the whole dataset of 10 million documents.
It is therefore important that the reference set of documents and the training sets be coded by relatively experienced lawyers who understand the case well. If a team is used, that team should be as small as possible and decisions appropriately cross-checked to ensure consistency. Concerns over extrapolating errors can therefore be avoided by appropriate quality control. It must also be remembered that large manual reviews done by many young lawyers will, unless subjected to very expensive quality control, generally produce poor outcomes in respect of consistency, recall and precision, as referred to in the Irish Bank case and in academic studies.
TAR systems have given the profession an opportunity to increase its efficiency dramatically and significantly reduce the costs of dealing with documents in construction disputes. Nevertheless, the technology does not achieve perfection: there is still significant work required in maintaining quality and then in respect of the balance of the documentation which the algorithm identifies as potentially relevant. Accordingly, this technology is not a panacea for eliminating legal costs in dealing with documents. However, it will significantly improve efficiency in dealing with big data and reduce the costs of managing complex litigation, prevalent in the construction industry.