Doing things the way it's always been done, but better (Qualified)
Scientists study the Harwell computer tape. Copyright: Express & Star Newspaper Ltd

Acknowledgements: With kind thanks to Linda Raftree, Rick Davies and Kim Forss for their encouragement, support, feedback and critique.

If you would like to catch up with the previous sections, please follow the links...

~ Part One ~ Computation? Evaluate it!

~ Part Two ~ Distance Still Matters

This article is part three of a series exploring the links between evaluation and technologies. It explores the gulfs between the digital data-centric and analogue data-centric worlds, visible not only in workflows but in the very fabric of the buildings where we work. In the evaluation world we collect and analyse data, but arguably it has not occurred to us that the massive body of work we have produced is itself a dataset, one whose sheer unstructuredness repels new technologies' attempts to mine it. This calls for cooperation between social and data scientists to make use of it.

With apologies to Mr. Adams...

“I love paradigm shifts. I love the whooshing noise they make as they go by.”

Qualifying it...

One of the things that strikes me as interesting, when reflecting on how we use and apply technology, is how fast things move and how messy things get. To illustrate this, I found it helpful to think about how some bleeding-edge technology companies organise. If you have the opportunity to chat with folks who work in computer games development, or high-end film and TV post-production, you can discern that the work pipeline is all-important. The facilities and the organisation of work are built around the application of skills and techniques through the pipeline. Much like the factory model of manufacture, data is collected, generated, documented, processed, evaluated, refined, modelled, organised, reported and stored (amongst other things). And whilst we may not think of the material of that work as data, it actually is data. In some cases the building is specially built, or at the very least its fabric significantly adapted, to ensure that the networked computing infrastructure can do what it needs to, much more so than the basic needs of an average office. It demonstrates how important computing is to these businesses.

Equally, the pipeline of tools which are adapted to each project provides an end-to-end data system that includes everything to facilitate the production of the digital products. There are teams of programmers and engineers, building, maintaining, refining and developing within each project, returning lessons learned and innovation back to the wider company. Having had a peek inside one or two of these facilities, and having friends who work within them, I have glimpsed a working environment that holds data as central to the work.

This (poor) description of an industry I barely comprehend gives me pause for thought in relation to what my work and sector look like. Whilst there is obviously variation, I would wager that most organisations and companies in the international development sector (with the exception of some very large providers and niche innovators) do not, most of the time, resemble post-production studios. More likely, we would describe a fairly basic office environment, with laptops, some servers, some systems and workflows, and probably overworked IT staff, scratching their heads in bemusement and exercising unimaginable patience with colleagues (like me, sometimes). I would posit that (for obvious reasons) our sector is not so data- and tech-centric by design.

Switching downwards, from buildings and infrastructure to workflows, before I delve a little more into machine learning, I will explore a specific aspect of qualitative practice. And I will further illustrate how we have probably missed out a bit in our work practices (or maybe I am just re-framing a pet annoyance): we don't code well enough! By this I mean we don't code the literature and reports that we create. We are (probably) all familiar with coding when conducting literature reviews, and some of us are moving towards newer tools and processes. In that sense we are coding other people's work for our own purposes. When was the last time you found a document that the author had coded as part of their process, making your life that much easier? Of course, I appreciate that when we code documents for our own specific inquiry, we cannot expect the author to have predicted what we would look for and anticipated our coding needs. But an author's coding efforts would make the document a lot easier to explore.

OK, so it is not completely true that we have nothing to go on - the table of contents does provide an index of sorts, so it can be considered structural coding to a point. And there is usually a list of acronyms, and a bibliography, plus some annexes, which gives us the basic 'academic' report layout and provides us with something to work with.[58] But it is easy to envisage a blockchain-based publication system where original documents retain their veracity, displaying their chain of custody of data, and the community of practice can upload and link parallel codings and derivative data and documents, along with citations that can be easily tracked and mapped. The adoption of a networked, multi-layered publication format that is both human- and machine-readable seems pertinent and timely.
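
To make the idea concrete, here is a minimal, purely hypothetical sketch of what a machine-readable, multi-layered coded report fragment could look like, expressed as a Python data structure. Every field name, code label and reviewer identifier is my own illustrative invention, not an existing standard:

    # A hypothetical sketch of a coded report fragment; all field names
    # and code labels here are invented for illustration only.
    coded_report = {
        "doc_id": "eval-2019-001",
        "sections": [
            {
                "heading": "Findings",
                "text": "Interview data were cross-checked against survey results...",
                "author_codes": ["evidence", "triangulation"],
                # Parallel codings layered on later by the community of practice:
                "community_codes": {"reviewer-42": ["data-quality"]},
            },
        ],
    }

The particular format matters far less than the principle: author-applied and community-applied codes travelling with the document, readable by both humans and machines.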

So why am I reflecting on this? I design and direct a number of research and evaluation quality assurance systems. I am often looking for evidence within an evaluation product - evidence of triangulation, for example - so that I can qualify its quality. It would be super useful if triangulation were labelled 'triangulation', evidence labelled 'evidence', and so on. In fact, I would like reports to be very structured, and, if coming from the same organisation, consistently structured across all of the organisation's projects. More than that, I would like the data structure to be evident throughout the project lifecycle, from the Terms of Reference, through inception and mid-term, all the way to the final evaluation report. It would be the most marvellous experience to be able to track the qualities, developments, methods, evidence, evolutions, changes and derivations all the way through the process, and to do so coherently and easily for all concerned. Well designed and consistent coding alone, in my view, would revolutionise this work. If you have ever tried to prep evaluation reports for machine learning, you might find, like me, that the transformation of a PDF document into usable data is hugely frustrating, and anything that structures the data and makes it easier to process is highly welcome.
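
As a small illustration of that PDF frustration, here is a minimal sketch of the very first preparation step - bulk-extracting raw text from a folder of PDF reports - using the open-source pdfminer.six library. The folder names are assumptions for the example:

    from pathlib import Path

    from pdfminer.high_level import extract_text  # pip install pdfminer.six

    def pdf_reports_to_text(report_dir: str, out_dir: str) -> None:
        """Extract raw text from each PDF evaluation report for later analysis."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        for pdf in sorted(Path(report_dir).glob("*.pdf")):
            text = extract_text(str(pdf))  # headings, tables and layout arrive as flat text
            (out / f"{pdf.stem}.txt").write_text(text, encoding="utf-8")

    pdf_reports_to_text("evaluation_reports", "extracted_text")

Even this step throws structure away: headings, tables and footnotes all come out as undifferentiated text, which is precisely why consistent up-front coding would help.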

This seems particularly relevant when looking at a body of work where the outputs form a portfolio of project cases. Cross-case analysis then becomes more feasible, less arduous and probably of better quality. A well designed and consistent workflow also enables structured machine learning to return better results, along with a whole host of other opportunities.

Exploring this helps me elucidate my main point: technology development is not just about hyped frontier technologies, but about the much needed investment in people, skills and facilities that hold the ability to effectively manage and deliver our development initiatives. The 'obvious reasons' I mention above, in stating that our sector is not tech-centric by design, are part and parcel of the goals and visions of the international development sector. We are often trying to help others develop their communities, very often with rudimentary technologies to meet basic needs like access to clean water. In the current operating environment, resources are limited and the business case for spending is usually based on (strong or some semblance of) assurances of results.

The cost of technology development and the tech model of business is usually contingent on factors such as mass user take-up, venture capital and advertising revenues. If an NGO, government or multilateral were spending vast amounts of money on technology and facilities, their constituents would raise not only eyebrows but questions in boardrooms, parliaments and assemblies. The traditional development industry doesn't have the money or the remit; the main option is public-private partnerships. Our current business model is small innovation units, start-ups or partnerships with established players, and the results can be mixed. We are working with two models of development - technology development and traditional international development. We have begun to play well together, and the tension and friction are creative at the moment, but the two models have fundamental paradigmatic paradoxes.

The critical flaw in my perfect data world, as sketched briefly above, is the messiness in the world that I mentioned. Even if we try to standardise workflows, ensure best practice of data integrity across all project aspects, and substantially improve our lot, we will still have different systems disaggregated across different geographies, languages, thematic areas and so on. We will still need to accommodate the uniqueness of every context, project and evaluation. Even well designed data workflows that translate across group systems will still have flaws and errors, and we will certainly add noise to the dataset in numerous ways.

So, in qualitative research and analysis terms, we get to where machine learning has already made great strides. Technology-assisted review (TAR) / eDiscovery / predictive coding has come on in leaps and bounds in the legal sector, for example.[59] The previous gold standard of human-only review has been exposed as a myth where it has been pitted against machine learning solutions.[60] By having experienced analysts carefully code documents for structured machine learning, legal document review is now potentially far better, and comes at a much lower cost to the client (notwithstanding considerable investments by the company), presumably leading to better outcomes. Based on the concepts of Natural Language Processing (NLP), we seemingly find something near to a holy grail when we can train a machine to accurately match queries, find patterns and generally make very large language datasets easy to interrogate.
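
To give a flavour of what predictive coding involves under the hood, here is a minimal sketch using scikit-learn: analysts label a set of passages, a classifier learns from them, and the model then scores unseen passages. The toy passages and the 'triangulation' labels are my own invention; real TAR systems learn from thousands of reviewed documents:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # A toy training set: 1 = analyst coded the passage as evidence of triangulation.
    passages = [
        "Findings were triangulated across interviews, surveys and site visits.",
        "The project was launched in 2015 with three implementing partners.",
        "Qualitative claims were cross-checked against the monitoring data.",
        "The budget was disbursed in two tranches over eighteen months.",
    ]
    labels = [1, 0, 1, 0]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(passages, labels)

    # Score a new, unreviewed passage.
    print(model.predict_proba(
        ["Results were corroborated using two independent data sources."]
    ))

In production systems the human stays in the loop, reviewing the model's most uncertain predictions and feeding the corrections back into training.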

Whilst we can look to the legal sector's development and use of TAR, both as a practice and as a pipeline of work and skills, we can also look to the bleeding edge of NLP to see where this is heading (and where it already is). NLP seems to have taken some great steps towards better language modelling last year, with the release of a number of technologies that greatly improve its accuracy.[61] I am not nearly expert enough to explain the intricacies of these technologies (I am currently struggling to train a model to recognise triangulation in evaluation reports). But I observe that the approaches start from a pre-trained language model, trained over months on large corpora of unlabelled text. As a learning process this amounts to what some are describing as an ImageNet moment for the field.[62] These forays have yielded record-beating approaches in testing, showing significant improvement on the previous generation of machine learning.[63] With these models made available, we can input clean data and gain semantic, contextualised results. So it will certainly begin to help with my triangulation training model.
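
As a sketch of what 'pre-trained' means in practice, the snippet below pulls a contextual sentence vector out of a published BERT model via the Hugging Face transformers library. The model name is just one commonly available choice, and the sentence is invented:

    import torch
    from transformers import AutoModel, AutoTokenizer  # pip install transformers torch

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")  # months of pre-training, downloaded in minutes

    inputs = tokenizer(
        "The evaluation triangulated interview and survey data.", return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)

    # One contextual vector for the sentence, averaged over its tokens.
    sentence_vector = outputs.last_hidden_state.mean(dim=1)
    print(sentence_vector.shape)  # torch.Size([1, 768])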

For example, a semantic context search engine is now easily available to us. With a pre-trained, ready-to-use model we can get great results.[64] But equally, on a project, sector or organisation basis, we can also invest in training our own model on our existing corpus of texts. This gives us the externally provided general pre-trained model, plus a specific pre-trained model of our own, with which we can interrogate documents and perform auto-coding and other pattern recognition and NLP tasks. If you have followed along with my poor explanations... this is a really quite huge step forward, and it is here, now.
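
A minimal semantic search sketch, assuming the sentence-transformers library and one of its published general-purpose models; the corpus sentences are invented for illustration:

    from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

    model = SentenceTransformer("all-MiniLM-L6-v2")  # an externally provided pre-trained model

    corpus = [
        "Findings were cross-checked against the household survey.",
        "The inception report was delivered three weeks late.",
        "Multiple data sources were compared to validate the conclusions.",
    ]
    corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

    # The query matches on meaning, not keywords - a highly ranked sentence
    # need not contain the word 'triangulation' at all.
    query_embedding = model.encode("evidence of triangulation", convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)

    for hit in hits[0]:
        print(round(hit["score"], 2), corpus[hit["corpus_id"]])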

Given that our sectors have done a very good job of reporting, recording and publishing our work for decades, we have produced a (really) huge corpus of publicly available knowledge that is only basically structured. It is unfeasible (but not impossible) that we would employ armies of document coders to go back through these mountains of work.[65] But with structured, semi-structured and unstructured auto-coding, it might very well be possible, or already happening. With so much of our output transparently available on the net, that data is available to the AIs that can and do crawl the web and process what they find. Unlike the for-profit corporations, we do not guard our published data or tightly control access and use. Like the giant tech companies, we aim to provide services to the world. Unlike the giant tech companies, we mostly do not monetise access to the data and resources we have published.

That data now has a marketplace, and data has a price and a cost. We seem to need the technologists to help us process and utilise the data that we ourselves have gathered and analysed. We also need to access data that other organisations and people have produced. A private enterprise that places a satellite in space has a means to recoup its investment by selling access to the raw data. For a donor organisation operating for the social good, the return on investment is in the development and progress of the communities it works with; data, in that case, is most often a product made available for all. We have data. They have tech. We give our data away freely, and they process it probably better than we can. As I alluded to, because we have not established structured pipelines and data standards that can future-proof our work, even within single organisations, our ability to utilise our own data as fully as we might is limited. We need to respond to this together. Significant questions remain over data ownership, access and usage, as well as over access to tech and technologists, and over investment in the varying business models of the development sector. If the global networking project increasingly becomes (as it seems to have) both public utility and renewable resource,[66] and therefore both a 'right' (formerly) and a 'raw material' (latterly), we need to think about what that means for people and their big and wider data.[67]

What's next?

Part 4 considers how we deal with quantitative data, and asks whether we can make better use of data for decision making. The tensions for evaluators arise with the introduction of algorithms, and mysterious "black boxes". But we also need to think about how humans’ assumptions and biases leak into our technologies, whether digital or analogue.

If you would like to catch up with the previous sections, please follow the links...

~ Part One ~ Computation? Evaluate it!

~ Part Two ~ Distance Still Matters

About the author

Jo Kaybryn is an international development consultant, currently directing evaluation frameworks, evaluation quality assurance services and leading evaluations for UN agencies and INGOs. “All thoughts presented are my opinion only. Mentions of colleagues past and present should be taken as recommendations, as I have always gained much from working with them. No one should take investment advice or suggestions in this series: none is intended, the series aims to present reflections, and add to the conversations in and around evaluation and frontier technology.”

References

[58] It was brought to my attention that there is a controversial internet meme relating to coding currently trending. I do not engage with much of the internet and social media, so I don't fully understand what it means. However, I am told that 'learn to code' is being used in reference to the technologization of jobs and industry (and to act unkindly). So it may be tangentially relevant, but no reference to any meme is intended.

[59] Thomas C. Gricks III and Robert J. Ambrogi, A Brief History of Technology Assisted Review, 2015

[60] Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law & Technology, Vol 17, Issue 3, 2011

[61] Jay Alammar, The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning), 2018

[62] Andrey Kurenkov, Eric Wang, and Aditya Ganesh (eds.), NLP's ImageNet moment has arrived, 2018

[63] Andrey Kurenkov, Eric Wang, and Aditya Ganesh (eds.), ibid.

[64] Josh Taylor, ELMo: Contextual language embedding, 2019 

[65] A question would be: would we even want to, considering the enormous variability in quality? Thanks to Dr. Rick Davies for pointing out that it’s not necessarily a pile of gold that we’re looking at.

[66] “To be sure there are always sound business reasons for hiding the location of your gold mine. In Google’s case, the hiding strategy accrued to its competitive advantage, but there were other reasons for concealment and obfuscation. What might the response have been back then if the public were told that Google’s magic derived from its exclusive capabilities in unilateral surveillance of online behaviour and its methods specifically designed to override individual decision rights?” Shoshana Zuboff, 2019, ibid.

[67] Antonella Bonanni, Why Exploring Thick Data Helps to Understand Human Motivation, 2019
