The article presents the text of a speech delivered by Professor Luciana Duranti of the School of Library, Archival and Information Studies, University of British Columbia, at the International scientific and practical conference “From parchment to digit” (Kazan, April 18-20, 2018). The title of Ms Duranti’s speech, “Digital archive: completeness, credibility, safety”, involves continuity when translated into the theoretical archival concept of trustworthiness, as well as change, when related to how to ensure that the reliability, accuracy, and authenticity of digital material – its trustworthiness, that is – be maintained in the long term, even permanently. In archival theory, records’ reliability – that is, their trustworthiness as statements of facts – is based on the competence of their author and the controls on their creation; records’ accuracy – that is, the correctness and precision of their content – is based on the competence of their author and the controls on content recording and transmission; and records’ authenticity – that is, the trustworthiness of records which are what they purport to be, untampered with and uncorrupted – is based on their identity and integrity. If the tested and tried way of accessing the historical truth, even in the digital environment, is to rely on the documentary truth, understanding whose truth we are dealing with requires archivists to use traditional archival principles, concepts and methods; collaborate with technology experts while cultivating their disciplinary and professional knowledge; and produce functional requirements, tools, and guidelines to ensure reliable and accurate records creation and a fully contextualised authentication.
Выступление профессора школы библиотечных, архивных и информационных исследований Университета Британской Колумбии Лючианы Дюранти на Международной научно-практической конференции «От пергамена к цифре» (Казань, 18-20 апреля 2018 г.). Цифровой архив: полнота, надежность, безопасность ‒ подразумевают постоянство, выражаясь в теоретическом архивном понятии достоверности, и перемены, касающиеся способов обеспечения надежности, точности и аутентичности цифрового материала, его достоверности, на долговременной и даже постоянной основе. В теории архивоведения надежность документов, т. е. их достоверность в отношении изложения фактов, основана на компетентности автора и мерах контроля и управления над их созданием; точность документов, т. е. корректность и правильность их содержания, основана на компетентности автора и мерах контроля и управления над записью и передачей содержания; аутентичность документов, т. е. надежность документов, являющихся именно тем, чем они представляются, не подвергавшихся несанкционированным изменениям и порче, основана на их идентичности и целостности. Если испытанный и опробованный способ получения доступа к исторической истине, даже в цифровой среде, заключается в доверии к документальной истине, то понимание того, с чьей истиной мы имеем дело, требует от архивистов использования традиционных архивных принципов, взглядов и методов; сотрудничества с техническими специалистами при развитии своих дисциплинарных и профессиональных знаний; и разработки функциональных требований, инструментов, методов и руководящих документов с целью обеспечения создания надежных и точных документов и полностью контекстуализированной аутентификации.
Digital archive, digital record, record’s authentication, digitized records, International scientific and practical conference “From parchment to digit”.
Цифровой архив, электронный документ, аутентификация документа, оцифрованные документы, Международная научно-практическая конференция «От пергамена к цифре».
The title of this conference, “From Parchment to Digital,” conveys a sense of continuity as well as change. The specific issues I was asked to cover, “Digital archive: completeness, credibility, safety,” involve continuity when translated into the theoretical archival concept of trustworthiness, as well as change, when related to how to ensure that the reliability, accuracy, and authenticity of digital material ‒ its trustworthiness, that is ‒ be maintained in the long term, even permanently.
Beginning with theory, let’s reaffirm the meaning of trust. Some view trust as a four-level progression: from individual, as a personality trait, to interpersonal, as a tie directed from one person to another (son to father); to relational, as a property of a mutual relationship (people doing business); and to societal, as a feature of a community as a whole. InterPARES Trust, the international multidisciplinary research project that I direct, defines trust as “confidence of one party in another, based on an alignment of value systems with respect to specific actions or benefits, and involving a relationship of voluntary vulnerability, dependence, and reliance, based on risk assessment”. Substantially, trust involves acting without the knowledge needed to act, by substituting the information that one does not have with other information, e.g. the testimony of witnesses, oral tradition, or documentary truth. This is because the historical truth is not directly accessible: facts and acts slide into the past as they happen and can only be known through oral or written accounts of witnesses and the material instruments that embody them, the records.
In the context of written cultures (and I use the term “written” in the diplomatic sense of information affixed to any medium in any form to transmit it across space and/or through time), records, and the archival bodies of which they are part, form the infrastructure through which beliefs and values are upheld and understood: they provide evidence of facts and acts, where evidence is defined as the relationship between a fact to be proven and the fact that proves it.
This has been the case since antiquity, which regarded records as capable of preserving perpetual memory of the facts and acts from which they result (as opposed to “about which they talk”). Thus, their trustworthiness has been traditionally based on the procedures of creation and use; on deposit in a public place (i.e. the place of an authority considered sovereign by a given social group); and on the fact that they were not kept in order to serve the interests for which they are used later by researchers, but for carrying out usual and ordinary activities.
For these reasons, in archival theory, records’ reliability ‒ that is, their trustworthiness as statements of facts ‒ is based on the competence of their author and the controls on their creation; records’ accuracy ‒ that is, the correctness and precision of their content ‒ is based on the competence of their author and the controls on content recording and transmission; and records’ authenticity ‒ that is, the trustworthiness of records which are what they purport to be, untampered with and uncorrupted ‒ is based on their identity and integrity.
While these concepts remain valid for a presumption of trustworthiness regardless of medium, they are easily applied only to records that remain affixed to the same medium over time, whether they are drafts, originals or copies. When changing the medium and the technological environment of a record becomes a requirement for its continuing accessibility, readability, and preservation, trustworthiness cannot be presumed in all cases in which the conditions described above are declared to exist, but needs to be verifiable and often also verified. In other words, records and the archival bodies to which they belong have to be authenticated. But even authentication may not be sufficient, as it does not imply reliability and accuracy, or even continuing authenticity.
Authentication is a declaration of authenticity at a given moment in time, based either on material proof, inference, or deduction. Traditionally, a chain of legitimate custody is sufficient ground for authenticating a record. In the digital environment, a digital chain of custody ‒ that is, the information preserved about a record, its changes, and its contextual relationships, that shows specific data was in a particular state at a given date and time ‒ is usually considered sufficient ground for authenticating digital records. Also, as stated by the Canadian General Standards Board standard CAN/CGSB-72.34-2017 on “Electronic Records as Documentary Evidence”, a declaration made by an expert attesting to the trustworthiness of the system hosting the record can serve as authentication of digital records submitted as evidence in a court of law.
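The idea of showing that specific data was in a particular state at a given date and time can be sketched in a few lines of code. The sketch below is a minimal illustration only, not the format prescribed by any standard; the field names and function names are my own assumptions.

```python
import hashlib
from datetime import datetime, timezone

def custody_entry(data: bytes, event: str) -> dict:
    """Log that specific data was in a particular state at a given date and time."""
    return {
        "event": event,  # e.g. "acquired", "migrated" (hypothetical labels)
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(data).hexdigest(),  # fixity: the state of the bits
    }

def state_unchanged(data: bytes, entry: dict) -> bool:
    """Verification check: do the bits still match the logged state?"""
    return hashlib.sha256(data).hexdigest() == entry["sha256"]

record = b"Minutes of the meeting, 18 April 2018"
log = [custody_entry(record, "acquired")]
assert state_unchanged(record, log[0])             # untampered
assert not state_unchanged(record + b"!", log[0])  # any change is detectable
```

An unbroken sequence of such entries, one per custodial event, is one simple way to make a digital chain of custody verifiable rather than merely asserted.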
Archivists, however, must also be concerned with an authenticated record’s duplication integrity, which means that the process of creating a copy, whether for acquisition, access or preservation purposes, does not modify the record (either intentionally or accidentally) and the output is an exact bit copy of the original data set (form, content and composition data). Duplication integrity is also linked to time, and time stamps are used for that purpose. But the key is to know what we are duplicating. To make a copy, for example a PDF, is to make a selective duplicate, in that we only reproduce what we see, and there is rarely an expectation of completeness, because such a copy only provides an incomplete picture of a digital object. Forensic duplication, on the contrary, is a bit-by-bit reproduction of the storage medium and its content, including ambient data, swap space and slack space*. A full copy of the data on a storage device can be made regardless of operating system or storage technology.
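In practice, duplication integrity is commonly verified by comparing cryptographic digests of the original and the duplicate: if the two digests match, the copy is, to any practical certainty, an exact bit copy. A minimal sketch, assuming the original and duplicate are accessible as files (a real forensic tool images the whole storage medium, including slack and swap space, which file-level access cannot see):

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so that large images need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def duplication_intact(original: str, duplicate: str) -> bool:
    """True only if the duplicate is a bit-for-bit match of the original."""
    return sha256_file(original) == sha256_file(duplicate)
```

A single altered bit anywhere in the duplicate yields a completely different digest, which is why such comparison is the standard way of demonstrating that the copying process did not interfere with the data.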
Forensic duplication is very much concerned with process integrity, which relies on the “principle of non-interference” ‒ which means that the method used to re-produce or re-create a digital record does not change the digital entities ‒ and the “principle of identifiable interference” ‒ which means that, if the method used does alter the entities, the changes are identifiable and identified. These principles embody the ethical and professional stance of a neutral third party, the designated trusted custodian, the social institution responsible for preserving the sources of evidence: in the case of records, the archives.
It is interesting how in North America not much weight is given to technology-dependent authentication, which is so prevalent in Europe. I am talking in particular about the digital signature. As it protects bitwise integrity (i. e. a small change in a bit means a very different value presented on the screen or action taken in a program or database), verifies a record’s origin (i. e. a component of its identity), and makes a record indisputable and incontestable (i. e. non-repudiation), the digital signature has been given legal value in Europe by legislative acts. However, the digital signature is enabled through complex and costly public-key infrastructures (PKI), demonstrates the authenticity of information only across space, not time, and is subject to obsolescence, thereby compounding the problem of preservation. Furthermore, it presents security issues, as the keys are only as safe as the persons responsible for them are reliable and continue to exist, and verification issues, as the related certificates eventually expire.
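The bitwise-integrity property that a digital signature protects can be illustrated without a full PKI: what is signed is a cryptographic hash of the record, and flipping a single bit of the record changes that hash completely, so any alteration invalidates the signature. A sketch of the hashing step only (the key management and certificate machinery of a real signature scheme are deliberately omitted):

```python
import hashlib

message = b"Pay 100 rubles to Ivan"                  # an illustrative record
tampered = bytes([message[0] ^ 0x01]) + message[1:]  # flip a single bit

digest = hashlib.sha256(message).hexdigest()
digest_tampered = hashlib.sha256(tampered).hexdigest()

# One flipped bit changes roughly half of the 256 output bits, so a signature
# computed over the original digest can never verify against the altered record.
assert digest != digest_tampered
```

Note that this demonstrates only integrity across space: nothing in the hash itself addresses obsolescence of the algorithm or expiry of certificates over time, which is precisely the preservation problem described above.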
In light of the challenges presented by digital signatures, another type of technology-dependent authentication is attracting attention: blockchain technology. The InterPARES Trust project has studied in depth the use of such technology, on the one hand, to overcome the problems presented by the expiring certificates linked to digital signatures, and, on the other hand, to investigate the possibility of developing the technology in such a way that it can be used for recordkeeping and preservation. This research, led by Hrvoje Stancic and Victoria Lemieux respectively, is discussed in their reports.
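Reduced to its recordkeeping core, a blockchain is a hash chain: each block commits to the hash of the previous one, so altering any earlier entry breaks every later link. The following is a deliberately minimal, single-party sketch of that idea (a real blockchain adds distributed consensus and replication, which are omitted here; the function and field names are my own):

```python
import hashlib
import json

def add_block(chain: list, payload: str) -> None:
    """Append a block whose hash covers its payload and the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"payload": payload, "prev_hash": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)

def chain_valid(chain: list) -> bool:
    """Re-derive every hash; tampering with any earlier block is detected."""
    prev_hash = "0" * 64
    for block in chain:
        body = {"payload": block["payload"], "prev_hash": block["prev_hash"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True

chain: list = []
add_block(chain, "record A created")
add_block(chain, "record A migrated to new format")
assert chain_valid(chain)
chain[0]["payload"] = "record A destroyed"  # tamper with history
assert not chain_valid(chain)
```

Unlike an expiring certificate, such a chain can in principle be re-verified at any future time, which is why it is studied as a candidate mechanism for long-term recordkeeping.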
However, regardless of the success of technological authentication, as mentioned earlier, two questions remain:
1. How do we know that technologically authenticated records are reliable and accurate (i. e. credible as to content)?
2. How do we convey this knowledge to the people who should be using digital archives as primary sources of knowledge?
The reliability and accuracy of a record used to be largely deduced from its degree of perfection, also called “status of transmission”, which is indicative of the authority of a record’s content. In the context of each creation environment, a record can be transmitted across space (from person to person) and/or through time (by saving it for further action or reference) as a draft, which is a document prepared for purposes of correction and meant to be provisional and temporary; as an original, which is the first, complete document capable of reaching the purposes for which it was intended (primitiveness, completeness, and effectiveness are necessary characteristics of all originals); or as a copy, which is a reproduction of another document whose status can be original, draft or copy. There are many different types of copies, categorized as authentic copy, facsimile, copy in the form of original, imitative copy, simple copy or transcription, insert, inspeximus or vidimus, and each of these types has a different degree of authority.
In the digital environment, we can have digitized records or born-digital records. The former are copies of records on other media in any status of transmission, are stable, and, if authenticated (i. e. declared to conform to the record they reproduce by an officer who has the authority to do so), have the force of an original. The latter, the born-digital records, exist as originals for a nanosecond (upon creation – when they are received by an addressee or by the creator’s internal recordkeeping system, and saved) because they break down into their digital components as soon as they are closed. This means that a born-digital record cannot be maintained as made or received. We can only maintain our ability to reproduce or re-create it in a trustworthy way. This of course means that the most authoritative status of transmission, the original, does not live in a digital environment.
In the absence of originals, the author first, then the creator (i. e. the person accumulating the archival fonds in which the record belongs), and its legitimate successor(s) need to either make or identify a “trustworthy copy” for maintenance and use as the reliable and accurate record. Metadata about such an authoritative copy’s creation, original characteristics, and incremental contextual relationships is usually considered sufficient ground for determining the reliability and accuracy of a record and establishing the credibility of its content. However, most records creation environments are not highly controlled in terms of competent personnel, processes, and technologies. When a record cannot speak for itself as a trustworthy source of information, but its authority relies instead on the trust of those who read it in those who created or maintained it, how do we protect ourselves and our society from misinformation and disinformation?
It is a fact of our times that even the presence in a creation environment of competent records professionals and controlled procedures does not constitute a reasonable guarantee of the reliability and accuracy of records. Records are falling victim to politicians and administrators who fear being held accountable for their actions, and either destroy or do not create them. In Canada, this situation prompted Information Commissioners to call for a “duty to document”. But even the legislation resulting from this call is not sufficient. Legal scholars talk about the erosion of the persuasive value of evidence and the view of facts as threats, and express a general consensus that existing legislation is unable to address the production and dissemination of “disinformation”, that is, information that is incorrect by design.
Furthermore, the technical infrastructures that gather and store data have become increasingly complex, often invisible, and hidden, and records professionals are at a loss to capture much, if any, provenance data about the information found in these infrastructures and, often, even to understand their scope and scale, who controls them, which systems are overtly or covertly collecting the data in them, or how to prevent them from doing so; this gives rise to “misinformation”, that is, information that is incorrect for lack of sufficient contextual data.
Library and information science professionals have been active in discussing this phenomenon, mostly from the point of view of the user rather than the creator. Noting that misinformation and disinformation, and the need to combat them, are nothing new, they cite Socrates and his “test of three”: is the information true? Is the intent to share it good? Is knowing it useful to the recipient of the information? Calls to “certify” information acknowledge the challenge of doing so, citing its volume, the democratization of its creation, which makes it difficult to know what an authoritative source is, and the potential for falsity and pretense. Most of the literature emphasizes the important role of critical thinking and media/information literacy, and argues that librarians and other information professionals serve as impartial mediators, educating the public on how to think critically about information presented to them. A key theme emerging from this literature is the changing way in which society assigns value to information: no longer is value determined by the authoritativeness or reliability of the creator, but rather by the breadth of its circulation.
In the social sciences there is an abundance of literature on misinformation and disinformation, concentrated mostly within psychology, political science and social science, and focused largely on the question of how to correct misperceptions, with the proliferation of false facts about politics, climate change and the safety of childhood vaccinations as the examples most frequently examined. Several commentators found that corrections often fail to reduce misperceptions and may actually increase the persistence of a misperception through the repetition of it. A number of studies have been undertaken to explore the reasons why corrected misperceptions persist, and the efficacy of different correction measures.
Computational scientists flag very significant issues with how algorithms work and their interactions with disinformation. Despite repeated promises to tackle and control disinformation, social media platforms are still extremely vulnerable to the effects of their own algorithms, and it is becoming clear that their business models are in conflict with what is needed to push accurate, well-sourced news. Thus, some jurisdictions, like New York City, have taken the initiative of requiring transparency and accountability for algorithms. The entire December 2017 issue of the Journal of Applied Research in Memory and Cognition is about disinformation and misinformation (https://www.sciencedirect.com/journal/journal-of-applied-research-in-memory-and-cognition/vol/6/issue/4); one of the articles proposes “technocognition” as the solution, that is, using what we know about psychology to design technology in a way that minimizes the impact of misinformation and disinformation. By improving how people communicate, we can improve the quality of the information shared.
Behavioural economics and auditing can also be sources of inspiration on how to ensure the reliability and accuracy of records and to teach users to trust digital archives. Humans use heuristics to make decisions, often accepting an available option as satisfactory, rather than selecting the optimal outcome. “Nudges” (small stimuli used to influence people or organizations) can lead people to make better decisions, though many consider them manipulative and paternalistic, because they may be deployed for non-altruistic reasons. The literature suggests that people may not respond to “the best evidence”, whether due to changes in human psychology (e. g., the influence of technology on the way we think and process information) or in the external environment (e. g., the demise of the traditional media), or because of other influences. In trying to correct erroneous beliefs caused by disinformation, we may actually reinforce those beliefs. More successful correction strategies might be to include the true information in several records through various cross-references; ignore the false information and not repeat it (even if only to correct it); make true information as accessible as possible; and prepare information recipients to expect misinformation. Efforts to correct disinformation must be mindful of how people assess and accept the “truth” of evidence. The source of the correction also matters, as we are more accepting of evidence that is contrary to our beliefs when the source of that correction has a vested interest in that correction being untrue.
One thing records and archives professionals, as well as the users of the records, can learn from auditors is professional skepticism. Broadly speaking, this can take two forms: a presumption that all evidence is a lie until proven otherwise (“presumptive doubt”), or a neutral but cautious consideration and assessment of all the evidence (“neutrality”). Professional skepticism comprises a questioning mind, a suspension of judgment until sufficient evidence is available to form an opinion, a search for knowledge (curiosity/interest), interpersonal understanding (consideration of the motivations and integrity of those who proffer evidence), self-esteem (ability to resist persuasion and to challenge others), and autonomy (ability to assess the evidence).
In conclusion, if the tested and tried way of accessing the historical truth, even in the digital environment, is to rely on the documentary truth, understanding whose truth we are dealing with requires archivists to use traditional archival principles, concepts and methods; collaborate with technology experts while cultivating their disciplinary and professional knowledge; and produce functional requirements, tools, methods, and guidelines to ensure reliable and accurate records creation and a fully contextualised authentication. But this will only address the first of our outstanding challenges. What about the second: directing people towards trustworthy records and archives and away from disinformation?
There is something we archivists can do. We can design tools to “nudge” people towards our infrastructure for documentary truth, perhaps even slicing and dicing it for targeted audiences, just like Facebook does; we can create different blueprints for characterizing our infrastructure to diverse potential users, just like Google does; and we can develop capabilities enabling people to easily trace, access, and assess records in context click after click, fast and easily, just like Wikipedia does.
Only then will people know that, also in the digital world, archives are the means to unveil and denounce misinformation and disinformation and get to the truth, even if only the documentary truth, and a partial one.
Сведения об авторе
Лючиана Дюранти, профессор школы библиотечных, архивных и информационных исследований Университета Британской Колумбии.
About the author
Luciana Duranti is a professor of archival science at the School of Library, Archival and Information Studies, University of British Columbia.
В редакцию статья поступила 18.04.2018, опубликована:
Дюранти Л. Доверяя цифровым архивам (на англ. яз.) // Гасырлар авазы ‒ Эхо веков. ‒ 2018. ‒ № 2. ‒ С. 30-37.
Submitted on 18.04.2018, published:
Duranti L. Doveryaya tsifrovym arxivam [Trusting Digital Archives]. IN: Gasyrlar avazy ‒ Eho vekov, 2018, no. 2, pp. 30-37.
* “Ambient data” refers to the data saved through the auto-save functions included with office productivity programs, which write temporary snapshots of an open file to the disk at set intervals. “Swap space” is the portion of the hard disk that the system uses as an extension of its RAM during operation; it is termed virtual memory in the Windows world. Forensic investigators recover significant ephemeral data, such as passwords and encryption keys, from swap space. “Slack space” refers to the space available on a cluster even after an active file is stored in some part of that cluster. This arises from the fact that space is allocated in fixed cluster sizes even if the file size is less than the cluster size. The data present in the remaining area of the cluster is not overwritten, reflects data from a past file that used the cluster, and is called slack.