Speechify my love
Speechify is an AI-powered voice assistant app and extension that converts text (from documents, web pages, books, PDFs) into natural-sounding speech, allowing users to listen rather than read, boosting productivity, and aiding accessibility for those with reading difficulties like dyslexia. Key features include lifelike voices in multiple languages, voice typing, AI summaries, and the ability to clone your own voice, functioning across mobile, desktop, and browser platforms.
That’s an AI overview. And I use the term AI loosely, to mean the increased sophistication of tools developed recently for searching large language models (LLMs). Whilst I acknowledge its artificiality, in no sense do I think of it as intelligent. My overview is that, like many recent IT developments, this app is simultaneously both brilliant and dangerous. The dangers arise from the absence of any policies or principles governing its creation beyond the freedom to launch an app and that app’s relative success in the marketplace. One principle in particular is absent - one that would necessarily exist (for better or worse) if you were releasing a new drug, launching a rocket, or carrying out a development activity that risked polluting a water course - the precautionary principle.
To illustrate my point, consider a particular feature of the app that intrigued me: the creation of “lifelike” podcasts. That’s potentially a great way for some people to consume complex or difficult content. To test this, I asked the app to create a podcast from William Shakespeare’s Hamlet. I am familiar enough with the play, having studied it at school, read it for pleasure and watched several theatrical and cinematic adaptations (and also because I’m going to go and see the movie Hamnet tomorrow night).
The app took one minute to turn one of the all-time great works of literature into 12 minutes and 20 seconds of amiable chit-chat. The experience is somewhat like overhearing a Mexican telenovela - one that has been concocted from a Reader’s Digest condensed book - being discussed earnestly by two precocious American sophomores. The sing-song repartee of these two bots (a boy and a girl) is in places contaminated by pollution from the LLM; specifically, Jake and Rachel have access to a smorgasbord of literary opinions from the internet and seemingly select a handful of these that thematically tie together. The LLM is like a dense forest. And what the AI model has done is gather a number of parrots of a particular colour and present them as representative of the whole forest. The need to move the conversation forward, cover the text in chronological order, and imitate a plausible-sounding dialogue (giving an equal number of plot points and observations to each of the two participating bots) has strangled the whole endeavour. What comes out, therefore, is a “take” on Hamlet. Whose, it is hard for me to say. A professional literary critic might be able to recognise the authentic voice of one of their academic colleagues peering out at them from the murk.
So far, not that dangerous. So someone has a shitty take on Hamlet they got from an app. So what? But I gave the app a second test: to summarise an academic textbook into a podcast. This time, however, I did not supply the text of the book, just the cover. The app has the feature of being able to summarise the written word even in the absence of the original by utilising the LLM. So this presumably comprises criticism of the text by people other than the author. It may also include reflections on the text by the author and/or their publisher - “blurb” as it is known - and perhaps interviews.
The outcome of the second experiment was much more concerning. I used Sharon Beder’s “Environmental Principles and Policies: An Interdisciplinary Introduction” because it is a book I frequently recommend to students carrying out a course of study for either the NEBOSH Environmental Management Certificate or the ISEP Foundation Certificate in Sustainability and Environmental Management, both of which I teach. If I may summarise it without the use of AI: the book is divided into five parts, with an introduction preceding and a conclusion, bibliography and index following. She introduces six principles in the Introduction, and these are discussed and expanded upon (in a slightly different order from the Introduction) in the first two parts of the book, namely:
Sustainability Principle
Polluter Pays Principle
Precautionary Principle
Participation Principle
Equity Principle
Human Rights Principle
She also introduces Policies through the topic headings: Environmental Legislation, Environmental Concern Peaks, and Economic Instruments. And the latter three parts of the book - III. Economic Methods of Environmental Valuation, IV. Economic Instruments for Pollution Control and V. Markets for Conservation - consider the intersection of the aforementioned principles with policy making in, respectively, general economic terms, the control of pollution specifically, and the use of markets to influence behaviours.
What the podcast does with this is effectively to create a false controversy. Instead of the carefully considered words of an academic trying to comprehensively navigate the undulating landscape in such a way as to draw a student into the nuances, the podcast comes across as though Beder is an activist who has written a trenchant polemic against the evils of capitalism. Each of the Principles is discussed by “Jake” and “Rachel” as if it were a universal truth that has been cynically subverted by bad actors. Jake and Rachel are no longer sophomores. They have now dropped out of college to join Extinction Rebellion and smoke weed. What was a “take” has become polarised; it has become, in fact, a political position. Now, just because it is a position I happen to agree with doesn’t mean I can ignore the moral and intellectual hazards that lie in the path of anyone using this technology to learn about the world.
I suppose I gave it an impossible task and it failed, and I shouldn’t have expected any different. But apps like Speechify are “Moel Siabod”, Engine No. 5, sputtering up the Eryri of knowledge. I mean to say they are like the inefficient, clunky locomotive taking a coachful of passengers up a direct shortcut to the summit of Mount Snowdon. No one gets to explore the mountain from any perspective but the one the railway carriage window allows. The experience of awe at the scale of the mountain of knowledge, let alone the experience of conquering some bit of it, is entirely bypassed. The photo opportunity at the top is just like the spine of one of those very Reader’s Digest condensed books: meant to create a pleasing and comprehensive-looking symmetry in appearance, but an ersatz facsimile of erudition, with none of the hard-won knowledge that the original volume contained and that the patient reader could obtain. When those condensed books are not the witterings of Barbara Cartland but actual academic texts, it is as though you are now looking at the mountain through the carriage windows, through a further haze of carbon dust and diesel smoke, and then again through coloured spectacles. Where one sees the rose-tinted uplands, his neighbour observes Mount Doom.
The Precautionary Principle “advocates taking preventive action against potential harm to health or the environment, even when scientific certainty about the risk is lacking, preventing inaction from becoming more costly later”, to quote Google Gemini (which paraphrased and amalgamated definitions from EUR-LEX and Wikifrickinpedia). Beder instead chose the COMEST definition:
When human activities may lead to morally unacceptable harm that is scientifically plausible but uncertain, actions shall be taken to avoid or diminish that harm.
COMEST is the World Commission on the Ethics of Scientific Knowledge and Technology, an advisory body and forum of reflection set up by UNESCO in 1998. Such a definition - though it could not possibly have anticipated AI - must of necessity include the use of AI in the academic and scientific realms. But rather than discuss the ethics first and then create a landscape for the release of this technology into society in a controlled and managed way, fully cognisant and accepting of the risks it poses, we are rushing headlong to embed this technology in every aspect of life on a whim. Governments and corporations vie with each other as to who can get to the treasure first; who can build the clippers and railroads to carry it back to their shores. AI is touted like snake oil as the cure to our ills, when it really is an accelerant.
The impact of AI on - to name just one resource - copper is astronomical (Reuters, FT, Bloomberg). The energy demands of data centres entail burning as much energy as, or more than, we have been saving to try and meet the challenge of climate change (CarbonBrief). So I don’t think it’s unfair to call AI dangerous: we could be enabling and facilitating error, and destroying our world, just in order to propagate more error. AI is a Tower of Babel; an edifice rebuilt out of brittle carbon and incorporating filigree seams of bright and attractive copper, perhaps resembling a mountain but lacking any of the diversity, variety and splendour of the thing it’s trying to replace.