Alan Turing, the British mathematician who broke the German Enigma encryption code during the second World War, devised what has become known as the Turing test to determine whether a machine is capable of imitating human-like intelligence. Recently an Australian government senate committee used his fabled test to assess whether a generative artificial intelligence large language model (GenAI LLM) could match or even surpass the quality of work done by its own staff.
Together with a consultancy team from Amazon, the committee held a five-week exploratory trial in which public submissions to a parliamentary inquiry were summarised both by a chosen LLM and by humans. The resulting summaries were then blind-tested individually by five business representatives, who were not told that GenAI was involved.
Only after completing their evaluations were the assessors told the true nature of the trial and asked why they had rated each summary as they had, although three of the five remarked that they had already suspected a GenAI trial.
The results showed that the GenAI summaries scored lower than the human summaries on every criterion (47 per cent versus 81 per cent overall) – thus failing Turing’s test.
The reviewers felt that the automated digests often missed emphasis, nuance and context, included incorrect information, omitted relevant material and sometimes introduced irrelevant commentary. They concluded that GenAI had been counterproductive, in fact creating further work because of the need to fact-check the digests and refer back to the original public submissions.
An extract of the committee discussions and the full report are publicly available (https://www.aph.gov.au/DocumentStore.ashx?id=b4fd6043-6626-4cbe-b8ee-a5c7319e94a0).
Wall Street appears increasingly sceptical that GenAI will significantly pay off. From the perspective of investors, the touted “transformational” technology has so far been extremely expensive relative to its actual business impact and has not yet produced a “killer app” for the public.
Microsoft’s capital expenditure is up 75 per cent year on year, and the company sank almost all of its second-quarter profits – some $22 billion – back into cloud and GenAI investment. Alphabet (Google’s parent) has been less forthright about its GenAI investments, but admits its capital expenditure will be “notably larger” this year than last. Amazon is similarly guarded but has so far spent $30 billion on capital expenditure this year, compared with $48 billion for the whole of 2023.
Meanwhile, Sam Altman, chief executive of ChatGPT maker OpenAI, is trying to persuade the US government to join investors in a national GenAI infrastructure initiative that would cost “tens of billions of dollars”, covering data centres, power generation and national grid upgrades.
In June, Goldman Sachs published a controversial report, GenAI: Too much spend, too little benefit? (https://www.goldmansachs.com/images/migrated/insights/pages/gs-research/gen-ai--too-much-spend,-too-little-benefit-/TOM_AI%202.0_ForRedaction.pdf), in which several analysts debated the likelihood of an economic upside from GenAI over the next decade. The firm concluded there is still scope for investor returns, either because GenAI may eventually deliver or because its investment bubble may take some time to burst.
Although it has yet to justify its heavy investment financially, GenAI technology has nevertheless intrigued and fascinated. New search engines such as Perplexity.ai are considerable improvements over the old Google, albeit perhaps six to 10 times more expensive to operate. Assistants such as GitHub Copilot aid routine software development but are frustrating when they generate incorrect code. The hyperrealistic photographic images generated by tools such as Flux 1 from Black Forest Labs show commercial promise for online shopping, letting customers virtually try on clothing and accessories before buying, but such use cases have yet to be proven.
Proponents of GenAI believe that we are still in the early stages of the technology, and point in particular to augmenting GenAI with autonomous action. Such AI agents can proactively plan and execute tasks, modifying their behaviour based on the outcomes of their prior actions. For example, an automated holiday assistant might not only book flights and accommodation, but also tailor excursions and entertainment based on its experience of the user and of other holidaymakers.
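In outline, such an agent is little more than a loop that plans a step, performs it, records the result and plans again in light of that record. The short Python sketch below is purely illustrative: the planner and the holiday-booking actions are hypothetical stand-ins, since a real agent would call an LLM and external booking services at those points.

```python
# Illustrative sketch of an autonomous agent's plan-act-record loop.
# The planner and actions below are stubs standing in for real LLM calls
# and booking services; only the loop structure is the point.

from dataclasses import dataclass, field


@dataclass
class Agent:
    goal: str
    memory: list = field(default_factory=list)  # outcomes of prior actions

    def plan(self) -> str:
        # Stub planner: a real agent would ask an LLM to pick the next step
        # given the goal and the memory of what has already happened.
        steps = ["book_flights", "book_accommodation", "book_excursions"]
        done = {m["action"] for m in self.memory}
        remaining = [s for s in steps if s not in done]
        return remaining[0] if remaining else "finish"

    def act(self, action: str) -> str:
        # Stub executor: pretend each booking succeeds.
        return f"{action} completed"

    def run(self) -> None:
        while (action := self.plan()) != "finish":
            result = self.act(action)
            # Record the outcome so the next plan can take it into account.
            self.memory.append({"action": action, "result": result})
            print(f"{self.goal}: {result}")


Agent(goal="a week in Lisbon").run()
```

The memory list is what distinguishes an agent from a one-shot chatbot: each new plan can take earlier outcomes into account, so the system adapts rather than simply replaying a script.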
To explore how teams of GenAI agents might behave autonomously, San Francisco-based start-up Altera has released 1,000 autonomous agents into the open-world Minecraft game. The agent community has created its own culture, economy, religion and government (see the YouTube summary at https://www.youtube.com/watch?v=2tbaCn0Kl90). The townsfolk established a market for trading goods, yet the community priest became the richest citizen by bribing everyone to convert. Proposed constitutions have been compared, amended and put to a vote for adoption. When some individuals went missing, the community illuminated the area with torches to guide the lost back. None of these activities was preprogrammed; the federated AI community autonomously derives its own planning, co-ordination and actions.
GenAI may not yet pass Turing’s test, but that test explicitly measures machines against human intelligence, which is perhaps a narcissistic yardstick. Arguably, we are now witnessing a different form of intelligence, one we do not fully understand and cannot yet accurately analyse or predict mathematically. Equipping such intelligence with the capability to carry out its own actions autonomously raises philosophical, ethical and practical questions.