The Loebner Prize from a judge's perspective

This year, for the first time in its history, the Loebner Prize competition was held in England, at the University of Reading to be precise. It was organised by Kevin Warwick and Huma Shah.

Independently of whether Turing might have been pleased (he was not well treated in this country, recall?), there was a satisfying sense of the Turing Test (henceforth TT) “coming home”. Expectations were high, and the event was very highly advertised too. The meeting was perfectly organised.

Having been invited to play the role of a judge, together with several other colleagues, including two members of the IEG, Mariarosaria Taddeo and Matteo Turilli (here are their pictures and Rosaria's interview), I enjoyed the opportunity to see the machinery and the TT up close. It was intriguing and great fun.

Because there were interviews with the BBC and other things going on, and because we were also supposed to take part in the parallel AISB Symposium on the TT, I had time to test only one pair, instead of the shortlisted four. It was sufficient to reassure me that our machines are not even close to resembling anything that might be open-mindedly called intelligent.

My first question was “if we shake hands, whose hand am I holding?”. The human, as expected, immediately answered, metalinguistically, that we should not talk about bodily interactions, signalling that he was human, as I hoped. Indeed, he turned out to be Andrew Hodges, recruited on the spot to interact with me on the other side of the screen. The computer failed to address the question entirely, and spoke about something else. It was the usual, give-away, tiring, Eliza-ish strategy, which we have now seen implemented for decades.

My next question was: “I have a jewellery box in my hand, how many CDs can I store in it?”. Again, Andrew provided some explanation; the computer blew it badly. More Eliza. By then, we were running out of time, so I asked the computer one last question: “The four capitals of the UK are three, Manchester and Liverpool. What’s wrong with this sentence?”. Once again, the computer went bananas.

During the Symposium, which was organised and moderated by Mark Bishop with his usual ability, several people, Andrew and myself included, defended the view that a serious TT would have to last much longer than five minutes. But this is as much because of the examined agents, and of the slow means of communication (you have to write/read everything on a screen), as because of the judges, and their lack of training. If you need to test, and I mean really test, an artefact, the higher the stakes are, the tougher the procedure should be. We do not have the same standards when it comes to testing the safety of a house’s central heating system or of an atomic power station. Why (artificial) intelligent behaviour should be left to be tested by the untrained “man in the street” remains a mystery to me. Unless that is the sort of dude you wish to fool. If the TT at Reading scored less badly than it could have, this is also because some of the judges were asking useless questions like “are you a computer?”. This means having missed two essential points of the whole exercise.

First, answers must be as informative as possible, which means that one must be able to maximise the useful evidence obtainable from the received message. It is the same rule applied in the 20 questions game: answers have to make a difference to your previous state of information, and the bigger the difference the better. But in the example above, either “yes” or “no” will leave you absolutely unenlightened as to who is what, so that is a wasted bullet.
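The 20 questions rule can be made concrete with a rough sketch in Python. It measures how much a question is expected to reduce the judge's uncertainty about who is on the other side. Everything in it is an assumption made for illustration: the function names, the 50/50 prior, and above all the answer probabilities, which are invented and not measured at Reading.

```python
import math

def entropy(dist):
    """Shannon entropy, in bits, of a probability distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_information_gain(prior, likelihoods):
    """Expected reduction in the judge's uncertainty after hearing the answer.

    prior:       P(hypothesis), e.g. {"human": 0.5, "machine": 0.5}
    likelihoods: P(answer | hypothesis), as {hypothesis: {answer: probability}}
    """
    answers = {a for dist in likelihoods.values() for a in dist}
    gain = entropy(prior)
    for a in answers:
        # Probability of hearing this answer at all.
        p_a = sum(prior[h] * likelihoods[h].get(a, 0.0) for h in prior)
        if p_a == 0:
            continue
        # Bayes' rule: what the judge believes after hearing this answer.
        posterior = {h: prior[h] * likelihoods[h].get(a, 0.0) / p_a for h in prior}
        gain -= p_a * entropy(posterior)
    return gain

# Hypothetical numbers, for illustration only.
prior = {"human": 0.5, "machine": 0.5}

# "Are you a computer?": both sides are assumed equally likely to say "no".
wasted = {
    "human": {"no": 0.95, "yes": 0.05},
    "machine": {"no": 0.95, "yes": 0.05},
}

# A question that needs understanding: the two sides are assumed to answer differently.
discriminating = {
    "human": {"sensible": 0.9, "evasive": 0.1},
    "machine": {"sensible": 0.1, "evasive": 0.9},
}

print(expected_information_gain(prior, wasted))          # practically zero bits
print(expected_information_gain(prior, discriminating))  # roughly half a bit
```

Under these made-up numbers the first question buys the judge practically nothing, whatever the answer, while the second removes about half of the initial uncertainty in a single exchange.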

Second, questions must challenge the syntactic engine which is on the other side. The more a question can be answered only if one truly understands its meaning, the more that question has a chance of being a silver bullet. The first question I asked was already sufficient to discriminate between the human and the machine. It took a minute.

It might be that the Loebner Prize should be re-thought more like a chess tournament, where we could play imitation games with different levels of time control: long games (up to seven hours), short games (30/60 minutes), blitz games (three to fifteen minutes for each player), bullet games (under three minutes) and even one-question games (one minute). The computer I tested could not even pass the last of these. I gave it a zero.

Parallel to the Turing Test, the AISB Symposium was meant to provide plenty of food for the biological minds around. I enjoyed the lively interactions, and found the first half of Owen Holland’s talk about the Ratio Club interesting and informative.

I disagreed with several people, however, about the following issue. There seemed to be some coalescing consensus on the view that a machine will pass the TT only if it is conscious. This is certainly not the case. The TT is a matter of semantics and understanding. And although we might never be able to build truly semantic machines – as I suspect – consciousness need not play any role.

Which is not to say that a conscious machine would not pass the TT. For it would, of course. Nor is it to say that some smart applications might never be able to deal successfully with semantic problems by other means. Some already do (isn’t it handy that Google knows better and tells you that your keywords are misspelled and should be so and so?). But then my dishwasher needs no intelligence (let alone consciousness) to do a better job than me.

What it does mean is that, after half a century of failures and zero progress, some serious reconsideration of the actual feasibility of true AI is a must, and making things immensely more difficult cannot help (although it might give some breathing space to a dying paradigm).

Instead, the argument seems to be that, since we do not have the faintest idea about how to build a machine that can answer a few intelligent questions or even win the one-question TT, the best strategy might be to go full-blown and try to build a machine that is conscious. As if things were not already impossibly difficult as they stand. It is like being told that if you cannot make it crawl, you should make it run the hundred metres under ten seconds, because then it will be able to crawl. Surely there must be better ways of spending our research funds.

The fact that nobody agrees on what consciousness is can help only insofar as it makes cheating and fooling the judges easier. If anything may count as consciousness, the game becomes easier. Turing, of course, knew better. He refused to define intelligence, so we should follow his advice and perhaps adopt a test for consciousness. I provided one in Consciousness, Agents and the Knowledge Game (Minds and Machines 2005, 15(3-4), pp. 415-444), but I am sure others can be devised.

All in all, it was an instructive and entertaining experience. Congratulations to all the Humans for passing the test of a successful meeting.

Comments

  1. Check out this Web 2.0 approach to chatbots: http://chatbotgame.com.

    Just as Deep Thought brute-forced it in chess with speed, the idea behind the Chatbot Game is to brute-force it with a huge number of user-submitted Google-like chat rules.

  2. Hi Professor Floridi:

    The Loebner Prize has been held in the UK three times prior to the 18th manifestation of the contest in Reading.

    In 2001, it was held at the Science Museum. In 2003 (my very first attendance) it was hosted by the University of Surrey. In 2006, it was held in UCL's Torrington campus (organised by Tim Childs, CEO of Televirtual, and myself).

    Once again, thank you for agreeing to participate with such a busy schedule :-)

  3. One other thing, Professor Floridi, it is Turing himself, in his 1950 paper, who used the term "average interrogator". There's a link to the paper on Hugh's Loebner Prize site:

    http://www.loebner.net/Prizef/loebner-prize.html


    I agree with you, re consciousness not being necessary in any machine that will pass Turing’s imitation game.


    Finally, the machine that you interacted with, in Loebner 2008, was jabberwacky; it did not come first or second in the Reading University hosted, 18th Loebner Prize contest.

    Eugene, the runner-up to the winning entry Elbot, managed to deceive one human interrogator (a Times newspaper journalist, no less) into believing that it was human.

    During the preliminary phase of Loebner 2008, in June, this is what one of the judges wrote of Eugene: “You could ask it the following: "My car is red. What color is my car?" It gave the correct answer of "Red" whereas all but two other programs either couldn't comprehend the question (or that there was a question) or just took a random guess.”

    http://en.wikipedia.org/wiki/Eugene_Goostman

  4. Slowly slipping into the abyss of my own neurological network by means of @Dawn21stcentury, intrigued as I am by the impact of accelerating technology on society, coming to grips with the fact that it is time for me to turn my back on my Business Background and focus instead on the field of Philosophy of Information, I would like to say: Thank you for this post and this blog in general! I've recently written this rather elaborate post on Artificial Intelligence (if I only knew why...) and your perspective as a judge was therefore great food for thought!

    Thank you very much & greetings from Holland

