Concerns that Artificial Intelligence (AI) may replace humans have begun to take hold as ChatGPT and related technology develop. The most recent research, a pre-print that has not yet undergone peer review, supports this notion.
According to the research, published on March 29, GPT-4 outperformed human test-takers on mock questions for the American Board of Neurological Surgery's written examination, scoring well above the passing threshold.
While ChatGPT (GPT-3.5) has shown near-passing performance on medical student board examinations, its successor, GPT-4, greatly exceeded it on specialized tests.
The goal was to evaluate how well ChatGPT and GPT-4 performed on a 500-question mock neurosurgery written board examination, according to the paper “Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Exams,” posted on medRxiv, the pre-print platform for the health sciences.
Generative artificial intelligence has drawn tremendous attention since ChatGPT was introduced in November of last year. The technology generates responses that resemble human conversation.
ChatGPT, an application developed by Microsoft-backed OpenAI and trained on vast amounts of data, can create, summarize, and interpret text, answer questions, and perform a number of other natural-language tasks.
GPT-4, the most recent and advanced AI language model, can also interpret photographs and describe what is in them.
GPT-4 scored 83.4%, outscoring the average user
According to the research, ChatGPT (GPT-3.5) and GPT-4 earned scores of 73.4% and 83.4%, respectively, in comparison to the user average of 73.7%.
Both GPTs and question bank users exceeded the previous year's 69 percent passing mark. While ChatGPT's results were comparable to those of question bank users, GPT-4 outscored both. The questions had a multiple-choice format with a single best answer.
The research found that GPT-4 greatly outperformed users in each of the 12 question categories, but that it performed similarly to ChatGPT in three (Functional, Other General, and Spine) and outperformed both users and ChatGPT for questions about tumors.
For ChatGPT, but not for GPT-4, higher-order problem-solving and longer question word counts were linked to lower accuracy.
Because multimodal input was unavailable at the time of the study, ChatGPT and GPT-4 answered questions that included images based only on contextual clues, scoring 49.5% and 56.8%, respectively.
In India, ethical guidelines for using AI in medical research are being developed.
The Indian Council of Medical Research (ICMR), India’s premier medical research organization, is now analyzing the effects of AI-driven apps like ChatGPT on health research and has already developed “ethical standards” for their deployment.
A team of officers briefly tested ChatGPT to grasp its immediate ramifications and found that, although it is exceptional at producing research articles, it still needs human input.
Although not everything it produces is accurate, the software prompts users to correct inaccurate data, suggesting that it is gathering corrections and may eventually return more accurate results.