GPT-4o: The Cutting-Edge Advancement in Multimodal LLM

EasyChair Preprint no. 13757

6 pages. Date: July 2, 2024

Abstract

GPT-4o marks a significant advancement in AI technology, particularly in multimodal capabilities. OpenAI has released several GPT models over the years, of which GPT-4o is the latest. This paper gives a concise overview of these models, focusing on their key features and technological advancements, with the main objective of presenting GPT-4o and its technological innovations. GPT-4o improves substantially on its predecessors by introducing multimodal capabilities, larger context windows, more efficient tokenization, and faster processing, achieving state-of-the-art performance in text, audio, video, and image generation and understanding. We compare GPT-4o with ten leading LLMs on metrics such as throughput, response time, and latency, where GPT-4o demonstrated clear superiority. Additionally, the paper explores various application domains, highlighting GPT-4o's versatility and its potential to modernize multiple aspects of human life.

Keyphrases: AI, ChatGPT, GPT-4o, large language models, LLM, multimodal, multimodal capabilities, OpenAI, Performance Comparison

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:13757,
  author = {Raisa Islam and Owana Marzia Moushi},
  title = {GPT-4o: The Cutting-Edge Advancement in Multimodal LLM},
  howpublished = {EasyChair Preprint no. 13757},

  year = {EasyChair, 2024}}