Jan 10, 2025

Innovation & AI

Large Language Model Battles

Which LLM do you choose?

Artificial Intelligence, Machine Learning, Large Language Models (LLMs), Generative AI… If you feel overwhelmed, you’re not alone. Most people throw these terms around like confetti at a wedding, grouping them all under the overly general label of “AI.” But it’s important to understand the variety of tools available, and why one size does not fit all. Think of it like shoes: AI is the category, LLMs are the sneakers, and the model you choose is your favourite brand. Sure, they’re all designed to cover your feet, but try hiking in flip-flops.

LLMs are the stars of the AI world right now, especially when it comes to workplace tools. But not all LLMs are created equal. Before we dive in, let’s clarify what we’re dealing with here.

What Are Large Language Models?

Firstly, if you wish to understand the differences between AI, Machine Learning, LLMs and GenAI, here is a great read. For our scope, let’s focus on LLMs. In simple terms, LLMs are sophisticated AI tools trained on massive datasets of text to understand and generate human-like language. They can answer questions, summarize reports, write creative content, and even translate languages. But while the functionality sounds magical, these tools have their quirks, strengths, and limitations.

So, instead of asking, “Which LLM is better?” (spoiler: it depends), I’ll explore, “Which LLM is best for which task?” To find out, I tested three popular LLMs: ChatGPT, Claude, and Gemini. To keep the comparison fair, I used the free version of each.

For the comparison, I focused on four common use cases: summarizing, creating content, refining content, and translating. These use cases are widely accepted because so much ‘donkey work’ falls under them. One may even go as far as saying that, in many cases, people don’t actually care whether AI was used to create a summary or a translation. That’s why I chose them. Anyway, let’s get into the results.

Step 1: Summarizing

Summarizing is one of the most common uses for LLMs, and it’s where their unique characteristics start to shine, or stumble. I tested the models by asking them to condense a dense section of a report. Instead of uploading a document, I copied and pasted the key findings of the report into each model. The reason is that Gemini’s free version does not allow document uploads, and I needed to maintain a standard across the board.

In this test, Gemini emerged as the fastest, delivering concise summaries in record time. However, its brevity often came at the expense of detail. It’s a great choice when speed matters more than depth, but it may leave you wanting more if nuance is critical.

ChatGPT, on the other hand, provided the most detailed summaries, often going beyond the original request to include additional context. This makes it perfect for those who want a deeper dive, but it can sometimes feel overwhelming. How much detail stops a summary from being a summary? You know what I mean?

Claude found a middle ground, offering summaries that were detailed yet digestible, balancing efficiency with readability. If Gemini is your quick-and-dirty solution and ChatGPT is your thorough researcher, Claude sits between the two. While it leans closer to ChatGPT in its level of detail, its output still reads like a summary.

Step 2: Creating a Presentation Structure

Next came the presentation structure. On numerous occasions, we have created brief videos to share new knowledge that we obtained from scientific journal articles. We know that not everyone enjoys reading empirical literature, so instead of just sharing links to text, we try to keep our audience up to date on certain subjects through edutaining video presentations. So, once I have my summary, the next step is to ask the models to create a structure for my video presentation.

When it came to structuring presentations, the models’ unique quirks became even more apparent. All three suggested visual elements like slides and animations, and each provided time breakdowns for sections. However, only ChatGPT included a specific call-to-action for social media, urging viewers to follow for more tips. This shows its stronger grasp of platform-specific dynamics.

Gemini and ChatGPT both suggested engaging hooks to grab the audience’s attention, but Claude skipped this entirely, which suggests it was not considering presentation best practices.

However, what Claude did do is provide a video title without me even requesting one. Sounds cool until I also mention that Gemini once again overdelivered by offering four creative title options without being asked, demonstrating a more proactive and suggestive approach.

ChatGPT, surprisingly, didn’t provide a title, leaving a gap in its otherwise detailed structure. Each model had its moment: Gemini excelled at being suggestive and comprehensive, ChatGPT shone in understanding social media context, and Claude felt functional but less innovative in its approach.

Step 3: Crafting a Video Title and Description

Next, I tasked the models with creating a video title and description using four top long-tail keywords I provided. Specifically, I asked that the keywords be placed in the first 100 characters of a 400-word description. The results were revealing. All three models managed to integrate all the keywords into both the title and the description, but the approach felt forced, bordering on clickbait. This showed me that all models execute the task they’re given, even when it is unreasonable and potentially detrimental.
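If you want to check a brief like this without counting by hand, a few lines of Python will do it. What follows is only a minimal sketch under my own assumptions: the function name `check_description`, the sample keywords, and the sample text are all hypothetical and are not the material used in this test. It reports which keywords are missing from the first 100 characters and how far the text is from the 400-word target.

```python
def check_description(description: str, keywords: list[str],
                      window: int = 100, target_words: int = 400) -> dict:
    """Report keyword placement and word count for a video description."""
    # Only the opening `window` characters count for keyword placement.
    opening = description[:window].lower()
    missing = [kw for kw in keywords if kw.lower() not in opening]

    # Words are counted by splitting on whitespace, a rough but common measure.
    word_count = len(description.split())
    return {
        "missing_from_opening": missing,
        "word_count": word_count,
        "word_count_gap": word_count - target_words,
    }


# Hypothetical keywords and text, for illustration only.
sample_text = (
    "Team innovation tips and remote work habits for busy managers. "
    + "word " * 390
)
sample_keywords = [
    "team innovation tips",
    "remote work habits",
    "agile coaching basics",
]
report = check_description(sample_text, sample_keywords)
print(report)
```

A check like this only tells you whether the brief was followed, not whether the result reads well. As the results here show, a description can pass and still feel forced.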

ChatGPT and Claude shared some key similarities in that they both used emojis, hashtags and CTAs that are relevant to YouTube. Still, they had differences. ChatGPT’s descriptions were the most natural in tone but missed some keyword placements, showing a trade-off between authenticity and optimization. Claude included chapters in the description, making it the most user-friendly for viewers who want to navigate specific sections of the video easily.

Gemini was the only one that got near the word count and provided a clean, plain-text description. However, it lacked the engagement-driving elements found in the other two. It’s worth noting that while all three overused the keywords, Claude’s ability to structure the description with chapters added a layer of professionalism. I was curious to see what would happen when I asked for conciseness.

Step 4: Shortening the Description

So, I asked the models to shorten their descriptions to 150 words. I also requested plain text only, to allow for a cleaner comparison. The differences between the LLMs became even more apparent. Claude retained both the hashtags and chapters from the longer version, ensuring the shortened description still served as a functional and engaging piece. However, it completely omitted the call-to-action, which could hurt viewer interaction.

ChatGPT sacrificed hashtags and “like and comment” prompts, reflecting a more focused but slightly less comprehensive approach. It is important to note that it remained the only one that prioritized the subscription request, which suggests real contextual understanding and prioritization.

Gemini simplified the description significantly, maintaining the plain-text requirement but losing much of the personality and functionality. Perhaps most surprisingly, all three models still managed to overuse keywords, making their shortened descriptions feel repetitive and less organic. This step highlighted the tension between optimizing for brevity and retaining important elements.

Step 5: Translating to Greek

Finally, I asked the LLMs to translate the shortened description into Greek, a common request from clients with primarily Greek-speaking teams. Here, Claude stood head and shoulders above the rest. Its translation felt natural and culturally nuanced, with keywords adapted intelligently rather than translated literally. This resulted in a description that was polished and immediately usable.

Gemini, while serviceable, struggled with syntax, as it tended to translate English idioms and phrases too literally, leading to awkward phrasing. ChatGPT fared the worst, producing a translation that was riddled with errors, including nonsensical compound words that made it unusable without significant editing. Claude’s performance in this step underscored its strength in handling linguistically complex tasks with precision.

Conclusion: Choose Your Tool Wisely

So, which LLM should you use? It depends on your priorities. Claude is unbeatable for translations and tasks requiring nuanced understanding. Gemini shines for those who prefer control and value suggestive guidance over definitive answers. ChatGPT excels at producing polished, ready-to-use content, especially for quick or mundane tasks.

But don’t treat these tools as interchangeable or expect a magic box to solve all your problems. AI tools are like Swiss Army knives: each blade has a purpose, but it’s up to you to use the right one for the task at hand. Personally, I enjoyed Gemini the most for its suggestive approach. I usually just want suggestions as I need to know the final content is my own.

Still, I understand why others might prefer ChatGPT if they’re looking for something quick and elaborate to copy/paste. Perhaps you prefer Claude for its linguistic precision. At the end of the day, the best LLM is the one that fits your unique needs. So, go on, experiment, test, and find the model that helps you work smarter, not harder. After all, you should get to choose which shoes you’ll be going hiking in. Choose wisely. This will be a long and steep journey.

Taz Constantinou

Taz is an innovation coach who also enjoys "smarketing" tasks. He brings his own spin to coaching teams, combining his diverse international experience with his keen interest in human behavior and psychology, and even throwing in a few tricks from his journey as a comedian. Yep, you heard that right. He can throw a punchline or two!
