I Was Wrong About GPT-4: Code Llama Is Here

Code Llama looks eye to eye with GPT-4

As a long-time fan of GPT-4, I didn’t think any open source coding AI could match its prowess at generating code. But Meta’s new Code Llama model has proven me wrong. Built on Meta’s Llama 2 foundation model and trained on 500 billion tokens of code, it’s optimized specifically for programming, and it’s ready to take on my beloved GPT-4!

According to Meta’s benchmarks, Code Llama scored over 67% on the HumanEval coding benchmark, edging past GPT-4’s score of 67%. But how does it compare in real-world coding challenges?

Testing Code Generation

I tested Code Llama and GPT-4 on a variety of Python coding prompts, ranging from very easy and intermediate functions to expert challenges, all taken from the coding challenge website https://edabit.com/challenges

To test Code Llama, I used the 34B-parameter Code Llama model on https://poe.com/, while GPT-4 was tested through the Code Interpreter in my personal ChatGPT Plus account.

Let’s put Code Llama and GPT-4 through a side-by-side test.

Challenge - Very Easy - Area of a Triangle

Challenge Task: Write a function that takes the base and height of a triangle and returns its area.

Code snippets for the very easy challenge from both models

Both models successfully generated the code. However, I liked that GPT-4 explained its code with comments, which is itself a coding best practice.
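
For reference, here is a minimal sketch of one way to solve this task in Python. It is an illustration only, not the output of either model, and the function name tri_area is an assumption.

```python
def tri_area(base, height):
    """Return the area of a triangle from its base and height."""
    # Area of a triangle is half of base times height.
    return base * height / 2


print(tri_area(3, 2))
print(tri_area(7, 4))
```

Note that the division always returns a float, so a test that expects an integer may need int() or integer division, depending on how the challenge checks results.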

Challenge - Easy - Find the Discount

Challenge Task: Create a function that takes two arguments: the original price and the discount percentage as integers and returns the final price after the discount.

Both models complete the challenge with ease

Again, both models generated the code successfully. No sweat.
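
A minimal sketch of one possible solution, for reference. This is an illustration rather than either model’s output; the function name final_price and its parameter names are assumptions.

```python
def final_price(price, discount):
    """Return the price after taking `discount` percent off `price`."""
    # Multiply before dividing so integer inputs stay exact for as long as possible.
    return price * (100 - discount) / 100


print(final_price(1500, 50))
print(final_price(89, 20))
```

The result is a float even for whole-number prices, which is usually what you want for money that can end up with cents.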

Challenge - Medium - Fizz Buzz Interview Question

Challenge Task: Create a function that takes a number as an argument and returns "Fizz", "Buzz" or "FizzBuzz".

  • If the number is a multiple of 3 the output should be "Fizz".
  • If the number given is a multiple of 5, the output should be "Buzz".
  • If the number given is a multiple of both 3 and 5, the output should be "FizzBuzz".
  • If the number is not a multiple of either 3 or 5, the number should be output on its own as shown in the examples below.
  • The output should always be a string even if it is not a multiple of 3 or 5.

Again, both models were able to complete the challenge

Awesome. Still no bugs. Both models are still in the race to be the winner.
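
For readers who want to follow along, the rules of this challenge can be sketched as below. This is an illustrative solution, not the code either model produced, and the function name fizz_buzz is an assumption.

```python
def fizz_buzz(num):
    """Return "Fizz", "Buzz", "FizzBuzz", or the number itself as a string."""
    # Check the combined case first, otherwise multiples of 15
    # would be caught by the "Fizz" branch.
    if num % 15 == 0:
        return "FizzBuzz"
    if num % 3 == 0:
        return "Fizz"
    if num % 5 == 0:
        return "Buzz"
    # The task requires a string even for plain numbers.
    return str(num)


for n in (3, 5, 15, 4):
    print(n, fizz_buzz(n))
```

The order of the checks is the only real trap here: testing for 3 or 5 before the combined case is the classic interview mistake.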

Next we’ll increase the challenge level to make it Hard.

Challenge - Hard - The Snake Area Filling

Challenge Task: This challenge is based on the classic videogame “Snake”.

Assume the game screen is an n * n square, and the snake starts the game with length 1 (i.e. just the head) positioned on the top left corner.
In this version of the game, the length of the snake doubles each time it eats food (e.g. if the length is 4, after eating it becomes 8).

Create a function that takes the side n of the game screen and returns the number of times the snake can eat before it runs out of space in the game screen.

GPT-4 succeeds, Code Llama is almost there

GPT-4 cleared this hurdle without any hiccups. Code Llama’s logic was close, but it couldn’t produce the correct code. So GPT-4 is now ahead in the race.
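
To make the task concrete, here is a minimal sketch of one way to approach it. It is an illustration, not the code from either model. It assumes the snake can keep eating as long as its doubled length still fits in the n * n cells of the screen, and the function name snake_fill is an assumption.

```python
def snake_fill(n):
    """Count how many times a length-doubling snake can eat on an n * n screen."""
    area = n * n      # total number of cells available
    length = 1        # the snake starts as just the head
    meals = 0
    # Keep eating while the doubled snake would still fit on the screen.
    while length * 2 <= area:
        length *= 2
        meals += 1
    return meals


for side in (3, 6, 24):
    print(side, snake_fill(side))
```

Under that assumption, the answer is simply the largest power of two that fits in n * n, so the loop could also be replaced with a logarithm or a bit-length calculation.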

We still have the Very Hard and Expert challenges to try. Let’s see who wins the race.

Challenge - Very Hard - Identity Matrix

Challenge Task: An identity matrix is defined as a square matrix with 1s running from the top left of the square to the bottom right. The rest are 0s. The identity matrix has applications ranging from machine learning to the general theory of relativity.

Create a function that takes an integer n and returns the identity matrix of n x n dimensions. For this challenge, if the integer is negative, return the mirror image of the identity matrix of n x n dimensions. It does not matter if the mirror image is left-right or top-bottom.

Both GPT-4 and Code Llama were very close

Both were very close and passed all the unit tests except one. So the race is still on, and GPT-4 is just one step ahead.
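
For reference, a minimal sketch of the behaviour described in the task. It is an illustration, not either model’s code, and it only covers what the task text above states: it assumes the input is always an integer, uses a left-right mirror for negative n, and the function name identity_matrix is an assumption.

```python
def identity_matrix(n):
    """Return the n x n identity matrix, mirrored left-right when n is negative."""
    size = abs(n)
    # Build the plain identity matrix: 1 on the main diagonal, 0 elsewhere.
    rows = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    if n < 0:
        # Reverse each row to get the left-right mirror image.
        return [row[::-1] for row in rows]
    return rows


print(identity_matrix(2))
print(identity_matrix(-2))
```

Any extra edge cases in the hidden unit tests are not handled here, which may well be the kind of thing that tripped up both models on that one failing test.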

Let’s challenge them with our last puzzle.

Challenge - Expert - Poker Hand Ranking

In this challenge, you have to establish which kind of Poker combination is present in a deck of five cards. Every card is a string containing the card value (with the upper-case initial for face-cards) and the lower-case initial for suits, as in the examples below:

"Ah" ➞ Ace of hearts
"Ks" ➞ King of spades
"3d" ➞ Three of diamonds
"Qc" ➞ Queen of clubs

There are 10 different combinations. Here’s the list, in decreasing order of importance:

Name             Description
Royal Flush      A, K, Q, J, 10, all with the same suit.
Straight Flush   Five cards in sequence, all with the same suit.
Four of a Kind   Four cards of the same rank.
Full House       Three of a Kind with a Pair.
Flush            Any five cards of the same suit, not in sequence.
Straight         Five cards in a sequence, but not of the same suit.
Three of a Kind  Three cards of the same rank.
Two Pair         Two different Pair.
Pair             Two cards of the same rank.
High Card        No other valid combination.

Given a list hand containing five strings being the cards, implement a function that returns a string with the name of the highest combination obtained, according to the table above.

GPT-4 was correct. Code Llama was close again

GPT-4 wins the race. Code Llama couldn’t get the expert code right. Even so, it left me nothing less than impressed.
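
To show why this one is harder, here is a minimal sketch of one way to rank a hand. It is an illustration, not the code GPT-4 or Code Llama generated. It assumes tens are written as "10" (for example "10h"), treats the ace as high only, and the names RANKS and poker_hand_ranking are assumptions.

```python
from collections import Counter

# Map card values to numbers so sequences can be detected (ace is high only).
RANKS = {str(v): v for v in range(2, 11)}
RANKS.update({"J": 11, "Q": 12, "K": 13, "A": 14})


def poker_hand_ranking(hand):
    """Return the name of the best combination in a five-card hand."""
    values = sorted(RANKS[card[:-1]] for card in hand)  # everything but the suit letter
    suits = {card[-1] for card in hand}                 # the last character is the suit
    counts = sorted(Counter(values).values(), reverse=True)

    flush = len(suits) == 1
    straight = len(set(values)) == 5 and values[-1] - values[0] == 4

    # Check from the strongest combination down to the weakest.
    if flush and values == [10, 11, 12, 13, 14]:
        return "Royal Flush"
    if flush and straight:
        return "Straight Flush"
    if counts[0] == 4:
        return "Four of a Kind"
    if counts == [3, 2]:
        return "Full House"
    if flush:
        return "Flush"
    if straight:
        return "Straight"
    if counts[0] == 3:
        return "Three of a Kind"
    if counts[:2] == [2, 2]:
        return "Two Pair"
    if counts[0] == 2:
        return "Pair"
    return "High Card"


print(poker_hand_ranking(["10h", "Jh", "Qh", "Ah", "Kh"]))
print(poker_hand_ranking(["3h", "5h", "Qs", "9h", "Ad"]))
```

Most of the difficulty is in the ordering: each check relies on the stronger combinations above it having already been ruled out, which is an easy thing to get subtly wrong.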

My Final Verdict

Meta’s Code Llama proved it can compete with the mighty GPT-4 for coding tasks! As someone who thought GPT-4 would dominate for years to come, I’m happy to be proven wrong.

This opens up a lot of possibilities for me, as I’m planning to create my own coding assistant that can create and improve code for me. As Code Llama matures, it’ll allow me to have a personal AI employee working 24/7 at a fraction of the cost of doing the same with the GPT-4 APIs.

Code Llama is an exciting open source coding AI and I can’t wait to see how it progresses from here!

Let me know your opinion and feedback below.
