Advent of Code Solutions Dataset

Introduction

This dataset contains over 10,000 solutions, together with input data, for the Advent of Code programming puzzles from 2015 to 2023. Advent of Code is an annual set of programming challenges that can be solved in any language. At the moment, the dataset contains complete solution sets in Python and Go, plus many solutions in JavaScript, CoffeeScript, TypeScript, Java, Scala, Kotlin, Groovy, Clojure, C#, F#, Swift, Objective-C, R, Haskell, OCaml, Racket, Scheme, Ruby, Erlang, Elixir, Rust, C, C++, Zig, Fortran 90, Perl, Pascal, Crystal, Julia, Lua, PHP, Dart, Bash, AWK, Nim, D, V, Prolog, Tcl, and Wren.

The dataset is available on Hugging Face as isavita/advent-of-code.

The dataset stores all years of Advent of Code puzzles together in a single file, "train.json".

Dataset Structure

Data Fields

Each entry in the dataset consists of the following fields:

  • name: The unique identifier for each challenge, formatted as "dayX_partY_YEAR" (e.g. "day1_part1_2017").
  • task: A detailed description of the challenge. For part 2, the description includes the full part 1 description and its answer, since part 2 builds on information from part 1.
  • input: The input data provided for the challenge (for my account).
  • answer: The correct answer as a string (e.g. "1914").
  • solution: The full solution code for the challenge.
  • solution_lang: The programming language used for the solution (e.g. "go").
  • year: The year of the challenge (e.g. 2017).

Sample Entry

{
  "answer": "117946",
  "input": "ckczppom",
  "name": "day4_part1_2015",
  "solution": "package main\n\nimport (\n\t\"crypto/md5\"\n\t\"fmt\"\n\t\"log\"\n\t\"os\"\n\t\"strconv\"\n\t\"strings\"\n)\n\nfunc main() {\n\tdata, err := os.ReadFile(\"input.txt\")\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\n\tsecretKey := strings.TrimSpace(string(data))\n\tvar number int\n\tfor {\n\t\thash := md5.Sum([]byte(secretKey + strconv.Itoa(number)))\n\t\thashString := fmt.Sprintf(\"%x\", hash)\n\n\t\tif strings.HasPrefix(hashString, \"00000\") {\n\t\t\tfmt.Printf(\"%d\\n\", number)\n\t\t\tbreak\n\t\t}\n\t\tnumber++\n\t}\n}",
  "solution_lang": "go",
  "task": "--- Day 4: The Ideal Stocking Stuffer ---\nSanta needs help mining some AdventCoins (very similar to bitcoins) to use as gifts for all the economically forward-thinking little girls and boys.\n\nTo do this, he needs to find MD5 hashes which, in hexadecimal, start with at least five zeroes. The input to the MD5 hash is some secret key (your puzzle input, given below) followed by a number in decimal. To mine AdventCoins, you must find Santa the lowest positive number (no leading zeroes: 1, 2, 3, ...) that produces such a hash.\n\nFor example:\n\nIf your secret key is abcdef, the answer is 609043, because the MD5 hash of abcdef609043 starts with five zeroes (000001dbbfa...), and it is the lowest such number to do so.\nIf your secret key is pqrstuv, the lowest number it combines with to make an MD5 hash starting with five zeroes is 1048970; that is, the MD5 hash of pqrstuv1048970 looks like 000006136ef....",
  "year": 2015
}        

Creation Process

I implemented and verified solutions for each Advent of Code challenge using my personal input data. For each puzzle, I either wrote the solution myself or generated it with open-source models (e.g. Llama 3.1 70B, DeepSeek Coder, Qwen2.5-Coder-32B, and others) and then tested and corrected the output. The dataset contains only these verified solutions together with their associated input data. Every solution produces the correct answer and runs in under 20 seconds. However, they are not highly optimised, and much faster solutions are possible in many cases.

Input Handling

All solutions read their input from a file named input.txt located in the same directory as the solution script. Here, for example, is the Python solution to the first puzzle (day1_part1_2015):

# Read the puzzle input from input.txt in the same directory.
with open("input.txt", "r") as file:
    data = file.read()

# Each "(" moves Santa up one floor; each ")" moves him down one.
floor = 0
for char in data:
    if char == "(":
        floor += 1
    elif char == ")":
        floor -= 1

print(floor)

Output Verification

Some solutions produce complex outputs that should form readable patterns rather than a single number. For example, the solution for day8_part2_2016 should produce one of the following patterns (the same pattern rendered with "." or with spaces for the unlit pixels) to verify correctness:

  • ".##..####.###..#..#.###..####.###....##.###...###."
  • " ## #### ### # # ### #### ### ## ### ### "

Additionally, language-specific formatting of big numbers should be considered; it is safer to accept answers in a few different formats, as in the sketch below. For example:

  • "3.465154e+06"
  • "3.465154e+6"

instead of 3465154.
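
Since both the stored answers and program outputs are plain strings, a lenient comparison covers these cases. Below is a minimal sketch of such a check; the matches_answer helper is my own naming, not part of the dataset or any library:

def matches_answer(output: str, answer: str) -> bool:
    # Exact match after trimming surrounding whitespace.
    out, ans = output.strip(), answer.strip()
    if out == ans:
        return True
    # ASCII-art answers: some solutions draw unlit pixels as " "
    # instead of ".", so normalise both sides before comparing.
    if out.replace(" ", ".") == ans.replace(" ", "."):
        return True
    # Big numbers: accept scientific notation such as
    # "3.465154e+06" or "3.465154e+6" for the answer "3465154".
    try:
        return float(out) == float(ans)
    except ValueError:
        return False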

Usage

Filtering Solutions by Programming Language

Here's an example of how to use the script to filter solutions by programming language (e.g. Elixir) using the Hugging Face datasets library:

from datasets import load_dataset
import pandas as pd

# Load the dataset from Hugging Face
dataset = load_dataset("isavita/advent-of-code", split="train")

# Filter the dataset for solutions written in Elixir
elixir_solutions = dataset.filter(lambda example: example['solution_lang'].lower() == 'elixir')

# Convert the filtered dataset to a Pandas DataFrame
elixir_solutions_df = elixir_solutions.to_pandas()

# Display the filtered solutions
pd.set_option('display.max_colwidth', None)
print(elixir_solutions_df)        
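
The same pattern works for any field. For instance, here is a sketch combining year and language (assuming language tags are stored lowercase, as in the sample entry's "go"):

# Filter for Python solutions to the 2015 puzzles; the field names
# match the "Data Fields" section above.
python_2015 = dataset.filter(
    lambda example: example["year"] == 2015
    and example["solution_lang"] == "python"
)
print(len(python_2015))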

How the Dataset Can Be Used to Assess the Performance of LLMs

The Advent of Code Solutions Dataset provides a comprehensive resource for evaluating the performance of Large Language Models (LLMs) in various programming tasks. Here are some ways the dataset can be utilised:

  1. Evaluating Code Generation in Less Popular Languages (few-shot): The dataset can be used to test an LLM's ability to generate code in less popular languages, such as Elixir, by providing few-shot examples from the solutions included in the dataset. This allows for assessing the model's versatility and proficiency in generating correct and idiomatic code across various languages.
  2. Evaluating Solution Correctness: Since the dataset contains the correct answers for all puzzles, it is straightforward to ask LLMs to generate solutions and evaluate their correctness against the provided answers (see the harness sketch after this list). This helps in assessing the model's accuracy and reliability in solving programming challenges.
  3. Comparing Execution Time: The dataset includes solutions in Go and Python for all tasks, which can serve as a baseline. By comparing the execution time of correct solutions generated by the LLM against these baselines, one can evaluate the efficiency of the model's generated code.
  4. Translating Solutions Between Languages: The model can be given a solution in one language and then asked to generate it in another. Evaluating the accuracy and efficiency of these translations can provide insights into the model's cross-language translation capabilities.
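
As a concrete illustration of points 2 and 3, here is a minimal harness sketch. It runs a candidate Python solution in a scratch directory, with the puzzle input saved as input.txt to match the dataset's convention, and times the run; generate_solution is a hypothetical stand-in for whatever model call you use:

import subprocess
import tempfile
import time
from pathlib import Path

from datasets import load_dataset

def evaluate_python_solution(code: str, puzzle_input: str,
                             answer: str, timeout: int = 20):
    # Run candidate Python code in a scratch directory and
    # return (correct, elapsed_seconds).
    with tempfile.TemporaryDirectory() as workdir:
        # Solutions in this dataset read their input from input.txt.
        Path(workdir, "input.txt").write_text(puzzle_input)
        Path(workdir, "solution.py").write_text(code)
        start = time.perf_counter()
        try:
            result = subprocess.run(
                ["python", "solution.py"], cwd=workdir,
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, float(timeout)
        elapsed = time.perf_counter() - start
        return result.stdout.strip() == answer.strip(), elapsed

dataset = load_dataset("isavita/advent-of-code", split="train")
example = next(e for e in dataset if e["name"] == "day1_part1_2015")

# generate_solution is a hypothetical placeholder for your LLM call;
# it should return Python source code for the given task description.
candidate = generate_solution(example["task"])
correct, seconds = evaluate_python_solution(
    candidate, example["input"], example["answer"]
)
print(correct, f"{seconds:.2f}s")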

By leveraging this dataset, researchers and developers can gain valuable insights into the strengths and weaknesses of LLMs in programming tasks, ultimately contributing to the development of more robust and versatile language models.

Future Expansion

The dataset currently includes data for the years 2015 to 2023 and contains complete solution sets in Go and Python, along with solutions in over 40 other languages. There are plans to expand it to include additional years and more solutions in different languages. As new years are added, the dataset structure will remain consistent.


