metredecoeur/tabby-testing-pipeline

Quality evaluation of Tabby coding assistant using real source code snippets

Environment for testing suggestions generated by the Tabby coding assistant

Engineering thesis

Pipeline created as part of an engineering thesis at the Warsaw University of Technology under the supervision of Professor Robert Nowak.

Full title of the dissertation: Emacs text editor package for integration with Tabby coding assistant

Running

Tabby server setup

  1. See Tabby's official guide for installation and documentation.
  2. A Tabby authorization token must be set in the .env file or exported as a local environment variable during execution.
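The scripts read the token from the environment at runtime; a minimal sketch of that lookup, where the variable name TABBY_AUTH_TOKEN is an assumption, not a name taken from this repository:

```python
import os

# TABBY_AUTH_TOKEN is a hypothetical variable name; use whatever
# name the pipeline's .env file actually defines.
os.environ.setdefault("TABBY_AUTH_TOKEN", "auth_placeholder")

def get_tabby_token() -> str:
    """Return the Tabby authorization token, failing loudly if unset."""
    token = os.environ.get("TABBY_AUTH_TOKEN", "")
    if not token:
        raise RuntimeError("Set TABBY_AUTH_TOKEN in .env or the environment")
    return token

print(get_tabby_token())
```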

Recreate the virtual environment

python -m venv .venv
source .venv/bin/activate

Install dependencies

python -m pip install -r requirements.txt

Run scripts

1. Get data

chmod +x get_dataset.sh
./get_dataset.sh

2. Sort data

python src/sort_data-1.py

3. Query server

python src/query_server-2.py

4. Perform evaluation

python src/static_tester-3.py
python src/similarity_tester-3.py

5. Visualize results

python src/make_plot-4.py

Project Description

This testing environment gathers data on Tabby’s performance in generating code-completion suggestions. The outcomes serve as the groundwork for the analysis in the engineering thesis titled “Quality evaluation of Tabby coding assistant and Tabby integration with Emacs text editor”. The results support the motivation for implementing a Tabby plugin for Emacs and demonstrate the potential of the applied methodology.

Structure

Data

The data directory, which holds intermediate samples and final outcomes, is created with the help of the get_dataset.sh script located at the root, which downloads part of the Algorithms repository used as the benchmark codebase.

Src

The src directory contains all scripts that constitute the pipeline.

Data preprocessing

sort_data-1.py discards files that are empty or do not match the file-extension criteria, reconstructing the sorted structure in data/sorted.
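A minimal sketch of that filtering step; the .py-only criterion is an assumption, as sort_data-1.py defines the actual extension criteria:

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical criterion; the real criteria live in sort_data-1.py.
ALLOWED_EXTENSIONS = {".py"}

def sort_tree(src: Path, dst: Path) -> int:
    """Copy non-empty files with an allowed extension from src to dst,
    preserving the relative directory structure; return the number kept."""
    kept = 0
    for path in src.rglob("*"):
        if path.is_file() and path.suffix in ALLOWED_EXTENSIONS and path.stat().st_size > 0:
            target = dst / path.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            kept += 1
    return kept

# Tiny demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "raw", Path(tmp) / "sorted"
    (src / "pkg").mkdir(parents=True)
    (src / "pkg" / "keep.py").write_text("print('hi')\n")
    (src / "pkg" / "empty.py").write_text("")      # discarded: empty
    (src / "pkg" / "notes.md").write_text("#\n")   # discarded: wrong extension
    print(sort_tree(src, dst))
```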

Completions retrieval
Server connection

tabby-connection.py establishes the connection to the Tabby endpoint using the authentication token.

Prompt generation

prefix_generator.py incrementally generates prefixes from each code sample according to a predefined scheme.
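As an illustration of the idea, incremental prefixes can be produced by cutting each sample at evenly spaced line boundaries; the cut scheme below is a stand-in, and prefix_generator.py defines the real one:

```python
def incremental_prefixes(source: str, steps: int = 4) -> list[str]:
    """Return `steps` progressively longer prefixes of `source`,
    cut at evenly spaced line boundaries."""
    lines = source.splitlines(keepends=True)
    prefixes = []
    for i in range(1, steps + 1):
        cut = max(1, len(lines) * i // (steps + 1))
        prefixes.append("".join(lines[:cut]))
    return prefixes

sample = "import math\n\ndef area(r):\n    return math.pi * r * r\n\nprint(area(2))\n"
for p in incremental_prefixes(sample):
    print(repr(p))
```

Each prefix then serves as a prompt asking Tabby to complete the rest of the file.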

Querying

query_server-2.py issues the requests containing the prefix prompts to the Tabby server and saves the concatenated prefixes and responses in data/autocompletions.
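A sketch of building (not sending) such a request body; the field names mirror Tabby's completion API as an assumption, not this repository's code:

```python
import json

def build_completion_request(prefix: str, language: str = "python") -> dict:
    """Assemble a Tabby-style completion payload: the target language
    plus the code prefix the server should continue."""
    return {"language": language, "segments": {"prefix": prefix}}

payload = build_completion_request("def add(a, b):\n    return ")
print(json.dumps(payload, indent=2))
```

The authorization token is typically attached as an HTTP Authorization header when the payload is POSTed to the server.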

Testing
Static metrics

static_tester-3.py evaluates both the original code samples and the autocompleted ones against the cyclomatic complexity, Halstead effort, and Halstead bugs metrics, implemented using Radon, a Python library for code metrics. Results are saved to data/static_metrics.
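Radon supplies all three metrics; purely to illustrate what the cyclomatic complexity figure counts, here is a minimal AST-based approximation (decision points plus one), independent of Radon:

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe cyclomatic complexity: one plus the number of
    decision points (branches, loops, handlers, boolean operators)."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" contributes two decisions, not one.
            decisions += len(node.values) - 1
    return decisions + 1

sample = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic_complexity(sample))
```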

Similarity evaluation

similarity_tester-3.py implements the main part of the evaluation by employing string similarity algorithms:

  • difflib’s SequenceMatcher
  • Jaro-Winkler similarity
  • Damerau-Levenshtein distance
  • Hamming distance

The last three algorithms are implemented with the help of the Python jellyfish library. Similarity testing is performed in two ways:

  1. Whole files
    • Each original sample from data/sorted is compared with its Tabby-completed duplicate for each prefix.
    • The ratio between the lengths of the original and duplicate files is also captured.
    • Results are saved to data/similarity_logs_full.
  2. Overlap of the generated fragments by location in the file
    • For each original file, the fragment that overlaps positionally with the Tabby-generated fragment is selected.
    • This way, only purely generated code is compared against the reference snippet.
    • Results are saved to data/similarity_logs_fragment.
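The difflib comparison from the list above can be sketched on a toy pair of strings (the samples here are illustrative):

```python
from difflib import SequenceMatcher

original = "def add(a, b):\n    return a + b\n"
completed = "def add(a, b):\n    return a - b\n"

# SequenceMatcher.ratio() returns 2*M/T, where M is the number of
# matched characters and T the combined length (1.0 = identical).
similarity = SequenceMatcher(None, original, completed).ratio()
length_ratio = len(original) / len(completed)

print(round(similarity, 3), round(length_ratio, 3))
```

The length ratio is the extra signal captured alongside the full-file comparison.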
Visualization

The testing outcomes are then used to create plots. make_plot-4.py generates the following plots:

  • Full-file similarity plots per similarity algorithm
  • File-fragment similarity plots per similarity algorithm
  • Averaged static metric values for original programs against those for duplicate programs, per static metric
  • Length ratio between original and duplicate files
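The averaging behind the static-metric comparison plot reduces to a per-metric mean over files; a sketch with illustrative numbers, where the dict layout is hypothetical rather than make_plot-4.py's actual data format:

```python
from statistics import mean

# Illustrative per-file results; the real pipeline reads these
# from data/static_metrics.
static_metrics = {
    "original": {"cyclomatic": [3, 5, 2], "halstead_effort": [120.0, 310.5, 95.2]},
    "duplicate": {"cyclomatic": [3, 6, 2], "halstead_effort": [140.0, 305.0, 99.9]},
}

# One averaged value per (variant, metric) pair, ready to plot side by side.
averages = {
    variant: {metric: mean(values) for metric, values in metrics.items()}
    for variant, metrics in static_metrics.items()
}
print(averages)
```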

About

Pipeline for testing the quality of code generated by TabbyML coding assistant, employing string-based code similarity detection techniques.
