We're dedicated to pushing the boundaries of scientific exploration through safe and responsible AI. Our mission is to create advanced AI models and agents that accelerate scientific progress.
- ML-Research-Agent: A baseline agent for the ML Research Benchmark, providing a foundation for comparing and evaluating agents on machine learning research and development tasks.
- ML-Research-Agent-Tasks: Tasks for the ML Research Benchmark, a benchmark designed to evaluate the capabilities of AI agents in accelerating AI research and development.
- ML-Research-Agent-Evals: A library for evaluating the performance of an agent on ML Research Benchmark tasks.
- ARIA: ARIA Benchmarks is a suite of closed-book benchmarks designed to assess an LLM's knowledge and understanding of machine learning research and methodologies.
- Agent-States: A library designed to manage the state and decision-making processes of AI agents (a minimal sketch of the idea follows this list).
- ArXivDLInstruct: A dataset of Python research code designed for instruction tuning, pretraining, and fine-tuning language models on code generation tasks (see the loading sketch after this list).
- ArXiv Research Code: ArtifactAI/arxiv_research_code contains over 21.8 GB of source code files referenced in arXiv papers, curated as a training corpus for code LLMs (see the streaming example after this list).
- ArXiv Python Research Code: AlgorithmicResearchGroup/arxiv_python_research_code contains over 4.13 GB of Python source code files referenced in arXiv papers, curated as a training corpus for code LLMs.
- ArXiv C++ Research Code: over 10.6 GB of C++ source code files referenced in arXiv papers, curated as a training corpus for code LLMs.
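
To make the Agent-States idea concrete, here is a minimal sketch of explicit state management for an agent loop. This is an illustration only, not the Agent-States API: the names (`Phase`, `AgentState`, `step`) and the plan/act/observe cycle are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Phase(Enum):
    PLAN = auto()
    ACT = auto()
    OBSERVE = auto()
    DONE = auto()

@dataclass
class AgentState:
    """Hypothetical container for an agent's evolving state."""
    phase: Phase = Phase.PLAN
    history: list = field(default_factory=list)  # (phase, payload) records
    budget: int = 10  # remaining steps before a forced stop

def step(state: AgentState, payload: str) -> AgentState:
    """Record the payload, decrement the budget, and advance the phase."""
    state.history.append((state.phase, payload))
    state.budget -= 1
    transitions = {Phase.PLAN: Phase.ACT, Phase.ACT: Phase.OBSERVE,
                   Phase.OBSERVE: Phase.PLAN}
    state.phase = Phase.DONE if state.budget == 0 else transitions[state.phase]
    return state

state = AgentState(budget=3)
state = step(state, "draft a plan")
print(state.phase, len(state.history))  # Phase.ACT 1
```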
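Assuming ArXivDLInstruct is hosted on the Hugging Face Hub under the group's namespace (the exact dataset id and column names below are assumptions; check the dataset card), a minimal loading sketch:

```python
from datasets import load_dataset

# Dataset id and split are assumptions; consult the dataset card.
ds = load_dataset("AlgorithmicResearchGroup/ArXivDLInstruct", split="train")
print(ds.column_names)  # inspect the instruction/code fields before use
print(ds[0])
```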
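The arXiv code corpora above are tens of gigabytes, so streaming access is usually preferable to a full download. A sketch using the `ArtifactAI/arxiv_research_code` id given above, assuming it resolves on the Hugging Face Hub; the code inspects column names rather than assuming them:

```python
from datasets import load_dataset
from itertools import islice

# Stream to avoid materializing the ~21.8 GB corpus on disk.
ds = load_dataset("ArtifactAI/arxiv_research_code", split="train", streaming=True)
for record in islice(ds, 3):
    print(sorted(record))  # list the available columns for the first few records
```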