Merged
11 changes: 11 additions & 0 deletions .github/workflows/test.yaml
@@ -83,3 +83,14 @@ jobs:
      - name: Test
        shell: bash
        run: ci/scripts/test.sh $(pwd)

  docs:
    name: Build Documentation
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - name: Checkout
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
      - name: Build documentation
        shell: bash
        run: ci/scripts/docs.sh $(pwd)
31 changes: 31 additions & 0 deletions ci/scripts/docs.sh
@@ -0,0 +1,31 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

set -eux

source_dir=${1}

pushd "${source_dir}/docs"

dotnet tool install -g docfx

docfx metadata --warningsAsErrors docfx.json
docfx build --warningsAsErrors docfx.json

popd
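To reproduce this documentation build on a local machine, a helper along the following lines could be used. It is a hypothetical sketch and not part of this PR: the function names are made up, and it assumes that `dotnet tool install -g` places tools under `$HOME/.dotnet/tools` (the default on Linux and macOS), which must be on `PATH` for `docfx` to be found, and that the `docfx serve` command of recent docfx releases is available to preview the generated `_site` folder.

```shell
#!/usr/bin/env bash
# Hypothetical local helper for illustration only; not part of this PR.

# Prepend the .NET global tools directory to PATH unless it is already there.
# Assumes the default global tool location used by `dotnet tool install -g`.
ensure_dotnet_tools_on_path() {
  local tools_dir="${HOME}/.dotnet/tools"
  case ":${PATH}:" in
    *":${tools_dir}:"*) ;;              # already present: leave PATH unchanged
    *) PATH="${tools_dir}:${PATH}" ;;
  esac
  export PATH
}

# Install docfx only when it is not already available, so repeated
# local runs skip the install step.
ensure_docfx() {
  ensure_dotnet_tools_on_path
  if ! command -v docfx >/dev/null 2>&1; then
    dotnet tool install -g docfx
  fi
}

# Run the same docfx commands as ci/scripts/docs.sh, then serve the site.
# The subshell keeps the caller's working directory unchanged.
build_and_serve_docs() {
  local source_dir="${1:?usage: build_and_serve_docs <source_dir>}"
  ensure_docfx
  (
    cd "${source_dir}/docs" &&
    docfx metadata --warningsAsErrors docfx.json &&
    docfx build --warningsAsErrors docfx.json &&
    docfx serve _site
  )
}
```

For example, after sourcing the helper from the repository root, `build_and_serve_docs "$(pwd)"` builds the site and serves it locally.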
2 changes: 2 additions & 0 deletions dev/release/rat_exclude_files.txt
@@ -20,3 +20,5 @@
*.sln
.github/pull_request_template.md
src/Apache.Arrow/Flatbuf/*
docs/images/*.png
docs/images/*.svg
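The two new patterns exclude the added image files from the Apache RAT license-header audit. As a rough illustration of how such patterns select files, the hypothetical function below checks one path against an exclude file. It assumes shell-style glob matching in which `*` can also match `/`; the actual release tooling may match paths differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only; not part of this PR.
# Returns 0 if the path ($1) matches any glob listed in the exclude file ($2).
# Assumes shell-style globbing, where '*' also matches '/'.
rat_excluded() {
  local path="$1" exclude_file="$2" pattern
  while IFS= read -r pattern || [ -n "$pattern" ]; do
    [ -z "$pattern" ] && continue        # skip blank lines
    # Leaving $pattern unquoted makes [[ == ]] treat it as a glob pattern.
    if [[ "$path" == $pattern ]]; then
      return 0
    fi
  done < "$exclude_file"
  return 1
}
```

For example, `rat_excluded docs/images/logo.svg dev/release/rat_exclude_files.txt && echo excluded` prints `excluded`, while `docs/images/README.md` matches none of the image patterns and so is still audited.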
19 changes: 19 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

/api
/_site
61 changes: 61 additions & 0 deletions docs/docfx.json
@@ -0,0 +1,61 @@
{
  "$schema": "https://raw.githubusercontent.com/dotnet/docfx/main/schemas/docfx.schema.json",
  "metadata": [
    {
      "src": [
        {
          "src": "../src",
          "files": [
            "**/*.csproj"
          ]
        }
      ],
      "dest": "api",
      "properties": {
        "ProduceReferenceAssembly": "true"
      }
    }
  ],
  "build": {
    "content": [
      {
        "files": [
          "**/*.{md,yml}"
        ],
        "exclude": [
          "_site/**",
          "images/**"
        ]
      }
    ],
    "resource": [
      {
        "files": [
          "images/*"
        ],
        "exclude": [
          "**/*.md"
        ]
      }
    ],
    "output": "_site",
    "template": [
      "default",
      "modern"
    ],
    "globalMetadata": {
      "_appFaviconPath": "images/favicon.png",
      "_appLogoPath": "images/logo.svg",
      "_appName": "Apache Arrow .NET",
      "_appTitle": "Apache Arrow .NET",
      "_appFooter": "© 2018 The Apache Software Foundation",
      "_enableNewTab": true,
      "_enableSearch": true
    },
    "markdownEngineProperties": {
      "markdigExtensions": [
        "attributes"
      ]
    }
  }
}
33 changes: 33 additions & 0 deletions docs/images/README.md
@@ -0,0 +1,33 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Images

This directory contains images used in the Apache Arrow .NET documentation.

## logo.svg

This file is based on the file `arrow-logo_chevrons_black-txt_transparent-bg.svg`
from the [Apache Arrow Visual Identity website](https://arrow.apache.org/visual_identity/).
Its height and width were modified to fit the documentation header
while maintaining the original aspect ratio, and its rectangular outline was removed.

## favicon.png

This file matches the favicon used in the Apache Arrow website and was copied from the
[arrow-site repository](https://github.com/apache/arrow-site/blob/8884e2320ca131081a2617cdf93a222f0e92b6a3/img/logo.png).
Binary file added docs/images/favicon.png
25 changes: 25 additions & 0 deletions docs/images/logo.svg
151 changes: 151 additions & 0 deletions docs/index.md
Member

Is this a copy of the top-level README.md?

Can we refer to this from the top-level README.md instead of duplicating the contents?

Contributor Author

It isn't a full copy; I removed some parts related to building and development to keep this more user-focused.

I think once there is a live documentation website, we should remove the duplicated parts and put a link to the documentation in the README.

@@ -0,0 +1,151 @@
---
_layout: landing
---
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Apache Arrow .NET

An implementation of Arrow targeting .NET.

See the [feature matrix](https://github.com/apache/arrow/blob/main/docs/source/status.rst)
for the features that are currently available.

## Implementation

- Arrow specification 1.0.0. (Support for reading 0.11+.)
- C# 11
- .NET Standard 2.0, .NET 6.0, .NET 8.0 and .NET Framework 4.6.2
- Asynchronous I/O
- Uses modern .NET runtime features such as **Span&lt;T&gt;**, **Memory&lt;T&gt;**, **MemoryManager&lt;T&gt;**, and **System.Buffers** primitives for memory allocation, memory storage, and fast serialization.
- Uses the **Acyclic Visitor Pattern** for array types and arrays to facilitate serialization, record batch traversal, and format growth.

## Known Issues

- Cannot read Arrow files containing tensors.
- Cannot easily modify the allocation strategy without implementing a custom memory pool. All allocations are currently 64-byte aligned and padded to 8 bytes.
- The default memory allocation strategy over-allocates and uses pointer fixing, which results in significant memory overhead for small buffers. A buffer that requires a single byte for storage may be backed by an allocation of up to 64 bytes to satisfy alignment requirements.
- There are currently few builder APIs available for specific array types. Arrays must be built manually with an Arrow buffer builder abstraction.
- FlatBuffer code generation is not included in the build process.
- The serialization implementation does not perform exhaustive validation checks during deserialization in every scenario.
- Throws exceptions with vague, inconsistent, or non-localized messages in many situations.
- Throws exceptions that are not specific to the Arrow implementation in some circumstances where an Arrow-specific exception would be more appropriate (e.g. does not throw ArrowException exceptions).
- Lack of code documentation
- Lack of usage examples
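The padding and alignment figures in the list above come down to rounding up to a multiple. The worked arithmetic below illustrates that rounding in general terms; it is a sketch and does not describe the exact code of the default allocator.

```latex
% Padding a length of n bytes up to a multiple of 8:
p(n) = 8\left\lceil \frac{n}{8} \right\rceil, \qquad p(1) = 8,\; p(13) = 16,\; p(64) = 64

% Moving an address a up to the next multiple of 64 skips
% between 0 and 63 bytes:
a' = 64\left\lceil \frac{a}{64} \right\rceil, \qquad 0 \le a' - a \le 63
```

An allocator that over-allocates and then shifts the pointer to the next 64-byte boundary has to reserve room for that skipped gap, which is why the overhead is most noticeable for very small buffers.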

## Usage

Example demonstrating reading [RecordBatches](xref:Apache.Arrow.RecordBatch) from an Arrow IPC file using an
[ArrowFileReader](xref:Apache.Arrow.Ipc.ArrowFileReader):

    using System.Diagnostics;
    using System.IO;
    using System.Threading.Tasks;
    using Apache.Arrow;
    using Apache.Arrow.Ipc;

    public static async Task<RecordBatch> ReadArrowAsync(string filename)
    {
        using (var stream = File.OpenRead(filename))
        using (var reader = new ArrowFileReader(stream))
        {
            // Reads the next record batch; null is returned when there are no more batches.
            var recordBatch = await reader.ReadNextRecordBatchAsync();
            Debug.WriteLine("Read record batch with {0} column(s)", recordBatch?.ColumnCount);
            return recordBatch;
        }
    }


## Status

### Memory Management

- Allocations are 64-byte aligned and padded to 8 bytes.
- Allocations are automatically garbage collected.

### Arrays

#### Primitive Types

- [Int8](xref:Apache.Arrow.Types.Int8Type), [Int16](xref:Apache.Arrow.Types.Int16Type), [Int32](xref:Apache.Arrow.Types.Int32Type), [Int64](xref:Apache.Arrow.Types.Int64Type)
- [UInt8](xref:Apache.Arrow.Types.UInt8Type), [UInt16](xref:Apache.Arrow.Types.UInt16Type), [UInt32](xref:Apache.Arrow.Types.UInt32Type), [UInt64](xref:Apache.Arrow.Types.UInt64Type)
- [Float](xref:Apache.Arrow.Types.FloatType), [Double](xref:Apache.Arrow.Types.DoubleType), [Half-float](xref:Apache.Arrow.Types.HalfFloatType) (.NET 6+)
- [Binary](xref:Apache.Arrow.Types.BinaryType) (variable-length)
- [String](xref:Apache.Arrow.Types.StringType) (utf-8)
- [Null](xref:Apache.Arrow.Types.NullType)

#### Parametric Types

- [Timestamp](xref:Apache.Arrow.Types.TimestampType)
- [Date32](xref:Apache.Arrow.Types.Date32Type), [Date64](xref:Apache.Arrow.Types.Date64Type)
- [Decimal32](xref:Apache.Arrow.Types.Decimal32Type), [Decimal64](xref:Apache.Arrow.Types.Decimal64Type), [Decimal128](xref:Apache.Arrow.Types.Decimal128Type), [Decimal256](xref:Apache.Arrow.Types.Decimal256Type)
- [Time32](xref:Apache.Arrow.Types.Time32Type), [Time64](xref:Apache.Arrow.Types.Time64Type)
- [Binary](xref:Apache.Arrow.Types.BinaryType) (fixed-length)
- [List](xref:Apache.Arrow.Types.ListType)
- [Struct](xref:Apache.Arrow.Types.StructType)
- [Union](xref:Apache.Arrow.Types.UnionType)
- [Map](xref:Apache.Arrow.Types.MapType)
- [Duration](xref:Apache.Arrow.Types.DurationType)
- [Interval](xref:Apache.Arrow.Types.IntervalType)

#### Type Metadata

- Data Types
- [Fields](xref:Apache.Arrow.Field)
- [Schema](xref:Apache.Arrow.Schema)

#### Serialization

- File [Reader](xref:Apache.Arrow.Ipc.ArrowFileReader) and [Writer](xref:Apache.Arrow.Ipc.ArrowFileWriter)
- Stream [Reader](xref:Apache.Arrow.Ipc.ArrowStreamReader) and [Writer](xref:Apache.Arrow.Ipc.ArrowStreamWriter)

### IPC Format

#### Compression

- Buffer compression and decompression are supported, but require installing the `Apache.Arrow.Compression` package.
  When reading compressed data, you must pass a [CompressionCodecFactory](xref:Apache.Arrow.Compression.CompressionCodecFactory)
  instance to the [ArrowFileReader](xref:Apache.Arrow.Ipc.ArrowFileReader) or
  [ArrowStreamReader](xref:Apache.Arrow.Ipc.ArrowStreamReader) constructor, and when writing compressed data a
  [CompressionCodecFactory](xref:Apache.Arrow.Compression.CompressionCodecFactory) must be set in the
  [IpcOptions](xref:Apache.Arrow.Ipc.IpcOptions).
  Alternatively, a custom implementation of [ICompressionCodecFactory](xref:Apache.Arrow.Ipc.ICompressionCodecFactory) can be used.

### Not Implemented

- Serialization
    - Exhaustive validation
    - Run End Encoding
- Types
    - Tensor
- Arrays
    - Large Arrays. There are large array types provided to help with interoperability with other libraries,
      but these do not support buffers larger than 2 GiB and an exception will be raised if trying to import an array that is too large.
        - [Large Binary](xref:Apache.Arrow.Types.LargeBinaryType)
        - [Large List](xref:Apache.Arrow.Types.LargeListType)
        - [Large String](xref:Apache.Arrow.Types.LargeStringType)
    - Views
        - [Binary View](xref:Apache.Arrow.Types.BinaryViewType)
        - [List View](xref:Apache.Arrow.Types.ListViewType)
        - [String View](xref:Apache.Arrow.Types.StringViewType)
- Array Operations
    - Equality / Comparison
    - Casting
- Compute
    - There is currently no API available for a compute / kernel abstraction.
25 changes: 25 additions & 0 deletions docs/toc.yml
@@ -0,0 +1,25 @@
### YamlMime:TableOfContent

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

items:
- name: API Reference
  type: Namespace
  href: api/
- name: GitHub
  href: https://github.com/apache/arrow-dotnet