@risenW
Member

@risenW risenW commented Jun 27, 2021

This refactor converts danfojs to TypeScript. It is scheduled to be released as Danfo.js v1.0 (stable) and will include a couple of breaking changes, bug fixes, and new features.

Tasks and progress

  • Generic core
  • Frame class
  • Series class
  • File readers (input and output)
  • Utility class
  • Configuration class
  • Groupby class
  • Concat
  • Merge class
  • Preprocessing class
  • Str class (String)
  • Dt class (Datetime)
  • Indexing
  • Plotting class
  • Date range

Bug Fixes

  • Column data not being updated when mutating internal data array
  • Str class issue for non-strings
  • Better error messages
  • Fix support for all JS date strings
  • Fix loc slicing bug for row index with string labels
  • Fix wrong order of axis used during computation. axis=1 ==> row-wise operations and axis=0 ==> column-wise operations
  • apply function now works only across a specified axis
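
To make the corrected axis convention concrete, here is a minimal sketch in plain TypeScript. It does not use Danfo.js; sumAlongAxis is a hypothetical helper, and it assumes that "row-wise" means one result per row and "column-wise" means one result per column.

```typescript
// Hypothetical helper for illustration only; not part of Danfo.js.
type Axis = 0 | 1;

function sumAlongAxis(data: number[][], axis: Axis): number[] {
  if (axis === 1) {
    // axis=1: row-wise, one sum per row
    return data.map((row) => row.reduce((acc, v) => acc + v, 0));
  }
  // axis=0: column-wise, one sum per column
  const firstRow = data[0] ?? [];
  const sums: number[] = firstRow.map(() => 0);
  for (const row of data) {
    row.forEach((v, j) => {
      sums[j] = (sums[j] ?? 0) + v;
    });
  }
  return sums;
}

const table: number[][] = [
  [1, 2, 3],
  [4, 5, 6],
];
const rowSums = sumAlongAxis(table, 1); // [6, 15]
const colSums = sumAlongAxis(table, 0); // [5, 7, 9]
```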

New Features

  • Ability to create an empty NDframe
  • Flag for toggling between low/high memory mode
  • Inplace support for all mutating operations
  • Ability to set Configuration values on NDframe creation
  • Support boolean mask for subsetting with iloc. E.g df.iloc({rows: df["count"].gt(5), columns: [0, 1]})
  • Support boolean mask for subsetting with loc. E.g df.loc({rows: df["count"].gt(5), columns: ["Count", "Size"]})
  • Update an existing column value via subsetting. E.g df["count"] = [1,3,4,5]
  • Add loc indexing support for Series
  • Add configuration support for formatting table display in console
  • applyMap ==> Element wise apply function for DataFrame
  • "and" and "or" logical comparison support. E.g df.loc({rows: df['Salary_in_1000'].gte(100).and(df['Age'].gt(60)) })
  • Streaming support for CSV and JSON files into NDframes
  • New IO functions
    • streamJSON ==> Supports streaming of local or remote JSON files into DataFrame.
    • streamCSV ==> Supports streaming of local or remote CSV files into DataFrame.
    • openCsvInputStream ==> Open a local/remote CSV file as a readable stream
    • writeCsvOutputStream ==> Open a local/remote CSV file as a writable stream
  • readCSV supports config values for headers, separator, etc.
  • toCSV supports config values for output
  • readJSON supports config values for headers, separator, etc.
  • toJSON supports config values for formatting output
  • Query/Filter with multiple condition support. E.g df.loc({rows: df['Salary_in_1000'].gte(100).and(df['Age'].gt(60)) })
  • streamCsvTransformer ==> Pipeable stream transformer for incrementally transforming DataFrames
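
The boolean-mask and and/or features above can be illustrated with a small, self-contained sketch. This is not the Danfo.js implementation: Column, Mask, and selectRows are hypothetical stand-ins that only mimic the shape of calls like df['Salary_in_1000'].gte(100).and(df['Age'].gt(60)).

```typescript
// Hypothetical stand-ins for illustration; these are not the Danfo.js classes.
class Mask {
  values: boolean[];
  constructor(values: boolean[]) {
    this.values = values;
  }
  // Element-wise logical AND of two masks of equal length.
  and(other: Mask): Mask {
    this.checkLength(other);
    return new Mask(this.values.map((v, i) => v && other.values[i] === true));
  }
  // Element-wise logical OR of two masks of equal length.
  or(other: Mask): Mask {
    this.checkLength(other);
    return new Mask(this.values.map((v, i) => v || other.values[i] === true));
  }
  private checkLength(other: Mask): void {
    if (other.values.length !== this.values.length) {
      throw new Error("Masks must have the same length");
    }
  }
}

class Column {
  values: number[];
  constructor(values: number[]) {
    this.values = values;
  }
  gt(n: number): Mask {
    return new Mask(this.values.map((v) => v > n));
  }
  gte(n: number): Mask {
    return new Mask(this.values.map((v) => v >= n));
  }
}

// Keep only the rows whose mask entry is true.
function selectRows<T>(rows: T[], mask: Mask): T[] {
  return rows.filter((_, i) => mask.values[i] === true);
}

const salary = new Column([90, 120, 150, 100]);
const age = new Column([65, 61, 45, 70]);
const names = ["Ada", "Ben", "Cy", "Dee"];

const mask = salary.gte(100).and(age.gt(60)); // [false, true, false, true]
const picked = selectRows(names, mask); // ["Ben", "Dee"]
```

Each comparison produces its own mask, so conditions can be chained without changing how rows are selected.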

@risenW risenW marked this pull request as draft June 27, 2021 11:29
@JhennerTigreros
Contributor

Hi @risenW, what is the current progress on this? I think we can split the migration to TS into chunks across the current codebase 😄

@risenW
Member Author

risenW commented Jul 25, 2021

Thanks for offering to help out here, @JhennerTigreros. Just give me a few more days to complete the Generic core class. Once that's done, you can start working on the other classes. I'm doing some major rewrites of the way NDFrame data is created and stored, to solve some memory constraints in the old version.

I'll update this issue when I'm done, and then you can start contributing here.

@carlpaten-ivadolabs

@risenW out of curiosity, will data frames' types reflect their columns? TypeScript is one of the rare languages whose type system is sufficiently advanced to e.g. faithfully represent a SQL JOIN operation (merge in Pandas parlance). Is this part of the plan?

@risenW
Member Author

risenW commented Jul 26, 2021

@LilRed I didn’t get your question. Care to rephrase?

@carlpaten-ivadolabs

carlpaten-ivadolabs commented Jul 27, 2021

Disclaimer: I'm not acquainted with Danfo yet.

A data frame could have type DataFrame<T>, where T is an object describing the column names and their types. A value of type DataFrame<{a: number, b: boolean}> is a data frame with two columns, one (a) number-valued and one (b) boolean-valued.

I believe it's possible in TypeScript to define a SQL JOIN-like operation such that if df1 has type DataFrame<{a: number, b: boolean}> and df2 has type DataFrame<{a: number, c: string}>, then df1.join(df2, "outer", "a") has type DataFrame<{a: number, b?: boolean, c?: string}>, mutatis mutandis for inner/left/right joins. Then data frame access can be performed without runtime type checks or column index checks.
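
As a rough sketch of the idea, the outer-join typing could look like the following. This is plain TypeScript over arrays of row objects, not Danfo.js; the OuterJoin type and the outerJoin function are hypothetical names, and the runtime part is a naive nested loop meant only to show that the values match the static type.

```typescript
// Hypothetical types for illustration; not part of Danfo.js.
// Outer join on key K: the key stays required, every other column becomes optional.
type OuterJoin<A, B, K extends keyof A & keyof B> =
  Pick<A, K> & Partial<Omit<A, K>> & Partial<Omit<B, K>>;

function outerJoin<A extends object, B extends object, K extends keyof A & keyof B>(
  left: A[],
  right: B[],
  key: K
): OuterJoin<A, B, K>[] {
  const out: OuterJoin<A, B, K>[] = [];
  // Tracks which right-hand rows found a partner on the left.
  const rightMatched: boolean[] = right.map(() => false);

  for (const l of left) {
    let matchCount = 0;
    right.forEach((r, i) => {
      if ((r[key] as unknown) === (l[key] as unknown)) {
        matchCount += 1;
        rightMatched[i] = true;
        out.push({ ...l, ...r } as unknown as OuterJoin<A, B, K>);
      }
    });
    // Left row without a partner: right-hand columns stay absent.
    if (matchCount === 0) {
      out.push({ ...l } as unknown as OuterJoin<A, B, K>);
    }
  }
  // Right rows without a partner: left-hand columns stay absent.
  right.forEach((r, i) => {
    if (rightMatched[i] !== true) {
      out.push({ ...r } as unknown as OuterJoin<A, B, K>);
    }
  });
  return out;
}

type Left = { a: number; b: boolean };
type Right = { a: number; c: string };

const leftRows: Left[] = [{ a: 1, b: true }, { a: 2, b: false }];
const rightRows: Right[] = [{ a: 2, c: "x" }, { a: 3, c: "y" }];

// Element type is equivalent to { a: number; b?: boolean; c?: string }.
const joined = outerJoin(leftRows, rightRows, "a");

// The compiler knows c may be missing, so this must be typed as possibly undefined.
const firstC: string | undefined = joined[0]?.c;
```

Passing a key that is not present in both row types (for example "b" here) is rejected at compile time, which is the kind of column check that would otherwise happen at runtime.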

@JhennerTigreros
Contributor

Oh, that sounds really good. I will wait until you finish the migration, then start porting the correlation functions to TypeScript and tackling performance issues. I think we can use WebAssembly for certain heavy operations. What do you think?

@risenW risenW marked this pull request as ready for review January 12, 2022 15:39
@risenW
Member Author

risenW commented Jan 12, 2022

Closing this PR. It has been merged and released in the latest version, v1.0.0.

@risenW risenW closed this Jan 12, 2022
@risenW risenW deleted the danfo/typescript branch January 12, 2022 15:40
Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request)

Development

Successfully merging this pull request may close these issues.

RFC: Time series. Could they be first-class citizens just like they are in Pandas?

6 participants