
[Bug][Lineage] Groupby is incorrectly resolved in some cases #4332

@iodone

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

The following test cases reproduce the issue. Each one asserts the expected column lineage:

test("test group by") {
  withTable("t1", "t2") { _ =>
    spark.sql("CREATE TABLE t1 (a string, b string, c string) USING hive")
    spark.sql("CREATE TABLE t2 (a string, b string, c string) USING hive")
    val ret0 =
      exectractLineage(
        s"insert into table t1 select a," +
            s"concat_ws('/', collect_set(b))," +
            s"count(distinct b) * count(distinct c) " +
            s"from t2 group by a")
    assert(ret0 == Lineage(
      List("default.t2"),
      List("default.t1"),
      List(
        ("default.t1.a", Set("default.t2.a")),
        ("default.t1.b", Set("default.t2.b")),
        ("default.t1.c", Set("default.t2.b", "default.t2.c"))
      )))
  }
}
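The expected assertion in the first test says each output column's lineage is the set of source columns referenced by its select expression. The GROUP BY key `a` is not added to the aggregated columns `t1.b` and `t1.c`. The sketch below is a hypothetical toy model of that rule (the `Expr`, `Col`, `Func`, `Lit` types and `sourceColumns` are illustrative names, not Kyuubi's lineage API). It reproduces the lineage the test expects:

```scala
// Hypothetical toy model of column lineage; NOT Kyuubi's implementation.
object GroupByLineageSketch {
  sealed trait Expr
  final case class Col(qualifiedName: String) extends Expr
  final case class Func(name: String, children: Seq[Expr]) extends Expr
  final case class Lit(value: String) extends Expr

  // Collect every source column referenced anywhere in the expression tree.
  def sourceColumns(e: Expr): Set[String] = e match {
    case Col(n)      => Set(n)
    case Func(_, cs) => cs.flatMap(sourceColumns).toSet
    case Lit(_)      => Set.empty
  }

  // Select list of: select a, concat_ws('/', collect_set(b)),
  //   count(distinct b) * count(distinct c) from t2 group by a
  val projections: List[(String, Expr)] = List(
    "default.t1.a" -> Col("default.t2.a"),
    "default.t1.b" -> Func("concat_ws",
      Seq(Lit("/"), Func("collect_set", Seq(Col("default.t2.b"))))),
    "default.t1.c" -> Func("multiply", Seq(
      Func("count_distinct", Seq(Col("default.t2.b"))),
      Func("count_distinct", Seq(Col("default.t2.c")))))
  )

  // Lineage per output column; the grouping key only adds lineage where it is selected.
  val lineage: List[(String, Set[String])] =
    projections.map { case (out, e) => out -> sourceColumns(e) }
}
```

In this model the lineage comes only from the select expressions. It matches the three tuples asserted in the first test.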

test("test grouping sets") {
  withTable("t1", "t2") { _ =>
    spark.sql("CREATE TABLE t1 (a string, b string, c string) USING hive")
    spark.sql("CREATE TABLE t2 (a string, b string, c string, d string) USING hive")
    val ret0 =
      exectractLineage(
        s"insert into table t1 select a,b,GROUPING__ID " +
            s"from t2 group by a,b,c,d grouping sets ((a,b,c), (a,b,d))")
    assert(ret0 == Lineage(
      List("default.t2"),
      List("default.t1"),
      List(
        ("default.t1.a", Set("default.t2.a")),
        ("default.t1.b", Set("default.t2.b")),
        ("default.t1.c", Set())
      )))
  }
}
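The second test expects `default.t1.c` (fed by `GROUPING__ID`) to have empty lineage. A grouping ID depends only on which grouping set produced a row, and not on any value read from `t2`. The sketch below assumes Spark's documented `grouping_id()` formula, `(grouping(c1) << (n-1)) + ... + grouping(cn)`, where `grouping(ci)` is 1 when `ci` is absent from the grouping set. The `groupingId` helper is hypothetical and is only meant to show that the computation never touches row data:

```scala
// Hypothetical helper; assumes Spark's documented grouping_id() bit layout.
object GroupingIdSketch {
  // Build the bitmask left to right over the GROUP BY columns:
  // a bit is 1 when the column is NOT part of the current grouping set.
  def groupingId(groupByCols: Seq[String], groupingSet: Set[String]): Int =
    groupByCols.foldLeft(0) { (acc, col) =>
      (acc << 1) | (if (groupingSet.contains(col)) 0 else 1)
    }

  val groupBy: Seq[String] = Seq("a", "b", "c", "d")

  // grouping sets ((a,b,c), (a,b,d)) from the second test
  val ids: Seq[Int] =
    Seq(Set("a", "b", "c"), Set("a", "b", "d")).map(gs => groupingId(groupBy, gs))
}
```

Under these assumptions the two grouping sets yield IDs 1 and 2. Both values are derived purely from the query's grouping structure, which is consistent with the expected `Set()` lineage.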

Affects Version(s)

master

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.