commit	8e1a934d5f049a6c5fb5fa251fa5157e804ea25f	[log] [tgz]
author	Sean Griffin <sean@seantheprogrammer.com>	Wed Dec 18 13:30:29 2019 -0700
committer	Georg Semmler <georg_semmler_05@web.de>	Thu Apr 16 15:30:26 2020 +0200
tree	6b2b3aabb7b6dc4203e99f0cce1f4177c8168d60
parent	4847ecfd3ba47f804c48812b38babda4b8e78c97 [diff]

Replace `NonAggregate` with something less broken

This commit implements the proposal laid out at
https://discourse.diesel.rs/t/proposed-change-replace-nonaggregate-with-something-less-broken-that-can-support-group-by/18.

The changes made in this commit now correctly enforce semantics around
the mixing of aggregate/non-aggregate expressions, and lays the
foundations required for full `GROUP BY` support in the future. This
commit does not implement `GROUP BY` as a feature in public API, though
I have added some minimal support to prove the soundness of the change.

Since this will likely be the largest breaking change in 2.0, I am going
to include as much detail as I can about the problem, the reasoning
behind the solution, and the likely impact of the change.

Recap of the problem
----

`NonAggregate` was originally introduced in
c66d96fa9cefaba0a418834e8cb0e78a0a87269a. The goal was to prevent the
following error at compile time:

   [local] sean@sean=# select count(*), name from users;
   ERROR:  42803: column "users.name" must appear in the GROUP BY clause or be used in an aggregate function

I didn't think about this nearly as much as I should have at the time,
which resulted in a hilariously broken implementation that has prevented
the addition of `group_by` support, and left bugs in the codebase for
more than 4 years.

At the time I was focused on "mixing aggregate and non-aggregate
expressions in a select statement". Since select uses tuples to
represent multiple values, I added a trait to represent non-aggregate
values, added it as a bound for `impl Expression for AnyTuple`, and
called it a day. This had a number of problems.

The most obvious was that it prevented valid queries involving multiple
aggregate expressions. At the time I figured we'd have a separate
trait for aggregate expressions, and then `impl Expression for AnyTuple
where AllFields: NonAggregate | AllFields: Aggregate`. This eventually
led to [RFC #1268](https://github.com/rust-lang/rfcs/pull/1268), which
doesn't seem likely to be stabilized soon, and even if it were we still
have the other issues with this solution.

The next issue is that you cannot say whether a given expression is
aggregate by looking at it on its own. The answer to "Is `users`.`name`
aggregate?" depends on whether or not that column appears in the group
by clause. So any trait which correctly handles this must be generic
over the group by clause, or it cannot be correctly implemented for
columns.

However, the most egregious issue with that commit is that it does not
handle *any* composite expressions besides tuples. Even today,
`.select((count_star(), id))` fails, but `.select(count_star() + id)`
will compile (and either error at runtime or give non-deterministic
results depending on your backend).

User Impact
----

This change will unfortunately have more breakage than I had
anticipated. That said, the breakage only affects *extremely* generic
code, and I do not believe that the majority of our users will notice or
care about this change.

There are three main ways in which the breakage will affect users:

- The additional bound on `SelectStatement<...>: SelectDsl` and
  `SelectStatement<...>: Query`.
  - This one broke our test suite in a few places, mainly where we have
    *really* generic code to say "I can select T.eq(sql_string)". I do
    not believe this is representative of real code.
  - I did a GitHub-wide search of all occurances of
    `SelectableExpression` (which is *not* the bound on the public API,
    but is the bound on `SelectStatement`'s impl, and would point to
    broken code). I found one repo which is relying on internals that
    will break, and a lot of results from Wundergraph. I didnt' look at
    those. This probably breaks Wundergraph. Sorry, Georg. It should be
    easy to fix I promise.
- `NonAggregate` can no longer be directly implemented
  - I did a GitHub-wide search for `NonAggregate`. With one exception,
    every repo implementing this trait directly was on pre-1.0. The only
    affected repo is manually implementing `Expression` instead of using
    `#[derive(AsExpression)]`. With that change they will be
    future-proofed.
- `T: NonAggregate` no longer implies `(OtherType, T): NonAggregate`
  - This broke `infer_schema`, since it was relying on `AssocType:
    NonAggregate` to know `(known_column, AssocType, known_column):
    Expression`. Ironically, that's no longer important, it did still
    break due to the first item on this list. I could not find any code
    in the wild that fell into this category.

Originally I thought that the only code which would be affected by this
is code that wrote `impl NonAggregate`, since that code would need to
change to `impl ValidGrouping`. However, I missed a few things when I
made that assessment.

The first is that... Well the feature exists. The whole point of this
was to prevent `aggregate + non_aggregate` from compiling when passed to
select, which implies a new trait bound *somewhere*. While
`SelectStatement` and its impls are private API, it's really easy to
couple yourself ot the bounds on those impls. It doesn't help that
`rustc` literally recommends that folks do that when errors occur. Any
code that is written as `Foo: SelectDsl<Selection>` will be fine, but
any code that's gone down the rabbit hole and has copied the bounds from
`impl SelectDsl for SelectStatement<...>` will break. I didn't find many
cases in the wild, but I do think it's relatively common.

The second thing I missed is that "is this aggregate" is not a binary
question. Originally I expected we would have two answers to the
question, and compound expressions would enforce that their
sub-expressions all had the same answer. The issue is that `1` doesn't
care. You can have `count(*) + 1`, and you can also have `non_aggregate
+ 1`. `1` is always valid, regardless of aggregation. So we need three
answers. The problem is that this means we can't have `T: NonAggregate`
mean everything it used to.

On stable `T: NonAggregate` will mean `T: ValidGrouping<()>`, and that
`T` can be passed to everything with a `NonAggregate` bound (`filter`,
`order`, etc). On nightly, it also implies `T::IsAggregate:
MixedAggregates<is_aggregate::No, Output = is_aggregate::No>`. In
English, it implies that this is valid with no group by clause, and its
output can appear with an expression which is not aggregate (but might
be with a different group by clause). The outcome of this is that `T:
NonAggregate` implies `(T, Other): NonAggregate`. However the missing
link from both stable and unstable is `is_aggregate::No:
MixedAggregates<T::IsAggregate>` being implied, which would mean
`(Other, T): NonAggregate` would be implied.

Ultimately this is a really long-winded way of saying that `T:
NonAggregate` used to imply interactions with other types. That is no
longer consistently true. It's unlikely there are many affected users,
but any that are affected will need to directly have a `ValidGrouping`
bound.

Implementation Notes
---

Because this broke diesel_infer_schema, I had to do a version bump to
get that crate to rely on the released 1.4.

There's a note on the unsoundness of the `NonAggregate` impl of
`Subselect`. I haven't changed the bounds on that impl, but we almost
certainly should address it before 2.0 is released.

I opted to validate the new bound in `SelectDsl`, so folks get errors on
`.select` instead of `.load`. We can't validate this on calls to both
`.select` *and* `.group_by`, since a select statement valid with a group
by is often invalid without one, and a group by clause usually makes the
default select clause invalid.

While this commit does not add proper group by support, I fleshed it out
a bit to test the type constraints. Right now a column only considers
itself valid if there is no group by clause, or if the group by clause
is exactly that column.

I had more to say but I went on a call and lost my train of thought. I
need to come back and finish writing this later.

Notable Limitations
---

`SELECT lower(name) FROM users GROUP BY lower(name)` probably can't be
represented.

Unanswered Questions
---

`BoxableExpression` has been limited to only work when there's no
group by clause, and only work with types which are `is_aggregate::No`.
This is probably not what we want to do, but was the smallest change to
start.

49 files changed

tree: 6b2b3aabb7b6dc4203e99f0cce1f4177c8168d60

README.md

A safe, extensible ORM and Query Builder for Rust

API Documentation: latest release – master branch

Homepage

Diesel gets rid of the boilerplate for database interaction and eliminates runtime errors without sacrificing performance. It takes full advantage of Rust's type system to create a low overhead query builder that “feels like Rust.”

Supported databases:

You can configure the database backend in Cargo.toml:

[dependencies]
diesel = { version = "<version>", features = ["<postgres|mysql|sqlite>"] }

Getting Started

Find our extensive Getting Started tutorial at https://diesel.rs/guides/getting-started. Guides on more specific features are coming soon.

Getting help

If you run into problems, Diesel has a very active Gitter room. You can come ask for help at gitter.im/diesel-rs/diesel. For help with longer questions and discussion about the future of Diesel, visit our discourse forum.

Code of conduct

Anyone who interacts with Diesel in any space, including but not limited to this GitHub repository, must follow our code of conduct.

License

Licensed under either of these:

Apache License, Version 2.0, (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)

Contributing

Before contributing, please read the contributors guide for useful information about setting up Diesel locally, coding style and common abbreviations.

Unless you explicitly state otherwise, any contribution you intentionally submit for inclusion in the work, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.