sql-agg

Basic usage

Assuming you have the following database table:

user	date	column_a	column_b
user1	2012-01-01	1	1
user1	2012-01-06	2	0
user2	2012-02-19	0	3

You can use sql-agg to extract aggregated data from the table as follows:

from datetime import date
from sqlagg import *
from sqlagg.columns import *

# create the columns
user = SimpleColumn("user")
i_a = SumColumn("column_a")
i_b = CountColumn("column_b")

# initialise the query context and add the columns to it
vc = QueryContext("table_name",
    filters=[GT('date', 'startdate'), LT('date', 'enddate')],
    group_by=["user"])
vc.append_column(user)
vc.append_column(i_a)
vc.append_column(i_b)

filter_values={
    "startdate": date(2012, 1, 1),
    "enddate": date(2012, 3, 1)
    }

# resolve the query context with the filter values (connection is an SQLAlchemy connection)
vc.resolve(connection, filter_values=filter_values)
data = vc.data

The resultant data variable will be a dictionary as follows:

{
    "user1": {
        "user": "user1",
        "column_a": 3,
        "column_b": 2
    },
    "user2": {
        "user": "user2"
        "column_a": 0,
        "column_b": 1
    }
}

Multi-level grouping

Multi-level grouping can be done by adding multiple SimpleColumn's to the QueryContext as well as multiple column names in the 'group_by' parameter of the QueryContext.

region = SimpleColumn("region")
sub_region = SimpleColumn("sub_region")
column_a = SumColumn("column_a")

vc = QueryContext("table_name"
    filters=None,
    group_by=["region","sub_region"])

The resultant data would look as follows:

{
    ("region1", "sub_region1"): {
        "region": "region1",
        "sub_region": "sub_region1",
        "column_a": 1
    },
    ("region1", "sub_region2"): {
        "region": "region1",
        "sub_region": "sub_region2",
        "column_a": 3
    },
    ("region2", "sub_region3"): {
        "region": "region2",
        "sub_region": "sub_region3",
        "column_a": 2
    }
}

Columns in detail

For each column you can specify the table, filters and also group_by fields. Using these features you can supply different filters per column or select data from different columns.

Different filters

column_a = SumColumn("column_a")
column_b = SumColumn("column_b", filters=[LT('date', 'enddate')])

In this case column_a will get the filters supplied to the QueryContext while column_b will be resolved with its own filters. This will result in two queries being run on the database.

Different tables

It is possible to select data from different tables by providing columns with different table_names.

column_a = SumColumn("column_a")
column_b = SumColumn("column_b", table_name="table_b", group_by=["user"])

Here column_a will be selected from the table configured in the QueryContext while column_b will be selected from table_name and will be grouped by user. This will result in two queries being run on the database.

As Name

It is possible to use the same column in multiple columns by specifying the alias argument of the column.

sum_a = SumColumn("column_a", alias="sum_a")
count_a = CountColumn("column_a", alias="count_a")

The resulting data will use the alias keys to reference the values.

Conditional / Case columns

Simple

num_wheels = SumWhen("vehicle", whens={"unicycle": 1, "bicycle": 2, "car": 4}, else_=0, alias="num_wheels")

Complex

num_children = SumWhen(whens={"users.age < 13": 1}, else_=0, alias="children")

Alias and Aggregate columns

Useful if you want to use a column more than once but don't want to re-calculate its value.

sum_a = SumColumn("column_a")

aggregate = AggregateColumn(lambda x, y: x / y,
                            AliasColumn("column_a"),
                            SumColumn("column_b")

TODO: custom queries

Filtering

The QueryContext and most column classes accept a filters parameter which must be iterable. Each element of this iterable must be a subclass of sqlagg.filter.SqlFilter. The elements of this parameter are combined using the AND operator.

i.e. filters = [EQ('user', 'username'), EQ('role', 'admin')]

is equivalent to:

filters = [AND([
    EQ('user', 'username'), EQ('role', 'admin')
])]

Any filter expression can be expressed using a RawFilter:

RawFilter('"user" = :username AND "date" between :start and :end')

In this case the same filter could be expressed as follows:

AND([EQ('user', 'username'), BETWEEN('date', 'start', 'end'])

Development

To install dependencies run

pip install .

Running Tests

First create an environment variable for the appropriate connection string:

export SQLAGG_TEST_CONNECTION_STRING='postgresql://user:pass@localhost:5432/sqlagg_test

Then run the following

python setup.py test

Note: If you face issues with psycopg2 try replacing with psycopg2-binary in setup.py

Name		Name	Last commit message	Last commit date
Latest commit History 267 Commits
sqlagg		sqlagg
tests		tests
.coveragerc		.coveragerc
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGES.md		CHANGES.md
LICENSE		LICENSE
README.md		README.md
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

sql-agg

Basic usage

Multi-level grouping

Columns in detail

Different filters

Different tables

As Name

Conditional / Case columns

Alias and Aggregate columns

Filtering

Development

Running Tests

About

Releases

Packages

Contributors 10

Languages

License

dimagi/sql-agg

Folders and files

Latest commit

History

Repository files navigation

sql-agg

Basic usage

Multi-level grouping

Columns in detail

Different filters

Different tables

As Name

Conditional / Case columns

Alias and Aggregate columns

Filtering

Development

Running Tests

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 10

Languages

Packages