[Sum of Multiples] Add Approaches #3375

Draft: wants to merge 4 commits into base: main (showing changes from 3 commits)
18 changes: 18 additions & 0 deletions exercises/practice/sum-of-multiples/.approaches/config.json
@@ -0,0 +1,18 @@
{
  "introduction": {
    "authors": [
      "MatthijsBlom"
    ]
  },
  "approaches": [
    {
      "uuid": "7dd85d5b-12bd-48a6-97fe-8eb7dd87af72",
      "slug": "filter-for-multiples",
      "title": "Filter for multiples",
      "blurb": "Use the built-in filter function to select the numbers that are multiples, then sum these.",
      "authors": [
        "MatthijsBlom"
      ]
    }
  ]
}
@@ -0,0 +1,233 @@
# `filter` for multiples

```python
def sum_of_multiples(limit, factors):
    is_multiple = lambda n: any(n % f == 0 for f in factors if f != 0)
    return sum(filter(is_multiple, range(limit)))
```

Probably the most straightforward way of solving this problem is to

1. look at every individual integer from `0` up to (but not including) `limit`,
2. check whether it is a multiple of any of the given `factors`, and
3. add it to the sum when it is.


## Notable language features used in this solution

### Built-in function: `sum`

Adding all the numbers in a collection together is a very common operation.
Therefore, Python provides the built-in function [`sum`][builtin-sum].

`sum` takes an **iterable** as its first argument, plus an optional `start` value that defaults to `0`.
A value is iterable whenever it makes sense to use it in a `for` loop like this:

```python
for element in iterable_value: # 👈
    ...
```

The `list` is the most commonly used iterable data structure.
Many other containers are also iterable, such as `set`s, `tuple`s, `range`s, and even `dict`s and `str`ings.
Still other examples include iterators and generators, which are discussed below.

When given a collection of numbers, `sum` will look at the elements one by one and add them up.
The result is a single number.

```python
numbers = range(1, 100 + 1) # 1, 2, …, 100
sum(numbers) # ⟹ 5050
```
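
As a small sketch of the point that `sum` accepts any iterable of numbers, the following uses a few containers with made-up values. The variable names are purely illustrative.

```python
# Each of these values is iterable, so `sum` accepts all of them.
from_list = sum([1, 2, 3])         # 1 + 2 + 3
from_tuple = sum((1, 2, 3))
from_set = sum({1, 2, 3})
from_range = sum(range(4))         # 0 + 1 + 2 + 3
from_dict = sum({1: "a", 2: "b"})  # iterating over a dict yields its keys
with_start = sum([1, 2, 3], 10)    # the optional start value is added to the total
```

Note that summing a `dict` adds up its keys, not its values.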

Had the highlighted solution not used `sum`, it might have looked like this:

```python
def sum_of_multiples(limit, factors):
    is_multiple = lambda n: any(n % f == 0 for f in factors if f != 0)
    total = 0
    for multiple in filter(is_multiple, range(limit)):
        total += multiple
    return total
```


### Built-in function: `filter`

Selecting elements of a collection for having a certain property is also a very common operation.
Therefore, Python provides the built-in function [`filter`][builtin-filter].

`filter` takes two arguments.
The first is a **predicate**.
The second is the iterable the elements of which should be filtered.

A predicate is a function that takes one argument (of any particular type) and returns a `bool`.
Such functions are commonly used to encode properties of values.
An example is `str.isupper`, which takes a `str` and returns `True` when the string contains at least one cased character and all of its cased characters are uppercase:

```python
str.isupper("AAAAH! 😱") # ⟹ True
str.isupper("Eh? 😕") # ⟹ False
str.isupper("⬆️💼") # ⟹ False
```

Thus, the function `str.isupper` represents the property of _being an uppercase string_.

Contrary to what you might expect, `filter` does not return a data structure like the one given as the iterable argument:

```python
filter(str.isupper, ["THUNDERBOLTS", "and", "LIGHTNING"])
# ⟹ <filter object at 0x000002F46B107BE0>
```

Instead, it returns an **iterator**.

An iterator is an object whose sole purpose is to guide iteration through some data structure.
In particular, `filter` makes sure that elements that do not satisfy the predicate are skipped:

```python
for word in filter(str.isupper, ["THUNDERBOLTS", "and", "LIGHTNING"]):
    print(word)
# prints:
# THUNDERBOLTS
# LIGHTNING
```

An iterator is a bit like a cursor that can move only to the right.
Member commented:

I really like this explanation. Both from a cursor for typing ... and a cursor from a DB. Both can only be gone through once, and you can't back up. 😄


The main differences between containers (such as `list`s) and iterators are

- Containers can, depending on their contents, take up a lot of space in memory, but iterators are typically very small regardless of how many elements they 'contain'.
- Containers can be iterated over multiple times, but iterators can be used only once.

To illustrate the latter difference:

```python
is_even = lambda n: n % 2 == 0
numbers = range(20) # 0, 1, …, 19
even_numbers = filter(is_even, numbers) # 0, 2, …, 18
sum(numbers) # ⟹ 190
sum(numbers) # ⟹ 190
sum(even_numbers) # ⟹ 90
sum(even_numbers) # ⟹ 0
```

Here, `sum` iterates over both `numbers` and `even_numbers` twice.

In the case of `numbers` everything is fine.
Even after looping through the whole of `numbers`, all its elements are still there, and so `sum` can ask to see them again without problem.

The situation with `even_numbers` is less simple.
To use the _cursor_ analogy: after going through all of `even_numbers`' 'elements' &ndash; actually elements of `numbers` &ndash; the cursor has moved all the way to the right.
It cannot move backwards, so if you wish to iterate over all even numbers again then you need a new cursor.
We say that the `even_numbers` iterator is _exhausted_. When `sum` asks for its elements again, `even_numbers` comes up empty and so `sum` returns `0`.
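
The cursor analogy can be made concrete by advancing an iterator by hand with the built-in `next`. This is only a sketch; `is_even` is written with `def` here and the variable names are illustrative.

```python
def is_even(n):
    return n % 2 == 0


evens = filter(is_even, range(5))  # will yield 0, 2, 4

first = next(evens)    # the cursor moves one step to the right
second = next(evens)   # and another step
rest = list(evens)     # consumes whatever is left
leftover = list(evens) # the iterator is exhausted, so nothing remains

# With a default, `next` returns it instead of raising StopIteration.
after_exhaustion = next(evens, None)

# A fresh iterator is needed to go through the elements again.
fresh = list(filter(is_even, range(5)))
```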

Had the highlighted solution not used `filter`, it might have looked like this:

```python
def sum_of_multiples(limit, factors):
    is_multiple = lambda n: any(n % f == 0 for f in factors if f != 0)
    multiples = [candidate for candidate in range(limit) if is_multiple(candidate)]
    return sum(multiples)
```

This variant stores all the multiples in a `list` before summing them.
Such a list can become very big.
For example, if `limit = 1_000_000_000` and `factors = [1]` then the `multiples` list will take up at least 8 gigabytes of memory on a typical 64-bit build of CPython, for its element references alone!
It is to avoid unnecessarily creating such large intermediate data structures that iterators are often used.
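
A rough way to see the difference is `sys.getsizeof`, sketched below with a much smaller, arbitrarily chosen limit. The exact numbers depend on the Python implementation and version, and for a list `getsizeof` counts only the list's own storage, not the integers it refers to, so treat this as an illustration rather than a measurement.

```python
import sys


def is_multiple_of_3(n):
    return n % 3 == 0


limit = 100_000  # arbitrary, small enough to run instantly

as_list = [n for n in range(limit) if is_multiple_of_3(n)]
as_iterator = filter(is_multiple_of_3, range(limit))

list_size = sys.getsizeof(as_list)          # grows with the number of elements
iterator_size = sys.getsizeof(as_iterator)  # stays small regardless of limit

print(f"list: {list_size} bytes, iterator: {iterator_size} bytes")

# Both produce the same sum.
total_from_list = sum(as_list)
total_from_iterator = sum(as_iterator)
```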


### A function expression: `lambda`

Typically, when using higher-order functions like `filter` and `map`, the function to pass as an argument does not yet exist and needs to be defined first.

The standard way of defining functions is through the `def` statement:

```python
def name(parameters):
    statements
```

Downsides of this construct include

- the syntax can be a bit bulky
- it requires coming up with a fresh name

These qualities can be quite bothersome when you just need a simple function of no particular significance for single use only.
In situations like this you might like to use a **lambda expression** instead.

A lambda expression is a specific kind of expression that evaluates to a function.
It looks like this:

```python
lambda parameters: expression # general form
lambda a, b, x: a * x + b # specific example
```

This latter lambda expression evaluates to a function that takes three arguments (`a`, `b`, `x`) and returns the value `a * x + b`.
Except for not having a name, it is equivalent to the function defined by

```python
def some_name(a, b, x):
    return a * x + b
```

A lambda expression need not necessarily be passed as an argument.
It can also be applied to arguments immediately, or assigned to a variable:
BethanyG marked this conversation as resolved.

```python
lambda a, b, x: a * x + b
# ⟹ <function <lambda> at 0x000001F36A274CC0>

(lambda a, b, x: a * x + b)(2, 3, 5)
# ⟹ 13

some_function = lambda a, b, x: a * x + b
some_function(2, 3, 5)
# ⟹ 13

list(filter(
    lambda s: len(s) <= 3,
    ["aaaa", "b", "ccccc", "dd", "eee"]
))
# ⟹ ['b', 'dd', 'eee']
```

The body of a lambda expression must be a single expression, so only functions that can be defined using a single `return` statement can be written as a lambda expression.
If you need multiple statements, you have no choice but to use `def`.
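
To illustrate the distinction, here is a minimal sketch with made-up data. The names `words` and `describe` are hypothetical and not part of the exercise.

```python
words = ["pear", "fig", "banana"]

# A single expression fits in a lambda.
by_length = sorted(words, key=lambda w: len(w))


# Two statements (an assignment and a return) require `def`.
def describe(word):
    length = len(word)
    return f"{word} has {length} letters"


descriptions = list(map(describe, words))
```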

The solution highlighted above assigns a lambda expression to a variable: `is_multiple`.
Some people consider this to be unidiomatic and feel one should always use `def` when a function is to have a name.
A lambda expression is used here anyway to demonstrate the feature, and also because the author prefers its compactness.

Had the highlighted solution not used `lambda`, it might have looked like this:

```python
def sum_of_multiples(limit, factors):
    def is_multiple(n):
        return any(n % f == 0 for f in factors if f != 0)

    return sum(filter(is_multiple, range(limit)))
```


### Built-in function: `any`

...


### A generator expression

...


## Reflections on this approach

An important advantage of this approach is that it is very easy to understand.
However, it suffers from potentially performing a lot of unnecessary work, for example when all `factors` are large, or when there are no `factors` at all.
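
The unnecessary work can be made visible by counting how often the predicate is evaluated. The helper below, `count_checks`, is a hypothetical instrumented variant of the solution and not part of it.

```python
def count_checks(limit, factors):
    """Return (sum of multiples, number of candidates examined)."""
    checks = 0

    def is_multiple(n):
        nonlocal checks
        checks += 1
        return any(n % f == 0 for f in factors if f != 0)

    total = sum(filter(is_multiple, range(limit)))
    return total, checks


# One large factor: only 0 and 999 are multiples, yet every candidate is checked.
large_factor = count_checks(1000, [999])

# No factors at all: the answer is trivially 0, yet every candidate is checked.
no_factors = count_checks(1000, [])
```

In both cases all `limit` candidates are examined, even though almost none of them contribute to the sum.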

<!-- TODO elaborate -->


[builtin-sum]: https://docs.python.org/3/library/functions.html#sum "Built-in Functions: sum"
[builtin-filter]: https://docs.python.org/3/library/functions.html#filter "Built-in Functions: filter"
@@ -0,0 +1,3 @@
def sum_of_multiples(limit, factors):
    is_multiple = lambda n: any([n % f == 0 for f in factors if f != 0])
    return sum(filter(is_multiple, range(limit)))
@BethanyG (Member) commented on Apr 2, 2023:

This is problematic, as it binds a name to a lambda, so needs re-work.

I think this is also referenced in the introduction. We'll need to remove all of them.

Contributor Author commented:

…I now see I also left a temporarily inserted list comprehension in there.

Would you rather have

def sum_of_multiples(limit, factors):
    return sum(filter(
        lambda n: any(n % f == 0 for f in factors if f != 0),
        range(limit)
    ))

and keep the explanation of lambda, or

def sum_of_multiples(limit, factors):
    def is_multiple (n):
        return any(n % f == 0 for f in factors if f != 0)

    return sum(filter(is_multiple, range(limit)))

Member commented:

Either work for me. The first might be preferable, considering that you've already done a good explanation of lambda. The second one would need an explanation of nested functions. Absolutely not opposed to doing that - but it is extra work for you.

lambda can sometimes be slower in filter or map, since it opens another stack frame. But that's sorta irrelevant to this particular exercise, methinks.

Contributor Author commented:

> lambda can sometimes be slower in filter or map, since it opens another stack frame.

Can you elaborate or link to a source on this? I had no idea.

Member commented:

The TL;DR is that lambda has all the overhead/execution of any other function call (here's a link to python tutor for the two examples above), so in any situation where you are using a lambda over a built-in, or in place of a generator expression that does the same/similar operation, you will incur the 'extra' overhead of the function call. Trivial in small/medium cases -- but it can add up for larger data sets.

And where you're converting that filter + lambda into a list or other 'realized' structure, it will be a lot slower than the corresponding comprehension or generator, due to the overhead of calling an additional function for every item in the list.

But this varies widely (since python 3.x returns iterators instead of lists) - if you don't need to realize the values and can consume them lazily (like in a call to sum()), then filter/map/reduce outperform comprehensions, and are mostly even for generators. But again, a generator that doesn't call an extra function will be faster than filter + lambda (because of the lambda function call).

Here are some articles - but many of them assume that values need to be realized, and so aren't really comparing apples to apples. The finxter blog (apologies for the aggressive ads there!) does do the comparisons for both realized and un-realized data - and you can really see the difference.

And I'll provide this for completeness, although its really mostly a rant about personal preference with readability, and not on performance: Trey Hunner: Stop Writing Lambda Expressions
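
For readers who want to check the claim about function-call overhead on their own machine, here is a minimal sketch using `timeit`. It is not taken from the thread or the linked articles; the limit, factors, and repetition count are arbitrary, and the timings will vary between machines and Python versions.

```python
from timeit import timeit

factors = (3, 5)
limit = 10_000


def with_filter_lambda():
    # One extra function call (the lambda) per candidate.
    return sum(filter(lambda n: any(n % f == 0 for f in factors), range(limit)))


def with_generator():
    # The same test written inline in a generator expression.
    return sum(n for n in range(limit) if any(n % f == 0 for f in factors))


# Both variants compute the same result, consumed lazily by `sum`.
same_result = with_filter_lambda() == with_generator()

t_filter = timeit(with_filter_lambda, number=20)
t_generator = timeit(with_generator, number=20)
print(f"filter + lambda: {t_filter:.4f}s, generator expression: {t_generator:.4f}s")
```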

Contributor Author commented:

Thanks. I haven't read all the links yet, but the gist certainly makes sense. I thought previously you meant that

def f(x): return f_body(x)

map(f, xs)
# be faster than
map(lambda x: f_body(x), xs)

That in-lined functions in comprehensions are faster than mapped functions I already expected.

120 changes: 120 additions & 0 deletions exercises/practice/sum-of-multiples/.approaches/introduction.md
@@ -0,0 +1,120 @@
# Introduction

<!-- TODO write a proper introduction -->

Several possible approaches to this exercise:

- filter for multiples
- generate multiples and
  - gather & de-duplicate, e.g. using `set().union`
  - merge the multiple-generators into one
- spot the repeating pattern


## Approach: `filter` for multiples

```python
def sum_of_multiples(limit, factors):
    is_multiple = lambda n: any(n % f == 0 for f in factors if f != 0)
    return sum(filter(is_multiple, range(limit)))
```

Probably the most straightforward way of solving this problem is to

1. look at every individual integer from `0` up to (but not including) `limit`,
2. check whether it is a multiple of any of the given `factors`, and
3. add it to the sum when it is.

An important advantage of this approach is that it is very easy to understand.
However, it suffers from potentially performing a lot of unnecessary work, for example when all `factors` are large, or when there are no `factors` at all.

[Read more about this approach][filter-for-multiples].


<!-- TODO improve section title -->
## Approach: generate & gather multiples

```python
def sum_of_multiples(limit, factors):
    multiples = (range(0, limit, f) for f in factors if f != 0)
    return sum(set().union(*multiples))
```

This approach occupies an egregious amount of memory when there are many multiples, because all of them are gathered in a `set` before being summed.

...


<!-- TODO improve section title -->
## Approach: merge the multiple-generators into one

```python
# NOTE This is a sketch (but it does work)
from itertools import zip_longest


def sum_of_multiples(limit, factors):
    generators = [range(0, limit, f) for f in factors if f != 0]
    # Merge the iterables pairwise until at most one remains.
    while len(generators) > 1:
        generators = [
            merge(g, g_)
            for g, g_ in zip_longest(generators[0::2], generators[1::2], fillvalue=())
        ]
    # Fall back to an empty tuple when there are no nonzero factors.
    all_multiples, *_ = generators + [()]
    return sum(all_multiples)


def merge(gen1, gen2):
    """Merge two sorted-without-duplicates iterables
    into a single sorted-without-duplicates generator.
    """
    return sorted({*gen1, *gen2})  # FIXME this is CHEATING
```

Once `merge` is implemented lazily instead of through `sorted`, this approach is supposed to use very little memory.
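
One possible way to merge lazily, offered only as a sketch and not as the approach the author has in mind, is to swap in the standard library's `heapq.merge` and drop consecutive duplicates with `itertools.groupby`. The function name `sum_of_multiples_lazy` is hypothetical.

```python
from heapq import merge as lazy_merge
from itertools import groupby


def sum_of_multiples_lazy(limit, factors):
    # Each range is already sorted and free of duplicates.
    ranges = [range(0, limit, f) for f in factors if f != 0]
    # heapq.merge yields the values of all inputs in sorted order, lazily.
    merged = lazy_merge(*ranges)
    # Equal values are adjacent after merging, so groupby de-duplicates them.
    unique = (value for value, _group in groupby(merged))
    return sum(unique)
```

For example, `sum_of_multiples_lazy(20, [3, 5])` visits 0, 3, 5, 6, 9, 10, 12, 15, 18 one at a time without building a list or set of all multiples.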

...


<!-- TODO: improve section title -->
## Approach: spot the repeating pattern

```python
# NOTE this too is but a sketch (that nevertheless works)
from itertools import takewhile
from math import lcm  # accepts any number of arguments since Python 3.9


def sum_of_multiples(limit, factors):
    (*factors,) = filter(lambda f: f != 0, factors)
    N = lcm(*factors)
    is_multiple = lambda n: any(n % f == 0 for f in factors)
    multiples_up_to_lcm = [n for n in range(1, N + 1) if is_multiple(n)]
    # q full blocks of length N, followed by a partial block of length r
    q, r = divmod(limit - 1, N)
    return (
        q * (q - 1) // 2 * N * len(multiples_up_to_lcm)
        + q * sum(multiples_up_to_lcm)
        + sum(q * N + m for m in takewhile(lambda m: m <= r, multiples_up_to_lcm))
    )
```

```text
assuming: limit = 22    factors = [2, 3]
the task is to sum the lower 4 rows below

| 1  2  3  4  5  6| 7  8  9 10 11 12|13 14 15 16 17 18|19 20 21
|    2  3  4     6|    6  6  6     6|    6  6  6     6|    6  6
|                 |    2  3  4     6|    6  6  6     6|    6  6
|                 |                 |    2  3  4     6|    6  6
|                 |                 |                 |    2  3

We see
                    3 copies of  - 2 3 4 - 6
0+1+2 = 3×(3-1)/2 = 3 copies of  - 6 6 6 - 6
                    3 copies of  - 6 6
                    1 copy of    - 2 3
```
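
The decomposition in the diagram can be checked by hand for `limit = 22` and factors `2` and `3`. The sketch below recomputes the three terms of the formula separately and compares their sum against a brute-force count; the variable names are illustrative only.

```python
N = 6                 # lcm(2, 3)
block = [2, 3, 4, 6]  # multiples of 2 or 3 in 1..N
q, r = divmod(22 - 1, N)  # 3 full blocks, partial block of length 3

# Copies of "6 6 6 6": 0 + 1 + 2 of them across the three full blocks.
offsets_in_full_blocks = q * (q - 1) // 2 * N * len(block)
# One copy of "2 3 4 6" per full block.
residues_in_full_blocks = q * sum(block)
# Partial block: each of 20 and 21 is 3 * 6 plus a residue of 2 or 3.
partial_block = sum(q * N + m for m in block if m <= r)

formula_total = offsets_in_full_blocks + residues_in_full_blocks + partial_block
brute_force = sum(n for n in range(22) if n % 2 == 0 or n % 3 == 0)
```

Both totals come out the same, with the three terms contributing 72, 45 and 41 respectively.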

<!-- TODO properly explain this stuff -->

This approach saves on a lot of iteration, but is still vulnerable to excessive memory use.
Fortunately it can be combined with the generator merging approach.

...



[filter-for-multiples]: https://exercism.org/tracks/python/exercises/sum-of-multiples/approaches/filter-for-multiples "Approach: filter for multiples"