Commit ad6e8d9: Init (initial commit)

7 files changed: +315 / -0 lines

README.md (+26)
# AWS Cookbook
Collection of AWS commands and scripts that I use on a regular basis.

- [S3](docs/s3.md)
- [Athena](docs/athena.md)
- [Glue](docs/glue.md)
- [SES](docs/ses.md)
- [IAM](docs/iam.md)
- [STS](docs/sts.md)

## Prerequisites
### Dependencies
- [**jq**](https://stedolan.github.io/jq/): A lightweight and flexible
  command-line JSON processor.
- [**parallel**](https://www.gnu.org/software/parallel/): A shell tool for
  executing jobs in parallel using one or more computers.
- [**s5cmd**](https://github.com/peak/s5cmd): A fast S3 command line tool
  written in Go.

### Config
In the **aws** CLI's `config` file, set the output format to `json`.

```
output = json
```
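
The AWS CLI can also write this setting for you:

```shell
# Sets "output = json" for the default profile in ~/.aws/config
aws configure set output json
```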

docs/athena.md (+25)
# Athena

## Caveats
Query result sets must be stored in an S3 bucket, and that bucket must be in
the same region as the Athena resources queried.

## Using a Local SQL Client to Query Athena
AWS provides
[JDBC Drivers](https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html)
for Athena.

Most SQL clients written in Java will work with them.
[DBeaver](https://dbeaver.io/) is one such client. It is also free.

Be aware that DBeaver won't be able to retrieve the result sets of queries
across sessions. Running the same query several times will only stack
result sets in the specified output bucket.

## Execute a query

```shell
aws athena start-query-execution \
  --query-string "SELECT * FROM database.table LIMIT 10;" \
  --result-configuration "OutputLocation=s3://output-bucket/"
```
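
`start-query-execution` returns a `QueryExecutionId` rather than the rows
themselves. A minimal sketch of checking on the query and fetching its results
(the state check is simplistic; a real script would poll in a loop):

```shell
# Capture the execution id of the query
query_id=$(aws athena start-query-execution \
  --query-string "SELECT * FROM database.table LIMIT 10;" \
  --result-configuration "OutputLocation=s3://output-bucket/" \
  | jq -r ".QueryExecutionId")

# Check the state (QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED)...
aws athena get-query-execution --query-execution-id "$query_id" \
  | jq -r ".QueryExecution.Status.State"

# ...and once it is SUCCEEDED, fetch the rows
aws athena get-query-results --query-execution-id "$query_id"
```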

docs/glue.md (+44)
# Glue
AWS Glue is a managed ETL service.

## ARNs
### Glue Catalog ARN Format
<pre>
arn:aws:glue:<i>region</i>:<i>account-id</i>:catalog
</pre>

Example:

`arn:aws:glue:eu-west-1:999999999999:catalog`

### Glue Database ARN Format
<pre>
arn:aws:glue:<i>region</i>:<i>account-id</i>:database/<i>database name</i>
</pre>

Example:

`arn:aws:glue:eu-west-1:999999999999:database/salesdb`

### Glue Table ARN Format
<pre>
arn:aws:glue:<i>region</i>:<i>account-id</i>:table/<i>database name</i>/<i>table name</i>
</pre>

Example:

`arn:aws:glue:eu-west-1:999999999999:table/salesdb/salestable`

## Listing All Glue Tables And Their S3 Location
Assuming the *account id* is 999999999999, and saving the output in file
`tables_location.json`:

```shell
(for database in $(aws glue get-databases --catalog-id 999999999999 | jq -r ".DatabaseList[].Name"); do
  for table in $(aws glue get-tables --database-name "$database" --catalog-id 999999999999 | jq -r ".TableList[].Name"); do
    aws glue get-table --database-name "$database" --name "$table" | jq -c ".Table | {database: .DatabaseName, name: .Name, location: .StorageDescriptor.Location}"
  done
done) | tee tables_location.json
```

This command also retrieves views, which are not tables. To filter out views,
use the `TableType` field, as sketched below.
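
Glue reports views with a `TableType` of `VIRTUAL_VIEW`, so a `select` in the
innermost `jq` call drops them (a sketch; only this call changes from the loop
above):

```shell
# Skip views: keep only entries whose TableType is not VIRTUAL_VIEW
aws glue get-table --database-name "$database" --name "$table" \
  | jq -c '.Table | select(.TableType != "VIRTUAL_VIEW")
           | {database: .DatabaseName, name: .Name, location: .StorageDescriptor.Location}'
```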

docs/iam.md (+28)
# IAM

## Searching a Given Pattern in all the Policies of an Account
Looking for `pattern`:

```shell
# "for" cannot bind two variables at once, so read the ARN and the default
# version id pairwise from a tab-separated stream instead
aws iam list-policies --scope Local \
  | jq -r '.Policies[] | [.Arn, .DefaultVersionId] | @tsv' \
  | while read -r arn version; do
      echo "$arn"
      aws iam get-policy-version --no-cli-pager --policy-arn "$arn" --version-id "$version" \
        | grep -i "pattern"
    done
```
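
`--no-cli-pager` stops the CLI from opening a pager on every
`get-policy-version` call inside the loop, and since `grep -i` only prints
matching lines, the `echo "$arn"` above it tells you which policy a match
came from.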

## Finding Who Has Access to a Folder in an S3 Bucket
This script generates a CSV file which lists the users of an account and tells
whether they have `s3:GetObject` access to the `/specific/folder` in the
`bucket_name` bucket or not.

```shell
(
  echo "User;Access allowed" && \
  for user in $(aws iam list-users --query 'Users[].Arn' --output text); do
    echo "${user};$(aws iam simulate-principal-policy --policy-source-arn "$user" \
      --action-names "s3:GetObject" \
      --resource-arns "arn:aws:s3:::bucket_name/specific/folder/*" \
      | jq -r ".EvaluationResults[].EvalDecision")"
  done
) | tee file.csv
```
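
The resulting `file.csv` looks something like this (ARNs are illustrative;
`EvalDecision` is one of `allowed`, `explicitDeny`, or `implicitDeny`):

```
User;Access allowed
arn:aws:iam::999999999999:user/alice;allowed
arn:aws:iam::999999999999:user/bob;implicitDeny
```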

docs/s3.md (+161)
# S3 (Simple Storage Service)
## Good to Know...
S3 is more of a key/value store where the value is the content of a file.
There's no real concept of folders in S3, but they are emulated by using
prefixes with `/` as folder separator in the key name.

## Deleting an object
With `aws s3`:

```shell
aws s3 rm s3://bucket_name/object_key
```

With `aws s3api`:

```shell
aws s3api delete-object --bucket bucket_name --key object_key
```

## Listing all older versions of all objects
It is highly recommended to save the output in a file, as it can be huge and
slow to obtain on large buckets.

```shell
aws s3api list-object-versions \
  --bucket bucket_name \
  | jq '[.Versions[] | select(.IsLatest == false)]' \
  | jq -c ".[] | {Key, VersionId}" > bucket_older_versions.json
```

### Filtering on a specific prefix

Use option `--prefix`:

```shell
aws s3api list-object-versions \
  --bucket bucket_name \
  --prefix 'prefix/' \
  | jq '[.Versions[] | select(.IsLatest == false)]' \
  | jq -c ".[] | {Key, VersionId}" > bucket_older_versions.json
```

### Delete all older versions of all objects
Assuming we ran the above commands and saved the output in a file named
`bucket_older_versions.json`, and using 20 parallel jobs:

```shell
cat bucket_older_versions.json | parallel -j 20 --linebuffer '
  key=$(echo {} | jq -r ".Key")
  versionId=$(echo {} | jq -r ".VersionId")
  aws s3api delete-object --bucket bucket_name --key "$key" --version-id "$versionId"
'
```
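
GNU parallel substitutes each input line for `{}`, and `--linebuffer` keeps
the output of concurrent jobs from interleaving mid-line. Note that on
versioned buckets, `list-object-versions` also returns delete markers, under
`.DeleteMarkers` rather than `.Versions`; a sketch to collect the old ones
too, so the same delete loop can process them:

```shell
# Delete markers have the same {Key, VersionId} shape as versions
aws s3api list-object-versions \
  --bucket bucket_name \
  | jq -c '.DeleteMarkers[]? | select(.IsLatest == false) | {Key, VersionId}' \
  >> bucket_older_versions.json
```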

## List all objects from only the root folder of a bucket
```shell
aws s3api list-objects-v2 \
  --bucket bucket_name \
  --delimiter '/' \
  --prefix ''
```
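
With `--delimiter '/'`, keys under deeper "folders" are not listed
individually but collapsed into `CommonPrefixes`. A sketch to extract only the
top-level "folder" names from that response:

```shell
aws s3api list-objects-v2 \
  --bucket bucket_name \
  --delimiter '/' \
  --prefix '' \
  | jq -r ".CommonPrefixes[].Prefix"
```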

## Computing size of objects in a bucket
First generate a full listing of all objects in a bucket and save it in a file.

```shell
aws s3 ls --recursive s3://bucket_name > bucket_name_listing.txt
```

The advantage of pregenerating the listing is that it can be used multiple times
without having to query the S3 API (and get charged) again and again. It is also
much faster to process a local file.

The downside is the initial time to generate that listing, and the fact that it
may not be up to date. So use this approach mostly for very large buckets.

### Total size of all objects of the bucket
```shell
awk '{sum+=$3} END {print sum}' bucket_name_listing.txt
```

To get the size in MB, divide by (1024 * 1024). For sizes in GB, divide by
(1024 * 1024 * 1024).

Total size of all objects in GB:

```shell
awk '{sum+=$3} END {print sum/(1024*1024*1024)}' bucket_name_listing.txt
```

### Total Size of Objects in a Specific Folder In GB
```shell
grep 'folder_name/' bucket_name_listing.txt \
  | awk '{sum+=$3} END {print sum/(1024*1024*1024)}'
```
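
Note that `grep 'folder_name/'` matches the prefix anywhere in the key. To
count only keys that start with the folder, one option is to anchor the match
on the key field itself (field 4 in `aws s3 ls --recursive` output):

```shell
awk '$4 ~ /^folder_name\// {sum+=$3} END {print sum/(1024*1024*1024)}' bucket_name_listing.txt
```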

## Exploring a Bucket's Access Logs
If a `bucket_name` bucket has access logs enabled, and the logs are stored at
`s3://bucket_logs/bucket_name/`, then it is possible to query them with an
**Athena table**.

Create the table with the following SQL:

```sql
CREATE EXTERNAL TABLE bucket_name_access_logs (
  bucketowner STRING,
  bucket_name STRING,
  requestdatetime STRING,
  remoteip STRING,
  requester STRING,
  requestid STRING,
  operation STRING,
  key STRING,
  request_uri STRING,
  httpstatus STRING,
  errorcode STRING,
  bytessent BIGINT,
  objectsize BIGINT,
  totaltime STRING,
  turnaroundtime STRING,
  referrer STRING,
  useragent STRING,
  versionid STRING,
  hostid STRING,
  sigv STRING,
  ciphersuite STRING,
  authtype STRING,
  endpoint STRING,
  tlsversion STRING,
  accesspointarn STRING,
  aclrequired STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex'='([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucket_logs/bucket_name/'
```

Then looking for logs is straightforward, but there's a little caveat: the
`requestdatetime` field is not in a `datetime` friendly format. To convert it,
use:

```sql
parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z')
```

For example, to get the most recent GET requests on the bucket:

```sql
SELECT
  *
FROM
  bucket_name_access_logs
WHERE
  operation LIKE 'REST.GET.%'
ORDER BY
  parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z') DESC
```
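
Sorting the whole table can be slow on busy buckets; it may help to filter on
a time window instead, using the same conversion (a sketch; the 7-day window
is arbitrary):

```sql
SELECT
  *
FROM
  bucket_name_access_logs
WHERE
  operation LIKE 'REST.GET.%'
  AND parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z') > now() - interval '7' day
```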

docs/ses.md (+15)
# SES (Simple Email Service)

## Request production access for SES
Assuming access to the AWS account with profile **my_profile** (the
`--website-url` value doesn't matter much):

```shell
aws sesv2 put-account-details \
  --profile my_profile \
  --production-access-enabled \
  --mail-type TRANSACTIONAL \
  --website-url https://your.website.com \
  --use-case-description "describe your usecase" \
  --additional-contact-email-addresses [email protected] \
  --contact-language EN
```
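
Production access is not granted immediately; AWS reviews the request. To
check whether it has been enabled (field per the `sesv2 get-account`
response):

```shell
aws sesv2 get-account --profile my_profile | jq ".ProductionAccessEnabled"
```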

docs/sts.md (+16)
# STS (Security Token Service)

## Get a session token for an MFA device
```shell
aws sts get-session-token \
  --serial-number arn:aws:iam::999999999999:mfa/mfa.device.name \
  --token-code [TOKEN]
```

The response contains temporary credentials; all three of them must be
exported, not just the session token:

```shell
export AWS_ACCESS_KEY_ID=access_key_from_response
export AWS_SECRET_ACCESS_KEY=secret_key_from_response
export AWS_SESSION_TOKEN=token_from_response
```
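
A sketch to capture and export all three in one step (uses the `jq`
dependency from the README; `[TOKEN]` is the current MFA code):

```shell
creds=$(aws sts get-session-token \
  --serial-number arn:aws:iam::999999999999:mfa/mfa.device.name \
  --token-code [TOKEN] \
  | jq -r ".Credentials")
export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r ".AccessKeyId")
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r ".SecretAccessKey")
export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r ".SessionToken")
```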

But it is best to use proper AWS CLI profiles.
