
Feature: Blob filter #1595

Merged: 27 commits, Jan 16, 2025

Changes from 16 commits

Commits (27)
165fe24
Integrating blob filter in azstorage
vibhansa-msft Dec 17, 2024
f1d1b99
Adding blob filter support
vibhansa-msft Dec 19, 2024
e79dd84
Sync with 2.4.1 branch
vibhansa-msft Dec 19, 2024
fc253ad
revert directory addtion change
vibhansa-msft Dec 19, 2024
2d41771
Adding UT for filtering
vibhansa-msft Dec 20, 2024
45a5ea1
Update change log and readme
vibhansa-msft Dec 20, 2024
ce48ad7
Update change log and readme
vibhansa-msft Dec 20, 2024
91dc8e9
Updated
vibhansa-msft Dec 20, 2024
994fc10
Correcting ut
vibhansa-msft Dec 20, 2024
a8bc7c7
Updating ut
vibhansa-msft Dec 20, 2024
d17fb30
Updated ut
vibhansa-msft Dec 20, 2024
9fc2985
Correcting ut
vibhansa-msft Dec 20, 2024
dd1504d
Correcting ut
vibhansa-msft Dec 23, 2024
f6c2d0c
Spell correction
vibhansa-msft Dec 23, 2024
c6c9735
Adding filter tests
vibhansa-msft Jan 3, 2025
6009ca2
Merge branch 'blobfuse/2.4.1' into vibhansa/blobfilter
vibhansa-msft Jan 3, 2025
c68fd78
Correcting UT
vibhansa-msft Jan 4, 2025
89e00e9
Sync with main
vibhansa-msft Jan 13, 2025
c787d22
Sync with feature branch
vibhansa-msft Jan 15, 2025
07e1eeb
Sync with feature branch
vibhansa-msft Jan 15, 2025
2edc9a7
Corrected varaible names
vibhansa-msft Jan 15, 2025
dc2432f
Rename variable as per review comments
vibhansa-msft Jan 15, 2025
3491e61
Update external API name as per review comments
vibhansa-msft Jan 15, 2025
91903b2
Sync with feature branch
vibhansa-msft Jan 15, 2025
44d1d94
Merge branch 'blobfuse/2.4.1' into vibhansa/blobfilter
vibhansa-msft Jan 15, 2025
4af191c
Correcting filter related test cases
vibhansa-msft Jan 16, 2025
75b06f7
Lint error fix
vibhansa-msft Jan 16, 2025
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,9 @@
**Bug Fixes**
- Create block pool only in the child process.

**Features**
- Mount a container or directory while restricting which blobs are visible. This feature is available only for read-only mounts.

## 2.4.0 (Unreleased)
**Features**
- Added 'gen-config' command to auto generate the recommended blobfuse2 config file based on computing resources and memory available on the node. Command details can be found with `blobfuse2 gen-config --help`.
33 changes: 33 additions & 0 deletions NOTICE
@@ -4093,4 +4093,37 @@ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.






****************************************************************************

============================================================================
>>> github.com/vibhansa-msft/blobfilter
==============================================================================

MIT License

Copyright (c) 2024 Vikas Bhansali

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


--------------------- END OF THIRD PARTY NOTICE --------------------------------
28 changes: 28 additions & 0 deletions README.md
@@ -66,6 +66,7 @@ One of the biggest BlobFuse2 features is our brand new health monitor. It allows
- Set MD5 sum of a blob while uploading
- Validate MD5 sum on download and fail file open on mismatch
- Large file writing through write Block-Cache
- Blob filter to expose only the files matching given criteria on a read-only mount

## Blobfuse2 performance compared to blobfuse(v1.x.x)
- 'git clone' operation is 25% faster (tested with vscode repo cloning)
@@ -139,6 +140,7 @@ To learn about a specific command, just include the name of the command (For exa
* `--wait-for-mount=<TIMEOUT IN SECONDS>` : Let parent process wait for given timeout before exit to ensure child has started.
* `--block-cache` : To enable block-cache instead of file-cache. This works only when mounted without any config file.
* `--lazy-write` : To enable async close file handle call and schedule the upload in background.
* `--filter=<STRING>`: Enable blob filters on a read-only mount to restrict which blobs the user can see or read.
- Attribute cache options
* `--attr-cache-timeout=<TIMEOUT IN SECONDS>`: The timeout for the attribute cache entries.
* `--no-symlinks=true`: To improve performance disable symlink support.
@@ -235,6 +237,32 @@ Below diagrams guide you to choose right configuration for your workloads.
- [Sample Block-Cache Config](./sampleBlockCacheConfig.yaml)
- [All Config options](./setup/baseConfig.yaml)

## Blob Filter
- For a read-only mount, the user can configure a filter to restrict which blobs the mount can see or operate on.
- Blobfuse supports filters based on:
  - Name
  - Size
  - Last modified time
  - File extension
- Blob name based filter
  - Supported operations are "=" and "!="
  - The name must be a valid regular expression
  - e.g. ```filter=name=^mine[0-1]\\d{3}.*```
- Size based filter
  - Supported operations are "<=", ">=", "!=", "<", ">" and "="
  - The size must be provided in bytes
  - e.g. ```filter=size > 1000```
- Last modified date based filter
  - Supported operations are "<=", ">=", "<", ">" and "="
  - The date must be provided in RFC1123 format, e.g. "Mon, 24 Jan 1982 13:00:00 UTC"
  - e.g. ```filter=modtime>Mon, 24 Jan 1982 13:00:00 UTC```
- File extension based filter
  - Supported operations are "=" and "!="
  - The extension is supplied as a plain string; do not include "." in the filter
  - e.g. ```--filter=format=pdf```
- Multiple filters can be combined using the '&&' and '||' operators; grouping with '()' is not supported yet.
  - e.g. ```--filter=name=^testfil.* && size>130000000```
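Assuming a config-file based setup, the same option can be supplied through the `filter` key of the `azstorage` component (this PR binds it to the `--filter` CLI flag), together with a read-only mount, which the feature requires. The account and container values below are placeholders, and the surrounding layout is an illustrative sketch rather than a complete config:

```yaml
# Illustrative sketch - account/container values are placeholders.
read-only: true

azstorage:
  type: block
  account-name: myaccount
  container: mycontainer
  # Show only blobs matching the name pattern AND larger than 1000 bytes
  filter: "name=^mine[0-1]\\d{3}.* && size>1000"
```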


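The filter grammar above can be sketched as a tiny evaluator. This is an illustrative reimplementation, not the actual `github.com/vibhansa-msft/blobfilter` code: the clause splitting, the omission of the `modtime` filter, and the assumption that `&&` binds tighter than `||` are all simplifications for demonstration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// blobAttr mirrors the subset of blob attributes the filter inspects.
type blobAttr struct {
	name string
	size int64
}

// evalClause evaluates a single "key<op>value" clause against attr.
func evalClause(expr string, attr blobAttr) (bool, error) {
	// Multi-character operators are tried first so "<=" is not split as "<" + "=".
	for _, op := range []string{"<=", ">=", "!=", "<", ">", "="} {
		idx := strings.Index(expr, op)
		if idx < 0 {
			continue
		}
		key := strings.TrimSpace(expr[:idx])
		val := strings.TrimSpace(expr[idx+len(op):])
		switch key {
		case "name": // value is a regular expression
			matched, err := regexp.MatchString(val, attr.name)
			if err != nil {
				return false, err
			}
			return matched == (op == "="), nil
		case "size": // value is a byte count
			n, err := strconv.ParseInt(val, 10, 64)
			if err != nil {
				return false, err
			}
			switch op {
			case "<=":
				return attr.size <= n, nil
			case ">=":
				return attr.size >= n, nil
			case "!=":
				return attr.size != n, nil
			case "<":
				return attr.size < n, nil
			case ">":
				return attr.size > n, nil
			}
			return attr.size == n, nil
		case "format": // value is an extension without the leading "."
			parts := strings.Split(attr.name, ".")
			return (parts[len(parts)-1] == val) == (op == "="), nil
		}
		break
	}
	return false, fmt.Errorf("unsupported filter clause: %q", expr)
}

// isAcceptable combines clauses with "&&" and "||". Here "&&" is assumed to
// bind tighter than "||"; "()" grouping is not supported, matching the README.
func isAcceptable(filter string, attr blobAttr) (bool, error) {
	for _, orPart := range strings.Split(filter, "||") {
		accepted := true
		for _, andPart := range strings.Split(orPart, "&&") {
			ok, err := evalClause(strings.TrimSpace(andPart), attr)
			if err != nil {
				return false, err
			}
			if !ok {
				accepted = false
				break
			}
		}
		if accepted {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ok, _ := isAcceptable("name=^abcd.* && size>100", blobAttr{name: "abcd1.txt", size: 2000})
	fmt.Println(ok) // true
	ok, _ = isAcceptable("format=pdf", blobAttr{name: "report.txt", size: 10})
	fmt.Println(ok) // false
}
```

Note that, mirroring the List implementations in this PR, a real mount would bypass the evaluator for directories entirely; only file attributes are ever passed to the filter.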
## Frequently Asked Questions
- How do I generate a SAS with permissions for rename?
3 changes: 3 additions & 0 deletions component/azstorage/azstorage.go
@@ -665,6 +665,9 @@ func init() {
preserveACL := config.AddBoolFlag("preserve-acl", false, "Preserve ACL and Permissions set on file during updates")
config.BindPFlag(compName+".preserve-acl", preserveACL)

blobFilter := config.AddStringFlag("filter", "", "Filter string to match blobs")
config.BindPFlag(compName+".filter", blobFilter)

config.RegisterFlagCompletionFunc("container-name", func(cmd *cobra.Command, args []string, toComplete string) ([]string, cobra.ShellCompDirective) {
return nil, cobra.ShellCompDirectiveNoFileComp
})
35 changes: 31 additions & 4 deletions component/azstorage/block_blob.go
@@ -57,6 +57,7 @@ import (
"github.com/Azure/azure-storage-fuse/v2/common/log"
"github.com/Azure/azure-storage-fuse/v2/internal"
"github.com/Azure/azure-storage-fuse/v2/internal/stats_manager"
"github.com/vibhansa-msft/blobfilter"
)

const (
@@ -456,7 +457,6 @@ func (bb *BlockBlob) getAttrUsingRest(name string) (attr *internal.ObjAttr, err
}

parseMetadata(attr, prop.Metadata)

attr.Flags.Set(internal.PropFlagModeDefault)

return attr, nil
@@ -516,10 +516,23 @@ func (bb *BlockBlob) GetAttr(name string) (attr *internal.ObjAttr, err error) {

// To support virtual directories with no marker blob, we call list instead of get properties since list will not return a 404
if bb.Config.virtualDirectory {
return bb.getAttrUsingList(name)
attr, err = bb.getAttrUsingList(name)
} else {
attr, err = bb.getAttrUsingRest(name)
}

if bb.Config.filter != nil && attr != nil {
if !bb.Config.filter.IsFileAcceptable(&blobfilter.BlobAttr{
Name: attr.Name,
Mtime: attr.Mtime,
Size: attr.Size,
}) {
log.Debug("BlockBlob::GetAttr : Filtered out %s", name)
return nil, syscall.ENOENT
}
}

return bb.getAttrUsingRest(name)
return attr, err
}

// List : Get a list of blobs matching the given prefix
@@ -578,6 +591,7 @@ func (bb *BlockBlob) List(prefix string, marker *string, count int32) ([]*intern

// For some directories the 0-byte meta file may not exist, so create a map to identify such directories
var dirList = make(map[string]bool)
filterAttr := blobfilter.BlobAttr{}
for _, blobInfo := range listBlob.Segment.BlobItems {
attr := &internal.ObjAttr{}
if blobInfo.Properties.CustomerProvidedKeySHA256 != nil && *blobInfo.Properties.CustomerProvidedKeySHA256 != "" {
@@ -601,15 +615,28 @@
MD5: blobInfo.Properties.ContentMD5,
}
parseMetadata(attr, blobInfo.Metadata)

attr.Flags.Set(internal.PropFlagModeDefault)
}
blobList = append(blobList, attr)

if attr.IsDir() {
// 0 byte meta found so mark this directory in map
dirList[*blobInfo.Name+"/"] = true
attr.Size = 4096
}

if bb.Config.filter != nil && !attr.IsDir() {
filterAttr.Name = attr.Name
filterAttr.Mtime = attr.Mtime
filterAttr.Size = attr.Size
if bb.Config.filter.IsFileAcceptable(&filterAttr) {
blobList = append(blobList, attr)
} else {
log.Debug("BlockBlob::List : Filtered out blob %s", attr.Name)
}
} else {
blobList = append(blobList, attr)
}
}

// In case virtual directory exists but its corresponding 0 byte marker file is not there holding hdi_isfolder then just iterating
67 changes: 66 additions & 1 deletion component/azstorage/block_blob_test.go
@@ -66,6 +66,7 @@ import (
"github.com/Azure/azure-storage-fuse/v2/common/log"
"github.com/Azure/azure-storage-fuse/v2/internal"
"github.com/Azure/azure-storage-fuse/v2/internal/handlemap"
"github.com/vibhansa-msft/blobfilter"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/suite"
@@ -174,7 +175,6 @@ func newTestAzStorage(configuration string) (*AzStorage, error) {
_ = config.ReadConfigFromReader(strings.NewReader(configuration))
az := NewazstorageComponent()
err := az.Configure(true)

return az.(*AzStorage), err
}

@@ -3387,6 +3387,71 @@ func (suite *blockBlobTestSuite) TestTruncateNoBlockFileToLarger() {
suite.UtilityFunctionTruncateFileToLarger(200*MB, 300*MB)
}

func (s *blockBlobTestSuite) TestBlobFilters() {
defer s.cleanupTest()
// Setup
var err error
name := generateDirectoryName()
err = s.az.CreateDir(internal.CreateDirOptions{Name: name})
s.assert.Nil(err)
_, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd1.txt"})
s.assert.Nil(err)
_, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd2.txt"})
s.assert.Nil(err)
_, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd3.txt"})
s.assert.Nil(err)
_, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/abcd4.txt"})
s.assert.Nil(err)
_, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/bcd1.txt"})
s.assert.Nil(err)
_, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/cd1.txt"})
s.assert.Nil(err)
_, err = s.az.CreateFile(internal.CreateFileOptions{Name: name + "/d1.txt"})
s.assert.Nil(err)
err = s.az.CreateDir(internal.CreateDirOptions{Name: name + "/subdir"})
s.assert.Nil(err)

var iteration int = 0
var marker string = ""
blobList := make([]*internal.ObjAttr, 0)

for {
new_list, new_marker, err := s.az.StreamDir(internal.StreamDirOptions{Name: name + "/", Token: marker, Count: 50})
s.assert.Nil(err)
blobList = append(blobList, new_list...)
marker = new_marker
iteration++

log.Debug("AzStorage::ReadDir : So far retrieved %d objects in %d iterations", len(blobList), iteration)
if new_marker == "" {
break
}
}
s.assert.EqualValues(8, len(blobList))

filter := &blobfilter.BlobFilter{}
s.az.storage.(*BlockBlob).Config.filter = filter

err = filter.Configure("name=^abcd.*")
s.assert.Nil(err)
blobList = make([]*internal.ObjAttr, 0)
for {
new_list, new_marker, err := s.az.StreamDir(internal.StreamDirOptions{Name: name + "/", Token: marker, Count: 50})
s.assert.Nil(err)
blobList = append(blobList, new_list...)
marker = new_marker
iteration++

log.Debug("AzStorage::ReadDir : So far retrieved %d objects in %d iterations", len(blobList), iteration)
if new_marker == "" {
break
}
}
// Only 4 files match the pattern, but the directory is returned as well since directories are not filtered by blobfilter
s.assert.EqualValues(5, len(blobList))
s.az.stConfig.filter = nil
}

func (suite *blockBlobTestSuite) UtilityFunctionTestTruncateFileToSmaller(size int, truncatedLength int) {
defer suite.cleanupTest()
// Setup
27 changes: 27 additions & 0 deletions component/azstorage/config.go
@@ -42,6 +42,7 @@ import (
"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob"
"github.com/Azure/azure-storage-fuse/v2/common/config"
"github.com/Azure/azure-storage-fuse/v2/common/log"
"github.com/vibhansa-msft/blobfilter"

"github.com/JeffreyRichter/enum/enum"
)
@@ -187,6 +188,7 @@ type AzStorageOptions struct {
CPKEncryptionKey string `config:"cpk-encryption-key" yaml:"cpk-encryption-key"`
CPKEncryptionKeySha256 string `config:"cpk-encryption-key-sha256" yaml:"cpk-encryption-key-sha256"`
PreserveACL bool `config:"preserve-acl" yaml:"preserve-acl"`
Filter string `config:"filter" yaml:"filter"`

// v1 support
UseAdls bool `config:"use-adls" yaml:"-"`
@@ -504,6 +506,12 @@ func ParseAndValidateConfig(az *AzStorage, opt AzStorageOptions) error {
}

az.stConfig.preserveACL = opt.PreserveACL
if opt.Filter != "" {
err = configureBlobFilter(az, opt)
if err != nil {
return err
}
}

log.Crit("ParseAndValidateConfig : account %s, container %s, account-type %s, auth %s, prefix %s, endpoint %s, MD5 %v %v, virtual-directory %v, disable-compression %v, CPK %v",
az.stConfig.authConfig.AccountName, az.stConfig.container, az.stConfig.authConfig.AccountType, az.stConfig.authConfig.AuthMode,
@@ -517,6 +525,25 @@
return nil
}

func configureBlobFilter(az *AzStorage, opt AzStorageOptions) error {
readonly := false
_ = config.UnmarshalKey("read-only", &readonly)
if !readonly {
log.Err("configureBlobFilter: Blob filters are supported only in read-only mode")
return errors.New("blobfilter is supported only in read-only mode")
}

az.stConfig.filter = &blobfilter.BlobFilter{}
err := az.stConfig.filter.Configure(opt.Filter)
if err != nil {
log.Err("configureBlobFilter : Failed to configure blob filter %s", err.Error())
return errors.New("failed to configure blob filter")
}

log.Crit("configureBlobFilter : Blob filter configured %s", opt.Filter)
return nil
}

// ParseAndReadDynamicConfig : On config change read only the required config
func ParseAndReadDynamicConfig(az *AzStorage, opt AzStorageOptions, reload bool) error {
log.Trace("ParseAndReadDynamicConfig : Reparsing config")
4 changes: 4 additions & 0 deletions component/azstorage/connection.go
@@ -40,6 +40,7 @@ import (
"github.com/Azure/azure-storage-fuse/v2/common"
"github.com/Azure/azure-storage-fuse/v2/common/log"
"github.com/Azure/azure-storage-fuse/v2/internal"
"github.com/vibhansa-msft/blobfilter"
)

// Example for azblob usage : https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob#pkg-examples
@@ -82,6 +83,9 @@ type AzStorageConfig struct {
cpkEnabled bool
cpkEncryptionKey string
cpkEncryptionKeySha256 string

// Blob filters
filter *blobfilter.BlobFilter
}

type AzStorageConnection struct {
27 changes: 26 additions & 1 deletion component/azstorage/datalake.go
@@ -47,6 +47,7 @@ import (
"github.com/Azure/azure-storage-fuse/v2/common"
"github.com/Azure/azure-storage-fuse/v2/common/log"
"github.com/Azure/azure-storage-fuse/v2/internal"
"github.com/vibhansa-msft/blobfilter"

"github.com/Azure/azure-sdk-for-go/sdk/azcore"
"github.com/Azure/azure-sdk-for-go/sdk/azcore/to"
@@ -417,6 +418,17 @@ func (dl *Datalake) GetAttr(name string) (attr *internal.ObjAttr, err error) {
}
}

if dl.Config.filter != nil {
if !dl.Config.filter.IsFileAcceptable(&blobfilter.BlobAttr{
Name: attr.Name,
Mtime: attr.Mtime,
Size: attr.Size,
}) {
log.Debug("Datalake::GetAttr : Filtered out %s", name)
return nil, syscall.ENOENT
}
}

return attr, nil
}

@@ -529,8 +541,21 @@ func (dl *Datalake) List(prefix string, marker *string, count int32) ([]*interna
// Any method that populates the metadata should set the attribute flag.
// Alternatively, if you want Datalake list paths to return metadata/properties as well.
// pass CLI parameter --no-symlinks=false in the mount command.
pathList = append(pathList, attr)

// We filter only files for now, so if it is a directory add it to the return list
if dl.Config.filter != nil && (attr.Mode&os.ModeDir) == 0 {
filterAttr := blobfilter.BlobAttr{}
filterAttr.Name = attr.Name
filterAttr.Mtime = attr.Mtime
filterAttr.Size = attr.Size
if dl.Config.filter.IsFileAcceptable(&filterAttr) {
pathList = append(pathList, attr)
} else {
log.Debug("Datalake::List : Filtered out %s", *pathInfo.Name)
}
} else {
pathList = append(pathList, attr)
}
}

return pathList, listPath.Continuation, nil