-
Notifications
You must be signed in to change notification settings - Fork 97
[REEF-2036] Add IFileSystem api to check if uri is a directory #1473
base: master
Are you sure you want to change the base?
Conversation
public bool IsDirectory(Uri uri) | ||
{ | ||
var path = uri.AbsolutePath.Trim('/'); | ||
var blobItems = _client.ListBlobsSegmented(path, false, BlobListingDetails.Metadata, 1, null, null, null).Results; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 [](start = 97, length = 1)
1 [](start = 97, length = 1)
super-nit: Make a static variable. It helps self document the code. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static wont apply to local variable. Not sure if you wanted to make this a class variable. I have used const only
In reply to: 195218006 [](ancestors = 195218006,194580367,194554180)
/// <returns>true if uri is for a directory else false</returns> | ||
public bool IsDirectory(Uri uri) | ||
{ | ||
return Exists(uri) && _adlsClient.GetDirectoryEntry(uri.AbsolutePath).Type == DirectoryEntryType.DIRECTORY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exists(uri) && [](start = 19, length = 14)
Exists(uri) && [](start = 19, length = 14)
Call Exists doubles the number of network calls. Can we just rely on the GetDirectoryEntry alone? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes the double network call but the catch all could hide simple exceptions like a network error.
In reply to: 194580374 [](ancestors = 194580374,194554633)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the change reverted? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_adlsClient.GetDirectoryEntry(uri.AbsolutePath) throws exception if uri not present
In reply to: 195886246 [](ancestors = 195886246)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if you look at the source for adls client, https://github.com/Azure/azure-data-lake-store-net/blob/master/AdlsDotNetSDK/ADLSClient.cs
CheckExists internally call GetDirectoryEntry, so you are calling it twice. Can you model what the CheckExists code does internally? That may help with handling the specific case.
In reply to: 196918926 [](ancestors = 196918926,195886246)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public bool IsDirectory(Uri uri) | ||
{ | ||
var stdOut = _commandRunner.Run("dfs -count " + uri).StdOut; | ||
return stdOut.Any() && stdOut.First().StartsWith("1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.StartsWith("1"); [](start = 49, length = 17)
Elaborate this check. What if it the result is "11"? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -295,6 +290,20 @@ public void TestDeleteDirectorySecondLevelE2E() | |||
Assert.True(CheckContainerExists(_container)); | |||
} | |||
|
|||
private List<CloudBlockBlob> SetupBlobs(string directory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SetupBlobs [](start = 37, length = 10)
nit: SetupBlobs doesn't really let me know what this function is doing. I suggest renaming something more explanitory and also exposing the blob count as well. maybe something like CreateTempBlobs(string directory, int blobCount=3) #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string dirName = $"/{_defaultFolderName}"; | ||
string fakeDirName = $"/fakeDir"; | ||
_adlsClient.CreateDirectory(dirName); | ||
string fileName = UploadFromString(ContentsText); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supernit: try to add newlines to break out the logic. Giant block of code are difficult to follow. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert.False(_fileSystem.IsDirectory(remoteFileUri)); | ||
_fileSystem.DeleteDirectory(remoteDirUri); | ||
_fileSystem.Delete(remoteFileUri); | ||
File.Delete(localFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cleanup should be handled in the test cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but please avoid in the future if possible. Please break up this code block for readability as well.
In reply to: 194580410 [](ancestors = 194580410,194556624)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🕐
@@ -228,6 +229,21 @@ public void TestCopyFromLocalE2E() | |||
container.DeleteIfExistsAsync().Wait(); | |||
} | |||
|
|||
[Fact(Skip = SkipMessage)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is following the existing pattern -- but instead of having to remove all the skips and then set credentials, can we change this to dynamically skip the test if the credentials are not set? It looks like xunit does support a SkipTestException: https://github.com/xunit/samples.xunit/blob/master/DynamicSkipExample/SkipTestException.cs
Not sure if this is supported in our version of xunit or not. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also put the check in a single function and call it before the test. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SkippableFact is not supported in our version of xunit.
Yopu dont need to change each SkipMessage.Just set SkipMessage to null at top and all tests will run. These tests can only be run locally and a valid connectionstring is required to be inserted. I dont see how making this dynamic would be any help since the conn string validation will be run everytime in a build pipeline and it will always be expected to fail
In reply to: 195885915 [](ancestors = 195885915)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value can only be a const string.
You dont need to change each SkipMessage.Just set SkipMessage to null at top and all tests will run. These tests can only be run locally and a valid connectionstring is required to be inserted. I dont see how making this dynamic would be any help since the conn string validation will be run everytime in a build pipeline and it will always be expected to fail
In reply to: 195886005 [](ancestors = 195886005)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK sounds good, it maybe worth making that a bit more clear above ( there is a comment off to the side that says use null to run tests. It maybe more clear to have another line under SkipMessage with SkipMessage = null commented out and a comment above to uncomment to run skipped tests.
In reply to: 196930627 [](ancestors = 196930627,195885915)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping this as it is as discussed offline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate? We might need to refer to this PR in a future when we have all forgotten what was discussed offline :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scott said it is fine to have the current line.
private const string SkipMessage = "Fill in credentials before running test"; // Use null to run tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did -- so setting SkipMessage=null has the same effect as the SkipTestException since our version of XUnit does not support it. But I think the comment of "// Use null to run tests" could be a bit more explicit. This is more what I was thinking:
SkipMessage="....";
//Uncomment SkipMessage=null to run tests
//SkipMessage=null
``` #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as suggested
/// <returns>true if uri is for a directory else false</returns> | ||
public bool IsDirectory(Uri uri) | ||
{ | ||
var path = uri.AbsolutePath.Trim('/'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the uri need to trim the '/'? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ListBlobsSegmented does not work is there is a '/' at the beginining of the path. I have changed to AbsolutePath.TrimStart('/')
In reply to: 195886086 [](ancestors = 195886086)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// <returns>true if uri is for a directory else false</returns> | ||
public bool IsDirectory(Uri uri) | ||
{ | ||
return Exists(uri) && _adlsClient.GetDirectoryEntry(uri.AbsolutePath).Type == DirectoryEntryType.DIRECTORY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the change reverted? #Resolved
@@ -142,6 +142,18 @@ public void CreateDirectory(Uri directoryUri) | |||
_commandRunner.Run("dfs -mkdir " + directoryUri); | |||
} | |||
|
|||
public bool IsDirectory(Uri uri) | |||
{ | |||
var stdOut = _commandRunner.Run("dfs -count " + uri).StdOut; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to do this? Please see here:
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#test
hdfs dfs -test -d will check to see if the path is a directory. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had tried to use this initially but it does not return any value. It sets the value in $? but I am not sure how to run that command using the commandrunner
In reply to: 195886573 [](ancestors = 195886573)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So $? is a linux variable that contains the value of the last run command. This will be 0 for success or 1 for failure. Talking with you offline, it looks like %errorlevel% in dos would work, however we need to account for linux as well. I think the right way to do this is to echo the %errorlevel% for now and file a JIRA to handle this correctly on linux. Parsing an integer from stdout should be a lot simpler and less error prone.
In reply to: 196918217 [](ancestors = 196918217,195886573)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
continuationToken, | ||
blobRequestOptions, | ||
operationContext); | ||
task.Wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
task.Wait can be removed and just have return task.Result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has been taken care of in a previous PR but I have changed it here too
{ | ||
// TODO[JIRA REEF - 2039]: HadoopFileSystem .IsDirectory() check needs to work on linux machines. | ||
var stdOut = _commandRunner.Run("dfs -test -d " + uri + "& call echo %^errorlevel%").StdOut; | ||
if (stdOut.Any()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit pick, this if statement could be reduced to return stdOut.Any() && stdOut.First() == '0';
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks @dwaijam
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assert.False(fs.IsDirectory(fakeDirectoryUri)); | ||
Assert.False(fs.IsDirectory(fileUri)); | ||
fs.Delete(fileUri); | ||
fs.DeleteDirectory(directoryUri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: cleanup block to make more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a few nit picks
/// <returns>true if uri is for a directory else false</returns> | ||
public bool IsDirectory(Uri uri) | ||
{ | ||
var path = uri.AbsolutePath.TrimStart('/'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use UrlPathSeparator
constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ | ||
var path = uri.AbsolutePath.TrimStart('/'); | ||
const int maxBlobResults = 1; | ||
var blobItems = _client.ListBlobsSegmented(path, false, BlobListingDetails.Metadata, maxBlobResults, null, null, null).Results; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why we need maxBlobResults
constant here. for readability, we can just write maxResults: 1
, e.g.
var blobItems = _client.ListBlobsSegmented(
path,
useFlatListing: false,
BlobListingDetails.Metadata,
maxResults: 1,
continuationToken: null,
blobRequestOptions: null,
operationContext: null).Results;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var path = uri.AbsolutePath.TrimStart('/'); | ||
const int maxBlobResults = 1; | ||
var blobItems = _client.ListBlobsSegmented(path, false, BlobListingDetails.Metadata, maxBlobResults, null, null, null).Results; | ||
return blobItems.Any() && blobItems.First() is CloudBlobDirectory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe blobItems.OfType<CloudBlobDirectory>().Any()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -69,6 +68,26 @@ public ICloudBlockBlob GetBlockBlobReference(Uri uri) | |||
} | |||
|
|||
public BlobResultSegment ListBlobsSegmented( | |||
string prefix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameter name prefix
differs from the base name path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
continuationToken, | ||
blobRequestOptions, | ||
operationContext); | ||
return task.Result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use task.Result
in this method, but .GetAwaiter().GetResult()
in ListDirectoryBlobsSegmented()
implementation. is there are reason for it? if not, we should probably use same calls everywhere for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing all task.Result to task.GetAwaiter().GetResult() based on this discussion https://stackoverflow.com/questions/17284517/is-task-result-the-same-as-getawaiter-getresult which says task.Result hides actual exceptions
In reply to: 198671715 [](ancestors = 198671715)
@@ -336,6 +333,20 @@ public void TestDeleteDirectorySecondLevelE2E() | |||
Assert.True(CheckContainerExists(_container)); | |||
} | |||
|
|||
private List<CloudBlockBlob> CreateTempBlobs(string directory, int fileCount = 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use more general return type, e.g. IList
or even IEnumerable
blockBlobs.Add(blockBlob); | ||
} | ||
return blockBlobs; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use Linq? e.g.
private IEnumerable<CloudBlockBlob> CreateTempBlobs(string directory, int fileCount = 3)
{
return Enumerable.Range(0, fileCount).Select(i =>
{
var filePath = directory + '/' + i;
var blockBlob = _container.GetBlockBlobReference(filePath);
UploadFromString(blockBlob, "hello");
Assert.True(CheckBlobExists(blockBlob), "Blob does not exist: " + filePath);
return blockBlob;
});
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
Assert.True(_fileSystem.IsDirectory(PathToFile(dirName))); | ||
Assert.False(_fileSystem.IsDirectory(PathToFile(fakeDirName))); | ||
Assert.False(_fileSystem.IsDirectory(PathToFile(fileName))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We test for three different conditions here. Maybe split it into three unit tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -269,6 +272,21 @@ public void TestCopyFromLocalE2E() | |||
container.DeleteIfExistsAsync().Wait(); | |||
} | |||
|
|||
[Fact(Skip = SkipMessage)] | |||
public void TestIsDirectory() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We test for several different conditions here. Maybe split this test into several independent tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Tests whether .IsDirectory() works. | ||
/// </summary> | ||
[Fact(Skip = SkipMessage)] | ||
public void TestIsDirectory() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would split that test into three independent tests - one for each condition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SKIP e2e tests address cr comments handle path not found expception cr comments fix address cr comments fix cr comments cr comment fixes fix validate children test
[REEF-2036] Add IFileSystem api to check if uri is a directory
Currently, there is no way to check if a location specified by a uri is a directory or not. The only way to check it is by calling fs.GetChildren and checking for an exception. There should be a more graceful way of doing this check by adding an api. This change addresses this.
JIRA:
REEF-2036
Pull request:
This closes 1473