feat: Add connector-ops skill and SharePoint/OneDrive connector #1722
base: main
Conversation
Summary by CodeRabbit
Walkthrough
Adds a connector-ops skill with documentation, templates, and utility scripts, and introduces a full SharePoint/OneDrive fsspec filesystem connector (implementation, JSON schema, tests), plus a dependency addition and a minor .gitignore change.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant SharePointFS as SharePointFS (Connector)
participant Graph as Graph API (Office365)
participant Drive as SharePoint/OneDrive Drive
participant fsspec as fsspec (File API)
Client->>SharePointFS: get_fsspec_fs()
Note over SharePointFS: Lazy, fork-safe init
SharePointFS->>Graph: obtain authenticated context (client creds or token)
Graph-->>SharePointFS: authenticated client
SharePointFS->>Graph: resolve drive/site
Graph-->>SharePointFS: drive reference
SharePointFS-->>Client: SharePointFileSystem instance
Client->>fsspec: read_bytes(path)
fsspec->>SharePointFS: _open(path)
SharePointFS->>Graph: fetch item metadata / download range
Graph-->>SharePointFS: metadata / bytes
SharePointFS-->>fsspec: SharePointFile (provides bytes)
fsspec-->>Client: file content
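To make the flow concrete, here is a rough usage sketch. The `get_fsspec_fs()` entry point and `read_bytes()` call come from the diagram above; the settings keys, values, and constructor signature are illustrative assumptions, not the connector's exact schema.

```python
from unstract.connectors.filesystems.sharepoint import SharePointFS

# Illustrative settings; key names are assumptions based on the schema
# fields discussed in this review (client_id, client_secret, tenant_id).
settings = {
    "connectorName": "my-sharepoint",
    "client_id": "<app-client-id>",
    "client_secret": "<app-client-secret>",
    "tenant_id": "<tenant-id>",
}

connector = SharePointFS(settings)      # assumed constructor signature
fs = connector.get_fsspec_fs()          # lazy, fork-safe initialization
print(fs.ls("Shared Documents"))        # list a folder through the fsspec API
data = fs.read_bytes("Shared Documents/report.pdf")  # download file bytes
```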
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 9
🧹 Nitpick comments (9)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json (1)
5-8: Consider stricter validation for authentication requirements. Currently only `connectorName` and `client_id` are required, but a valid SharePoint connection also needs authentication credentials. The schema should enforce that users provide either:
- `client_secret` (for the app-only/client-credentials flow, which also requires `tenant_id`), OR
- `access_token` (for the user-delegated flow)
Without this validation, users could create a connector missing required auth fields.
🔎 Suggested approach using oneOf
{ "title": "SharePoint / OneDrive", "type": "object", "allOf": [ { "required": ["connectorName", "client_id"], "properties": { ... } }, { "oneOf": [ { "title": "App-Only (Client Credentials)", "required": ["tenant_id", "client_secret"], "properties": { ... } }, { "title": "User-Delegated (OAuth)", "required": ["access_token"], "properties": { ... } } ] } ] }.claude/skills/connector-ops/assets/templates/test_mock_template.py (1)
208-231: Consider documenting the `__new__` pattern usage. Using `__new__` to create uninitialized instances for testing `sql_to_db_mapping` is valid but unusual. Consider adding a comment explaining why this pattern is used (to test the method without requiring a full connection configuration).
🔎 Suggested documentation
def test_sql_to_db_mapping_string(self): """Test type mapping for strings.""" + # Use __new__ to create instance without calling __init__ + # since sql_to_db_mapping doesn't require connection state connector = {ClassName}.__new__({ClassName}) result = connector.sql_to_db_mapping("test") self.assertIsInstance(result, str)
.claude/skills/connector-ops/assets/templates/test_integration_template.py (1)
91-133: SQL f-strings are safe here but the pattern could mislead. The f-string SQL queries using `test_table` are safe since the variable is hardcoded in the test. However, this pattern could mislead developers adapting the template into using user input in SQL, creating injection vulnerabilities. Consider adding a comment noting this is safe only because the table name is controlled.
🔎 Suggested documentation
def test_table_operations(self): """Test create, insert, select, drop operations.""" connector = {ClassName}(self.config) engine = connector.get_engine() + # Note: Using f-string for table name is safe here because test_table + # is a hardcoded constant. Never use f-strings with user input in SQL. test_table = "_unstract_integration_test"
.claude/skills/connector-ops/assets/templates/queue_template.py (1)
107-116: Redundant `ping()` call in `test_credentials`. `get_engine()` already calls `ping()` at line 108 to test the connection. Calling `ping()` again in `test_credentials()` at line 130 is redundant.
🔎 Proposed simplification
def test_credentials(self) -> bool: """ Test queue credentials. Returns: True if connection successful Raises: ConnectorError: If connection fails """ try: - client = self.get_engine() - client.ping() + self.get_engine() # Already pings during connection return True except Exception as e: raise ConnectorError( f"Connection test failed: {str(e)}", treat_as_user_message=True ) from e
Also applies to: 128-136
.claude/skills/connector-ops/scripts/verify_connector.py (1)
247-256: Path resolution fallback may silently use an incorrect directory. When the script cannot find `unstract/connectors` relative to itself, it falls back to `Path.cwd()`. If `src/unstract/connectors` doesn't exist there either, it exits. However, if it does exist but belongs to a different project, the script would proceed with the wrong codebase.
🔎 Proposed improvement
# Find base path (unstract/connectors) script_dir = Path(__file__).parent base_path = script_dir.parent.parent.parent.parent / "unstract/connectors" if not base_path.exists(): # Try relative to current working directory base_path = Path.cwd() if not (base_path / "src/unstract/connectors").exists(): - print("Could not find connectors base path") + print("Could not find connectors base path.") + print("Run this script from the unstract/connectors directory or ensure") + print("the repository structure is correct.") sys.exit(1) + print(f"Warning: Using CWD as base path: {base_path}")
.claude/skills/connector-ops/references/test_patterns.md (1)
7-23: Add language specifier to fenced code block. The fenced code block showing directory structure should have a language specifier for consistent formatting.
🔎 Proposed fix
-``` +```text tests/ ├── __init__.py ├── test_connectorkit.py # Registry tests
unstract/connectors/tests/filesystems/test_sharepoint_fs.py (1)
57-65: Consider testing through public interfaces. Lines 63-65 access internal attributes (`_site_url`, `_tenant_id`, `_client_id`) to verify initialization. While acceptable for unit tests, consider whether testing through public behavior would provide better encapsulation.
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (2)
51-53: Document the upload stub method. The `final` parameter is unused because uploads are handled by `SharePointFileSystem.write_bytes()`. Consider adding a docstring to clarify this is a required interface method.
Suggested documentation
def _upload_chunk(self, final: bool = False) -> bool: - """Upload is handled by write_bytes.""" + """Upload is handled by write_bytes. + + This method is required by AbstractBufferedFile interface but not used + since SharePointFileSystem.write_bytes() handles all uploads directly. + + Args: + final: Unused, required by interface + + Returns: + Always True + """ return True
195-197: Use `logging.exception` for better error context. Replace `logging.error` with `logging.exception` to automatically include the traceback, which will aid debugging.
Proposed fix
return results except Exception as e: - logger.error(f"Error listing path {path}: {e}") + logger.exception(f"Error listing path {path}: {e}") raise
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (3)
- `backend/uv.lock` is excluded by `!**/*.lock`
- `frontend/public/icons/connector-icons/SharePoint.png` is excluded by `!**/*.png`
- `unstract/connectors/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (18)
- .claude/skills/connector-ops/SKILL.md
- .claude/skills/connector-ops/assets/templates/database_template.py
- .claude/skills/connector-ops/assets/templates/filesystem_template.py
- .claude/skills/connector-ops/assets/templates/init_template.py
- .claude/skills/connector-ops/assets/templates/json_schema_template.json
- .claude/skills/connector-ops/assets/templates/queue_template.py
- .claude/skills/connector-ops/assets/templates/test_integration_template.py
- .claude/skills/connector-ops/assets/templates/test_mock_template.py
- .claude/skills/connector-ops/references/connector_patterns.md
- .claude/skills/connector-ops/references/json_schema_examples.md
- .claude/skills/connector-ops/references/test_patterns.md
- .claude/skills/connector-ops/scripts/fetch_logo.py
- .claude/skills/connector-ops/scripts/verify_connector.py
- unstract/connectors/pyproject.toml
- unstract/connectors/src/unstract/connectors/filesystems/sharepoint/__init__.py
- unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
- unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json
- unstract/connectors/tests/filesystems/test_sharepoint_fs.py
🧰 Additional context used
🧬 Code graph analysis (7)
.claude/skills/connector-ops/assets/templates/test_integration_template.py (3)
.claude/skills/connector-ops/assets/templates/database_template.py (4)
test_credentials(132-153)get_engine(87-130)execute(155-175)sql_to_db_mapping(177-204).claude/skills/connector-ops/assets/templates/filesystem_template.py (1)
test_credentials(123-142).claude/skills/connector-ops/assets/templates/queue_template.py (2)
test_credentials(118-136)get_engine(85-116)
unstract/connectors/tests/filesystems/test_sharepoint_fs.py (1)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (15)
SharePointFS(386-603)get_id(422-423)get_name(426-427)get_description(430-431)can_read(452-453)can_write(448-449)requires_oauth(456-457)python_social_auth_backend(460-461)get_json_schema(438-445)get_icon(434-435)is_dir_by_metadata(546-555)extract_metadata_file_hash(519-544)extract_modified_date(557-592)get_connector_root_dir(595-603)ls(180-197)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/__init__.py (1)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (1)
SharePointFS(386-603)
.claude/skills/connector-ops/scripts/verify_connector.py (4)
unstract/connectors/src/unstract/connectors/connectorkit.py (2)
connectors(22-23)get_connectors_list(74-113).claude/skills/connector-ops/assets/templates/database_template.py (1)
get_id(46-47).claude/skills/connector-ops/assets/templates/filesystem_template.py (1)
get_id(51-52).claude/skills/connector-ops/assets/templates/queue_template.py (1)
get_id(44-45)
.claude/skills/connector-ops/assets/templates/queue_template.py (3)
unstract/connectors/src/unstract/connectors/queues/unstract_queue.py (1)
lindex(90-91)unstract/connectors/src/unstract/connectors/exceptions.py (1)
ConnectorError(18-35)unstract/core/src/unstract/core/cache/redis_queue_client.py (2)
lpush(64-74)brpop(133-147)
.claude/skills/connector-ops/assets/templates/filesystem_template.py (2)
unstract/connectors/src/unstract/connectors/filesystems/unstract_file_system.py (1)
UnstractFileSystem(17-353)unstract/connectors/src/unstract/connectors/exceptions.py (1)
ConnectorError(18-35)
.claude/skills/connector-ops/assets/templates/database_template.py (2)
unstract/connectors/src/unstract/connectors/databases/unstract_db.py (1)
UnstractDB(16-340)unstract/connectors/src/unstract/connectors/exceptions.py (1)
ConnectorError(18-35)
🪛 Checkov (3.2.334)
.claude/skills/connector-ops/assets/templates/json_schema_template.json
[medium] 24-25: Basic Auth Credentials
(CKV_SECRET_4)
🪛 markdownlint-cli2 (0.18.1)
.claude/skills/connector-ops/references/test_patterns.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.10)
.claude/skills/connector-ops/assets/templates/test_integration_template.py
17-17: Expected an identifier
(invalid-syntax)
17-17: Expected an identifier
(invalid-syntax)
17-17: Expected an identifier
(invalid-syntax)
17-17: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
17-17: Expected one or more symbol names after import
(invalid-syntax)
25-25: Expected :, found {
(invalid-syntax)
25-25: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
25-26: Expected an expression
(invalid-syntax)
26-26: Unexpected indentation
(invalid-syntax)
218-218: Expected a statement
(invalid-syntax)
218-218: Expected :, found {
(invalid-syntax)
218-218: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
218-219: Expected an expression
(invalid-syntax)
219-219: Unexpected indentation
(invalid-syntax)
244-244: Expected a statement
(invalid-syntax)
.claude/skills/connector-ops/assets/templates/init_template.py
10-10: Expected import, found {
(invalid-syntax)
10-10: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
10-10: Expected one or more symbol names after import
(invalid-syntax)
.claude/skills/connector-ops/assets/templates/test_mock_template.py
15-15: Expected an identifier
(invalid-syntax)
15-15: Expected an identifier
(invalid-syntax)
15-15: Expected an identifier
(invalid-syntax)
15-15: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
15-15: Expected one or more symbol names after import
(invalid-syntax)
19-19: Expected :, found {
(invalid-syntax)
19-20: Expected an expression
(invalid-syntax)
20-20: Unexpected indentation
(invalid-syntax)
233-233: Expected a statement
(invalid-syntax)
.claude/skills/connector-ops/scripts/fetch_logo.py
43-43: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
44-44: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
69-69: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
70-70: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
98-98: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
99-99: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
126-126: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
127-127: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
153-153: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
154-154: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
unstract/connectors/tests/filesystems/test_sharepoint_fs.py
252-252: Do not catch blind exception: Exception
(BLE001)
.claude/skills/connector-ops/scripts/verify_connector.py
134-134: Do not catch blind exception: Exception
(BLE001)
137-137: Do not catch blind exception: Exception
(BLE001)
172-172: Do not catch blind exception: Exception
(BLE001)
192-192: subprocess call: check for execution of untrusted input
(S603)
216-216: subprocess call: check for execution of untrusted input
(S603)
.claude/skills/connector-ops/assets/templates/queue_template.py
21-21: Expected an identifier
(invalid-syntax)
21-22: Expected an expression
(invalid-syntax)
22-22: Unexpected indentation
(invalid-syntax)
96-96: Expected one or more symbol names after import
(invalid-syntax)
99-99: Expected an identifier
(invalid-syntax)
.claude/skills/connector-ops/assets/templates/filesystem_template.py
25-25: Expected an identifier
(invalid-syntax)
25-26: Expected an expression
(invalid-syntax)
26-26: Unexpected indentation
(invalid-syntax)
105-105: Expected a module name
(invalid-syntax)
105-105: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
105-105: Expected one or more symbol names after import
(invalid-syntax)
.claude/skills/connector-ops/assets/templates/database_template.py
21-21: Expected an identifier
(invalid-syntax)
21-22: Expected an expression
(invalid-syntax)
22-22: Unexpected indentation
(invalid-syntax)
95-95: Expected one or more symbol names after import
(invalid-syntax)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
51-51: Unused method argument: final
(ARG002)
106-106: Local variable credentials is assigned to but never used
Remove assignment to unused variable credentials
(F841)
125-128: Avoid specifying long messages outside the exception class
(TRY003)
180-180: Unused method argument: kwargs
(ARG002)
194-194: Consider moving this statement to an else block
(TRY300)
196-196: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
242-242: Unused method argument: kwargs
(ARG002)
253-253: Unused method argument: kwargs
(ARG002)
257-257: Consider moving this statement to an else block
(TRY300)
258-258: Do not catch blind exception: Exception
(BLE001)
266-266: Do not catch blind exception: Exception
(BLE001)
274-274: Do not catch blind exception: Exception
(BLE001)
277-277: Unused method argument: kwargs
(ARG002)
326-326: Unused method argument: kwargs
(ARG002)
334-334: Unused method argument: kwargs
(ARG002)
344-344: Unused method argument: recursive
(ARG002)
344-344: Unused method argument: kwargs
(ARG002)
366-366: Do not catch blind exception: Exception
(BLE001)
499-502: Avoid specifying long messages outside the exception class
(TRY003)
500-500: Use explicit conversion flag
Replace with conversion flag
(RUF010)
512-512: Consider moving this statement to an else block
(TRY300)
514-517: Avoid specifying long messages outside the exception class
(TRY003)
515-515: Use explicit conversion flag
Replace with conversion flag
(RUF010)
534-534: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
595-595: Unused static method argument: kwargs
(ARG004)
🔇 Additional comments (24)
.claude/skills/connector-ops/SKILL.md (2)
1-11: Well-structured skill documentation. The skill document provides comprehensive guidance for connector lifecycle management with clear sections for ADD, REMOVE, and MODIFY operations. The architecture overview and key files table are helpful for understanding the connector ecosystem.
459-467: Helpful operational notes. The important notes section covers critical concerns like fork safety for gRPC, UUID consistency, schema backwards compatibility, and icon naming conventions. These are valuable guardrails for connector development.
.claude/skills/connector-ops/assets/templates/json_schema_template.json (1)
1-68: Good JSON Schema template structure. The template correctly uses `allOf` with `oneOf` to offer mutually exclusive connection methods (URL vs parameters). The `format: "password"` is appropriately applied to sensitive fields. Placeholders (`{display_name}`, `{protocol}`, `{default_port}`) are clearly marked for substitution. The Checkov CKV_SECRET_4 warning about lines 24-25 is a false positive: the description merely documents the URL format pattern, not actual credentials.
.claude/skills/connector-ops/assets/templates/init_template.py (1)
1-18: Template structure is correct. The template provides a clear pattern for connector `__init__.py` files with appropriate placeholders. The docstring documents all substitution points. The Ruff syntax errors are expected false positives since `{connector_name}` and `{ClassName}` are intentional placeholders, not valid Python identifiers.
.claude/skills/connector-ops/references/json_schema_examples.md (1)
590-609: Helpful field format and type reference tables. The reference tables at the end provide a quick lookup for UI rendering formats and JSON Schema types. This is valuable for developers creating new connector schemas.
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json (1)
16-20: Good handling of the personal vs. business account distinction. The `is_personal` boolean with a clear description helps users understand the difference between OneDrive Personal and SharePoint/OneDrive for Business configurations.
.claude/skills/connector-ops/assets/templates/test_mock_template.py (1)
1-18: Comprehensive mock test template. The template provides good coverage of connector functionality including static methods, initialization with both config styles, connection handling, credential validation, and query execution. The placeholder documentation is clear.
The Ruff syntax errors are expected false positives since the placeholders are intentional.
.claude/skills/connector-ops/assets/templates/test_integration_template.py (1)
1-25: Well-designed integration test template. The template uses `@unittest.skipUnless` with environment variable checks appropriately, allowing tests to be skipped in CI while running locally when services are available. The environment variable documentation in the class docstring is helpful.
.claude/skills/connector-ops/scripts/fetch_logo.py (2)
26-28: Good normalization function. The `normalize_name` function provides consistent name handling across all logo sources by removing non-alphanumeric characters and lowercasing.
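As a rough illustration of that kind of helper (an assumption about its shape, not the script's exact code), it might look like:

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase and strip non-alphanumeric characters for logo lookups."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# e.g. normalize_name("SharePoint / OneDrive") -> "sharepointonedrive"
```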
168-208: Well-structured main function with graceful degradation. The script tries multiple sources in priority order and provides helpful guidance when no logo is found. The exit codes (0 for success, 1 for failure) are appropriate.
The Ruff S310 warnings about URL scheme auditing are low risk since all URLs are constructed from hardcoded `https://` base URLs combined with normalized service names. This is a developer utility script, not production code handling untrusted input.
.claude/skills/connector-ops/scripts/verify_connector.py (3)
28-54: LGTM! The directory structure validation is thorough, checking for the connector directory and all required files including the JSON schema.
57-92: LGTM! The metadata validation correctly checks for required fields and ensures `is_active` is `True` for the connector to be registered.
95-140: LGTM! The connector class validation properly checks for required static methods and attempts to invoke them, which will catch common implementation errors early.
.claude/skills/connector-ops/assets/templates/filesystem_template.py (1)
92-121: LGTM!The
get_fsspec_fsimplementation correctly uses lazy initialization with double-checked locking for thread safety and fork safety (importing inside the method). This follows the patterns documented inconnector_patterns.md..claude/skills/connector-ops/assets/templates/database_template.py (1)
87-131: LGTM!The
get_engineimplementation correctly uses lazy imports for fork safety and handles both connection URL and individual parameter modes. The SSL configuration is properly guarded.unstract/connectors/src/unstract/connectors/filesystems/sharepoint/__init__.py (1)
1-11: LGTM!The module correctly exports
SharePointFSand defines the requiredmetadatadictionary with all fields needed for Connectorkit registration (name,version,connector,description,is_active)..claude/skills/connector-ops/references/test_patterns.md (1)
27-173: LGTM!The mock-based test template is comprehensive, covering static methods, ID format validation, JSON schema validation, initialization modes, engine retrieval, credential testing, query execution, and type mapping—all essential areas for connector testing.
.claude/skills/connector-ops/references/connector_patterns.md (1)
1-376: LGTM!Excellent reference documentation covering essential patterns (fork-safe initialization, connection URL vs parameters, SSL/TLS, credential testing) and important anti-patterns (module-level heavy imports, hardcoded credentials, changing connector IDs). The code examples are practical and follow established best practices.
unstract/connectors/tests/filesystems/test_sharepoint_fs.py (1)
1-275: Comprehensive test coverage for the SharePoint connector. The test suite provides excellent coverage of both unit and integration scenarios:
- Static metadata validation (ID, name, description, capabilities)
- JSON schema completeness
- Multiple initialization modes (client credentials, OAuth, personal OneDrive)
- Metadata extraction helpers (hash, modified date, directory detection)
- Path formatting edge cases
- Integration tests properly gated by environment variables (see the sketch after this comment)
The bare Exception catch at line 252 is appropriate for gracefully skipping tests when files don't exist.
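For readers unfamiliar with that gating pattern, here is a minimal sketch. The environment variable names, the settings keys, and the `test_credentials()` call are assumptions for illustration, not the suite's exact code.

```python
import os
import unittest

# Hypothetical variable names; the real suite defines its own prefix.
REQUIRED_VARS = ("SHAREPOINT_CLIENT_ID", "SHAREPOINT_CLIENT_SECRET", "SHAREPOINT_TENANT_ID")
HAS_CREDS = all(os.environ.get(var) for var in REQUIRED_VARS)


@unittest.skipUnless(HAS_CREDS, "SharePoint credentials not set; skipping integration tests")
class TestSharePointIntegration(unittest.TestCase):
    def setUp(self):
        self.settings = {
            "client_id": os.environ["SHAREPOINT_CLIENT_ID"],
            "client_secret": os.environ["SHAREPOINT_CLIENT_SECRET"],
            "tenant_id": os.environ["SHAREPOINT_TENANT_ID"],
        }

    def test_credentials(self):
        from unstract.connectors.filesystems.sharepoint import SharePointFS

        connector = SharePointFS(self.settings)
        # test_credentials() is assumed here, following the filesystem template API
        self.assertTrue(connector.test_credentials())
```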
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (5)
344-347: Clarify or implement recursive deletion behavior. The `recursive` parameter is accepted but never used in the implementation. Verify whether the SharePoint API automatically handles recursive deletion, or if this needs explicit implementation.
If SharePoint automatically handles recursive deletion when deleting a folder, document this behavior:
def rm(self, path: str, recursive: bool = False, **kwargs: Any) -> None: - """Remove file or directory.""" + """Remove file or directory. + + Args: + path: Path to remove + recursive: Ignored - SharePoint API always recursively deletes folders + **kwargs: Additional arguments (unused) + """ item = self._get_item_by_path(path) item.delete_object().execute_query()Otherwise, implement the check:
def rm(self, path: str, recursive: bool = False, **kwargs: Any) -> None: """Remove file or directory.""" + if not recursive and self.isdir(path): + # Check if directory is empty + contents = self.ls(path, detail=False) + if contents: + raise ValueError(f"Directory {path} is not empty. Use recursive=True") + item = self._get_item_by_path(path) item.delete_object().execute_query()
399-420: Clean initialization and settings management. The lazy initialization pattern with proper locking is well-implemented. Settings are extracted and validated upfront, with the actual filesystem created on demand.
463-504: Fork-safe and thread-safe lazy initialization. The double-checked locking with `_SHAREPOINT_INIT_LOCK` properly ensures thread safety, and importing the Office365 library within the lock provides fork safety. The tenant resolution logic for personal vs. business accounts is correct.
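For context, a minimal sketch of the double-checked, fork-safe lazy initialization described above; the lock name comes from the review, while the holder class and helper are illustrative rather than the connector's actual code.

```python
import threading

_SHAREPOINT_INIT_LOCK = threading.Lock()


def _build_graph_client(settings: dict):
    """Hypothetical helper: real code would import the Office365/Graph client
    here (inside the function, for fork safety) and authenticate with the
    stored tenant_id / client_id / client_secret."""
    return object()  # placeholder standing in for the authenticated client


class LazySharePointHolder:
    """Illustrative holder showing the pattern, not the connector's real class."""

    def __init__(self, settings: dict):
        self._settings = settings
        self._fs = None  # heavy client is NOT built in __init__, so fork() stays cheap

    def get_fsspec_fs(self):
        if self._fs is None:  # first check without the lock: cheap hot path
            with _SHAREPOINT_INIT_LOCK:
                if self._fs is None:  # second check: another thread may have won
                    self._fs = _build_graph_client(self._settings)
        return self._fs
```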
519-592: Robust metadata extraction with graceful fallbacks. The metadata extraction methods (`extract_metadata_file_hash`, `is_dir_by_metadata`, `extract_modified_date`) handle multiple field names and data types with appropriate fallbacks (a sketch follows this list):
- Hash extraction tries multiple fields (quickXorHash, sha256Hash, cTag, eTag) with proper eTag cleanup
- Date extraction handles both datetime objects and ISO strings with timezone normalization
- Logging provides helpful debugging context when fields are missing
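A minimal sketch of the fallback style those bullets describe; the hash field names come from the review above, the date field names are assumptions, and the functions are illustrative rather than the connector's exact implementation.

```python
from datetime import datetime, timezone
from typing import Any


def extract_file_hash(metadata: dict[str, Any]) -> str | None:
    """Try several hash fields in priority order, cleaning quoted/versioned eTags."""
    for field in ("quickXorHash", "sha256Hash", "cTag", "eTag"):
        value = metadata.get(field)
        if value:
            if isinstance(value, str):
                return value.strip('"').split(",")[0]
            return str(value)
    return None


def extract_modified_date(metadata: dict[str, Any]) -> datetime | None:
    """Accept datetime objects or ISO-8601 strings, normalizing to UTC."""
    # Field names here are assumptions for illustration.
    value = metadata.get("lastModifiedDateTime") or metadata.get("modified")
    if isinstance(value, datetime):
        return value if value.tzinfo else value.replace(tzinfo=timezone.utc)
    if isinstance(value, str):
        try:
            return datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return None
    return None
```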
594-603: Appropriate path normalization for SharePoint. The root directory formatting correctly strips leading slashes and adds a trailing slash, which aligns with SharePoint path conventions.
def sql_to_db_mapping(self, value: Any, column_name: str | None = None) -> str:
    """
    Map Python types to database types.

    Args:
        value: Python value to map
        column_name: Optional column name hint

    Returns:
        Database type string
    """
    if value is None:
        return "TEXT"

    if isinstance(value, bool):
        return "BOOLEAN"
    elif isinstance(value, int):
        return "INTEGER"
    elif isinstance(value, float):
        return "DOUBLE PRECISION"
    elif isinstance(value, dict):
        return "JSON"  # or JSONB for PostgreSQL
    elif isinstance(value, list):
        return "JSON"
    elif isinstance(value, bytes):
        return "BYTEA"
    else:
        return "TEXT"
Template missing required abstract methods from UnstractDB.
Based on the UnstractDB base class, this template should also implement:
- prepare_multi_column_migration(self, table_name: str, column_name: str) -> str | list
- get_create_table_base_query(self, table: str) -> str
These are required for table migration and creation operations.
🔎 Proposed additions after `sql_to_db_mapping`
def prepare_multi_column_migration(
self, table_name: str, column_name: str
) -> str | list:
"""
Prepare SQL for column migration.
Args:
table_name: Name of the table
column_name: Name of the column to migrate
Returns:
SQL statement(s) for migration
"""
# Adapt for specific database:
# PostgreSQL: ALTER TABLE ... ADD COLUMN ...
return f"ALTER TABLE {table_name} ADD COLUMN IF NOT EXISTS {column_name}_v2 TEXT"
def get_create_table_base_query(self, table: str) -> str:
"""
Get base CREATE TABLE query.
Args:
table: Table name
Returns:
Base CREATE TABLE SQL
"""
# Adapt for specific database syntax
return f"CREATE TABLE IF NOT EXISTS {table}"🤖 Prompt for AI Agents
In .claude/skills/connector-ops/assets/templates/database_template.py around
lines 177 to 204, the template class is missing two required implementations
from UnstractDB: prepare_multi_column_migration(self, table_name: str,
column_name: str) -> str | list and get_create_table_base_query(self, table:
str) -> str; add these two methods below sql_to_db_mapping with matching
signatures, simple database-agnostic default implementations (e.g.,
prepare_multi_column_migration returns an ALTER TABLE ... ADD COLUMN IF NOT
EXISTS {column_name}_v2 TEXT string or a list of such statements, and
get_create_table_base_query returns a CREATE TABLE IF NOT EXISTS {table} base
string), ensure docstrings match other methods and return types follow the
annotated Union[str, list] where appropriate.
| def extract_metadata_file_hash(self, metadata: dict[str, Any]) -> str | None: | ||
| """ | ||
| Extract unique file hash from fsspec metadata. | ||
|
|
||
| Different storage systems use different keys for file hashes: | ||
| - S3/Minio: ETag | ||
| - Azure: content_md5 | ||
| - GCS: md5Hash | ||
|
|
||
| Args: | ||
| metadata: File metadata from fsspec | ||
|
|
||
| Returns: | ||
| File hash string or None | ||
| """ | ||
| # Try common hash fields | ||
| hash_fields = ["ETag", "etag", "md5Hash", "content_md5", "contentHash"] | ||
|
|
||
| for field in hash_fields: | ||
| if field in metadata: | ||
| value = metadata[field] | ||
| # Remove quotes from ETags | ||
| if isinstance(value, str): | ||
| return value.strip('"') | ||
| return str(value) | ||
|
|
||
| return None | ||
|
|
||
| def is_dir_by_metadata(self, metadata: dict[str, Any]) -> bool: | ||
| """ | ||
| Check if path is a directory from metadata. | ||
|
|
||
| Args: | ||
| metadata: File metadata from fsspec | ||
|
|
||
| Returns: | ||
| True if path is a directory | ||
| """ | ||
| # Different storage systems indicate directories differently | ||
| if metadata.get("type") == "directory": | ||
| return True | ||
| if metadata.get("StorageClass") == "DIRECTORY": | ||
| return True | ||
| if metadata.get("size") == 0 and metadata.get("name", "").endswith("/"): | ||
| return True | ||
|
|
||
| return False |
Template missing extract_modified_date abstract method.
Based on the UnstractFileSystem base class, this template should also implement extract_modified_date(self, metadata: dict[str, Any]) -> datetime | None. This method is required for file sorting operations.
🔎 Proposed addition after `is_dir_by_metadata`
def extract_modified_date(self, metadata: dict[str, Any]) -> datetime | None:
"""
Extract last modified date from fsspec metadata.
Args:
metadata: File metadata from fsspec
Returns:
Datetime object or None if not available
"""
from datetime import datetime, timezone
# Try common date fields
date_fields = ["LastModified", "last_modified", "mtime", "updated", "timeModified"]
for field in date_fields:
if field in metadata:
value = metadata[field]
if isinstance(value, datetime):
return value.replace(tzinfo=timezone.utc) if value.tzinfo is None else value
if isinstance(value, (int, float)):
# Unix timestamp
return datetime.fromtimestamp(value, tz=timezone.utc)
if isinstance(value, str):
try:
return datetime.fromisoformat(value.replace("Z", "+00:00"))
except ValueError:
continue
return None🤖 Prompt for AI Agents
In .claude/skills/connector-ops/assets/templates/filesystem_template.py around
lines 144 to 190, the template class is missing the required abstract method
extract_modified_date(self, metadata: dict[str, Any]) -> datetime | None; add
this method after is_dir_by_metadata that imports datetime and timezone, checks
common date fields (e.g., "LastModified", "last_modified", "mtime", "updated",
"timeModified"), handles values that are datetime (ensure tzinfo is set to UTC
if missing), numeric unix timestamps (convert with datetime.fromtimestamp(...,
tz=timezone.utc)), and ISO8601 strings (try datetime.fromisoformat after
replacing "Z" with "+00:00"), returning None if no valid date found.
| host=self.host, | ||
| port=int(self.port), | ||
| password=self.password or None, | ||
| # db=int(self.database), # For Redis | ||
| # ssl=self.ssl_enabled, | ||
| ) | ||
|
|
||
| # Test connection | ||
| self._client.ping() | ||
|
|
||
| return self._client | ||
|
|
||
| except Exception as e: | ||
| raise ConnectorError( | ||
| f"Failed to connect to {display_name}: {str(e)}", | ||
| treat_as_user_message=True | ||
| ) from e | ||
|
|
||
| def test_credentials(self) -> bool: | ||
| """ | ||
| Test queue credentials. | ||
|
|
||
| Returns: | ||
| True if connection successful | ||
|
|
||
| Raises: | ||
| ConnectorError: If connection fails | ||
| """ | ||
| try: | ||
| client = self.get_engine() | ||
| client.ping() | ||
| return True | ||
| except Exception as e: | ||
| raise ConnectorError( | ||
| f"Connection test failed: {str(e)}", | ||
| treat_as_user_message=True | ||
| ) from e | ||
|
|
||
| def enqueue(self, queue_name: str, message: str) -> Any: | ||
| """ | ||
| Add message to queue. | ||
|
|
||
| Args: | ||
| queue_name: Name of the queue | ||
| message: Message to enqueue | ||
|
|
||
| Returns: | ||
| Result from queue operation | ||
| """ | ||
| try: | ||
| client = self.get_engine() | ||
| # Adapt for specific queue library: | ||
| # Redis: client.lpush(queue_name, message) | ||
| # RabbitMQ: channel.basic_publish(...) | ||
| return client.lpush(queue_name, message) | ||
| except Exception as e: | ||
| raise ConnectorError(f"Failed to enqueue message: {str(e)}") from e | ||
|
|
||
| def dequeue(self, queue_name: str, timeout: int = 5) -> Any: | ||
| """ | ||
| Get and remove message from queue. | ||
|
|
||
| Args: | ||
| queue_name: Name of the queue | ||
| timeout: Timeout in seconds for blocking operation | ||
|
|
||
| Returns: | ||
| Message from queue or None if timeout | ||
| """ | ||
| try: | ||
| client = self.get_engine() | ||
| # Adapt for specific queue library: | ||
| # Redis: client.brpop(queue_name, timeout) | ||
| result = client.brpop(queue_name, timeout=timeout) | ||
| if result: | ||
| return result[1].decode("utf-8") | ||
| return None | ||
| except Exception as e: | ||
| raise ConnectorError(f"Failed to dequeue message: {str(e)}") from e | ||
|
|
||
| def peek(self, queue_name: str) -> Any: | ||
| """ | ||
| View next message without removing it. | ||
|
|
||
| Args: | ||
| queue_name: Name of the queue | ||
|
|
||
| Returns: | ||
| Next message or None if queue empty | ||
| """ | ||
| try: | ||
| client = self.get_engine() | ||
| # Adapt for specific queue library: | ||
| # Redis: client.lindex(queue_name, -1) | ||
| result = client.lindex(queue_name, -1) | ||
| if result: | ||
| return result.decode("utf-8") | ||
| return None | ||
| except Exception as e: | ||
| raise ConnectorError(f"Failed to peek message: {str(e)}") from e |
Template missing required abstract methods from UnstractQueue.
Based on the UnstractQueue base class (from unstract_queue.py), this template is missing implementations for several abstract methods:
- lset(self, queue_name: str, index: int, value: str) -> None
- llen(self, queue_name: str) -> int
- lindex(self, queue_name: str, index: int) -> Any
- keys(self, pattern: str = "*") -> list[str]
- lrange(self, queue_name: str, start: int, end: int) -> list[Any]
- dequeue_batch(self, queue_name: str, count: int) -> list[Any]
Connectors generated from this template will fail at runtime due to missing abstract method implementations.
🔎 Proposed additions after the `peek` method
def lset(self, queue_name: str, index: int, value: str) -> None:
"""Set value at index in queue."""
try:
client = self.get_engine()
client.lset(queue_name, index, value)
except Exception as e:
raise ConnectorError(f"Failed to set value at index: {str(e)}") from e
def llen(self, queue_name: str) -> int:
"""Get length of queue."""
try:
client = self.get_engine()
return client.llen(queue_name)
except Exception as e:
raise ConnectorError(f"Failed to get queue length: {str(e)}") from e
def lindex(self, queue_name: str, index: int) -> Any:
"""Get value at index in queue."""
try:
client = self.get_engine()
result = client.lindex(queue_name, index)
if result:
return result.decode("utf-8")
return None
except Exception as e:
raise ConnectorError(f"Failed to get value at index: {str(e)}") from e
def keys(self, pattern: str = "*") -> list[str]:
"""Get keys matching pattern."""
try:
client = self.get_engine()
return [k.decode("utf-8") for k in client.keys(pattern)]
except Exception as e:
raise ConnectorError(f"Failed to get keys: {str(e)}") from e
def lrange(self, queue_name: str, start: int, end: int) -> list[Any]:
"""Get range of values from queue."""
try:
client = self.get_engine()
return [v.decode("utf-8") for v in client.lrange(queue_name, start, end)]
except Exception as e:
raise ConnectorError(f"Failed to get range: {str(e)}") from e
def dequeue_batch(self, queue_name: str, count: int) -> list[Any]:
"""Dequeue multiple messages."""
try:
client = self.get_engine()
results = []
for _ in range(count):
result = client.rpop(queue_name)
if result is None:
break
results.append(result.decode("utf-8"))
return results
except Exception as e:
raise ConnectorError(f"Failed to dequeue batch: {str(e)}") from e🧰 Tools
🪛 Ruff (0.14.10)
21-21: Expected an identifier
(invalid-syntax)
21-22: Expected an expression
(invalid-syntax)
22-22: Unexpected indentation
(invalid-syntax)
96-96: Expected one or more symbol names after import
(invalid-syntax)
99-99: Expected an identifier
(invalid-syntax)
🤖 Prompt for AI Agents
In .claude/skills/connector-ops/assets/templates/queue_template.py lines 1 to
199, the template class is missing implementations for required abstract methods
from UnstractQueue (lset, llen, lindex, keys, lrange, dequeue_batch), causing
instantiation/runtime failures; implement these methods after peek by using
get_engine() to call the underlying client (wrap calls in try/except and raise
ConnectorError on failure), decode bytes to strings where appropriate, return
proper types (int for llen, list[str] or list[Any] for
lrange/dequeue_batch/keys), and ensure dequeue_batch pops up to count items and
stops when empty.
# Validate required vars
required = ["host", "database", "user"]
missing = [k for k in required if not cls.config.get(k)]
if missing:
    raise unittest.SkipTest(
        f"Missing required env vars: {CONNECTOR_PREFIX}_" +
        ", {CONNECTOR_PREFIX}_".join(missing)
    )
Awkward string formatting in skip message.
The f-string concatenation in the SkipTest message is hard to read and may not produce the expected output.
🔎 Suggested fix
missing = [k for k in required if not cls.config.get(k)]
if missing:
+ missing_vars = ", ".join(f"{CONNECTOR_PREFIX}_{m.upper()}" for m in missing)
raise unittest.SkipTest(
- f"Missing required env vars: {CONNECTOR_PREFIX}_" +
- ", {CONNECTOR_PREFIX}_".join(missing)
+ f"Missing required env vars: {missing_vars}"
)Note: Since this is a template, the placeholder {CONNECTOR_PREFIX} complicates the fix. Consider using a simpler format like f"Missing: {missing}" and letting the outer skip message provide context.
🤖 Prompt for AI Agents
In .claude/skills/connector-ops/assets/templates/test_integration_template.py
around lines 50 to 57, the SkipTest message uses an awkward f-string
concatenation that may not render the expected repeated CONNECTOR_PREFIX
prefixes; replace this with a clearer message format — either use a simple
f"Missing required env vars: {missing}" (since this is a template and
CONNECTOR_PREFIX is a placeholder) or build the string explicitly by joining
missing with the prefix repeated for each entry (e.g., map or comprehension) so
the output is readable and correctly shows each missing variable with the
prefix.
```json
{
  "title": "Connector Display Name",
  "type": "object",
  "allOf": [
    {
      "required": ["connectorName"],
      "properties": {
        "connectorName": {
          "type": "string",
          "title": "Name of the connector",
          "description": "A unique name to identify this connector instance"
        }
      }
    },
    {
      // Authentication and connection options here
    }
  ]
}
```
Invalid JSON comment in basic structure example.
Line 25 contains a JavaScript-style comment (// Authentication...) which is not valid JSON syntax. While this is for documentation purposes, it could mislead developers if they copy-paste the snippet.
🔎 Suggested fix
Replace the comment with a placeholder object or describe it outside the code block:
{
- // Authentication and connection options here
+ "description": "Authentication and connection options here (see patterns below)"
}Or use a documentation note outside the JSON block.
🤖 Prompt for AI Agents
In .claude/skills/connector-ops/references/json_schema_examples.md around lines
9 to 29, the JSON snippet contains an invalid JavaScript-style comment ("//
Authentication and connection options here") which breaks JSON; replace that
comment with a valid JSON placeholder object or property (for example an empty
object or an explicit "authentication": {} or "connectionOptions": {}) or remove
the comment entirely and add the explanatory note outside the code block so the
snippet remains valid JSON for copy-paste.
if content.startswith(b"<svg") or b"<svg" in content[:500]:
    svg_path = output_path.rsplit(".", 1)[0] + ".svg"
    with open(svg_path, "wb") as f:
        f.write(content)
    print(f"[WorldVectorLogo] Logo saved to: {svg_path}")
    return True
Edge case: output_path without extension.
If output_path doesn't contain a ., rsplit(".", 1)[0] returns the full path and appending .svg would create an unexpected filename. Consider using pathlib for safer path manipulation.
🔎 Suggested fix using pathlib
+from pathlib import Path
+
def fetch_worldvectorlogo(name: str, output_path: str) -> bool:
...
- svg_path = output_path.rsplit(".", 1)[0] + ".svg"
+ svg_path = str(Path(output_path).with_suffix(".svg"))Apply the same pattern to fetch_simple_icons (line 74) and fetch_devicon (line 102).
📝 Committable suggestion
if content.startswith(b"<svg") or b"<svg" in content[:500]:
    svg_path = str(Path(output_path).with_suffix(".svg"))
    with open(svg_path, "wb") as f:
        f.write(content)
    print(f"[WorldVectorLogo] Logo saved to: {svg_path}")
    return True
🤖 Prompt for AI Agents
In .claude/skills/connector-ops/scripts/fetch_logo.py around lines 48 to 53, the
code builds an SVG output filename by rsplit on ".", which fails when
output_path has no extension; replace this manual string operation with
pathlib.Path to construct the .svg filename safely (e.g.,
Path(output_path).with_suffix(".svg")), write the content to that path, and
return True; also apply the same pathlib-based fix to the similar code in
fetch_simple_icons (around line 74) and fetch_devicon (around line 102).
Provide summary:
```
## Connector Added: {Name}
**Files created:**
- `src/unstract/connectors/{type}/{name}/__init__.py`
- `src/unstract/connectors/{type}/{name}/{name}.py`
- `src/unstract/connectors/{type}/{name}/static/json_schema.json`
- `tests/{type}/test_{name}.py`
- `tests/{type}/test_{name}_integration.py`
- `frontend/public/icons/connector-icons/{Name}.png`
**Dependencies added:**
- `{package}~={version}`
**Verification:**
- Syntax check: PASSED
- Mock tests: PASSED (X tests)
- Test suite: PASSED
**To run integration tests:**
```bash
export {CONNECTOR}_HOST=your_host
export {CONNECTOR}_USER=your_user
export {CONNECTOR}_PASSWORD=your_password
cd unstract/connectors && python -m pytest tests/{type}/test_{name}_integration.py -v
```
```
Nested code block syntax issue in the report template.
The report template at line 274-279 contains a nested code block (bash inside the markdown code block), which creates malformed markdown. The inner backticks will be interpreted as the end of the outer block.
🔎 Suggested fix
Use indentation or a different approach to show the bash example:
**To run integration tests:**
-```bash
-export {CONNECTOR}_HOST=your_host
-export {CONNECTOR}_USER=your_user
-export {CONNECTOR}_PASSWORD=your_password
-cd unstract/connectors && python -m pytest tests/{type}/test_{name}_integration.py -v
-```
+
+ export {CONNECTOR}_HOST=your_host
+ export {CONNECTOR}_USER=your_user
+ export {CONNECTOR}_PASSWORD=your_password
+ cd unstract/connectors && python -m pytest tests/{type}/test_{name}_integration.py -vAlternatively, escape the inner backticks or use a different delimiter.
🤖 Prompt for AI Agents
.claude/skills/connector-ops/SKILL.md lines 253-280: the markdown includes a
fenced code block that itself contains a fenced bash snippet, causing nested
backticks and malformed markdown; replace the inner fenced snippet with either
an indented code block (4-space or tab indentation for each line) or a plain
indented bash example, or escape the inner backticks so the outer fence remains
intact, ensuring the "To run integration tests:" section displays the commands
correctly without nested triple-backticks.
| "boxfs==0.2.1", # For Box | ||
| "gcsfs==2024.10.0", # For GoogleCloudStorage | ||
| "adlfs~=2024.7.0", # For AzureCloudStorage | ||
| "Office365-REST-Python-Client~=2.6.0", # For SharePoint/OneDrive |
Update dependency specification to latest stable version.
The latest stable release of Office365-REST-Python-Client is 2.6.2, released on May 11, 2025. While the current specification ~=2.6.0 will accept patch updates including 2.6.2, use ~=2.6.2 for clarity and to match the latest stable release tested with this codebase.
🤖 Prompt for AI Agents
In unstract/connectors/pyproject.toml around line 36, the dependency pin for
Office365-REST-Python-Client is set to "~=2.6.0"; update it to "~=2.6.2" to
explicitly target the latest stable release tested with this codebase—edit the
version specifier on that line to "~=2.6.2" and run dependency resolution (e.g.,
poetry lock / pip-compile) to ensure the lockfile and environment reflect the
updated version.
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
chore: remove claude artifacts from gitignore
Actionable comments posted: 0
♻️ Duplicate comments (4)
.claude/skills/connector-ops/assets/templates/filesystem.py.template (1)
144-190: LGTM for existing methods; missing `extract_modified_date` already flagged. The `extract_metadata_file_hash` and `is_dir_by_metadata` methods handle multiple storage backend patterns appropriately. The missing `extract_modified_date` abstract method was already identified in a previous review.
.claude/skills/connector-ops/assets/templates/database.py.template (1)
177-204: Previous concern about missing abstract methods may still apply. As noted in an earlier review, this template may be missing required abstract methods (`prepare_multi_column_migration` and `get_create_table_base_query`) from the `UnstractDB` base class. If these are indeed required, connectors generated from this template will fail at instantiation. Verify whether these methods are abstract in `UnstractDB`:
#!/bin/bash # Check if prepare_multi_column_migration and get_create_table_base_query are abstract methods rg -nP -A3 '@(abc\.)?abstractmethod' unstract/connectors/src/unstract/connectors/databases/unstract_db.py | rg -A3 '(prepare_multi_column_migration|get_create_table_base_query)'
.claude/skills/connector-ops/assets/templates/queue.py.template (1)
1-199: Previous concern about missing abstract methods may still apply. As noted in an earlier review, this template may be missing several required abstract methods from `UnstractQueue`: `lset`, `llen`, `lindex`, `keys`, `lrange`, and `dequeue_batch`. If these are indeed abstract requirements, connectors generated from this template will fail at instantiation. Verify whether these methods are abstract in `UnstractQueue`:
#!/bin/bash # Check if the mentioned methods are abstract in UnstractQueue base class rg -nP -A3 '@(abc\.)?abstractmethod' unstract/connectors/src/unstract/connectors/queues/unstract_queue.py | rg -A3 '(lset|llen|lindex|keys|lrange|dequeue_batch)'
.claude/skills/connector-ops/SKILL.md (1)
274-279: Nested code block syntax issue remains unresolved. As noted in a previous review, the bash code block nested inside the report template will cause malformed markdown rendering.
🧹 Nitpick comments (2)
.claude/skills/connector-ops/assets/templates/filesystem.py.template (1)
66-74: Specify explicit encoding when opening files. The `open()` call should specify `encoding="utf-8"` to avoid potential issues on systems with different default encodings, and it's a best practice for reading JSON files.
🔎 Proposed fix
@staticmethod def get_json_schema() -> str: schema_path = os.path.join( os.path.dirname(__file__), "static", "json_schema.json" ) - with open(schema_path, "r") as f: + with open(schema_path, "r", encoding="utf-8") as f: return f.read()
.claude/skills/connector-ops/SKILL.md (1)
32-40: Add language identifiers to fenced code blocks for better rendering. Several fenced code blocks throughout the document lack language identifiers (`bash`, `text`, or `markdown`), which can affect syntax highlighting and rendering in some viewers.
🔎 Examples of affected locations
- Line 32: Directory structure block (use `text` or `bash`)
- Line 254: Report template block (use `text` or `markdown`)
- Line 280: Closing fence of report template
- Line 333: Report template for REMOVE operation (use `text` or `markdown`)
- Line 416: Report template for MODIFY operation (use `text` or `markdown`)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (8)
- .claude/skills/connector-ops/SKILL.md
- .claude/skills/connector-ops/assets/templates/database.py.template
- .claude/skills/connector-ops/assets/templates/filesystem.py.template
- .claude/skills/connector-ops/assets/templates/init.py.template
- .claude/skills/connector-ops/assets/templates/queue.py.template
- .claude/skills/connector-ops/assets/templates/test_integration.py.template
- .claude/skills/connector-ops/assets/templates/test_mock.py.template
- .gitignore
💤 Files with no reviewable changes (1)
- .gitignore
✅ Files skipped from review due to trivial changes (1)
- .claude/skills/connector-ops/assets/templates/test_integration.py.template
🧰 Additional context used
🪛 Checkov (3.2.334)
.claude/skills/connector-ops/assets/templates/test_mock.py.template
[medium] 33-34: Basic Auth Credentials
(CKV_SECRET_4)
🪛 markdownlint-cli2 (0.18.1)
.claude/skills/connector-ops/SKILL.md
32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
254-254: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
280-280: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
333-333: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
416-416: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (8)
.claude/skills/connector-ops/assets/templates/filesystem.py.template (4)
1-23: LGTM! The docstring clearly documents all placeholders, and the imports are appropriate for the template's functionality.
32-48: Good pattern: lazy initialization for fork safety. The approach of storing settings in `__init__` without initializing clients, combined with the thread lock for lazy initialization, correctly avoids gRPC/Celery fork issues.
92-121: LGTM! The double-checked locking pattern with thread safety is correctly implemented, and the dynamic import inside the method properly handles fork safety for Celery workers.
123-142: LGTM!The credential test approach of listing the bucket/root is a straightforward and effective way to validate access.
.claude/skills/connector-ops/assets/templates/init.py.template (1)
1-18: LGTM! Clean and consistent template structure.The template provides a clear, minimal structure for connector initialization modules with proper metadata exposure. The placeholder documentation is helpful for template users.
.claude/skills/connector-ops/assets/templates/database.py.template (1)
87-130: Good fork-safe connection pattern. The lazy import and comprehensive error handling with `ConnectorError` provide robust connection management. The support for both connection URL and individual parameter modes offers flexibility.
.claude/skills/connector-ops/assets/templates/test_mock.py.template (1)
1-234: Comprehensive test template with proper mocking patterns. The template provides excellent coverage across all connector capabilities: static metadata, initialization modes, connection handling, credential validation, query execution, and type mapping. The use of `@patch` decorators and mock objects follows best practices for unit testing.
85-116: Good connection management with client pooling. The lazy initialization pattern with `_client` caching and fork-safe imports provides efficient connection reuse. The test ping before returning the client ensures connection validity.
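As an illustration only (the template's actual backend and method names may differ), a cached client guarded by a ping before each reuse could look like this, assuming a Redis-backed queue:

```python
import redis


class ExampleQueue:
    def __init__(self, url: str):
        self._url = url
        self._client = None

    def _get_client(self) -> "redis.Redis":
        if self._client is None:
            # Lazy creation keeps __init__ fork-safe for worker processes.
            self._client = redis.Redis.from_url(self._url)
        self._client.ping()  # raises if the cached connection went stale
        return self._client
```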
Actionable comments posted: 1
🧹 Nitpick comments (3)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (3)
188-190: Use `logging.exception` for better error context. When logging before re-raising an exception, use `logging.exception` instead of `logging.error` to automatically include the stack trace.
🔎 Proposed fix
```diff
 except Exception as e:
-    logger.error(f"Error listing path {path}: {e}")
+    logger.exception(f"Error listing path {path}: {e}")
     raise
```
337-340: Consider validating the `recursive` parameter. The `recursive` parameter is accepted but not validated. If `recursive=False` is passed and the path is a non-empty directory, the delete may fail or unexpectedly succeed depending on the SharePoint API behavior. Consider validating this parameter or documenting the behavior.
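One possible shape for that validation, as a sketch only — `_delete_item` is a hypothetical helper standing in for the existing delete logic, and `isdir`/`ls` follow the fsspec-style interface discussed in this review:

```python
def rm(self, path: str, recursive: bool = False, **kwargs) -> None:
    """Delete a file or folder, refusing non-recursive deletes of non-empty folders."""
    if self.isdir(path) and not recursive and self.ls(path):
        raise ValueError(f"Directory {path!r} is not empty; pass recursive=True")
    self._delete_item(path)  # hypothetical helper performing the actual delete
```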
526-532: Minor optimization: Use `dict.get` for cleaner key checking. The conditional `if field in metadata and metadata[field]:` can be simplified using `dict.get`.
🔎 Proposed fix
```diff
 for field in hash_fields:
-    if field in metadata and metadata[field]:
-        value = metadata[field]
+    if value := metadata.get(field):
         if isinstance(value, str):
             # Remove quotes and version info from eTags
             return value.strip('"').split(",")[0]
         return str(value)
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
🧰 Additional context used
🧬 Code graph analysis (1)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (2)
unstract/connectors/src/unstract/connectors/exceptions.py (1)
ConnectorError (18-35)
unstract/connectors/src/unstract/connectors/filesystems/unstract_file_system.py (1)
UnstractFileSystem (17-353)
🪛 Ruff (0.14.10)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
51-51: Unused method argument: final
(ARG002)
118-121: Avoid specifying long messages outside the exception class
(TRY003)
173-173: Unused method argument: kwargs
(ARG002)
187-187: Consider moving this statement to an else block
(TRY300)
189-189: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
235-235: Unused method argument: kwargs
(ARG002)
246-246: Unused method argument: kwargs
(ARG002)
250-250: Consider moving this statement to an else block
(TRY300)
251-251: Do not catch blind exception: Exception
(BLE001)
259-259: Do not catch blind exception: Exception
(BLE001)
267-267: Do not catch blind exception: Exception
(BLE001)
270-270: Unused method argument: kwargs
(ARG002)
319-319: Unused method argument: kwargs
(ARG002)
327-327: Unused method argument: kwargs
(ARG002)
337-337: Unused method argument: recursive
(ARG002)
337-337: Unused method argument: kwargs
(ARG002)
359-359: Do not catch blind exception: Exception
(BLE001)
492-495: Avoid specifying long messages outside the exception class
(TRY003)
493-493: Use explicit conversion flag
Replace with conversion flag
(RUF010)
505-505: Consider moving this statement to an else block
(TRY300)
507-510: Avoid specifying long messages outside the exception class
(TRY003)
508-508: Use explicit conversion flag
Replace with conversion flag
(RUF010)
527-527: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
588-588: Unused static method argument: kwargs
(ARG004)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (13)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (13)
1-28: LGTM! Clean module structure with appropriate thread safety. The module header, imports, and global initialization lock are well-organized. The thread-safe initialization pattern using `_SHAREPOINT_INIT_LOCK` is appropriate for preventing concurrent client creation issues.
51-53: Unused `final` parameter is acceptable. The `final` parameter is unused but likely required by the `AbstractBufferedFile` interface. This is acceptable and doesn't need to be changed.
91-125: Good thread-safe initialization pattern. The double-checked locking pattern is correctly implemented for lazy initialization of the SharePoint context. The support for both client credentials and OAuth token flows is well-structured.
153-171: Path normalization logic is correct. The path handling correctly normalizes empty and root paths to the "root" string expected by the SharePoint API.
246-268: Exception handling in existence checks is appropriate. The `exists`, `isdir`, and `isfile` methods intentionally catch all exceptions and return `False`, which is the expected behavior for these predicate methods. The unused `kwargs` parameters are likely required by the fsspec interface.
270-302: Directory creation logic is correct. The `mkdir` method properly handles parent directory creation when `create_parents=True`, and `_create_folder` correctly uses the Office365 API to create folders.
304-335: File operations implemented correctly. The file read/write operations properly interface with the SharePoint API. Unused `kwargs` parameters are acceptable as they're likely required by the fsspec interface.
346-376: Walk implementation is correct. The recursive directory traversal properly handles `maxdepth` and gracefully skips inaccessible directories by catching exceptions. This is the expected behavior for a walk operation.
414-454: Connector metadata methods are well-defined. The static methods provide appropriate metadata for the connector. The clarification that `requires_oauth` returns `False` because both OAuth and client credentials are supported is helpful.
464-497: Lazy initialization with proper thread safety. The double-checked locking with the global `_SHAREPOINT_INIT_LOCK` ensures thread-safe initialization. The logic for determining the tenant ("consumers" for personal, tenant_id or "common" for business) is correct.
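A condensed sketch of that tenant-selection rule (function and parameter names are illustrative, not the connector's actual API):

```python
def resolve_tenant(is_personal_account: bool, tenant_id: str | None) -> str:
    """Return the Azure AD tenant segment to use when acquiring a token."""
    if is_personal_account:
        return "consumers"  # personal Microsoft accounts (OneDrive personal)
    return tenant_id or "common"  # business/school accounts
```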
499-510: Credential testing approach is sound. Testing credentials by attempting to list the root directory is an effective way to verify both authentication and basic access permissions.
550-585: Robust date extraction with proper timezone handling. The `extract_modified_date` method correctly tries multiple date fields and ensures all returned datetimes are timezone-aware and converted to UTC. The handling of ISO format with "Z" suffix is appropriate.
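The normalization it performs is roughly the following (a sketch, not the connector's exact code):

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO-8601 timestamp (optionally ending in 'Z') into aware UTC."""
    dt = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume UTC for naive values
    return dt.astimezone(timezone.utc)
```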
587-596: Root directory normalization is correct for SharePoint. The implementation correctly removes leading slashes and adds trailing slashes, which aligns with SharePoint path conventions. This differs from other filesystems and is appropriate.
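For illustration, the convention described above amounts to something like the helper below; the empty-path handling is an assumption:

```python
def normalize_root_dir(input_dir: str) -> str:
    """Strip leading/trailing slashes and guarantee exactly one trailing slash."""
    cleaned = input_dir.strip("/")
    return f"{cleaned}/" if cleaned else ""
```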
```python
def _fetch_range(self, start: int, end: int) -> bytes:
    """Fetch a range of bytes from the file."""
    if self._file_content is None:
        self._file_content = self.fs._download_file(self.path)
    return self._file_content[start:end]
```
Potential performance issue: Downloads entire file on first access.
The _fetch_range method downloads the entire file content into memory on the first range request, regardless of the requested range size. For large files where only small portions are read, this could cause unnecessary memory usage and slow performance.
Consider implementing true range-based downloads using the Microsoft Graph API's range request capabilities if the Office365 library supports it.
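A minimal sketch of what a true range fetch could look like, assuming a pre-authenticated download URL is available for the item (for example via the Graph `@microsoft.graph.downloadUrl` property, which honours HTTP Range headers); the helper name and the use of `requests` are assumptions, not the connector's current API:

```python
import requests


def fetch_range(download_url: str, start: int, end: int) -> bytes:
    """Download only bytes [start, end) using an HTTP Range request."""
    headers = {"Range": f"bytes={start}-{end - 1}"}  # HTTP ranges are end-inclusive
    response = requests.get(download_url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.content
```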
🤖 Prompt for AI Agents
In unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py around lines 45 to 49, _fetch_range currently downloads the entire file into memory on first access, which wastes memory and is slow for large files; change it to perform true range downloads instead: use the Office365/Microsoft Graph API range download (or the library's equivalent) to request only bytes start..end, falling back to a small cached window strategy or the existing full download only if range requests are unsupported; ensure the method returns exactly the requested bytes and that any temporary buffering is bounded and released promptly.
gaya3-zipstack
left a comment
Looks good.
Test Results
Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
What
- New `connector-ops` skill
- New SharePoint/OneDrive connector
Why
- The `connector-ops` skill provides Claude with structured guidance for developing, testing, and maintaining connectors in the Unstract ecosystem
How
Connector-Ops Skill
- `.claude/skills/connector-ops/SKILL.md`
SharePoint Connector
- `SharePointFS` class extending `ConnectorBase`
- `get_fsspec_fs()`, `test_credentials()`, `get_connector_root_dir()`
Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
No - This PR only adds new functionality:
- The skill lives under the `.claude/` directory and only affects Claude Code sessions
Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
- `office365-rest-python-client>=2.5.0` - Added to `unstract/connectors/pyproject.toml`
Notes on Testing
- `pytest unstract/connectors/tests/filesystems/test_sharepoint_fs.py`
Screenshots
N/A - Backend/skill changes only
Checklist
I have read and understood the Contribution Guidelines.
🤖 Generated with Claude Code