A

Skill 详情

azure-storage-file-datalake-py

Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.

来源平台:GitHub
来源标识:sickn33/antigravity-awesome-skills
源文件:原始说明
数据处理 超热门 GitHub 中 风险 下载 1.76万Stars 3.68万 GitHub Copilot
来源平台GitHub
文档版本SKILL.md
热度超热门
排名信号下载 1.76万
概述 安装 文档 下载

快速判断

Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations.

最后校验2026-05-27
来源平台GitHub
安全提示
下载副本ZIP 可用

适合任务

  • 把重复任务整理成可复用的 AI 操作流程。
  • 让 AI 在特定场景下按统一规范执行。
  • 为团队或个人工作流提供可复制的任务说明。

输入与输出

输入:任务目标、上下文材料、文件路径、约束条件或需要处理的内容。

输出:按 Skill 说明生成的文档、代码、检查结果、计划、建议或操作步骤。

示例任务

  • 使用 azure-storage-file-datalake-py 帮我处理当前任务,并说明执行前需要确认的输入。
  • 根据 azure-storage-file-datalake-py 的说明,给我一个安全的使用步骤清单。

安装方式

  1. 下载本站提供的 Skill ZIP 并解压。
  2. 把解压后的 Skill 目录放入当前 AI 工具支持的 skills 目录。
  3. 如需在线查看原始内容,可打开 GitHub 的 SKILL.md

在线原始地址:azure-storage-file-datalake-py/SKILL.md

风险边界

使用前请检查权限、外部依赖和要处理的数据类型。不要把密码、密钥、身份信息或敏感客户资料交给未经确认的 Skill。

SKILL.md 文档介绍

Azure Data Lake Storage Gen2 SDK for Python

Hierarchical file system for big data analytics workloads.

Installation

pip install azure-storage-file-datalake azure-identity

Environment Variables

AZURE_STORAGE_ACCOUNT_URL=https://<account>.dfs.core.windows.net

Authentication

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()
account_url = "https://<account>.dfs.core.windows.net"

service_client = DataLakeServiceClient(account_url=account_url, credential=credential)

Client Hierarchy

| Client | Purpose |

|--------|---------|

| DataLakeServiceClient | Account-level operations |

| FileSystemClient | Container (file system) operations |

| DataLakeDirectoryClient | Directory operations |

| DataLakeFileClient | File operations |

File System Operations

# Create file system (container)
file_system_client = service_client.create_file_system("myfilesystem")

# Get existing
file_system_client = service_client.get_file_system_client("myfilesystem")

# Delete
service_client.delete_file_system("myfilesystem")

# List file systems
for fs in service_client.list_file_systems():
    print(fs.name)

Directory Operations

file_system_client = service_client.get_file_system_client("myfilesystem")

# Create directory
directory_client = file_system_client.create_directory("mydir")

# Create nested directories
directory_client = file_system_client.create_directory("path/to/nested/dir")

# Get directory client
directory_client = file_system_client.get_directory_client("mydir")

# Delete directory
directory_client.delete_directory()

# Rename/move directory
directory_client.rename_directory(new_name="myfilesystem/newname")

File Operations

Upload File

# Get file client
file_client = file_system_client.get_file_client("path/to/file.txt")

# Upload from local file
with open("local-file.txt", "rb") as data:
    file_client.upload_data(data, overwrite=True)

# Upload bytes
file_client.upload_data(b"Hello, Data Lake!", overwrite=True)

# Append data (for large files)
file_client.append_data(data=b"chunk1", offset=0, length=6)
file_client.append_data(data=b"chunk2", offset=6, length=6)
file_client.flush_data(12)  # Commit the data

Download File

file_client = file_system_client.get_file_client("path/to/file.txt")

# Download all content
download = file_client.download_file()
content = download.readall()

# Download to file
with open("downloaded.txt", "wb") as f:
    download = file_client.download_file()
    download.readinto(f)

# Download range
download = file_client.download_file(offset=0, length=100)

Delete File

file_client.delete_file()

List Contents

# List paths (files and directories)
for path in file_system_client.get_paths():
    print(f"{'DIR' if path.is_directory else 'FILE'}: {path.name}")

# List paths in directory
for path in file_system_client.get_paths(path="mydir"):
    print(path.name)

# Recursive listing
for path in file_system_client.get_paths(path="mydir", recursive=True):
    print(path.name)

File/Directory Properties

# Get properties
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Last modified: {properties.last_modified}")

# Set metadata
file_client.set_metadata(metadata={"processed": "true"})

Access Control (ACL)

# Get ACL
acl = directory_client.get_access_control()
print(f"Owner: {acl['owner']}")
print(f"Permissions: {acl['permissions']}")

# Set ACL
directory_client.set_access_control(
    owner="user-id",
    permissions="rwxr-x---"
)

# Update ACL entries
from azure.storage.filedatalake import AccessControlChangeResult
directory_client.update_access_control_recursive(
    acl="user:user-id:rwx"
)

Async Client

from azure.storage.filedatalake.aio import DataLakeServiceClient
from azure.identity.aio import DefaultAzureCredential

async def datalake_operations():
    credential = DefaultAzureCredential()
    
    async with DataLakeServiceClient(
        account_url="https://<account>.dfs.core.windows.net",
        credential=credential
    ) as service_client:
        file_system_client = service_client.get_file_system_client("myfilesystem")
        file_client = file_system_client.get_file_client("test.txt")
        
        await file_client.upload_data(b"async content", overwrite=True)
        
        download = await file_client.download_file()
        content = await download.readall()

import asyncio
asyncio.run(datalake_operations())

Best Practices

1. Use hierarchical namespace for file system semantics

2. Use append_data + flush_data for large file uploads

3. Set ACLs at directory level and inherit to children

4. Use async client for high-throughput scenarios

5. Use get_paths with recursive=True for full directory listing

6. Set metadata for custom file attributes

7. Consider Blob API for simple object storage use cases

When to Use

This skill is applicable to execute the workflow or actions described in the overview.

Limitations

  • Use this skill only when the task clearly matches the scope described above.
  • Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
  • Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
建议反馈