Welcome to hbp_archive’s documentation!
Warning
The data stored during the Human Brain Project are no longer accessible using this software. You should instead use the ebrains_storage project to access data through the EBRAINS Data Proxy (Bucket) API.
A high-level API for interacting with the Human Brain Project archival storage at CSCS.
Authors: Andrew Davison (CNRS), Shailesh Appukuttan (CNRS) and Eszter Agnes Papp (University of Oslo)
License: Apache License, Version 2.0, see LICENSE.txt
Documentation: https://hbp-archive.readthedocs.io
Installation:
pip install hbp_archive
Example Usage
from hbp_archive import Container, PublicContainer, Project, Archive
# Working with a public container
container = PublicContainer("https://object.cscs.ch/v1/AUTH_id/my_container")
files = container.list()
local_file = container.download("README.txt")
print(container.read("README.txt"))
number_of_files = container.count()
size_in_MB = container.size("MB")
# Working with a private container
container = Container("MyContainer", username="xyzabc") # you will be prompted for your password
files = container.list()
local_file = container.download("README.txt", overwrite=True) # default is not to overwrite existing files
print(container.read("README.txt"))
number_of_files = container.count()
size_in_MB = container.size("MB")
container.move("my_file.dat", "a_subdirectory", "new_name.dat") # move/rename file within a container
# Reading a file directly, without downloading it
import numpy as np  # needed for np.loadtxt below
with container.open("my_data.txt") as fp:
    data = np.loadtxt(fp)
# Working with a project
my_proj = Project('MyProject', username="xyzabc")
container = my_proj.get_container("MyContainer")
# Listing all your projects
archive = Archive(username="xyzabc")
projects = archive.projects
container = archive.find_container("MyContainer") # will search through all projects
Regarding CSCS Authentication
The Python client attempts to simplify the CSCS authentication process. Users have the following options (in order of priority):
- Set an environment variable named CSCS_PASS with your CSCS password. On Linux, this can be done as:
  export CSCS_PASS='putyourpasswordhere'
  Environment variables set like this are only stored temporarily. When you exit the running instance of bash by closing the terminal, they are discarded. To save the variable permanently, write the above command into ~/.bashrc or ~/.profile (you might need to reload these files by, for example, source ~/.bashrc).
- Enter your CSCS password when prompted by the Python client.
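The priority order above can be sketched as follows. This is a minimal illustration, not the client's actual code: `resolve_password` is a hypothetical helper, and only the `CSCS_PASS` variable name comes from the text.

```python
import getpass
import os


def resolve_password(environ=os.environ, prompt=getpass.getpass):
    """Return the CSCS password, preferring the CSCS_PASS environment variable.

    Hypothetical helper: falls back to an interactive prompt when the
    variable is unset or empty.
    """
    password = environ.get("CSCS_PASS")
    if password:
        return password
    return prompt("CSCS password: ")


# Usage with an explicit mapping, so nothing is read from the real environment.
password = resolve_password(environ={"CSCS_PASS": "putyourpasswordhere"})
```

Passing the environment and the prompt function as arguments keeps the sketch testable without touching the real shell environment or terminal.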
File
- class hbp_archive.File(name, bytes, content_type, hash, last_modified, container=None)
A representation of a file in a container.
The following actions can be performed:
Action                        Method
Get directory name            dirname
Get file name                 basename
Download a file               download()
Read contents of a file       read()
Move a file                   move()
Rename a file                 rename()
Copy a file                   copy()
Delete a file                 delete()
Get size of file              size()
- property dirname
Returns the directory name from file path.
- Returns:
Directory path of file.
- Return type:
string
- property basename
Returns the file name from file path.
- Returns:
Name of file.
- Return type:
string
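As a rough illustration of what the two properties return, the sketch below splits an object name with the standard library. It assumes object names use forward slashes as directory separators; the sample name is made up, and the library itself may be implemented differently.

```python
import posixpath

name = "a_subdirectory/new_name.dat"  # hypothetical object name in a container

dirname = posixpath.dirname(name)    # directory part of the path
basename = posixpath.basename(name)  # file name part of the path
print(dirname, basename)
```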
- download(local_directory, with_tree=True, overwrite=False)
Download this file to a local directory.
- Parameters:
local_directory (string) – Local directory path where file is to be saved.
with_tree (boolean, optional) – Specify if directory structure of file is to be retained.
overwrite (boolean, optional) – Specify if any already existing file should be overwritten.
- Returns:
Path of file created inside specified local directory.
- Return type:
string
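The effect of with_tree on the returned path can be pictured with the sketch below. `local_target` is a hypothetical helper, and it assumes that with_tree=False places the file directly in local_directory under its base name; the text above does not spell out that case.

```python
import os
import posixpath


def local_target(remote_name, local_directory=".", with_tree=True):
    """Hypothetical helper: local path where a downloaded file would land."""
    # Keep the remote directory structure, or flatten to the file name only.
    relative = remote_name if with_tree else posixpath.basename(remote_name)
    return os.path.join(local_directory, *relative.split("/"))


with_tree_path = local_target("a_subdirectory/new_name.dat", "downloads")
flat_path = local_target("a_subdirectory/new_name.dat", "downloads", with_tree=False)
```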
- read(decode='utf-8', accept=[])
Read and return the contents of this file in the container.
- Parameters:
decode (string, optional) – Files containing text will be decoded using specified encoding (default: ‘utf-8’). To prevent any attempt at decoding, set decode=False.
accept (list, optional) – To force decoding, put the expected content type in accept.
- Returns:
Contents of the specified file.
- Return type:
string (unicode)
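The interplay of decode and accept can be sketched as below. `maybe_decode` is a hypothetical function, and the rule that content types starting with "text/" count as text is an assumption made for illustration; the library's own test for text content is not documented here.

```python
def maybe_decode(raw, content_type, decode="utf-8", accept=()):
    """Hypothetical sketch of the decode/accept rules described above.

    raw          -- file contents as bytes
    content_type -- content type reported for the file
    """
    if decode is False:
        return raw  # decoding disabled: hand back the bytes untouched
    looks_like_text = content_type.startswith("text/")  # assumed rule
    if looks_like_text or content_type in accept:
        return raw.decode(decode)
    return raw


text = maybe_decode(b"hello", "text/plain")
raw_json = maybe_decode(b"{}", "application/json")
forced_json = maybe_decode(b"{}", "application/json", accept=["application/json"])
```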
- move(target_directory, new_name=None, overwrite=False)
Move this file to the specified directory.
- Parameters:
target_directory (string) – Target directory where the file is to be moved.
new_name (string, optional) – New name to be assigned to file (including extension, if any).
overwrite (boolean, optional) – Specify if any already existing file should be overwritten.
- rename(new_name, overwrite=False)
Rename this file within the source directory.
- Parameters:
new_name (string) – New name to be assigned to file (including extension, if any).
overwrite (boolean, optional) – Specify if any already existing file should be overwritten.
- copy(target_directory, new_name=None, overwrite=False)
Copy this file to the specified directory.
- Parameters:
target_directory (string) – Target directory where the file is to be copied.
new_name (string, optional) – New name to be assigned to file (including extension, if any).
overwrite (boolean, optional) – Specify if any already existing file at target location should be overwritten.
Container
- class hbp_archive.Container(container, username, token=None, project=None)
A representation of a CSCS storage container. Can be used to operate both public and private CSCS containers. A CSCS account is needed to use this class.
The following actions can be performed:
Action                                 Method
Get metadata about the container       metadata
Get url if container is public         public_url
List all files in container            list()
Return a file from given path          get()
Get number of files in container       count()
Get total size of data in container    size()
Upload file(s) to container            upload()
Download a file from container         download()
Read contents of file in container     read()
Copy a file in container               copy()
Move a file in container               move()
Delete a file in container             delete()
Copy a directory in container          copy_directory()
Move a directory in container          move_directory()
Delete a directory in container        delete_directory()
List users with access to container    access_control()
Grant container access to user         grant_access()
Revoke container access from user      revoke_access()
- property metadata
Metadata about the container.
- Returns:
Dictionary with metadata about the container.
- Return type:
dict
- property public_url
Get URL if container is public.
- Returns:
URL to access public container; returns None for private containers.
- Return type:
string
- list(dir_path=None, content_type=None, newer_than=None, older_than=None, contains_substring=None, extension=None)
List all files in the container.
- Parameters:
dir_path (string) – base directory of files to be listed, default is set to root directory.
content_type (string) – content_type of files to be listed.
newer_than (datetime) – start timestamp for files to be listed.
older_than (datetime) – end timestamp for files to be listed.
contains_substring (string) – substring to be matched for files to be listed.
extension (string) – extension to be matched for files to be listed.
- Returns:
List of hbp_archive.File objects existing in container.
- Return type:
list
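How the filter parameters narrow a listing can be sketched on plain in-memory records. `FileRecord` and `filter_files` are hypothetical stand-ins, not part of hbp_archive, and the sketch assumes that all supplied filters must match and that the timestamp bounds are exclusive.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FileRecord:
    """Stand-in for hbp_archive.File with only the fields used here."""
    name: str
    content_type: str
    last_modified: datetime


def filter_files(files, dir_path=None, content_type=None, newer_than=None,
                 older_than=None, contains_substring=None, extension=None):
    """Keep the records that satisfy every filter that was supplied."""
    selected = []
    for f in files:
        if dir_path and not f.name.startswith(dir_path):
            continue
        if content_type and f.content_type != content_type:
            continue
        if newer_than and f.last_modified <= newer_than:
            continue
        if older_than and f.last_modified >= older_than:
            continue
        if contains_substring and contains_substring not in f.name:
            continue
        if extension and not f.name.endswith(extension):
            continue
        selected.append(f)
    return selected


files = [
    FileRecord("README.txt", "text/plain", datetime(2018, 5, 1)),
    FileRecord("data/run1.dat", "application/octet-stream", datetime(2019, 2, 1)),
    FileRecord("data/run2.dat", "application/octet-stream", datetime(2020, 7, 1)),
]
recent = filter_files(files, dir_path="data/", extension=".dat",
                      newer_than=datetime(2019, 12, 31))
print([f.name for f in recent])
```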
- get(file_path)
Return a File object for the file at the given path.
- Parameters:
file_path (string) – Path of file to be retrieved.
- Returns:
Requested hbp_archive.File object from container.
- Return type:
hbp_archive.File
- count()
Number of files in the container.
- Returns:
Count of number of files in the container.
- Return type:
int
- size(units='bytes')
Total size of all data in the container.
- Parameters:
units (string) – Requested units for output. Options: ‘bytes’ (default), ‘kB’, ‘MB’, ‘GB’, ‘TB’
- Returns:
Total size of all data in the container in requested units.
- Return type:
float
- upload(local_paths, remote_directory='', overwrite=False)
Upload file(s) to the container.
- Parameters:
local_paths (string, list of strings) – Local path of file(s) to be uploaded.
remote_directory (string, optional) – Remote directory path where data is to be uploaded. Default is root directory.
overwrite (boolean, optional) – Specify if any already existing file at target should be overwritten.
- Returns:
List of strings indicating file paths created on container.
- Return type:
list
Note
Using the command-line “swift upload” will likely be faster since it uses a pool of threads to perform multiple uploads in parallel. It is thus recommended for bulk uploads.
- download(file_paths, local_directory='.', with_tree=True, overwrite=False)
Download a file from the container.
- Parameters:
file_paths (string, list of strings) – Path of file(s) to be downloaded.
local_directory (string, optional) – Local directory path where file is to be saved.
with_tree (boolean, optional) – Specify if directory structure of file is to be retained.
overwrite (boolean, optional) – Specify if any already existing file should be overwritten.
- Returns:
Path of file created inside specified local directory.
- Return type:
string
- read(file_path, decode='utf-8', accept=[])
Read and return the contents of a file in the container.
- Parameters:
file_path (string) – Path of file to be retrieved.
decode (string, optional) – Files containing text will be decoded using specified encoding (default: ‘utf-8’). To prevent any attempt at decoding, set decode=False.
accept (list, optional) – To force decoding, put the expected content type in accept.
- Returns:
Contents of the specified file.
- Return type:
string (unicode)
- copy(file_path, target_directory, new_name=None, overwrite=False)
Copy a file to the specified directory.
- Parameters:
file_path (string) – Path of file to be copied.
target_directory (string) – Target directory where the file is to be copied.
new_name (string, optional) – New name to be assigned to file (including extension, if any).
overwrite (boolean, optional) – Specify if any already existing file should be overwritten.
- move(file_path, target_directory, new_name=None, overwrite=False)
Move a file to the specified directory.
- Parameters:
file_path (string) – Path of file to be moved.
target_directory (string) – Target directory where the file is to be moved.
new_name (string, optional) – New name to be assigned to file (including extension, if any).
overwrite (boolean, optional) – Specify if any already existing file should be overwritten.
- delete(file_path)
Delete the specified file.
- Parameters:
file_path (string) – Path of file to be deleted.
- copy_directory(directory_path, target_directory, new_name=None, overwrite=False)
Copy a directory to the specified directory location.
The original tree structure of the directory will be maintained at the target location.
- Parameters:
directory_path (string) – Path of directory to be copied.
target_directory (string) – Path of target directory where specified directory is to be copied.
new_name (string, optional) – New name to be assigned to directory.
overwrite (boolean, optional) – Specify if any already existing files at target location should be overwritten. If False (default value), then only non-conflicting files will be copied over.
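The overwrite rule for directories can be pictured with the sketch below. `plan_directory_copy` is a hypothetical helper working on plain object names; it assumes the copied directory is placed inside target_directory under its own name, and it leaves new_name out for brevity.

```python
def plan_directory_copy(object_names, directory_path, target_directory,
                        overwrite=False):
    """Hypothetical sketch: map objects under directory_path to target names.

    object_names is every object name currently in the container. A target
    that already exists is skipped unless overwrite is True.
    """
    existing = set(object_names)
    source = directory_path.strip("/")
    folder = source.split("/")[-1]
    plan = {}
    for name in object_names:
        if not name.startswith(source + "/"):
            continue  # not inside the directory being copied
        rest = name[len(source) + 1:]
        parts = [target_directory.strip("/"), folder, rest]
        target = "/".join(p for p in parts if p)
        if target in existing and not overwrite:
            continue  # conflict: leave the existing target untouched
        plan[name] = target
    return plan


names = ["raw/a.dat", "raw/sub/b.dat", "backup/raw/a.dat"]
safe_plan = plan_directory_copy(names, "raw", "backup")
full_plan = plan_directory_copy(names, "raw", "backup", overwrite=True)
```

With overwrite=False only `raw/sub/b.dat` is planned, because `backup/raw/a.dat` already exists; the subdirectory structure is preserved in both cases.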
- move_directory(directory_path, target_directory, new_name=None, overwrite=False)
Move a directory to the specified directory location.
Can also be used to rename a directory. The original tree structure of the directory will be maintained at the target location.
- Parameters:
directory_path (string) – Path of directory to be moved.
target_directory (string) – Path of target directory where specified directory is to be moved.
new_name (string, optional) – New name to be assigned to directory.
overwrite (boolean, optional) – Specify if any already existing files at target location should be overwritten. If False (default value), then only non-conflicting files will be moved.
- delete_directory(directory_path)
Delete the specified directory (and its contents).
- Parameters:
directory_path (string) – Path of directory to be deleted.
- access_control(show_usernames=True)
List the users that have access to this container.
- Parameters:
show_usernames (boolean, optional) – default is True
- Returns:
Dictionary with keys ‘read’ and ‘write’; each having a value in the form of a list of usernames
- Return type:
dict
- grant_access(username, mode='read')
Give read or write access to the given user.
- Parameters:
username (string) – username of user to be granted access; set to ‘PUBLIC’ to give public read-only access (no password required)
mode (string, optional) – the access permission to be granted: ‘read’ or ‘write’; default = ‘read’
Note
Use restricted to Superusers/Operators.
- revoke_access(username, mode='read')
Remove read or write access from the given user.
- Parameters:
username (string) – username of user to be revoked access; set to ‘PUBLIC’ to make a container private
mode (string, optional) – the access permission to be revoked: ‘read’ or ‘write’; default = ‘read’
Note
Use restricted to Superusers/Operators.
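The dictionary returned by access_control and the effect of granting or revoking access can be modelled in memory as below. `grant` and `revoke` are hypothetical functions operating on a plain dict with ‘read’ and ‘write’ keys; they do not contact the storage service, and the usernames are made up.

```python
def grant(acl, username, mode="read"):
    """Return a copy of acl with username added under the given mode."""
    if mode not in ("read", "write"):
        raise ValueError("mode must be 'read' or 'write'")
    updated = {key: list(users) for key, users in acl.items()}
    if username not in updated[mode]:
        updated[mode].append(username)
    return updated


def revoke(acl, username, mode="read"):
    """Return a copy of acl with username removed from the given mode."""
    if mode not in ("read", "write"):
        raise ValueError("mode must be 'read' or 'write'")
    updated = {key: list(users) for key, users in acl.items()}
    updated[mode] = [u for u in updated[mode] if u != username]
    return updated


acl = {"read": ["xyzabc"], "write": ["xyzabc"]}
public_acl = grant(acl, "PUBLIC")            # public read-only access
private_again = revoke(public_acl, "PUBLIC")  # container is private again
```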
PublicContainer
- class hbp_archive.PublicContainer(url)
A representation of a public CSCS storage container. Can be used to operate only public CSCS containers. A CSCS account is not needed to use this class.
The following actions can be performed:
Action                                 Method
List all files in container            list()
Return a file from given path          get()
Get number of files in container       count()
Get total size of data in container    size()
Download a file from container         download()
Read contents of file in container     read()
Note
This class only permits read-only operations. For other features, you may access a public container via the Container class.
- list(dir_path=None, content_type=None, newer_than=None, older_than=None, contains_substring=None, extension=None, refresh=False)
List all files in the container.
- Parameters:
dir_path (string) – base directory of files to be listed, default is set to root directory.
content_type (string) – content_type of files to be listed.
newer_than (datetime) – start timestamp for files to be listed.
older_than (datetime) – end timestamp for files to be listed.
contains_substring (string) – substring to be matched for files to be listed.
extension (string) – extension to be matched for files to be listed.
refresh (boolean) – to force refreshing, in case contents have changed.
- Returns:
List of hbp_archive.File objects existing in container.
- Return type:
list
- get(file_path)
Return a File object for the file at the given path.
- Parameters:
file_path (string) – Path of file to be retrieved.
- Returns:
Requested hbp_archive.File object from container.
- Return type:
hbp_archive.File
- count()
Number of files in the container.
- Returns:
Count of number of files in the container.
- Return type:
int
- size(units='bytes')
Total size of all data in the container.
- Parameters:
units (string) – Requested units for output. Options: ‘bytes’ (default), ‘kB’, ‘MB’, ‘GB’, ‘TB’
- Returns:
Total size of all data in the container in requested units.
- Return type:
float
- download(file_path, local_directory='.', with_tree=True, overwrite=False)
Download a file from the container.
- Parameters:
file_path (string) – Path of file to be downloaded.
local_directory (string, optional) – Local directory path where file is to be saved.
with_tree (boolean, optional) – Specify if directory structure of file is to be retained.
overwrite (boolean, optional) – Specify if any already existing file should be overwritten.
- Returns:
Path of file created inside specified local directory.
- Return type:
string
- read(file_path, decode='utf-8', accept=[])
Read and return the contents of a file in the container.
- Parameters:
file_path (string) – Path of file to be retrieved.
decode (string, optional) – Files containing text will be decoded using specified encoding (default: ‘utf-8’). To prevent any attempt at decoding, set decode=False.
accept (list, optional) – To force decoding, put the expected content type in accept.
- Returns:
Contents of the specified file.
- Return type:
string (unicode)
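The URL given to PublicContainer in the example usage follows the usual OpenStack Swift layout of /v1/<account>/<container>. The sketch below pulls those two parts out of such a URL; `split_container_url` is a hypothetical helper and says nothing about how the class itself parses its argument.

```python
from urllib.parse import urlparse


def split_container_url(url):
    """Hypothetical helper: return (account, container) from a Swift-style URL."""
    segments = urlparse(url).path.strip("/").split("/")
    if len(segments) < 3:
        raise ValueError("expected a path of the form /v1/<account>/<container>")
    _version, account, container = segments[:3]
    return account, container


account, container_name = split_container_url(
    "https://object.cscs.ch/v1/AUTH_id/my_container"
)
print(account, container_name)
```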
Project
- class hbp_archive.Project(project, username, token=None, archive=None)
A representation of a CSCS Project.
The following actions can be performed:
Action                                  Method / Property
Create a container inside project       create_container()
Rename a container inside project       rename_container()
Delete a container inside project       delete_container()
Get a container from project            get_container()
List containers that you can access     containers
Get names of containers in project      container_names
Get mapping of usernames to user ids    users
- create_container(container_name, public=False)
Create a container inside the current project.
- Parameters:
container_name (string) – name to be assigned to container
public (boolean, optional) – specify if container is to be made public; default is private
Note
Use restricted to Superusers/Operators.
- rename_container()
Rename a container inside the current project.
Note
Use restricted to Superusers/Operators.
- delete_container(container_name)
Delete a container from the current project.
- Parameters:
container_name (string) – name of container to be deleted
Note
Use restricted to Superusers/Operators.
- get_container(name)
Get a container from project.
- Parameters:
name (string) – name of the container to be retrieved.
- Returns:
Requested Container object from Project.
- Return type:
‘hbp_archive.Container’
- property containers
Containers you have access to in this project.
- Returns:
Dictionary with keys as names of containers and their values being the corresponding ‘hbp_archive.Container’ object.
- Return type:
dict
- property container_names
Returns a list of container names.
- Returns:
List of strings indicating container names in Project.
- Return type:
list
- property users
Return a mapping from usernames to user ids.
- Returns:
dict of mapping from usernames to user ids.
- Return type:
dict
Archive
- class hbp_archive.Archive(username, token=None)
A representation of the Human Brain Project archival storage (OpenStack Swift) at CSCS.
The following actions can be performed:
Action                                  Method / Property
List projects that you can access       projects
Search for container in all projects    find_container()
- property projects
Projects you have access to.
- Returns:
Dictionary with keys as names of projects and their values being the corresponding ‘hbp_archive.Project’ object.
- Return type:
dict
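The example usage notes that find_container searches through all projects. Given a mapping shaped like the projects dictionary, that search can be pictured as below. `find_in_projects` and the nested plain dicts are hypothetical stand-ins; how the real method treats a missing or duplicated container name is not documented here.

```python
def find_in_projects(projects, container_name):
    """Hypothetical sketch: return (project_name, container) for the first match.

    projects maps project names to {container name: container} dictionaries.
    Returns None when no project holds the requested container.
    """
    for project_name, containers in projects.items():
        if container_name in containers:
            return project_name, containers[container_name]
    return None


projects = {
    "MyProject": {"MyContainer": "<container object>"},
    "OtherProject": {"Scratch": "<container object>"},
}
match = find_in_projects(projects, "MyContainer")
missing = find_in_projects(projects, "NoSuchContainer")
```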
Misc
- hbp_archive.scale_bytes(value, units)
Convert a value in bytes to a different unit.
- Parameters:
value (int) – Value (in bytes) to be converted.
units (string) – Requested units for output. Options: ‘bytes’, ‘kB’, ‘MB’, ‘GB’, ‘TB’
- Returns:
Value in requested units.
- Return type:
float
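A conversion of this kind might look like the sketch below. `scale_bytes_sketch` is a hypothetical re-implementation, and the default of 1024 bytes per step is an assumption: the documentation above lists the unit names but not whether the library uses binary (1024) or decimal (1000) multiples.

```python
def scale_bytes_sketch(value, units, base=1024):
    """Hypothetical sketch: convert a byte count to the requested units."""
    exponents = {"bytes": 0, "kB": 1, "MB": 2, "GB": 3, "TB": 4}
    if units not in exponents:
        raise ValueError("units must be one of: " + ", ".join(exponents))
    # Each step up the unit ladder divides by `base` once more.
    return value / base ** exponents[units]


size_in_MB = scale_bytes_sketch(1048576, "MB")             # binary multiples
size_in_kB = scale_bytes_sketch(1500, "kB", base=1000)     # decimal multiples
```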
- hbp_archive.set_logger(location='screen', level='INFO')
Set the logging specifications for this module.
- Parameters:
location (string / None, optional) – Where log messages are sent. Options:
- ‘screen’ (case insensitive; default): display log messages on screen
- None: disable logging
- any other input is treated as a filename for logging to a file
level (string, optional) – Specify the logging level. Options: ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’
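The three location options map naturally onto the standard logging module. The sketch below shows one way to do that; `configure_logging` and the logger name are hypothetical, and this is not the library's own implementation.

```python
import logging
import sys


def configure_logging(location="screen", level="INFO", name="hbp_archive_sketch"):
    """Hypothetical sketch: route log messages to screen, a file, or nowhere."""
    logger = logging.getLogger(name)
    for handler in list(logger.handlers):
        logger.removeHandler(handler)  # start from a clean logger each time
    if location is None:
        logger.addHandler(logging.NullHandler())
        logger.disabled = True  # logging switched off entirely
        return logger
    logger.disabled = False
    if location.lower() == "screen":
        handler = logging.StreamHandler(sys.stdout)
    else:
        handler = logging.FileHandler(location)  # treat the value as a filename
    logger.addHandler(handler)
    logger.setLevel(getattr(logging, level))
    return logger


logger = configure_logging("Screen", "WARNING")
logger.warning("shown on screen")
logger.info("suppressed at WARNING level")
```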