API Reference¶
User Functions¶
- open_files: Given a path or paths, return a list of OpenFile objects
- open: Given a path or paths, return one OpenFile object
- open_local: Open file(s) which can be resolved to local
- filesystem: Instantiate filesystems for given protocol and arguments
- get_filesystem_class: Fetch named protocol implementation from the registry
- get_mapper: Create key-value interface for given URL and options
- fsspec.open_files(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, name_function=None, num=1, protocol=None, newline=None, auto_mkdir=True, expand=True, **kwargs)[source]¶
Given a path or paths, return a list of OpenFile objects.
For writing, a str path must contain the "*" character, which will be filled in by increasing numbers, e.g., "part*" -> "part1", "part2" if num=2.
For either reading or writing, can instead provide explicit list of paths.
- Parameters
- urlpath: string or list
Absolute or relative filepath(s). Prefix with a protocol like s3:// to read from alternative filesystems. To read from multiple files you can pass a globstring or a list of paths, with the caveat that they must all have the same protocol.
- mode: 'rb', 'wt', etc.
- compression: string
Compression to use. See dask.bytes.compression.files for options.
- encoding: str
For text mode only
- errors: None or str
Passed to TextIOWrapper in text mode
- name_function: function or None
If opening a set of files for writing, those files do not yet exist, so we need to generate their names by formatting the urlpath for each sequence number
- num: int [1]
If in writing mode, number of files we expect to create (passed to name_function)
- protocol: str or None
If given, overrides the protocol found in the URL.
- newline: bytes or None
Used for line terminator in text mode. If None, uses system default; if blank, uses no translation.
- auto_mkdir: bool (True)
If in write mode, this will ensure the target directory exists before writing, by calling fs.mkdirs(exist_ok=True).
- expand: bool
- **kwargs: dict
Extra options that make sense to a particular storage connection, e.g. host, port, username, password, etc.
- Returns
- An OpenFiles instance, which is a list of OpenFile objects that can be used as a single context
Examples
>>> files = open_files('2015-*-*.csv')
>>> files = open_files(
...     's3://bucket/2015-*-*.csv.gz', compression='gzip'
... )
- fsspec.open(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, protocol=None, newline=None, **kwargs)[source]¶
Given a path or paths, return one OpenFile object.
- Parameters
- urlpath: string or list
Absolute or relative filepath. Prefix with a protocol like s3:// to read from alternative filesystems. Should not include glob character(s).
- mode: 'rb', 'wt', etc.
- compression: string
Compression to use. See dask.bytes.compression.files for options.
- encoding: str
For text mode only
- errors: None or str
Passed to TextIOWrapper in text mode
- protocol: str or None
If given, overrides the protocol found in the URL.
- newline: bytes or None
Used for line terminator in text mode. If None, uses system default; if blank, uses no translation.
- **kwargs: dict
Extra options that make sense to a particular storage connection, e.g. host, port, username, password, etc.
- Returns
OpenFile object.
Examples
>>> openfile = open('2015-01-01.csv')
>>> openfile = open(
...     's3://bucket/2015-01-01.csv.gz',
...     compression='gzip'
... )
>>> with openfile as f:
...     df = pd.read_csv(f)
- fsspec.open_local(url, mode='rb', **storage_options)[source]¶
Open file(s) which can be resolved to local files
For files which either are local, or get downloaded upon open (e.g., by file caching)
- Parameters
- url: str or list(str)
- mode: str
Must be read mode
- storage_options:
passed on to the FS, or used by open_files (e.g., compression)
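A minimal sketch (the temporary directory and file name below are invented for illustration): for a URL that already points at a local file, open_local simply resolves it to a concrete local path.

```python
import os
import tempfile

import fsspec

# Create a plain local file to point at.
d = tempfile.mkdtemp()
p = os.path.join(d, "x.txt")
with open(p, "w") as f:
    f.write("data")

# open_local returns the resolved local path for a single URL,
# or a list of paths when given a list or glob.
local_path = fsspec.open_local("file://" + p, mode="rb")
print(local_path)
```

For remote URLs, a caching protocol such as simplecache:: would first download the file and then return the local cache path.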
- fsspec.filesystem(protocol, **storage_options)[source]¶
Instantiate filesystems for given protocol and arguments
storage_options are specific to the protocol being chosen, and are passed directly to the class.
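A short sketch using the built-in memory protocol (the path names here are arbitrary); for remote protocols, the storage_options would carry things like host or credentials instead:

```python
import fsspec

# Instantiate the in-memory filesystem; no storage_options needed.
fs = fsspec.filesystem("memory")

# Write and read back some bytes through the generic API.
fs.pipe("memory://inst-demo/hello.txt", b"hello")
print(fs.cat("memory://inst-demo/hello.txt"))  # b'hello'
```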
- fsspec.get_filesystem_class(protocol)[source]¶
Fetch named protocol implementation from the registry
The dict known_implementations maps protocol names to the locations of classes implementing the corresponding file-system. When used for the first time, appropriate imports will happen and the class will be placed in the registry. All subsequent calls will fetch directly from the registry.
Some protocol implementations require additional dependencies, and so the import may fail. In this case, the string in the "err" field of known_implementations will be given as the error message.
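A sketch showing that fetching the class performs the deferred import without instantiating anything:

```python
import fsspec

# Look up the implementation class for the "memory" protocol.
cls = fsspec.get_filesystem_class("memory")
print(cls.__name__)  # MemoryFileSystem

# The class can then be instantiated as usual.
fs = cls()
```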
- fsspec.get_mapper(url, check=False, create=False, missing_exceptions=None, **kwargs)[source]¶
Create key-value interface for given URL and options
The URL will be of the form “protocol://location” and point to the root of the mapper required. All keys will be file-names below this location, and their values the contents of each key.
Also accepts compound URLs like zip::s3://bucket/file.zip, see fsspec.open.
- Parameters
- url: str
Root URL of mapping
- check: bool
Whether to attempt to read from the location before instantiation, to check that the mapping does exist
- create: bool
Whether to make the directory corresponding to the root before instantiating
- missing_exceptions: None or tuple
If given, these exception types will be regarded as missing keys and raise KeyError when trying to read data. By default, you get (FileNotFoundError, IsADirectoryError, NotADirectoryError)
- Returns
FSMap instance, the dict-like key-value store.
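A minimal sketch with a memory:// root (the root and key names are arbitrary); keys are file names below the root, and values are their byte contents:

```python
import fsspec

# Dict-like view over the filesystem rooted at memory://mroot
m = fsspec.get_mapper("memory://mroot")
m["a/b"] = b"payload"      # writes the file mroot/a/b
print(list(m))             # keys relative to the root
print(m["a/b"])            # b'payload'
```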
Base Classes¶
- AbstractFileSystem: An abstract super-class for pythonic file-systems
- Transaction: Filesystem transaction write context
- AbstractBufferedFile: Convenient class to derive from to provide buffering
- FSMap: Wrap a FileSystem instance as a mutable mapping
- AsyncFileSystem: Async file operations, default implementations
- OpenFile: File-like object to be used in a context
- OpenFiles: List of OpenFile instances
- BaseCache: Pass-through cache: doesn't keep anything, calls every time
- get_fs_token_paths: Filesystem, deterministic token, and paths from a urlpath and options
- DirCache: Caching of directory listings
- ReadOnlyRegistry: Dict-like registry, but immutable
- register_implementation: Add implementation class to the registry
- class fsspec.spec.AbstractFileSystem(*args, **kwargs)[source]¶
An abstract super-class for pythonic file-systems
Implementations are expected to be compatible with or, better, subclass from here.
- Attributes
transaction: A context within which files are committed together upon exit
Methods
- cat(path[, recursive, on_error]): Fetch (potentially multiple) paths' contents
- cat_file(path): Get the content of a file
- checksum(path): Unique value for current version of file
- clear_instance_cache(): Clear the cache of filesystem instances.
- copy(path1, path2[, recursive]): Copy within two locations in the filesystem
- cp(path1, path2, **kwargs): Alias of FilesystemSpec.copy.
- created(path): Return the created timestamp of a file as a datetime.datetime
- current(): Return the most recently created FileSystem
- delete(path[, recursive, maxdepth]): Alias of FilesystemSpec.rm.
- disk_usage(path[, total, maxdepth]): Alias of FilesystemSpec.du.
- download(rpath, lpath[, recursive]): Alias of FilesystemSpec.get.
- du(path[, total, maxdepth]): Space used by files within a path
- end_transaction(): Finish write transaction, non-context version
- exists(path): Is there a file at the given path
- expand_path(path[, recursive, maxdepth]): Turn one or more globs or directories into a list of all matching files
- find(path[, maxdepth, withdirs]): List all files below path.
- from_json(blob): Recreate a filesystem instance from JSON representation
- get(rpath, lpath[, recursive]): Copy file(s) to local.
- get_file(rpath, lpath, **kwargs): Copy single remote file to local
- get_mapper(root[, check, create]): Create key/value store based on this file-system
- glob(path, **kwargs): Find files by glob-matching.
- head(path[, size]): Get the first size bytes from a file
- info(path, **kwargs): Give details of entry at path
- invalidate_cache([path]): Discard any cached directory information
- isdir(path): Is this entry directory-like?
- isfile(path): Is this entry file-like?
- listdir(path[, detail]): Alias of FilesystemSpec.ls.
- ls(path[, detail]): List objects at path.
- makedir(path[, create_parents]): Alias of FilesystemSpec.mkdir.
- makedirs(path[, exist_ok]): Recursively make directories
- mkdir(path[, create_parents]): Create directory entry at path
- mkdirs(path[, exist_ok]): Alias of FilesystemSpec.makedirs.
- modified(path): Return the modified timestamp of a file as a datetime.datetime
- move(path1, path2, **kwargs): Alias of FilesystemSpec.mv.
- mv(path1, path2[, recursive, maxdepth]): Move file(s) from one location to another
- open(path[, mode, block_size, cache_options]): Return a file-like object from the filesystem
- pipe(path[, value]): Put value into path
- pipe_file(path, value, **kwargs): Set the bytes of given file
- put(lpath, rpath[, recursive]): Copy file(s) from local.
- put_file(lpath, rpath, **kwargs): Copy single file to remote
- read_block(fn, offset, length[, delimiter]): Read a block of bytes from a file
- rename(path1, path2, **kwargs): Alias of FilesystemSpec.mv.
- rm(path[, recursive, maxdepth]): Delete files.
- rm_file(path): Delete a file
- rmdir(path): Remove a directory, if empty
- sign(path[, expiration]): Create a signed URL representing the given path
- size(path): Size in bytes of a file
- start_transaction(): Begin write transaction for deferring files, non-context version
- stat(path, **kwargs): Alias of FilesystemSpec.info.
- tail(path[, size]): Get the last size bytes from a file
- to_json(): JSON representation of this filesystem instance
- touch(path[, truncate]): Create empty file, or update timestamp
- ukey(path): Hash of file properties, to tell if it has changed
- upload(lpath, rpath[, recursive]): Alias of FilesystemSpec.put.
- walk(path[, maxdepth]): Return all files below path
- cp_file
- cat(path, recursive=False, on_error='raise', **kwargs)[source]¶
Fetch (potentially multiple) paths’ contents
Returns a dict of {path: contents} if there are multiple paths or the path has been otherwise expanded
- on_error: "raise", "omit", "return"
If raise, an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if omit, keys with exception will simply not be included in the output; if “return”, all keys are included in the output, but the value will be bytes or an exception instance.
- checksum(path)[source]¶
Unique value for current version of file
If the checksum is the same from one moment to another, the contents are guaranteed to be the same. If the checksum changes, the contents might have changed.
This should normally be overridden; default will probably capture creation/modification timestamp (which would be good) or maybe access timestamp (which would be bad)
- classmethod clear_instance_cache()[source]¶
Clear the cache of filesystem instances.
Notes
Unless overridden by setting the cachable class attribute to False, the filesystem class stores a reference to newly created instances. This prevents Python's normal rules around garbage collection from working, since the instance's refcount will not drop to zero until clear_instance_cache is called.
- classmethod current()[source]¶
Return the most recently created FileSystem
If no instance has been created, then create one with defaults
- du(path, total=True, maxdepth=None, **kwargs)[source]¶
Space used by files within a path
- Parameters
- path: str
- total: bool
whether to sum all the file sizes
- maxdepth: int or None
maximum number of directory levels to descend, None for unlimited.
- kwargs: passed to ``ls``
- Returns
- Dict of {fn: size} if total=False, or int otherwise, where numbers
- refer to bytes used.
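A sketch against the in-memory filesystem (paths invented), with two files of 1 and 2 bytes:

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe({"memory://du-demo/a": b"x", "memory://du-demo/b": b"yy"})

print(fs.du("memory://du-demo"))               # 3 (total bytes)
print(fs.du("memory://du-demo", total=False))  # per-file {path: size} dict
```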
- expand_path(path, recursive=False, maxdepth=None)[source]¶
Turn one or more globs or directories into a list of all matching files
- find(path, maxdepth=None, withdirs=False, **kwargs)[source]¶
List all files below path.
Like the POSIX find command without conditions
- Parameters
- path: str
- maxdepth: int or None
If not None, the maximum number of levels to descend
- withdirs: bool
Whether to include directory paths in the output. This is True when used by glob, but users usually only want files.
- kwargs are passed to ``ls``.
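A sketch of find on the in-memory filesystem (the tree layout is invented):

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe({"memory://tree/a/x.bin": b"1", "memory://tree/b/y.bin": b"2"})

# All files below the root; directories are excluded by default.
print(fs.find("memory://tree"))
```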
- static from_json(blob)[source]¶
Recreate a filesystem instance from JSON representation
See .to_json() for the expected structure of the input
- Parameters
- blob: str
- Returns
- file system instance, not necessarily of this particular class.
- get(rpath, lpath, recursive=False, **kwargs)[source]¶
Copy file(s) to local.
Copies a specific file or tree of files (if recursive=True). If lpath ends with a “/”, it will be assumed to be a directory, and target files will go within. Can submit a list of paths, which may be glob-patterns and will be expanded.
Calls get_file for each source.
- get_mapper(root, check=False, create=False)[source]¶
Create key/value store based on this file-system
Makes a MutableMapping interface to the FS at the given root path. See fsspec.mapping.FSMap for further details.
- glob(path, **kwargs)[source]¶
Find files by glob-matching.
If the path ends with '/' and does not contain "*", it is essentially the same as ls(path), returning only files.
We support "**", "?" and "[..]". We do not support ^ for pattern negation.
Search path names that contain embedded characters special to this implementation of glob may not produce expected results; e.g., 'foo/bar/starredfilename'.
kwargs are passed to ls.
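A sketch (file names invented) of glob-matching against the memory filesystem:

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe({
    "memory://logs/a.csv": b"1",
    "memory://logs/b.csv": b"2",
    "memory://logs/c.txt": b"3",
})

# Only the two .csv entries match the pattern.
print(fs.glob("memory://logs/*.csv"))
```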
- info(path, **kwargs)[source]¶
Give details of entry at path
Returns a single dictionary, with exactly the same information as ls would with detail=True.
The default implementation calls ls and could be overridden by a shortcut. kwargs are passed on to ls().
Some file systems might not be able to measure the file's size, in which case, the returned dict will include 'size': None.
- Returns
- dict with keys: name (full path in the FS), size (in bytes), type (file, directory, or something else) and other FS-specific keys.
- invalidate_cache(path=None)[source]¶
Discard any cached directory information
- Parameters
- path: string or None
If None, clear all listings cached else listings at or under given path.
- ls(path, detail=True, **kwargs)[source]¶
List objects at path.
This should include subdirectories and files at that location. The difference between a file and a directory must be clear when details are requested.
The specific keys, or perhaps a FileInfo class, or similar, is TBD, but must be consistent across implementations. Must include:
full path to the entry (without protocol)
size of the entry, in bytes. If the value cannot be determined, will be None.
type of entry, "file", "directory" or other
Additional information may be present, appropriate to the file-system, e.g., generation, checksum, etc.
May use refresh=True|False to allow use of self._ls_from_cache to check for a saved listing and avoid calling the backend. This would be common where listing may be expensive.
- Parameters
- path: str
- detail: bool
if True, gives a list of dictionaries, where each is the same as the result of info(path). If False, gives a list of paths (str).
- kwargs: may have additional backend-specific options, such as version information
- Returns
- List of strings if detail is False, or list of directory information dicts if detail is True.
- makedirs(path, exist_ok=False)[source]¶
Recursively make directories
Creates directory at path and any intervening required directories. Raises exception if, for instance, the path already exists but is a file.
- Parameters
- path: str
leaf directory name
- exist_ok: bool (False)
If False, raise an error if the target directory already exists
- mkdir(path, create_parents=True, **kwargs)[source]¶
Create directory entry at path
For systems that don't have true directories, may create a directory entry for this instance only and not touch the real filesystem
- Parameters
- path: str
location
- create_parents: bool
if True, this is equivalent to makedirs
- kwargs:
may be permissions, etc.
- mv(path1, path2, recursive=False, maxdepth=None, **kwargs)[source]¶
Move file(s) from one location to another
- open(path, mode='rb', block_size=None, cache_options=None, **kwargs)[source]¶
Return a file-like object from the filesystem
The resultant instance must function correctly in a with block.
- Parameters
- path: str
Target file
- mode: str like ‘rb’, ‘w’
See builtin open()
- block_size: int
Some indication of buffering - this is a value in bytes
- cache_options: dict, optional
Extra arguments to pass through to the cache.
- encoding, errors, newline: passed on to TextIOWrapper for text mode
- pipe(path, value=None, **kwargs)[source]¶
Put value into path
(Counterpart to cat.)
- Parameters
- path: string or dict(str, bytes)
If a string, a single remote location to put value bytes; if a dict, a mapping of {path: bytesvalue}.
- value: bytes, optional
If using a single path, these are the bytes to put there. Ignored if path is a dict
- put(lpath, rpath, recursive=False, **kwargs)[source]¶
Copy file(s) from local.
Copies a specific file or tree of files (if recursive=True). If rpath ends with a “/”, it will be assumed to be a directory, and target files will go within.
Calls put_file for each source.
- read_block(fn, offset, length, delimiter=None)[source]¶
Read a block of bytes from
Starting at offset of the file, read length bytes. If delimiter is set then we ensure that the read starts and stops at delimiter boundaries that follow the locations offset and offset + length. If offset is zero then we start at zero. The bytestring returned WILL include the end delimiter string.
If offset+length is beyond the eof, reads to eof.
- Parameters
- fn: string
Path to filename
- offset: int
Byte offset to start read
- length: int
Number of bytes to read
- delimiter: bytes (optional)
Ensure reading starts and stops at delimiter bytestring
See also
utils.read_block
Examples
>>> fs.read_block('data/file.csv', 0, 13)
b'Alice, 100\nBo'
>>> fs.read_block('data/file.csv', 0, 13, delimiter=b'\n')
b'Alice, 100\nBob, 200\n'
Use length=None to read to the end of the file.
>>> fs.read_block('data/file.csv', 0, None, delimiter=b'\n')  # doctest: +SKIP
b'Alice, 100\nBob, 200\nCharlie, 300'
- rm(path, recursive=False, maxdepth=None)[source]¶
Delete files.
- Parameters
- path: str or list of str
File(s) to delete.
- recursive: bool
If file(s) are directories, recursively delete contents and then also remove the directory
- maxdepth: int or None
Depth to pass to walk for finding files to delete, if recursive. If None, there will be no limit and infinite recursion may be possible.
- sign(path, expiration=100, **kwargs)[source]¶
Create a signed URL representing the given path
Some implementations allow temporary URLs to be generated, as a way of delegating credentials.
- Parameters
- path: str
The path on the filesystem
- expiration: int
Number of seconds to enable the URL for (if supported)
- Returns
- URL: str
The signed URL
- Raises
- NotImplementedError: if method is not implemented for a filesystem
- to_json()[source]¶
JSON representation of this filesystem instance
- Returns
- str: JSON structure with keys cls (the python location of this class), protocol (text name of this class's protocol, first one in case of multiple), args (positional args, usually empty), and all other kwargs as their own keys.
- touch(path, truncate=True, **kwargs)[source]¶
Create empty file, or update timestamp
- Parameters
- path: str
file location
- truncate: bool
If True, always set file size to 0; if False, update timestamp and leave file unchanged, if backend allows this
- property transaction¶
A context within which files are committed together upon exit
Requires the file class to implement .commit() and .discard() for the normal and exception cases.
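A sketch using the local ("file") filesystem, whose file objects implement commit() and discard(); the tempfile-based target path is invented for illustration:

```python
import os
import tempfile

import fsspec

fs = fsspec.filesystem("file")
target = os.path.join(tempfile.mkdtemp(), "out.bin")

# Writes inside the context are deferred; they are committed
# together when the context exits without an exception.
with fs.transaction:
    with fs.open(target, "wb") as f:
        f.write(b"all-or-nothing")

print(fs.cat(target))  # b'all-or-nothing'
```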
- walk(path, maxdepth=None, **kwargs)[source]¶
Return all files below path
List all files, recursing into subdirectories; output is iterator-style, like os.walk(). For a simple list of files, find() is available.
Note that the "files" output will include anything that is not a directory, such as links.
- Parameters
- path: str
Root to recurse into
- maxdepth: int
Maximum recursion depth. None means limitless, but not recommended on link-based file-systems.
- kwargs: passed to ``ls``
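A sketch (layout invented) of iterating walk in the style of os.walk:

```python
import fsspec

fs = fsspec.filesystem("memory")
fs.pipe({"memory://wroot/sub/f.bin": b"x", "memory://wroot/g.bin": b"y"})

# Each tuple is (current path, subdirectory names, file names).
for root, dirs, files in fs.walk("memory://wroot"):
    print(root, dirs, files)
```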
- class fsspec.spec.Transaction(fs)[source]¶
Filesystem transaction write context
Gathers files for deferred commit or discard, so that several write operations can be finalized semi-atomically. This works by having this instance as the .transaction attribute of the given filesystem.
Methods
- complete([commit]): Finish transaction: commit or discard all deferred files
- start(): Start a transaction on this FileSystem
- class fsspec.spec.AbstractBufferedFile(fs, path, mode='rb', block_size='default', autocommit=True, cache_type='readahead', cache_options=None, **kwargs)[source]¶
Convenient class to derive from to provide buffering
In the case that the backend does not provide a pythonic file-like object already, this class contains much of the logic to build one. The only methods that need to be overridden are _upload_chunk, _initiate_upload and _fetch_range.
- Attributes
- closed
Methods
- close(): Close file
- commit(): Move from temp to final destination
- discard(): Throw away temporary file
- fileno(): Returns underlying file descriptor if one exists.
- flush([force]): Write buffered data to backend store.
- info(): File information about this path
- isatty(): Return whether this is an 'interactive' stream.
- read([length]): Return data from cache, or fetch pieces as necessary
- readable(): Whether opened for reading
- readinto(b): Mirrors builtin file's readinto method
- readline(): Read until first occurrence of newline character
- readlines(): Return all data, split by the newline character
- readuntil([char, blocks]): Return data between current position and first occurrence of char
- seek(loc[, whence]): Set current file location
- seekable(): Whether is seekable (only in read mode)
- tell(): Current file location
- truncate: Truncate file to size bytes.
- writable(): Whether opened for writing
- write(data): Write data to buffer.
- writelines(lines): Write a list of lines to stream.
- readinto1
- flush(force=False)[source]¶
Write buffered data to backend store.
Writes the current buffer, if it is larger than the block-size, or if the file is being closed.
- Parameters
- force: bool
When closing, write the last block even if it is smaller than blocks are allowed to be. Disallows further writing to this file.
- read(length=-1)[source]¶
Return data from cache, or fetch pieces as necessary
- Parameters
- length: int (-1)
Number of bytes to read; if <0, all remaining bytes.
- readinto(b)[source]¶
mirrors builtin file’s readinto method
https://docs.python.org/3/library/io.html#io.RawIOBase.readinto
- readline()[source]¶
Read until first occurrence of newline character
Note that, because of character encoding, this is not necessarily a true line ending.
- readuntil(char=b'\n', blocks=None)[source]¶
Return data between current position and first occurrence of char
char is included in the output, except if the end of the file is encountered first.
- Parameters
- char: bytes
Thing to find
- blocks: None or int
How much to read in each go. Defaults to file blocksize - which may mean a new read on every call.
- class fsspec.asyn.AsyncFileSystem(*args, **kwargs)[source]¶
Async file operations, default implementations
Passes bulk operations to asyncio.gather for concurrent operation.
Implementations that have concurrent batch operations and/or async methods should inherit from this class instead of AbstractFileSystem. Docstrings are copied from the un-underscored method in AbstractFileSystem, if not given.
- Attributes
transaction: A context within which files are committed together upon exit
Methods
- cat(path[, recursive, on_error]): Fetch (potentially multiple) paths' contents
- cat_file(path): Get the content of a file
- checksum(path): Unique value for current version of file
- clear_instance_cache(): Clear the cache of filesystem instances.
- copy(path1, path2[, recursive]): Copy within two locations in the filesystem
- cp(path1, path2, **kwargs): Alias of FilesystemSpec.copy.
- created(path): Return the created timestamp of a file as a datetime.datetime
- current(): Return the most recently created FileSystem
- delete(path[, recursive, maxdepth]): Alias of FilesystemSpec.rm.
- disk_usage(path[, total, maxdepth]): Alias of FilesystemSpec.du.
- download(rpath, lpath[, recursive]): Alias of FilesystemSpec.get.
- du(path[, total, maxdepth]): Space used by files within a path
- end_transaction(): Finish write transaction, non-context version
- exists(path): Is there a file at the given path
- expand_path(path[, recursive, maxdepth]): Turn one or more globs or directories into a list of all matching files
- find(path[, maxdepth, withdirs]): List all files below path.
- from_json(blob): Recreate a filesystem instance from JSON representation
- get(rpath, lpath[, recursive]): Copy file(s) to local.
- get_file(rpath, lpath, **kwargs): Copy single remote file to local
- get_mapper(root[, check, create]): Create key/value store based on this file-system
- glob(path, **kwargs): Find files by glob-matching.
- head(path[, size]): Get the first size bytes from a file
- info(path, **kwargs): Give details of entry at path
- invalidate_cache([path]): Discard any cached directory information
- isdir(path): Is this entry directory-like?
- isfile(path): Is this entry file-like?
- listdir(path[, detail]): Alias of FilesystemSpec.ls.
- ls(path[, detail]): List objects at path.
- makedir(path[, create_parents]): Alias of FilesystemSpec.mkdir.
- makedirs(path[, exist_ok]): Recursively make directories
- mkdir(path[, create_parents]): Create directory entry at path
- mkdirs(path[, exist_ok]): Alias of FilesystemSpec.makedirs.
- modified(path): Return the modified timestamp of a file as a datetime.datetime
- move(path1, path2, **kwargs): Alias of FilesystemSpec.mv.
- mv(path1, path2[, recursive, maxdepth]): Move file(s) from one location to another
- open(path[, mode, block_size, cache_options]): Return a file-like object from the filesystem
- pipe(path[, value]): Put value into path
- pipe_file(path, value, **kwargs): Set the bytes of given file
- put(lpath, rpath[, recursive]): Copy file(s) from local.
- put_file(lpath, rpath, **kwargs): Copy single file to remote
- read_block(fn, offset, length[, delimiter]): Read a block of bytes from a file
- rename(path1, path2, **kwargs): Alias of FilesystemSpec.mv.
- rm(path[, recursive]): Delete files.
- rm_file(path): Delete a file
- rmdir(path): Remove a directory, if empty
- sign(path[, expiration]): Create a signed URL representing the given path
- size(path): Size in bytes of a file
- start_transaction(): Begin write transaction for deferring files, non-context version
- stat(path, **kwargs): Alias of FilesystemSpec.info.
- tail(path[, size]): Get the last size bytes from a file
- to_json(): JSON representation of this filesystem instance
- touch(path[, truncate]): Create empty file, or update timestamp
- ukey(path): Hash of file properties, to tell if it has changed
- upload(lpath, rpath[, recursive]): Alias of FilesystemSpec.put.
- walk(path[, maxdepth]): Return all files below path
- cp_file
- class fsspec.FSMap(root, fs, check=False, create=False, missing_exceptions=None)[source]¶
Wrap a FileSystem instance as a mutable mapping.
The keys of the mapping become files under the given root, and the values (which must be bytes) the contents of those files.
- Parameters
- root: string
prefix for all the files
- fs: FileSystem instance
- check: bool (=False)
performs a touch at the location, to check for write access.
Examples
>>> fs = FileSystem(**parameters)
>>> d = FSMap('my-data/path/', fs)
or, more likely:
>>> d = fs.get_mapper('my-data/path/')
>>> d['loc1'] = b'Hello World'
>>> list(d.keys())
['loc1']
>>> d['loc1']
b'Hello World'
Methods
- clear(): Remove all keys below root - empties out mapping
- delitems(keys): Remove multiple keys from the store
- get(k[, d])
- getitems(keys[, on_error]): Fetch multiple items from the store
- items()
- keys()
- pop(k[, d]): If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem(): Remove and return a (key, value) pair as a 2-tuple; raise KeyError if empty.
- setdefault(k[, d])
- setitems(values_dict): Set the values of multiple items in the store
- update([E, ]**F): If E present and has a .keys() method, does: for k in E: D[k] = E[k]. If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v. In either case, this is followed by: for k, v in F.items(): D[k] = v
- values()
- getitems(keys, on_error='raise')[source]¶
Fetch multiple items from the store
If the backend is async-able, this might proceed concurrently
- Parameters
- keys: list(str)
The keys to be fetched
- on_error: "raise", "omit", "return"
If raise, an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if omit, keys with exception will simply not be included in the output; if “return”, all keys are included in the output, but the value will be bytes or an exception instance.
- Returns
- dict(key, bytes|exception)
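A sketch (root and key names invented) pairing setitems with getitems on a memory-backed mapper:

```python
import fsspec

m = fsspec.get_mapper("memory://kvroot")
m.setitems({"x": b"1", "y": b"2"})   # bulk write
print(m.getitems(["x", "y"]))        # {'x': b'1', 'y': b'2'}
```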
- class fsspec.core.OpenFile(fs, path, mode='rb', compression=None, encoding=None, errors=None, newline=None)[source]¶
File-like object to be used in a context
Can layer (buffered) text-mode and compression over any file-system, which are typically binary-only.
These instances are safe to serialize, as the low-level file object is not created until invoked using with.
- Parameters
- fs: FileSystem
The file system to use for opening the file. Should match the interface of dask.bytes.local.LocalFileSystem.
- path: str
Location to open
- mode: str like ‘rb’, optional
Mode of the opened file
- compression: str or None, optional
Compression to apply
- encoding: str or None, optional
The encoding to use if opened in text mode.
- errors: str or None, optional
How to handle encoding errors if opened in text mode.
- newline: None or str
Passed to TextIOWrapper in text mode, how to handle line endings.
Methods
close
()Close all encapsulated file objects
open
()Materialise this as a real open file without context
- open()[source]¶
Materialise this as a real open file without context
The file should be explicitly closed to avoid enclosed file instances persisting. This code-path monkey-patches the file-like objects, so they can close even if the parent OpenFile object has already been deleted; but a with-context is better style.
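A sketch (the memory:// URL is invented) showing the usual context-based use, where the real file object only exists inside the with block:

```python
import fsspec

# The OpenFile itself is lightweight and serializable.
of = fsspec.open("memory://ofile/demo.txt", mode="wb")
with of as f:           # the low-level file is created here
    f.write(b"abc")

with fsspec.open("memory://ofile/demo.txt", mode="rb") as f:
    print(f.read())     # b'abc'
```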
- class fsspec.core.OpenFiles(*args, mode='rb', fs=None)[source]¶
List of OpenFile instances
Can be used in a single context, which opens and closes all of the contained files. Normal list access to get the elements works as normal.
A special case is made for caching filesystems - the files will be down/uploaded together at the start or end of the context, and this may happen concurrently, if the target filesystem supports it.
Methods
- append(object): Append object to the end of the list.
- clear(): Remove all items from list.
- copy(): Return a shallow copy of the list.
- count(value): Return number of occurrences of value.
- extend(iterable): Extend list by appending elements from the iterable.
- index(value[, start, stop]): Return first index of value.
- insert(index, object): Insert object before index.
- pop([index]): Remove and return item at index (default last).
- remove(value): Remove first occurrence of value.
- reverse(): Reverse IN PLACE.
- sort(*[, key, reverse]): Sort the list in ascending order and return None.
- class fsspec.core.BaseCache(blocksize, fetcher, size)[source]¶
Pass-through cache: doesn't keep anything, calls every time
Acts as base class for other cachers
- Parameters
- blocksize: int
How far to read ahead in numbers of bytes
- fetcher: func
Function of the form f(start, end) which gets bytes from remote as specified
- size: int
How big this file is
- fsspec.core.get_fs_token_paths(urlpath, mode='rb', num=1, name_function=None, storage_options=None, protocol=None, expand=True)[source]¶
Filesystem, deterministic token, and paths from a urlpath and options.
- Parameters
- urlpath: string or iterable
Absolute or relative filepath, URL (may include protocols like s3://), or globstring pointing to data.
- mode: str, optional
Mode in which to open files.
- num: int, optional
If opening in writing mode, number of files we expect to create.
- name_function: callable, optional
If opening in writing mode, this callable is used to generate path names. Names are generated for each partition by urlpath.replace('*', name_function(partition_index)).
- storage_options: dict, optional
Additional keywords to pass to the filesystem class.
- protocol: str or None
To override the protocol specifier in the URL
- expand: bool
Expand string paths for writing, assuming the path is a directory
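The documented substitution rule for write templates can be sketched directly; the plain-integer default name_function below is an assumption for illustration (fsspec's actual default naming and padding may differ):

```python
# Sketch of write-template expansion: each partition index is substituted
# into the "*" via name_function, per the documented rule
# urlpath.replace('*', name_function(partition_index)).
def expand_write_paths(urlpath, num, name_function=None):
    if "*" not in urlpath:
        raise ValueError("write template must contain '*'")
    name_function = name_function or (lambda i: str(i))
    return [urlpath.replace("*", name_function(i)) for i in range(num)]

expand_write_paths("data/part-*.csv", 2)
# → ['data/part-0.csv', 'data/part-1.csv']
```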
- class fsspec.dircache.DirCache(use_listings_cache=True, listings_expiry_time=None, max_paths=None, **kwargs)[source]¶
Caching of directory listings, in a structure like
{"path0": [
    {"name": "path0/file0", "size": 123, "type": "file", ...},
    {"name": "path0/file1", ...},
    ...
 ],
 "path1": [...]}
Parameters to this class control listing expiry or indeed turn caching off
- __init__(use_listings_cache=True, listings_expiry_time=None, max_paths=None, **kwargs)[source]¶
- Parameters
- use_listings_cache: bool
If False, this cache never returns items, but always reports KeyError, and setting items has no effect
- listings_expiry_time: int (optional)
Time in seconds that a listing is considered valid. If None, listings do not expire.
- max_paths: int (optional)
The number of most recent listings that are considered valid; ‘recent’ refers to when the entry was set.
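The expiry and max-paths semantics can be sketched with a stdlib OrderedDict (a toy stand-in, not fsspec's implementation):

```python
import time
from collections import OrderedDict

class TinyDirCache(OrderedDict):
    """Sketch of DirCache semantics: optional expiry of listings and a
    max_paths bound that evicts the least recently set entry."""

    def __init__(self, use_listings_cache=True, listings_expiry_time=None,
                 max_paths=None):
        super().__init__()
        self.use = use_listings_cache
        self.expiry = listings_expiry_time
        self.max_paths = max_paths
        self._times = {}

    def __setitem__(self, path, listing):
        if not self.use:
            return  # caching disabled: setting has no effect
        super().__setitem__(path, listing)
        self._times[path] = time.time()
        if self.max_paths and len(self) > self.max_paths:
            old, _ = self.popitem(last=False)  # drop least recently set
            self._times.pop(old, None)

    def __getitem__(self, path):
        if not self.use:
            raise KeyError(path)  # disabled cache always misses
        if self.expiry and time.time() - self._times[path] > self.expiry:
            raise KeyError(path)  # listing has expired
        return super().__getitem__(path)
```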
- class fsspec.registry.ReadOnlyRegistry(target)[source]¶
Dict-like registry, but immutable
Maps backend name to implementation class
To add backend implementations, use register_implementation.
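The immutable-mapping behaviour can be sketched with collections.abc.Mapping (illustrative only, not fsspec's actual class):

```python
from collections.abc import Mapping

# Sketch of a read-only registry view over a backing dict: Mapping provides
# read access only, so any item assignment raises TypeError.
class ReadOnlyDict(Mapping):
    def __init__(self, target):
        self._target = target  # the mutable dict being wrapped

    def __getitem__(self, key):
        return self._target[key]

    def __iter__(self):
        return iter(self._target)

    def __len__(self):
        return len(self._target)

registry = ReadOnlyDict({"memory": "MemoryFileSystem"})
registry["memory"]  # → 'MemoryFileSystem'
# registry["x"] = 1 would raise TypeError (no item assignment)
```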
- fsspec.registry.register_implementation(name, cls, clobber=True, errtxt=None)[source]¶
Add implementation class to the registry
- Parameters
- name: str
Protocol name to associate with the class
- cls: class or str
If a class: an fsspec-compliant implementation class (normally inheriting from fsspec.AbstractFileSystem), which gets added straight to the registry. If a str: the full path to an implementation class, like package.module.class, which gets added to known_implementations, so the import is deferred until the filesystem is actually used.
- clobber: bool (optional)
Whether to overwrite a protocol with the same name; if False, will raise instead.
- errtxt: str (optional)
If given, then a failure to import the given class will result in this text being shown to the user.
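The deferred-import behaviour for string entries can be sketched as follows (a simplified stand-in for what the registry does when the filesystem is first used):

```python
import importlib

# Sketch of deferred resolution: a class object passes through unchanged,
# while a dotted "package.module.Class" string is imported only on demand.
def resolve_implementation(entry):
    if isinstance(entry, str):
        module_name, _, cls_name = entry.rpartition(".")
        return getattr(importlib.import_module(module_name), cls_name)
    return entry

resolve_implementation("collections.OrderedDict")  # → the OrderedDict class
```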
Built-in Implementations¶
FTPFileSystem
A filesystem over classic FTP
LocalFileSystem
Interface to files on local storage
MemoryFileSystem
A filesystem based on a dict of BytesIO objects
GithubFileSystem
Interface to files in github
WebHDFS
Interface to HDFS over HTTP using the WebHDFS API.
ZipFileSystem
Read contents of ZIP archive as a file-system
CachingFileSystem
Locally caching filesystem, layer over any other FS
WholeFileCacheFileSystem
Caches whole remote files on first access
SimpleCacheFileSystem
Caches whole remote files on first access
JupyterFileSystem
View of the files as seen by a Jupyter server (notebook or lab)
- class fsspec.implementations.ftp.FTPFileSystem(*args, **kwargs)[source]¶
A filesystem over classic FTP
- __init__(host, port=21, username=None, password=None, acct=None, block_size=None, tempdir='/tmp', timeout=30, **kwargs)[source]¶
You can use _get_kwargs_from_urls to get some kwargs from a reasonable FTP url.
Authentication will be anonymous if username/password are not given.
- Parameters
- host: str
The remote server name/ip to connect to
- port: int
Port to connect with
- username: str or None
If authenticating, the user’s identifier
- password: str or None
User’s password on the server, if using
- acct: str or None
Some servers also need an “account” string for auth
- block_size: int or None
If given, the read-ahead or write buffer size.
- tempdir: str
Directory on remote to put temporary files when in a transaction
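Extracting connection kwargs from an FTP URL, in the spirit of _get_kwargs_from_urls, can be sketched with urllib (a hypothetical helper; the real method's behaviour may differ):

```python
from urllib.parse import urlparse

# Sketch: pull host/port/credentials out of an FTP URL. If username and
# password are absent, the resulting kwargs lead to anonymous auth.
def kwargs_from_ftp_url(url):
    parts = urlparse(url)
    kwargs = {"host": parts.hostname}
    if parts.port:
        kwargs["port"] = parts.port
    if parts.username:
        kwargs["username"] = parts.username
    if parts.password:
        kwargs["password"] = parts.password
    return kwargs

kwargs_from_ftp_url("ftp://user:secret@ftp.example.com:2121/data")
# → {'host': 'ftp.example.com', 'port': 2121,
#    'username': 'user', 'password': 'secret'}
```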
- class fsspec.implementations.local.LocalFileSystem(*args, **kwargs)[source]¶
Interface to files on local storage
- Parameters
- auto_mkdir: bool
Whether, when opening a file, the directory containing it should be created (if it doesn’t already exist). This is assumed by pyarrow code.
- __init__(auto_mkdir=False, **kwargs)[source]¶
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
- Parameters
- use_listings_cache, listings_expiry_time, max_paths:
passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
- skip_instance_cache: bool
If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
- asynchronous: bool
- loop: asyncio-compatible IOLoop or None
- class fsspec.implementations.memory.MemoryFileSystem(*args, **kwargs)[source]¶
A filesystem based on a dict of BytesIO objects
This is a global filesystem so instances of this class all point to the same in memory filesystem.
- __init__(*args, **storage_options)¶
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
- Parameters
- use_listings_cache, listings_expiry_time, max_paths:
passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
- skip_instance_cache: bool
If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
- asynchronous: bool
- loop: asyncio-compatible IOLoop or None
- class fsspec.implementations.webhdfs.WebHDFS(*args, **kwargs)[source]¶
Interface to HDFS over HTTP using the WebHDFS API. Supports also HttpFS gateways.
Three auth mechanisms are supported:
- insecure: no auth is done, and the user is assumed to be whoever they
say they are (parameter user), or a predefined value such as “dr.who” if not given
- spnego: when kerberos authentication is enabled, auth is negotiated by
requests_kerberos https://github.com/requests/requests-kerberos . This establishes a session based on existing kinit login and/or specified principal/password; parameters are passed with kerb_kwargs
- token: uses an existing Hadoop delegation token from another secured
service. Indeed, this client can also generate such tokens when not insecure. Note that tokens expire, but can be renewed (by a previously specified user) and may allow for proxying.
- __init__(host, port=50070, kerberos=False, token=None, user=None, proxy_to=None, kerb_kwargs=None, data_proxy=None, use_https=False, **kwargs)[source]¶
- Parameters
- host: str
Name-node address
- port: int
Port for webHDFS
- kerberos: bool
Whether to authenticate with kerberos for this connection
- token: str or None
If given, use this token on every call to authenticate. A user and user-proxy may be encoded in the token and should not also be given
- user: str or None
If given, assert the user name to connect with
- proxy_to: str or None
If given, the user has the authority to proxy, and this value is the user in whose name actions are taken
- kerb_kwargs: dict
Any extra arguments for HTTPKerberosAuth, see https://github.com/requests/requests-kerberos/blob/master/requests_kerberos/kerberos_.py
- data_proxy: dict, callable or None
If given, map data-node addresses. This can be necessary if the HDFS cluster is behind a proxy, running on Docker or otherwise has a mismatch between the host-names given by the name-node and the address by which to refer to them from the client. If a dict, maps host names host->data_proxy[host]; if a callable, full URLs are passed, and function must conform to url->data_proxy(url).
- use_https: bool
Whether to connect to the Name-node using HTTPS instead of HTTP
- kwargs
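The documented data_proxy contract (dict maps data-node host names, callable maps full URLs) can be sketched as:

```python
from urllib.parse import urlparse

# Sketch of applying data_proxy to a data-node URL: a dict swaps the host
# component, a callable receives and returns the full URL.
def apply_data_proxy(url, data_proxy):
    if data_proxy is None:
        return url
    if callable(data_proxy):
        return data_proxy(url)
    host = urlparse(url).hostname
    # dict form: replace the host, leaving port and path intact
    return url.replace(host, data_proxy.get(host, host), 1)

apply_data_proxy("http://datanode1:9864/webhdfs/v1/f", {"datanode1": "localhost"})
# → 'http://localhost:9864/webhdfs/v1/f'
```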
- class fsspec.implementations.zip.ZipFileSystem(*args, **kwargs)[source]¶
Read contents of ZIP archive as a file-system
Keeps file object open while instance lives.
This class is pickleable, but not necessarily thread-safe
- __init__(fo='', mode='r', target_protocol=None, target_options=None, block_size=5242880, **kwargs)[source]¶
- Parameters
- fo: str or file-like
Contains ZIP, and must exist. If a str, will fetch file using open_files(), which must return one file exactly.
- mode: str
Currently, only ‘r’ accepted
- target_protocol: str (optional)
If fo is a string, this value can be used to override the FS protocol inferred from a URL.
- target_options: dict (optional)
Kwargs passed when instantiating the target FS, if fo is a string.
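The zip-as-filesystem idea can be demonstrated with the stdlib zipfile module, which is what backs this class: build an archive in memory, then list and read entries through a read-only handle.

```python
import io
import zipfile

# Build an in-memory ZIP, then access it read-only, mimicking how a zip
# archive is exposed as a small file-system.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("dir/a.txt", "hello")

with zipfile.ZipFile(buf, "r") as z:
    names = z.namelist()        # → ['dir/a.txt']
    data = z.read("dir/a.txt")  # → b'hello'
```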
- class fsspec.implementations.cached.CachingFileSystem(*args, **kwargs)[source]¶
Locally caching filesystem, layer over any other FS
This class implements chunk-wise local storage of remote files, for quick access after the initial download. The files are stored in a given directory with random hashes for the filenames. If no directory is given, a temporary one is used, which should be cleaned up by the OS after the process ends. The files themselves are sparse (as implemented in MMapCache), so only the data which is accessed takes up space.
Restrictions:
the block-size must be the same for each access of a given file, unless all blocks of the file have already been read
caching can only be applied to file-systems which produce files derived from fsspec.spec.AbstractBufferedFile ; LocalFileSystem is also allowed, for testing
- __init__(target_protocol=None, cache_storage='TMP', cache_check=10, check_files=False, expiry_time=604800, target_options=None, fs=None, same_names=False, compression=None, **kwargs)[source]¶
- Parameters
- target_protocol: str (optional)
Target filesystem protocol. Provide either this or fs.
- cache_storage: str or list(str)
Location to store files. If “TMP”, this is a temporary directory, and will be cleaned up by the OS when this process ends (or later). If a list, each location will be tried in the order given, but only the last will be considered writable.
- cache_check: int
Number of seconds between reload of cache metadata
- check_files: bool
Whether to explicitly see if the UID of the remote file matches the stored one before using. Warning: some file systems such as HTTP cannot reliably give a unique hash of the contents of some path, so be sure to set this option to False.
- expiry_time: int
The time in seconds after which a local copy is considered useless. Set to falsy to prevent expiry. The default is equivalent to one week.
- target_options: dict or None
Passed to the instantiation of the FS, if fs is None.
- fs: filesystem instance
The target filesystem to run against. Provide this or target_protocol.
- same_names: bool (optional)
By default, target URLs are hashed, so that files from different backends with the same basename do not conflict. If this is true, the original basename is used.
- compression: str (optional)
To decompress on download. Can be ‘infer’ (guess from the URL name), one of the entries in fsspec.compression.compr, or None for no decompression.
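The same_names choice can be sketched as follows; the sha256 digest here is an assumption for illustration, not necessarily the hash fsspec uses:

```python
import hashlib

# Sketch of cache-file naming: hash the full URL by default, so equal
# basenames from different backends cannot collide; same_names keeps the
# original basename instead.
def local_cache_name(url, same_names=False):
    if same_names:
        return url.rsplit("/", 1)[-1]
    return hashlib.sha256(url.encode()).hexdigest()

local_cache_name("s3://bucket/data.csv", same_names=True)  # → 'data.csv'
```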
- class fsspec.implementations.cached.WholeFileCacheFileSystem(*args, **kwargs)[source]¶
Caches whole remote files on first access
This class is intended as a layer over any other file system, and will make a local copy of each file accessed, so that all subsequent reads are local. This is similar to CachingFileSystem, but without the block-wise functionality and so can work even when sparse files are not allowed. See its docstring for definition of the init arguments.
The class still needs access to the remote store for listing files, and may refresh cached files.
- __init__(target_protocol=None, cache_storage='TMP', cache_check=10, check_files=False, expiry_time=604800, target_options=None, fs=None, same_names=False, compression=None, **kwargs)¶
- Parameters
- target_protocol: str (optional)
Target filesystem protocol. Provide either this or fs.
- cache_storage: str or list(str)
Location to store files. If “TMP”, this is a temporary directory, and will be cleaned up by the OS when this process ends (or later). If a list, each location will be tried in the order given, but only the last will be considered writable.
- cache_check: int
Number of seconds between reload of cache metadata
- check_files: bool
Whether to explicitly see if the UID of the remote file matches the stored one before using. Warning: some file systems such as HTTP cannot reliably give a unique hash of the contents of some path, so be sure to set this option to False.
- expiry_time: int
The time in seconds after which a local copy is considered useless. Set to falsy to prevent expiry. The default is equivalent to one week.
- target_options: dict or None
Passed to the instantiation of the FS, if fs is None.
- fs: filesystem instance
The target filesystem to run against. Provide this or target_protocol.
- same_names: bool (optional)
By default, target URLs are hashed, so that files from different backends with the same basename do not conflict. If this is true, the original basename is used.
- compression: str (optional)
To decompress on download. Can be ‘infer’ (guess from the URL name), one of the entries in fsspec.compression.compr, or None for no decompression.
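The copy-on-first-access behaviour can be sketched with a toy class (fetch_bytes stands in for the wrapped remote filesystem; names are illustrative):

```python
import os
import tempfile

# Sketch of whole-file caching: the first access copies the remote bytes to
# local storage; every later open is served from the local copy.
class TinyWholeFileCache:
    def __init__(self, fetch_bytes, storage=None):
        self.fetch_bytes = fetch_bytes
        self.storage = storage or tempfile.mkdtemp()
        self._local = {}

    def open(self, path):
        if path not in self._local:
            local = os.path.join(self.storage, path.replace("/", "_"))
            with open(local, "wb") as f:
                f.write(self.fetch_bytes(path))  # one remote fetch per path
            self._local[path] = local
        return open(self._local[path], "rb")

calls = []
cache = TinyWholeFileCache(lambda p: calls.append(p) or b"payload")
with cache.open("bucket/key") as f:
    first = f.read()
with cache.open("bucket/key") as f:
    second = f.read()
# first == second == b'payload', but only one remote fetch happened
```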
- class fsspec.implementations.cached.SimpleCacheFileSystem(*args, **kwargs)[source]¶
Caches whole remote files on first access
This class is intended as a layer over any other file system, and will make a local copy of each file accessed, so that all subsequent reads are local. This implementation only copies whole files, and does not keep any metadata about the download time or file details. It is therefore safer to use in multi-threaded/concurrent situations.
This is the only one of the caching filesystems that supports write: you will be given a real local open file, and upon close and commit, it will be uploaded to the target filesystem; the writability of the target URL is not checked until that time.
- __init__(**kwargs)[source]¶
- Parameters
- target_protocol: str (optional)
Target filesystem protocol. Provide either this or fs.
- cache_storage: str or list(str)
Location to store files. If “TMP”, this is a temporary directory, and will be cleaned up by the OS when this process ends (or later). If a list, each location will be tried in the order given, but only the last will be considered writable.
- cache_check: int
Number of seconds between reload of cache metadata
- check_files: bool
Whether to explicitly see if the UID of the remote file matches the stored one before using. Warning: some file systems such as HTTP cannot reliably give a unique hash of the contents of some path, so be sure to set this option to False.
- expiry_time: int
The time in seconds after which a local copy is considered useless. Set to falsy to prevent expiry. The default is equivalent to one week.
- target_options: dict or None
Passed to the instantiation of the FS, if fs is None.
- fs: filesystem instance
The target filesystem to run against. Provide this or target_protocol.
- same_names: bool (optional)
By default, target URLs are hashed, so that files from different backends with the same basename do not conflict. If this is true, the original basename is used.
- compression: str (optional)
To decompress on download. Can be ‘infer’ (guess from the URL name), one of the entries in fsspec.compression.compr, or None for no decompression.
- class fsspec.implementations.github.GithubFileSystem(*args, **kwargs)[source]¶
Interface to files in github
An instance of this class provides the files residing within a remote github repository. You may specify a point in the repo's history, by SHA, branch or tag (default is current master).
Given that code files tend to be small, and that github does not support retrieving partial content, we always fetch whole files.
When using fsspec.open, allows URIs of the form:
“github://path/file”, in which case you must specify org, repo and may specify sha in the extra args
‘github://org:repo@/precip/catalog.yml’, where the org and repo are part of the URI
‘github://org:repo@sha/precip/catalog.yml’, where the sha is also included
sha can be the full or abbreviated hex of the commit you want to fetch from, or a branch or tag name (so long as it doesn’t contain special characters like “/”, “?”, which would have to be HTTP-encoded).
For authorised access, you must provide username and token, which can be made at https://github.com/settings/tokens
- __init__(org, repo, sha='master', username=None, token=None, **kwargs)[source]¶
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
- Parameters
- use_listings_cache, listings_expiry_time, max_paths:
passed to DirCache, if the implementation supports directory listing caching. Pass use_listings_cache=False to disable such caching.
- skip_instance_cache: bool
If this is a cachable implementation, pass True here to force creating a new instance even if a matching instance exists, and prevent storing this instance.
- asynchronous: bool
- loop: asyncio-compatible IOLoop or None
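Splitting the documented github:// URL forms can be sketched with plain string operations (a simplified illustration, not fsspec's actual parser):

```python
# Sketch of parsing the documented github:// URL forms:
#   'github://org:repo@/path'     (org/repo in the URI, no sha)
#   'github://org:repo@sha/path'  (sha also included)
def parse_github_url(url):
    rest = url[len("github://"):]
    if "@" not in rest:
        return None, None, None, rest  # org/repo must come from extra args
    spec, _, remainder = rest.partition("@")
    org, _, repo = spec.partition(":")
    sha, _, path = remainder.partition("/")
    return org, repo, sha or None, path

parse_github_url("github://org:repo@sha1234/precip/catalog.yml")
# → ('org', 'repo', 'sha1234', 'precip/catalog.yml')
```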
- class fsspec.implementations.jupyter.JupyterFileSystem(*args, **kwargs)[source]¶
View of the files as seen by a Jupyter server (notebook or lab)
- __init__(url, tok=None, **kwargs)[source]¶
- Parameters
- url: str
Base URL of the server, like “http://127.0.0.1:8888”. May include token in the string, which is given by the process when starting up
- tok: str
If the token is obtained separately, can be given here
- kwargs
Other Known Implementations¶
Read Buffering¶
ReadAheadCache
Cache which reads only when we get beyond a block of data
BytesCache
Cache which holds data in an in-memory bytes object
MMapCache
memory-mapped sparse file cache
BlockCache
Cache holding memory as a set of blocks.
- class fsspec.caching.ReadAheadCache(blocksize, fetcher, size)[source]¶
Cache which reads only when we get beyond a block of data
This is a much simpler version of BytesCache, and does not attempt to fill holes in the cache or keep fragments alive. It is best suited to many small reads in a sequential order (e.g., reading lines from a file).
- class fsspec.caching.BytesCache(blocksize, fetcher, size, trim=True)[source]¶
Cache which holds data in an in-memory bytes object
Implements read-ahead by the block size, for semi-random reads progressing through the file.
- Parameters
- trim: bool
As we read more data, whether to discard the start of the buffer when we are more than a blocksize ahead of it.
- class fsspec.caching.MMapCache(blocksize, fetcher, size, location=None, blocks=None)[source]¶
memory-mapped sparse file cache
Opens temporary file, which is filled blocks-wise when data is requested. Ensure there is enough disc space in the temporary location.
This cache method might only work on posix
- class fsspec.caching.BlockCache(blocksize, fetcher, size, maxblocks=32)[source]¶
Cache holding memory as a set of blocks.
Requests are only ever made blocksize at a time, and are stored in an LRU cache. The least recently accessed block is discarded when more than maxblocks are stored.
- Parameters
- blocksize: int
The number of bytes to store in each block. Requests are only ever made for blocksize, so this should balance the overhead of making a request against the granularity of the blocks.
- fetcher: Callable
- size: int
The total size of the file being cached.
- maxblocks: int
The maximum number of blocks to cache for. The maximum memory use for this cache is then blocksize * maxblocks.
Methods
The statistics on the block cache.
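The block-wise LRU scheme can be sketched with functools.lru_cache standing in for the bounded block store (an illustrative toy, not the real implementation):

```python
from functools import lru_cache

# Sketch of block-wise LRU caching: reads always come in whole blocks, and
# the least recently used block is evicted once maxblocks are stored.
class TinyBlockCache:
    def __init__(self, blocksize, fetcher, size, maxblocks=32):
        self.blocksize = blocksize
        self.fetcher = fetcher      # f(start, end) -> bytes
        self.size = size
        self._block = lru_cache(maxsize=maxblocks)(self._fetch_block)

    def _fetch_block(self, i):
        start = i * self.blocksize
        return self.fetcher(start, min(start + self.blocksize, self.size))

    def read(self, start, end):
        first, last = start // self.blocksize, (end - 1) // self.blocksize
        out = b"".join(self._block(i) for i in range(first, last + 1))
        offset = start - first * self.blocksize
        return out[offset:offset + (end - start)]

data = b"0123456789"
cache = TinyBlockCache(4, lambda s, e: data[s:e], len(data))
cache.read(2, 6)  # → b'2345'
```

Here `self._block.cache_info()` exposes hit/miss statistics, analogous to the statistics method in the table above.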