---
title: "HDFS Remote"
description: "Remote for Hadoop Distributed Filesystem"
---

# {{< icon "fa fa-globe" >}} HDFS

[HDFS](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) is a
distributed file-system, part of the [Apache Hadoop](https://hadoop.apache.org/) framework.

Paths are specified as `remote:` or `remote:path/to/dir`.

## Configuration

Here is an example of how to make a remote called `remote`. First run:

    rclone config

This will guide you through an interactive setup process:

```
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> remote
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
[skip]
XX / Hadoop distributed file system
   \ "hdfs"
[skip]
Storage> hdfs
** See help for hdfs backend at: https://rclone.org/hdfs/ **

hadoop name node and port
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Connect to host namenode at port 8020
   \ "namenode:8020"
namenode> namenode.hadoop:8020
hadoop user name
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Connect to hdfs as root
   \ "root"
username> root
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
--------------------
[remote]
type = hdfs
namenode = namenode.hadoop:8020
username = root
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
remote               hdfs

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q
```
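
The interactive session above can usually be replaced by a single command, which is convenient in scripts. The following is a minimal sketch rather than the only way to do it: `create_hdfs_remote` is a hypothetical helper name, and it assumes an rclone version whose `rclone config create` accepts `key=value` pairs (older releases expect space-separated `key value` pairs instead).

```shell
# Hypothetical helper: create an hdfs remote without the interactive prompts.
# Arguments: remote name, namenode host:port, hadoop user name.
create_hdfs_remote() {
    rclone config create "$1" hdfs "namenode=$2" "username=$3"
}

# Example call, using the values from the session above:
# create_hdfs_remote remote namenode.hadoop:8020 root
```

After the commented call has been run, `rclone config show remote` should print a `[remote]` section with the same `type`, `namenode` and `username` values as the session above.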

This remote is called `remote` and can now be used like this:

See all the top level directories

    rclone lsd remote:

List the contents of a directory

    rclone ls remote:directory

Sync the remote `directory` to `/home/local/directory`, deleting any excess files.

    rclone sync -i remote:directory /home/local/directory

### Setting up your own HDFS instance for testing

You may start with a [manual setup](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html)
or use the docker image from the tests.

To build the docker image yourself:

```
git clone https://github.com/rclone/rclone.git
cd rclone/fstest/testserver/images/test-hdfs
docker build --rm -t rclone/test-hdfs .
```

Or you can just use the latest one pushed:

```
docker run --rm --name "rclone-hdfs" -p 127.0.0.1:9866:9866 -p 127.0.0.1:8020:8020 --hostname "rclone-hdfs" rclone/test-hdfs
```

**NB** it needs a few seconds to start up.

For this docker image the remote needs to be configured like this:

```
[remote]
type = hdfs
namenode = 127.0.0.1:8020
username = root
```

You can stop this image with `docker kill rclone-hdfs` (**NB** it does not use volumes, so all data
uploaded will be lost.)
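
Because the container needs a few seconds to start, a script that launches it may want to wait before calling rclone. A minimal sketch, assuming bash (the `/dev/tcp` redirection is a bash feature) and treating an open namenode port as a rough readiness signal only; `wait_for_port` is a hypothetical helper name.

```shell
# Hypothetical helper: poll a TCP port until it accepts connections.
# Arguments: host, port, maximum number of attempts (default 30).
wait_for_port() {
    host="$1"
    port="$2"
    tries="${3:-30}"
    while [ "$tries" -gt 0 ]; do
        # The subshell opens and immediately closes a connection (bash only).
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example call for the docker image above:
# wait_for_port 127.0.0.1 8020 && rclone lsd remote:
```

An open port does not guarantee that HDFS has finished initialising, so the first rclone command may still need a retry.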

### Modified time

Modification times are stored with an accuracy of 1 second.

### Checksum

No checksums are implemented.

### Usage information

You can use the `rclone about remote:` command which will display filesystem size and current usage.

### Restricted filename characters

In addition to the [default restricted characters set](/overview/#restricted-characters)
the following characters are also replaced:

| Character | Value | Replacement |
| --------- |:-----:|:-----------:|
| :         | 0x3A  | ：          |
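
To make the table concrete: rclone's default encoding maps a restricted character to its fullwidth Unicode look-alike, so an ASCII colon (0x3A) in a file name is stored on HDFS as `：` (U+FF1A) and mapped back when listing. rclone does this internally; the sketch below only imitates the mapping with `sed`, and `encode_colon` is a hypothetical helper, not an rclone command.

```shell
# Illustration only: imitate the colon mapping from the table above.
# encode_colon is a hypothetical helper, not part of rclone.
encode_colon() {
    printf '%s\n' "$1" | sed 's/:/：/g'
}

encode_colon 'logs/15:40:18.txt'
```

The call prints the name as it would appear on the remote, with each ASCII colon replaced by the fullwidth one.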

Invalid UTF-8 bytes will also be [replaced](/overview/#invalid-utf8).

{{< rem autogenerated options start" - DO NOT EDIT - instead edit fs.RegInfo in backend/hdfs/hdfs.go then run make backenddocs" >}}
### Standard options

Here are the standard options specific to hdfs (Hadoop distributed file system).

#### --hdfs-namenode

Hadoop name node and port.

E.g. "namenode:8020" to connect to host namenode at port 8020.

- Config: namenode
- Env Var: RCLONE_HDFS_NAMENODE
- Type: string
- Default: ""

#### --hdfs-username

Hadoop user name.

- Config: username
- Env Var: RCLONE_HDFS_USERNAME
- Type: string
- Default: ""
- Examples:
    - "root"
        - Connect to hdfs as root.

### Advanced options

Here are the advanced options specific to hdfs (Hadoop distributed file system).

#### --hdfs-service-principal-name

Kerberos service principal name for the namenode.

Enables KERBEROS authentication. Specifies the Service Principal Name
(SERVICE/FQDN) for the namenode. E.g. "hdfs/namenode.hadoop.docker"
for a namenode running as service 'hdfs' with FQDN 'namenode.hadoop.docker'.

- Config: service_principal_name
- Env Var: RCLONE_HDFS_SERVICE_PRINCIPAL_NAME
- Type: string
- Default: ""

#### --hdfs-data-transfer-protection

Kerberos data transfer protection: authentication|integrity|privacy.

Specifies whether or not authentication, data signature integrity
checks, and wire encryption are required when communicating with the
datanodes. Possible values are 'authentication', 'integrity' and
'privacy'. Used only with KERBEROS enabled.

- Config: data_transfer_protection
- Env Var: RCLONE_HDFS_DATA_TRANSFER_PROTECTION
- Type: string
- Default: ""
- Examples:
    - "privacy"
        - Ensure authentication, integrity and encryption are enabled.

#### --hdfs-encoding

This sets the encoding for the backend.

See the [encoding section in the overview](/overview/#encoding) for more info.

- Config: encoding
- Env Var: RCLONE_HDFS_ENCODING
- Type: MultiEncoder
- Default: Slash,Colon,Del,Ctl,InvalidUtf8,Dot

{{< rem autogenerated options stop >}}
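
The options above can also be supplied through the listed environment variables, which avoids a config file entry altogether. A minimal sketch, assuming the example namenode and user from the configuration section; the values are placeholders, and the `:hdfs:` form is rclone's general syntax for a remote defined on the fly.

```shell
# Variable names come from the "Env Var" entries above; values are placeholders.
export RCLONE_HDFS_NAMENODE="namenode.hadoop:8020"
export RCLONE_HDFS_USERNAME="root"

# Optional Kerberos settings, only needed on a kerberized cluster:
# export RCLONE_HDFS_SERVICE_PRINCIPAL_NAME="hdfs/namenode.hadoop.docker"
# export RCLONE_HDFS_DATA_TRANSFER_PROTECTION="privacy"

# With the variables set, an on-the-fly remote can be listed like this:
# rclone lsd :hdfs:
```

Settings in a config file entry take precedence over these backend environment variables, so this approach works best with the on-the-fly `:hdfs:` remote rather than a named one.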

## Limitations

- No server-side `Move` or `DirMove`.
- Checksums not implemented.