OpenStack Object Storage (SWIFT)

Swift is a multi-tenant, highly scalable and durable object storage system designed to store large amounts of unstructured data at low cost via a RESTful HTTP API.

The main advantage of object storage is very low implementation cost compared to enterprise-grade storage, while ensuring scalability and data redundancy.

http://www.confreaks.com/videos/3528-openstacksummitatl2014-the-biggest-thing-in-openstack-swift-since-it-was-open-sourced-storage-policies

Installation:

https://github.com/openstack/swift/branches
http://docs.openstack.org/developer/swift/development_saio.html
http://docs.openstack.org/developer/swift/howto_installmultinode.html
http://blog.adityapatawari.com/2014/01/openstack-101-how-to-setup-openstack_12.html

More Reading about SWIFT

http://rubiojr.rbel.co/hack/2013/02/06/understanding-the-swift-ring-ruby-edition/
https://julien.danjou.info/blog/2012/openstack-swift-consistency-analysis
http://www.openstack.org/blog/2012/02/1997/
http://jclouds.apache.org/ (multi cloud support)
https://swiftstack.com/blog/2012/11/21/how-the-ring-works-in-openstack-swift/

Some Basic commands to work with SWIFT setup
1) python -c 'import swift; print(swift.__version__)'
2) GET auth token

curl -k -v -H 'X-Storage-User: system:root' -H 'X-Storage-Pass: testpass' http://xx.xx.xx.xx:8080/auth/v1.0

< HTTP/1.1 200 OK
< X-Storage-Url: http://xx.xx.xx.xx:8080/v1/AUTH_system
< X-Storage-Token: AUTH_2fbd62f8d6fc4ccd8a90d6a07

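3) Delete file using curl (the URL is the X-Storage-Url returned in step 2 followed by container and object; $OS_AUTH_TOKEN holds the returned token)

curl -v -X DELETE -i -H "X-Auth-Token: $OS_AUTH_TOKEN" http://xx.xx.xx.xx:8080/v1/AUTH_system/myfiles/aaa.txt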

4) Put a file using curl (same X-Storage-Url and token as above)

curl -v -X PUT -i -T aaa.txt -H "X-Auth-Token: $OS_AUTH_TOKEN" http://xx.xx.xx.xx:8080/v1/AUTH_system/myfiles/aaa.txt

5) Run swift-bench against the SWIFT setup to get performance numbers

https://github.com/openstack/swift-bench/blob/master/etc/swift-bench.conf-sample
   With keystone
swift-bench -V 2.0 -A http://xx.xx.xx.xx:35357/v2.0 -U admin:admin -K admin
   Without keystone
swift-bench -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing

6) To check whether a process is stuck in the kernel (the wchan column shows the kernel function the process is waiting in)
ps axo pid,wchan:32,cmd
sudo strace -p PID

7) swift commands to check/upload/download an object

swift -V 2.0 -A http://xx.xx.xx.xx:35357/v2.0 -U service:test -K testpass stat
swift -V 2.0 -A http://xx.xx.xx.xx:35357/v2.0 -U service:test -K testpass upload myfiles proxy.log
swift -V 2.0 -A http://xx.xx.xx.xx:35357/v2.0 -U service:test -K testpass download myfiles
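
The curl calls in steps 2 to 4 can also be sketched in Python with only the standard library. This is a rough sketch, not the official client: the helper names (auth_request, object_request, send) are made up for illustration, and the host, credentials and container are the placeholders used above.

```python
import urllib.parse
import urllib.request


def auth_request(auth_url, user, key):
    """Build the v1.0 auth request from step 2 (X-Storage-User / X-Storage-Pass)."""
    headers = {"X-Storage-User": user, "X-Storage-Pass": key}
    return urllib.request.Request(auth_url, headers=headers, method="GET")


def object_request(method, storage_url, token, container, name, data=None):
    """Build a PUT or DELETE request for <storage_url>/<container>/<name>."""
    path = "/".join(urllib.parse.quote(part, safe="") for part in (container, name))
    url = storage_url.rstrip("/") + "/" + path
    return urllib.request.Request(
        url, data=data, headers={"X-Auth-Token": token}, method=method
    )


def send(request):
    """Send a request and return (status, headers); needs a reachable cluster."""
    with urllib.request.urlopen(request) as response:
        return response.status, dict(response.headers)


# Usage against a live cluster (placeholders, not executed here):
# status, headers = send(auth_request(
#     "http://xx.xx.xx.xx:8080/auth/v1.0", "system:root", "testpass"))
# url, token = headers["X-Storage-Url"], headers["X-Storage-Token"]
# with open("aaa.txt", "rb") as f:
#     send(object_request("PUT", url, token, "myfiles", "aaa.txt", data=f.read()))
# send(object_request("DELETE", url, token, "myfiles", "aaa.txt"))
```

Building the requests separately from sending them makes it easy to inspect the URL and headers before anything touches the cluster.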

SWIFT’s Object Placement Strategy (The Ring)

Swift uses a data structure called the “Ring” to map the URL of an object to the particular location in the cluster where the object is stored. The mapping is static; it cannot be changed on the fly.

0) Replica placement is also handled by the ring.

0.1)
The ring data structure consists of three top-level fields: a list of devices in the cluster, a list of lists of device ids indicating partition-to-device assignments, and an integer indicating the number of bits to shift an MD5 hash to calculate the partition for the hash.
a) Device list, e.g. {"devs": [{"device": "sd0", "id": 0, "ip": "10.3.1.1", "meta": "", "port": 6001, "region": 1, "replication_ip": "10.3.1.1", "replication_port": 6001, "weight": 200.0, "zone": 1},
b) Partition assignment list (_replica2part2dev), e.g. _replica2part2dev[2][7989] = device for 3rd replica of partition 7989
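
The lookup that these three fields make possible can be sketched in a few lines of Python. This is a simplified illustration, not Swift's own code: the toy ring values are invented, the dictionary layout is an assumption, and the real ring also mixes a cluster-wide hash prefix/suffix into the path before hashing, which is left out here.

```python
import hashlib
import struct


def get_part(path, part_shift):
    """Partition = top bits of the MD5 of the path, kept by shifting right."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Take the first 4 bytes as a big-endian unsigned int, then shift.
    return struct.unpack_from(">I", digest)[0] >> part_shift


def get_nodes(ring, path):
    """Return (partition, devices), one device per replica."""
    part = get_part(path, ring["part_shift"])
    dev_ids = [row[part] for row in ring["replica2part2dev"]]
    return part, [ring["devs"][dev_id] for dev_id in dev_ids]


# Toy ring: 2**4 = 16 partitions, 3 replicas, 4 devices (all values invented).
PART_POWER = 4
toy_ring = {
    "devs": [{"id": i, "device": "sd%d" % i, "zone": i + 1} for i in range(4)],
    "replica2part2dev": [
        [(part + replica) % 4 for part in range(2 ** PART_POWER)]
        for replica in range(3)
    ],
    "part_shift": 32 - PART_POWER,
}

part, nodes = get_nodes(toy_ring, "/AUTH_system/myfiles/aaa.txt")
print(part, [node["device"] for node in nodes])
```

Because the partition comes only from the hash and the shift, every node holding the same ring computes the same answer without asking anyone else.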

0.2) The Ring maintains this mapping using zones, devices, partitions, and replicas. Each partition in the Ring is replicated three times by default across the cluster, and the locations for a partition are stored in the mapping maintained by the Ring. The Ring is also responsible for determining which devices are used for handoff should a failure occur.

0.3) For a given partition number, each replica’s device will not be in the same zone as any other replica’s device.

0.4) The ring builder assigns each replica of each partition to the device that desires the most partitions at that point while keeping it as far away as possible from other replicas. The ring builder prefers to assign a replica to a device in a region that has no replicas already; should there be no such region available, the ring builder will try to find a device in a different zone; if that is not possible, it will look on a different server; failing that, it will just look for a device that has no replicas; finally, if all other options are exhausted, the ring builder will assign the replica to the device that has the fewest replicas already assigned. Note that assignment of multiple replicas to one device will only happen if the ring has fewer devices than it has replicas.
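
The preference order described in 0.4 can be sketched as a small Python function. This is only an illustration of the fallback order, not the ring builder's actual algorithm: pick_device, the parts_wanted field and the device dictionaries are hypothetical names chosen for the example.

```python
from collections import Counter


def pick_device(devices, placed):
    """Pick a device for the next replica of one partition.

    devices: dicts with id, region, zone, ip, parts_wanted (hypothetical).
    placed:  devices that already hold a replica of this partition.
    """
    used_regions = {d["region"] for d in placed}
    used_zones = {(d["region"], d["zone"]) for d in placed}
    used_servers = {(d["region"], d["zone"], d["ip"]) for d in placed}
    used_devs = {d["id"] for d in placed}

    # Tried in order: new region, new zone, new server, new device.
    tiers = [
        lambda d: d["region"] not in used_regions,
        lambda d: (d["region"], d["zone"]) not in used_zones,
        lambda d: (d["region"], d["zone"], d["ip"]) not in used_servers,
        lambda d: d["id"] not in used_devs,
    ]
    for is_unused in tiers:
        candidates = [d for d in devices if is_unused(d)]
        if candidates:
            # Among equally distant devices, take the one wanting most partitions.
            return max(candidates, key=lambda d: d["parts_wanted"])

    # Every device already holds a replica: take the least loaded one.
    counts = Counter(d["id"] for d in placed)
    return min(devices, key=lambda d: counts[d["id"]])
```

With one region, two zones and three devices, the first replica goes to the device wanting the most partitions, the second to the other zone, and the third to the remaining server.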

0.5) To check ring details, use the swift-ring-builder command

e.g. swift-ring-builder /tmp/container.builder list_parts z1

0.6) more details

http://docs.openstack.org/training-guides/content/operator-object-storage-node.html

1) Regions, zones, servers and drives form a hierarchy for data placement.
1.1) Regions are used only when distributing a cluster over geographic sites.
1.2) A zone is defined as a unique domain of something that can fail, such as power or a networking segment.

2) OpenStack Swift places three copies of every object across the cluster in as unique-as-possible locations: first by region, then zone, then server, then drive.
A quorum is required — at least two of the three writes must be successful before the client is notified that the upload was successful.
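
The quorum rule can be written down as a tiny Python check. This is a sketch that assumes a simple majority rule, which gives the "two of three" described above; the function names are made up for illustration.

```python
def quorum_size(replicas):
    """Smallest number of successful writes that forms a majority."""
    return replicas // 2 + 1


def upload_acknowledged(statuses, replicas=3):
    """True if enough backend writes returned a 2xx status."""
    successes = sum(1 for status in statuses if 200 <= status < 300)
    return successes >= quorum_size(replicas)


print(quorum_size(3))                        # 2
print(upload_acknowledged([201, 201, 503]))  # True
print(upload_acknowledged([201, 503, 503]))  # False
```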

3) Because Swift is a distributed storage system, the ring is deployed to every node in the cluster, both proxies and object servers.

4) All objects have their own metadata.

5) The Ring maps Partitions to physical locations of object/container/account on disk.
An account database contains the list of containers in that account. A container database contains the list of objects in that container.

6) After Object placement the Container database is updated asynchronously to reflect that there is a new object in it.

9) The Container Server’s primary job is to handle listings of objects. It does not know where those objects are, just what objects are in a specific container.
The listings are stored as SQLite database files, and replicated across the cluster similar to how objects are.
Statistics are also tracked that include the total number of objects, and total storage usage for that container.
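
To make the idea of a listing database concrete, here is a small in-memory SQLite sketch in Python. The table and column names are invented for the example and are much simpler than Swift's real container schema; only the idea (a sorted listing plus object count and bytes used) is taken from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: one row per object in the container.
conn.execute("CREATE TABLE object (name TEXT PRIMARY KEY, size INTEGER)")


def record_put(conn, name, size):
    """Record that an object was stored; the container never stores its location."""
    conn.execute(
        "INSERT OR REPLACE INTO object (name, size) VALUES (?, ?)", (name, size)
    )


def listing(conn, marker="", limit=10000):
    """Names after `marker`, sorted, at most `limit` of them."""
    rows = conn.execute(
        "SELECT name FROM object WHERE name > ? ORDER BY name LIMIT ?",
        (marker, limit),
    )
    return [row[0] for row in rows]


def stats(conn):
    """Return (object_count, bytes_used) for the container."""
    return conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(size), 0) FROM object"
    ).fetchone()


record_put(conn, "proxy.log", 2048)
record_put(conn, "aaa.txt", 12)
print(listing(conn))  # ['aaa.txt', 'proxy.log']
print(stats(conn))    # (2, 2060)
```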

10) The Account Server is very similar to the Container Server, except that it is responsible for listings of containers rather than objects.

11) If a replicator detects that a remote drive has failed, the replicator uses the get_more_nodes interface for the ring to choose an alternate node with which to synchronize.

13) When a disk fails, replica data is automatically distributed to the other zones to ensure there are three copies of the data.
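
The handoff idea from item 11 can be sketched as a Python generator. This is a deliberately simplified stand-in for the ring's get_more_nodes, not a copy of it: the ordering rule (unused zones first) and the device dictionaries are assumptions made for the example.

```python
def get_more_nodes(devices, primaries):
    """Yield handoff candidates: devices that are not primaries,
    preferring zones that hold no primary replica."""
    primary_ids = {d["id"] for d in primaries}
    used_zones = {d["zone"] for d in primaries}
    handoffs = [d for d in devices if d["id"] not in primary_ids]
    # False sorts before True, so devices in unused zones come first;
    # the sort is stable, so the original order is kept otherwise.
    handoffs.sort(key=lambda d: d["zone"] in used_zones)
    for dev in handoffs:
        yield dev


devices = [
    {"id": 0, "zone": 1}, {"id": 1, "zone": 2}, {"id": 2, "zone": 3},
    {"id": 3, "zone": 1}, {"id": 4, "zone": 4},
]
primaries = devices[:3]
print([d["id"] for d in get_more_nodes(devices, primaries)])  # [4, 3]
```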

Notes:

1) After Grizzly, the default token format is PKI instead of UUID. To see tokens in the short form, change the provider and format in keystone.conf to UUID. PKI tokens are, however, more secure, since the service can trust where the token is coming from, and more efficient, since the service does not have to validate the token against Keystone on every request as it does for UUID tokens.

2) Sample keystonerc
export OS_SERVICE_TOKEN=b83d2580bf023
export OS_SERVICE_ENDPOINT=http://xx.xx.xx.xx:35357/v2.0
export OS_USERNAME=admin
export OS_PASSWORD=asmin
export OS_TENANT_NAME=admin
export OS_AUTH_URL=http://xx.xx.xx.xx:35357/v2.0

3) Sample commands to connect to the keystone DB and remove all the token entries. A large token table could slow performance during performance tests.

mysql -u root -p
show databases;
use keystone;
show tables;
mysql> SELECT COUNT(*) FROM token;
+----------+
| COUNT(*) |
+----------+
|  1931349 |
+----------+
1 row in set (1.64 sec)

keystone-manage token_flush
DESCRIBE token;
or

DELETE FROM token WHERE expires <= NOW();

or

mysql -u "root" "-popenstack" "keystone" -e "truncate token;"
-rw-rw---- 1 mysql mysql 6621757440 May 2 11:06 ibdata1

mysql -u "root" "-popenstack" "keystone" -e "show table status;"

========================FILE PUT CALL FLOW====================

*******************Proxy server – Check Bucket *******************************
Apr 1 00:15:16 vm2 proxy-server Authenticating user token
Apr 1 00:15:16 vm2 proxy-server Removing headers from request environment: X-Identity-Status,X-Domain-Id,X-Domain-Name,X-Project-Id,X-Project-Name,X-Project-Domain-Id,X-Project-Domain-Name,X-User-Id,X-User-Name,X-User-Domain-Id,X-User-Domain-Name,X-Roles,X-Service-Catalog,X-User,X-Tenant-Id,X-Tenant-Name,X-Tenant,X-Role
Apr 1 00:15:16 vm2 proxy-server Storing 1d7b54a9453148109bb6dd token in memcache
Apr 1 00:15:16 vm2 proxy-server Using identity: {'roles': [u'admin'], 'user': u'test', 'tenant': (u'9d6496dd85bc406e94ae26eebf', u'service')} (txn: txcda0af985c5f46c5b3656-00533)
Apr 1 00:15:16 vm2 proxy-server allow user with role admin as account admin (txn: txcda0af985c5f46c5b3656-00533a6784) (client_ip: xx.xx.xx.xx)

Apr 1 00:15:16 vm2 proxy-server xx.xx.xx.xx xx.xx.xx.xx 01/Apr/2014/07/15/16 PUT /v1/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12 HTTP/1.0 201 - - 1d7b54a9453148109bb6dd2628022334 - - - txcda0af985c5f46c5b3656-00533a6784 - 0.0651 - -

Apr 1 00:15:16 vm2 container-server 127.0.0.1 - - [01/Apr/2014:07:15:16 +0000] "PUT /sdb2/15/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12" 201 - "txcda0af985c5f46c5b3656-00533a6784" "PUT http://xx.xx.xx.xx:8080/v1/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12" "proxy-server 4797" 0.0246
Apr 1 00:15:16 vm2 account-server 127.0.0.1 - - [01/Apr/2014:07:15:16 +0000] "PUT /sdb2/711/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12" 201 - "txcda0af985c5f46c5b3656-00533a6784" "PUT http://127.0.0.1:6031/sdb3/15/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12" "container-server 4795" 0.0141 ""

**********************Proxy server – check File *********************************
Apr 1 00:15:16 vm2 proxy-server Authenticating user token
Apr 1 00:15:16 vm2 proxy-server Removing headers from request environment: X-Identity-Status,X-Domain-Id,X-Domain-Name,X-Project-Id,X-Project-Name,X-Project-Domain-Id,X-Project-Domain-Name,X-User-Id,X-User-Name,X-User-Domain-Id,X-User-Domain-Name,X-Roles,X-Service-Catalog,X-User,X-Tenant-Id,X-Tenant-Name,X-Tenant,X-Role
Apr 1 00:15:17 vm2 proxy-server Storing f62461c755de4b418b868cd473aa60cc token in memcache
Apr 1 00:15:17 vm2 proxy-server Using identity: {'roles': [u'admin'], 'user': u'ceph', 'tenant': (u'9d6496dd85bc406e94ae26eebf3ff317', u'service')} (txn: tx60e267bb98664bca84185-00533a6784)
Apr 1 00:15:17 vm2 proxy-server allow user with role admin as account admin (txn: tx60e267bb98664bca84185-00533a6784) (client_ip: xx.xx.xx.xx)
Apr 1 00:15:17 vm2 proxy-server xx.xx.xx.xx xx.xx.xx.xx 01/Apr/2014/07/15/17 HEAD /v1/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12/expirer.error HTTP/1.0 404 - - f62461c755de4b418b868cd473aa60cc - - - tx60e267bb98664bca84185-00533a6784 - 0.0260 - -

Apr 1 00:15:17 vm2 object-server 127.0.0.1 - - [01/Apr/2014:07:15:17 +0000] "HEAD /sdb2/644/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12/expirer.error" 404 - "HEAD http://xx.xx.xx.xx:8080/v1/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12/expirer.error" "tx60e267bb98664bca84185-00533a6784" "proxy-server 4797" 0.0013

**********************Proxy server – PUT File *********************************
Apr 1 00:15:17 vm2 proxy-server Authenticating user token
Apr 1 00:15:17 vm2 proxy-server Removing headers from request environment: X-Identity-Status,X-Domain-Id,X-Domain-Name,X-Project-Id,X-Project-Name,X-Project-Domain-Id,X-Project-Domain-Name,X-User-Id,X-User-Name,X-User-Domain-Id,X-User-Domain-Name,X-Roles,X-Service-Catalog,X-User,X-Tenant-Id,X-Tenant-Name,X-Tenant,X-Role
Apr 1 00:15:17 vm2 proxy-server Returning cached token f62461c755de4b418b868cd473aa60cc
Apr 1 00:15:17 vm2 proxy-server Using identity: {'roles': [u'admin'], 'user': u'ceph', 'tenant': (u'9d6496dd85bc406e94ae26eebf3ff317', u'service')} (txn: tx6ec6e2950e424897bb9e3-00533a6785)
Apr 1 00:15:17 vm2 proxy-server allow user with role admin as account admin (txn: tx6ec6e2950e424897bb9e3-00533a6785) (client_ip: xx.xx.xx.xx)
Apr 1 00:15:17 vm2 proxy-server xx.xx.xx.xx xx.xx.xx.xx 01/Apr/2014/07/15/17 PUT /v1/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12/expirer.error HTTP/1.0 201 - - f62461c755de4b418b868cd473aa60cc - - - tx6ec6e2950e424897bb9e3-00533a6785 - 0.0476 - -

Apr 1 00:15:17 vm2 object-server 127.0.0.1 - - [01/Apr/2014:07:15:17 +0000] "PUT /sdb2/644/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12/expirer.error" 201 - "PUT http://xx.xx.xx.xx:8080/v1/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12/expirer.error" "tx6ec6e2950e424897bb9e3-00533a6785" "proxy-server 4797" 0.0248
Apr 1 00:15:17 vm2 container-server 127.0.0.1 - - [01/Apr/2014:07:15:17 +0000] "PUT /sdb2/15/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12/expirer.error" 201 - "tx6ec6e2950e424897bb9e3-00533a6785" "PUT http://xx.xx.xx.xx:8080/sdb4/644/AUTH_9d6496dd85bc406e94ae26eebf3ff317/tempfile12/expirer.error" "obj-server 4784" 0.0004
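
When reading logs like the ones above, it helps to group lines by transaction id so that one request can be followed from the proxy to the storage servers. The Python sketch below does that; the regular expression and the assumption that the service name is the fifth whitespace-separated field are guesses based on this log format, and the sample lines are shortened versions of the PUT entries above.

```python
import re
from collections import OrderedDict

# Transaction ids in these logs look like tx<hex>-<hex>.
TXN_RE = re.compile(r"tx[0-9a-f]+-[0-9a-f]+")


def trace_by_txn(lines):
    """Map each transaction id to the services that logged it, in order."""
    flows = OrderedDict()
    for line in lines:
        match = TXN_RE.search(line)
        fields = line.split()
        if not match or len(fields) < 5:
            continue
        service = fields[4]  # e.g. proxy-server, object-server
        services = flows.setdefault(match.group(0), [])
        if service not in services:
            services.append(service)
    return flows


sample = [
    "Apr 1 00:15:17 vm2 proxy-server allow user with role admin as account admin"
    " (txn: tx6ec6e2950e424897bb9e3-00533a6785)",
    'Apr 1 00:15:17 vm2 object-server 127.0.0.1 "PUT /sdb2/644/..." 201'
    ' "tx6ec6e2950e424897bb9e3-00533a6785"',
    'Apr 1 00:15:17 vm2 container-server 127.0.0.1 "PUT /sdb2/15/..." 201'
    ' "tx6ec6e2950e424897bb9e3-00533a6785"',
]
for txn, services in trace_by_txn(sample).items():
    print(txn, " -> ".join(services))
```

For the sample this prints the proxy-server, object-server, container-server order seen in the PUT flow: the object is written first and the container listing is updated afterwards.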
TroubleShooting

 

==============================================================

 
